root/xmltv_parser/branches/linux/xmltv_parser/xmltv.dtd
Revision: 282
Committed: Wed Jul 29 19:09:14 2015 UTC by william
Content type: application/xml-dtd
File size: 23461 byte(s)

1     <!-- DTD for TV listings
2    
3     This is a DTD to represent a TV listing. It doesn't explicitly group
4     programmes by day or by channel, instead broadcast time and channel
5     are attributes of the 'programme' element. Optionally, data about the
6     TV channels used can be stored in 'channel' elements.
7    
8     Data about a TV programme are stored in the subelements of element
9     'programme', but metadata such as when it will be broadcast are stored
10     as attributes.
11    
12     Many of the details have a 'lang' attribute so that you can
13     store them in multiple languages or have mixed languages in a single
14     listing. This 'lang' should be a language code such as 'en' or
15     'fr_FR'. Or you can just leave it out and let your reader take a
16     guess.
17    
18     Unless otherwise specified, an element containing #PCDATA must have some
19     text if it is written.
20    
21     An example XML file for this DTD might look like this:
22    
23     <tv generator-info-name="my listings generator">
24     <channel id="3sat.de">
25     <display-name lang="de">3SAT</display-name>
26     </channel>
27     <channel id="das-erste.de">
28     <display-name lang="de">ARD</display-name>
29     <display-name lang="de">Das Erste</display-name>
30     </channel>
31    
32     <programme start="200006031633" channel="3sat.de">
33     <title lang="de">blah</title>
34     <title lang="en">blah</title>
35     <desc lang="de">
36     Blah Blah Blah.
37     </desc>
38     <credits>
39     <director>blah</director>
40     <actor>a</actor>
41     <actor>b</actor>
42     </credits>
43     <date>19901011</date>
44     <country>ES</country>
45     <episode-num system="xmltv_ns">2 . 9 . 0/1</episode-num>
46     <video>
47     <aspect>16:9</aspect>
48     </video>
49     <rating system="MPAA">
50     <value>PG</value>
51     <icon src="pg_symbol.png" />
52     </rating>
53     <star-rating>
54     <value>3/3</value>
55     </star-rating>
56     </programme>
57     <programme> ... </programme>
58     ...
59     </tv>
60    
61     This describes two channels and then a programme broadcast on one of
62     the channels, then some more programmes. Almost everything in the DTD
63     is optional, so you can write files which are much simpler than this
64     example.
65    
66     All dates and times in this DTD follow the same format, loosely based
67     on ISO 8601. They can be 'YYYYMMDDhhmmss' or some initial
68     substring, for example if you only know the year and month you can
69     have 'YYYYMM'. You can also append a timezone to the end; if no
70     explicit timezone is given, UTC is assumed. Examples:
71     '200007281733 BST', '200209', '19880523083000 +0300'. (BST == +0100.)
72    
73     Unless specified otherwise, textual element content may not contain
74     newlines - this is to make it easy to convert into line-oriented
75     formats, and to avoid the question of what exactly a newline would
76     mean in the middle of someone's name or whatever. Leading and
77     trailing whitespace in element content is not significant.
78    
79     At present, versions of this DTD correspond to releases of the 'xmltv'
80     package, which is a set of programs to generate and manipulate files
81     conforming to this DTD. Written by Ed Avis (ed@membled.com) and
82     Gottfried Szing, thanks to others for suggestions.
83    
84     $Id: xmltv.dtd,v 1.44 2010/04/10 13:11:06 knowledgejunkie Exp $
85    
86     -->
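<!-- A reader of this format has to turn the date strings described above
into real timestamps. The following is a minimal Python sketch of one
way to do that. It is not part of the DTD and is not the xmltv package's
own code. The names parse_xmltv_time and ZONE_ALIASES are made up for
illustration; only 'BST == +0100' comes from the text above, and omitted
fields are assumed to default to their earliest value (January, day 1,
00:00:00).

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical alias table: the DTD only mentions BST (== +0100);
# a real reader would need a fuller list of zone names.
ZONE_ALIASES = {"UTC": "+0000", "GMT": "+0000", "BST": "+0100"}

# Four year digits, up to five further two-digit fields, optional zone.
_STAMP = re.compile(r"^(\d{4}(?:\d{2}){0,5})(?:\s*(\S+))?$")
# Assumed defaults for month, day, hour, minute, second when omitted.
_DEFAULTS = [1, 1, 0, 0, 0]


def parse_xmltv_time(text):
    """Parse 'YYYYMMDDhhmmss' (or an initial substring) plus optional zone."""
    match = _STAMP.match(text.strip())
    if match is None:
        raise ValueError(f"not an XMLTV timestamp: {text!r}")
    digits, zone = match.groups()
    year = int(digits[:4])
    rest = [int(digits[i:i + 2]) for i in range(4, len(digits), 2)]
    fields = rest + _DEFAULTS[len(rest):]
    # No explicit timezone means UTC, as stated in the DTD.
    offset = ZONE_ALIASES.get(zone, zone) if zone else "+0000"
    if not re.fullmatch(r"[+-]\d{4}", offset):
        raise ValueError(f"unknown timezone: {zone!r}")
    sign = 1 if offset[0] == "+" else -1
    delta = sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[3:5]))
    return datetime(year, *fields, tzinfo=timezone(delta))
```

For example, parse_xmltv_time('200209') gives midnight UTC on
1 September 2002, and parse_xmltv_time('200007281733 BST') carries a UTC
offset of one hour.
-->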
87    
88     <!-- The root element, tv.
89    
90     Date should be the date when the listings were originally produced in
91     whatever format; if you're converting data from another source, then
92     use the date given by that source. The date when the conversion
93     itself was done is not important.
94    
95     To indicate the source of the listings, there are three attributes you
96     can define:
97    
98     'source-info-url' is a URL describing the data source in
99     some human-readable form. So if you are getting your listings from
100     SAT.1, you might set this to the URL of a page explaining how to
101     subscribe to their feed. If you are getting them from a website, the
102     URL might be the index of the site or at least of the TV listings
103     section.
104    
105     'source-info-name' is the link text for that URL; it should
106     generally be the human-readable name of your listings supplier.
107     Sometimes the link text might be printed without the link itself, in
108     hardcopy listings for example.
109    
110     'source-data-url' is where the actual data is grabbed from. This
111     should link directly to the machine-readable data files if possible,
112     but it's not rigorously defined what 'actual data' means. If you are
113     parsing the data from human-readable pages, then it's more appropriate
114     to link to them with the source-info stuff and omit this attribute.
115    
116     To publicize your wonderful program which generated this file, you can
117     use 'generator-info-name' (preferably in the form 'progname/version')
118     and 'generator-info-url' (a link to more info about the program).
119     -->
120     <!ELEMENT tv (channel*, programme*)>
121     <!ATTLIST tv date CDATA #IMPLIED
122     source-info-url CDATA #IMPLIED
123     source-info-name CDATA #IMPLIED
124     source-data-url CDATA #IMPLIED
125     generator-info-name CDATA #IMPLIED
126     generator-info-url CDATA #IMPLIED >
127    
128     <!-- channel - details of a channel
129    
130     Each 'programme' element (see below) should have an attribute
131     'channel' giving the channel on which it is broadcast. If you want to
132     provide more detail about channels, you can give some 'channel'
133     elements before listing the programmes. The 'id' attribute of the
134     channel should match what is given in the 'channel' attribute of the
135     programme.
136    
137     Typically, all the channels used in a particular TV listing will be
138     included and then the programmes using those channels. But it's
139     entirely optional to include channel details - you can just leave out
140     channel elements or provide only some of them. It is also okay to
141     give just channels and no programmes, if you just want to describe
142     what TV channels are available in a certain area.
143    
144     Each channel has one id attribute, which must be unique and should
145     preferably be in the form suggested by RFC2838 (the 'broadcast'
146     element of the grammar in that RFC, in other words, a DNS-like name
147     but without any URI scheme). It also has one or more display names, which are
148     shown to the user. You might want a different display name for
149     different languages, but also you can have more than one name for the
150     same language. Names listed earlier are considered 'more canonical'.
151    
152     Since the display name is just there as a way for humans to refer to
153     the channel, it's acceptable to just put the channel number if it's
154     fairly universal among viewers of the channel. But remember that this
155     isn't an official statement of what channel number has been
156     allocated, and the same number might be used for a different channel
157     somewhere else.
158    
159     The ordering of channel elements makes no difference to the meaning of
160     the file, since they are looked up by id and not by their position.
161     However, it makes things like diffing easier if you write the channel
162     elements sorted by ASCII order of their ids.
163     -->
164     <!ELEMENT channel (display-name+, icon*, url*) >
165     <!ATTLIST channel id CDATA #REQUIRED >
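<!-- Because display names listed earlier are 'more canonical', a reader
can pick a name by taking the first one in the wanted language. Here is
a hedged Python sketch of that idea, not part of the DTD. The function
name best_display_name and the English display name in SAMPLE are
invented for illustration, and falling back to the very first name when
no language matches is an assumption, not something the DTD requires.

```python
import xml.etree.ElementTree as ET

# Sample data: the first two names come from the example at the top of
# this file; the English one is made up for the demonstration.
SAMPLE = """
<channel id="das-erste.de">
  <display-name lang="de">ARD</display-name>
  <display-name lang="de">Das Erste</display-name>
  <display-name lang="en">First German Television</display-name>
</channel>
"""


def best_display_name(channel, lang=None):
    """Return the first display-name in `lang`, else the first of any language."""
    names = channel.findall("display-name")
    for name in names:
        if lang is not None and name.get("lang") == lang:
            return name.text.strip()
    # Assumed fallback: the earliest (most canonical) name overall.
    return names[0].text.strip()


channel = ET.fromstring(SAMPLE)
print(best_display_name(channel, "en"))   # First German Television
print(best_display_name(channel, "fr"))   # ARD
```
-->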
166    
167     <!-- A user-friendly name for the channel - maybe even a channel
168     number. List the most canonical / common ones first and the most
169     obscure names last. The lang attribute follows RFC 1766.
170     -->
171     <!ELEMENT display-name (#PCDATA)>
172     <!ATTLIST display-name lang CDATA #IMPLIED>
173    
174     <!-- A URL where you can find out more about the element that contains
175     it (programme or channel). This might be the official site, or a fan
176     page, whatever you like really.
177    
178     If multiple url elements are given, the most authoritative or official
179     (which might conflict...) sites should be listed first.
180     -->
181     <!ELEMENT url (#PCDATA)>
182    
183     <!-- programme - details of a single programme transmission
184    
185     A show will be exactly the same whether it is broadcast at 18:00 or
186     19:00, and on whichever channel. Technical details like broadcast
187     time don't affect the content of the programme itself, so they are
188     included as attributes of this element. Start time and channel are
189     the two that you must include.
190    
191     Sometimes VCR programming systems like PDC or VPS have their own
192     notion of 'start time' which is different from the actual start time,
193     so there are attributes for that. In practice, stop time will usually
194     be the start time of the next programme, but if you can get it more
195     accurate, good for you. Similarly, you can specify a code for
196     Gemstar's Showview or VideoPlus programming systems.
197    
198     TV listings sometimes have the problem of listing two or more
199     programmes in the same timeslot, such as 'News; Weather'. We call
200     this a 'clump' of programmes, and the 'clumpidx' attribute
201     differentiates between two programmes sharing the same timeslot and
202     channel. In this case News would have clumpidx="0/2" and Weather
203     would have clumpidx="1/2". If you don't have this problem, be
204     thankful!
205    
206     It's intended that start time and stop time, when both are present,
207     make a half-closed interval: a programme is considered to be
208     broadcasting _at_ its start time but to stop just before its stop
209     time. In this way a programme from 11:00 to 12:00 does not overlap
210     with another programme from 12:00 to 13:00, not even for a moment.
211     Nor is there any gap between the two.
212    
213     To do: Some means of indicating breaks between programmes on the same
214     channel. The 'channel' attribute references the 'id' of a channel
215     element, but the DTD doesn't give a way to specify this constraint.
216     Perhaps there is some better XML syntax we could use for that.
217     -->
218     <!ELEMENT programme (title+, sub-title*, desc*, credits?, date?,
219     category*, language?, orig-language?, length?,
220     icon*, url*, country*, episode-num*, video?, audio?,
221     previously-shown?, premiere?, last-chance?, new?,
222     subtitles*, rating*, star-rating*, review* )>
223     <!ATTLIST programme start CDATA #REQUIRED
224     stop CDATA #IMPLIED
225     pdc-start CDATA #IMPLIED
226     vps-start CDATA #IMPLIED
227     showview CDATA #IMPLIED
228     videoplus CDATA #IMPLIED
229     channel CDATA #REQUIRED
230     clumpidx CDATA "0/1" >
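<!-- The half-closed interval rule and the 'clumpidx' attribute described
above can be sketched in a few lines of Python. This is an illustration
only, not part of the DTD; the function names parse_clumpidx and
overlaps are hypothetical, and rejecting an index outside 0..total-1 is
an assumption about what a careful reader might do.

```python
from datetime import datetime


def parse_clumpidx(value="0/1"):
    """Split a clumpidx such as '1/2' into (index, total)."""
    index, total = (int(part) for part in value.split("/"))
    if not 0 <= index < total:
        raise ValueError(f"clumpidx out of range: {value!r}")
    return index, total


def overlaps(start_a, stop_a, start_b, stop_b):
    """True if the half-closed intervals [start, stop) share any instant."""
    return start_a < stop_b and start_b < stop_a


eleven, noon, one = (datetime(2000, 6, 3, h) for h in (11, 12, 13))
print(overlaps(eleven, noon, noon, one))  # False
print(parse_clumpidx("1/2"))              # (1, 2)
```

A programme from 11:00 to 12:00 and one from 12:00 to 13:00 do not
overlap under this test, which matches the intent stated above.
-->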
231    
232     <!-- Programme title, eg 'The Simpsons'. -->
233     <!ELEMENT title (#PCDATA)>
234     <!ATTLIST title lang CDATA #IMPLIED>
235    
236     <!-- Sub-title or episode title, eg 'Datalore'. Should probably be
237     called 'secondary title' to avoid confusion with captioning!
238     -->
239     <!ELEMENT sub-title (#PCDATA)>
240     <!ATTLIST sub-title lang CDATA #IMPLIED>
241    
242     <!-- Description of the programme or episode.
243    
244     Unlike other elements, long bits of whitespace here are treated as
245     equivalent to a single space and newlines are permitted, so you can
246     break lines and write a pretty-looking paragraph if you wish.
247     -->
248     <!ELEMENT desc (#PCDATA)>
249     <!ATTLIST desc lang CDATA #IMPLIED>
250    
251     <!-- Credits for the programme.
252    
253     People are listed in decreasing order of importance; so for example
254     the starring actors appear first followed by the smaller parts. As
255     with other parts of this file format, not mentioning a particular
256     actor (for example) does not imply that he _didn't_ star in the film -
257     so normally you'd list only the few most important people.
258    
259     Adapter can be either somebody who adapted a work for television, or
260     somebody who did the translation from another language. Maybe these
261     should be separate, but if so how would 'translator' fit in with the
262     'language' element?
263     -->
264     <!ELEMENT credits (director*, actor*, writer*, adapter*, producer*,
265     composer*, editor*, presenter*, commentator*,
266     guest* )>
267     <!ELEMENT director (#PCDATA)>
268     <!ELEMENT actor (#PCDATA)>
269     <!ATTLIST actor role CDATA #IMPLIED>
270     <!ELEMENT writer (#PCDATA)>
271     <!ELEMENT adapter (#PCDATA)>
272     <!ELEMENT producer (#PCDATA)>
273     <!ELEMENT composer (#PCDATA)>
274     <!ELEMENT editor (#PCDATA)>
275     <!ELEMENT presenter (#PCDATA)>
276     <!ELEMENT commentator (#PCDATA)>
277     <!ELEMENT guest (#PCDATA)>
278    
279    
280     <!-- The date the programme or film was finished. This will probably
281     be the same as the copyright date.
282     -->
283     <!ELEMENT date (#PCDATA)>
284    
285     <!-- Type of programme, eg 'soap', 'comedy' or whatever the
286     equivalents are in your language. There's no predefined set of
287     categories and it's okay for a programme to belong to several.
288     -->
289     <!ELEMENT category (#PCDATA)>
290     <!ATTLIST category lang CDATA #IMPLIED>
291    
292     <!-- The language the programme will be broadcast in. This does not
293     include the language of any subtitles, but it is affected by dubbing
294     into a different language. For example, if a French film is dubbed
295     into English, language=en and orig-language=fr.
296    
297     There are two ways to specify the language. You can use the
298     two-letter codes such as en or fr, or you can give a name such as
299     'English' or 'Deutsch'. In the latter case you might want to use the
300     'lang' attribute, for example
301    
302     <language lang="fr">Allemand</language>
303     -->
304     <!ELEMENT language (#PCDATA)>
305     <!ATTLIST language lang CDATA #IMPLIED>
306    
307     <!-- The original language, before dubbing. The same remarks as for
308     'language' apply.
309     -->
310     <!ELEMENT orig-language (#PCDATA)>
311     <!ATTLIST orig-language lang CDATA #IMPLIED>
312    
313     <!-- The true length of the programme, not counting advertisements or
314     trailers. But this does take account of any bits which were cut out
315     of the broadcast version - eg if a two hour film is cut to 110 minutes
316     and then padded with 20 minutes of advertising, length will be 110
317     minutes even though end time minus start time is 130 minutes.
318     -->
319     <!ELEMENT length (#PCDATA)>
320     <!ATTLIST length units (seconds | minutes | hours) #REQUIRED>
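<!-- To compare a 'length' with the broadcast slot, it helps to bring both
to the same unit. A small Python sketch follows; it is not part of the
DTD, the name length_in_seconds is hypothetical, and it assumes the
element content is a whole number, which the DTD does not state.

```python
SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600}


def length_in_seconds(value, units):
    """Convert the text of a length element to seconds using its units attribute."""
    # Assumption: the content is an integer such as '110'.
    return int(value.strip()) * SECONDS_PER_UNIT[units]


# The film from the comment above: 110 minutes of content in a 130 minute slot.
content = length_in_seconds("110", "minutes")
slot = 130 * 60
print(slot - content)  # 1200
```
-->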
321    
322     <!-- An icon associated with the element that contains it.
323     src: uri of image
324     width, height: (optional) dimensions of image
325    
326     These dimensions are pixel dimensions for the time being; eventually
327     this will change to be more like HTML's 'img'.
328     -->
329     <!ELEMENT icon EMPTY>
330     <!ATTLIST icon src CDATA #REQUIRED
331     width CDATA #IMPLIED
332     height CDATA #IMPLIED>
333    
334     <!-- The value of the element that contains it. This is for elements
335     that can have both a textual 'value' and an icon. At present there is
336     no 'lang' attribute here because things like 'PG' are not translatable
337     (although a document explaining what 'PG' actually means would be).
338     It happens that 'value' is used only for this sort of thing.
339     -->
340     <!ELEMENT value (#PCDATA)>
341    
342     <!-- A country where the programme was made or one of the countries in
343     a joint production. You can give the name of a country, in which case
344     you might want to specify the language in which this name is written,
345     or you can give a two-letter uppercase country code, in which case the
346     lang attribute should not be given. For example,
347    
348     <country lang="en">Italy</country>
349     <country>GB</country>
350     -->
351     <!ELEMENT country (#PCDATA)>
352     <!ATTLIST country lang CDATA #IMPLIED>
353    
354     <!-- Episode number
355    
356     Not the title of the episode, its number or ID. There are several
357     ways of numbering episodes, so the 'system' attribute lets you specify
358     which you mean.
359    
360     There are two predefined numbering systems, 'xmltv_ns' and
361     'onscreen'.
362    
363     xmltv_ns: This is intended to be a general way to number episodes and
364     parts of multi-part episodes. It is three numbers separated by dots,
365     the first is the series or season, the second the episode number
366     within that series, and the third the part number, if the programme is
367     part of a multi-part episode. All these numbers are indexed from zero, and
368     they can be given in the form 'X/Y' to show series X out of Y series
369     made, or episode X out of Y episodes in this series, or part X of a
370     Y-part episode. If any of these aren't known they can be omitted.
371     You can put spaces wherever you like to make things easier to read.
372    
373     (NB 'part number' is not used when a whole programme is split in two
374     for purely scheduling reasons; it's intended for cases where there
375     really is a 'Part One' and 'Part Two'. The format doesn't currently
376     have a way to represent a whole programme that happens to be split
377     across two or more timeslots.)
378    
379     Some examples will make things clearer. The first episode of the
380     second series is '1.0.0/1' . If it were a two-part episode, then the
381     first half would be '1.0.0/2' and the second half '1.0.1/2'. If you
382     know that an episode is from the first season, but you don't know
383     which episode it is or whether it is part of a multiparter, you could
384     give the episode-num as '0..'. Here the second and third numbers have
385     been omitted. If you know that this is the first part of a three-part
386     episode, which is the last episode of the first series of thirteen,
387     its number would be '0 . 12/13 . 0/3'. The series number is just '0'
388     because you don't know how many series there are in total - perhaps
389     the show is still being made!
390    
391     The other predefined system, onscreen, is to simply copy what the
392     programme makers write in the credits - 'Episode #FFEE' would
393     translate to '#FFEE'.
394    
395     You are encouraged to use one of these two if possible; if xmltv_ns is
396     not general enough for your needs, let me know. But if you want, you
397     can use your own system and give the 'system' attribute as a URL
398     describing the system you use.
399     -->
400     <!ELEMENT episode-num (#PCDATA)>
401     <!ATTLIST episode-num system CDATA "onscreen">
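<!-- The xmltv_ns rules above (three dot-separated fields, each optionally
'X/Y', zero-based, spaces allowed anywhere) translate into a short
parser. The Python below is a sketch under those stated rules, not part
of the DTD and not the xmltv package's implementation; the function
name parse_xmltv_ns and the choice of None for an omitted number are
assumptions made for illustration.

```python
def parse_xmltv_ns(text):
    """Return [(index, total), ...] for series, episode and part.

    Each index is zero-based; a missing value is reported as None.
    """
    fields = text.split(".")
    if len(fields) != 3:
        raise ValueError(f"expected three dot-separated fields: {text!r}")
    result = []
    for field in fields:
        # Whitespace may appear anywhere, so drop all of it first.
        compact = "".join(field.split())
        index, _, total = compact.partition("/")
        result.append((int(index) if index else None,
                       int(total) if total else None))
    return result
```

With this sketch, parse_xmltv_ns('0 . 12/13 . 0/3') returns
[(0, None), (12, 13), (0, 3)]. Add one to each index if you want the
numbers as they would be shown to a viewer.
-->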
402    
403     <!-- Video details: the subelements describe the picture quality as
404     follows:
405    
406     present: whether this programme has a picture (no, in the
407     case of radio stations broadcast on TV or 'Blue'), legal values are
408     'yes' or 'no'. Obviously if the value is 'no', the other elements are
409     meaningless.
410    
411     colour: 'yes' for colour, 'no' for black-and-white.
412    
413     aspect: The horizontal:vertical aspect ratio, eg '4:3' or '16:9'.
414    
415     quality: information on the quality, eg 'HDTV', '800x600'.
416    
417     -->
418     <!ELEMENT video (present?, colour?, aspect?, quality?)>
419     <!ELEMENT present (#PCDATA)>
420     <!ELEMENT colour (#PCDATA)>
421     <!ELEMENT aspect (#PCDATA)>
422     <!ELEMENT quality (#PCDATA)>
423    
424     <!-- Audio details, similar to video details above.
425    
426     present: whether this programme has any sound at all, 'yes' or 'no'.
427    
428     stereo: Description of the stereo-ness of the sound. Legal values
429     are currently 'mono', 'stereo', 'dolby', 'dolby digital', 'bilingual'
430     and 'surround'. 'bilingual' in this case refers to a single audio
431     stream where the left and right channels contain monophonic audio
432     in different languages. Other values may be added later.
433    
434     -->
435     <!ELEMENT audio (present?, stereo?)>
436     <!ELEMENT stereo (#PCDATA)>
437    
438     <!-- When and where the programme was last shown, if known. Normally
439     in TV listings 'repeat' means 'previously shown on this channel', but
440     if you don't know what channel the old screening was on (but do know
441     that it happened) then you can omit the 'channel' attribute.
442     Similarly you can omit the 'start' attribute if you don't know when
443     the previous transmission was (though you can of course give just the
444     year, etc.).
445    
446     The absence of this element does not say for certain that the
447     programme is brand new and has never been screened anywhere before.
448     -->
449     <!ELEMENT previously-shown EMPTY>
450     <!ATTLIST previously-shown start CDATA #IMPLIED
451     channel CDATA #IMPLIED >
452    
453     <!-- 'Premiere'. Different channels have different meanings for this
454     word - sometimes it means a film has never before been seen on TV in
455     that country, but other channels use it to mean 'the first showing of
456     this film on our channel in the current run'. It might have been
457     shown before, but now they have paid for another set of showings,
458     which makes the first in that set count as a premiere!
459    
460     So this element doesn't have a clear meaning; just use it to represent
461     where 'premiere' would appear in a printed TV listing. You can use
462     the content of the element to explain exactly what is meant, for
463     example:
464    
465     <premiere lang="en">
466     First showing on national terrestrial TV
467     </premiere>
468    
469     The textual content is a 'paragraph' as for <desc>. If you don't want
470     to give an explanation, just write empty content:
471    
472     <premiere />
473     -->
474     <!ELEMENT premiere (#PCDATA)>
475     <!ATTLIST premiere lang CDATA #IMPLIED>
476    
477     <!-- Last-chance. In a way this is the opposite of premiere. Some
478     channels buy the rights to show a movie a certain number of times, and
479     the first may be flagged 'premiere', the last as 'last showing'.
480    
481     For symmetry with premiere, you may use the element content to give a
482     'paragraph' describing exactly what is meant - it's unlikely to be the
483     last showing ever! Otherwise, explicitly put empty content:
484    
485     <last-chance />
486     -->
487     <!ELEMENT last-chance (#PCDATA)>
488     <!ATTLIST last-chance lang CDATA #IMPLIED>
489    
490     <!-- New. This is the first screened programme from a new show that
491     has never been shown on television before - if not worldwide then at
492     least never before in this country. After the first episode or
493     programme has been shown, subsequent ones are no longer 'new'.
494     Similarly the second series of an established programme is not 'new'.
495    
496     Note that this does not mean 'new season' or 'new episode' of an
497     existing show. You can express part of that using the episode-num
498     stuff.
499     -->
500     <!ELEMENT new EMPTY>
501    
502     <!-- Subtitles. These can be either 'teletext' (sent digitally, and
503     displayed at the viewer's request), 'onscreen' (superimposed on the
504     picture and impossible to get rid of), or 'deaf-signed' (in-vision
505     signing for users of sign language). You can have multiple subtitle
506     streams to handle different languages. Language for subtitles is
507     specified in the same way as for programmes.
508     -->
509     <!ELEMENT subtitles (language?)>
510     <!ATTLIST subtitles type (teletext | onscreen | deaf-signed) #IMPLIED>
511    
512     <!-- Rating. Various bodies decide on classifications for films -
513     usually a minimum age you must be to see it. In principle the same
514     could be done for ordinary TV programmes. Because there are many
515     systems for doing this, you can also specify the rating system used
516     (which in practice is the same as the body which made the rating).
517     -->
518     <!ELEMENT rating (value, icon*)>
519     <!ATTLIST rating system CDATA #IMPLIED>
520    
521     <!-- 'Star rating' - many listings guides award a programme a score as
522     a quick guide to how good it is. The value of this element should be
523     'N / M', for example one star out of a possible five stars would be
524     '1 / 5'. Zero stars is also a possible score (and not the same as
525     'unrated'). You should try to map whatever wacky system your listings
526     source uses to a number of stars: so for example if they have thumbs
527     up, thumbs sideways and thumbs down, you could map that to two, one or
528     zero stars out of two. If a programme is marked as recommended in a
529     listings guide you could map this to '1 / 1'. Because there could be many
530     ways to provide star-ratings or recommendations for a programme, you can
531     specify multiple star-ratings. You can specify the star-rating system
532     used, or the provider of the recommendation, with the system attribute.
533     Whitespace between the numbers and slash is ignored.
534     -->
535    
536     <!ELEMENT star-rating (value, icon*)>
537     <!ATTLIST star-rating system CDATA #IMPLIED>
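<!-- Reading a star-rating value means splitting 'N / M' at the slash and
ignoring the whitespace around it. The Python below sketches this and
the thumbs mapping suggested above. It is not part of the DTD; the
names parse_star_rating and THUMBS and the keys 'up', 'sideways' and
'down' are hypothetical, and accepting non-integer numbers is an
assumption the DTD neither allows nor forbids.

```python
# Hypothetical mapping from a source's thumbs scheme to 'N / M' values.
THUMBS = {"up": "2 / 2", "sideways": "1 / 2", "down": "0 / 2"}


def parse_star_rating(value):
    """Parse 'N / M' into (stars, out_of); surrounding whitespace is ignored."""
    stars, slash, out_of = value.partition("/")
    if not slash:
        raise ValueError(f"expected 'N / M': {value!r}")
    # float() tolerates leading and trailing whitespace.
    return float(stars), float(out_of)


print(parse_star_rating("1 / 5"))          # (1.0, 5.0)
print(parse_star_rating(THUMBS["down"]))   # (0.0, 2.0)
```

Note that zero stars parses to a real score of 0.0, which keeps it
distinct from a programme that has no star-rating element at all.
-->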
538    
539     <!-- Review. Listings guides may provide reviews of programmes in
540     addition to, or in place of, standard programme descriptions. They are
541     usually written by in-house reviewers, but reviews can also be made
542     available by third-party organisations/individuals. The value of this
543     element must be either the text of the review, or a URL that links to it.
544     Optional attributes giving the review source and the individual reviewer
545     can also be specified.
546     -->
547     <!ELEMENT review (#PCDATA)>
548     <!ATTLIST review type (text | url) #REQUIRED
549     source CDATA #IMPLIED
550     reviewer CDATA #IMPLIED>
551    
552     <!-- (Why are things like 'stereo', which must be one of a small
553     number of values, stored as the contents of elements rather than as
554     attributes? Because they are data rather than metadata. Attributes
555     are used for things like the language or encoding of element contents,
556     or for programme transmission details.) -->