.\" This manpage has been automatically generated by docbook2man
.\" from a DocBook document.  This tool can be found at:
.\" <http://shell.ipoline.com/~elmert/comp/docbook2X/>
.\" Please send any bug reports, improvements, comments, patches,
.\" etc. to Steve Cheng <steve@ggi-project.org>.
.TH "XMLWF" "1" "24 January 2003" "" ""
.SH NAME
xmlwf \- Determines if an XML document is well-formed
.SH SYNOPSIS

\fBxmlwf\fR [ \fB-s\fR]  [ \fB-n\fR]  [ \fB-p\fR]  [ \fB-x\fR]  [ \fB-e \fIencoding\fB\fR]  [ \fB-w\fR]  [ \fB-d \fIoutput-dir\fB\fR]  [ \fB-c\fR]  [ \fB-m\fR]  [ \fB-r\fR]  [ \fB-t\fR]  [ \fB-v\fR]  [ \fBfile ...\fR]

.SH "DESCRIPTION"
.PP
\fBxmlwf\fR uses the Expat library to
determine if an XML document is well-formed.  It is
non-validating.
.PP
If you do not specify any files on the command line, and you
have a recent version of \fBxmlwf\fR, the
input file will be read from standard input.
.SH "WELL-FORMED DOCUMENTS"
.PP
A well-formed document must adhere to the
following rules:
.TP 0.2i
\(bu
The file should begin with an XML declaration, for instance
<?xml version="1.0" standalone="yes"?>.
XML 1.0 recommends a declaration but does not require one.
\fBNOTE:\fR
\fBxmlwf\fR does not currently
check for a valid XML declaration.
.TP 0.2i
\(bu
Every start tag has a corresponding end tag, unless the
element is written as an empty-element tag (<tag/>).
.TP 0.2i
\(bu
There is exactly one root element.  This element must contain
all other elements in the document.  Only comments, white
space, and processing instructions may come after the close
of the root element.
.TP 0.2i
\(bu
All elements nest properly.
.TP 0.2i
\(bu
All attribute values are enclosed in quotes (either single
or double).
.PP
If the document has a DTD, and it strictly complies with that
DTD, then the document is also considered \fBvalid\fR.
\fBxmlwf\fR is a non-validating parser;
it does not check the document against the DTD.  However, it does support
external entities (see the \fB-x\fR option).
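.PP
The well-formedness rules in this section can be made concrete with two
small sample documents.
The sketch below is illustrative only: the function name
\fBwrite_samples\fR and the file names good.xml and bad.xml are
hypothetical and are not part of \fBxmlwf\fR.
.nf
```shell
# Hypothetical helper: write one well-formed and one malformed sample
# document into the directory given as the first argument.
write_samples() {
    # good.xml: declaration, one root element, proper nesting,
    # quoted attribute value, an empty-element tag, trailing comment.
    cat > "$1/good.xml" <<'EOF'
<?xml version="1.0" standalone="yes"?>
<doc>
  <item id="1">text</item>
  <empty/>
</doc>
<!-- comments may follow the root element -->
EOF
    # bad.xml: the b and i elements are closed in the wrong order.
    cat > "$1/bad.xml" <<'EOF'
<?xml version="1.0" standalone="yes"?>
<doc>
  <b><i>elements do not nest properly</b></i>
</doc>
EOF
}
```
.fi
.PP
Assuming \fBxmlwf\fR is installed, running it on both files should
print nothing for good.xml and a single line for bad.xml.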
.SH "OPTIONS"
.PP
When an option includes an argument, you may specify the argument either
separately ("\fB-d\fR output") or concatenated with the
option ("\fB-d\fRoutput").  \fBxmlwf\fR
supports both.
.TP
\fB-c\fR
If the input file is well-formed and \fBxmlwf\fR
doesn't encounter any errors, the input file is simply copied to
the output directory unchanged.
This implies no namespaces (turns off \fB-n\fR) and
requires \fB-d\fR to specify an output directory.
.TP
\fB-d output-dir\fR
Specifies a directory to contain transformed
representations of the input files.
By default, \fB-d\fR outputs a canonical representation
(described below).
You can select different output formats using \fB-c\fR
and \fB-m\fR.

The output filenames will
be exactly the same as the input filenames or "STDIN" if the input is
coming from standard input.  Therefore, you must be careful that the
output file does not go into the same directory as the input
file.  Otherwise, \fBxmlwf\fR will delete the
input file before it generates the output file (just like running
cat < file > file in most shells).

Two structurally equivalent XML documents have a byte-for-byte
identical canonical XML representation.
Note that ignorable white space is considered significant and
is treated equivalently to data.
More on canonical XML can be found at
http://www.jclark.com/xml/canonxml.html
.TP
\fB-e encoding\fR
Specifies the character encoding for the document, overriding
any document encoding declaration.  \fBxmlwf\fR
supports four built-in encodings:
US-ASCII,
UTF-8,
UTF-16, and
ISO-8859-1.
Also see the \fB-w\fR option.
.TP
\fB-m\fR
Outputs an XML file that completely
describes the input file, including character positions.
Requires \fB-d\fR to specify an output directory.
.TP
\fB-n\fR
Turns on namespace processing.
\fB-c\fR disables namespaces.
.TP
\fB-p\fR
Tells \fBxmlwf\fR to process external DTDs and parameter
entities.

Normally \fBxmlwf\fR never parses parameter
entities.  \fB-p\fR tells it to always parse them.
\fB-p\fR implies \fB-x\fR.
.TP
\fB-r\fR
Normally \fBxmlwf\fR memory-maps the XML file
before parsing; this can result in faster parsing on many
platforms.
\fB-r\fR turns off memory-mapping and uses normal file
I/O calls instead.
Memory-mapping is automatically turned off
when reading from standard input.

Use of memory-mapping can cause some platforms to report
substantially higher memory usage for
\fBxmlwf\fR, but this appears to be a matter of
the operating system reporting memory in a strange way; there is
not a leak in \fBxmlwf\fR.
.TP
\fB-s\fR
Prints an error if the document is not standalone.
A document is standalone if it has no external subset and no
references to parameter entities.
.TP
\fB-t\fR
Turns on timings.  This tells Expat to parse the entire file,
but not perform any processing.
This gives a fairly accurate idea of the raw speed of Expat itself
without client overhead.
\fB-t\fR turns off most of the output options
(\fB-d\fR, \fB-m\fR, \fB-c\fR,
\&...).
.TP
\fB-v\fR
Prints the version of the Expat library being used, including some
information on the compile-time configuration of the library, and
then exits.
.TP
\fB-w\fR
Enables support for Windows code pages.
Normally, \fBxmlwf\fR will report an error if it
encounters an encoding that it is not equipped to handle itself.  With
\fB-w\fR, \fBxmlwf\fR will try to use a Windows code
page.  See also \fB-e\fR.
.TP
\fB-x\fR
Turns on parsing external entities.

Non-validating parsers are not required to resolve external
entities, or even expand entities at all.
Expat always expands internal entities (?),
but external entity parsing must be enabled explicitly.

External entities are simply entities that obtain their
data from outside the XML file currently being parsed.

This is an example of an internal entity:

.nf
<!ENTITY vers '1.0.2'>
.fi

And here are some examples of external entities:

.nf
<!ENTITY header SYSTEM "header-&vers;.xml">        (parsed)
<!ENTITY logo SYSTEM "logo.png" NDATA PNG>         (unparsed)
.fi
.TP
\fB--\fR
(Two hyphens.)
Terminates the list of options.  This is only needed if a filename
starts with a hyphen.  For example:

.nf
xmlwf -- -myfile.xml
.fi

will run \fBxmlwf\fR on the file
\fI-myfile.xml\fR.
.PP
Older versions of \fBxmlwf\fR do not support
reading from standard input.
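.PP
The warning under \fB-d\fR about writing into the directory that holds
the input file can be enforced with a small wrapper.
This is a minimal sketch, not part of \fBxmlwf\fR: the function name
\fBcanon_into\fR is hypothetical, and it assumes a POSIX shell and an
output directory that already exists.
.nf
```shell
# Hypothetical wrapper: canonicalize one file into an output directory,
# refusing to run when that directory is the one holding the input.
# Usage: canon_into OUTPUT_DIR INPUT_FILE
canon_into() {
    out_dir=$1
    in_file=$2
    # Resolve both directories to physical paths before comparing.
    in_abs=$(cd "$(dirname "$in_file")" && pwd -P) || return 2
    out_abs=$(cd "$out_dir" && pwd -P) || return 2
    if [ "$in_abs" = "$out_abs" ]; then
        echo "refusing: $out_dir holds the input file" >&2
        return 1
    fi
    xmlwf -d "$out_dir" "$in_file"
}
```
.fi
.PP
For example, canon_into out in/doc.xml would run
\fBxmlwf\fR, while canon_into in in/doc.xml would stop with a message
on standard error.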
.SH "OUTPUT"
.PP
If an input file is not well-formed,
\fBxmlwf\fR prints a single line describing
the problem to standard output.  If a file is well-formed,
\fBxmlwf\fR outputs nothing.
Note that the exit status is \fBnot\fR set to reflect the result.
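.PP
Because the result is reported only through the output, a script has to
test whether \fBxmlwf\fR printed anything.
The following is a minimal sketch under that assumption; the function
name \fBis_well_formed\fR is hypothetical, and the sketch relies on the
behaviour described in this section (no output means well-formed).
.nf
```shell
# Hypothetical helper: turn the output of xmlwf into an exit status.
# Returns 0 when xmlwf prints nothing, 1 otherwise.
# Arguments are passed to xmlwf unchanged.
is_well_formed() {
    # Capture standard output and standard error together.
    wf_report=$(xmlwf "$@" 2>&1)
    if [ -z "$wf_report" ]; then
        return 0
    fi
    # Forward the message from xmlwf to standard error.
    echo "$wf_report" >&2
    return 1
}
```
.fi
.PP
A caller can then write: if is_well_formed doc.xml; then ... fi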
.SH "BUGS"
.PP
The XML 1.0 specification says that a document should begin with
an XML declaration.  \fBxmlwf\fR accepts documents without one
and does not report this.
.PP
\fBxmlwf\fR returns exit status 0 (no error)
even if the file is not well-formed.  There is no good way for
a program to use \fBxmlwf\fR to quickly
check a file; it must parse the standard output of
\fBxmlwf\fR.
.PP
The errors should go to standard error, not standard output.
.PP
There should be a way to get \fB-d\fR to send its
output to standard output rather than forcing the user to send
it to a file.
.PP
I have no idea why anyone would want to use the
\fB-d\fR, \fB-c\fR, and
\fB-m\fR options.  If someone could explain it to
me, I'd like to add this information to this manpage.
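.PP
One plausible use of \fB-d\fR follows from the description of
canonical XML under OPTIONS: testing whether two files are structurally
equivalent by comparing their canonical output.
This is a sketch only.
The function name \fBsame_canonical\fR is hypothetical, and it assumes
that mktemp is available and that each output file takes the base name
of its input file.
.nf
```shell
# Hypothetical helper: succeed when two XML files produce identical
# canonical output from xmlwf -d.
# Usage: same_canonical FILE_A FILE_B
same_canonical() {
    tmp_a=$(mktemp -d) || return 2
    tmp_b=$(mktemp -d) || return 2
    # Each input gets its own output directory, so the inputs may
    # share a file name and are never overwritten.
    xmlwf -d "$tmp_a" "$1"
    xmlwf -d "$tmp_b" "$2"
    if cmp -s "$tmp_a/$(basename "$1")" "$tmp_b/$(basename "$2")"; then
        status=0
    else
        status=1
    fi
    rm -rf "$tmp_a" "$tmp_b"
    return $status
}
```
.fi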
.SH "ALTERNATIVES"
.PP
Here are some XML validators on the web:

.nf
http://www.hcrc.ed.ac.uk/~richard/xml-check.html
http://www.stg.brown.edu/service/xmlvalid/
http://www.scripting.com/frontier5/xml/code/xmlValidator.html
http://www.xml.com/pub/a/tools/ruwf/check.html
.fi
.SH "SEE ALSO"
.PP

.nf
The Expat home page:        http://www.libexpat.org/
The W3 XML specification:   http://www.w3.org/TR/REC-xml
.fi
.SH "AUTHOR"
.PP
This manual page was written by Scott Bronson <bronson@rinspin.com> for
the Debian GNU/Linux system (but may be used by others).  Permission is
granted to copy, distribute and/or modify this document under
the terms of the GNU Free Documentation
License, Version 1.1.