This document is the primary documentation for FoX, the Fortran/XML library. See below for other sources of documentation. It consists of:
Reference information on versions, standards compliance, and licensing.
Information about how to get up and running with FoX and how to use FoX in an existing project.
Finally, there is full API reference documentation.
This documentation is largely reference in nature. For new users it is best to start elsewhere:
Two workshops, entitled iFaX (Integrating Fortran and XML), have been run to teach the use of FoX: one in January 2007 and one in January 2008. The full documentation and lectures from these may be found at:
Out of the above workshops, some tutorial material has been written, focussing on different use cases. Currently two are available:
There is also tutorial information on the use of WKML here.
These documents describe all publicly usable APIs.
Worked examples of the use of some of these APIs may be found in the examples/ subdirectory, and tutorial-style documentation is available from the links above.
This documentation describes version 4.1 of the FoX library.
This version includes output modules for general XML, and for CML; and a fully validating XML parser, exposed through a Fortran version of the SAX2 input parser and a Fortran mapping of the W3C DOM interface.
This is a stable branch, which will be maintained with important bugfixes.
As of FoX-3.0, there is one user-visible change that should be noted.
In previous versions of FoX, the configure script was accessible as config/configure. Version 3.0 now follows common practice by placing the script in the main directory, so it is now called as ./configure.
Previous versions of FoX made it quite hard to compile only portions of the library (e.g. only the CML output portion, or just the SAX input). This is now possible by specifying arguments to the configuration script. For example,
./configure --enable-wcml
will cause the generated Makefile to only compile the CML writing module and its dependencies.
See Compilation for further details.
You will have received the FoX source code as a tar.gz file.
Unpack it as normal, and change directory into the top-level directory, FoX-$VERSION.
FoX requires a Fortran 95 compiler - not just Fortran 90. All currently available versions of Fortran compilers claim to support F95. If your favoured compiler is not listed as working, I recommend the use of g95, which is free to download and use. And in such a case, please send a bug report to your compiler vendor.
In the event that you need to write a code targeted at multiple compilers, including some which have bugs preventing FoX compilation, please note the possibility of producing a dummy library.
In order to generate the Makefile, make sure that you have a Fortran compiler in your PATH
, and do:
./configure
This should suffice for most installations. However:
You may not be interested in all of the modules that FoX supplies. For example, you may only be interested in output, not input. If so, you can select which modules you want using --enable-MODULENAME, where MODULENAME is one of wxml, wcml, wkml, sax, dom. If none are explicitly enabled, then all will be built. (Alternatively, you can exclude modules one at a time with --disable-MODULENAME.) Thus, for example, if you only care about CML output, and not anything else:
./configure --enable-wcml
If you have more than one Fortran compiler available, or it is not on your PATH
, you can force the choice by doing:
./configure FC=/path/to/compiler/of/choice
It is possible that the configuration fails. In this case
By default the resultant files are installed under the objs directory. If you wish them to be installed elsewhere, you may do
./configure --prefix=/path/to/installation
Note that the configure process encodes the current directory location in several places. If you move the FoX directory later on, you will need to re-run configure.
You may be interested in dummy compilation. This is activated with the --enable-dummy
switch (but only works for wxml/wcml currently).
./configure --enable-wcml --enable-dummy
In order to compile the full library, now simply do:
make
This will build all the requested FoX modules, and the relevant examples.
In the full version of the FoX library, there are several testsuites included.
To run them all, simply run make check
from the top-level directory. This will run the individual testsuites, and collate their results.
If any failures occur (unrelated to known compiler issues, see the up-to-date list), please send a message to the mailing list (fox-discuss@googlegroups.com) with details of compiler, hardware platform, and the nature of the failure.
The testsuites for the SAX and DOM libraries are very extensive, and are somewhat fragile, so are not distributed with FoX. Please contact the author for details.
A script is provided which will provide the appropriate compiler and linker flags for you; this will be created after configuration, in the top-level directory, and is called FoX-config
. It may be taken from there and placed anywhere.
FoX-config takes the following arguments:
--fcflags: return flags for compilation
--libs: return flags for linking
--wxml: return flags for compiling/linking against wxml
--wcml: return flags for compiling/linking against wcml
--sax: return flags for compiling/linking against sax
If it is called with no arguments, it will expand to compile & link flags, thusly:
f95 -o program program.f90 `FoX-config`
For compiling only against FoX, do the following:
f95 -c `FoX-config --fcflags` sourcefile.f90
For linking only to the FoX library, do:
f95 -o program `FoX-config --libs` *.o
or similar, according to your compilation scheme.
Note that by default, FoX-config
assumes you are using all modules of the library. If you are only using part, then this can be specified by also passing the name of each module required, like so:
FoX-config --fcflags --wcml
Because of the shortcomings in some compilers, it is not possible to compile FoX everywhere. Equally, sometimes it is useful to be able to compile a code both with and without support for FoX (perhaps to reduce executable size). Especially where FoX is being used only for additional output, it is useful to be able to run the code and perform computations even without the possibility of XML output.
For this reason, it is possible to compile a dummy version of FoX. This includes all public interfaces, so that your code will compile and link correctly - however none of the subroutines do anything, so you can retain the same version of your code without having to comment out all FoX calls.
Because this dummy version of FoX contains nothing except empty subroutines, it compiles and links with all known Fortran 95 compilers, regardless of compiler bugs.
To compile the dummy code, use the --enable-dummy
switch. Note that currently the dummy mode is not yet available for the DOM module.
The recommended way to use FoX is to embed the full source code as a subdirectory, into an existing project.
In order to do this, you need to do something like the following:
It is probably best to isolate use of XML facilities to a small part of the program. This is easily accomplished for XML input, which will generally happen in only one or two places.
For XML output, this can be more complex. The easiest and least intrusive way is probably to create an F90 module for your program, looking something like example_xml_module.f90.
Then you must somewhere (probably in your main program), use this module, and call initialize_xml_output()
at the start; and then end_xml_output()
at the end of the program.
In any of the subroutines where you want to output data to the xml file, you should then insert use example_xml_module
at the beginning of the subroutine. You can then use any of the xml output routines with no further worries, as shown in the examples.
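As a rough illustration of what such a module could contain, here is a minimal sketch. It is not the example_xml_module.f90 shipped with FoX: it assumes plain wxml output, and the file name "output.xml" and root element "results" are placeholders chosen for this example.

```fortran
module example_xml_module
  ! Sketch of a thin wrapper that owns a single XML output file.
  ! Everything imported from FoX_wxml stays public here, so any routine
  ! that uses this module can also call the wxml output routines.
  use FoX_wxml
  implicit none

  ! File handle shared by every routine that uses this module.
  type(xmlf_t), save :: xmlfile

contains

  subroutine initialize_xml_output()
    ! "output.xml" and "results" are placeholder names.
    ! replace=.true. overwrites an existing file instead of stopping.
    call xml_OpenFile("output.xml", xmlfile, replace=.true.)
    call xml_NewElement(xmlfile, "results")
  end subroutine initialize_xml_output

  subroutine end_xml_output()
    ! xml_Close closes any still-open tags before closing the file.
    call xml_Close(xmlfile)
  end subroutine end_xml_output

end module example_xml_module
```

A subroutine elsewhere in the program would then begin with use example_xml_module and pass xmlfile as the file handle in its output calls.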
It is easy to make the use of FoX optional, by the use of preprocessor defines. This can be done simply by wrapping each call to your XML wrapper routines in #ifdef XML
, or similar. Alternatively, the use of the dummy FoX interfaces allows you to switch FoX on and off at compile time - see Compilation.
First, FoX must be configured, to ensure that it is set up correctly for your compiler.
(See Compilation)
If your main code has a configure
step, then run FoX's configure
as part of it.
If your code doesn't have its own configure step, then the first thing that "make" does should be to configure FoX, if it's not already configured. But that should only happen once; every time you make your code thereafter, you don't need to re-configure FoX, because nothing has changed. To do that, put a target like the following in your Makefile.
FoX/.config:
(cd FoX; ./configure FC=$(FC))
(Assuming that your Makefile
already has a variable FC
which sets the Fortran compiler)
When FoX configure completes, it "touch"es a file called FoX/.config
. That means that
whenever you re-run your own make, it checks to see if FoX/.config
exists - if it does,
then it knows FoX doesn't need to be re-configured, so it doesn't bother.
Then, FoX needs to be compiled before your code (because your modules will depend on FoX's modules.) But again, it only needs to be compiled once. You won't be changing FoX, you'll only be changing your own code, so recompiling your code doesn't require recompiling FoX.
So, add another target like the following:
FoX/.FoX: FoX/.config
(cd FoX; $(MAKE))
This has a dependency on the configure
script as I showed above, but it will only run it
if the configure
script hasn't already been run.
When FoX is successfully compiled, the last thing its Makefile
does is "touch" the file called
FoX/.FoX
. So the above target checks to see if that file exists; and if it does, then it doesn't
bother recompiling FoX, because it's already compiled. On the very first time you compile
your code, it will cd
into the FoX directory and compile it - but then never again.
You then need to have that rule be a dependency of your main target; like so:
MyExecutable: FoX/.FoX
(or whatever your default Makefile
rule is).
which will ensure that before MyExecutable
is compiled, make
will check to see that FoX
has been compiled (which most of the time it will be, so nothing further will happen).
But the first time you compile your code, it will call the FoX target, and FoX will be
configured & compiled.
You should add this to your FFLAGS (or equivalent - the variable that holds flags for compile-time use):
FFLAGS=-g -O2 -whatever-else $$(FoX/FoX-config --fcflags)
to make sure that you get the path to your modules. (Different compilers have different flags for specifying module paths; some use -I, some use -M, etc. If you use the above construction, it will pick the right one automatically for your compiler.)
Similarly, for linking, add the following to your LDFLAGS
(or equivalent - the variable
that holds flags for link-time use.)
LDFLAGS=-lwhatever $$(FoX/FoX-config --libs)
(For full details of the FoX-config
script, see Compilation)
Finally - you probably have a clean
target in your makefile. Don't tie FoX into this
target - most of the time when you make clean
, you don't want to make clean
with
FoX as well, because there's no need - FoX won't have changed and
it'll take a couple of minutes to recompile.
However, you can add a distclean
(or something) target, which you use before
moving your code to another machine, that looks like:
distclean: clean
(cd FoX; $(MAKE) distclean)
and that will ensure that when you do make distclean
, even FoX's object files are
cleaned up. But of course that will mean that you have to reconfigure & recompile
FoX next time you compile your code.
FoX is written with reference to the following standards:
[XML10]: http://www.w3.org/TR/REC-xml/
[XML11]: http://www.w3.org/TR/xml11
[Namespaces10]: http://www.w3.org/TR/xml-names
[Namespaces11]: http://www.w3.org/TR/xml-names11
[xml:id]: http://www.w3.org/TR/xml-id/
[xml:base]: http://www.w3.org/TR/xmlbase/
[CanonicalXML]: http://www.w3.org/TR/xml-c14n
[SAX2]: http://saxproject.org
[DOM1]: http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html
[DOM2]: http://www.w3.org/TR/DOM-Level-2-Core/
[DOM3]: http://www.w3.org/TR/DOM-Level-3-Core/
In particular:
FoX_wxml
knows about [XML10], [XML11], [Namespaces10], [Namespaces11], [CanonicalXML]
FoX_sax
knows about [XML10], [XML11], [Namespaces10], [Namespaces11], [xml:id], [xml:base], [SAX2]
FoX_dom
knows about [XML10], [XML11], [Namespaces10], [Namespaces11], [xml:id], [xml:base], [DOM1], [DOM2], [DOM3], [CanonicalXML]
For exceptions, please see the relevant parts of the FoX documentation.
FoX_common is a module exporting interfaces to a set of convenience functions common to all of the FoX modules, which are of more general use.
Currently, there are four publicly available functions and three subroutines:
str converts primitive datatypes into strings in a consistent fashion, conformant with the expectations of XML processors. It is fully described in StringFormatting.
The subroutine rts performs the reverse function, taking a string (obtained from an XML document) and converting it into a primitive Fortran datatype.
The function countrts examines a string and determines the size of array required to hold all its data, once converted to a primitive Fortran datatype.
It is fully described in StringConversion.
The final four procedures change the way that errors and warnings are handled when encountered by any FoX modules. Using these procedures it is possible to convert non-fatal warnings and fatal errors to calls to the internal abort routine. This generally has the effect of generating a stack trace or core dump of the program before termination. This is a global setting for all XML documents being manipulated. Two subroutines take a single logical argument to turn the feature on (true) or off (false) for warnings and errors respectively:
FoX_set_fatal_warnings
for warnings
FoX_set_fatal_errors
for errors
and two functions (without arguments) allow the state to be checked:
FoX_get_fatal_warnings
for warnings
FoX_get_fatal_errors
for errors
Both fatal warnings and errors are off by default. This corresponds to the previous behaviour.
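A minimal sketch of toggling and querying these settings follows. It assumes the four procedures are exported by FoX_common under exactly the names listed above; the program name is illustrative only.

```fortran
program fatal_settings_example
  use FoX_common, only: FoX_set_fatal_errors, FoX_set_fatal_warnings, &
                        FoX_get_fatal_errors, FoX_get_fatal_warnings
  implicit none

  ! Route fatal errors through the abort routine (useful when debugging),
  ! but leave warnings non-fatal.
  call FoX_set_fatal_errors(.true.)
  call FoX_set_fatal_warnings(.false.)

  ! The getters take no arguments and report the current global state.
  print *, "fatal errors enabled:   ", FoX_get_fatal_errors()
  print *, "fatal warnings enabled: ", FoX_get_fatal_warnings()
end program fatal_settings_example
```

Because the setting is global, it is probably simplest to make these calls once, near the start of the program, before any XML document is opened.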
Many of the routines in wxml, and indeed in wcml which is built on top of wxml, are overloaded so that data may be passed to the same routine as string, integer, logical, real, or complex data.
In such cases, a few notes on the conversion of non-textual data to text are in order. The standard Fortran I/O formatting routines do not offer the control required for useful XML output, so FoX performs all its own formatting.
This formatting is done internally through a function which is also available publicly to the user, str.
To use this in your program, import it via:
use FoX_common, only: str
and use it like so:
print*, str(data)
In addition, for ease of use, the //
concatenation operator is overloaded, such that strings can easily be formed by concatenation of strings to other datatypes. To use this you must import it via:
use FoX_common, only: operator(//)
and use it like so:
integer :: data
print*, "This is a number "//data
This will work for all native Fortran data types. Note, however, that the floating-point format control described below is available only through str(), not through concatenation.
You may pass data of the following primitive types to str
:
Character data is returned unchanged.
Logical data is output such that True values are converted to the string 'true', and False to the string 'false'.
Integer data is converted to the standard decimal representation.
Real numbers, both single and double precision, are converted to strings in one of two ways, with some control offered to the user. The output will conform to the real number formats specified by XML Schema Datatypes.
This may be done in one of two ways:
Exponential notation, with variable number of significant figures. Format strings of the form "sn" are accepted, where n is the number of significant figures.
Thus the number 111
, when output with various formats, will produce the following output:
s1 | 1e2 |
s2 | 1.1e2 |
s3 | 1.11e2 |
s4 | 1.110e2 |
The number of significant figures should lie between 1 and the number of digits precision provided by the real kind. If a larger or smaller number is specified, output will be truncated accordingly. If unspecified, then a sensible default will be chosen.
This format is not permitted by XML Schema Datatypes 1.0, though it is in 2.0.
Non-exponential notation, with variable number of digits after the decimal point. Format strings of the form "rn" are accepted, where n is the number of digits after the decimal point.
Thus the number 3.14159
, when output with various formats, will produce the following output:
r0 | 3 |
r1 | 3.1 |
r2 | 3.14 |
r3 | 3.142 |
The number of decimal places must lie between 0 and whatever would output the maximum digits precision for that real kind. If a larger or smaller number is specified, output will be truncated accordingly. If unspecified, then a sensible default will be chosen.
This format is the only one permitted by XML Schema Datatypes 1.0.
If no format is specified, then a default of exponential notation will be used.
If a format is specified not conforming to either of the two forms above, a run-time error will be generated.
NB Since by using FoX or str, you are passing real numbers through various functions, this means that they must be valid real numbers. A corollary of this is that if you pass in +/-Infinity, or NaN, then the behaviour of FoX is unpredictable, and may well result in a crash. This is a consequence of the Fortran standard, which strictly disallows doing anything at all with such numbers, including even just passing them to a subroutine.
Complex numbers will be output as pairs of real numbers, in the following way:
(1.0e0)+i(1.0e0)
where the two halves can be formatted in the way described for 'Real numbers' above; only one format may be specified, and it will apply to both.
All the caveats described above apply for complex numbers as well; that is, output of complex numbers either of whose components are infinite or NaN is illegal in Fortran, and more than likely will cause a crash in FoX.
All of the above types of data may be passed in as arrays and matrices as well. In this case, a string containing all the individual elements will be returned, ordered as they would be in memory, each element separated by a single space.
If the data is character data, then there is an additional option to str, delimiter
which may be any single-character string, and will replace a space as the delimiter.
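The formatting rules above can be tried out with a short program such as the following sketch. It assumes that the format string is accepted as the second positional argument of str; the program and variable names are illustrative.

```fortran
program str_format_example
  use FoX_common, only: str
  implicit none
  real :: x

  x = 3.14159

  ! Fixed-point with two decimal places; the table above lists 3.14.
  print *, str(x, "r2")

  ! Exponential with three significant figures; the table above lists 1.11e2.
  print *, str(111.0, "s3")

  ! No format given: the default (exponential) notation is used.
  print *, str(x)

  ! Logical and integer scalars.
  print *, str(.true.)
  print *, str(42)

  ! Arrays are returned as one string, elements separated by single spaces.
  print *, str((/1, 2, 3/))
end program str_format_example
```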
All functions in wxml which can accept arbitrary data (roughly, wherever you put anything that is not an XML name; attribute values, pseudo-attribute values, character data) will take scalars, arrays, and matrices of any of the above data types, with fmt=
and delimiter=
optional arguments where appropriate.
Likewise, wcml functions which can accept varied data behave in the same way.
Two procedures are provided to simplify reading data retrieved from XML documents into Fortran variables. The subroutine rts performs the data conversion step and the function countrts can be used to allocate an array of the correct size for the incoming data.
rts subroutine
The rts subroutine can be imported from FoX_common. In its simplest form, it is called in this fashion:
call rts(string, data)
string
is a simple Fortran string (probably retrieved from an XML file.)
data
is any native Fortran datatype: logical
, character
, integer
, real
, double precision
, complex
, double complex
, and may be a scalar, 1D or 2D array.
rts
will attempt to parse the contents of string
into the appropriate datatype, and return the value in data
.
Additional information or error handling is accomplished with the following optional arguments:
num
num is an integer; on returning from the subroutine it indicates the number of data items read before either:
data was filled.
iostat
iostat is an integer, which on return from the subroutine has the values:
0 for no problems
-1 if too few elements were found in string to fill up data
1 if data was filled, but there were still data items left in string
2 if the characters found in string could not be converted to the appropriate type for data.
NB if iostat is not specified, and a non-zero value is returned, then the program will stop with an error message.
When string
is expected to be an array of strings, the following options are used to break string
into its constituent elements:
By default it is assumed that the elements are separated by whitespace, and that multiple whitespace characters are not significant. No zero-length elements are possible, nor are elements containing whitespace.
An optional argument, separator, may be specified, which is a single character. In this case, each element consists of all characters between subsequent occurrences of the separator. Zero-length elements are possible, but no escaping mechanism is possible.
Alternatively, an optional logical argument csv may be specified. In this case, the value of separator is ignored, and the string is parsed as a Comma-Separated-Value string, according to RFC 4180.
Numbers are expected to be formatted according to the usual conventions for Fortran input.
Complex numbers may be formatted according to either normal Fortran conventions (comma-separated pairs) or CMLComp conventions.
Logical variables must be encoded according to the conventions of XML Schema Datatypes - that is, True may be written as "true" or "1", and False may be written as "false" or "0".
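A minimal sketch of a call to rts with both optional arguments is shown below. The input string stands in for text that would normally come from an XML document, and the program and variable names are illustrative.

```fortran
program rts_example
  use FoX_common, only: rts
  implicit none
  real :: values(3)
  integer :: n, ios

  ! Whitespace-separated items are parsed into the real array.
  ! Passing iostat means a conversion problem is reported to the caller
  ! rather than stopping the program.
  call rts("1.0 2.5 4.0", values, num=n, iostat=ios)

  if (ios == 0) then
    print *, "read", n, "values:", values
  else
    ! -1: too few items, 1: items left over, 2: unconvertible characters.
    print *, "conversion problem, iostat =", ios
  end if
end program rts_example
```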
countrts function
The countrts function can also be imported from FoX_common. In its simplest form, it is called in this fashion:
countrts(string, datatype)
string is a simple Fortran string (probably retrieved from an XML file)
datatype
is a scalar argument of any native Fortran datatype (logical
, character
, integer
, real
, double precision
, complex
or double complex
).
The function returns a default integer equal to the number of elements that rts would return if called with a sufficiently large array of the same type as datatype. countrts returns 0 to indicate that characters were found in the string that could not be converted. If datatype is a character, the optional arguments separator and csv are available as described in "string formatting" above. The countrts
function is pure and can be used as a specification function.
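The two procedures are intended to work together, roughly as in the sketch below: countrts sizes the array and rts fills it. The input string is a placeholder for data read from an XML document, and the names are illustrative.

```fortran
program countrts_example
  use FoX_common, only: rts, countrts
  implicit none
  ! Placeholder for a string retrieved from an XML document.
  character(len=*), parameter :: text = "3 1 4 1 5 9"
  integer, allocatable :: values(:)
  integer :: n

  ! The second argument is a scalar that selects the datatype to count.
  n = countrts(text, 0)

  if (n == 0) then
    ! Zero signals that the string could not be converted.
    print *, "no convertible integers found"
  else
    allocate(values(n))
    call rts(text, values)
    print *, "read", n, "integers:", values
  end if
end program countrts_example
```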
wxml
is a general Fortran XML output library. It offers a Fortran interface, in the form of a number of subroutines, to generate well-formed XML documents. Almost all of the XML features described in XML11 and Namespaces are available, and wxml
will diagnose almost all attempts to produce an invalid document. Exceptions below describes where wxml
falls short of these aims.
First, Conventions describes the conventions used in this document.
Then, Functions lists all of wxml's publicly exported functions, in three sections:
Please note that where the documentation below is not clear, it may be useful to look at some of the example files. There is a very simple example in the examples/ subdirectory, which nevertheless shows the use of most of the features you will use.
A more elaborate example, using almost all of the XML features found here, is available in the top-level directory as wxml_example.f90. It will be automatically compiled as part of the build process.
monospace
Note that where strings are passed in, they will be passed through entirely unchanged to the output file - no truncation of whitespace will occur.
It is strongly recommended that the functions be used with keyword arguments rather than relying on implicit ordering.
xmlf_t
This is an opaque type representing the XML file handle. Each function requires this as an argument, so it knows which file to operate on. (And it is an output of the xml_OpenFile subroutine) Since all subroutines require it, it is not mentioned below.
xml_OpenFile
Open a file for writing XML
By default, the XML will have no extraneous text nodes. This can have the effect of it looking slightly ugly, since there will be no newlines inserted between tags.
This behaviour can be changed to produce slightly nicer looking XML, by switching on pretty_print. This will insert newlines and spaces between some tags where they are unlikely to carry semantics. Note, though, that this does result in the XML produced being not quite what was asked for, since extra characters and text nodes have been inserted.
NB: The replace option should be noted. By default, xml_OpenFile will fail with a runtime error if you try to write to an existing file. If you are sure you want to continue on in such a case, then you can specify replace=.true. and any existing files will be overwritten. If finer granularity is required over how to proceed in such cases, use the Fortran inquire statement in your code. There is no 'append' functionality by design - any XML file created by appending to an existing file would be invalid.
xml_Close
Close an opened XML file, closing all still-opened tags so that it is well-formed.
In the normal run of events, trying to close an XML file with no root element will cause an error, since this is not well-formed. However, an optional argument, empty, is provided in case it is desirable to close files which may be empty. In this case, a warning will still be emitted, but no fatal error generated.
xml_NewElement
Open a new element tag
xml_EndElement
Close an open tag
xml_AddAttribute
The optional type argument, if given, must be one of CDATA, ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, ENTITY, ENTITIES, or NOTATION (always upper case). If specified, this must match any attribute declarations that have been previously declared in the DTD. If unspecified this (as the XML standard requires) defaults to CDATA.
Add an attribute to the currently open tag.
By default, if the attribute value contains markup characters, they will be escaped automatically by wxml before output.
However, in rare cases you may not wish this to happen - if you wish to output Unicode characters, or entity references. In this case, you should set escape=.false. for the relevant subroutine call. Note that if you do this, no checking on the validity of the output string is performed; the onus is on you to ensure well-formedness.
The value to be added may be of any type; it will be converted to text according to FoX's formatting rules, and if it is a 1- or 2-dimensional array, the elements will all be output, separated by spaces (except if it is a character array, in which case the delimiter may be changed to any other single character using an optional argument).
NB The type option is only provided so that in the case of an external DTD which FoX is unaware of, the attribute type can be specified (which gives FoX more information to ensure well-formedness and validity). Specifying the type incorrectly may result in spurious error messages.
xml_AddCharacters
Add text data. The data to be added may be of any type; they will be converted to text according to FoX's formatting rules, and if they are a 1- or 2-dimensional array, the elements will all be output, separated by spaces (except if it is a character array, in which case the delimiter may be changed to any other single character using an optional argument).
xml_AddNewline
Within the context of character output, add a (system-dependent) newline character. This function can only
be called wherever xml_AddCharacters
can be called. (Newlines outside of character context are under
FoX's control, and cannot be manipulated by the user.)
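The routines described so far can be combined as in the following sketch, which is not taken from the FoX distribution. The file name, element name, and attribute names are made up for illustration, and the fmt= keyword on xml_AddCharacters assumes the optional formatting argument described in the string formatting notes.

```fortran
program wxml_minimal_example
  use FoX_wxml
  implicit none
  type(xmlf_t) :: xf

  ! Open the file; replace=.true. overwrites any existing file, and
  ! pretty_print=.true. adds newlines between tags for readability.
  call xml_OpenFile("example.xml", xf, pretty_print=.true., replace=.true.)

  ! Open the root element and give it two attributes.
  ! Non-character values are converted to text by FoX.
  call xml_NewElement(xf, "measurement")
  call xml_AddAttribute(xf, "units", "eV")
  call xml_AddAttribute(xf, "count", 3)

  ! Array data is written as space-separated text content.
  call xml_AddCharacters(xf, (/1.0, 2.0, 3.0/), fmt="r1")

  ! Close the element explicitly, then the file.
  call xml_EndElement(xf, "measurement")
  call xml_Close(xf)
end program wxml_minimal_example
```

Attributes have to be added while the element's start tag is still open, which is why both xml_AddAttribute calls come before xml_AddCharacters.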
xml_DeclareNamespace
Add an XML Namespace declaration. This function may be called at any time, and its precise effect depends on when it is called; see below
xml_UndeclareNamespace
Undeclare an XML namespace. This is equivalent to declaring a namespace with an empty URI, and renders the namespace ineffective for the scope of the declaration. For explanation of its scope, see below.
NB Use of xml_UndeclareNamespace
implies that the resultant document will be compliant with XML Namespaces 1.1, but not 1.0; wxml will issue an error when trying to undeclare namespaces under XML 1.0.
If xml_[Un]declareNamespace
is called immediately prior to an xml_NewElement
call, then the namespace will be declared in that next element, and will therefore take effect in all child elements.
If it is called prior to an xml_NewElement
call, but that element has namespaced attributes
To explain by means of example: In order to generate the following XML output:
<cml:cml xmlns:cml="http://www.xml-cml.org/schema"/>
then the following two calls are necessary, in the prescribed order:
call xml_DeclareNamespace(xf, 'http://www.xml-cml.org/schema', 'cml')
call xml_NewElement(xf, 'cml:cml')
However, to generate XML input like so:
xml_DeclareNamespace
call is made before the element tag is closed (either by xml_EndElement
, or by a new element tag being opened, or some text being added etc.) the correct XML will be generated.
Two previously mentioned functions are affected when used in a namespace-aware fashion.
xml_NewElement
, xml_AddAttribute
The element or attribute name is checked, and if it is a QName (i.e. if it is of the form prefix:tagName) then wxml will check that prefix is a registered namespace prefix, and generate an error if not.
If you don't know the purpose of any of these, then you don't need to.
xml_AddXMLDeclaration
Add XML declaration to the first line of output. If used, then the file must have been opened with addDecl = .false., and this must be the first wxml call to the document.
NB The only XML versions available are 1.0 and 1.1. Attempting to specify anything else will result in an error. Specifying version 1.0 results in additional output checks to ensure the resultant document is XML-1.0-conformant.
NB Note that if the encoding is specified, and is specified to not be UTF-8, then if the specified encoding does not match that supported by the Fortran processor, you may end up with output you do not expect.
xml_AddDOCTYPE
Add an XML document type declaration. If used, this must be used prior to the first xml_NewElement call, and only one such call may be made.
xml_AddInternalEntity
Define an internal entity for the document. If used, this call must be made after xml_AddDOCTYPE
and before the first xml_NewElement
call.
xml_AddExternalEntity
Define an external entity for the document. If used, this call must be made after xml_AddDOCTYPE
and before the first xml_NewElement
call.
xml_AddParameterEntity
Define a parameter entity for the document. If used, this call must be made after xml_AddDOCTYPE
and before the first xml_NewElement
call.
xml_AddNotation
Define a notation for the document. If used, this call must be made after xml_AddDOCTYPE
and before the first xml_NewElement
call.
xml_AddElementToDTD
Add an ELEMENT declaration to the DTD. The syntax of the declaration is not checked in any way, nor does this affect how elements may be added in the content of the XML document.
If used, this call must be made after xml_AddDOCTYPE
and before the first xml_NewElement
call.
xml_AddAttlistToDTD
Add an ATTLIST declaration to the DTD. The syntax of the declaration is not checked in any way, nor does this affect how attributes may be added in the content of the XML document.
If used, this call must be made after xml_AddDOCTYPE
and before the first xml_NewElement
call.
xml_AddPEreferenceToDTD
Add a reference to a Parameter Entity in the DTD. No check is made according to whether the PE exists, has been declared, or may legally be used.
If used, this call must be made after xml_AddDOCTYPE
and before the first xml_NewElement
call.
xml_AddXMLStylesheet
Add XML stylesheet processing instruction, as described in [Stylesheets]. If used, this call must be made before the first xml_NewElement
call.
xml_AddXMLPI
Add an XML Processing Instruction.
If data is present, nothing further can be added to the PI. If it is not present, then pseudoattributes may be added using the call below.
Normally, the name is checked to ensure that it is XML-compliant. This requires that PI targets not start with [Xx][Mm][Ll], because such names are reserved. However, some such names are defined by later W3C specifications. If you wish to use such PI targets, then set xml=.true. when outputting them.
The output PI will look like:
<?name data?>
xml_AddPseudoAttribute
Add a pseudoattribute to the currently open PI.
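As a hedged sketch of the two calls above: a PI is opened without data, so that pseudoattributes can still be added to it. The target name, the pseudoattribute, and the positional argument order are assumptions made for illustration only.

```fortran
program pi_example
  use FoX_wxml
  implicit none
  type(xmlf_t) :: xf

  call xml_OpenFile("output.xml", xf)

  ! No data argument is given, so the PI stays open for pseudoattributes.
  call xml_AddXMLPI(xf, "display")
  call xml_AddPseudoAttribute(xf, "mode", "compact")

  ! Carry on with the document proper.
  call xml_NewElement(xf, "root")
  call xml_EndElement(xf, "root")
  call xml_Close(xf)
end program pi_example
```

Had data been supplied in the xml_AddXMLPI call, the subsequent xml_AddPseudoAttribute call would not be allowed.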
xml_AddComment
Add an XML comment.
xml_AddEntityReference
This may be used anywhere that xml_AddCharacters
may be, and will insert an entity reference into the contents of the XML document at that point. Note that if the entity inserted is a character entity, its validity will be checked according to the rules of XML-1.1, not 1.0.
If the entity reference is not a character entity, then no check is made of its validity, and a warning will be issued.
These functions may be of use in building wrapper libraries:
xmlf_Name
result(string): Return the filename of an open XML file.
xmlf_OpenTag
result(string): Return the currently open tag of the current XML file (or the empty string if none is open).
xmlf_GetPretty_print
result(logical): Return the current value of pretty_print.
xmlf_SetPretty_print
NewValue: logical. Set the current value of pretty_print to NewValue. This may be useful in a mixed namespace document where pretty printing the output may change the meaning under one of the namespaces.
Below are explained areas where wxml fails to implement the whole of XML 1.0/1.1. These are divided into two lists; where wxml does not permit the generation of a particular well-formed XML document, and where it does permit the generation of a particular non-well-formed document.
Ways in which wxml renders it impossible to produce a certain sort of well-formed XML document:
wxml will try very hard to ensure that output is well-formed. However, it is possible to fool wxml into producing ill-formed XML documents. Avoid doing so if possible; for completeness these ways are listed here. In all cases where ill-formedness is a possibility, a warning can be issued. These warnings can be verbose, so are off by default, but if they are desired, they can be switched on by manipulating the warning
argument to xml_OpenFile
.
Finally, note that constraints on XML documents are divided into two sets - well-formedness constraints (WFC) and validity constraints (VC). The above only applies to WFC checks. wxml can make some minimal checks on VCs, but this is by no means complete, nor is it intended to be. These checks are off by default, but may be switched on by manipulating the validate
argument to xml_OpenFile
.
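A minimal sketch of switching both sets of checks on. It assumes that warning and validate are logical keyword arguments of xml_OpenFile, as the text above suggests; the file and element names are illustrative.

```fortran
program strict_output
  use FoX_wxml
  implicit none
  type(xmlf_t) :: xf

  ! Enable the well-formedness warnings and the minimal validity checks,
  ! both of which are off by default.
  call xml_OpenFile("output.xml", xf, warning=.true., validate=.true.)

  call xml_NewElement(xf, "root")
  call xml_EndElement(xf, "root")
  call xml_Close(xf)
end program strict_output
```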
WCML is a library for outputting CML data. It wraps all the necessary XML calls, such that you should never need to touch any WXML calls when outputting CML.
The CML output is conformant to version 2.4 of the CML schema.
The available functions and their intended use are listed below. Quite deliberately, no reference is made to the actual CML output by each function.
WCML is not intended to be a generalized Fortran CML output layer. Rather, it is intended to be a library which allows the output of a limited set of well-defined syntactical fragments.
Further information on these fragments, and on the style of CML generated here, is available at http://www.uszla.me.uk/specs/subset.html.
This section of the manual will detail the available CML output subroutines.
wcml subroutines can be accessed from within a module or subroutine by inserting
use FoX_wcml
at the start. This will import all of the subroutines described below, plus the derived type xmlf_t
needed to manipulate a CML file.
No other entities will be imported; public/private Fortran namespaces are very carefully controlled within the library.
The use of dictionaries with WCML is strongly encouraged. (For those not conversant with dictionaries, a fairly detailed explanation is available at http://www.xml-cml.org/information/dictionaries)
In brief, dictionaries are used in two ways.
Firstly, to identify and disambiguate output data. Every output function below takes an optional argument, dictRef=""
. It is intended that every piece of data output is tagged with a dictionary reference, which will look something like nameOfCode:nameOfThing
.
So, for example, in SIESTA, all the energies are output with different dictRefs, looking like: siesta:KohnShamEnergy
, or siesta:kineticEnergy
, etc. By doing this, we can ensure that later on all these numbers can be usefully identified.
We hope that ultimately, dictionaries can be written for codes, which will explain what some of these names might mean. However, it is not in any way necessary that this be done - and using dictRef
attributes will help merely by giving the ability to disambiguate otherwise indistinguishable quantities.
We strongly recommend this course of action. If you choose to follow our recommendation, then you should add a suitable namespace to your code. That is, immediately after cmlBeginFile and before cmlStartCml, you should add something like:
call cmlAddNamespace(xf, 'nameOfCode', 'WebPageOfCode')
Again, for SIESTA, we add:
call cmlAddNamespace(xf, 'siesta', 'http://www.uam.es/siesta')
If you don't have a webpage for your code, don't worry; the address is only used as an identifier, so anything that looks like a URL, and which nobody else is using, will suffice.
Secondly, we use dictionaries for units. This is compulsory (unlike the dictRefs above). Any numerical quantity that is output through cmlAddProperty or cmlAddParameter is required to carry units. These are added with the units="" argument to the function. In addition, every other function below which takes numerical arguments also takes optional units, although defaults will be used if no units are supplied.
Further details are supplied in section Units below.
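The two uses of dictionaries can be sketched together. In this illustration the code name mycode, its URL, the property name, and the value are all hypothetical, and the keyword names filename, unit, value, title, units, and dictRef are assumptions to be checked against the subroutine descriptions below.

```fortran
program wcml_dictref
  use FoX_wcml
  implicit none
  type(xmlf_t) :: xf

  ! A unit number of -1 leaves the choice of unit to wcml.
  call cmlBeginFile(xf, filename="output.cml", unit=-1)

  ! Declare the prefix used in dictRef values (hypothetical name and URL).
  call cmlAddNamespace(xf, "mycode", "http://www.example.org/mycode")
  call cmlStartCml(xf)

  ! A numerical property must carry units; dictRef identifies the quantity.
  call cmlAddProperty(xf=xf, value=5.43d0, title="Lattice constant", &
                      units="units:angstrom", dictRef="mycode:latticeConstant")

  call cmlEndCml(xf)
  call cmlFinishFile(xf)
end program wcml_dictref
```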
Functions are named in the following way:
All functions begin with cml.
To begin and end a section of the CML file, a pair of functions will exist: cmlStart<something> and cmlEnd<something>.
To output a given quantity/property/concept etc., a function will exist: cmlAdd<something>.
Note that where strings are passed in, they will be passed through entirely unchanged to the output file - no truncation of whitespace will occur.
Also note that wherever a real number can be passed in (including through anytype), the formatting can be specified using the conventions described in StringFormatting.
Where an array is passed in, it may be passed either as an assumed-shape array; that is, as an F90-style array with no necessity for specifying bounds; thusly:
integer :: array(50)
call cmlAddProperty(xf, 'coords', array)
or as an assumed-size array; that is, an F77-style array, in which case the length must be passed as an additional parameter:
integer :: array(*)
call cmlAddProperty(xf, 'coords', array, nitems=50)
Similarly, when a matrix is passed in, it may be passed in either fashion:
integer :: matrix(50, 50)
call cmlAddProperty(xf, 'coords', matrix)
or
integer :: matrix(3, *)
call cmlAddProperty(xf, 'coords', matrix, nrows=3, ncols=50)
All functions take as their first argument an XML file object, whose keyword is always xf
. This file object is initialized by a cmlBeginFile
function.
It is highly recommended that subroutines be called with keywords specified rather than relying on the implicit ordering of arguments. This is robust against changes in the library calling convention, and also sidesteps a significant cause of errors when using subroutines with large numbers of arguments.
Note below that the functions cmlAddParameter
and cmlAddProperty
both require that units be specified for any numerical quantities output.
If you are trying to output a quantity that is genuinely dimensionless, then you should specify units="units:dimensionless"
; or if you are trying to output a countable quantity (eg number of CPUs) then you may specify units="units:countable"
.
For other properties, all units should be specified as namespaced quantities. If you are using a very few common units, it may be easiest to borrow definitions from the provided dictionaries;
(These links do not resolve yet.)
cmlUnits: http://www.xml-cml.org/units/units
siUnits: http://www.xml-cml.org/units/siUnits
atomicUnits: http://www.xml-cml.org/units/atomic
A default units dictionary is also provided, containing only the very basic units that wcml needs to know about. It has the namespace http://www.uszla.me.uk/FoX/units, and wcml assigns it to the prefix units.
This is added automatically, so attempts to add it manually will fail.
The contents of all of these dictionaries, plus the wcml dictionary, may be viewed at: http://www.uszla.me.uk/unitsviz/units.cgi.
Otherwise, you should feel at liberty to construct your own namespace, declare it using cmlAddNamespace, and mark up all your units as:
units="myNamespace:myunit"
cmlBeginFile
This takes care of all calls to open a CML output file. You may supply -1 as the unit number, in which case wcml will make a guess.
cmlFinishFile
This takes care of all calls to close an open CML output file, once you have finished with it. It is compulsory to call this: if your program finishes without calling it, then your CML file will be invalid.
cmlAddNamespace
This adds a namespace to a CML file.
NB This may only ever be called immediately after a cmlBeginFile
call, before any
output has been performed.
Attempts to do otherwise will result in a runtime error.
This will be needed if you are adding dictionary references to your output. Thus for siesta, we do:
call cmlAddNamespace(xf, 'siesta', 'http://www.uam.es/siesta')
and then output all our properties and parameters with dictRef="siesta:something"
.
cmlStartCml
(fileId) string scalar: name of originating file. (default: current filename)
(version) string scalar: version of CML in use. (default: 2.4)
cmlEndCml
This pair of functions begins and ends the CML output to an existing CML file. They take care of namespaces.
Note that unless specified otherwise, there will be a convention
attribute added to the cml
tag specifying FoX_wcml-2.0
as the convention. (see http://www.uszla.me.uk/FoX for details)
cmlStartMetadataList
(name) string scalar: name for the metadata list
(role) string scalar: role which the element plays
cmlEndMetadataList
This pair of functions open & close a metadataList, which is a wrapper for metadata items.
cmlStartParameterList
(ref) string scalar: Reference an id
attribute of another element (generally deprecated)
(role) string scalar: role which the element plays
cmlEndParameterList
This pair of functions open & close a parameterList, which is a wrapper for input parameters.
cmlStartPropertyList
(ref) string scalar: Reference an id
attribute of another element (generally deprecated)
(role) string scalar: role which the element plays
cmlEndPropertyList
This pair of functions open & close a propertyList, which is a wrapper for output properties.
cmlStartKpointList
cmlEndKpointList
Start/end a list of k-points (added using cmlAddKpoint
below)
cmlStartModule
Note that in most cases where you might want to use a serial number, you should probably be using the cmlStartStep
subroutine below.
cmlEndModule
This pair of functions open & close a module of a computation which is unordered, or loosely-ordered. For example, METADISE uses one module for each surface examined.
cmlStartStep
(index) integer scalar: index number for the step. In the absence of an index, steps will be assumed to be consecutively numbered. Specifying this is useful if you wish to output eg every hundredth step.
(type) string scalar: what sort of step is this? This should be a namespaced string, for example: siesta:CG
is a Conjugate Gradient step in siesta.
cmlEndStep
This pair of functions open and close a module of a computation which is strongly ordered. For example, DLPOLY uses steps for each step of the simulation.
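A hedged sketch of the explicit index described above, writing out only every hundredth step. The file name and the property are illustrative, and the keyword names filename, unit, index, value, title, and units are assumptions to be checked against the argument lists.

```fortran
program wcml_steps
  use FoX_wcml
  implicit none
  type(xmlf_t) :: xf
  integer :: i

  call cmlBeginFile(xf, filename="steps.cml", unit=-1)
  call cmlStartCml(xf)

  ! Steps 100, 200 and 300 are output, so the index is given explicitly
  ! instead of relying on consecutive numbering.
  do i = 100, 300, 100
    call cmlStartStep(xf, index=i)
    call cmlAddProperty(xf=xf, value=i, title="Step number", &
                        units="units:countable")
    call cmlEndStep(xf)
  end do

  call cmlEndCml(xf)
  call cmlFinishFile(xf)
end program wcml_steps
```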
cmlAddMetadata
This adds a single item of metadata. Metadata vocabulary is completely uncontrolled within WCML. This means that metadata values may only be strings of characters. If you need your values to contain numbers, then you need to define the representation yourself, and construct your own strings.
cmlAddParameter
(ref) string scalar: Reference an id attribute of another element (generally deprecated)
This function adds a tag representing an input parameter.
cmlAddProperty
(ref) string scalar: Reference an id attribute of another element (generally deprecated)
This function adds a tag representing an output property.
cmlAddMolecule
cartesian - the coordinates are Cartesian, or fractional - the coordinates are fractional. The default is Cartesian.
(ref) string scalar: Reference an id attribute of another element (generally deprecated)
Outputs an atomic configuration. Bonds may be added using the optional arguments bondAtom1Refs, bondAtom2Refs and bondOrders. All these arrays must be the same length, and all must be present if bonds are to be added. Optionally, bondIds can be used to add ids to the bond elements. Some validity constraints are imposed (atomsRefs in the bonds must be defined, and bonds cannot be added twice). The meaning of the terms "molecule", "bond" and "bond order" is left loosely defined.
cmlAddLattice
real or reciprocal space.
Outputs information about a unit cell, in lattice-vector form.
cmlAddCrystal
units:angstrom
units:degrees
Outputs information about a unit cell, in crystallographic form
cmlStartKPoint
Start a kpoint section.
cmlEndKPoint
End a kpoint section.
cmlAddKPoint
Add an empty kpoint section.
cmlStartBand
Start a section describing one band.
cmlEndBand
End a section describing one band.
cmlAddEigenValue
Add a single eigenvalue to a band.
cmlAddBandList
Add a list of eigenvalues for a kpoint
cmlAddEigenValueVector
value: real scalar: the eigenvalue for this band
Add a phononic eigenpoint to the band, which has a single energy and a 3xN matrix representing the eigenvector.
All cmlAdd and cmlStart routines take the following set of optional arguments:
id: Unique identifying string for element. (Uniqueness is not enforced, though duplicated ids on output are usually an error and may cause later problems.)
title: Human-readable title of element for display purposes.
dictRef: Reference to disambiguate element. Should be a QName; a namespaced string. An actual dictionary entry may or may not exist. It is not an error for it not to.
convention: Convention by which the element is to be read. (convention is deliberately loose.)
WKML is a library for creating KML documents. These documents are intended to be used for "expressing geographic annotation and visualization" for maps and Earth browsers such as Google Earth or Marble. WKML wraps all the necessary XML calls, such that you should never need to touch any WXML calls when outputting KML from a Fortran application.
WKML is intended to produce XML documents that conform to version 2.2 of the Open Geospatial Consortium's schema. However, the library offers no guarantee that documents produced will be valid, as only a small subset of the constraints are enforced. The API is designed to minimize the possibility of producing invalid KML in common use cases, and well-formedness is maintained by the underlying WXML library.
The available functions and their intended use are listed below. One useful reference to the use of KML is Google's KML documentation.
wkml subroutines can be accessed from within a module or subroutine by inserting
use FoX_wkml
at the start. This will import all of the subroutines described below, plus the derived type xmlf_t
needed to manipulate a KML file.
No other entities will be imported; public/private Fortran namespaces are very carefully controlled within the library.
All functions take as their first argument an XML file object, whose
keyword is always xf
. This file object is initialized by a kmlBeginFile
function.
It is highly recommended that subroutines be called with keywords specified rather than relying on the implicit ordering of arguments. This is robust against changes in the library calling convention, and also sidesteps a significant cause of errors when using subroutines with large numbers of arguments.
kmlBeginFile
This takes care of all calls to open a KML output file. You may supply -1 as the unit number, in which case wkml will make a guess.
kmlFinishFile
This takes care of all calls to close an open KML output file, once you have finished with it. It is compulsory to call this: if your program finishes without calling it, then your KML file will be invalid.
kmlOpenFolder
This starts a new folder. Folders are used in KML to organize other objects into groups; the visibility of these groups can be changed in one operation within Google Earth. Folders can be nested.
kmlCloseFolder
This closes the current folder.
kmlOpenDocument
This starts a new document element at this point in the output. Note that no checks are currently performed to ensure that this is permitted, for example only one document is permitted to be a child of the kml root element. Most users should not need to use this subroutine.
kmlCloseDocument
This closes the current document element. Do not close the outermost document element created with kmlBeginFile; this must be closed with kmlFinishFile.
Most users should not need to use this subroutine.
kmlCreatePoints
A single function, kmlCreatePoints accepts various combinations of arguments, and will generate a series of individual points to be visualized in Google Earth. In fact, the KML produced will consist of a Folder, containing Placemarks, one for each point. The list of points may be provided in any of the three ways specified above.
kmlCreateLine
A single function, kmlCreateLine accepts various combinations of arguments, and will generate a series of individual points to be visualized as a (closed or open) path in Google Earth. In fact, the KML produced will consist of a LineString, or LinearRing, containing a list of coordinates. The list of points may be provided in any of the three ways specified above.
kmlStartRegion
Creates a filled region with the outer boundary described by the list of points. May be followed by one or more calls to kmlAddInnerBoundary, and these must be followed by a call to kmlEndRegion.
kmlEndRegion
Ends the specification of a region with or without inner boundaries.
kmlAddInnerBoundary
Introduces an internal area that is to be excluded from the enclosing region.
WKML also contains two subroutines to allow scalar fields to be plotted over a geographical region. Data is presented to WKML as a collection of values and coordinates and this data can be displayed as a set of coloured cells, or as isocontours.
For all 2-D field subroutines both position and value of the data must be specified. The data values must always be specified as a rank-2 array, values(:,:). The grid can be specified in three ways depending on grid type.
Grid-point (i, j) = (longitude(i), latitude(j))
longitude(:,:) and latitude(:,:). The grid may be of any form, aligned with no other projection: Grid-point (i, j) is taken as (longitude(i, j), latitude(i, j)).
In all cases, single or double precision data may be used so long as all data is consistent in precision within one call.
The third dimension of the data can be visualized in two (not mutually exclusive) ways: firstly by assigning colours according to the value of the third dimension, and secondly by using the altitude of the points as a (suitably scaled) proxy for the third dimension. The following optional arguments control this aspect of the visualization (both for cells and for contours).
Where no colormap is provided, one will be autogenerated with the appropriate number of levels as calculated from the provided contour_values. Where no contour_values are provided, they are calculated based on the size of the colormap provided. Where neither colormap nor contour_values are provided, a default of 5 levels with an autogenerated colormap will be used.
kmlCreateCells
This subroutine generates a set of filled pixels over a region of the earth.
kmlCreateContours
This subroutine creates a set of contour lines.
KML natively handles all colours as 32-bit values, expressed as 8-digit hexadecimal numbers in ABGR (alpha-blue-green-red) channel order. However, this is not very friendly. WKML provides a nicer interface to this, and all WKML functions which accept colour arguments will accept them in three ways:
A function and a subroutine are provided to manipulate the color_t derived type:
kmlGetCustomColor
This function takes a single argument of type integer or string and returns a color_t derived type. If the argument is a string the colour is taken from the set of X11 colours, if it is an integer, i, the ith colour is selected from the X11 list.
kmlSetCustomColor
This subroutine takes a color_t derived type and a string(len=8) representing an 8-digit ABGR hexadecimal number, and sets the color_t to represent that colour.
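To make the ABGR channel order concrete, here is a small self-contained sketch in plain Fortran (it does not use FoX) that packs separate red, green, blue and alpha components into an 8-digit hexadecimal string. The helper name abgr_hex is hypothetical and is not part of WKML.

```fortran
program abgr_demo
  implicit none
  character(len=8) :: hex

  ! Opaque red: alpha=255, blue=0, green=0, red=255.
  hex = abgr_hex(r=255, g=0, b=0, a=255)
  print '(a)', hex   ! prints FF0000FF

contains

  ! Pack red, green, blue and alpha (each 0-255) into an 8-digit ABGR string.
  function abgr_hex(r, g, b, a) result(hex)
    integer, intent(in) :: r, g, b, a
    character(len=8) :: hex
    ! The channels are written in the order alpha, blue, green, red,
    ! each as two zero-padded hexadecimal digits.
    write(hex, '(4Z2.2)') a, b, g, r
  end function abgr_hex

end program abgr_demo
```

Note that the red component ends up in the last two digits, which is the reverse of the more familiar RGB ordering.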
Several features of wkml make use of "colour maps", arrays of the color_t derived type, which are used to relate numerical values to colours when showing fields of data. These are created and used thus:
program colours
use FoX_wkml
implicit none
type(color_t) :: colourmap(10)
integer :: i
! Use X11 colours from 101 to 110:
do i = 1, 10
colourmap(i) = kmlGetCustomColor(100 + i)
end do
! Except for number 5 which should be red:
colourmap(5) = kmlGetCustomColor("indian red")
! And for number 6 which should be black
call kmlSetCustomColor(colourmap(6), "00000000")
end program colours
Controlling styling in KML can be quite complex. Most of the subroutines in WKML allow some control of the generated style, but they do not provide access to the full KML vocabulary which allows more complex styling. In order to access the more complex styles in KML it is necessary to create KML style maps: objects that are defined and named with a styleURL. The styleURL is then used to refer to the style defined by the map.
Styles can be created using the following three subroutines. In each case one argument is necessary: id, which must be a string (starting with an alphabetic letter, and containing no spaces or punctuation marks) which is used later on to reference the style. All other arguments are optional.
kmlCreatePointStyle
Creates a style that can be used for points.
kmlCreateLineStyle
Creates a style that can be used for lines.
kmlCreatePolygonStyle
Creates a style that can be used for a polygon.
Following experience integrating FoX_wxml
into several codes, here are a few tips for debugging any problems you may encounter.
You may encounter problems at the compiling or linking stage, with error messages along the lines of: 'No Specific Function can be found for this Generic Function' (exact phrasing depending on compiler, of course.)
If this is the case, it is possible that you have accidentally got the arguments to the offending subroutine out of order. If so, then use the keyword form of the arguments to ensure correctness; that is, instead of doing:
call cmlAddProperty(file, name, value)
do:
call cmlAddProperty(xf=file, name=name, value=value)
This will prevent argument mismatches, and is recommended practice in any case.
You may encounter run-time issues. FoX performs many run-time checks to ensure the validity of the resultant XML code. In so far as it is possible, FoX will either issue warnings about potential problems, or try to handle safely any errors it encounters. In both cases, warnings will be output on stderr, which will hopefully help diagnose the problem.
Sometimes, however, FoX will encounter a problem it can do nothing about, and must stop. In all cases, it will try and write out an error message highlighting the reason, and generate a backtrace pointing to the offending line. Occasionally though, the compiler will not generate this information, and the error message will be lost.
If this is the case, you can either investigate the coredump to find the problem, or (if you are on a Mac) look in ~/Library/Logs/CrashReporter to find a human-readable log.
If this is not enlightening, or you cannot find the problem, then some of the most common issues we have encountered are listed below. Many of them are general Fortran problems, but sometimes are not easily spotted in the context of FoX.
Make sure, whenever you are writing out a real number through one of FoX's routines, and specifying a format, that the format is correct according to StringFormatting. Fortran-style formats are not permitted, and will cause crashes at runtime.
If you are outputting arrays or matrices, and are doing so in the traditional Fortran style - by passing both the array and its length to the routine, like so:
call xml_AddAttribute(xf=file, name=name, value=array, nvalue=n)
then if n
is wrong, you may end up with an array overrun, and cause a crash.
We highly recommend wherever possible using the Fortran-90 style, like so:
call xml_AddAttribute(xf=file, name=name, value=array)
where the array length will be passed automatically.
If you are passing variables to FoX which have not been initialized, you may well cause a crash. This is especially true, and easy to cause if you are passing in an array which (due to a bug elsewhere) has been partly but not entirely initialized. To diagnose this, try printing out suspect variables just before passing them to FoX, and look for suspiciously wrong values.
If during the course of your calculation you accidentally generate Infinities, or NaNs, then passing them to any Fortran subroutine can result in a crash - therefore trying to pass them to FoX for output may result in a crash.
If you suspect this is happening, try printing out suspect variables before calling FoX.
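Printing suspect variables can be supplemented by an explicit test. The following self-contained sketch uses only the standard ieee_arithmetic intrinsic module (it does not call FoX) to flag Infinities and NaNs in an array before it is handed on; the array name and its contents are made up for illustration.

```fortran
program check_finite
  use, intrinsic :: ieee_arithmetic, only: ieee_is_finite, ieee_value, &
                                           ieee_quiet_nan
  implicit none
  real :: energies(3)
  integer :: i

  energies = (/ 1.5, 2.5, 0.0 /)
  ! Simulate a bug elsewhere by planting a NaN in the data.
  energies(3) = ieee_value(energies(3), ieee_quiet_nan)

  ! ieee_is_finite is false for both Infinities and NaNs.
  do i = 1, size(energies)
    if (.not. ieee_is_finite(energies(i))) then
      print *, "energies(", i, ") is not finite; fix this before calling FoX"
    end if
  end do
end program check_finite
```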
SAX stands for Simple API for XML, and was originally a Java API for reading XML. (Full details at http://saxproject.org). SAX implementations exist for most common modern computer languages.
FoX includes a SAX implementation, which translates most of the Java API into Fortran, and makes it accessible to Fortran programs, enabling them to read in XML documents in a fashion as close and familiar as possible to other languages.
SAX is a stream-based, event callback API. Conceptually, running a SAX parser over a document results in the parser generating events as it encounters different XML components, and sends the events to the main program, which can read them and take suitable action.
Events are generated when the parser encounters, for example, an element opening tag, or some text, and most events carry some data with them - the name of the tag, or the contents of the text.
The full list of events is quite extensive, and may be seen below. For most purposes, though, it is unlikely that most users will need more than the 5 most common events, documented here.
startDocument - generated when the parser starts reading the document. No accompanying data.
endDocument - generated when the parser reaches the end of the document. No accompanying data.
startElement - generated by an element opening tag. Accompanied by tag name, namespace information, and a list of attributes.
endElement - generated by an element closing tag. Accompanied by tag name, and namespace information.
characters - generated by text between tags. Accompanied by contents of text.
Given these events and accompanying information, a program can extract data from an XML document.
Any program using the FoX SAX parser must a) use the FoX module, and b) declare a derived type variable to hold the parser, like so:
use FoX_sax
type(xml_t) :: xp
The FoX SAX parser then works by requiring the programmer to write a module containing subroutines to receive any of the events they are interested in, and passing these subroutines to the parser.
Firstly, the parser must be initialized, by passing it XML data. This can be done either by giving a filename, which the parser will manipulate, or by passing a string containing an XML document. Thus:
call open_xml_file(xp, "input.xml", iostat)
The iostat
variable will report back any errors in opening the file.
Alternatively,
call open_xml_string(xp, XMLstring)
where XMLstring
is a character variable.
To now run the parser over the file, you simply do:
call parse(xp, list_of_event_handlers)
And once you're finished, you can close the file, and clean up the parser, with:
call close_xml_t(xp)
It is unlikely that most users will need to operate any of these options, but the following are available for use; all are optional boolean arguments to parse
.
namespaces
Does namespace processing occur? Default is .true.
, and if on, then any non-namespace-well-formed documents will be rejected, and namespace URI resolution will be performed according to the version of XML in question. If off, then documents will be processed without regard for namespace well-formedness, and no namespace URI resolution will be performed.
namespace_prefixes
Are xmlns attributes reported through the SAX parser? Default is .false.; all such attributes are removed by the parser, and transparent namespace URI resolution is performed. If on, then such attributes will be reported, and treated according to the value of xmlns_uris below. (If namespaces is false, this flag has no effect.)
validate
Should validation be performed? Default is .false.: no validation checks are made, and the influence of the DTD on the XML Infoset is ignored. (Ill-formed DTDs will still cause fatal errors, of course.) If .true., then validation will be performed, and the Infoset modified accordingly.
xmlns_uris
Should xmlns attributes have a namespace of http://www.w3.org/2000/xmlns/? Default is .false.: if such attributes are reported, they have no namespace. If .true., then they are supplied with the appropriate namespace. (If namespaces or namespace_prefixes is .false., then this flag has no effect.)
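As a sketch of how these options might be combined, the following program asks for validation and for xmlns attributes to be reported. It assumes the options are passed to parse as logical keyword arguments under the names listed above; the module name and input file name are illustrative.

```fortran
module option_handlers
  implicit none
contains
  ! Print whatever text the parser reports.
  subroutine characters_handler(chars)
    character(len=*), intent(in) :: chars
    print *, chars
  end subroutine characters_handler
end module option_handlers

program validating_reader
  use FoX_sax
  use option_handlers
  implicit none
  type(xml_t) :: xp
  integer :: ios

  call open_xml_file(xp, "input.xml", ios)
  if (ios /= 0) stop "could not open input.xml"

  ! Validate against the DTD, and report xmlns attributes to the handlers.
  call parse(xp, characters_handler=characters_handler, &
             validate=.true., namespace_prefixes=.true.)

  call close_xml_t(xp)
end program validating_reader
```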
To receive events, you must construct a module containing event handling subroutines. These are subroutines of a prescribed form - the input & output is predetermined by the requirements of the SAX interface, but the body of the subroutine is up to you.
The required forms are shown in the API documentation below, but here are some simple examples.
To receive notification of character events, you must write a subroutine which takes as input one string, which will contain the characters received. So:
module event_handling
use FoX_sax
contains
subroutine characters_handler(chars)
character(len=*), intent(in) :: chars
print*, chars
end subroutine
end module
That does very little - it simply prints out the data it receives. However, since the subroutine is in a module, you can save the data to a module variable, and manipulate it elsewhere; alternatively you can choose to call other subroutines based on the input.
So, a complete program which reads in all the text from an XML document looks like this:
module event_handling
use FoX_sax
contains
subroutine characters_handler(chars)
character(len=*), intent(in) :: chars
print*, chars
end subroutine
end module
program XMLreader
use FoX_sax
use event_handling
type(xml_t) :: xp
call open_xml_file(xp, 'input.xml')
call parse(xp, characters_handler=characters_handler)
call close_xml_t(xp)
end program
The other event most likely to be needed is the startElement event. Handling this involves writing a subroutine which takes as input three strings (which are the namespace URI, local name, and fully qualified name of the tag) and a dictionary of attributes.
An attribute dictionary is essentially a set of key:value pairs, where the key is the attribute's name, and the value is its value. (When considering namespaces, each attribute also has a URI and localName.)
Full details of all the dictionary-manipulation routines are given in AttributeDictionaries, but here we shall show the most common.
getLength(dictionary)
- returns the number of entries in the dictionary (the number of attributes declared).
hasKey(dictionary, qName)
- (where qName is a string) returns .true. or .false. depending on whether an attribute named qName is present.
hasKey(dictionary, URI, localname)
- (where URI and localname are strings) returns .true. or .false. depending on whether an attribute with the appropriate URI and localname is present.
getQName(dictionary, i)
- (where i is an integer) returns a string containing the key of the ith dictionary entry (ie, the name of the ith attribute).
getValue(dictionary, i)
- (where i is an integer) returns a string containing the value of the ith dictionary entry (ie, the value of the ith attribute).
getValue(dictionary, URI, localname)
- (where URI and localname are strings) returns a string containing the value of the attribute with the appropriate URI and localname (if it is present).
So, a simple subroutine to receive a startElement event would look like:
module event_handling
use FoX_sax
contains
subroutine startElement_handler(URI, localname, name, attributes)
character(len=*), intent(in) :: URI
character(len=*), intent(in) :: localname
character(len=*), intent(in) :: name
type(dictionary_t), intent(in) :: attributes
integer :: i
print*, name
do i = 1, getLength(attributes)
print*, getQName(attributes, i), '=', getValue(attributes, i)
enddo
end subroutine startElement_handler
end module
program XMLreader
use FoX_sax
use event_handling
type(xml_t) :: xp
call open_xml_file(xp, 'input.xml')
call parse(xp, startElement_handler=startElement_handler)
call close_xml_t(xp)
end program
Again, this does nothing but print out the name of the element, and the names and values of all of its attributes. However, by using module variables, or calling other subroutines, the data could be manipulated further.
The SAX parser detects all XML well-formedness errors (and optionally validation errors). By default, when it encounters an error, it will simply halt the program with a suitable error message. However, it is possible to pass in an error handling subroutine if some other behaviour is desired - for example it may be nice to report the error to the user, finish parsing, and carry on with some other task.
In any case, once an error is encountered, the parser will finish. There is no way to continue reading past an error. (This means that all errors are treated as fatal errors, in the terminology of the XML standard).
An error handling subroutine works in the same way as any other event handler, with the event data being an error message. Thus, you could write:
subroutine fatalError_handler(msg)
character(len=*), intent(in) :: msg
print*, "The SAX parser encountered an error:"
print*, msg
print*, "Never mind, carrying on with the rest of the calculation."
end subroutine
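As a minimal sketch of how such a handler might be wired into a complete program, the example below passes it to parse through the fatalError_handler argument. The module name error_handling and the flag parse_failed are hypothetical names chosen for illustration; the file name input.xml is likewise an assumption.

```fortran
module error_handling
  implicit none
  ! Hypothetical flag: set to .true. if the parser reports an error.
  logical, save :: parse_failed = .false.
contains
  subroutine fatalError_handler(msg)
    character(len=*), intent(in) :: msg
    print *, "The SAX parser encountered an error:"
    print *, msg
    parse_failed = .true.
  end subroutine fatalError_handler
end module error_handling

program XMLreader
  use FoX_sax
  use error_handling
  implicit none
  type(xml_t) :: xp

  call open_xml_file(xp, 'input.xml')
  ! With a handler supplied, an error ends the parse but not the program.
  call parse(xp, fatalError_handler=fatalError_handler)
  call close_xml_t(xp)

  if (parse_failed) then
    print *, "Parsing did not complete; carrying on with the rest of the calculation."
  end if
end program XMLreader
```

Recording the failure in a module variable lets the main program decide what to do once parse has returned.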
The parser can be stopped at any time. To do so, make the following call from within one of the callback subroutines:
call stop_parser(xp)
(where xp
is the XML parser object). The current callback function will be completed, then the parser will be stopped, and control will return to the main program, the parser having finished.
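A callback only receives the event data, so it needs some other way to reach the parser object. One possible arrangement, sketched here under that assumption, is to hold xp as a module variable that both the main program and the callback can see. The element name "title" and the file name input.xml are hypothetical.

```fortran
module event_handling
  use FoX_sax
  implicit none
  ! Parser handle kept at module level so that callbacks can reach it.
  type(xml_t), save :: xp
  ! Hypothetical element name that we are searching for.
  character(len=*), parameter :: wanted = "title"
contains
  subroutine startElement_handler(namespaceURI, localName, name, attributes)
    character(len=*), intent(in) :: namespaceURI
    character(len=*), intent(in) :: localName
    character(len=*), intent(in) :: name
    type(dictionary_t), intent(in) :: attributes
    if (name == wanted) then
      print *, "Found <", name, ">, stopping the parser."
      call stop_parser(xp)
    end if
  end subroutine startElement_handler
end module event_handling

program stop_early
  use FoX_sax
  use event_handling
  implicit none

  call open_xml_file(xp, 'input.xml')
  call parse(xp, startElement_handler=startElement_handler)
  ! Control returns here once the parser has been stopped or has finished.
  call close_xml_t(xp)
end program stop_early
```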
There is one derived type, xml_t
. This is entirely opaque, and is used as a handle for the parser.
There are four subroutines:
open_xml_file
type(xml_t), intent(inout) :: xp
character(len=*), intent(in) :: string
integer, intent(out), optional :: iostat
This opens a file. xp
is initialized, and prepared for parsing. string
must contain the name of the file to be opened. iostat
reports on the success of opening the file. A value of 0
indicates success.
open_xml_string
type(xml_t), intent(inout) :: xp
character(len=*), intent(in) :: string
This prepares to parse a string containing XML data. xp
is initialized. string
must contain the XML data.
close_xml_t
type(xml_t), intent(inout) :: xp
This closes down the parser (and closes the file, if input was coming from a file.) xp
is left uninitialized, ready to be used again if necessary.
parse
type(xml_t), intent(inout) :: xp
external :: list of event handlers
logical, optional, intent(in) :: validate
This tells xp
to start parsing its document.
(Advanced: See above for the list of options that the parse
subroutine may take.)
The full list of event handlers is in the next section. To use them, the interface must be placed in a module, and the body of the subroutine filled in as desired; then it should be specified as an argument to parse
as:
name_of_event_handler = name_of_user_written_subroutine
Thus a typical call to parse
might look something like:
call parse(xp, startElement_handler = mystartelement, endElement_handler = myendelement, characters_handler = mychars)
where mystartelement
, myendelement
, and mychars
are all subroutines written by you according to the interfaces listed below.
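Putting these pieces together, a complete program might look like the following sketch. The handler bodies simply print what they receive; the module name my_handlers and the file name input.xml are assumptions made for illustration. The optional iostat argument of open_xml_file is used to check that the file could be opened before parsing begins.

```fortran
module my_handlers
  use FoX_sax
  implicit none
contains
  subroutine mystartelement(namespaceURI, localName, name, attributes)
    character(len=*), intent(in) :: namespaceURI
    character(len=*), intent(in) :: localName
    character(len=*), intent(in) :: name
    type(dictionary_t), intent(in) :: attributes
    print *, "start: ", name
  end subroutine mystartelement

  subroutine myendelement(namespaceURI, localName, name)
    character(len=*), intent(in) :: namespaceURI
    character(len=*), intent(in) :: localName
    character(len=*), intent(in) :: name
    print *, "end:   ", name
  end subroutine myendelement

  subroutine mychars(chunk)
    character(len=*), intent(in) :: chunk
    print *, "text:  ", chunk
  end subroutine mychars
end module my_handlers

program typical_parse
  use FoX_sax
  use my_handlers
  implicit none
  type(xml_t) :: xp
  integer :: ios

  call open_xml_file(xp, 'input.xml', iostat=ios)
  if (ios /= 0) then
    print *, "Could not open input.xml"
    stop
  end if

  call parse(xp, startElement_handler=mystartelement, &
                 endElement_handler=myendelement, &
                 characters_handler=mychars)
  call close_xml_t(xp)
end program typical_parse
```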
All of the callbacks specified by SAX 2 are implemented. Documentation of the SAX 2 interfaces is available in the JavaDoc at http://saxproject.org, but as the interfaces needed adjustment for Fortran, they are listed here.
For documentation on the meaning of the callbacks and of their arguments, please refer to the Java SAX documentation.
characters_handler
subroutine characters_handler(chunk)
character(len=*), intent(in) :: chunk
end subroutine characters_handler
Triggered when some character data is read from between tags.
NB All character data is reported, including whitespace. Thus you will probably get a lot of whitespace-only characters events in a typical XML document.
NB Note also that it is not required that a single chunk of character data all come as one event - it may come as multiple consecutive events. You should concatenate the results of subsequent character events before processing.
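One way to do this concatenation is sketched below: the characters handler appends each chunk to a buffer, and the endElement handler processes the accumulated text. So that the sketch runs without an XML file, the main program stands in for the parser and calls the handlers directly; in real use they would be passed to parse instead. The module name text_collector and the fixed buffer size are assumptions, and the single flat buffer only suits elements that contain text and no child elements.

```fortran
module text_collector
  implicit none
  ! Hypothetical fixed-size accumulator for character data.
  character(len=1024), save :: buffer = ""
  integer, save :: used = 0            ! number of characters stored so far
contains
  subroutine characters_handler(chunk)
    character(len=*), intent(in) :: chunk
    integer :: n
    ! Append the chunk; anything that would overflow the buffer is dropped.
    n = min(len(chunk), len(buffer) - used)
    if (n > 0) buffer(used+1:used+n) = chunk(1:n)
    used = used + n
  end subroutine characters_handler

  subroutine endElement_handler(namespaceURI, localName, name)
    character(len=*), intent(in) :: namespaceURI
    character(len=*), intent(in) :: localName
    character(len=*), intent(in) :: name
    ! All text for the element has arrived, so process it in one piece.
    print *, name, ": [", buffer(1:used), "]"
    used = 0                           ! start afresh for the next element
  end subroutine endElement_handler
end module text_collector

program simulate_events
  use text_collector
  implicit none
  ! Stand-in for the parser: the text "Hello, world" arrives as two chunks.
  call characters_handler("Hello, ")
  call characters_handler("world")
  call endElement_handler("", "greeting", "greeting")
end program simulate_events
```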
endDocument_handler
subroutine endDocument_handler()
end subroutine endDocument_handler
Triggered when the parser reaches the end of the document.
endElement_handler
subroutine endElement_handler(namespaceURI, localName, name)
character(len=*), intent(in) :: namespaceURI
character(len=*), intent(in) :: localName
character(len=*), intent(in) :: name
end subroutine endElement_handler
Triggered by a closing tag.
endPrefixMapping_handler
subroutine endPrefixMapping_handler(prefix)
character(len=*), intent(in) :: prefix
end subroutine endPrefixMapping_handler
Triggered when a namespace prefix mapping goes out of scope.
ignorableWhitespace_handler
subroutine ignorableWhitespace_handler(chars)
character(len=*), intent(in) :: chars
end subroutine ignorableWhitespace_handler
Triggered when whitespace is encountered within an element declared as having no PCDATA. (Only active in validating mode.)
processingInstruction_handler
subroutine processingInstruction_handler(name, content)
character(len=*), intent(in) :: name
character(len=*), intent(in) :: content
end subroutine processingInstruction_handler
Triggered by a Processing Instruction.
skippedEntity_handler
subroutine skippedEntity_handler(name)
character(len=*), intent(in) :: name
end subroutine skippedEntity_handler
Triggered when either an external entity, or an undeclared entity, is skipped.
startDocument_handler
subroutine startDocument_handler()
end subroutine startDocument_handler
Triggered when the parser starts reading the document.
startElement_handler
subroutine startElement_handler(namespaceURI, localName, name, attributes)
character(len=*), intent(in) :: namespaceURI
character(len=*), intent(in) :: localName
character(len=*), intent(in) :: name
type(dictionary_t), intent(in) :: attributes
end subroutine startElement_handler
Triggered when an opening tag is encountered. (See AttributeDictionaries for documentation on handling attribute dictionaries.)
startPrefixMapping_handler
subroutine startPrefixMapping_handler(namespaceURI, prefix)
character(len=*), intent(in) :: namespaceURI
character(len=*), intent(in) :: prefix
end subroutine startPrefixMapping_handler
Triggered when a namespace prefix mapping comes into scope.
notationDecl_handler
subroutine notationDecl_handler(name, publicId, systemId)
character(len=*), intent(in) :: name
character(len=*), intent(in) :: publicId
character(len=*), intent(in) :: systemId
end subroutine notationDecl_handler
Triggered when a NOTATION declaration is made in the DTD.
unparsedEntityDecl_handler
subroutine unparsedEntityDecl_handler(name, publicId, systemId, notation)
character(len=*), intent(in) :: name
character(len=*), intent(in) :: publicId
character(len=*), intent(in) :: systemId
character(len=*), intent(in) :: notation
end subroutine unparsedEntityDecl_handler
Triggered when an unparsed entity is declared.
error_handler
subroutine error_handler(msg)
character(len=*), intent(in) :: msg
end subroutine error_handler
Triggered when an error is encountered in parsing. Parsing will continue after this event.
fatalError_handler
subroutine fatalError_handler(msg)
character(len=*), intent(in) :: msg
end subroutine fatalError_handler
Triggered when a fatal error is encountered in parsing. Parsing will cease after this event.
warning_handler
subroutine warning_handler(msg)
character(len=*), intent(in) :: msg
end subroutine warning_handler
Triggered when a parser warning is generated. Parsing will continue after this event.
attributeDecl_handler
subroutine attributeDecl_handler(eName, aName, type, mode, value)
character(len=*), intent(in) :: eName
character(len=*), intent(in) :: aName
character(len=*), intent(in) :: type
character(len=*), intent(in) :: mode
character(len=*), intent(in) :: value
end subroutine attributeDecl_handler
Triggered when an attribute declaration is encountered in the DTD.
elementDecl_handler
subroutine elementDecl_handler(name, model)
character(len=*), intent(in) :: name
character(len=*), intent(in) :: model
end subroutine elementDecl_handler
Triggered when an element declaration is encountered in the DTD.
externalEntityDecl_handler
subroutine externalEntityDecl_handler(name, publicId, systemId)
character(len=*), intent(in) :: name
character(len=*), intent(in) :: publicId
character(len=*), intent(in) :: systemId
end subroutine externalEntityDecl_handler
Triggered when a parsed external entity is declared in the DTD.
internalEntityDecl_handler
subroutine internalEntityDecl_handler(name, value)
character(len=*), intent(in) :: name
character(len=*), intent(in) :: value
end subroutine internalEntityDecl_handler
Triggered when an internal entity is declared in the DTD.
comment_handler
subroutine comment_handler(comment)
character(len=*), intent(in) :: comment
end subroutine comment_handler
Triggered when a comment is encountered.
endCdata_handler
subroutine endCdata_handler()
end subroutine endCdata_handler
Triggered by the end of a CData section.
endDTD_handler
subroutine endDTD_handler()
end subroutine endDTD_handler
Triggered by the end of a DTD.
endEntity_handler
subroutine endEntity_handler(name)
character(len=*), intent(in) :: name
end subroutine endEntity_handler
Triggered at the end of entity expansion.
startCdata_handler
subroutine startCdata_handler()
end subroutine startCdata_handler
Triggered by the start of a CData section.
startDTD_handler
subroutine startDTD_handler(name, publicId, systemId)
character(len=*), intent(in) :: name
character(len=*), intent(in) :: publicId
character(len=*), intent(in) :: systemId
end subroutine startDTD_handler
Triggered by the start of a DTD section.
startEntity_handler
subroutine startEntity_handler(name)
character(len=*), intent(in) :: name
end subroutine startEntity_handler
Triggered by the start of entity expansion.
The FoX SAX implementation implements all of XML 1.0 and 1.1; all of XML Namespaces 1.0 and 1.1; xml:id and xml:base.
Although FoX tries very hard to work to the letter of the XML and SAX standards, it falls short in a few areas.
FoX will only process documents consisting of nothing but US-ASCII data. It will accept documents labelled with any single-byte character set which is identical to US-ASCII in its lower 7 bits (for example, any of the ISO-8859 charsets, or UTF-8) but an error will be generated as soon as any character outside US-ASCII is encountered. (This includes non-ASCII characters present only by character entity reference.)
As a corollary, UTF-16 documents of any endianness will also be rejected.
(It is impossible to implement I/O of non-ASCII documents in a portable fashion using standard Fortran 95, and it is impossible to handle non-ASCII data internally using standard Fortran strings. A fully Unicode-capable FoX version is under development, but requires Fortran 2003. Please enquire for further details if you're interested.)
file
, will be skipped)
Beyond this, any aspect of the listed XML standards to which FoX fails to do justice is a bug.
The differences between Java and Fortran mean that none of the SAX APIs can be copied directly. However, FoX offers data types, subroutines, and interfaces covering most of the facilities offered by SAX. Where it does not, this is mentioned here.
org.xml.sax:
parse
subroutine.
org.xml.sax.ext:
org.xml.sax.helpers:
When parsing XML using the FoX SAX module, attributes are returned contained within a dictionary object.
This dictionary object implements all the methods described by the SAX interfaces Attributes and Attributes2. Full documentation is available from the SAX Javadoc, but is reproduced here for ease of reference.
All of the attribute dictionary objects and functions are exported through FoX_sax - you must USE the module to enable them. The dictionary API is described here.
An attribute dictionary consists of a list of entries, one for each attribute. The entries all have the following pieces of data:
and for namespaced attributes:
In addition, the following pieces of data will be picked up from a DTD if present:
There is one derived type of interest, dictionary_t
.
It is opaque - that is, it should only be manipulated through the functions described here.
getLength
type(dictionary_t), intent(in) :: dict
Returns an integer with the length of the dictionary, ie the number of dictionary entries.
hasKey
type(dictionary_t), intent(in) :: dict
character(len=*), intent(in) :: key
Returns a logical value according to whether the dictionary contains an attribute named key
or not.
hasKey
type(dictionary_t), intent(in) :: dict
character(len=*), intent(in) :: uri
character(len=*), intent(in) :: localname
Returns a logical value according to whether the dictionary contains an attribute with the correct URI
and localname
.
getQName
type(dictionary_t), intent(in) :: dict
integer, intent(in) :: i
Returns the full name of the ith dictionary entry.
getValue
type(dictionary_t), intent(in) :: dict
integer, intent(in) :: i
If an integer is passed in, returns the value of the ith attribute.
getValue
type(dictionary_t), intent(in) :: dict
character(len=*), intent(in) :: qName
If a single string is passed in, returns the value of the attribute with that name.
getValue
type(dictionary_t), intent(in) :: dict
character(len=*), intent(in) :: uri, localname
If two strings are passed in, returns the value of the attribute with that uri and localname.
getURI
type(dictionary_t), intent(in) :: dict
integer, intent(in) :: i
Returns a string containing the nsURI of the ith attribute.
getLocalName
type(dictionary_t), intent(in) :: dict
integer, intent(in) :: i
Returns a string containing the localName of the ith attribute.
The following functions are only of interest if you are using DTDs.
getType
type(dictionary_t), intent(in) :: dict
integer, intent(in), optional :: i
If an integer is passed in, returns the type of the ith attribute.
getType
type(dictionary_t), intent(in) :: dict
character(len=*), intent(in) :: qName
If a single string is passed in, returns the type of the attribute with that QName.
getType
type(dictionary_t), intent(in) :: dict
character(len=*), intent(in) :: uri
character(len=*), intent(in) :: localName
If two strings are passed in, returns the type of the attribute with that {uri,localName}.
isDeclared
type(dictionary_t), intent(in) :: dict
integer, intent(in), optional :: i
If an integer is passed in, returns false unless the ith attribute is declared in the DTD.
isDeclared
type(dictionary_t), intent(in) :: dict
character(len=*), intent(in) :: qName
If a single string is passed in, returns false unless the attribute with that QName is declared in the DTD.
isDeclared
type(dictionary_t), intent(in) :: dict
character(len=*), intent(in) :: uri
character(len=*), intent(in) :: localName
If two strings are passed in, returns false unless the attribute with that {uri,localName} is declared in the DTD.
isSpecified
type(dictionary_t), intent(in) :: dict
integer, intent(in), optional :: i
If an integer is passed in, returns true unless the ith attribute is a default value from the DTD.
isSpecified
type(dictionary_t), intent(in) :: dict
character(len=*), intent(in) :: qName
If a single string is passed in, returns true unless the attribute with that QName is a default value from the DTD.
isSpecified
type(dictionary_t), intent(in) :: dict
character(len=*), intent(in) :: uri
character(len=*), intent(in) :: localName
If two strings are passed in, returns true unless the attribute with that {uri,localName} is a default value from the DTD.
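As an illustration of the lookup-by-name functions, the sketch below uses hasKey to test for an attribute before fetching it with getValue, rather than looping over every entry. The attribute name "id" and the file name input.xml are hypothetical.

```fortran
module event_handling
  use FoX_sax
  implicit none
contains
  subroutine startElement_handler(namespaceURI, localName, name, attributes)
    character(len=*), intent(in) :: namespaceURI
    character(len=*), intent(in) :: localName
    character(len=*), intent(in) :: name
    type(dictionary_t), intent(in) :: attributes

    ! Only ask for the value once we know the attribute is present.
    if (hasKey(attributes, "id")) then
      print *, name, " has id = ", getValue(attributes, "id")
    else
      print *, name, " has no id attribute"
    end if
  end subroutine startElement_handler
end module event_handling

program attribute_lookup
  use FoX_sax
  use event_handling
  implicit none
  type(xml_t) :: xp

  call open_xml_file(xp, 'input.xml')
  call parse(xp, startElement_handler=startElement_handler)
  call close_xml_t(xp)
end program attribute_lookup
```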
The FoX DOM interface exposes an API as specified by the W3C DOM Working group.
FoX implements essentially all of DOM Core Levels 1 and 2, (there are a number of minor exceptions which are listed below) and a substantial portion of DOM Core Level 3.
FoX implements all objects and methods mandated in DOM Core Level 1 and 2. (A listing of supported DOM Core Level 3 interfaces is given below.)
In all cases, the mapping from DOM interface to Fortran implementation is as follows:
type(Node), pointer :: aNode
aNodelist = aNode.getElementsByTagName(tagName)
aNodelist => getElementsByTagName(aNode, tagName)
aNode.normalize()
call normalize(aNode)
name = node.nodeName
node.nodeValue = string
name = getnodeName(node)
call setnodeValue(node, string)
aNodelist => getElementsByTagName(aNode, tagName)
The W3C DOM requires that a DOMString
object exist, capable of holding Unicode strings; and that all DOM functions accept and emit DOMString objects when string data is to be transferred.
FoX does not follow this model. Since (as mentioned elsewhere) it is impossible to perform Unicode I/O in standard Fortran, it would be obtuse to require users to manipulate additional objects merely to transfer strings. Therefore, wherever the DOM mandates use of a DOMString
, FoX merely uses standard Fortran character strings.
All functions or subroutines which expect DOMString input arguments should be used with normal character strings.
All functions which should return DOMString objects will return Fortran character strings.
All functions are exposed through the module FoX_DOM
. USE
this in your program:
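program dom_example
use FoX_DOM
! DOM objects may only be handled through pointers
type(Node), pointer :: myDoc
! parse the input file into a DOM tree
myDoc => parseFile("fileIn.xml")
! write the tree back out to a new file
call serialize(myDoc, "fileOut.xml")
! release all memory associated with the document
call destroy(myDoc)
end program dom_example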
This manual will not exhaustively document the functions available through the FoX_DOM
interface. Primary documentation may be found in the W3C DOM specifications:
The systematic rules for translating the DOM interfaces to Fortran are given in the previous section. For completeness, though, there is a list here. The W3C specifications should be consulted for the use of each.
DOMImplementation:
type(DOMImplementation), pointer
hasFeature(impl, feature, version)
createDocumentType(impl, qualifiedName, publicId, systemId)
createDocument(impl, namespaceURI, qualifiedName, docType)
Document:
type(Node), pointer
getDocType(doc)
getImplementation(doc)
getDocumentElement(doc)
createElement(doc, tagname)
createDocumentFragment(doc)
createTextNode(doc, data)
createComment(doc, data)
createCDataSection(doc, data)
createProcessingInstruction(doc, target, data)
createAttribute(doc, name)
createEntityReference(doc, name)
getElementsByTagName(doc, tagname)
importNode(doc, importedNode, deep)
createElementNS(doc, namespaceURI, qualifiedName)
createAttributeNS(doc, namespaceURI, qualifiedName)
getElementsByTagNameNS(doc, namespaceURI, qualifiedName)
getElementById(doc, elementId)
Node:
type(Node), pointer
getNodeName(arg)
getNodeValue(arg)
setNodeValue(arg, value)
getNodeType(arg)
getParentNode(arg)
getChildNodes(arg)
getFirstChild(arg)
getLastChild(arg)
getPreviousSibling(arg)
getNextSibling(arg)
getAttributes(arg)
getOwnerDocument(arg)
insertBefore(arg, newChild, refChild)
replaceChild(arg, newChild, refChild)
removeChild(arg, oldChild)
appendChild(arg, newChild)
hasChildNodes(arg)
cloneNode(arg, deep)
normalize(arg)
isSupported(arg, feature, version)
getNamespaceURI(arg)
getPrefix(arg)
setPrefix(arg, prefix)
getLocalName(arg)
hasAttributes(arg)
NodeList:
type(NodeList), pointer
item(arg, index)
getLength(arg)
NamedNodeMap:
type(NamedNodeMap), pointer
getNamedItem(map, name)
setNamedItem(map, arg)
removeNamedItem(map, name)
item(map, index)
getLength(map)
getNamedItemNS(map, namespaceURI, qualifiedName)
setNamedItemNS(map, arg)
removeNamedItemNS(map, namespaceURI, qualifiedName)
CharacterData:
type(Node), pointer
getData(np)
setData(np, data)
getLength(np)
substringData(np, offset, count)
appendData(np, arg)
deleteData(np, offset, count)
replaceData(np, offset, count, arg)
Attr:
type(Node), pointer
getName(np)
getSpecified(np)
getValue(np)
setValue(np, value)
getOwnerElement(np)
Element:
type(Node), pointer
getTagName(np)
getAttribute(np, name)
setAttribute(np, name, value)
removeAttribute(np, name)
getAttributeNode(np, name)
setAttributeNode(np, newAttr)
removeAttributeNode(np, oldAttr)
getElementsByTagName(np, name)
getAttributeNS(np, namespaceURI, qualifiedName)
setAttributeNS(np, namespaceURI, qualifiedName, value)
removeAttributeNS(np, namespaceURI, qualifiedName)
getAttributeNodeNS(np, namespaceURI, qualifiedName)
setAttributeNodeNS(np, newAttr)
getElementsByTagNameNS(np, namespaceURI, qualifiedName)
hasAttribute(np, name)
hasAttributeNS(np, namespaceURI, qualifiedName)
Text:
type(Node), pointer
splitText(np, offset)
DocumentType:
type(Node), pointer
getName(np)
getEntities(np)
getNotations(np)
getPublicId(np)
getSystemId(np)
getInternalSubset(np)
Notation:
type(Node), pointer
getPublicId(np)
getSystemId(np)
Entity:
type(Node), pointer
getPublicId(np)
getSystemId(np)
getNotationName(np)
ProcessingInstruction:
type(Node), pointer
getTarget(np)
getData(np)
setData(np, data)
In addition, the following DOM Core Level 3 functions are available:
Document:
getDocumentURI(np)
setDocumentURI(np, documentURI)
getDomConfig(np)
getInputEncoding(np)
getStrictErrorChecking(np)
setStrictErrorChecking(np, strictErrorChecking)
getXmlEncoding(np)
getXmlStandalone(np)
setXmlStandalone(np, xmlStandalone)
getXmlVersion(np)
setXmlVersion(np, xmlVersion)
adoptNode(np, source)
normalizeDocument(np)
renameNode(np, namespaceURI, qualifiedName)
Node:
getBaseURI(np)
getTextContent(np)
setTextContent(np, textContent)
isEqualNode(np, other)
isSameNode(np, other)
isDefaultNamespace(np, namespaceURI)
lookupPrefix(np, namespaceURI)
lookupNamespaceURI(np, prefix)
Attr:
getIsId(np)
Entity:
getInputEncoding(np)
getXmlVersion(np)
getXmlEncoding(np)
Text:
getIsElementContentWhitespace(np)
DOMConfiguration:
type(DOMConfiguration)
canSetParameter(arg, name, value)
getParameter(arg, name)
getParameterNames(arg)
setParameter(arg, name, value)
NB For details on DOMConfiguration, see below
The DOM is written in terms of an object model involving inheritance, but also permits a flattened model. FoX implements this flattened model - all objects descending from the Node are of the opaque type Node
. Nodes carry their own type, and attempts to call functions defined on the wrong nodetype (for example, getting the target
of a node which is not a PI) will result in a FoX_INVALID_NODE
exception.
The other types available through the FoX DOM are:
DOMConfiguration
DOMException
DOMImplementation
NodeList
NamedNodeMap
All DOM objects exposed to the user may only be manipulated through pointers. Attempts to access them directly will result in compile-time or run-time failures according to your environment.
This should have little effect on the structure of your programs, except that you must always remember, when calling a DOM function, to perform pointer assignment, not direct assignment, thus:
child => getFirstChild(parent)
and not
child = getFirstChild(parent)
Fortran offers no garbage collection facility, so unfortunately a small degree of memory handling is necessarily exposed to the user.
However, this has been kept to a minimum. FoX keeps track of all memory allocated and used when calling DOM routines, and keeps references to all DOM objects created.
The only memory handling that the user needs to take care of is destroying any
DOM Documents (whether created manually, or by the parse()
routine.) All other nodes or node structures created will be destroyed automatically by the relevant destroy()
call.
As a consequence of this, all DOM objects which are part of a given document will become inaccessible after the document object is destroyed.
Several additional utility functions are provided by FoX.
Firstly, two functions construct a DOM tree, from either a file or a string containing XML data.
parseFile
filename should be an XML document. It will be opened and parsed into a DOM tree. The parsing is performed by the FoX SAX parser; if the XML document is not well-formed, a PARSE_ERR
exception will be raised. configuration is an optional argument - see DOMConfiguration for its meaning.
parseString
XMLstring should be a string containing XML data. It will be parsed into a DOM tree. The parsing is performed by the FoX SAX parser; if the XML document is not well-formed, a PARSE_ERR
exception will be raised. configuration is an optional argument - see DOMConfiguration for its meaning.
Both parseFile
and parseString
return a pointer to a Node
object containing the Document Node.
Secondly, to output an XML document:
serialize
This will open fileName
and serialize the DOM tree by writing into the file. If fileName
already exists, it will be overwritten. If a problem arises in serializing the document, then a fatal error will result.
(Control over serialization options is done through the configuration of the arg's ownerDocument, see below.)
Finally, to clean up all memory associated with the DOM, it is necessary to call:
destroy
This will clear up all memory usage associated with the document (or documentType) node passed in.
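The utility functions above might be combined as in the following sketch, which parses a file, inspects the root element, writes the tree out again, and then releases the document. The file names are hypothetical.

```fortran
program dom_lifecycle
  use FoX_DOM
  implicit none
  type(Node), pointer :: doc, root

  ! Build the DOM tree from a file.
  doc => parseFile("fileIn.xml")

  ! Nodes obtained from the document belong to it; note the pointer assignment.
  root => getDocumentElement(doc)
  print *, "Root element: ", getNodeName(root)

  ! Write the tree out to a new file.
  call serialize(doc, "fileOut.xml")

  ! Destroying the document also destroys every node that is part of it,
  ! so root must not be used after this call.
  call destroy(doc)
end program dom_lifecycle
```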
The standard DOM functions only deal with string data. When dealing with numerical (or logical) data, the following functions may be of use.
extractDataContent
extractDataAttribute
extractDataAttributeNS
These extract data from, respectively, the text content of an element, from one of its attributes, or from one of its namespaced attributes. They are used like so:
(where p
is an element which has been selected by means of the other DOM functions)
call extractDataContent(p, data)
The subroutine will look at the text contents of the element, and interpret according to the type of data
. That is, if data
has been declared as an integer
, then the contents of p
will be read as such and placed into data
.
data
may be a string
, logical
, integer
, real
, double precision
, complex
or double complex
variable.
In addition, if data
is supplied as a rank-1 or rank-2 variable (ie an array or a matrix) then the data will be read in assuming it to be a space- or comma-separated list of such data items.
Thus, the array of integers within the XML document:
<element> 1 2 3 4 5 6 </element>
could be extracted by the following Fortran program:
type(Node), pointer :: doc, p
integer :: i_array(6)
doc => parseFile(filename)
p => item(getElementsByTagName(doc, "element"), 0)
call extractDataContent(p, i_array)
For extracting data from text content, the example above suffices. For data in a non-namespaced attribute (in this case, a 2x2 matrix of real numbers)
<element att="0.1, 2.3 7.56e23, 93"> Some uninteresting text </element>
then use a Fortran program like:
type(Node), pointer :: doc, p
real :: r_matrix(2,2)
doc => parseFile(filename)
p => item(getElementsByTagName(doc, "element"), 0)
call extractDataAttribute(p, "att", r_matrix)
or for extracting from a namespaced attribute (in this case, a length-2 array of complex numbers):
<myml xmlns:ns="http://www.example.org">
<element ns:att="0.1,2.3 3.4e2,5.34"> Some uninteresting text </element>
</myml>
then use a Fortran program like:
type(Node), pointer :: doc, p
complex :: c_array(2)
doc => parseFile(filename)
p => item(getElementsByTagName(doc, "element"), 0)
call extractDataAttributeNS(p, &
namespaceURI="http://www.example.org", localName="att", &
data=c_array)
The extraction may fail of course, if the data is not of the sort specified, or if there are not enough elements to fill the array or matrix. In such a case, this can be detected by the optional arguments num
and iostat
.
num
will hold the number of items successfully read. This should be equal to the expected number of items, but it may be less if reading failed for some reason, or if there were fewer items than expected in the element.
iostat
will hold an integer: 0 if the extraction succeeded; -1 if too few elements were found; 1 if the read succeeded but some elements were left over; or 2 if the extraction failed because of either a badly formatted number or the wrong data type being found.
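A sketch of how these two optional arguments might be checked is given below. It assumes a hypothetical file input.xml containing an element named "element" with up to six integers in it; the variable names n and ios are likewise chosen for illustration.

```fortran
program extract_checked
  use FoX_DOM
  implicit none
  type(Node), pointer :: doc, p
  integer :: i_array(6)
  integer :: n, ios

  doc => parseFile("input.xml")
  p => item(getElementsByTagName(doc, "element"), 0)

  ! Ask for both the count of items read and the status code.
  call extractDataContent(p, i_array, num=n, iostat=ios)

  select case (ios)
  case (0)
    print *, "Read all values: ", i_array
  case (-1)
    print *, "Too few values; only ", n, " were read"
  case (1)
    print *, "Array filled, but some values were left over"
  case default
    print *, "Badly formatted or wrongly typed data after ", n, " values"
  end select

  call destroy(doc)
end program extract_checked
```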
For all data types apart from strings, arrays and matrices are specified by space- or comma-separated lists. For strings, some additional options are available. By default, arrays will be extracted assuming that separators are spaces (and multiple spaces are ignored). So:
<element> one two three </element>
will result in the string array (/"one", "two", "three"/)
.
However, you may specify an optional argument separator
, which specifies another single-character separator to use (and does not ignore multiple spaces). So, with a comma as the separator:
<element>one, two, three </element>
will result in the string array (/"one", " two", " three "/)
. (Note the leading and trailing spaces.)
Finally, you can also specify an optional logical argument, csv
. In this case, the separator
is ignored, and the extraction proceeds assuming that the data is a list of comma-separated values. (see: CSV)
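The separator option might be used as in the following sketch, which parses the comma-separated example from a string rather than a file. The array length and the character length of 10 are assumptions chosen to fit this particular data.

```fortran
program extract_strings
  use FoX_DOM
  implicit none
  type(Node), pointer :: doc, p
  character(len=10) :: words(3)
  integer :: i

  doc => parseString("<element>one, two, three </element>")
  p => getDocumentElement(doc)

  ! Split on commas; spaces around each item are kept as part of the item.
  call extractDataContent(p, words, separator=",")

  do i = 1, size(words)
    print *, "[", words(i), "]"
  end do

  call destroy(doc)
end program extract_strings
```

Printing each item between brackets makes any leading spaces visible; each string is also padded with blanks to the declared length.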
setFoX_checks
This affects whether additional FoX-only checks are made (see DomExceptions below).
getFoX_checks
Retrieves the current setting of FoX_checks.
Note that FoX_checks can only be turned on and off globally, not on a per-document basis.
setLiveNodeLists
arg must be a Document Node. Calling this function affects whether any nodelists active on the document are treated as live - ie whether updates to the documents are reflected in the contents of nodelists (see DomLiveNodelists below).
getLiveNodeLists
Retrieves the current setting of liveNodeLists.
Note that the live-ness of nodelists is a per-document setting.
Exception handling is important to the DOM. The W3C DOM standards provide not only interfaces to the DOM, but also specify the error handling that should take place when invalid calls are made.
The DOM specifies these in terms of a DOMException
object, which carries a numeric code whose value reports the kind of error generated. Depending upon the features available in a particular computer language, this DOMException object should be generated and thrown, to be caught by the end-user application.
Fortran of course has no mechanism for throwing and catching exceptions. However, the behaviour of an exception can be modelled using Fortran features.
FoX defines an opaque DOMException
object.
Every DOM subroutine and function implemented by FoX will take an optional argument, 'ex', of type DOMException
.
If the optional argument is not supplied, any errors within the DOM will cause an immediate abort, with a suitable error message. However, if the optional argument is supplied, then the error will be captured within the DOMException
object, and returned to the caller for inspection. It is then up to the application to decide how to proceed.
Functions for inspecting and manipulating the DOMException object are described below:
inException
A function returning a logical value, according to whether ex is in exception - that is, whether the last DOM function or subroutine that returned ex caused an error. Note that this will not change the status of the exception.
getExceptionCode
A function returning an integer value, describing the nature of the exception reported in ex. If the integer is 0, then ex does not hold an exception. If the integer is less than 200, then the error encountered was of a type specified by the DOM standard; for a full list, see below, and for explanations, see the various DOM standards. If the integer is 200 or greater, then the code represents a FoX-specific error. See the list below.
Note that calling getExceptionCode will clean up all memory associated with the DOMException object, and reset the object such that it is no longer in exception.
Note that when an exception is thrown, memory is allocated within the DOMException object. Calling getExceptionCode on a DOMException will clean up this memory. If you use the exception-handling interfaces of FoX, then you must check every exception, and ensure you check its code, otherwise your program will leak memory.
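The pattern described above can be sketched as follows. This is a minimal illustration rather than a definitive recipe: it assumes that createElement rejects a name containing spaces, and the program name and element names are made up for the example. The important point is that getExceptionCode is called whenever ex is in exception, so that the memory held by the exception is released.

```fortran
program dom_exception_sketch
  use FoX_dom
  implicit none

  type(Node), pointer :: doc, el
  type(DOMException) :: ex
  integer :: code

  ! Build a small in-memory document to work on.
  doc => createDocument(getImplementation(), "", "root", null())

  ! "not a name" contains spaces, so it is not a legal XML element name.
  ! Because ex is supplied, an error is captured rather than aborting.
  el => createElement(doc, "not a name", ex=ex)

  if (inException(ex)) then
    ! Reading the code also cleans up the memory held by ex
    ! and resets it so that it is no longer in exception.
    code = getExceptionCode(ex)
    print *, "DOM call failed with exception code ", code
  else
    print *, "element created"
  end if

  call destroy(doc)
end program dom_exception_sketch
```

Omitting ex=ex from the createElement call would instead cause the program to abort with an error message if the call failed.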
The W3C DOM interface allows the creation of unserializable XML documents in various ways. For example, it permits characters to be added to a text node which would be invalid XML. FoX performs multiple additional checks on all DOM calls to prevent the creation of unserializable trees. These are reported through the DOMException mechanisms noted above, using additional exception codes. However, if for some reason you want to create such trees, then it is possible to switch off all FoX-only checks. (DOM-mandated checks may not be disabled.) To do this, use the setFoX_checks function described in DomUtilityFunctions.
Note that FoX does not yet check for all ways that a tree may be made non-serializable.
The DOM specification requires that all NodeList objects are live - that is, that any change in the document structure is immediately reflected in the contents of any nodelists.
For example, any nodelists returned by getElementsByTagName or getElementsByTagNameNS must be updated whenever nodes are added to or removed from the document; and the order of nodes in the nodelists must be changed if the document structure changes.
Though FoX does keep all nodelists live, this can impose a significant performance penalty when manipulating large documents. Therefore, FoX can be instructed to use only 'dead' nodelists - that is, nodelists which reflect a snapshot of the document structure at the point they were created. To do this, call setLiveNodeLists (see API documentation).
However, note that the nodes within the nodelist remain live - any changes made to the nodes will be reflected in accessing them through the nodelist.
Furthermore, since the nodelists are still associated with the document, they and their contents will be rendered inaccessible when the document is destroyed.
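As a rough sketch of switching to dead nodelists, the following assumes that setLiveNodeLists takes the document node followed by a logical flag (an assumption here, consistent with live-ness being a per-document setting; check the API documentation for the exact interface). The element name "item" is purely illustrative.

```fortran
program dead_nodelist_sketch
  use FoX_dom
  implicit none

  type(Node), pointer :: doc, root, el, dummy
  type(NodeList), pointer :: items

  doc => createDocument(getImplementation(), "", "root", null())
  root => getDocumentElement(doc)

  ! Assumed interface: (document node, logical). Ask for dead nodelists.
  call setLiveNodeLists(doc, .false.)

  el => createElement(doc, "item")
  dummy => appendChild(root, el)

  ! Take a nodelist while one <item> element is in the tree.
  items => getElementsByTagName(doc, "item")

  ! Change the document structure after the nodelist was created.
  el => createElement(doc, "item")
  dummy => appendChild(root, el)

  ! A dead nodelist should still describe the tree as it was
  ! when the list was created, not the tree as it is now.
  print *, "items in nodelist: ", getLength(items)

  ! Destroying the document also makes the nodelist inaccessible.
  call destroy(doc)
end program dead_nodelist_sketch
```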
Multiple valid DOM trees may be produced from a single document. When parsing input, some of these choices are made available to the user.
By default, the DOM tree presented to the user will be produced according to the following criteria:
However, if another tree is desired, the user may change this. For example, very often you would rather be working with the fully canonicalized tree, with all cdata sections replaced by text nodes and merged, and all entity references replaced with their contents.
The mechanism for doing this is the optional configuration argument to parseFile and parseString. configuration is a DOMConfiguration object, which may be manipulated by setParameter calls.
Note that FoX's implementation of DOMConfiguration does not follow the specification precisely. One DOMConfiguration object controls all of parsing, normalization and serialization. It can be used like so:
use FoX_dom
implicit none
type(Node), pointer :: doc
! Declare a new configuration object
type(DOMConfiguration), pointer :: config
! Request full canonicalization
! ie convert CDATA sections to text sections, remove all entity references etc.
config => newDOMConfig()
call setParameter(config, "canonical-form", .true.)
! Turn on validation
call setParameter(config, "validate", .true.)
! parse the document
doc => parseFile("doc.xml", config)
! Do a whole lot of DOM processing ...
! change the configuration to allow cdata-sections to be preserved.
call setParameter(getDomConfig(doc), "cdata-sections", .true.)
! normalize the document again
call normalizeDocument(doc)
! change the configuration to influence the output - make sure there is an XML declaration
call setParameter(getDomConfig(doc), "xml-declaration", .true.)
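! and write the document out to a file
! (serialize needs an output filename; "out.xml" is only an illustrative name).
call serialize(doc, "out.xml")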
! once everything is done, destroy the doc and config
call destroy(doc)
call destroy(config)
The available configuration options are fully explained in:
and are all implemented, with the exceptions of: error-handler, schema-location, and schema-type.
In total there are 24 implemented configuration options (schema-location and schema-type are not implemented). The options known by FoX are as follows:
canonical-form
default: false, can be set to true. See note below.
cdata-sections
default: true, can be changed.
check-character-normalization
default: false, cannot be changed.
comments
default: true, can be changed.
datatype-normalization
default: false, cannot be changed.
element-content-whitespace
default: true, can be changed.
entities
default: true, can be changed.
error-handler
default: false, cannot be changed. This is a breach of the DOM specification.
namespaces
default: true, can be changed.
namespace-declarations
default: true, can be changed.
normalize-characters
default: false, cannot be changed.
split-cdata-sections
default: true, can be changed.
validate
default: false, can be changed. See note below.
validate-if-schema
default: false, can be changed.
well-formed
default: true, cannot be changed.
charset-overrides-xml-encoding
default: false, cannot be changed.
disallow-doctype
default: false, cannot be changed.
ignore-unknown-character-denormalizations
default: true, cannot be changed.
resource-resolver
default: false, cannot be changed.
supported-media-types-only
default: false, cannot be changed.
discard-default-content
default: true, can be changed.
format-pretty-print
default: false, cannot be changed.
xml-declaration
default: true, can be changed.
invalid-pretty-print
default: false, can be changed. This is a FoX-specific extension which works like format-pretty-print but does not preserve the validity of the document.
Setting canonical-form changes the value of entities, cdata-sections, discard-default-content, invalid-pretty-print, and xml-declaration to false and changes namespaces, namespace-declarations, and element-content-whitespace to true. Unsetting canonical-form causes these options to revert to the default settings. Changing the values of any of these options has the side effect of unsetting canonical-form (but does not cause the other options to be reset). Setting validate unsets validate-if-schema and vice versa.
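The interaction between canonical-form and the other options can be observed with a short sketch like the one below. It assumes that FoX provides a getParameter function returning the logical value of an option, mirroring the W3C DOMConfiguration interface; treat that call as an assumption and consult the API reference for the exact form.

```fortran
program config_side_effects_sketch
  use FoX_dom
  implicit none

  type(DOMConfiguration), pointer :: config

  config => newDOMConfig()

  ! Defaults, as listed above.
  print *, "entities before:       ", getParameter(config, "entities")
  print *, "cdata-sections before: ", getParameter(config, "cdata-sections")

  ! Requesting canonical form changes several other options as a side effect.
  call setParameter(config, "canonical-form", .true.)
  print *, "entities after:        ", getParameter(config, "entities")
  print *, "cdata-sections after:  ", getParameter(config, "cdata-sections")

  ! Changing one of the affected options unsets canonical-form again,
  ! without resetting the other options.
  call setParameter(config, "cdata-sections", .true.)
  print *, "canonical-form now:    ", getParameter(config, "canonical-form")

  call destroy(config)
end program config_side_effects_sketch
```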
Other issues
It was decided to implement W3C DOM interfaces primarily because they are specified in a language-agnostic fashion, and thus made Fortran implementation possible. A number of criticisms have been levelled at the W3C DOM, but many apply only from the perspective of Java developers. However, more importantly, the W3C DOM suffers from a lack of sufficient error checking so it is very easy to create a DOM tree, or manipulate an existing DOM tree into a state, that cannot be serialized into a legal XML document.
(Although the Level 3 DOM specifications finally addressed this issue, they did so in a fashion that was neither very useful, nor easily translatable into a Fortran API.)
Therefore, FoX will by default produce errors about many attempts to manipulate the DOM in such a way as would result in invalid XML. These errors can be switched off if standards-compliant behaviour is wanted. Although extensive, these checks are not complete.
In particular, the way the W3C DOM mandates namespace handling makes it trivial to produce namespace non-well-formed document trees, and very difficult for the processor to automatically detect the non-well-formedness. Thus a fully well-formed tree is only guaranteed after a suitable normalizeDocument call.
FoX_utils is a collection of general utility functions that the rest of FoX depends on, but which may be of independent use. They are documented here.
All functions are accessible from the FoX_utils module.
NB Unlike the APIs of WXML, WCML, and SAX, the UTILS APIs may not remain constant between FoX versions. While some effort will be expended to ensure they don't change unnecessarily, no guarantees are made.
For any end-users interested in the code who are worried about interface changes, it is recommended that the relevant code (all found in the utils/ directory) be lifted directly and imported into other projects, rather than accessed through the FoX interfaces.
Two sets of utility functions are provided; one concerned with UUIDs, and a set concerned with URIs.
UUIDs (see RFC 4122) are Universally Unique IDentifiers. A UUID is a 128-bit number, represented as a 36-character string. For example:
f81d4fae-7dec-11d0-a765-00a0c91e6bf6
The intention of UUIDs is to enable distributed systems to uniquely identify information without significant central coordination. Thus, anyone can create a UUID and use it to identify something with reasonable confidence that the identifier will never be unintentionally used by anyone for anything else.
This property also makes them useful as Uniform Resource Names, to refer to a given document without requiring a position in a particular URI scheme. Thus the above UUID could be referred to as
urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
UUIDs are used by WCML to ensure that every document generated has a unique ID. This enables users to go back later on and have confidence that they are examining the same document, regardless of where it might have ended up in file-system hierarchies or databases.
In addition, UUIDs come in several flavours, one of which stores the time of creation to 100-nanosecond accuracy. This can later be extracted (see, for example this service) to verify creation time.
This may well be useful for other XML document types, or indeed in non-XML applications. Thus, UUIDs may be generated by the following function, with one optional argument.
generate_UUID
This function returns a 36-character string containing the UUID.
version identifies the version of UUID to be used (see section 4.1.3 of the RFC). Only versions 0, 1, and 4 are supported. Version 0 generates a nil UUID, version 1 a time-based UUID, and version 4 a pseudo-randomly-generated UUID.
Version 1 is the default, and is recommended.
(Note: all pseudo-random-numbers are generated using the high-quality Mersenne Twister algorithm, using the Fortran implementation of Scott Robert Ladd.)
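A minimal sketch of generating UUIDs with this function is shown below. It assumes only what is described above: that generate_UUID is available from the FoX_utils module, returns a 36-character string, and accepts the version as an optional integer argument. The program name is illustrative.

```fortran
program uuid_sketch
  use FoX_utils
  implicit none

  character(len=36) :: id

  ! No argument: the default, time-based (version 1) UUID.
  id = generate_UUID()
  ! A UUID can be written as a Uniform Resource Name.
  print *, "urn:uuid:"//id

  ! Version 4: a pseudo-randomly-generated UUID.
  id = generate_UUID(4)
  print *, id

  ! Version 0: the nil UUID.
  id = generate_UUID(0)
  print *, id
end program uuid_sketch
```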
URIs (see RFC 2396) are Uniform Resource Identifiers. A URI is a string, containing several components, which identifies a resource. Very often, this resource is a file, and the URI represents the local or network path to this file.
For example:
http://www.uszla.me.uk/FoX/DoX/index.html
is a URI pointing to the FoX documentation.
Equally, however:
FoX/configure
is a URI reference pointing to the FoX configure script (relative to the current directory, or base URI).
A string which is a URI reference contains several components, some of which are optional.
scheme - eg, http
authority - eg, www.uszla.me.uk
path - eg, /FoX/DoX/index.html
In addition, a URI reference may contain userinfo, host, port, query, and fragment information (see the RFC for full details).
The FoX URI library provides the following features:
type(URI)
This is an opaque Fortran type which is used to hold URI information. The functions described below use this type.
parseURI
This takes one argument, a URI reference, and returns a pointer to a newly-allocated URI object.
If the string provided is not a valid URI reference, then a null pointer is returned; thus this function can be used to check whether a URI is valid.
expressURI
This takes one argument, a URI object, and returns the (fully-escaped) string representing that URI.
rebaseURI
This takes two arguments, both URI objects, and returns a pointer to a third URI object. It calculates the location of the second URI with reference to the first.
Thus, if the first URI were /FoX/DoX, and the second ../DoX2/index.html, then the resulting URI would be /FoX/DoX2/index.html
destroyURI
This takes one argument, a pointer to a URI object, and clears up all memory associated with it.
For each component a URI might have (scheme, authority, userinfo, host, port, path, query, fragment) there are two functions for extracting the component:
hasXXX will return a logical variable according to whether the component is defined (except for path, which is always defined, but may be empty).
getXXX will return a string containing the value of the component (except for port, which is returned as an integer).
Thus, listing these functions in full:
hasScheme
Is there a scheme associated with the URI?
getScheme
Return the value of the scheme
hasAuthority
Is there an authority associated with the URI?
getAuthority
Return the value of the authority
hasUserinfo
Is there userinfo associated with the URI?
getUserinfo
Return the value of the userinfo
hasHost
Is there a host associated with the URI?
getHost
Return the value of the host
hasPort
Is there a port associated with the URI?
getPort
Return the value of the port
getPath
Return the value of the path
hasQuery
Is there a query associated with the URI?
getQuery
Return the value of the query
hasFragment
Is there a fragment associated with the URI?
getFragment
Return the value of the fragment
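The URI functions listed above might be combined as in the following sketch. It uses only the routines described in this section, accessed through the FoX_utils module; the variable names are illustrative, and no particular output from rebaseURI is claimed here.

```fortran
program uri_sketch
  use FoX_utils
  implicit none

  type(URI), pointer :: base, rel, resolved

  ! parseURI returns a null pointer for an invalid URI reference,
  ! so the result should be checked before use.
  base => parseURI("http://www.uszla.me.uk/FoX/DoX/index.html")
  if (.not. associated(base)) then
    print *, "not a valid URI reference"
    stop
  end if

  ! Optional components are tested with hasXXX before calling getXXX.
  if (hasScheme(base)) print *, "scheme:    ", getScheme(base)
  if (hasAuthority(base)) print *, "authority: ", getAuthority(base)
  if (hasPort(base)) print *, "port:      ", getPort(base)
  ! The path is always defined, although it may be empty.
  print *, "path:      ", getPath(base)

  ! Resolve a relative reference against the base URI.
  rel => parseURI("../DoX2/index.html")
  if (associated(rel)) then
    resolved => rebaseURI(base, rel)
    print *, "resolved:  ", expressURI(resolved)
    call destroyURI(resolved)
    call destroyURI(rel)
  end if

  ! Every URI object obtained above must be destroyed to free its memory.
  call destroyURI(base)
end program uri_sketch
```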
FoX evolved from the initial codebase of xmlf90, which was written largely by Alberto Garcia <albertog@icmab.es> and Jon Wakelin <jon.wakelin@bristol.ac.uk>.
FoX is the work of Toby White <tow@uszla.me.uk>, and all bug reports/complaints/bouquets of roses should be sent to him. Andrew Walker <andrew.walker@bristol.ac.uk> currently looks after maintenance of FoX.
There is a FoX website at http://www1.gly.bris.ac.uk/~walker/FoX/.
There is also a mailing list for announcements/queries/bug reports. Information on how to subscribe may be found at http://groups.google.com/group/fox-discuss/. The archive of an older mailing list can be found at http://www.uszla.me.uk/pipermail/fox/.
This manual is © Toby White 2006-2008 with additional modifications by Andrew Walker 2008-2010.
FoX is licensed under the agreement below. This is intended to make it as freely available as possible, subject only to retaining copyright notices and acknowledgements.
If for any reason this license causes issues with your intended use of the code, please contact the author.
The license can also be found within the distributed source, in the file FoX/LICENSE
Copyright:
© 2003, 2004, Alberto Garcia, Jon Wakelin
© 2005-2008, Toby White
© 2007-2009, Gen-Tao Chiang
© 2008-2012, Andrew Walker
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
In addition, FoX includes a random number library, written by Scott Robert Ladd, which is licensed as follows:
! This computer program source file is supplied "AS IS". Scott Robert
! Ladd (hereinafter referred to as "Author") disclaims all warranties,
! expressed or implied, including, without limitation, the warranties
! of merchantability and of fitness for any purpose. The Author
! assumes no liability for direct, indirect, incidental, special,
! exemplary, or consequential damages, which may result from the use
! of this software, even if advised of the possibility of such damage.
!
! The Author hereby grants anyone permission to use, copy, modify, and
! distribute this source code, or portions hereof, for any purpose,
! without fee, subject to the following restrictions:
!
! 1. The origin of this source code must not be misrepresented.
!
! 2. Altered versions must be plainly marked as such and must not
! be misrepresented as being the original source.
!
! 3. This Copyright notice may not be removed or altered from any
! source or altered source distribution.
!
! The Author specifically permits (without fee) and encourages the use
! of this source code for entertainment, education, or decoration. If
! you use this source code in a product, acknowledgment is not required
! but would be appreciated.