> ./configure > make
FoX (Fortran XML) library.
Written in pure Fortran 95 (no compiler extensions, no cross-language issues). Tested against all major Fortran compilers.
Understands and provides interfaces to all of XML 1.0 and 1.1, and XML Namespaces 1.0 and 1.1.
BSD licensed (i.e. free for any commercial or non-commercial usage)
Full documentation is available. This document is a short tutorial on how to adapt an atomistic simulation code to use FoX for CML output.
FoX includes the following modules:
WXML
XML writing. Emits guaranteed well-formed XML. (Also performs limited validity checking if a DTD is part of the output.) Minimal state is kept in memory, allowing documents of arbitrary size to be produced from long-running jobs.
SAX
Interface modelled strongly after Java SAX2 interface. Supports all 35 callbacks. Understands internal & external entities (general and parameter) Tested on official W3C testsuite; currently passes all well-formedness and validity tests. (with the exception of any test involving non-ASCII characters; Fortran does not support I/O of any characters outside the ASCII set.)
DOM
Mapping of the W3C DOM interface to Fortran. Supports all of Core Levels 1 & 2 and most of Level 3 (including partial LS support). Passes 100% of the relevant portions of the W3C DOM testsuite.
Both WXML and DOM include extension functions tailored specifically for the Fortran programmer, with transparent support for all 7 native Fortran datatypes (character, logical, integer, real, double precision, complex, complex double), each in scalar, array and matrix form.
On top of the basic library, there are available (XML) language-specific front-ends. One of these is WCML, which writes out CMLComp.
These language-specific frontends aim to hide all of the complexity of XML from the end-user, and permit the production of guaranteed well-formed and valid XML fragments, without requiring any knowledge from the Fortran programmer.
A full (though still under development) specification of the CMLComp document format can be found at the CMLComp website. In this document, though, I'll show how to build a CMLComp document using the WCML calls, which requires little to no knowledge of the underlying format - just the ability to write Fortran.
Compilation of FoX is in most cases very easy; simply do:
> ./configure > make
To compile a code using FoX, then simply add the following to your FFLAGS (or equivalent)
FFLAGS="$FFLAGS `FoX/FoX-config --fcflags`"
and the following to your LDFLAGS (or equivalent)
LDFLAGS="$LDFLAGS `FoX/FoX-config --libs`"
Full instructions on how the configure/compile process works is available in the FoX manual.
A minimal CML output program would look something like this:
program CMLoutput use FoX_wcml ! to use any of the WCML apparatus type(xmlf_t) :: xf ! an opaque filetype call cmlBeginFile(xf, filename="output.cml", unit=22) ! open a CML file called "output.cml" on unit 22. call cmlStartCml(xf) ! output CML preamble call cmlEndCml(xf) ! output CML post-amble call cmlFinishFile(xf) ! Close the file end program CMLoutput
This program will open a file, write the minimal CML preamble, and then close the file, ensuring all tags are well-balanced.
dictRefs are the mechanism by which Golem works. You don't need to know much about how, but you do need to do the following at the start of your CML file:
subroutine writeCmlPreamble(xf, filename) call cmlBeginFile(xf, filename="output.cml", unit=22) call cmlAddNamespace(mainXML, prefix='siesta', & nsURI='http://www.uam.es/siesta/namespace') call cmlAddNamespace(mainXML, prefix='siestaUnits', & nsURI='http://www.uam.es/siesta/namespace/units') call cmlStartCml(xf) end subroutine writeCmlPreamble
That is: immediately before you issue the StartCml call, you need to add two namespaces: one for your code, one for your units.
The prefix you use is up to you, but will conventionally be the name of your code. The nsURI should be a URL - that need not exist - but which is at an address you control.
Having done that, then you can use dictRefs as mentioned in multiple places below; you need to consistently use the prefixes registered in these calls, but you never need to use the nsURI anywhere else.
In reality you will also want to output: firstly some metadata about the program:
subroutine outputMetadata(xf) call cmlStartMetadataList(xf) call cmlAddMetadata(xf, name='Program', content='Siesta') call cmlAddMetadata(xf, name='Version', content=version_str) if (nodes>1) then call cmlAddMetadata(xf, name='Mode', content='Parallel') else call cmlAddMetadata(xf, name='Mode', content='Serial') endif call cmlAddMetadata(xf, name='StartTime',content=datestring()) call cmlEndMetadataList(xf) end subroutine outputMetadata
That should be fairly self-explanatory. The output will look like:
<metadataList> <metadata name="Program" content="Siesta"/> <metadata name="Version" content="siesta@uam.es--2006/siesta-devel--reference--2.4--patch-3"/> <metadata name="StartTime" content="2007-11-16T14-21-34"/> <metadata name="Mode" content="Serial"/> </metadataList>
Secondly, you will want to mark up a list of input parameters:
subroutine outputParameters(xf, inputParameters ...) call cmlStartParameterList(xf, "General parameters") call cmlAddParameter(xf, name='System Name', & value=systemName, & dictRef='siesta:systemname') call cmlAddParameter(xf, name='Mesh Cut-off', & value=cutoff, units='siestaUnits:Ry',& dictRef='siesta:cutoff') call cmlAddParameter(xf, name='Energy Tolerance', & value=etol, units='siestaUnits:eV', & dictRef='siesta:etol') call cmlAddParameter(xf, name='Use Pulay Mixing', & value=l_pulay, & dictRef='siesta:l_pulay') call cmlEndParameterList(xf) end subroutine outputParameters
Things to note:
you can (optionally) name the parameter list (and have more than one if you like, in order to categorize your parameters)
the "value" of each parameter can be any Fortran datatype, and can equally well be a scalar, array, or matrix.
where the value is a numerical quantity, units are required. The string used for units should be unitsPrefixGivenEarlier:nameOfUnit The program will refuse to compile if no units are given.
Each parameter should have a "dictRef". This will be of the form: `prefixGivenEarlier:nameOfVariable". Golem needs this.
The CML that is output by the above is fairly verbose. For the sake of completeness, it is:
<parameterList xmlns:myDict="http://www.example.com/dict"> <parameter dictRef="siesta:systemname" name="System Name"> <scalar dataType="xsd:string">1,3,5-trichloro-dibenzofuran</scalar> </parameter> <parameter dictRef="siesta:cutoff" name="Mesh Cut-off"> <scalar dataType="xsd:integer" units="siestaUnits:Ry">150</scalar> </parameter> <parameter dictRef="siesta:etol" name="Energy Tolerance"> <scalar dataType="fpx:real" units="siestaUnits:eV">3.50000e0</scalar> </parameter> <parameter dictRef="siesta:l_pulay" name="Use Pulay Mixing"> <scalar dataType="xsd:boolean">true</scalar> </parameter> </parameterList>
You will also want to output the values of calculated properties. This is done in a manner very similar to input parameters:
subroutine outputProperties(xf, ...) call cmlStartPropertyList(xf, title="Forces") call cmlAddProperty(xf, value=fa, & dictref='siesta:forces', units='siestaUnits:evpa') ! fa is a 3xn matrix of all forces call cmlAddProperty(xf, value=ftot, & dictref='siesta:ftot', units='siestaUnits:evpa') ! ftot is a length-n array of the total force on each atom call cmlAddProperty(xf, value=fmax, & dictref='siesta:fmax', units='siestaUnits:evpa') ! fmax is a scalar, the maximum force on any one atom. call cmlAddProperty(xf, value=fres, & dictref='siesta:fres', units='siestaUnits:evpa') call cmlAddProperty(xf, value=cfmax, & dictref='siesta:cfmax', units='siestaUnits:evpa') call cmlEndPropertyList(mainXML) end subroutine outputProperties
These work exactly the same, and the XML output will be similar, only it will have <property> tags not <parameter>
Again:
propertyLists can be named.
the "value" argument can be any Fortran datatype.
numerical datatypes require units.
dictRefs on every property.
You will also want to output atomic structures occasionally. This is done with the cmlAddMolecule call.
Note
|
In CMLComp, a "molecule" represents nothing more than a collection of atoms and their 3D coords. There is no connotation that it must represent a chemically-meaningful structure, it may be a configuration of atoms halfway through a simulation.) |
call cmlAddMolecule(xf, <coordinates>, elements=elem)
<coordinates> can be specified in two ways:
A single matrix
call cmlAddMolecule(xf, coords=xyz, elements=elem)
where the coords= argument is a 3xn matrix of reals or double precisions.
Three arrays
call cmlAddMolecule(xf, x=xarray, y=yarray, z=zarray, elements=elem)
where each of x, y, and z are length-n arrays of reals or double precisions.
In each case, elements= must be a length-n array of strings, representing the element names.
Note that by default, the coordinates are assumed to be Cartesian, and given in Angstrom.
If they are in different units, this can be specified with an optional units= argument. If they are fractional rather than Cartesian, this can be specified with style=xfrac.
You may want to sometimes output a lattice/crystal information. For simulation programs this is most often done to represent the size and shape of the box representing the periodic boundary conditions. If your simulation works without explicit boundary conditions, you may not be interested in representing this.
If you need it, you can specify the box either with traditional crystal parameters, (a, b, c, alpha, beta, gamma), or as a set of lattice vectors.
call cmlAddCrystal(xf, a, b, c, alpha, beta, gamma)
call cmlAddLattice(xf, cell)
For AddLattice, the cell is the 3x3 matrix representing the lattice vectors.
The default units in both cases are Angstrom for length, and degrees for angles. A units= argument can be specified if this needs to be changed.
The markup for eigen-information described is initially aimed only at solid-state simulation; and thus focusses on electronic or phononic structure, calculate by k(q)-point.
Eigen information is expected to be found within a kpointList, containing multiple kpoints.
Each kpoint carries attributes describing the reciprocal-space coordinates of that kpoint, and its weight. Any information on eigenvalues will be found within the relevant kpoint section.
The simplest case is where you are really only interested in recording the existence of the kpoint, and you are not interested in any properties it has. In this case, the Fortran below is all that is required.
call cmlStartKpointList(xf) ! start the kpoint/eigen information do i = 1, n_kpoints call cmlAddKpoint(xf, coords=kpt(:,i)) enddo call cmlEndKpointList(xf)
The second common case is where you want to output, for each kpoint, a list of anonymous eigenvalues (i.e. there are no labels for the bands).
call cmlStartKpointList(xf) ! start the kpoint/eigen information do i = 1, n_kpoints call cmlStartKpoint(xf, coords=kpt(:,i)) ! start the information relating to this kpoint call cmlAddBandList(xf, values=eigenvalues(:,i), spin="up") ! output a list of the eigenvalues for all bands ! which optionally may have a spin call cmlEndKpoint(xf) enddo call cmlEndKpointList(xf)
Thirdly, when you don't want to output all of the eigenvectors, but you do want to output the eigenvalues for each band in each kpoint, possibly labelled by band, and with optional accompanying information.
In this case, something like the following sequence of calls is necessary:
call cmlStartKpointList(xf) ! start the kpoint/eigen information do i = 1, n_kpoints call cmlStartKpoint(xf, coords=kpt(:,i)) ! start the information relating to this kpoint do j = 1, n_nbands call cmlStartBand(xf, label(j), spin) ! start the information relating to the this band, ! which optionally may have a spin call cmlAddEigenValue(xf, eigval(j), units) ! Output the eigenvalue for this band ! Perhaps add more information call cmlEndBand(xf) enddo call cmlEndKpoint(xf) enddo call cmlEndKpointList(xf)
Finally, in the case where maximal information is to be output (usually for phonon calculations, then both the eigenvalue, and the full eigenvector need to be stored for each band, at each kpoint. the following series of calls should be made:
call cmlStartKpointList(xf) ! start the kpoint/eigen information do i = 1, n_kpoints call cmlStartKpoint(xf, coords=kpt(:,i)) ! start the information relating to this kpoint ! (kpt is a 3*n_kpoints real matrix) do j = 1, n_bands call cmlStartBand(xf, label(j), spin) ! start the information relating to the first band, ! which optionally may have a spin call cmlAddEigenValueVector(xf, eigval(j), eigvec(:,:,j), units) ! Output all the data on this eigenenergy - its value (and units), and its associated eigenvector (a 3*N size real or complex matrix) ! Optionally, output any further information relevant to this band at this kpoint ! eg IR frequencies at the gamma point. call cmlEndBand(xf) ! Optionally, output the other spin of this band enddo call cmlEndKpoint(xf) enddo call cmlEndKpointList(xf)
To encode information about the symmetry of your system, there is a cmlAddSymmetry call. This can be used at two levels of detail.
Firstly, encoding just the point group or spacegroup of your system:
call cmlAddSymmetry(xf, spaceGroup="P -4 21 m") call cmlAddSymmetry(xf, pointGroup="C2v")
By convention, all space groups are assumed to be in Hermann-Mauguin notation, and all point groups in Schoenflies notation. (using the standard translation to ASCII in both cases).
Sometimes you may also want to encode all the symmetry operators explicitly. In this case, the cmlAddSymmetry calls require the matrices for the symmetry operations (and symmetry displacements for space groups). Symmetry operations must be supplied as a rank-3 array, of n (3x3) matrices (ie a 3x3xn array), and symmetry displacements must be supplied as a rank-2 array of n length-3 vectors (ie a 3xn array).
call cmlAddSymmetry(xf, sym_ops=symopMatrices, sym_disps=symDispArrays)
Of course, these can be combined, and you can write out both symmetry group symbol, and full symmetry operations.
call cmlAddSymmetry(xf, pointGroup="C2v", sym_ops=symopMatrices)
In the very simplest case, a simulation output file can simply involve:
metadata
input parameters
output properties
However, it is often more useful to include information about the progress of the simulation (so you can, for example, follow the progress of the total energy towards convergence in a CG calculation; or watch the variation in temperature in an MD calculation.
If you are outputting atomic structures, it's also useful to distinguish initial structures from final structures. (and mere order in the output file is not a useful way to do this in XML.)
To accomplish this, we have modules and steps. Steps divide up the output sequentially.
So, at the beginning of every step, you do
call cmlStartStep(xf, index=i)
where i is the index of the step.
And at the end of that step, you do:
call cmlEndStep(xf)
Any properties or structures you output in the middle of a step will then automatically be marked up as being part of that step.
Furthermore, you can specifically mark up initial and final sets of structures, or properties, etc., by using a module. So, at the end of your simulation, you should do:
call cmlStartModule(xf, title="Finalization") ! output lots of properties, a final structure, anything else, etc. call cmlEndModule(xf)
You can also mark up your initial structure by wrapping it in a module with a title of Initial System.
You can then end up with a full document structure that looks something like:
metadata
parameters
initial system (containing a structure)
step by step property output
final system (containing a structure and any final properties.
But even if you don't get this far, and you only mark up metadata/parameters/properties, then Golem can still make use of your output.