Outputs parsing
This section describes the general approach used to parse the outputs produced by TURBOMOLE.
It focuses mainly on the text output written to stdout. In the following
we assume that this output is written to a file named name_of_the_executable.log.
Since some outputs are stored in files structured with data groups, you can also use the DataGroups
object described in The data group files. The information obtained that way is only parsed as
strings, though, and most of the information should be extracted from the log files using the objects
described below.
Lastly, this section also covers the methods to extract the list of states from the outputs of a calculation. These do not rely on parsing the log files.
The log files
Quick Start
Before digging into the details of how the objects are organized, it is useful to have a quick look at how you can use the available objects, with a few examples of the data that they can extract. If you prefer to get an idea of the structure of the objects first, check the Overall Structure section and then come back here for the examples.
The most important method to parse your output file is the from_file class method. All the data
and file objects can be created using it.
Starting with a basic example of an RIDFT calculation, for which the ridft.log file is the
output of the calculation, you can extract the relevant data (in the form of a data object) with
this method. For example, the final energy (as well as its decomposition into its different
contributions) is obtained using the turbomoleio.output.data.ScfEnergiesData object:
>>> from turbomoleio.output.data import ScfEnergiesData
>>> energy_data = ScfEnergiesData.from_file('ridft.log')
>>> energy_data
<turbomoleio.output.data.ScfEnergiesData at 0x7fb15f425eb8>
>>> print(energy_data.total_energy)
-0.49586861429
>>> print(energy_data.virial_theorem)
1.42945116692
Similarly, the complete information of this same RIDFT calculation can be gathered in an ScfOutput
object. This object contains a series of basic data objects (such as the above ScfEnergiesData).
An instance of ScfOutput can be generated using the following procedure:
>>> from turbomoleio.output.files import ScfOutput
>>> scf_output = ScfOutput.from_file('ridft.log')
You can access the basic data objects through different attributes. For example, you have access
to the above ScfEnergiesData object from the energies attribute:
>>> print(scf_output.energies.total_energy)
-0.49586861429
You can also access the SCF iterations of the calculation:
>>> iterations = scf_output.scf.iterations
>>> iterations
<turbomoleio.output.data.ScfIterationData at 0x7fd37d6282b0>
So-called helper functions are available for the different data objects. For example, from
the ScfIterationData object above, you can directly plot the convergence:
>>> iterations.plot_energies()
Overall Structure
The code that performs the parsing is organized on three levels.
At the lowest level there is a generic turbomoleio.output.parser.Parser object that
takes the string of the file and is only focused on the extraction of information.
This has several properties, each one tailored to parse specific subsections of the text and extract the relevant data
using regular expressions. The properties return a dictionary with the data that have
been extracted or None, if the section that it should parse could not be found in the string.
In addition these properties are lazy, in the sense that they will store the output before returning it
and if called again they will not repeat the parsing.
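The lazy behaviour described above can be pictured with a small, self-contained sketch. The MiniParser class, its total_energy property, the regular expression and the sample text are all hypothetical and are not part of turbomoleio; the real Parser defines its own properties and patterns.

```python
import re


class MiniParser:
    """Hypothetical stand-in for a parser object; not the turbomoleio API."""

    def __init__(self, string):
        self.string = string
        self._cache = {}      # results of the properties already evaluated
        self.n_searches = 0   # counts how often the regex is actually run

    @property
    def total_energy(self):
        # Return the stored result if this property was already evaluated.
        if "total_energy" in self._cache:
            return self._cache["total_energy"]
        self.n_searches += 1
        match = re.search(r"total energy\s*=\s*(-?\d+\.\d+)", self.string)
        # None signals that the section is absent from the text.
        result = {"total_energy": float(match.group(1))} if match else None
        self._cache["total_energy"] = result
        return result


sample = "|  total energy      =     -0.49586861429  |"
parser = MiniParser(sample)
first = parser.total_energy
second = parser.total_energy   # served from the cache, no second search
missing = MiniParser("no energies here").total_energy
```

In this sketch the second access returns the very same dictionary without running the regular expression again, and a text without the section yields None.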
The Parser object, although it can easily be used to extract specific pieces of information, is probably less
useful to the generic user, who would mostly interact with the higher level objects.
The second level is given by a series of common data objects (see Common data objects). These objects
describe single pieces of information that could be gathered together based on similarity and
type of information (e.g. data related to basis set, data related to Cosmo, …).
They rely on the Parser to extract the data (they can call one or more of its methods) and store them
in a systematic way as attributes, so that they can be accessed easily.
The root method for initializing the data objects is a from_parser class method that takes a Parser as an input.
The user, however, is much more likely to use the from_file class method, which takes the path to a
TURBOMOLE output file as an input.
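The relation between the two constructors can be sketched as follows. This is a minimal illustration under stated assumptions: MiniParser, MiniEnergiesData and the fake log line are hypothetical names invented for the example, and the handling of a missing section (attribute set to None) is a choice of this sketch rather than a statement about turbomoleio.

```python
import os
import re
import tempfile


class MiniParser:
    """Hypothetical parser: holds the text and extracts raw values."""

    def __init__(self, string):
        self.string = string

    @property
    def energies(self):
        match = re.search(r"total energy\s*=\s*(-?\d+\.\d+)", self.string)
        return {"total_energy": float(match.group(1))} if match else None


class MiniEnergiesData:
    """Hypothetical data object storing parsed values as attributes."""

    def __init__(self, total_energy=None):
        self.total_energy = total_energy

    @classmethod
    def from_parser(cls, parser):
        # Root constructor: pull the parsed dictionary from the parser.
        data = parser.energies or {}
        return cls(total_energy=data.get("total_energy"))

    @classmethod
    def from_file(cls, filepath):
        # Convenience constructor: read the file, then delegate to from_parser.
        with open(filepath) as f:
            return cls.from_parser(MiniParser(f.read()))


# Write a tiny fake log to a temporary file and parse it.
with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "ridft.log")
    with open(log_path, "w") as f:
        f.write("|  total energy      =     -0.49586861429  |\n")
    energy_data = MiniEnergiesData.from_file(log_path)

empty_data = MiniEnergiesData.from_parser(MiniParser(""))
```

Keeping from_parser as the root constructor means that a single parser instance can be shared by several data objects, so the text only needs to be read once.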
The top level is given by the File data objects, which are designed to parse the output files
produced by the different kinds of TURBOMOLE executables. These objects contain as attributes different
instances of the data objects, thus collecting in a single place all the information that can be
extracted from a specific output file. These file objects can also be easily created with the
from_file method.
Note that, like most of the other objects in turbomoleio, data and file objects are all MSONable
(see monty documentation). This means that they
can be converted to and generated from a dictionary with the as_dict and from_dict methods.
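The round trip that these two methods make possible can be sketched without any dependency. The MiniEnergies class below is hypothetical and only mimics the as_dict / from_dict convention by hand; the actual turbomoleio objects inherit this behaviour from monty's MSONable, which also takes care of nested objects.

```python
import json


class MiniEnergies:
    """Hypothetical object following the as_dict / from_dict convention."""

    def __init__(self, total_energy, virial_theorem):
        self.total_energy = total_energy
        self.virial_theorem = virial_theorem

    def as_dict(self):
        # Plain dictionary with only JSON-serializable values.
        return {
            "@module": type(self).__module__,
            "@class": type(self).__name__,
            "total_energy": self.total_energy,
            "virial_theorem": self.virial_theorem,
        }

    @classmethod
    def from_dict(cls, d):
        # Ignore the bookkeeping keys and rebuild the object.
        kwargs = {k: v for k, v in d.items() if not k.startswith("@")}
        return cls(**kwargs)


original = MiniEnergies(total_energy=-0.49586861429, virial_theorem=1.42945116692)
serialized = json.dumps(original.as_dict())   # e.g. to store in a file or database
restored = MiniEnergies.from_dict(json.loads(serialized))
```

Because the dictionary contains only basic types, the parsed data can be stored as JSON and the object rebuilt later without parsing the log file again.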
This should be enough to understand how to interact with these objects from the user side. If you need more information about the internal implementation or you wish to implement the parsing of additional quantities you should check out the Output logs parsing section of the developer guide.
Data and File Objects
The data and file objects are found in the turbomoleio.output.data and turbomoleio.output.files
modules, respectively. Here we provide a list of all the objects in the modules along with a quick description
of the data that they contain. You can check the API documentation of each of them for more details
about their content.
Common data objects
turbomoleio.output.data.TurbomoleData
Turbomole version and executable used.
turbomoleio.output.data.RunData
Information about where the calculation was executed and the timings.
turbomoleio.output.data.BasisData
Basis sets used for the calculation.
turbomoleio.output.data.CosmoData
Information about the use of cosmo.
turbomoleio.output.data.SymmetryData
Information on the symmetry of the molecule.
turbomoleio.output.data.FunctionalData
Exchange-correlation functional.
turbomoleio.output.data.RiData
Information about the use of the Resolution of Identity approach.
turbomoleio.output.data.DispersionCorrectionData
Dispersion correction used in the calculation.
turbomoleio.output.data.DFTData
Information about a dft calculation (composed of FunctionalData, RiData, DispersionCorrectionData and grids size information).
turbomoleio.output.data.ScfIterationData
Details about the iterations in a scf calculation.
turbomoleio.output.data.ScfData
Information about options and operations in an scf calculation (contains ScfIterationData and other information such as DIIS, virtual orbital shift, convergence criteria, …).
turbomoleio.output.data.ScfEnergiesData
Final energies and different contributions obtained from an scf calculation.
turbomoleio.output.data.ElectrostaticMomentsData
Electrostatic moments (charge, dipole and quadrupole).
turbomoleio.output.data.GeometryData
Geometry of the system: molecule and centers of mass/charge.
turbomoleio.output.data.SpinData
Information about the spin in the calculation.
turbomoleio.output.data.SmearingData
Information about the smearing.
turbomoleio.output.data.IntegralData
Thresholds for integrals.
turbomoleio.output.data.EscfIterationData
Details about the iterations in an escf calculation.
turbomoleio.output.data.EscfData
Output of an escf calculation.
turbomoleio.output.data.StatptData
Initial information provided in statpt.
turbomoleio.output.data.RelaxData
Initial information provided in relax.
turbomoleio.output.data.RelaxGradientsData
Gradient values extracted from the relax/statpt output.
turbomoleio.output.data.RelaxConvergenceData
Final information about convergence.
turbomoleio.output.data.AoforceNumericalIntegrationData
Information about the numerical integration in aoforce.
turbomoleio.output.data.AoforceRotationalData
Analysis of rotational states in aoforce.
turbomoleio.output.data.AoforceVibrationalData
Analysis of vibrational states in aoforce.
turbomoleio.output.data.MP2Data
Information about an MP2 calculation.
turbomoleio.output.data.MP2Results
Results from an MP2 calculation.
turbomoleio.output.data.PeriodicityData
Information about the periodicity of the calculation.
File data objects
turbomoleio.output.files.ScfOutput
Data from a dscf, ridft or riper calculation.
turbomoleio.output.files.EscfOutput
Data from an escf calculation (contains some data about the previous scf calculation).
turbomoleio.output.files.EscfOnlyOutput
Data from an escf calculation (data only related to escf).
turbomoleio.output.files.GradOutput
Data from a grad or rdgrad calculation.
turbomoleio.output.files.EgradOutput
Data from an egrad calculation (contains both the “grad”-related and “escf”-related data).
turbomoleio.output.files.RelaxOutput
Data from a relax calculation.
turbomoleio.output.files.StatptOutput
Data from a statpt calculation.
turbomoleio.output.files.AoforceOutput
Data from an aoforce calculation.
turbomoleio.output.files.Ricc2Output
Data from an RI-CC2 calculation.
turbomoleio.output.files.MP2Output
Data from an MP2 calculation (mpgrad, ricc2 or pnoccsd).
turbomoleio.output.files.JobexOutput
Data from the last step of a jobex calculation.
The States object
The list of eigenstates of a molecule with their occupation can normally be extracted from the
TURBOMOLE outputs using the eiger script. In turbomoleio a similar code has been implemented
that uses the content of the different data groups in the control file (possibly accessing
subfiles linked in control) to build an instance of turbomoleio.output.states.States.
This is a subclass of collections.abc.MutableSequence containing a list of
turbomoleio.output.states.State. The states are sorted in ascending order based on the
eigenvalues and each State contains the information about the eigenvalue, the irreducible
representation, the index associated with the irreducible representation, the occupation and,
for UHF calculations, the spin.
Similarly to the other output parsing objects, it can be instantiated using the from_file class method:
from turbomoleio.output.states import States
states = States.from_file("control")
Here, in general, the control file alone is not enough and the code needs to access
the $scfmo, $uhfmo_alpha and $uhfmo_beta datagroups, that are usually stored
in external files, to read the eigenvalues. If the files are not available the generation
of the object fails.
The States object offers various methods to extract further information from the list
of states, such as gap, or has_hole, which determines whether the list of eigenstates
contains an empty state with energy lower than that of some occupied state. You can check the
rest of the API for a list of the available methods.
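To make the ideas of ordering, gap and holes concrete, here is a minimal stand-alone sketch. MiniState and the functions sort_states, has_hole and gap are hypothetical and written only for this illustration; in particular, the gap is assumed here to be the difference between the lowest empty and the highest occupied eigenvalue, which may differ from the definition used by the real States object.

```python
from dataclasses import dataclass


@dataclass
class MiniState:
    """Hypothetical state; the real State object holds more information."""
    eigenvalue: float
    irrep: str
    irrep_index: int
    occupation: float


def sort_states(states):
    # Ascending order of eigenvalue, as described for States.
    return sorted(states, key=lambda s: s.eigenvalue)


def has_hole(states):
    # True if an empty state lies below some occupied state.
    seen_empty = False
    for state in sort_states(states):
        if state.occupation == 0:
            seen_empty = True
        elif seen_empty:
            return True
    return False


def gap(states):
    # Assumed definition: lowest empty minus highest occupied eigenvalue.
    occupied = [s.eigenvalue for s in states if s.occupation > 0]
    empty = [s.eigenvalue for s in states if s.occupation == 0]
    if not occupied or not empty:
        return None
    return min(empty) - max(occupied)


ground = [
    MiniState(-0.50, "a1", 1, 2.0),
    MiniState(-0.25, "b1", 1, 2.0),
    MiniState(0.125, "a1", 2, 0.0),
]
excited = [
    MiniState(-0.50, "a1", 1, 2.0),
    MiniState(-0.25, "b1", 1, 0.0),
    MiniState(0.125, "a1", 2, 2.0),
]
```

With these toy lists, the ground configuration has no hole, while the excited one has an empty state below an occupied one.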