Outputs parsing
This section describes the general approach used to parse the outputs produced by TURBOMOLE.
In particular, we focus mainly on the text output written to stdout. In the following,
we assume that this output has been redirected to a file named name_of_the_executable.log.
Since some outputs are stored in files structured with data groups, you can also use the DataGroups
object described in The data group files. However, that object only parses the information
as strings, and most of the information should be extracted from the log files using the objects
described below.
Finally, we also describe the methods used to extract the list of states from the outputs of a calculation. These do not rely on parsing the log files.
The log files
Quick Start
Before digging into the details of how the objects are organized, it is worth taking a quick look at how to use the available objects and at a few examples of the data they can extract. If you would rather first get an idea of the structure of the objects, check the Overall Structure section and then come back here for the examples.
The main entry point for parsing an output file is the from_file class method, which is available
on all the data and file objects.
As a basic example, consider an RIDFT calculation whose output was written to ridft.log.
The relevant data can be extracted from this file in the form of data objects. For example, the
final energy (as well as its decomposition into its different contributions) is obtained using the
turbomoleio.output.data.ScfEnergiesData object:
>>> from turbomoleio.output.data import ScfEnergiesData
>>> energy_data = ScfEnergiesData.from_file('ridft.log')
>>> energy_data
<turbomoleio.output.data.ScfEnergiesData at 0x7fb15f425eb8>
>>> print(energy_data.total_energy)
-0.49586861429
>>> print(energy_data.virial_theorem)
1.42945116692
Similarly, all the information about this RIDFT calculation can be gathered in an ScfOutput
object, which contains several basic data objects (such as the ScfEnergiesData above).
An instance of ScfOutput can be created as follows:
>>> from turbomoleio.output.files import ScfOutput
>>> scf_output = ScfOutput.from_file('ridft.log')
The basic data objects are accessible through the attributes of ScfOutput. For example, the
ScfEnergiesData object shown above is available as the energies attribute:
>>> print(scf_output.energies.total_energy)
-0.49586861429
You can also access the SCF iterations of the calculation:
>>> iterations = scf_output.scf.iterations
>>> iterations
<turbomoleio.output.data.ScfIterationData at 0x7fd37d6282b0>
Some data objects also provide helper methods. For example, the ScfIterationData object above
can directly plot the convergence of the calculation:
>>> iterations.plot_energies()
Overall Structure
The code that performs the parsing is organized on three levels.
At the lowest level, a generic turbomoleio.output.parser.Parser object takes the content of
the file as a string and focuses only on extracting information.
It has several properties, each tailored to parse a specific subsection of the text and extract the relevant data
using regular expressions. Each property returns a dictionary with the extracted data,
or None if the section it should parse cannot be found in the string.
In addition, these properties are lazy: the result is stored the first time it is computed,
so accessing the property again does not repeat the parsing.
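To make the idea of lazy, regex-based properties more concrete, here is a minimal sketch. It does not reproduce the actual turbomoleio Parser: ToyParser, its total_energy property and the parse_count counter are hypothetical names, and the regular expression only targets a made-up line resembling the energy summary printed by ridft.

```python
import re
from functools import cached_property  # Python >= 3.8


class ToyParser:
    """Hypothetical parser mimicking the behaviour described above.

    Each property parses one section of the output string, returns a dict
    (or None if the section is missing) and caches its result, so the
    regular expression runs only once per instance.
    """

    def __init__(self, string):
        self.string = string
        self.parse_count = 0  # how many times the parsing code actually ran

    @cached_property
    def total_energy(self):
        self.parse_count += 1
        # Made-up pattern for a line like "|  total energy  =  -0.4958...  |"
        match = re.search(r"total energy\s*=\s*(-?\d+\.\d+)", self.string)
        if match is None:
            return None  # section not found in the string
        return {"total_energy": float(match.group(1))}


parser = ToyParser("|  total energy      =     -0.49586861429  |")
print(parser.total_energy)  # {'total_energy': -0.49586861429}
print(parser.total_energy)  # served from the cache, no new parsing
print(parser.parse_count)   # 1
print(ToyParser("no energy here").total_energy)  # None
```

functools.cached_property stores the value in the instance __dict__ on first access, which is one simple way to obtain this lazy behaviour; the real Parser may implement the caching differently.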
Although the Parser object can easily be used to extract specific pieces of information, it is probably less
useful to the typical user, who will mostly interact with the higher-level objects.
The second level consists of a series of common data objects (see Common data objects). Each of these
objects groups together pieces of information that are related by their similarity and type
(e.g. data related to the basis set, data related to COSMO, …).
They rely on the Parser to extract the data (possibly calling more than one of its methods) and store it
systematically as attributes, so that it can be accessed easily.
The basic method for initializing a data object is the from_parser class method, which takes a Parser as input.
Users, however, are much more likely to use the from_file class method, which takes the path to a
TURBOMOLE output file as input.
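The relation between from_parser and from_file can be sketched as follows. This is a simplified illustration, not the turbomoleio code: ToyParser, ToyEnergiesData and the energies property are hypothetical, and returning None when the section is missing is a choice made only for this sketch.

```python
import os
import re
import tempfile


class ToyParser:
    """Hypothetical minimal parser wrapping the content of an output file."""

    def __init__(self, string):
        self.string = string

    @classmethod
    def from_file(cls, filepath):
        with open(filepath) as f:
            return cls(f.read())

    @property
    def energies(self):
        match = re.search(r"total energy\s*=\s*(-?\d+\.\d+)", self.string)
        return None if match is None else {"total_energy": float(match.group(1))}


class ToyEnergiesData:
    """Hypothetical data object storing the parsed values as attributes."""

    def __init__(self, total_energy):
        self.total_energy = total_energy

    @classmethod
    def from_parser(cls, parser):
        # Root constructor: take the dictionary produced by the parser.
        data = parser.energies
        if data is None:
            return None  # in this sketch: section absent from the output
        return cls(total_energy=data["total_energy"])

    @classmethod
    def from_file(cls, filepath):
        # Convenience constructor: build the parser, then delegate.
        return cls.from_parser(ToyParser.from_file(filepath))


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "ridft.log")
    with open(path, "w") as f:
        f.write("|  total energy      =     -0.49586861429  |\n")
    energy_data = ToyEnergiesData.from_file(path)

print(energy_data.total_energy)  # -0.49586861429
```

Keeping from_parser as the root constructor means that a single Parser instance can be shared by several data objects, while from_file remains a thin convenience wrapper.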
The top level consists of the File data objects, which are designed to parse the output files
produced by the different TURBOMOLE executables. Their attributes are instances of the data
objects, so that all the information that can be extracted from a specific output file is
collected in a single place. These file objects can also be easily created with the
from_file method.
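A minimal sketch of how a file object can collect several data objects built from the same parser is shown below. All class names (ToyParser, ToyEnergiesData, ToyIterationData, ToyScfOutput) are hypothetical, and the "ITERATION n energy" lines are an invented format, not real TURBOMOLE output.

```python
import re


class ToyParser:
    """Hypothetical parser with one property per section of the output."""

    def __init__(self, string):
        self.string = string

    @property
    def energies(self):
        m = re.search(r"total energy\s*=\s*(-?\d+\.\d+)", self.string)
        return None if m is None else {"total_energy": float(m.group(1))}

    @property
    def iterations(self):
        # Invented iteration lines of the form "ITERATION <n> <energy>"
        found = re.findall(r"ITERATION\s+\d+\s+(-?\d+\.\d+)", self.string)
        return {"energies": [float(e) for e in found]} if found else None


class ToyEnergiesData:
    def __init__(self, total_energy):
        self.total_energy = total_energy

    @classmethod
    def from_parser(cls, parser):
        data = parser.energies
        return None if data is None else cls(**data)


class ToyIterationData:
    def __init__(self, energies):
        self.energies = energies

    @classmethod
    def from_parser(cls, parser):
        data = parser.iterations
        return None if data is None else cls(**data)


class ToyScfOutput:
    """Hypothetical file object grouping data objects from one parser."""

    def __init__(self, energies, iterations):
        self.energies = energies
        self.iterations = iterations

    @classmethod
    def from_parser(cls, parser):
        return cls(
            energies=ToyEnergiesData.from_parser(parser),
            iterations=ToyIterationData.from_parser(parser),
        )

    @classmethod
    def from_file(cls, filepath):
        with open(filepath) as f:
            return cls.from_parser(ToyParser(f.read()))


text = """
ITERATION 1 -0.48
ITERATION 2 -0.49586
|  total energy      =     -0.49586861429  |
"""
scf_output = ToyScfOutput.from_parser(ToyParser(text))
print(scf_output.energies.total_energy)  # -0.49586861429
print(scf_output.iterations.energies)    # [-0.48, -0.49586]
```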
Note that, like most of the other objects in turbomoleio, data and file objects are all MSONable
(see monty documentation). This means that they
can be converted to and generated from a dictionary with the as_dict and from_dict methods.
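The round trip provided by as_dict and from_dict can be illustrated with the simplified stand-in below. It does not use monty: ToyMSONable and ToyEnergiesData are hypothetical, and the sketch only mimics the general idea of serializing the __init__ arguments together with "@module" and "@class" bookkeeping keys, similar to those added by monty.

```python
import inspect
import json


class ToyMSONable:
    """Simplified stand-in for monty's MSONable (illustration only)."""

    def as_dict(self):
        # Serialize the __init__ arguments, read back from same-named attributes.
        params = inspect.signature(self.__init__).parameters
        d = {"@module": type(self).__module__, "@class": type(self).__name__}
        d.update({name: getattr(self, name) for name in params})
        return d

    @classmethod
    def from_dict(cls, d):
        # Drop the bookkeeping keys and call __init__ with the remaining ones.
        kwargs = {k: v for k, v in d.items() if not k.startswith("@")}
        return cls(**kwargs)


class ToyEnergiesData(ToyMSONable):
    def __init__(self, total_energy, virial_theorem):
        self.total_energy = total_energy
        self.virial_theorem = virial_theorem


data = ToyEnergiesData(total_energy=-0.49586861429, virial_theorem=1.42945116692)
as_json = json.dumps(data.as_dict())  # e.g. to store in a file or a database
restored = ToyEnergiesData.from_dict(json.loads(as_json))
print(restored.total_energy)  # -0.49586861429
```

Since the dictionary only contains JSON-compatible values, it can be stored and reloaded without losing information; the real MSONable handles many more cases (nested objects, versions, non-JSON types).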
This should be enough to understand how to interact with these objects as a user. If you need more information about the internal implementation, or wish to implement the parsing of additional quantities, see the Output logs parsing section of the developer guide.
Data and File Objects
The data and file objects are found in the turbomoleio.output.data and turbomoleio.output.files
modules, respectively. Here we provide a list of all the objects in these modules, along with a
short description of the data they contain. See the API documentation of each object for more
details about its content.
Common data objects
turbomoleio.output.data.TurbomoleData: TURBOMOLE version and executable used.
turbomoleio.output.data.RunData: Information about where the calculation was executed and its timings.
turbomoleio.output.data.BasisData: Basis sets used for the calculation.
turbomoleio.output.data.CosmoData: Information about the use of COSMO.
turbomoleio.output.data.SymmetryData: Information on the symmetry of the molecule.
turbomoleio.output.data.FunctionalData: Exchange-correlation functional.
turbomoleio.output.data.RiData: Information about the use of the Resolution of Identity approach.
turbomoleio.output.data.DispersionCorrectionData: Dispersion correction used in the calculation.
turbomoleio.output.data.DFTData: Information about a DFT calculation (composed of FunctionalData, RiData, DispersionCorrectionData and information on the grid sizes).
turbomoleio.output.data.ScfIterationData: Details about the iterations of an SCF calculation.
turbomoleio.output.data.ScfData: Information about options and operations in an SCF calculation (contains ScfIterationData and other information such as DIIS, virtual orbital shift, convergence criteria, …).
turbomoleio.output.data.ScfEnergiesData: Final energies and their different contributions obtained from an SCF calculation.
turbomoleio.output.data.ElectrostaticMomentsData: Electrostatic moments (charge, dipole and quadrupole).
turbomoleio.output.data.GeometryData: Geometry of the system (molecule and centers of mass/charge).
turbomoleio.output.data.SpinData: Information about the spin in the calculation.
turbomoleio.output.data.SmearingData: Information about the smearing.
turbomoleio.output.data.IntegralData: Thresholds for integrals.
turbomoleio.output.data.EscfIterationData: Details about the iterations of an escf calculation.
turbomoleio.output.data.EscfData: Output of an escf calculation.
turbomoleio.output.data.StatptData: Initial information provided in statpt.
turbomoleio.output.data.RelaxData: Initial information provided in relax.
turbomoleio.output.data.RelaxGradientsData: Gradient values extracted from the relax/statpt output.
turbomoleio.output.data.RelaxConvergenceData: Final information about convergence.
turbomoleio.output.data.AoforceNumericalIntegrationData: Information about the numerical integration in aoforce.
turbomoleio.output.data.AoforceRotationalData: Analysis of rotational states in aoforce.
turbomoleio.output.data.AoforceVibrationalData: Analysis of vibrational states in aoforce.
turbomoleio.output.data.MP2Data: Information about an MP2 calculation.
turbomoleio.output.data.MP2Results: Results from an MP2 calculation.
turbomoleio.output.data.PeriodicityData: Information about the periodicity of the calculation.
File data objects
turbomoleio.output.files.ScfOutput: Data from a dscf, ridft or riper calculation.
turbomoleio.output.files.EscfOutput: Data from an escf calculation (also contains some data about the preceding SCF calculation).
turbomoleio.output.files.EscfOnlyOutput: Data from an escf calculation (only the data related to escf).
turbomoleio.output.files.GradOutput: Data from a grad or rdgrad calculation.
turbomoleio.output.files.EgradOutput: Data from an egrad calculation (contains both the “grad”-related and the “escf”-related data).
turbomoleio.output.files.RelaxOutput: Data from a relax calculation.
turbomoleio.output.files.StatptOutput: Data from a statpt calculation.
turbomoleio.output.files.AoforceOutput: Data from an aoforce calculation.
turbomoleio.output.files.Ricc2Output: Data from an RI-CC2 calculation.
turbomoleio.output.files.MP2Output: Data from an MP2 calculation (mpgrad, ricc2 or pnoccsd).
turbomoleio.output.files.JobexOutput: Data from the last step of a jobex calculation.
The States object
The list of eigenstates of a molecule, with their occupations, can normally be extracted from the
TURBOMOLE outputs using the eiger script. turbomoleio implements similar functionality: it uses
the content of the data groups in the control file (possibly accessing subfiles referenced in
control) to build an instance of turbomoleio.output.states.States.
This is a subclass of collections.abc.MutableSequence containing a list of
turbomoleio.output.states.State objects. The states are sorted in ascending order of
eigenvalue, and each State contains the eigenvalue, the irreducible representation,
the index associated with the irreducible representation, the occupation and,
for UHF calculations, the spin.
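The structure described above can be sketched with a small MutableSequence subclass. ToyState, ToyStates and their field names are hypothetical and only loosely follow the description; in this sketch the ascending order is enforced only when the object is created, which may differ from the real States class. MutableSequence requires five methods (__getitem__, __setitem__, __delitem__, __len__ and insert) and then provides append, extend, pop, remove and iteration automatically.

```python
from collections.abc import MutableSequence
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyState:
    """Hypothetical stand-in for a single eigenstate."""
    eigenvalue: float
    irrep: str
    irrep_index: int
    occupation: float
    spin: Optional[str] = None  # set only for UHF calculations


class ToyStates(MutableSequence):
    """Hypothetical list of states, sorted by eigenvalue at creation."""

    def __init__(self, states=()):
        self._states = sorted(states, key=lambda s: s.eigenvalue)

    # The five abstract methods required by MutableSequence:
    def __getitem__(self, index):
        return self._states[index]

    def __setitem__(self, index, state):
        self._states[index] = state

    def __delitem__(self, index):
        del self._states[index]

    def __len__(self):
        return len(self._states)

    def insert(self, index, state):
        self._states.insert(index, state)


states = ToyStates([
    ToyState(-0.10, "a1", 2, 0.0),
    ToyState(-0.50, "a1", 1, 2.0),
])
print([s.eigenvalue for s in states])  # [-0.5, -0.1]
states.append(ToyState(0.30, "t2", 1, 0.0))  # append comes from the mixin
print(len(states))  # 3
```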
Similarly to the other output parsing objects, it can be instantiated using the from_file
class method:
>>> from turbomoleio.output.states import States
>>> states = States.from_file("control")
In general, however, the control file alone is not enough: to read the eigenvalues, the code
also needs to access the $scfmo data group (for RHF) or the $uhfmo_alpha and $uhfmo_beta data
groups (for UHF), which are usually stored in external files. If these files are not available,
the creation of the object fails.
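In the control file these data groups are usually referenced with lines such as "$scfmo   file=mos". The sketch below shows how the name of the external file could be resolved from the content of control. The helper find_datagroup_file is hypothetical and not part of turbomoleio, and it only handles the simple "$datagroup ... file=name" form.

```python
import re
from typing import Optional


def find_datagroup_file(control_text: str, datagroup: str) -> Optional[str]:
    """Return the file referenced by '$<datagroup> ... file=<name>', or None.

    Hypothetical helper for illustration; turbomoleio may do this differently.
    """
    # Data group name at the start of a line, followed by whitespace,
    # then "file=" somewhere on the same line.
    pattern = rf"^\${re.escape(datagroup)}(?=\s)[^\n]*?\bfile=(\S+)"
    match = re.search(pattern, control_text, flags=re.MULTILINE)
    return match.group(1) if match else None


rhf_control = "$title\n$scfmo   file=mos\n$end\n"
uhf_control = "$title\n$uhfmo_alpha   file=alpha\n$uhfmo_beta   file=beta\n$end\n"

print(find_datagroup_file(rhf_control, "scfmo"))       # mos
print(find_datagroup_file(uhf_control, "uhfmo_beta"))  # beta
print(find_datagroup_file(uhf_control, "scfmo"))       # None
```

Once the file name is known, it is resolved relative to the directory containing control; if that file is missing, opening it fails, which mirrors the failure described above.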
The States object offers various methods to extract further information from the list
of states, such as gap, or has_hole, which determines whether the list contains an empty
state with an energy lower than that of some occupied state. See the rest of the API
for the full list of available methods.
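The idea behind a gap and a hole check can be sketched on plain (eigenvalue, occupation) pairs. The functions homo_lumo, gap and has_hole below are hypothetical, spin-unresolved illustrations that treat any state with nonzero occupation as occupied; they are not the implementation of the States methods, which may handle spin and fractional occupations differently.

```python
from typing import List, Tuple

# Hypothetical representation: one (eigenvalue, occupation) pair per state.
StatePair = Tuple[float, float]


def homo_lumo(states: List[StatePair]) -> Tuple[float, float]:
    """Highest occupied and lowest empty eigenvalue.

    Raises ValueError if there are no occupied or no empty states.
    """
    occupied = [e for e, occ in states if occ > 0]
    empty = [e for e, occ in states if occ == 0]
    return max(occupied), min(empty)


def gap(states: List[StatePair]) -> float:
    """Difference between the lowest empty and the highest occupied level."""
    homo, lumo = homo_lumo(states)
    return lumo - homo


def has_hole(states: List[StatePair]) -> bool:
    """True if some empty state lies below the highest occupied state."""
    homo, lumo = homo_lumo(states)
    return lumo < homo


normal = [(-0.50, 2.0), (-0.30, 2.0), (0.10, 0.0), (0.25, 0.0)]
holed = [(-0.50, 2.0), (-0.30, 0.0), (0.10, 2.0), (0.25, 0.0)]
print(round(gap(normal), 6))              # 0.4
print(has_hole(normal), has_hole(holed))  # False True
```

With this definition a state list with a hole yields a negative gap, which is one reason why a dedicated has_hole check is useful.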