..
The turbomoleio package, a python interface to Turbomole
for preparing inputs, parsing outputs and other related tools.
Copyright (C) 2018-2022 BASF SE, Matgenix SRL.
This file is part of turbomoleio.
Turbomoleio is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
Turbomoleio is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with turbomoleio (see ~turbomoleio/COPYING). If not,
see .
.. _parsing_outputs:
===============
Outputs parsing
===============
This section describes the general approach used for the parsing of outputs produced by TURBOMOLE.
In particular here we focus mainly on the text output that is written in the ``stdout``. In the following
we will consider that this output is written in a file name ``name_of_the_executable.log``.
Since some outputs are stored in files structured with data groups you can also use the ``DataGroups``
object described in :ref:`datagroup_files`. The information from there will only be just parsed as
strings though and most of information should be extracted from the log files using the objects
described below.
Lastly we will also deal below with the methods to extract the list of states from the outputs of
a calculation. This does not rely on the parsing of the log files.
The log files
=============
Quick Start
-----------
Before digging in the details of how the objects are organized it will be good to have a quick look
at how you can use the objects available and a few examples of the data that they can extract.
If you instead prefer to have an idea about the structure of the object you can first
check the :ref:`parsing_outputs_overall_logs` section and then come back here to the examples.
The most important method to parse your output file is the ``from_file`` class method. All the data
and file objects can be created using it.
Starting with a basic example of an RIDFT calculation, for which the ``ridft.log`` file is the
output of the calculation, you can extract the relevant data (in the form of a data object) in
the same manner. For example, the final energy (as well as its decomposition into its different
contributions) is obtained using the :class:`turbomoleio.output.data.ScfEnergiesData` object:
.. code-block:: python
>>> from turbomoleio.output.data import ScfEnergiesData
>>> energy_data = ScfEnergiesData.from_file('ridft.log')
>>> energy_data
>>> print(energy_data.total_energy)
-0.49586861429
>>> print(energy_data.virial_theorem)
1.42945116692
Similarly, the complete information of this same RIDFT calculation can be gathered in an ``ScfOutput``
object. This object contains a series of *basic* data objects (such as the above ``ScfEnergiesData``).
An instance of ``ScfOutput`` can be generated using the following procedure:
.. code-block:: python
>>> from turbomoleio.output.files import ScfOutput
>>> scf_output = ScfOutput.from_file('ridft.log')
You can access the basic data objects through different attributes. For example, you have access
to the above ``ScfEnergiesData`` object from the ``energies`` attribute:
.. code-block:: python
>>> print(scf_output.energies.total_energy)
-0.49586861429
You can also have access to the SCF iterations of the calculations:
.. code-block:: python
>>> iterations = scf_output.scf.iterations
>>> iterations
So-called *helper functions* are available for the different data objects. For example, from
the ``ScfIterationData`` object above, you can directly have a plot of the convergence:
.. code-block:: python
>>> iterations.plot_energies()
.. figure:: _static/scf_iterations.png
:width: 500px
:align: center
:alt: scf iterations
.. _parsing_outputs_overall_logs:
Overall Structure
-----------------
The code that performs the parsing is organized on three levels.
At the lowest level there is a generic :class:`turbomoleio.output.parser.Parser` object that
takes the string of the file and is only focused on the extraction of information.
This has several properties, each one tailored to parse specific subsections of the text and extract the relevant data
using regular expressions. The properties return a dictionary with the data that have
been extracted or None, if the section that it should parse could not be found in the string.
In addition these properties are *lazy*, in the sense that they will store the output before returning it
and if called again they will not repeat the parsing.
The ``Parser`` object, although it can be easily used to extract some punctual information, is probably less
useful to the generic user, who would mostly interact with the higher level objects.
The second level is given by a series of common data objects (see `Common data objects`_). These objects
describe single pieces of information that could be gathered together based on similarity and
type of information (e.g. data related to basis set, data related to Cosmo, ...).
They rely on the ``Parser`` to extract the data (can call one or more of its methods) and store them
in a systematic way as attributes, so that they can be accessed easily.
The root method for initializing the data objects is a ``from_parser`` class method that takes a Parser as an input.
The user however is much more likely to use the class method ``from_file``, that takes the path to a
TURBOMOLE output file as an input.
The top level is given by the `File data objects`_, that are designed to parse the outputs files
produced by the different kinds of TURBOMOLE executables. The objects contain as attributes different
instances of the data objects, thus collecting in a single place all the information that can be
extracted from a specific output file. These file objects can also be easily created with the
``from_file`` method.
Note that, like most of the other objects in turbomoleio, data and files objects are all ``MSONable``
(see `monty documentation `_). This means that they
can be converted to and generated from a dictionary with the ``as_dict`` and ``from_dict`` methods.
This should be enough to understand how to interact with these objects from the user side. If you
need more information about the internal implementation or you wish to implement the parsing
of additional quantities you should check out the :ref:`developer_parse_logs` section of the developer guide.
Data and File Objects
---------------------
The data and file objects are found in the :mod:`turbomoleio.output.data` and :mod:`turbomoleio.output.files`
respectively. Here we provide a list of all the objects in the modules along with a quick description
of the data that they contain. You can check the API documentation of each of them for more details
about their content.
Common data objects
^^^^^^^^^^^^^^^^^^^
:class:`turbomoleio.output.data.TurbomoleData` Turbomole version and executable used.
:class:`turbomoleio.output.data.RunData` Information about where the calculation was executed and the timings.
:class:`turbomoleio.output.data.BasisData` Basis sets used for the calculation.
:class:`turbomoleio.output.data.CosmoData` Information about the use of cosmo.
:class:`turbomoleio.output.data.SymmetryData` Information on the symmetry of the molecule.
:class:`turbomoleio.output.data.FunctionalData` Exchange-correlation functional.
:class:`turbomoleio.output.data.RiData` Information about the use of the Resolution of Identity approach.
:class:`turbomoleio.output.data.DispersionCorrectionData` Dispersion correction used in the calculation.
:class:`turbomoleio.output.data.DFTData` Information about a dft calculation (composed of FunctionalData,
RiData, DispersionCorrectionData and grids size information).
:class:`turbomoleio.output.data.ScfIterationData` Details about the iterations in a scf calculation.
:class:`turbomoleio.output.data.ScfData` Information about options and operations in an scf calculation
(contains ScfIterationData and other information such as DIIS, virtual orbital shift, convergence criteria, ...).
:class:`turbomoleio.output.data.ScfEnergiesData` Final energies and different contributions obtained
from an scf calculation.
:class:`turbomoleio.output.data.ElectrostaticMomentsData` Electrostatic moments (charge, dipole
and quadrupole).
:class:`turbomoleio.output.data.GeometryData` Geometry of the system: molecule and centers
of mass/charge.
:class:`turbomoleio.output.data.SpinData` Information about the spin in the calculation.
:class:`turbomoleio.output.data.SmearingData` Information about the smearing.
:class:`turbomoleio.output.data.IntegralData` Thresholds for integrals.
:class:`turbomoleio.output.data.EscfIterationData` Details about the iterations in an escf calculation.
:class:`turbomoleio.output.data.EscfData` Output of an escf calculation.
:class:`turbomoleio.output.data.StatptData` Initial information provided in statpt.
:class:`turbomoleio.output.data.RelaxData` Initial information provided in relax.
:class:`turbomoleio.output.data.RelaxGradientsData` Gradient values extracted from the relax/stapt output.
:class:`turbomoleio.output.data.RelaxConvergenceData` Final information about convergence.
:class:`turbomoleio.output.data.AoforceNumericalIntegrationData` Information about the numerical
integration in aoforce.
:class:`turbomoleio.output.data.AoforceRotationalData` Analysis of rotational states in aoforce.
:class:`turbomoleio.output.data.AoforceVibrationalData` Analysis of vibrational states in aoforce.
:class:`turbomoleio.output.data.MP2Data` Information about an MP2 calculation.
:class:`turbomoleio.output.data.MP2Results` Results from an MP2 calculation.
:class:`turbomoleio.output.data.PeriodicityData` Information about the periodicity of the calculation.
File data objects
^^^^^^^^^^^^^^^^^
:class:`turbomoleio.output.files.ScfOutput` Data from a dscf, ridft or riper calculations.
:class:`turbomoleio.output.files.EscfOutput` Data from an escf calculation (contains some data
about the previous scf calculation).
:class:`turbomoleio.output.files.EscfOnlyOutput` Data from an escf calculation (data only related to escf).
:class:`turbomoleio.output.files.GradOutput` Data from a grad or rdgrad calculation.
:class:`turbomoleio.output.files.EgradOutput` Data from an egrad calculation (contains both the
"grad"-related and "escf"-related data).
:class:`turbomoleio.output.files.RelaxOutput` Data from a relax calculation.
:class:`turbomoleio.output.files.StatptOutput` Data from a statpt calculation.
:class:`turbomoleio.output.files.AoforceOutput` Data from an aoforce calculation.
:class:`turbomoleio.output.files.Ricc2Output` Data from an Ri CC2 calculation.
:class:`turbomoleio.output.files.MP2Output` Data from an MP2 calculation (mpgrad, ricc2 or pnoccsd).
:class:`turbomoleio.output.files.JobexOutput` Data from the last step of a jobex calculation.
The States object
=================
The list of eigenstates of a molecule with their occupation can normally be extracted from the
TURBOMOLE outputs using the ``eiger`` script. In turbomoleio a similar code has been implemented
that uses the content of the different data groups in the ``control`` file (possibly accessing
subfiles linked in ``control``) to build an instance of :class:`turbomoleio.output.states.States`.
This is a subclass of :py:class:`collections.abc.MutableSequence` containing a list of
:class:`turbomoleio.output.states.State`. The states are sorted in ascending order based on the
eigenvalues and each ``State`` contains the information about the eigenvalue, the irreducible
representation, the index associated with the irreducible representation, the occupation and,
for UHF calculations, the spin.
Similarly to the other output parsing objects it can be instantiated using the ``from_file``
class method
.. code-block:: python
states = States.from_file("control")
Here, in general, the control file alone is not enough and the code needs to access
the ``$scfmo``, ``$uhfmo_alpha`` and ``$uhfmo_beta`` datagroups, that are usually stored
in external files, to read the eigenvalues. If the files are not available the generation
of the object fails.
The ``States`` object offer various methods to extract further information from the list
of states, like the ``gap`` or the ``has_hole`` to determine if in the list of eigenstates
there is one empty state with energy lower than some occupied state. You can check the
rest of the API to have a list of the methods available.