The data group files

Most of the input and output files in TURBOMOLE are structured as a list of data groups separated by a $ symbol. (e.g. $coord). The initial part of the of each group usually determines the type of data with optional values that might be defined on the same line or on subsequent lines. In turbomoleio a basic object is implemented to allow the parsing and writing of generic data groups files, with a set of subclasses to interact with selected types of file that require specific functionalities (e.g. the control file).

The DataGroups object

The most basic object to interact with data group files is turbomoleio.core.datagroups.DataGroups. It keeps an internal representation of the file given by a list of data groups, i.e. of strings that should contain a single data group and start with $, stored in the dg_list attribute. Given the TURBOMOLE (lack of) conventions (i.e. the keyword can be one or more words separated by spaces and there is no complete list of the possible keywords available), the defining keyword at the beginning of the data group cannot be identified unambiguously. For this reason all the methods in the DataGroups objects and its subclasses make a distinction between the defining keyword (referred as data_group) and its value (the data_block), but this is done just by matching the value of data_group with the beginning of each string in dg_list and no validation is performed on the names provided.

When reading from an existing data group file to generate an instance of DataGroups the list of data groups will be extracted by parsing the content of the file, while cleaning it up of the superfluous parts, trying to reduce the content of the object to just the data relevant to TURBOMOLE. In particular the following steps are applied:

  • remove everything before the first dollar (“$”) sign

  • remove blank lines

  • remove spaces and tabs before the # comments

  • remove spaces and tabs before the $ signs

  • remove comments, both with a full line (i.e lines starting with #) and inline (e.g. $uhf # this is an comment)

  • remove all the $dummy data groups

  • remove all the lines after the $end data group

  • split the data groups based on the $ symbol

While it is apparently not strictly required by TURBOMOLE, in turbomoleio the presence of the final $end data group, marking the end of the data groups list, is mandatory. Parsing files with the missing keyword will result in an error. Note that, while some other checks are also performed while parsing a file, if the file is not automatically generated by TURBOMOLE, it is responsibility of the user to verify that the format is correct. In addition the full initial string is stored under the string attribute. This value however is never updated when the list of data groups is modified, so it should just be used as a reference for the initial content of a parsed file.

An instance of DataGroups can be written to a file with the to_file method that will contain only the data included in the dg_list. The object also offers a series of methods to view and modify the data groups present in the list. These are inspired from the standard scripts distributed with TURBOMOLE, but are entirely implemented in python and may have some differences in their behavior, in addition to have some more options. Note that, while the examples provided below are based on a control for simplicity, the methods can be applied to any file structured with data groups.

Accessing a data group value

The method to get the value of a data group is show_data_group with its shortcut sdg. The data_group string in input will be used to match the beginning of the elements of dg_list. Note that the leading $ symbol is optional, if missing from data_group it will be added internally before trying to match the data groups. The strict argument means that there should be an exact match with the element of the list (i.e. if the data group to be removed is followed by a space, a tab, a new line or a return). To avoid ambiguities only one value matching the data_group can be present in the list. If more than one are encountered an exception is raised. The method will return the string following the matched value of data_group. In case no matching is found instead the default value is returned.

A particular case is given by the data groups whose value is contained in another file. A typical example in the control file is for example:

$coord    file=coord

In this case show_data_group can be asked to open the file with name coord and return the content of the $coord data group there, instead of returning simply file=coord. This will happen if the show_from_subfile is set to True (the default). If such a file is not present in the current working directory the file=coord string will be returned.

Warning

The subfile used to read a datagroup is the one in the working directory where the python code is being executed. Always make sure that you are in the correct folder of the file that you intent to read.

In some cases a data group can contain different options that are being set. Consider for example the $dft data group:

$dft
   functional b-p
   gridsize   m3

You might be interested in extracting the value of the functional. In this case you can use the the show_data_group_option method (and the sdgo shortcut) providing both the data_group and the option that should be matched:

dg.show_data_group_option(data_group="dft", option="functional")

In this case the code will return `` b-p``. Since also for these kind of options the conventions are not uniform in TURBOMOLE (e.g. the name of the option and the value can be separated by a =) the support offered by this method are relatively limited. The code will just check among the lines in the $dft data group if one starts with the specified option are return the remaining string contained in the line.

Modifying a data group value

Different methods allow to change the values of the dg_list. The first considered is kill_data_group (or its shortcut kdg) that allows to remove an element from dg_list. The data_group and strict arguments have the same meaning as in show_data_group, but here all the data groups matching will be removed from the list. For example:

dg.kill_data_group("$thi", strict=False)

removes both $thize and $thime from a data groups list, while with strict=True none of them is removed.

The next operation available is to add a new datagroup with add_data_group (or its shortcut adg). This adds an element to the list given by the data_group string (with a prepended $, if not already present) joined with the data_block string. The new element will be added at the end of the list, but before the $end data group. Note that if you want to add a data group with no explicit value you need to provide an empty string as data_block. For example:

dg.add_data_group("$symmetry", "c1")
dg.add_data_group("uhf", "")

The change_data_group (and its shortcut cdg) changes the value of the data_group to the new data_block value. The method first performs a kill_data_group with strict=True and then add_data_group. The side effect is that the changed datagroup will be moved to the end of the dg_list. The data group will be added if not already present. If data_block is None only the kill_data_group will be applied. To set a data group with no explicit value use an empty string for data_block. Examples:

# change $scfiterlimit to 100
dg.change_data_group("$scfiterlimit", "100")
# remove $scfconv (equivalent to dg.kill_data_group("scfconv")
dg.change_data_group("scfconv", None)

Note

The data_block argument of add_data_group and change_data_group should be a string, i.e. if the data block is a number, it should be the string representation of that number, as shown above with “$scfiterlimit” set to “100”.

Lastly, you have a method to modify the options contained in a data group. As for the show_data_group_option method, also modify_data_group_options (shortcut: mdgo) has some limitations. The data_group will be used to match the data group, while the options argument should be a dictionary where the keys are used to identify the line that should be modified (much like show_data_group_option) but the values should contain the whole line that will replace the original one In connection with the $dft example one can run

options = {"functional": "functional pbe"}
dg.modify_data_group_options(data_group="dft", options=options)

This will modify the data group in the object to look like this:

$dft
   functional pbe
   gridsize   m3

In a similar way to change_data_group if the value is None the line will be removed, if the option has no explicit value the value in the dictionary should be set to an empty string. If no line is matching the key of the option a new line with the value will be added. If no data group is matching the data_group value the data group will be added to the list with the specified options only.

Warning

Note that these methods will act on the list in the object and not on the file from which they were parsed. If you need to update a file you should call the to_file method after modifying the list in the object.

The Control object

An important subclass of DataGroups is turbomoleio.core.control.Control. This is specifically designed to interact with the control file providing additional functionalities besides the one described for the basic data groups files. Different options are available to extract frequently accessed data groups (e.g. the get_charge method) or to set specific group of options (e.g. set_disp to specify the dispersion correction). The list of such methods is likely to increase over time, so the best option is to directly check its API. A few methods however need a particular attention.

First we can consider the energy and gradient attributes, that read the data from the $energy and $gradient data groups, respectively and generate instances of the turbomoleio.core.control.Energy and turbomoleio.core.control.Gradient objects. These will not only provide the parsed numerical values of the energies and gradients organized in lists, but also offer some methods for analysing and plotting the data. For example the evolution of the energy during the steps of a geometry optimization can be obtained with:

c = Control.from_file("control")
c.energy.plot()
energy

Another important method is get_shells that extracts the list of occupied shells divided according to the irreducible representation from the the alpha/beta/closed shells data groups and generate an instance of the turbomoleio.core.control.Shells object. This is used to generate the list of occupied states.

Finally we mention the cpc that reproduces the cpc script command available in TURBOMOLE. Note that this does not call the original script, but it is a python implementation that reproduces a similar effect.

It should also be noted that the turbomoleio.core.control module contains a set of functions that allow to directly access and modify a control file without passing explicitly through the object. The sdg, kdg, cdg, sdgo, mdgo and cpc functions read the file from the specified path and convert it to a Control object, apply the corresponding method and, if needed, directly write the modified version to the file system. Note that since these operations all use the Control object you will still have all the side effects of parsing a data group file mentioned above (e.g. removing the comments).