Running define
A crucial step in running TURBOMOLE is generating the input with the define executable.
Since define is interactive, this step is a challenge when aiming at automating
calculations with TURBOMOLE.
This problem is solved by the turbomoleio.input.define.DefineRunner object, which automates
the interaction with define using the pexpect module. pexpect spawns child applications and
responds to expected patterns in their output. DefineRunner can be used inside complex
workflows, but also to speed up the manual execution of define.
Based on the parameters provided as a dictionary, DefineRunner runs define and
navigates through the menus, setting the different options. It also stores the full history of the
submitted commands in the sent_commands attribute, and comments about the commands and the replies
expected from define in the history attribute.
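The expect/send mechanism described above can be illustrated with a toy loop. This is a conceptual sketch, not turbomoleio's implementation: instead of spawning define through pexpect, it scans a list of simulated output lines, and the function name run_dialogue and the prompts used with it are made up for illustration. It only mirrors the idea of waiting for an expected pattern before sending each command, and of recording sent_commands and history.

```python
import re


def run_dialogue(output_lines, steps):
    """Toy expect/send loop (conceptual sketch, not turbomoleio code).

    output_lines: simulated text printed by the child program, in order.
    steps: list of (expected_regex, command, comment) tuples.
    Returns (sent_commands, history), mirroring the attribute names above.
    """
    sent_commands, history = [], []
    remaining = iter(output_lines)
    for expected, command, comment in steps:
        # Consume output until the expected pattern shows up. With pexpect
        # this wait is bounded by a timeout instead of the end of a list.
        for line in remaining:
            if re.search(expected, line):
                break
        else:
            # Analogous in spirit to DefineExpectError: the text never came.
            raise RuntimeError(f"expected {expected!r} was never printed")
        sent_commands.append(command)
        history.append(f"{comment}: expected {expected!r}, sent {command!r}")
    return sent_commands, history
```

For example, run_dialogue(["welcome", "enter a title", "basis menu"], [("title", "water", "set title"), ("basis", "*", "accept basis")]) returns the two sent commands together with one history entry per step.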
In case of failure an exception is raised: either turbomoleio.input.define.DefineError
or one of its subclasses, depending on the type of error encountered. In particular, note that
for each submitted command the code expects a specific string from define; if this string
does not appear within the number of seconds specified by the timeout argument, a
turbomoleio.input.define.DefineExpectError is raised. This can happen for two main reasons.
The most common is that the combination of the files already available and/or of the selected
options brought define into a situation that was not foreseen. Less often, the system is so
unresponsive that it takes too long to produce the output.
A timeout of 60-120 seconds is usually a good compromise. If you encounter this failure and cannot
figure out its cause, you may want to increase the timeout to check whether a system problem is
responsible.
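The diagnostic suggested above (increasing the timeout) can be automated with a small retry helper. This is a hedged sketch: run_with_growing_timeout is a hypothetical helper, not part of turbomoleio, and it assumes the DefineRunner constructor accepts the timeout argument mentioned above, so that make_runner could be something like lambda t: DefineRunner(parameters=dp, timeout=t) with retry_on set to DefineExpectError.

```python
def run_with_growing_timeout(make_runner, timeouts, retry_on):
    """Try a define run with progressively larger timeouts (hypothetical helper).

    make_runner: callable taking a timeout in seconds and returning an object
        with a run_full() method (assumed: a DefineRunner built with that timeout).
    timeouts: increasing sequence of timeouts to try, e.g. [60, 120, 240].
    retry_on: exception class that triggers a retry (e.g. DefineExpectError).
    Returns (timeout_used, value returned by run_full()).
    """
    if not timeouts:
        raise ValueError("at least one timeout is needed")
    last_error = None
    for timeout in timeouts:
        try:
            return timeout, make_runner(timeout).run_full()
        except retry_on as exc:
            # Remember the failure and try again with a longer timeout.
            last_error = exc
    # Every attempt failed: re-raise the last error for inspection.
    raise last_error
```

Since a failed run_full() may leave partial inputs (such as a control file) behind, and run_full() expects only the coord file to be present, in practice each attempt should start from a clean directory. If the run only succeeds with a much larger timeout, a slow system is the likely culprit; if it always fails, the cause is more likely an unforeseen define state.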
Given an instance of DefineRunner, three methods can be called, each running define in a
different way:
run_full() generates the whole set of inputs by going through all the menus in define. It expects only the coord file to be present in the working directory; if a control file is already present, this will likely lead to a DefineError.
run_update_internal_coords() runs define only to update the internal coordinates of a set of previously generated inputs. In this case the control file should already be present in the working directory. The execution is interrupted after the molecular geometry menu, so define ends abnormally (by sending the qq command).
run_generate_mo_files() regenerates the molecular orbital files in a folder where some inputs are already present. As in the previous case, the control file should already be present, and define is terminated abnormally after the molecular orbital menu.
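The file requirements of the three methods can be turned into a quick pre-flight check. This is a heuristic sketch based only on the requirements listed above: suggest_define_method is a hypothetical helper, not a turbomoleio function, and it ignores cases such as a coord file passed through the coord_file parameter.

```python
from pathlib import Path


def suggest_define_method(workdir="."):
    """Suggest which DefineRunner method(s) fit the files in workdir.

    run_full() expects a coord file and no control file; the other two
    methods need an existing control file.
    """
    workdir = Path(workdir)
    has_coord = (workdir / "coord").is_file()
    has_control = (workdir / "control").is_file()
    if has_control:
        # Inputs already exist: only the partial updates make sense.
        return ["run_update_internal_coords", "run_generate_mo_files"]
    if has_coord:
        return ["run_full"]
    raise FileNotFoundError(f"neither coord nor control found in {workdir}")
```

Calling it before creating the DefineRunner helps avoid the DefineError that run_full() would likely raise in a folder that already contains a control file.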
To help new users get started with DefineRunner, a small set of predefined input parameters
for the most common types of calculations has been prepared. These are stored as YAML files in the
turbomoleio/input/templates folder, but can also be easily accessed with the
turbomoleio.input.utils.get_define_template() function. A typical execution of DefineRunner
in a folder that already contains a coord file looks like this:
from turbomoleio.input.define import DefineRunner
from turbomoleio.input.utils import get_define_template
dp = get_define_template("ridft")
dr = DefineRunner(parameters=dp)
dr.run_full()
This prepares the inputs for an ridft calculation. Of course, you can tune the parameters
according to the type of calculation that you want to perform. A good approach is to prepare
a dictionary with the options that you need and store it as a YAML or JSON file, so that it
can easily be retrieved later to set up other calculations.
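The workflow suggested above (tune a template, store it, reload it later) can be sketched with the standard library. This sketch assumes that the parameters are a plain dictionary, as in the DefineRunner example above; it uses JSON because the json module ships with Python, while YAML would require an extra package such as PyYAML. The helper names customize_and_save and load_parameters are hypothetical.

```python
import copy
import json


def customize_and_save(template, overrides, path):
    """Copy a parameter dict, apply overrides and store the result as JSON."""
    params = copy.deepcopy(template)  # leave the original template untouched
    params.update(overrides)
    with open(path, "w") as f:
        json.dump(params, f, indent=2, sort_keys=True)
    return params


def load_parameters(path):
    """Read back a parameter dict stored by customize_and_save."""
    with open(path) as f:
        return json.load(f)
```

For instance, starting from dp = get_define_template("ridft"), customize_and_save(dp, {"functional": "pbe0"}, "pbe0_ridft.json") stores a variant that can later be passed to DefineRunner(parameters=load_parameters("pbe0_ridft.json")).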
In general, even though it does not cover every single scenario available in define,
DefineRunner supports a large variety of options and can produce the inputs for different types
of calculations. However, it focuses on the options that can only be set through
define. Simple data groups can be set by the user after the execution of DefineRunner with the
standard functions available to modify a data group file (e.g. the cdg function;
see Modifying a data group value and The Control object).
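To show what setting a simple data group amounts to, here is a minimal text-level sketch. It is only an illustration and not the cdg function: set_simple_data_group is a hypothetical helper that operates on the control file content as a string, replacing an existing one-line data group (and any continuation lines) or inserting a new one before $end. For real inputs, the cdg function and the Control object referenced above are the tools to use.

```python
def set_simple_data_group(control_text, name, value):
    """Set a one-line data group such as "$scfconv 7" in control-file text."""
    new_line = f"${name} {value}".rstrip()
    out = []
    skipping = False  # True while dropping lines of the old data group
    inserted = False
    for line in control_text.splitlines():
        stripped = line.strip()
        if skipping and not stripped.startswith("$"):
            continue  # continuation line of the data group being replaced
        skipping = False
        keyword = stripped.split()[0] if stripped.startswith("$") else None
        if keyword == f"${name}":
            out.append(new_line)
            skipping = inserted = True
            continue
        if keyword == "$end" and not inserted:
            out.append(new_line)  # new data group goes just before $end
            inserted = True
        out.append(line)
    if not inserted:
        out.append(new_line)
    return "\n".join(out) + "\n"
```

For example, applied to a control text containing "$scfconv 6", set_simple_data_group(text, "scfconv", 7) replaces that line with "$scfconv 7" and leaves the rest of the file unchanged.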
The DefineRunner parameters
This section provides a full list of all the parameters that can be passed to the parameters
argument when creating an instance of DefineRunner.
title (str): title of the job passed to define.
metric (int): sets the $metric keyword in the control file before running define.
copymo (str): path to a directory containing the mos, alpha and beta files that will be copied to the current working directory at the end of define. The control file should be present in that folder as well, since it will be used to extract the symmetry, overriding the sym and desy values.
sym (str): the value is used to force a specific symmetry with the sy command in the molecular geometry menu.
sym_eps (float): if present, it is added to the sym option as a tolerance for the sy command in the molecular geometry menu.
desy (bool): lets define determine the symmetry of the molecule in the molecular geometry menu. Only used if sym is not defined.
desy_eps (float): if present, it is added as a tolerance for the desy command in the molecular geometry menu.
ired (bool): generates the internal coordinates with the ired command.
usemo (str): path to a control file or to a directory containing it. The file will be passed to define with the use command.
ex_method (str): method used to calculate the excited states. Available options: rpa, cis, dynpol, polly.
ex_multi (str): multiplicity of the excited states. Available options: singlet, triplet. It is only applied to closed-shell calculations, to distinguish between rpas/ciss and rpat/cist. The type of calculation is determined according to the options that define makes available based on the outcome of the EHT. The value of ex_multi is ignored if the calculation is UHF or if ex_method is dynpol or polly.
ex_all_states (int): the number of excited states for all the irreps present in the system according to define. Since define does not accept values larger than the number of available states, the code checks the available states and, for each irrep, sets the number to the minimum between ex_all_states and the number of states actually available.
ex_irrep_states (dict): a dictionary of the form {"irrep": num_exc_states}, with the key representing the irrep and the value the number of excited states (e.g. {"a1": 10}). It overrides the value of ex_all_states for the irreps listed in this dictionary.
ex_mp2 (dict): excited states for mp2 calculations. Should be a dict of the form {"irrep": [multiplicity_int, num_exc_states]}, with the key representing the irrep and the value a list containing an int for the multiplicity (1=singlet, 2=doublet, 3=triplet) and one for the number of excited states (e.g. {"a1": [1, 10]}). This will be provided to the exci command as irrep=a1 multiplicity=1 nexc=10.
ex_frequency (float): value of the frequency for the calculation of dynamic polarisabilities. Defaults to 589 nm if not defined and excited states are required.
ex_frequency_unit (str): unit of the frequency for the calculation of dynamic polarisabilities. Default: nm.
ex_exopt (int): explicitly enforces treatment of the n-th state. Set directly in the control file with the $exopt keyword using cdg while define is being executed.
method (str): calculation method. Available options: dft, hf, mp2, adc(2), ccsd(t) (or ccsdt).
mp2energy (bool): if True, the calculation is limited to the energy (i.e. in the ricc2 menu only the method is provided). Otherwise the geoopt method option is given to define.
basis (str): the basis used for all the elements in the system (e.g. def2-SV(P), corresponding to b all def2-SV(P) in define).
basis_atom (dict): a dictionary of the form {"atom": "basis"} defining the basis for specific atoms, where the key can be any string accepted by define (e.g. 1,2,4-6,c). If basis is also defined, it is first used to set the basis for all the atoms, and basis_atom then overrides it for the specified atoms.
charge (int): charge defined in the extended Hueckel guess.
unpaired_electrons (int): number of unpaired electrons for UHF.
rijk (bool): activates rijk calculations.
ri (bool): activates ri calculations, only if rijk is disabled.
marij (bool): activates marij calculations, only if rijk is disabled and ri is enabled.
functional (str): functional for DFT. Any value available in TURBOMOLE is accepted.
gridsize (str): size of the grid in DFT calculations.
maxcor (float): memory set in mp2 calculations.
use_f12 (bool): enables f12 calculations for the methods mp2, adc(2) and ccsd(t).
use_f12* (bool): enables f12* calculations for the method ccsd(t). Adds the line ccsdapprox ccsd(f12*) to the $rir12 data group. Requires use_f12.
maxiter (int): maximum number of iterations for mp2 calculations.
scfiterlimit (int): maximum number of SCF iterations.
scfconv (int): accuracy of the SCF energy. A number in the range 4-9.
coord_file (str): path to the coord file. By default the coord file in the working folder is used.
disp (str): activates the dispersion correction corresponding to the provided value. Accepted values are DFT-D1, DFT-D2, DFT-D3, DFT-D3 BJ. N.B. these values are set directly in the control file, not through define.
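The interplay between ex_all_states and ex_irrep_states, and the mapping of ex_mp2 to the exci input, can be sketched as follows. This is an illustrative sketch of the rules stated in the list above, not turbomoleio's code: the helper names are hypothetical, the available states per irrep are assumed to be known (DefineRunner reads them from define), and leaving the ex_irrep_states values uncapped is an assumption.

```python
def resolve_excited_states(available, ex_all_states=None, ex_irrep_states=None):
    """Number of excited states requested for each irrep (illustrative sketch).

    available: {irrep: number of states that define reports as available}.
    ex_all_states is capped at the available number for each irrep; entries
    in ex_irrep_states then override it for the irreps they name (they are
    not capped here, which is an assumption of this sketch).
    """
    states = {}
    if ex_all_states is not None:
        states = {irrep: min(ex_all_states, n) for irrep, n in available.items()}
    states.update(ex_irrep_states or {})
    return states


def exci_strings(ex_mp2):
    """Turn {"a1": [1, 10]} into "irrep=a1 multiplicity=1 nexc=10" strings."""
    return [
        f"irrep={irrep} multiplicity={mult} nexc={nexc}"
        for irrep, (mult, nexc) in ex_mp2.items()
    ]
```

For example, with 12 states available in a1 and 3 in b2, ex_all_states=5 gives 5 states for a1 and 3 for b2, and adding ex_irrep_states={"a1": 10} raises a1 to 10.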
In addition, the following keywords are related to cosmo and are set directly in the control file
after define has completed. The functionality is activated by setting use_cosmo to True in the
parameters. This adds to the control file the $cosmo data group, containing the options below,
and a $cosmo_out data group with out.cosmo as output file. The additional options modify the
values inside the $cosmo data group.
use_cosmo (bool): if True, enables the calculation with cosmo.
epsilon (float): permittivity used for scaling of the screening charges.
nppa (int): number of basis grid points per atom.
nspa (int): number of segments per atom.
disex (float): distance threshold for A matrix elements (Angstrom).
rsolv (float): distance to outer solvent sphere for cavity construction (Angstrom).
routf (float): factor for outer cavity construction in the outlying charge correction.
cavity (str): acceptable values are "open" (leave untidy seams between atoms) and "closed" (pave intersection seams with segments).
use_old_amat (bool): if True, adds use_old_amat to the $cosmo data group, i.e. uses the A matrix setup of TURBOMOLE 5.7.
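To visualise how these options end up in the control file, the sketch below builds the corresponding text from a parameters dictionary. It is an approximation: cosmo_data_groups is a hypothetical helper, the layout follows the usual TURBOMOLE $cosmo format (key=value lines, "cavity closed", and "$cosmo_out file=out.cosmo"), and the exact text written by turbomoleio may differ.

```python
def cosmo_data_groups(params):
    """Approximate $cosmo / $cosmo_out text for the options listed above."""
    if not params.get("use_cosmo"):
        return ""  # cosmo is only added when use_cosmo is True
    lines = ["$cosmo"]
    for key in ("epsilon", "nppa", "nspa", "disex", "rsolv", "routf"):
        if key in params:
            lines.append(f" {key}={params[key]}")
    if "cavity" in params:
        if params["cavity"] not in ("open", "closed"):
            raise ValueError("cavity must be 'open' or 'closed'")
        lines.append(f" cavity {params['cavity']}")
    if params.get("use_old_amat"):
        lines.append(" use_old_amat")
    lines.append("$cosmo_out file=out.cosmo")
    return "\n".join(lines) + "\n"
```

Options that are not given are simply omitted, leaving their values to the defaults applied by TURBOMOLE.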
Validation
An experimental feature is available to validate the dictionary that you want
to use as parameters for DefineRunner. It is based on the
cerberus package and is implemented in the
turbomoleio.input.utils module.
The simplest way to use it is:
from turbomoleio.input.utils import get_define_template, validate_parameters
dp = get_define_template("dscf")
dp["use_cosmo"] = True
validate_parameters(dp)
This returns True if the passed dictionary is correct according to the defined schema
and False otherwise. Currently the function performs the following kinds of validation:
Check that all the keys of the dictionary are acceptable ones. This should prevent inserting typos in the keys.
Check that the values are of the correct type, according to the definitions above.
Validate some dependencies among the different options. For example, use_f12* can be True only if use_f12 is also True.
Validate the values of some options. For example, that method is one of the allowed values: dft, hf, mp2, adc(2), ccsd(t), ccsdt.
None of the parameters is required and thus an empty dictionary will be considered as valid.
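The kinds of checks listed above can be illustrated with a toy validator in plain Python. This is not the cerberus schema used by turbomoleio: toy_validate and TOY_SCHEMA are hypothetical and cover only a handful of keys, but they show the four kinds of validation (unknown keys, types, dependencies, allowed values) and the fact that an empty dictionary is valid.

```python
ALLOWED_METHODS = {"dft", "hf", "mp2", "adc(2)", "ccsd(t)", "ccsdt"}

# Tiny excerpt of expected types; the real schema covers every parameter.
TOY_SCHEMA = {
    "method": str,
    "functional": str,
    "basis": str,
    "ri": bool,
    "use_f12": bool,
    "use_f12*": bool,
    "use_cosmo": bool,
}


def toy_validate(params):
    """Return a list of error messages; an empty list means valid."""
    errors = []
    for key, value in params.items():
        if key not in TOY_SCHEMA:
            errors.append(f"unknown key {key!r}")  # catches typos in keys
        elif not isinstance(value, TOY_SCHEMA[key]):
            errors.append(f"{key!r} should be {TOY_SCHEMA[key].__name__}")
    if "method" in params and params["method"] not in ALLOWED_METHODS:
        errors.append(f"method {params['method']!r} not allowed")
    if params.get("use_f12*") and not params.get("use_f12"):
        errors.append("'use_f12*' requires 'use_f12'")
    return errors
```

For instance, a misspelled key such as "metod" is reported as unknown, and {"use_f12*": True} alone fails the dependency check.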
Alternatively, you can directly access turbomoleio.input.utils.define_parameters_validator,
an instance of a cerberus Validator, so that you can take full advantage of its features.
Being experimental, this feature has not been extensively tested and should be used with some
care. If the validation of one of your dictionaries fails but you are absolutely certain that
it is correct, you can probably ignore the failure. In addition, for some of the
parameters the validation of the possible values is missing. For example, the validation
only checks that the value of functional is a string, but does not verify
that it is the name of an existing functional.