Running define
Generating the input through the define executable is a crucial step in running TURBOMOLE.
Since define is executed interactively, this step is challenging when aiming at automating
the calculations with TURBOMOLE.
The turbomoleio.input.define.DefineRunner object solves this problem by automating the
interaction with define through the pexpect module, which spawns child applications and responds
to expected patterns in their output. DefineRunner can be used inside complex workflows, but also
to speed up the manual execution of define.
Based on the parameters provided in input as a dictionary, DefineRunner runs define and
navigates through the menus, setting the different options. It also stores the history of the
submitted commands in the sent_commands attribute, and the comments about the commands and the
expected replies from the code in the history attribute.
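The expect-and-respond loop described above can be illustrated with a toy, in-memory sketch. Nothing here uses pexpect or the real define: fake_menu, ScriptedDialogue and the prompt strings are hypothetical stand-ins, and only the attribute names sent_commands and history are borrowed from DefineRunner.

```python
import re


def fake_menu():
    """Toy stand-in for an interactive program (not define).

    It yields a prompt and receives the reply through send().
    """
    title = yield "INPUT TITLE:"
    symmetry = yield "SYMMETRY GROUP:"
    yield f"DONE title={title} symmetry={symmetry}"


class ScriptedDialogue:
    """Checks the current output for an expected pattern, then sends a reply."""

    def __init__(self, program):
        self.program = program
        self.sent_commands = []  # every reply sent, in order
        self.history = []  # comment plus the pattern that was expected
        self.output = next(program)  # first prompt of the program

    def expect_and_send(self, pattern, reply, comment):
        if not re.search(pattern, self.output):
            raise RuntimeError(f"expected {pattern!r}, got {self.output!r}")
        self.history.append(f"{comment} (expected {pattern!r})")
        self.sent_commands.append(reply)
        self.output = self.program.send(reply)
        return self.output


dialogue = ScriptedDialogue(fake_menu())
dialogue.expect_and_send(r"TITLE", "water", "set the title")
final_output = dialogue.expect_and_send(r"SYMMETRY", "c2v", "set the symmetry")
print(dialogue.sent_commands)
print(final_output)
```

A real driver additionally has to wait for the output to arrive, which is where a timeout comes into play; the toy program above always answers immediately.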
In case of failure an exception is raised: either turbomoleio.input.define.DefineError
or one of its subclasses, depending on the type of error encountered. In particular, for each
submitted command the code expects a specific string coming from define. If this string does not
arrive within the number of seconds specified by the timeout argument, a
turbomoleio.input.define.DefineExpectError is raised. This happens mainly for two reasons.
The most common is that the combination of the files already available and the options selected
has brought the execution of define into a situation that was not foreseen. The other is a very
low responsiveness of the system, which takes too long to provide the output.
A timeout of 60-120 seconds is usually a good compromise, but if you encounter this failure and
cannot identify its cause, you can check whether there is a system problem by increasing the
value of the timeout.
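One way to check whether a failure is only due to slow responses is to retry with increasing timeouts. The sketch below is deliberately library-agnostic: run_with_timeouts is a hypothetical helper, and FakeExpectError and SlowRunner are stand-ins for illustration only. With turbomoleio one would presumably pass a factory such as lambda t: DefineRunner(parameters=dp, timeout=t) together with DefineExpectError; that timeout is a constructor keyword is an assumption here, since the text above only mentions a timeout argument.

```python
def run_with_timeouts(make_runner, timeouts, timeout_error):
    """Call run_full() with each timeout in turn; return the first one that works."""
    last_error = None
    for timeout in timeouts:
        runner = make_runner(timeout)
        try:
            runner.run_full()
            return timeout
        except timeout_error as exc:
            # The expected reply did not arrive in time: try a longer timeout.
            last_error = exc
    if last_error is None:
        raise ValueError("timeouts must not be empty")
    raise last_error


# Stand-ins for illustration only (these are not turbomoleio classes).
class FakeExpectError(Exception):
    pass


class SlowRunner:
    def __init__(self, timeout):
        self.timeout = timeout

    def run_full(self):
        if self.timeout < 120:  # pretend the system needs 120 s to answer
            raise FakeExpectError(f"no reply within {self.timeout} s")


used_timeout = run_with_timeouts(SlowRunner, [60, 120, 300], FakeExpectError)
print(used_timeout)
```

If the error persists even with a generous timeout, the more likely cause is the first one listed above: define has reached a state that was not foreseen. A failed attempt may also leave files behind, so each retry should start from a clean working directory.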
Given an instance of DefineRunner there are three methods that can be called and will result in
different kinds of execution of define:
run_full() will generate the whole set of inputs running all the menus in define. It expects to only have the coord file in the working directory. If a control file is already present this will likely lead to a DefineError.
run_update_internal_coords() runs define just to update the internal coordinates of a set of inputs previously generated. In this case the control file should already be present in the working directory. The execution is interrupted after going through the molecular geometry menu, so define ends abnormally (by sending the qq command).
run_generate_mo_files() is used to regenerate the molecular orbitals files in a folder where some inputs are already present. As for the previous case the control file should already be present and define will be terminated abnormally after the molecular orbital menu.
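The file requirements of the three methods can be checked before starting define. The helper below is a hypothetical sketch, not part of turbomoleio: check_workdir and NEEDS_CONTROL are names introduced here, and the check assumes that the default coord file in the working directory is used.

```python
from pathlib import Path

# File preconditions as described above: does the method need an existing control file?
NEEDS_CONTROL = {
    "run_full": False,
    "run_update_internal_coords": True,
    "run_generate_mo_files": True,
}


def check_workdir(workdir, method_name):
    """Return a list of problems found in workdir for the given method (empty if none)."""
    workdir = Path(workdir)
    has_control = (workdir / "control").is_file()
    problems = []
    if NEEDS_CONTROL[method_name] and not has_control:
        problems.append("control file is missing")
    if method_name == "run_full":
        if not (workdir / "coord").is_file():
            problems.append("coord file is missing")
        if has_control:
            problems.append("control file already present")
    return problems
```

Calling check_workdir(".", "run_full") before dr.run_full() would flag a leftover control file instead of letting define run into an error.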
To help new users get started with DefineRunner, a small set of predefined input parameters
for the most common types of calculations has been prepared. These are stored as YAML files in the
turbomoleio/input/templates folder, but can also be easily accessed with the
turbomoleio.input.utils.get_define_template() function. A typical execution of DefineRunner
in a folder already containing a coord file can be done in this way:
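from turbomoleio.input.define import DefineRunner
from turbomoleio.input.utils import get_define_template

# Load the predefined parameters for an ridft calculation
dp = get_define_template("ridft")
# Run define through all its menus, starting from the coord file in the working directory
dr = DefineRunner(parameters=dp)
dr.run_full()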
This will prepare the inputs for an ridft calculation. Of course you can tune the parameters
according to the type of calculation that you want to perform. A good way of proceeding is
to prepare the dictionary with the options that you need and store it as a YAML or JSON file, so
that it can be easily retrieved afterwards to generate other calculations.
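Storing and retrieving such a dictionary needs nothing beyond the standard library when JSON is used. In the sketch below the keys are DefineRunner parameters, while the values and the file name my_ridft.json are purely illustrative; a temporary directory is used only to keep the example self-contained.

```python
import json
import tempfile
from pathlib import Path

# Illustrative values only; the keys are DefineRunner parameters.
parameters = {
    "title": "water ridft",
    "method": "dft",
    "functional": "b-p",
    "basis": "def2-SV(P)",
    "ri": True,
    "charge": 0,
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_ridft.json"  # hypothetical file name
    path.write_text(json.dumps(parameters, indent=2))
    # Later, possibly in another script: read the dictionary back.
    reloaded = json.loads(path.read_text())

print(reloaded == parameters)
```

The reloaded dictionary can then be passed on as DefineRunner(parameters=reloaded).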
In general, even though it does not cover every single scenario available in define,
DefineRunner supports a large variety of options and can produce the inputs for different types
of calculations. It is, however, focused on providing options for those inputs that can only be
set through define. Simple data groups can be set by the user after the execution of DefineRunner
with the standard functions available to modify a data group file (e.g. with the cdg function;
see Modifying a data group value and The Control object).
The DefineRunner parameters
In this section we provide a full list of all the parameters that can be passed to the parameters
argument when creating an instance of DefineRunner.
title (str): title of the job passed to define.
metric (int): sets the $metric keyword in the control file before running define.
copymo (str): path to a directory containing the mos, alpha and beta files that will be copied in the current working directory at the end of define. The control file should be present in the folder as well, since it will be used to extract the value of the symmetry, overriding the sym and desy values.
sym (str): the value will be passed to force a specific symmetry with the sy command in the molecular geometry menu.
sym_eps (float): if present will be added to the sym option as a tolerance for the sy command in the molecular geometry menu.
desy (bool): the system will determine the symmetry of the molecule in the molecular geometry menu. Only used if sym is not defined.
desy_eps (float): if present will be added as a tolerance for the desy command in the molecular geometry menu.
ired (bool): generates the internal coordinates with the ired command.
usemo (str): path to a control file or to a directory containing it. The file will be passed to define with the use command.
ex_method (str): method used to calculate the excited states. Available options: rpa, cis, dynpol, polly.
ex_multi (str): multiplicity of the excited states. Available options: singlet, triplet. This will only be applied for closed shell calculations, to distinguish between rpas/ciss and rpat/cist. The type of calculation will be determined according to the options available for the calculation, based on the outcome of the EHT. The value of ex_multi will be ignored if the calculation is UHF or if ex_method is dynpol or polly.
ex_all_states (int): the number of excited states for all the irreps present in the system according to define. Since define does not accept values larger than the number of states available, the code will check all the available states and set the number to the minimum between ex_all_states and the number of states actually available for each specific irrep.
ex_irrep_states (dict): a dictionary of the form {"irrep": num_exc_states}, with the key representing the irrep and the value the number of excited states (e.g. {"a1": 10}). It will override the value of ex_all_states for the specific irreps mentioned in this dictionary.
ex_mp2 (dict): excited states for the mp2 calculations. Should be a dict of the form {"irrep": [multiplicity_int, num_exc_states]}, with the key representing the irrep and the value a list with an int representing the multiplicity (1=singlet, 2=doublet, 3=triplet) and one representing the number of excited states (e.g. {"a1": [1, 10]}). This will be provided to the exci command with irrep=a1 multiplicity=1 nexc=10.
ex_frequency (float): value of the frequency for the calculation of dynamic polarisabilities. Default 589 nm if not defined and excited states are required.
ex_frequency_unit (str): units of the frequency for the calculation of dynamic polarisabilities. Default nm.
ex_exopt (int): explicitly enforces treatment of the n-th state. Set directly in the control file with the $exopt keyword using cdg while define is being executed.
method (str): calculation method. Available options: dft, hf, mp2, adc(2), ccsd(t) (or ccsdt).
mp2energy (bool): if True the calculations will be limited to the energy (i.e. in the ricc2 menu only the method will be provided). Otherwise the geoopt method option will be given to define.
basis (str): the basis that will be used for all the elements in the system (e.g. b all def2-SV(P)).
basis_atom (dict): a dictionary of the type {"atom": "basis"} defining the basis for specific atoms, where the key of the dictionary can be any string accepted by define (e.g. 1,2,4-6, c). If basis is also defined, it will be used to set the basis for all the atoms and then basis_atom will override specific atoms.
charge (int): charge defined in the extended Hueckel guess.
unpaired_electrons (int): number of unpaired electrons for UHF.
rijk (bool): activates rijk calculation.
ri (bool): activates ri calculation, only if rijk is disabled.
marij (bool): activates marij calculation, only if rijk is disabled and ri is enabled.
functional (str): functional for DFT. All available values in TURBOMOLE.
gridsize (str): size of the grid in DFT calculation.
maxcor (float): memory set in mp2 calculations.
use_f12 (bool): enables f12 calculation for methods mp2, adc(2) and ccsd(t).
use_f12* (bool): enables f12* calculation for method ccsd(t). Adds the line ccsdapprox ccsd(f12*) to the $rir12 data group. Requires use_f12.
maxiter (int): maximum number of iterations for mp2 calculations.
scfiterlimit (int): maximum number of scf iterations.
scfconv (int): accuracy of scf energy. A number in the range 4-9.
coord_file (str): path to the coord file. By default uses the coord file in the working folder.
disp (str): activates dispersion correction according to the provided value. Accepted values are DFT-D1, DFT-D2, DFT-D3, DFT-D3 BJ. N.B. the values will be set directly on the control file, not using define.
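The interplay between ex_all_states and ex_irrep_states described in the list above can be sketched as a small function. This is an illustration of the stated rule, not the library's code: resolve_excited_states and the counts in available are made up, and since the text does not say whether the per-irrep overrides are also limited to the available states, the sketch leaves them untouched.

```python
def resolve_excited_states(available, ex_all_states=None, ex_irrep_states=None):
    """Return {irrep: number of excited states to request}.

    available maps each irrep to the number of states that can be requested for it.
    """
    requested = {}
    if ex_all_states is not None:
        for irrep, n_available in available.items():
            # Never ask for more states than the irrep offers.
            requested[irrep] = min(ex_all_states, n_available)
    # Per-irrep values take precedence over ex_all_states.
    requested.update(ex_irrep_states or {})
    return requested


available = {"a1": 12, "a2": 3, "b1": 7}  # made-up counts
states = resolve_excited_states(available, ex_all_states=10, ex_irrep_states={"b1": 2})
print(states)
```

Here a1 is limited by ex_all_states, a2 by the number of available states, and b1 by its explicit entry in ex_irrep_states.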
In addition, the following keywords are related to cosmo and are set directly in the control file
after define has completed. The functionality is activated by setting use_cosmo in the parameters.
This adds the $cosmo data group to the control file with the following options, plus the
$cosmo_out = out.cosmo data group. The additional options will modify the values inside the
$cosmo data group.
use_cosmo (bool): if True enables the calculation with cosmo.
epsilon (float): permittivity used for scaling of the screening charges.
nppa (int): number of basis grid points per atom.
nspa (int): number of segments per atom.
disex (float): distance threshold for A matrix elements (Angstrom).
rsolv (float): distance to outer solvent sphere for cavity construction (Angstrom).
routf (float): factor for outer cavity construction in the outlying charge correction.
cavity (str): acceptable values are “open” (leave untidy seams between atoms) and “closed” (pave intersection seams with segments).
use_old_amat (bool): if True adds use_old_amat to the $cosmo data group, i.e. uses the A matrix setup of TURBOMOLE 5.7.
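As a rough illustration of how use_cosmo gates the other keywords, the sketch below picks the cosmo-related entries out of a parameter dictionary. cosmo_options and COSMO_KEYS are hypothetical names introduced here, the values are arbitrary, and treating the options as ignored when use_cosmo is not set is an interpretation of the text above rather than documented behaviour.

```python
# The cosmo-related keywords from the list above, except the use_cosmo switch itself.
COSMO_KEYS = ("epsilon", "nppa", "nspa", "disex", "rsolv", "routf", "cavity", "use_old_amat")


def cosmo_options(parameters):
    """Return the cosmo options found in parameters, or {} if cosmo is not enabled."""
    if not parameters.get("use_cosmo"):
        return {}
    return {key: parameters[key] for key in COSMO_KEYS if key in parameters}


params = {"method": "dft", "use_cosmo": True, "epsilon": 78.4, "cavity": "closed"}
print(cosmo_options(params))
```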
Validation
An experimental feature is available to validate the correctness of the dictionary that you want
to use as parameters for DefineRunner. It is based on the cerberus package and is implemented in
the turbomoleio.input.utils module.
The simplest way to use it is:
from turbomoleio.input.utils import get_define_template, validate_parameters

# Start from the dscf template and switch on cosmo
dp = get_define_template("dscf")
dp["use_cosmo"] = True
# Check the dictionary against the schema
validate_parameters(dp)
This will return True if the passed dictionary is correct according to the defined schema
and False otherwise. At the moment the function provides the following kinds of validation:
Check that all the keys of the dictionary are acceptable ones. This should prevent typos in the keys.
Check that the values are of the correct type, according to the definitions above.
Validate some dependencies among the different options. For example, use_f12* can be True only if use_f12 is also True.
Validate the values of some options. For example, that method is one of the allowed values: dft, hf, mp2, adc(2), ccsd(t), ccsdt.
None of the parameters is required and thus an empty dictionary will be considered as valid.
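The four kinds of checks can be sketched in plain Python. This is not how turbomoleio implements them (it relies on cerberus): SCHEMA covers only a handful of parameters, and validate_sketch and the requires_true rule are names invented for this illustration.

```python
# A hand-written subset of the schema, for illustration only.
SCHEMA = {
    "method": {"type": str, "allowed": {"dft", "hf", "mp2", "adc(2)", "ccsd(t)", "ccsdt"}},
    "functional": {"type": str},
    "charge": {"type": int},
    "use_f12": {"type": bool},
    "use_f12*": {"type": bool, "requires_true": "use_f12"},
}


def validate_sketch(parameters):
    """Return a list of error messages; an empty list means the dictionary is valid."""
    errors = []
    for key, value in parameters.items():
        rule = SCHEMA.get(key)
        if rule is None:  # 1. unknown key, e.g. a typo
            errors.append(f"unknown key: {key}")
            continue
        if type(value) is not rule["type"]:  # 2. wrong type
            errors.append(f"wrong type for {key}")
            continue
        if "allowed" in rule and value not in rule["allowed"]:  # 4. allowed values
            errors.append(f"value not allowed for {key}: {value}")
        dependency = rule.get("requires_true")  # 3. dependency on another option
        if dependency and value is True and parameters.get(dependency) is not True:
            errors.append(f"{key} requires {dependency}")
    return errors


print(validate_sketch({}))
print(validate_sketch({"metod": "dft", "use_f12*": True}))
```

Unlike validate_parameters, which returns a boolean, the sketch returns the list of problems so that the failing check is visible.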
Alternatively, you can directly access turbomoleio.input.utils.define_parameters_validator,
which is an instance of a cerberus Validator, so that you can take full advantage of its features.
Being experimental, this feature has not been extensively tested and should be used with some
care. If the validation of one of your dictionaries fails but you are certain that the dictionary
is correct, you can probably ignore the failure. In addition, for some of the parameters a
validation of the possible values is missing. For example, the validation only checks that the
value of functional is a string; it does not verify that the value is the name of an existing
functional.