Cif File Reader

CIF

The Crystallographic Information File (CIF) format is a text-based format used primarily in crystallography.

class garnett.reader.CifFileReader(precision=None, tolerance=0.001)

CIF-file reader for the Glotzer Group, University of Michigan. Requires the PyCifRW package to parse CIF files. Authors: Matthew Spellings.

The .cif extension is also used by the unrelated Caltech Intermediate Form, a text file format that describes the layout of integrated circuits (chips). Chip features are far too minuscule to create or alter by hand, so the layout is expressed as CIF instructions instead. The format is now widely used.

The iotbx.file_reader module is intended to provide a single entry point for reading most common crystallographic file formats. This allows the programmer to use the underlying input functions without needing to know the specific APIs in detail, although the resulting objects will still be format-specific. It is also designed to support automatic file type determination, first by guessing the format based on the file extension, then by trying a succession of input methods until one finishes without an error. This facility is used both for processing command-line arguments (especially via the iotbx.phil extensions), and for handling file input in the Phenix GUI.
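The two-stage detection described above (extension guess first, then a succession of parsers) can be sketched in plain Python. This is only an illustration of the idea, not the iotbx implementation: the toy parsers, the PARSERS and EXTENSIONS tables, and the detect_and_parse name are all assumptions made for this example.

```python
import os


def parse_pdb(text):
    """Toy PDB parser: accept text with at least one ATOM/HETATM record."""
    atoms = [ln for ln in text.splitlines() if ln.startswith(("ATOM", "HETATM"))]
    if not atoms:
        raise ValueError("no ATOM/HETATM records")
    return atoms


def parse_cif(text):
    """Toy CIF parser: accept text that contains a data_ block header."""
    blocks = [ln.strip() for ln in text.splitlines() if ln.strip().startswith("data_")]
    if not blocks:
        raise ValueError("no data_ block")
    return blocks


# Hypothetical tables; the real module supports many more formats.
PARSERS = {"pdb": parse_pdb, "cif": parse_cif}
EXTENSIONS = {".pdb": "pdb", ".ent": "pdb", ".cif": "cif", ".mmcif": "cif"}


def detect_and_parse(file_name, text):
    """Return (file_type, parsed_data) from the first parser that succeeds."""
    guess = EXTENSIONS.get(os.path.splitext(file_name)[1].lower())
    # Try the format implied by the extension first, then all remaining ones.
    order = ([guess] if guess else []) + [t for t in PARSERS if t != guess]
    for file_type in order:
        try:
            return file_type, PARSERS[file_type](text)
        except ValueError:
            continue  # this parser rejected the data; try the next one
    raise ValueError("unrecognized format: %s" % file_name)


# A CIF-like payload with a misleading .pdb extension is still recognized,
# at the cost of one failed parsing attempt.
print(detect_and_parse("model.pdb", "data_example\n_cell_length_a 10.0\n")[0])
```

The fallback loop is what makes a wrong or missing extension more expensive than an explicit format choice: every failed attempt is a wasted pass over the file.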

In the simplest case, reading a file requires only a single method:

Note that if the extension does not imply a particular format to try first, or if parsing using the appropriate module fails due to corrupted file data, this may be less efficient than explicitly specifying the file type, and should be used only when the format is not known in advance. You can alternatively specify which input module to use:

This will skip the automatic format detection and only try the specified input method. Several options are available for error handling; the default behavior when force_type is set is to pass through any exceptions encountered when calling the underlying input method:

This is fine for internal use when an unexpected file parsing error is likely to be a bug in the code, but less suitable when processing user input. Alternatively, a libtbx.utils.Sorry exception may be raised instead:

For PDB, MTZ, and CIF files (the most commonly used formats in macromolecular crystallography), it is also possible to get similar behavior by treating the file extension as an implicit replacement for force_type:
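As a rough illustration of the options discussed above, the sketch below maps each behavior to the corresponding any_file keyword argument. The keyword names are taken from the any_file signature in the API documentation; the helpers any_file_options and read_input and their flag names are hypothetical, and the iotbx import is deferred so the snippet loads even where cctbx is not installed.

```python
def any_file_options(force_type=None, from_user=False, check_extension=False):
    """Build keyword arguments for iotbx.file_reader.any_file.

    Hypothetical helper: only the keyword names come from the documented
    any_file signature.
    """
    options = {}
    if force_type is not None:
        # Skip automatic detection and try only this input method.
        options["force_type"] = force_type
    if from_user:
        # Report parsing failures as a Sorry exception instead of passing
        # through whatever the underlying parser raised.
        options["raise_sorry_if_errors"] = True
    if check_extension:
        # Raise a Sorry exception if the extension does not match the
        # parsed file type.
        options["raise_sorry_if_not_expected_format"] = True
    return options


def read_input(file_name, **flags):
    """Read file_name with any_file (requires cctbx to be installed)."""
    from iotbx.file_reader import any_file  # deferred import

    return any_file(file_name, **any_file_options(**flags))


# Internal use: let parser exceptions propagate unchanged.
print(any_file_options(force_type="pdb"))
# User-supplied input: prefer a Sorry exception on failure.
print(any_file_options(force_type="pdb", from_user=True))
```

With cctbx available, a call such as read_input("model.pdb", force_type="pdb", from_user=True) would return the any_file_input object described below; the file name here is a placeholder.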

The allowed file types are specified in the module:

However, in most cases only a subset of these will be tried automatically.

API documentation

exception iotbx.file_reader.FormatError

Bases: exceptions.Sorry

iotbx.file_reader.any_file(file_name, get_processed_file=False, valid_types=['pdb', 'hkl', 'cif', 'pkl', 'seq', 'phil', 'aln', 'txt', 'xplor_map', 'ccp4_map'], allow_directories=False, force_type=None, input_class=None, raise_sorry_if_errors=False, raise_sorry_if_not_expected_format=False)

Main input method, wrapper for any_file_input class.

Parameters:
  • file_name – path to file (relative or absolute)
  • get_processed_file – TODO
  • valid_types – file types to consider
  • allow_directories – process directory if given as file_name
  • force_type – read as this format, don’t try any others
  • input_class – optional substitute for any_file_input, with additional parsers
  • raise_sorry_if_errors – raise a Sorry exception if parsing fails (used with force_type)
  • raise_sorry_if_not_expected_format – raise a Sorry exception if the file extension does not match the parsed file type
Returns:

any_file_input object, or an instance of the input_class param

iotbx.file_reader.any_file_fast(file_name, get_processed_file=False, valid_types=['pdb', 'hkl', 'cif', 'pkl', 'seq', 'phil', 'aln', 'txt', 'xplor_map', 'ccp4_map'], allow_directories=False, force_type=None, input_class=None)

Mimics any_file, but without parsing: it guesses the file type from the extension instead. For most output files produced by cctbx/phenix this is relatively safe; for files of unknown provenance it is less reliable.

class iotbx.file_reader.any_file_fast_input(file_name, valid_types)

Bases: object

class iotbx.file_reader.any_file_input(file_name, get_processed_file, valid_types, force_type, raise_sorry_if_errors=False, raise_sorry_if_not_expected_format=False)

Bases: object

Container for file data of any supported type. Usually obtained via the any_file() function rather than being instantiated directly.

Attributes

file_content – Return the underlying format-specific object containing file data.
file_name
file_object – Synonym for file_content()
file_server – For reflection files only, returns an iotbx.reflection_file_utils.reflection_file_server object containing the extracted Miller arrays.
file_type – Return a string representing the generic data type, for example ‘pdb’ or ‘hkl’.

Methods

assert_file_type(expected_type) – Verify that the automatically determined file type is the expected format.
check_file_type([expected_type, ...]) – Verify that the automatically determined file type is the expected format, with the option to consider multiple formats.
crystal_symmetry() – Extract the crystal symmetry (if any).
file_info([show_file_size]) – Format a string containing the file type and size.
set_file_type(file_type)
show_summary([out]) – Print out some basic information about the file.
try_all_types()

assert_file_type(expected_type)

Verify that the automatically determined file type is the expected format.

check_file_type(expected_type=None, multiple_formats=())

Verify that the automatically determined file type is the expected format, with the option to consider multiple formats.

crystal_symmetry()

Extract the crystal symmetry (if any). Only valid for model (PDB/mmCIF) and reflection files.

file_content

Return the underlying format-specific object containing file data.

file_info(show_file_size=True)

Format a string containing the file type and size.

file_name
file_object

Synonym for file_content()

file_server

For reflection files only, returns an iotbx.reflection_file_utils.reflection_file_server object containing the extracted Miller arrays. Note that this will implicitly merge any non-unique observations.

file_type

Return a string representing the generic data type, for example ‘pdb’ or ‘hkl’. Note that this is not necessarily the same as the underlying format, for example ‘pdb’ can mean either PDB or mmCIF format, and ‘hkl’ could mean MTZ, CIF, XDS, Scalepack, or SHELX format.

set_file_type(file_type)

show_summary(out=sys.stdout)

Print out some basic information about the file.

try_all_types()

class iotbx.file_reader.directory_input(dir_name)

Bases: object

Methods

file_info([show_file_size])

file_info(show_file_size=False)

iotbx.file_reader.find_closest_base_name(file_name, base_name, templates)
iotbx.file_reader.get_wildcard_string(format)
iotbx.file_reader.get_wildcard_strings(formats, include_any=True)

class iotbx.file_reader.group_files(file_names, template_format='pdb', group_by_directory=True)

Bases: object

iotbx.file_reader.guess_file_type(file_name, extensions={'xml': ['xml'], 'map': ['xplor', 'map', 'ccp4'], 'cif': ['cif', 'mmcif'], 'seq': ['fa', 'faa', 'seq', 'pir', 'dat', 'fasta'], 'hhr': ['hhr'], 'phil': ['params', 'eff', 'def', 'phil', 'param'], 'smi': ['smi'], 'ccp4_map': ['ccp4', 'map', 'mrc'], 'mtz': ['mtz'], 'img': ['img', 'osc', 'mccd', 'cbf'], 'sdf': ['sdf'], 'rosetta': ['gz'], 'hkl': ['mtz', 'hkl', 'sca', 'cns', 'xplor', 'cv', 'ref', 'fobs'], 'aln': ['aln', 'ali', 'clustal'], 'txt': ['txt', 'log', 'html', 'geo'], 'pdb': ['pdb', 'ent'], 'pkl': ['pickle', 'pkl'], 'xplor_map': ['xplor', 'map']})
iotbx.file_reader.sort_by_file_type(file_names, sort_order=None)
iotbx.file_reader.splitext(file_name)
iotbx.file_reader.strip_shelx_format_extension(file_name)
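The default extension table in the guess_file_type signature maps several extensions to more than one type (for example 'mtz' appears under both 'hkl' and 'mtz', and 'map' under three map types). The sketch below uses a subset of that table to list every candidate type for a file name. It does not reproduce how guess_file_type itself resolves such ambiguity; the candidate_types function is a hypothetical helper written for this illustration.

```python
import os

# Subset of the extension table shown in the guess_file_type signature.
EXTENSIONS = {
    "pdb": ["pdb", "ent"],
    "cif": ["cif", "mmcif"],
    "hkl": ["mtz", "hkl", "sca", "cns", "xplor", "cv", "ref", "fobs"],
    "mtz": ["mtz"],
    "ccp4_map": ["ccp4", "map", "mrc"],
    "xplor_map": ["xplor", "map"],
}


def candidate_types(file_name, extensions=EXTENSIONS):
    """Return all file types whose extension list contains this extension."""
    ext = os.path.splitext(file_name)[1].lstrip(".").lower()
    return sorted(t for t, exts in extensions.items() if ext in exts)


print(candidate_types("model.ent"))    # ['pdb']
print(candidate_types("data.mtz"))     # ['hkl', 'mtz']
print(candidate_types("density.map"))  # ['ccp4_map', 'xplor_map']
```

An extension with several candidates is exactly the case where parsing the file, or passing force_type, is needed to settle the type.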

Structure files

The structure of the material to be analyzed has to be provided in one of the Zeo++-compatible file formats: CSSR, CUC, V1, CIF, CAR, DLP and PDB. An additional requirement is that the structure file contains the complete unit cell (P1 symmetry). Therefore, any structure represented by an asymmetric unit and symmetry information has to be expanded to P1 symmetry (i.e. the full unit cell needs to be constructed) before it can be processed with Zeo++. CIF files can contain either the entire unit cell with all atoms or an asymmetric unit with symmetry information. In the current version, the latter has to be specified as a list of symmetry operations so that Zeo++ can build the full unit cell.
Moreover, Zeo++ can also read structures from MOPAC's .arc output files.
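The expansion to P1 mentioned above amounts to applying every symmetry operation to every atom of the asymmetric unit, wrapping the images back into the unit cell, and discarding duplicates. The following is a minimal sketch of that idea, not the Zeo++ code: the data layout (fractional coordinates, operations as a 3x3 matrix plus a translation), the tolerance, and all names are assumptions.

```python
# A symmetry operation is (rotation matrix, translation) in fractional space.
IDENTITY = (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (0.0, 0.0, 0.0))
INVERSION = (((-1, 0, 0), (0, -1, 0), (0, 0, -1)), (0.0, 0.0, 0.0))


def _same_site(a, b, tol):
    """Compare two fractional positions, allowing for periodic wrap-around."""
    for x, y in zip(a, b):
        d = abs(x - y)
        if min(d, 1.0 - d) > tol:
            return False
    return True


def expand_to_p1(atoms, symmetry_ops, tol=1e-4):
    """Apply all operations to all atoms and drop duplicate images.

    atoms: list of (element, (x, y, z)) with fractional coordinates.
    """
    cell = []
    for element, pos in atoms:
        for rot, trans in symmetry_ops:
            # image = rot * pos + trans, wrapped into [0, 1)
            image = tuple(
                (sum(rot[i][j] * pos[j] for j in range(3)) + trans[i]) % 1.0
                for i in range(3)
            )
            duplicate = any(
                e == element and _same_site(image, p, tol) for e, p in cell
            )
            if not duplicate:
                cell.append((element, image))
    return cell


# Made-up asymmetric unit: one general position, one atom on the inversion center.
asymmetric_unit = [("Si", (0.1, 0.2, 0.3)), ("O", (0.0, 0.0, 0.0))]
full_cell = expand_to_p1(asymmetric_unit, [IDENTITY, INVERSION])
print(len(full_cell))  # 3: two Si images, one O (its image coincides with itself)
```

The duplicate check matters for atoms on special positions, which map onto themselves under some operations and would otherwise be counted more than once.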

Discussed file formats: CSSR | CUC | V1 | CIF | ARC and others.

CSSR file format

CSSR is a file format used by many molecular simulation packages, which is why it was adopted in Zeo++. A CSSR file contains a unit cell definition and a list of atoms with their fractional coordinates.

An example of a CSSR file for IZA's EDI zeolite is provided below:

We also noticed that CSSR files generated by Open Babel have a slightly different format. To read those into Zeo++, please change their extension to .obcssr before running Zeo++.


CUC file format (subject to change)

CUC is a custom format introduced in the course of development of Zeo++. It contains the same information as the CSSR format but has a simpler layout. The first line is a comment/title line. The second line contains periodic unit cell information: the lengths of the a, b and c vectors and the three unit cell angles. The third and following lines specify atoms and their fractional coordinates.
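The line layout described above is simple enough to read with a few lines of Python. The sketch below is not the Zeo++ reader and rests on assumptions the description does not spell out: atom lines are taken to be "element x y z", and the cell line is read by keeping its numeric tokens only, so that any leading label would be ignored. The sample text is made up for the example and is not the EDI structure.

```python
def parse_cuc(text):
    """Parse CUC-style text into a title, cell parameters and atoms."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    title = lines[0].strip()

    # Second line: a, b, c lengths followed by the three cell angles.
    cell = []
    for token in lines[1].split():
        try:
            cell.append(float(token))
        except ValueError:
            pass  # assumed: non-numeric tokens are labels and can be skipped
    if len(cell) != 6:
        raise ValueError("expected 6 unit cell values, got %d" % len(cell))

    # Remaining lines: element symbol and fractional coordinates (assumed order).
    atoms = []
    for ln in lines[2:]:
        element, x, y, z = ln.split()[:4]
        atoms.append((element, (float(x), float(y), float(z))))

    return {
        "title": title,
        "lengths": tuple(cell[:3]),
        "angles": tuple(cell[3:]),
        "atoms": atoms,
    }


# Made-up sample, only to exercise the parser.
sample = """example structure
10.0 10.0 12.0 90 90 120
Si 0.0 0.5 0.25
O 0.1 0.2 0.3
"""
structure = parse_cuc(sample)
print(structure["lengths"], structure["angles"], len(structure["atoms"]))
```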

An example of a CUC file for IZA's EDI zeolite is provided below:


V1 file format

V1 is a custom format introduced in the course of development of Zeo++. It contains a periodic unit cell definition and a list of atoms. The unit cell is defined using three vectors. It is required that the y and z coordinates of the first vector as well as the z coordinate of the third vector are set to zero (vectors aligned with the axes of the coordinate system). The unit cell definition is followed by an integer, the number of atoms, and then a list of atoms with their Cartesian coordinates.

An example of a V1 file for IZA's EDI zeolite is provided below:


CIF file format

CIF is a common format for storing crystal structure information (especially experimental data). The CIF file reader in Zeo++ is under development: it can handle most files but occasionally fails on malformed CIFs. Please contact us if this happens. Alternatively, other codes (e.g. CCTBX) can be used to convert your files into CSSR or other formats handled by Zeo++.

An example of a CIF file for IZA's EDI zeolite.


ARC and others

Recent versions of the semiempirical electronic structure code MOPAC can handle periodic systems. This makes MOPAC suitable for investigating porous polymers, whose structures can also be analyzed using Zeo++.

Other file formats such as .dlp, .car and .pdb can also be read by Zeo++. However, we have noticed some variation in these formats, and sometimes the source code needs to be altered slightly to process a particular file.