easYPipe ‘launch’

The ‘launch’ subcommand runs phenix.ligand_pipeline [1] on all the mtz files (several processings, several datasets), according to the options given and the information in the ‘mtz_to_treat_ALL.csv’ file.

Usage

easypipe.py data launch [-h] [-m {fast,full,allsg}] [-l] [-n NUMBER] [-c NUMBER] [-b NUMBER | -a] [-s] [-t TEMPLATE] ref

arguments	description
ref	folder with fasta file and pdb file for replacement, and cif(s) if ligand(s) in the model

Warning

Reference pdb files must include the line starting with ‘CRYST1’, which contains the space group information.
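To check a reference file before launching, the space group can be read from its CRYST1 record. The sketch below is only an illustration and is not part of easYPipe: the function name `read_space_group` is hypothetical, and it relies on the standard PDB layout in which the space group occupies columns 56-66 of the CRYST1 line.

```python
def read_space_group(pdb_text):
    """Return the space group found in the CRYST1 record, or None if absent.

    Hypothetical helper, not part of easYPipe.
    """
    for line in pdb_text.splitlines():
        if line.startswith("CRYST1"):
            # PDB format: the space group occupies columns 56-66 (1-based).
            space_group = line[55:66].strip()
            return space_group or None
    return None


# Build a CRYST1 record with the fixed-width PDB layout (made-up cell values).
example = "CRYST1%9.3f%9.3f%9.3f%7.2f%7.2f%7.2f %-11s%4d" % (
    52.0, 58.6, 61.9, 90, 90, 90, "P 21 21 21", 4
)
print(read_space_group(example))
```

A return value of `None` means the reference pdb lacks the CRYST1 line and should be fixed before running ‘launch’.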

optional arguments	description
-h, --help	show this help message and exit
-m {fast,full,allsg}, --mode {fast,full,allsg}	running mode: fast, full, or allsg (default = fast)
-l, --lig	search for and place ligand(s)
-n NUMBER, --nblig NUMBER	number of ligand copies to search for (default = 1, currently limited to 9)
-c NUMBER, --cclig NUMBER	minimum CC to consider a ligand placement correct (default = 0.7). Ligands with at least this CC will be incorporated into the current model for refinement.
-b NUMBER, --best NUMBER	launch only for the mtz files with the best completeness; NUMBER indicates how many mtz files to treat per dataset (default = 1), e.g. --best 2
-a, --autoproc	launch only for mtz files from autoPROC or, if there are none, for the mtz file with the best completeness
-w, --whole	launch for all processed mtz files
-s, --simulate	simulate only: generate a csv file describing the planned launch. This lets you edit the csv file to exclude certain treatments before launching again without simulation mode.
-t TEMPLATE, --template TEMPLATE	optional template name for log files and result folders, for re-launching with a different reference pdb of the same space group (otherwise the existing folder is reused and nothing is launched again, since the results already exist).

Example:

$ easypipe.py PROCESSED_DATA launch my_ref_folder --mode full --lig --best 2 --cclig 0.6
equivalent to:
$ easypipe.py PROCESSED_DATA launch my_ref_folder -m full -l -b 2 -c 0.6

What does it do?

1. Sort mtz files according to space group in reference pdb, and decreasing completeness

If some datasets have no mtz file matching the reference space group, they are listed in a separate csv file (“datasets_without_mtz_<sg_ref>.csv”).
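The sorting step can be pictured with the following sketch. It is an illustration under assumptions, not easYPipe's actual code: the record keys (`dataset`, `mtz`, `space_group`, `completeness`) and the function name are hypothetical.

```python
from collections import defaultdict


def sort_mtz_for_reference(records, sg_ref):
    """Group mtz records by dataset, keeping only those in the reference
    space group, sorted by decreasing completeness.

    Returns (mtz_by_dataset, datasets_without_mtz). Hypothetical helper.
    """
    by_dataset = defaultdict(list)
    datasets = []
    for rec in records:
        if rec["dataset"] not in datasets:
            datasets.append(rec["dataset"])
        if rec["space_group"] == sg_ref:
            by_dataset[rec["dataset"]].append(rec)
    for recs in by_dataset.values():
        recs.sort(key=lambda r: r["completeness"], reverse=True)
    # Datasets with no mtz in the reference space group are reported apart.
    without = [d for d in datasets if d not in by_dataset]
    return dict(by_dataset), without


records = [
    {"dataset": "ds1", "mtz": "a.mtz", "space_group": "P 1", "completeness": 90.0},
    {"dataset": "ds1", "mtz": "b.mtz", "space_group": "P 1", "completeness": 98.0},
    {"dataset": "ds1", "mtz": "c.mtz", "space_group": "C 2", "completeness": 99.0},
    {"dataset": "ds2", "mtz": "d.mtz", "space_group": "C 2", "completeness": 97.0},
]
sorted_mtz, missing = sort_mtz_for_reference(records, "P 1")
```

Here `missing` plays the role of the content of the “datasets_without_mtz_<sg_ref>.csv” file.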

2. List mtz files according to option ‘best’, ‘autoproc’ or ‘whole’

Option example: --best 1 (default)

List only the mtz file with the best completeness for each dataset.

Option example: --best 2

List only the two mtz files with the best completeness, when they exist, for each dataset.

Option example: --autoproc

List only the mtz files from autoPROC or, if there are none, the mtz file with the best completeness, for each dataset.

Option example: --whole

Although it is not recommended because it is time-consuming, treating all the processed mtz files can be useful for problematic data. You can also use the --whole option in simulation mode and choose which processings to treat before launching again.
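The three selection options can be summarized by the sketch below. It is a hypothetical illustration: the function name, the `pipeline` key and the assumption that the autoPROC fallback keeps a single mtz file are not taken from easYPipe's source.

```python
def select_mtz(sorted_mtz, best=1, autoproc=False, whole=False):
    """Select the mtz records to treat for one dataset.

    sorted_mtz must already be sorted by decreasing completeness.
    Hypothetical helper mirroring --best, --autoproc and --whole.
    """
    if whole:
        return list(sorted_mtz)
    if autoproc:
        from_autoproc = [r for r in sorted_mtz if r["pipeline"] == "autoPROC"]
        # Assumption: fall back to the single most complete mtz file.
        return from_autoproc if from_autoproc else list(sorted_mtz[:1])
    return list(sorted_mtz[:best])


dataset_mtz = [
    {"mtz": "xds.mtz", "pipeline": "XDS", "completeness": 99.0},
    {"mtz": "ap.mtz", "pipeline": "autoPROC", "completeness": 97.5},
    {"mtz": "dials.mtz", "pipeline": "DIALS", "completeness": 95.0},
]
```

With this input, `select_mtz(dataset_mtz, best=2)` keeps the first two records, while `select_mtz(dataset_mtz, autoproc=True)` keeps only the autoPROC one.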

3. List mtz files with mode and ligand information for running Phenix

Note

The Phenix options used for each mode are listed below, under ‘Phenix options according to modes’.

For each dataset, the following is written to a ‘launch csv’ file:

whether a ligand cif file is present, when a ligand search is requested
the mode that will be launched, which depends on the mode requested, the presence (or absence) of a ligand cif file, and the data quality
an explanation when the mode differs from the mode requested
the result folder name

Limits for poor data: data must meet minimum limits to be processed in ‘full’ or ‘allsg’ mode. These limits can be changed in the config.py file (easypipe must then be reinstalled).

minimum completeness (default = 70%)

minimum resolution (default = 3.75 Å)

Poor data will be treated in ‘fast’ mode.
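The fallback to ‘fast’ mode can be sketched as follows. This is an illustration only: the constant and function names are hypothetical (they are not the names used in config.py), and it assumes that "minimum resolution" means data with a resolution value above 3.75 Å count as poor, with the limits themselves still accepted.

```python
# Hypothetical names; the real limits live in easYPipe's config.py.
MIN_COMPLETENESS = 70.0   # percent
RESOLUTION_LIMIT = 3.75   # angstroms; larger values mean poorer data


def effective_mode(requested_mode, completeness, resolution):
    """Return the mode actually launched for one mtz file."""
    if requested_mode in ("full", "allsg"):
        poor = completeness < MIN_COMPLETENESS or resolution > RESOLUTION_LIMIT
        if poor:
            # Poor data are downgraded to 'fast' mode.
            return "fast"
    return requested_mode
```

For example, `effective_mode("full", 60.0, 2.0)` gives `"fast"` because the completeness is below the limit.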

Option examples:

Option example: --mode fast (default)

Phenix uses a simple rigid-body refinement for model placement, which is faster and most of the time sufficient if the input model is already close enough to the target structure.

Option example: --mode full

Phenix tries rigid-body refinement first, then runs Phaser if the R-free is too high (>0.4). AutoBuild is run after the initial refinement only if R-free is greater than the max_r_free cutoff (0.3).

Option example: --mode allsg

In this mode, mtz files are treated regardless of their space group. Phenix runs Phaser, then runs AutoBuild after the initial refinement only if R-free is greater than the max_r_free cutoff (0.3).

Option example: --mode full --lig

Phenix is run in ‘full’ mode. The ligand is then searched for with LigandFit [2] and placed if the model-to-map CC is at least 0.7 (default cutoff). This cutoff can be changed with the ‘--cclig’ option. The number of ligand copies to place (default = 1) can be changed with the ‘--nblig’ option.
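As a rough picture of the CC cutoff, the sketch below keeps only the placements whose CC reaches the cutoff, following the option description ("ligands with at least this CC"). The function name and the input list are hypothetical; the real filtering is done inside Phenix, not by this code.

```python
def accepted_placements(cc_values, min_cc=0.7):
    """Return the CC values of placements that reach the cutoff, best first.

    Hypothetical illustration of the --cclig cutoff.
    """
    return sorted((cc for cc in cc_values if cc >= min_cc), reverse=True)


# Made-up CC values for three placed ligand copies.
placements = [0.55, 0.82, 0.70]
```

Lowering the cutoff, as in `accepted_placements(placements, min_cc=0.5)`, keeps all three copies.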

4. Launch Phenix according to chosen mode and options - Simulation mode

phenix.ligand_pipeline [1] is launched for each mtz file according to chosen mode and options, as listed in the ‘launch csv’ file (see 3. above).

If this ‘launch csv’ file already exists and you have changed something since (added a ligand cif, for example), run ‘launch’ again in simulation mode so that it generates a new, correct ‘launch csv’ file instead of reusing the existing one. Once the new ‘launch csv’ file has been generated, run the same command without simulation mode.

Example:

$ easypipe.py PROCESSED_DATA launch my_ref_folder --mode full --lig --autoproc --simulate
then:
$ easypipe.py PROCESSED_DATA launch my_ref_folder --mode full --lig --autoproc

Simulation mode also lets you edit the ‘to treat’ column of the ‘launch csv’ file (replacing ‘yes’ with ‘no’), which is useful if you want to run some options on only some of the mtz files. Then run the same command without simulation mode. You can also edit the ‘mode’, ‘ligand search’, ‘CC’ and ‘nb ligands’ columns, provided you know what you are doing.
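The ‘launch csv’ file can be edited by hand in a spreadsheet, or with a small script such as the sketch below. It is only an illustration: the ‘to treat’ and ‘mode’ column names come from this page, but the ‘mtz’ column, the comma delimiter and the function name are assumptions that should be checked against a real ‘launch csv’ file.

```python
import csv
import io


def disable_rows(csv_text, mtz_to_skip):
    """Set 'to treat' to 'no' for the listed mtz files.

    Assumes a comma-delimited file with a hypothetical 'mtz' column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    for row in rows:
        if row["mtz"] in mtz_to_skip:
            row["to treat"] = "no"
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


launch_csv = "mtz,mode,to treat\na.mtz,full,yes\nb.mtz,full,yes\n"
edited = disable_rows(launch_csv, {"b.mtz"})
```

To work on a real file, read its text with `open(path, newline="")` and write the returned string back before running the command again without simulation mode.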

5. Write results

At the end of each ‘launch’ subcommand, results are copied into a ‘RESULTS’ folder.

In the dataset folders, copies of:

corresponding processed data and logs (useful for deposition at the PDB)

pdb and mtz result files

phenix cif file if ligand found

ligand folder, if exists

pdb of ligand(s) placed by LigandFit (all CC)

In an ‘ALL_pdb_mtz’ folder, copies of all the pdb and mtz files of the datasets treated in this ‘RESULTS’ folder, so that all results can be viewed quickly and easily in a single Coot session.

In a ‘_mtz_treated’ folder, copies of:

csv listing datasets without mtz file

csv with mtz list

csv with mtz list after reindexing

csv with mtz list sorted according to reference space group

all ‘launch’ csv files, with a counter at the end of their names in case of several launches (after manual edits of the launch csv file, for example)

For each ‘launch’ subcommand, a csv file is created that summarizes the corresponding results for each dataset, with information on:

success of Phenix

failing step (if success = no)

resolution (from the pdb file or, if Phenix failed, from the mtz data file)

completeness (from the pdb file or, if Phenix failed, from the mtz data file)

Rwork / Rfree

space group (from the pdb file or, if Phenix failed, from the mtz data file)

unit cell (from the pdb file or, if Phenix failed, from the mtz data file)

whether a ligand has been placed, the number of ligands found, and the corresponding CC

Option example: -a --mode full --lig --nblig 9 --cclig 0.6

[Figure: results csv file for this option example (_images/RESULT-file-a-Full-lig_nblig9_CC060_NEW-cellparam.jpg)]

Finally, the results of all the ‘launch’ subcommands you have run are compiled by automatically running the ‘summary’ subcommand.

Note

If there are several RESULTS folders (for example after launching for several space groups or with different templates), a global SUMMARY file compiling all the SUMMARY files can be created by running the ‘summary’ subcommand manually.

Phenix options according to modes (only for information)

phenix.ligand_pipeline [1] options are the following:

common options:

nproc=Auto

preserve_chain_id=True: preserves the original chain ID

refine.after_ligand.hydrogens=False: hydrogen atoms are not added before the final refinement step (otherwise refinement is significantly slower)

prune=False: do not prune the model after refinement (pruning removes residues and sidechains in poor density)

keep_hetatms=True: prevents Phaser from resetting HETATM occupancies to zero

refine.after_mr.update_waters=False: do not add or remove waters automatically

‘fast’ mode:

skip_xtriage=True

mr=False: rigid-body refinement will be used

quick_refine=True: shortens both refinement steps from 6 to 3 cycles and disables weight optimization

build=False

skip_ligand=True

reference_structure='model.pdb': if specified, phenix.find_alt_orig_sym_mate will be applied to map the solution to the reference structure (does not work when Phaser is run with several monomers)

‘full’ mode:

mr=Auto: the program will try rigid-body refinement first, then run Phaser if the R-free is too high (>0.4)

build=Auto: Run AutoBuild after initial refinement. By default, this will be done if R-free is greater than the max_r_free cutoff = 0.3

autobuild.quick=True: Run AutoBuild in quick mode. Inferior results, but a huge time-saver

quick_refine=True: shortens both refinement steps from 6 to 3 cycles and disables weight optimization

‘allsg’ mode:

mr=True

quick_refine=False

if ligand search:

ligand_copies=1 (unless option --nblig is greater than 1)

keep_input_restraints=True : if the input files include pre-calculated restraints for the target ligand, eLBOW will propagate these restraints instead of generating new ones.
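The option lists above can be gathered into a single list of `name=value` strings per mode, as in the sketch below. It is an illustration only: it reproduces just the options listed on this page, leaves out reference_structure (which needs a file path) and the input files, and does not model how a ligand search interacts with ‘fast’ mode, which sets skip_ligand=True. The function and constant names are hypothetical, and easYPipe may assemble its command differently.

```python
# Options copied from the lists on this page.
COMMON_OPTIONS = [
    "nproc=Auto",
    "preserve_chain_id=True",
    "refine.after_ligand.hydrogens=False",
    "prune=False",
    "keep_hetatms=True",
    "refine.after_mr.update_waters=False",
]
MODE_OPTIONS = {
    "fast": ["skip_xtriage=True", "mr=False", "quick_refine=True",
             "build=False", "skip_ligand=True"],
    "full": ["mr=Auto", "build=Auto", "autobuild.quick=True",
             "quick_refine=True"],
    "allsg": ["mr=True", "quick_refine=False"],
}


def pipeline_options(mode, ligand_search=False, nblig=1):
    """Return the option strings for one mode (hypothetical helper)."""
    options = COMMON_OPTIONS + MODE_OPTIONS[mode]  # new list, constants untouched
    if ligand_search:
        options += [f"ligand_copies={nblig}", "keep_input_restraints=True"]
    return options


full_with_ligand = pipeline_options("full", ligand_search=True, nblig=2)
```

The resulting strings would follow the input files on a phenix.ligand_pipeline command line.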
