easYPipe quickstart guide

1. Retrieve and organize your processed data

Note

You can retrieve your processed data from the synchrotron using easYGet.

Processed data should be stored in one folder per dataset, with all dataset folders grouped in a single parent folder. More information on how to organize your data is available here.

2. Prepare the data with ‘prep’

The first step lists the mtz files to be processed:

$ easypipe.py PROCESSED_DATA prep

where ‘PROCESSED_DATA’ is the folder containing your datasets.

Warning

For Windows users: $ is the Linux prompt, equivalent to C:\> in the Windows command prompt. Do not type it.

You can now inspect the /easypipe/1a_prep/mtz_to_treat_ALL.csv file, which lists the mtz files found in your processed data together with information such as resolution, completeness and space group.
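To get a quick overview of this file, for instance to spot datasets whose space group differs from the rest, a small Python sketch can tally the values of one column. It assumes the CSV has a header row and uses commas as delimiters; the column name you pass (for example a hypothetical ‘space group’ header) is an assumption and must match the actual header of mtz_to_treat_ALL.csv.

```python
import csv
from collections import Counter


def count_column_values(csv_path, column, delimiter=","):
    """Count how many rows share each value of `column` in a CSV file.

    Assumptions: the file has a header row, and `column` and `delimiter`
    match the real mtz_to_treat_ALL.csv (check them before use).
    A KeyError is raised if `column` is not in the header.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        return Counter(row[column] for row in reader)
```

Usage: call count_column_values(path, column_name) and iterate over .most_common() on the result to list the values from most to least frequent.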

For more details on this step see here.

3. Reindex if necessary with ‘reindex’

If the /easypipe/1a_prep/mtz_to_treat_ALL.csv file shows that some mtz files should be in a higher-symmetry space group, you can try to reindex them.

Run:

$ easypipe.py PROCESSED_DATA reindex P41212
or, equivalently, using the space group number (92 for P41212):
$ easypipe.py PROCESSED_DATA reindex 92

For more details on this step see here.

4. Add ligands with ‘ligands’

This step is necessary if you want Phenix to try to find and place ligands, or if you want to automatically generate the CIF and PDB files of your ligands.

First, fill in the <ligand name> and <ligand smiles> fields of the /1c_ligands/ligands_for_datasets.csv file.

Then, run:

$ easypipe.py PROCESSED_DATA ligands easYPipe/1c_ligands/ligands_for_datasets_OK.csv

where ligands_for_datasets_OK.csv is the name of your completed ligand CSV file.
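If you have many datasets, filling the ligand CSV by hand can be tedious. The Python sketch below copies the file and fills the ligand name and SMILES for selected rows. It assumes the headers are exactly ‘ligand name’ and ‘ligand smiles’ (adjust the parameters if not), that the file is comma-delimited, and that some column identifies each row; key_column is a hypothetical parameter you must set to that column's real header.

```python
import csv


def fill_ligand_csv(in_path, out_path, ligands, key_column,
                    name_column="ligand name", smiles_column="ligand smiles",
                    delimiter=","):
    """Write a copy of the ligand CSV with name and SMILES filled in.

    `ligands` maps a value of `key_column` (hypothetical, e.g. a dataset
    column) to a (name, smiles) tuple. Rows without an entry are copied
    unchanged. Column names are assumptions; check the real header.
    """
    with open(in_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        fieldnames = reader.fieldnames
        rows = list(reader)

    for row in rows:
        entry = ligands.get(row[key_column])
        if entry is not None:
            row[name_column], row[smiles_column] = entry

    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames,
                                delimiter=delimiter)
        writer.writeheader()
        writer.writerows(rows)
```

Writing to a separate output file (such as ligands_for_datasets_OK.csv) leaves the original template untouched, so it can be regenerated or re-filled if needed.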

For more details on this step see here.

5. Process the data with ‘launch’

You can now run Phenix on your processed mtz files.

Mode

The default mode is ‘fast’. It uses rigid-body refinement and gives a first result quickly.

Example:

$ easypipe.py PROCESSED_DATA launch my_ref_folder

where my_ref_folder contains the FASTA file and the PDB files used for molecular replacement, plus a CIF file if the model contains a ligand.

Warning

PDB files should include the record starting with ‘CRYST1’, which contains the unit cell and space group information.
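Before launching, you can check that every PDB file in your reference folder has a CRYST1 record. The sketch below is a simple helper, not part of easYPipe; it only looks at files with a lowercase .pdb extension.

```python
from pathlib import Path


def pdb_files_missing_cryst1(folder):
    """Return the names of *.pdb files in `folder` with no CRYST1 record."""
    missing = []
    for pdb in sorted(Path(folder).glob("*.pdb")):
        with pdb.open() as handle:
            # The record name occupies the first six columns of a PDB line.
            if not any(line.startswith("CRYST1") for line in handle):
                missing.append(pdb.name)
    return missing
```

Usage: print(pdb_files_missing_cryst1("my_ref_folder")) prints an empty list when all PDB files are fine.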

Now, have a look at your results in the corresponding ‘RESULTS’ csv file.

If some processes failed, they may need longer calculations. In that case, try ‘full’ mode:

$ easypipe.py PROCESSED_DATA launch my_ref_folder --mode full

If your protein changes space group, for example upon ligand binding, you can ask easYPipe not to fix the space group. All mtz files can then be processed, even those with a ‘bad’ space group, but processing takes much longer. To limit this cost, you can apply it to some datasets only, by running in simulation mode first (see below):

$ easypipe.py PROCESSED_DATA launch my_ref_folder --mode allsg

Datasets to treat

By default, phenix.ligand_pipeline is run on the mtz with the best completeness for each dataset. This is a good starting point.

If processing of the ‘best completeness’ mtz fails, you can try processing more mtz files per dataset.
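To make the idea of processing "the N best mtz per dataset" concrete, here is an illustrative Python sketch of that selection: group mtz files by dataset, rank them by completeness, and keep the top N. This is an assumption about the concept only, not easYPipe's actual implementation, and the (dataset, mtz name, completeness) record format is hypothetical.

```python
from itertools import groupby


def best_n_per_dataset(mtz_records, n):
    """Keep the `n` mtz with the highest completeness in each dataset.

    `mtz_records` is a list of (dataset, mtz_name, completeness) tuples
    (a hypothetical format chosen for illustration).
    """
    selected = []
    # groupby needs records with the same dataset to be adjacent.
    by_dataset = sorted(mtz_records, key=lambda r: r[0])
    for _, group in groupby(by_dataset, key=lambda r: r[0]):
        ranked = sorted(group, key=lambda r: r[2], reverse=True)
        selected.extend(ranked[:n])
    return selected
```

With n = 1 this corresponds to the default "best completeness" choice; larger values trade longer runs for more chances that at least one mtz per dataset succeeds.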

You can start by running on the mtz from autoPROC processing, which is generally a good compromise between resolution and completeness:

$ easypipe.py PROCESSED_DATA launch my_ref_folder --mode full --lig --autoproc

Or you can run on the two mtz files with the best completeness for each dataset:

$ easypipe.py PROCESSED_DATA launch my_ref_folder --mode full --lig --best 2

or on more of them:

$ easypipe.py PROCESSED_DATA launch my_ref_folder --mode full --lig --best 5

or on all processed mtz files:

$ easypipe.py PROCESSED_DATA launch my_ref_folder --mode full --lig --whole

If only some datasets are problematic, you can restrict the run to them. First run in simulation mode, then edit the corresponding ‘launch’ CSV file in /easYPipe/2_launch/ (replace ‘yes’ with ‘no’ in the ‘to treat’ column for the mtz files you do not want to process), and finally run again without --simulate:

$ easypipe.py PROCESSED_DATA launch my_ref_folder --mode full --lig --whole --simulate
then, after editing the ‘launch’ CSV file:
$ easypipe.py PROCESSED_DATA launch my_ref_folder --mode full --lig --whole

Only the selected mtz files are then processed, which shortens the run.
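If many rows need changing, the edit of the ‘launch’ CSV file can be scripted. The Python sketch below sets the ‘to treat’ column to ‘no’ for the rows you list and rewrites the file in place. It assumes a comma-delimited file with a header row; key_column is a hypothetical parameter that must name whichever column identifies each mtz or dataset in the real file. Keep a backup, since rewriting may change the file's quoting style.

```python
import csv


def exclude_from_launch(csv_path, skip, key_column,
                        treat_column="to treat", delimiter=","):
    """Set `treat_column` to 'no' for rows whose `key_column` is in `skip`.

    Rewrites the file in place and returns the number of rows changed.
    `key_column` is hypothetical: use the real identifying column name.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        fieldnames = reader.fieldnames
        rows = list(reader)

    changed = 0
    for row in rows:
        if row[key_column] in skip and row[treat_column] != "no":
            row[treat_column] = "no"
            changed += 1

    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames,
                                delimiter=delimiter)
        writer.writeheader()
        writer.writerows(rows)
    return changed
```

Run it between the --simulate run and the real run, passing the path of the ‘launch’ CSV file generated in /easYPipe/2_launch/.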

For more details on this step see here.

6. Compile results in a summary file

If you have run the ‘launch’ subcommand several times, you will have several ‘RESULT’ CSV files, which you will probably want to compile and clean up.

To do so, run:

$ easypipe.py PROCESSED_DATA summary

7. Automatic mode

This mode runs the main easYPipe steps (prep, reindex, launch, summary) without any user intervention. It can be a good starting point before running further ‘launch’ commands or a ligand search.

Example:

$ easypipe.py PROCESSED_DATA auto my_ref_folder --best 2 --mode full

Note

Ligand search is not currently supported in this mode.

For more details on this mode see here.