easYPipe quickstart guide
1. Retrieve and organize your processed data
Note
You can retrieve your processed data from the synchrotron using easYGet.
Processed data should be stored in one folder per dataset, all grouped in a single parent folder. More information on how to organize your data here.
2. Prepare the data with ‘prep’
The first step lists the mtz files to be treated:
$ easypipe.py PROCESSED_DATA prep
where ‘PROCESSED_DATA’ is the folder containing your datasets.
Warning
For Windows users: $ is the Linux prompt, equivalent to C:\> in the Windows command prompt, and should not be typed.
Now you can have a look at the /easypipe/1a_prep/mtz_to_treat_ALL.csv file, which lists the mtz files found in your processed data along with information such as resolution, completeness and space group.
For more details on this step see here.
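To inspect the prep table outside a spreadsheet, a minimal Python sketch such as the following can load it. It makes no assumption about column names and simply reports whatever header the file contains; the comma delimiter and the helper name `read_prep_table` are assumptions, not part of easYPipe.

```python
import csv


def read_prep_table(csv_path, delimiter=","):
    """Return (column names, rows) for a CSV file; each row is a dict keyed by column name.

    The delimiter is an assumption: pass delimiter=";" if your file uses semicolons.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        rows = list(reader)
        # Read the header while the file is still open.
        columns = list(reader.fieldnames or [])
    return columns, rows
```

Calling `read_prep_table` with the path to your own mtz_to_treat_ALL.csv returns the header, so you can check which columns hold resolution, completeness and space group before filtering the rows.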
3. Reindex if necessary with ‘reindex’
If you see in the /easypipe/1a_prep/mtz_to_treat_ALL.csv file that some mtz files should be in a higher-symmetry space group, you can try to reindex.
Run:
$ easypipe.py PROCESSED_DATA reindex P41212
equivalent to:
$ easypipe.py PROCESSED_DATA reindex 92
For more details on this step see here.
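The two commands above are equivalent because P41212 is space group number 92 in the International Tables. As an illustration only, the sketch below converts a name or a number to the number, using a small hand-written table. The table covers just a few common groups, and the helper name `space_group_number` is hypothetical and not part of easYPipe.

```python
# Hand-written subset of the International Tables numbering; extend as needed.
SPACE_GROUP_NUMBERS = {
    "P1": 1,
    "P21": 4,
    "C2": 5,
    "P212121": 19,
    "P41212": 92,
    "P43212": 96,
}


def space_group_number(value):
    """Return the space group number for a name such as 'P41212' or a number such as '92'."""
    text = str(value).replace(" ", "").upper()
    if text.isdigit():
        number = int(text)
        if not 1 <= number <= 230:
            raise ValueError(f"space group number out of range: {value!r}")
        return number
    if text not in SPACE_GROUP_NUMBERS:
        raise ValueError(f"unknown space group {value!r}; add it to SPACE_GROUP_NUMBERS")
    return SPACE_GROUP_NUMBERS[text]
```

This is only a convenience for checking your own notes. Whether easypipe.py accepts spaced names such as "P 41 21 2" is not stated here, so pass the forms shown in the commands above.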
4. Add ligands with ‘ligands’
This step is necessary if you want Phenix to try to find and place ligands, or if you want to automatically generate the CIF and PDB of your ligands.
First, fill in the <ligand name> and <ligand smiles> fields of the /1c_ligands/ligands_for_datasets.csv file.
Then, run:
$ easypipe.py PROCESSED_DATA ligands easYPipe/1c_ligands/ligands_for_datasets_OK.csv
where ligands_for_datasets_OK.csv is the name of your filled-in ligand csv file.
For more details on this step see here.
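Before running the ‘ligands’ subcommand, it can help to check that no row of the filled-in file has a ligand name without a SMILES string, or the reverse. The sketch below is a hedged example: it assumes the column headers are literally `ligand name` and `ligand smiles` and that the file is comma-delimited, neither of which is confirmed here. Adjust the parameters to match your file.

```python
import csv


def half_filled_ligand_rows(csv_path, name_col="ligand name",
                            smiles_col="ligand smiles", delimiter=","):
    """Return the file line numbers of rows where only one of the two ligand fields is filled.

    Column names and delimiter are assumptions; rows with both fields empty are
    treated as 'no ligand' and are not reported.
    """
    problems = []
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        for row in reader:
            name = (row.get(name_col) or "").strip()
            smiles = (row.get(smiles_col) or "").strip()
            if bool(name) != bool(smiles):
                # line_num counts the header as line 1.
                problems.append(reader.line_num)
    return problems
```

An empty list means every row is either fully filled or fully empty for these two fields.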
5. Process the data with ‘launch’
Now you can run Phenix on your processed mtz.
Mode
The default mode is ‘fast’ mode. It uses rigid body refinement and can be run to get a first result rapidly.
Example:
$ easypipe.py PROCESSED_DATA launch my_ref_folder
where my_ref_folder contains the fasta file and the pdb files for molecular replacement, plus a cif file if there is a ligand in the model.
Warning
pdb files should include the row starting with ‘CRYST1’, which contains the space group information.
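A quick way to verify this before launching is to scan the reference folder for pdb files that lack a CRYST1 record. The following is a minimal sketch: the helper names are hypothetical, and it only looks at files ending in `.pdb` directly inside the folder.

```python
from pathlib import Path


def has_cryst1(pdb_path):
    """Return True if the file contains a line starting with 'CRYST1'."""
    with open(pdb_path, errors="replace") as handle:
        return any(line.startswith("CRYST1") for line in handle)


def pdb_files_missing_cryst1(ref_folder):
    """Return the sorted names of the .pdb files in ref_folder that have no CRYST1 record."""
    return sorted(
        path.name
        for path in Path(ref_folder).glob("*.pdb")
        if not has_cryst1(path)
    )
```

For example, `pdb_files_missing_cryst1("my_ref_folder")` returns an empty list when every pdb file carries its CRYST1 row. The check does not validate the content of the record, only its presence.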
Now, have a look at your results in the corresponding ‘RESULTS’ csv file.
If some processes failed, they probably need longer calculations. You can try ‘full’ mode:
$ easypipe.py PROCESSED_DATA launch my_ref_folder --mode full
If your protein changes space group, for example upon ligand binding, you can ask easYPipe not to fix the space group. All mtz files can then be treated, even those with a ‘bad’ space group. This takes much longer, but you can restrict it to selected mtz files by using simulation mode first (see ‘Datasets to treat’ below):
$ easypipe.py PROCESSED_DATA launch my_ref_folder --mode allsg
Ligand search
If you want LigandFit to place ligands, you first have to run the ‘ligands’ subcommand (see above).
Then just add the ‘--lig’ option:
$ easypipe.py PROCESSED_DATA launch my_ref_folder --mode full --lig
The default cutoff for LigandFit to place a ligand is 0.7, but you can lower it with the ‘--cclig’ option if you find it too high:
$ easypipe.py PROCESSED_DATA launch my_ref_folder --mode full --lig --cclig 0.6
If several ligands are expected to bind, you can ask LigandFit to place more than one ligand with the ‘--nblig’ option:
$ easypipe.py PROCESSED_DATA launch my_ref_folder --mode full --lig --cclig 0.6 --nblig 5
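If you script several runs, the commands above can be assembled programmatically. The sketch below builds the argument list using only the options shown in this section (`--mode`, `--lig`, `--cclig`, `--nblig`). The function name is hypothetical, and the rule that the cutoff and ligand count require `--lig` is an assumption based on how the examples combine them.

```python
def build_launch_command(processed_data, ref_folder, mode=None,
                         lig=False, cclig=None, nblig=None):
    """Return the argument list for an 'easypipe.py ... launch' call.

    mode=None leaves the default ('fast') mode untouched; only the 'full' and
    'allsg' values shown in this guide are accepted.
    """
    if mode not in (None, "full", "allsg"):
        raise ValueError(f"unsupported mode: {mode!r}")
    if not lig and (cclig is not None or nblig is not None):
        # Assumption: these two options only make sense with a ligand search.
        raise ValueError("cclig and nblig are only used together with lig=True")
    command = ["easypipe.py", str(processed_data), "launch", str(ref_folder)]
    if mode is not None:
        command += ["--mode", mode]
    if lig:
        command.append("--lig")
        if cclig is not None:
            command += ["--cclig", str(cclig)]
        if nblig is not None:
            command += ["--nblig", str(nblig)]
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)`. Depending on your installation you may need to prefix it with the Python interpreter.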
Datasets to treat
The default behavior is to run phenix.ligand_pipeline on the mtz file with the best completeness for each dataset. This is a good place to start.
If the treatment of the ‘best completeness’ mtz files fails, you can try treating more mtz files for each dataset.
You can start with the mtz files from the autoPROC process, which are generally a good compromise between resolution and completeness:
$ easypipe.py PROCESSED_DATA launch my_ref_folder --mode full --lig --autoproc
Or you can run on the two mtz files with the best completeness for each dataset:
$ easypipe.py PROCESSED_DATA launch my_ref_folder --mode full --lig --best 2
or more:
$ easypipe.py PROCESSED_DATA launch my_ref_folder --mode full --lig --best 5
or on all the processed mtz files:
$ easypipe.py PROCESSED_DATA launch my_ref_folder --mode full --lig --whole
If only some datasets are problematic, you can run in simulation mode first, edit the corresponding ‘launch’ csv file in /easYPipe/2_launch/ (replace ‘yes’ with ‘no’ in the ‘to treat’ column for the mtz files you do not want to process), then run again without ‘--simulate’:
$ easypipe.py PROCESSED_DATA launch my_ref_folder --mode full --lig --whole --simulate
then, after modification of the 'launch' csv file:
$ easypipe.py PROCESSED_DATA launch my_ref_folder --mode full --lig --whole
Then, only selected mtz will be treated, reducing the duration of the treatment.
For more details on this step see here.
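Editing the ‘to treat’ column by hand becomes tedious with many mtz files. As a hedged sketch, the function below rewrites a ‘launch’ csv file in place, switching ‘yes’ to ‘no’ for every row selected by a caller-supplied test. The column name `to treat` comes from this guide. The comma delimiter, the function name and any other column you test on are assumptions, so work on a copy of the file first.

```python
import csv


def deselect_rows(csv_path, should_skip, column="to treat", delimiter=","):
    """Set `column` to 'no' on rows currently marked 'yes' for which should_skip(row) is true.

    Rewrites the file in place and returns the number of rows changed.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        rows = list(reader)
        fieldnames = list(reader.fieldnames or [])
    if column not in fieldnames:
        raise KeyError(f"column {column!r} not found; columns are {fieldnames}")
    changed = 0
    for row in rows:
        if row[column] == "yes" and should_skip(row):
            row[column] = "no"
            changed += 1
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter=delimiter)
        writer.writeheader()
        writer.writerows(rows)
    return changed
```

For instance, with a hypothetical `dataset` column, `deselect_rows(path, lambda row: row["dataset"] != "d2")` keeps only the rows of dataset d2 marked for treatment.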
6. Compile results in a summary file
If you have run the ‘launch’ subcommand several times, you will have several ‘RESULT’ csv files in the RESULTS folder, which you will probably want to compile and clean.
The ‘summary’ subcommand is run automatically at the end of each ‘launch’ subcommand.
If you have done several ‘launch’ runs, with different space groups for example, you will have to run the ‘summary’ subcommand manually.
It creates a global SUMMARY file that compiles all the SUMMARY files present in the RESULTS folders.
To do so, run:
$ easypipe.py PROCESSED_DATA summary
7. Automatic mode
This mode runs the main easYPipe steps (prep, reindex, launch, summary) without any intervention. It can be a good starting point before running more ‘launch’ commands or a ligand search.
Example:
$ easypipe.py PROCESSED_DATA auto my_ref_folder --best 2 --mode full
Note
Ligand search is not supported at this time in this mode.
For more details on this mode see here.