easYPipe ‘prep’

Important

This is the mandatory first step; it prepares the data.

Usage

easypipe.py data prep [-h]

Example:

$ easypipe.py PROCESSED_DATA prep

How should the data be organized?

The data folder (whatever its name) must contain only dataset folders.

Within each dataset folder, the processed data can be organized in several ways:

  • an mtz file directly in the dataset folder

  • an mtz file in a sub-folder, or in a sub-sub-folder (and so on), of the dataset folder

  • several processes are possible for a dataset; it is better, though not mandatory, to keep them in different sub-folders

  • if several mtz files are present in the same sub-folder, only those matching the templates (from EDNA processes) are treated; if none matches, only the first mtz file is considered

_images/how-data-should-be.jpg
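
Before running ‘prep’, it can be useful to check that the data folder follows the layout described above. The following is a minimal sketch, not part of easYPipe: the function names `find_mtz_files` and `report` are hypothetical, and the script only looks for files ending in `.mtz` at any depth below each dataset folder. It does not try to reproduce the EDNA template matching, whose exact rules are not described here.

```python
from pathlib import Path


def find_mtz_files(data_dir):
    """Map each top-level entry of data_dir to the mtz files found below it.

    The value is None when the entry is not a folder, since the data
    folder is expected to contain only dataset folders.
    """
    found = {}
    for entry in sorted(Path(data_dir).iterdir()):
        if not entry.is_dir():
            found[entry.name] = None
            continue
        # rglob searches the dataset folder itself and all sub-folders
        found[entry.name] = sorted(entry.rglob("*.mtz"))
    return found


def report(data_dir):
    """Return one human-readable line per top-level entry of data_dir."""
    lines = []
    for name, mtz in find_mtz_files(data_dir).items():
        if mtz is None:
            lines.append(f"{name}: not a folder (unexpected at top level)")
        elif not mtz:
            lines.append(f"{name}: no mtz file found")
        else:
            lines.append(f"{name}: {len(mtz)} mtz file(s)")
    return lines
```

For instance, `print("\n".join(report("PROCESSED_DATA")))` lists, for each dataset folder, how many mtz files were found.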

Note

Data downloaded with easYGet are directly in the right tree organization.

What does it do?

In an ‘easYPipe’ folder created where the command is executed, ‘prep’ copies each processed mtz file into a sub-folder of its dataset, as follows:

  • creation of an ‘easYPipe’ treatment directory where it is run

  • creation of a subdirectory ‘0_processed_datasets’ where all the dataset folders are created

  • creation of a ‘data’ folder in each dataset folder, into which the processed mtz and log files are copied

  • if there are several mtz files in a folder, search for the ‘EDNA’ treatment templates and selection of the matching mtz files

Note

If you add a process for a dataset after a first ‘prep’, you can launch the ‘prep’ sub-command again; the new process will be added to those already copied.

Then:

  • launch of xtriage [1] for each mtz to get resolution, completeness, space group and cell parameters

_images/processed_datasets_tree_New2.jpg

  • information on the mtz files to be treated is written to the ‘/easypipe/1a_prep/mtz_to_treat_ALL.csv’ file

_images/mtz-to-treat-ALL-csv_NEW.jpg

  • creation of a csv file ‘/easypipe/1c_ligands/ligands_for_datasets.csv’ for future ligand generation with eLBOW [2]

_images/csv-ligands-for-datasets.jpg

You have to fill in the ‘ligand name’ and ‘ligand smiles’ fields before running the easYPipe ‘ligands’ sub-command.
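
The csv file can be edited by hand, but with many datasets a small script may be more convenient. The following is a minimal sketch only, under several assumptions that should be checked against the real file: that it is comma-delimited, that its header uses the column names `ligand name` and `ligand smiles`, and that a column (here hypothetically called `dataset`) identifies each row. The function `fill_ligands` is not part of easYPipe. It writes the result to a separate output file rather than modifying the input.

```python
import csv


def fill_ligands(csv_in, csv_out, ligands, key_col="dataset",
                 name_col="ligand name", smiles_col="ligand smiles"):
    """Fill the ligand columns of csv_in and write the result to csv_out.

    ligands maps a dataset identifier to a (name, smiles) tuple.
    The three column names are assumptions; adapt them to the real header.
    """
    with open(csv_in, newline="") as fh:
        reader = csv.DictReader(fh)
        fieldnames = reader.fieldnames
        rows = list(reader)

    for row in rows:
        entry = ligands.get(row[key_col])
        if entry is not None:
            # Rows without an entry in `ligands` are left unchanged
            row[name_col], row[smiles_col] = entry

    with open(csv_out, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

A call such as `fill_ligands("ligands_for_datasets.csv", "ligands_filled.csv", {"my_dataset": ("ETH", "CCO")})` would set the ligand name to ETH and the SMILES string to CCO (ethanol) for the row whose identifier is `my_dataset`; the dataset and file names here are illustrative.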

Caution

Save the modified csv file somewhere else, or under another name, so that it is not overwritten if you launch the ‘prep’ sub-command again.
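
One way to keep a safe copy is to duplicate the file under a timestamped name before re-running ‘prep’. The helper below is a minimal sketch using only the Python standard library; the name `backup_csv` and the naming scheme of the copy are illustrative choices, not something easYPipe provides.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_csv(csv_path, backup_dir=None):
    """Copy csv_path to a timestamped file and return the path of the copy.

    The copy goes to backup_dir if given, otherwise next to the original.
    """
    src = Path(csv_path)
    dest_dir = Path(backup_dir) if backup_dir else src.parent
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.stem}_{stamp}{src.suffix}"
    # copy2 also preserves the modification time of the original
    shutil.copy2(src, dest)
    return dest
```

Calling `backup_csv` on the edited ‘ligands_for_datasets.csv’ with, for example, `backup_dir="csv_backups"` leaves the original in place and stores the copy in a folder that ‘prep’ does not write to.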

You can also run the easYPipe ‘reindex’ sub-command if some mtz files should be in a higher-symmetry space group.

If you are not interested in ligand placement or reindexing, you can directly run the easYPipe ‘launch’ sub-command.

References