MACROMOLECULAR CHARGE FLIPPING


Adapting the Charge Flipping algorithm to biological macromolecule diffraction data.

The charge-flipping algorithm introduced by Oszlányi and Sütő for single crystals in 2004 has been adapted to protein crystal diffraction data in the computer program SUPERFLIP. A flow diagram of the procedure is given below.

Two main applications are described:

* ab initio determination of protein crystal structures from diffraction data at atomic resolution;

* heavy-atom or anomalous-scatterer substructure determination from isomorphous or anomalous differences.


Recent changes:

 14 December 2015:


Flowchart

References:


SUPERFLIP program and utilities:

Source files, documentation and the license agreement are available from the official SUPERFLIP site at the Department of Structure Analysis, Institute of Physics, Praha, and the École Polytechnique Fédérale de Lausanne (EPFL).

Download source code or the appropriate binaries for your system => Current Version: 02/13/14  8:48

Source code,  executables for MacOSX (Intel) or Windows, GNU-Linux x86 (32-bit statically linked) or GNU-Linux x86-64 (64-bit statically linked)

Uncompress the binary, rename it to superflip, make it executable (chmod +x superflip) and move it to a directory in your $PATH (/usr/local/bin or ~/bin are good places).
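The installation steps above can be sketched as a small Bourne-shell function. This is only an illustration: the function name install_superflip and the file name in the usage comment are hypothetical, and the name of the file you actually download depends on your platform.

```shell
# Sketch: install an uncompressed SUPERFLIP binary as "superflip".
# Both arguments are chosen by the user; nothing here is taken from the SUPERFLIP site.
install_superflip() {
    src=$1                      # uncompressed binary as downloaded
    dest=${2:-"$HOME/bin"}      # target directory, ~/bin by default
    mkdir -p "$dest" || return 1
    mv "$src" "$dest/superflip" || return 1   # rename and move in one step
    chmod +x "$dest/superflip"                # make it executable
}

# Usage (hypothetical file name):
#   install_superflip ./downloaded-binary ~/bin
#   command -v superflip    # prints the path if ~/bin is listed in $PATH
```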

Macromolecular structures can be solved by SUPERFLIP in two ways:

* by setting up an input file for a user-provided hkl file and running the superflip program: $ superflip example.inflip

      Two examples (input and log files) can be found here:

        ◊   heavy atom sub-structure solution:

        ◊   ab initio structure solution at atomic resolution  protein.inflip   protein.sflog         

* by using C-shell scripts (recommended for Linux and Mac OSX): flipsub for heavy-atom substructure solution and fliprot for ab initio structure solution at atomic resolution.

      These scripts create the SUPERFLIP input file on the fly from a limited number of command-line options.

Download the C-shell scripts, which require the CCP4 environment (version 6.5 or 6.4.x):
   fliprot (version 06/01/2015)        flipsub (version 14/12/2015)

Install the fliprot and flipsub files in a directory listed in your $PATH and make them executable (chmod +x flipsub fliprot).

If neither csh nor tcsh is installed on your computer: sudo apt-get install tcsh (on Ubuntu) or yum install tcsh (on Fedora, Red Hat, CentOS).
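Before installing anything, it may help to check whether a C shell is already present. A minimal Bourne-shell sketch, where the function name find_cshell is hypothetical:

```shell
# Print the path of the first C shell found (tcsh preferred, then csh).
# Returns non-zero when neither is installed.
find_cshell() {
    for candidate in tcsh csh; do
        if command -v "$candidate" >/dev/null 2>&1; then
            command -v "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage:
#   find_cshell || echo "no C shell found: install tcsh first"
```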

Various application examples follow here.


Examples of applications and test data

Ab initio protein structure solution at atomic resolution (beyond 1.1 - 1.2 Å)

usage: fliprot mydata.mtz FP=label      or     fliprot 4xxx-sf.cif

where:
mydata.mtz (or pdbcode-sf.cif) input structure factor file in MTZ or mmCIF format.

optional key words:
SG=18          ...... space group number (read from the MTZ file, required for some mmCIF files)
FP=Fobs        ...... MTZ label assignment for amplitude (default FP=FP, not required for mmCIF format)
name=flip      ...... generic name for output files (default fliprot)
1.05A          ...... dmin resolution (default all input reflections, no resolution cutoff)
ked=1.25       ...... coefficient for delta threshold parameter (default 1.3)
weak=0.1       ...... weak reflection threshold (default 0.05)
trial=5        ...... number of repeated trials (default 1 repeat=never)
maxcycl=5000   ...... maximum number of cycles per trial (default 2000)
mode=peakiness ...... convergence detection mode: peakiness (default for input space group P1) or symmetry (default for non-P1 space groups)
conv=84.0      ...... convergence threshold criterion (default 75.0 for mode=symmetry, default 3.0 for mode=peakiness)
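As an illustration of how these keywords combine on one command line, the sketch below prints (but does not run) a fliprot command. The helper name fliprot_cmd is hypothetical, and it is an assumption here that the keywords may follow the file name in any order.

```shell
# Print a fliprot command line assembled from the documented keywords.
# Nothing is executed; copy the printed line to run it.
fliprot_cmd() {
    data=$1     # MTZ or mmCIF structure factor file
    dmin=$2     # high-resolution limit in Å, e.g. 1.05
    trials=$3   # number of repeated trials
    echo "fliprot $data ${dmin}A trial=$trials maxcycl=5000 name=flip"
}

fliprot_cmd mydata.mtz 1.05 5
# prints: fliprot mydata.mtz 1.05A trial=5 maxcycl=5000 name=flip
```

For an MTZ file whose amplitude column is not called FP, the FP=label keyword would have to be added as well.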




example 1:

Test data used: PDB code 1mfm
1152 non-H protein atoms, 283 waters and Cd/Cu/Zn atoms in the asymmetric unit, space group P212121, 1.03 Å resolution

Ab initio phasing of superoxide dismutase using charge flipping:
C. Dumas & A. van der Lee, Acta Cryst. D64, 864-873

Download 1mfm-sf.cif and 1MFM.pdb from the PDB site and use the structure factor file as input for the fliprot script.
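The two files can also be fetched from the command line. The sketch below only prints candidate URLs; the files.rcsb.org/download/ pattern is an assumption about the current RCSB download service, not something stated on this page, and the helper name pdb_urls is hypothetical.

```shell
# Print assumed RCSB download URLs for a PDB entry (coordinates and structure factors).
pdb_urls() {
    id=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')   # normalise the code to lower case
    echo "https://files.rcsb.org/download/${id}.pdb"
    echo "https://files.rcsb.org/download/${id}-sf.cif"
}

# Usage (requires network access and curl):
#   pdb_urls 1mfm | while read -r url; do curl -fLO "$url"; done
```

Files downloaded this way are named in lower case (1mfm.pdb rather than 1MFM.pdb), so adjust the file names in the later commands accordingly.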



Command:   fliprot 1mfm-sf.cif name=mfm

The script may prompt for the space group number (here SG=19) and the unit cell parameters if they are missing from the CIF file:
  CRYST1 from pdb file:   34.99  48.11  81.08   90.0 90.0 90.0

Annotated log file (typical CPU time 2 to 3 minutes on a 2.4 GHz Intel processor)

Then use the mfm.mtz file for automatic model building (ARP/wARP or Phenix.AutoBuild).

The quality of the phase determination by CFA can be evaluated by superimposing the resulting map on the reference model (1mfm.pdb), as shown in this figure.

Typically, the correct enantiomorph will produce an overall correlation coefficient of about CC=0.8. Use the following PHENIX commands:

phenix.get_cc_mtz_pdb mfm.mtz 1MFM.pdb any_offset=true labin="FP=Fobs PHIB=PHIcf"

phenix.get_cc_mtz_pdb mfm.mtz 1MFM.pdb any_offset=true labin="FP=Fobs PHIB=PHIcfi"

Display mfm.map and offset.pdb using COOT or CHIMERA.
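Once both commands have been run, the hand with the higher map-model correlation is the one to keep. A minimal sketch of that comparison, assuming the two CC values have been read by eye from the PHENIX output (the helper name pick_hand and the numbers in the example are made up for illustration):

```shell
# Compare the two correlation coefficients and name the phase column to keep.
#   $1 = CC obtained with PHIB=PHIcf
#   $2 = CC obtained with PHIB=PHIcfi
pick_hand() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        if (a + 0 >= b + 0) print "PHIcf"; else print "PHIcfi"
    }'
}

pick_hand 0.81 0.12   # prints PHIcf (illustrative values, not from a real run)
```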



example 2:

Test data used: PDB code 2anv

2385 non-H atoms, 517 waters and (Sm, I, Mg, SO4) atoms in the asymmetric unit, space group C2

Ab initio phasing of lysozyme from P22 bacteriophage using charge flipping: electron density map at 1.04 Å resolution
(C. Dumas & A. van der Lee, Acta Cryst. D64, 864-873)

Download 2anv-sf.cif and 2ANV.pdb from the PDB site and use the structure factor file as input for the fliprot script.


Command:   fliprot 2anv-sf.cif name=anv

Annotated log file (typical CPU time 3 to 5 minutes on a 2.4 GHz Intel processor)

Then use the anv.mtz file for automatic model building (ARP/wARP or Phenix.AutoBuild).

The quality of the phase determination by CFA can be evaluated by superimposing the resulting map on the reference model (2anv.pdb), as shown in this figure.

Typically, the correct enantiomorph will produce an overall correlation coefficient of about CC=0.8. Use the following PHENIX commands:


phenix.get_cc_mtz_pdb anv.mtz 2ANV.pdb any_offset=true labin="FP=FP PHIB=PHIcf"
phenix.get_cc_mtz_pdb anv.mtz 2ANV.pdb any_offset=true labin="FP=FP PHIB=PHIcfi"

Display anv.map and offset.pdb using COOT or CHIMERA.


 
Heavy atom or anomalous scatterers substructure determination

Procedure for heavy-atom or anomalous-scatterer substructure determination from anomalous or isomorphous differences. The input reflection data file is in MTZ or Scalepack format.
New option: phenix.autosol can be run directly from flipsub to solve the SAD phase problem and build a model.


Usage:      flipsub sad.mtz DANO=label-name     or      flipsub sad.mtz   F1=label1 F2=label2   

This command solves the heavy-atom substructure using anomalous diffraction data from the sad.mtz file (the anomalous differences are taken from the columns given by DANO=label or F1=label1 F2=label2). The generic name (name= keyword) is used to create the output files (pdb, map and log). An optional parameter such as 3A means that only data up to 3 Å resolution are used (the default uses all data).

optional keywords (command flipsub -h):
name=HAtest   ...... generic name for output files (default flipsub)
2.5A          ...... high resolution cutoff (default all input reflections, no resolution cutoff)
conv=4.0      ...... convergence criterion threshold (default 2.5 for peakiness mode, 85.0 for symmetry mode)
norm=wilson   ...... normalization of amplitude differences using Wilson method (default norm=local)
ked=1.15      ...... coefficient for delta flipping parameter (default 1.25)
weak=0.25     ...... weak reflection threshold (default 0.15)
trial=10      ...... number of repeated trials (default 5)
maxcycl=3000  ...... maximum number of cycles per trial (default 2000)
sites=10      ...... number of expected heavy-atom sites in the asymmetric unit (required for norm=wilson and the solve option)
full          ...... automatic exploration of ked/weak parameters and resolution cutoff (see below)
verbose=no    ...... verbose mode off (default verbose=yes except for full mode)
solve         ...... submit phenix.autosol if the substructure determination is successful. Additional keywords required: seq_file, lambda, copies
atnam=Br      ...... atom type (used by phenix.autosol) (default Se atoms)
lambda        ...... X-ray wavelength (used by phenix.autosol) (default lambda=0.9790 for Se peak)
copies        ...... number of NCS copies (used by phenix.autosol) (default copies=1)
seq_file=prot.fasta  protein sequence file (used by phenix.autosol). If not given, a poly-Ala backbone is built.


In difficult cases, flipsub can automatically explore various combinations of the ked {1.15, 1.2, 1.25} and weak {0.15, 0.2, 0.3} parameters and also try several resolution cutoffs. To enable this, add the keyword full:


flipsub sad.mtz  F1="F(+)" F2="F(-)"  name=CFA4   full
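The same ked/weak grid can also be scanned by hand with the individual keywords. The sketch below only prints the nine command lines. It is not how flipsub implements the full option internally, and the helper name scan_grid and the scan1...scan9 output names are hypothetical.

```shell
# Print one flipsub command per (ked, weak) pair of the documented grid.
#   $1       = reflection file
#   $2...    = further keywords passed through unchanged (e.g. DANO=DANO)
scan_grid() {
    data=$1; shift
    n=0
    for ked in 1.15 1.2 1.25; do
        for weak in 0.15 0.2 0.3; do
            n=$((n + 1))
            echo "flipsub $data $* ked=$ked weak=$weak name=scan$n"
        done
    done
}

scan_grid sad.mtz DANO=DANO
# first line printed: flipsub sad.mtz DANO=DANO ked=1.15 weak=0.15 name=scan1
```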

output files:
CCP4 CF heavy-atom map in P1 space group   ............. generic-name.map
PDB file for Heavy-atom positions (asymmetric unit)   .. generic-name-au.pdb
Heavy atom positions in fractional coordinates   ....... generic-name-au.ha

SUPERFLIP log file               ....................... generic-name.sflog
flipsub log file                  ...................... generic-name.log

Phenix.autosol log file ................................. generic-name-autosol.log

directory for Phenix.Autosol wizard...................... AutoSol_run_#
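A quick way to see which of these files a run has produced is to list the expected names for a given generic name. This is a sketch based only on the table above; the helper name flipsub_outputs is hypothetical, and actual runs may add prefixes to some files (example 5 below mentions a bestxx-CSN5-au.pdb file).

```shell
# Print the output file names expected for a flipsub run with the given generic name.
flipsub_outputs() {
    name=${1:-flipsub}     # "flipsub" is the documented default generic name
    for suffix in .map -au.pdb -au.ha .sflog .log; do
        echo "${name}${suffix}"
    done
}

# Usage: report any expected file that is missing in the current directory
#   for f in $(flipsub_outputs CFA4); do [ -e "$f" ] || echo "missing: $f"; done
```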

The resulting coordinate file generic-name-au.pdb or generic-name-au.ha can be used as input for your favourite phasing program, such as SHARP, PHENIX (AutoSol/Phaser-EP) or CCP4.

Typically, edit the xxx-au.pdb file (or the xxx-au.ha file, in fractional units) to select the appropriate number of heavy-atom sites in the asymmetric unit and remove non-significant sites.
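The trimming step can be scripted instead of done in an editor. The sketch below keeps the first N atom records of a PDB file and passes all other records through. It assumes that the sites are written in order of decreasing significance, which should be checked against the flipsub log before relying on it; the helper name keep_sites is hypothetical.

```shell
# Keep only the first n ATOM/HETATM records of a PDB file; other records are unchanged.
# Assumption: the most significant sites come first in the file.
keep_sites() {
    n=$1
    pdb=$2
    awk -v n="$n" '
        /^(ATOM|HETATM)/ { if (++seen <= n) print; next }   # drop sites beyond the n-th
        { print }                                            # CRYST1, END, etc.
    ' "$pdb"
}

# Usage:
#   keep_sites 8 xxx-au.pdb > xxx-au-8sites.pdb
```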


Various test datasets for MAD and SAD phasing are available here:

example 1:   Locating heavy-atom substructure containing 20-22 bromide sites used for SAD phasing.
Download sfdata-haptbr.tgz (AUTOSTRUCT / CCP4 site), untar the archive and use haptbr.mtz as input data for the flipsub script.
Commands:       flipsub   haptbr.mtz   DANO="DANO"   2A  name=haptbr
            (using normalized anomalous differences up to 2 Å resolution and the symmetry score to detect convergence)

                flipsub   haptbr.mtz   DANO="DANO"   name=haptbr  atnam=Br  lambda=0.92  solve
            (using normalized anomalous differences, then solving the structure with phenix.autosol and building a poly-Ala model)


example 2:   Locating heavy-atom substructure containing 40 selenium sites used for SAD/MAD phasing (P1 space group).

Download sfdata-cynsemet.tgz (AUTOSTRUCT / CCP4 site), untar the archive and use the cynsemet.mtz file as input data for the flipsub script.

Commands:       flipsub  cynsemet.mtz  DANO="DANO_SE3"  name=cyn-pk
            (using normalized anomalous differences in the peak wavelength dataset, no resolution cutoff). The best trial gives 40 sites with 0.31 Å rmsd (Se sites in PDB)

                flipsub  cynsemet.mtz  F1="F_SE4"  F2="F_SE2"  name=diff42
            (using normalized amplitude differences between high energy remote and inflection wavelength data). The best trial gives 40 sites with 0.27 Å rmsd (Se sites in PDB)

                flipsub  cynsemet.mtz  2.6A  DANO="DANO_SE2"  name=cyn-ip
            (using normalized anomalous differences in the inflection wavelength dataset, up to 2.6 Å resolution)

                flipsub  cynsemet.mtz  DANO="DANO_SE3"  name=cyn-pk  sites=40  solve
            (using anomalous differences in the peak wavelength dataset, no resolution cutoff, then submitting phenix.autosol for phasing and poly-Ala model building)


example 3:   Locating heavy-atom substructure containing 8 selenium sites.

Download sfdata-jia.tgz (AUTOSTRUCT / CCP4 site), untar the archive and use the jia_peak.sca file (scalepack format) as input data for the flipsub script:
Commands:       flipsub  jia_peak.sca  name=jia_peak
            (using normalized anomalous differences in the peak wavelength dataset, default parameters (ked, weak, 5 trials), maximum resolution available)

                flipsub  jia_peak.sca  5A
            (using normalized anomalous differences in the peak wavelength dataset and a 5 Å resolution cutoff)

                flipsub  jia_peak.sca  name=jia_peak2  trial=2  sites=8  copies=2  solve
            (using the peak wavelength dataset, 8 selenium sites predicted, only 2 trials of CFA, then solving the SAD phases and building a poly-Ala model with phenix.autosol)


example 4:   Locating heavy-atom substructure in 4CYI SAD data, 72 SeMet residues, space group P1.
Download the SAD dataset 4CYI-sf.cif (cif format) and use it as input data for the flipsub script:
    flipsub  4cyi-sf.cif   name=cyi   trial=2
            (typically, 62-65 Se sites were found, with 0.5 to 0.6 Å rms coordinate differences from the refined Se atoms in the PDB file)


example 5:   Locating heavy-atom substructure in CSN5 crystal SAD data, 20 SeMet residues and 2 zinc atoms (4F7O.pdb) in the asymmetric unit.

Download the SAD dataset 4F7O.mtz (mtz format, 2.6 Å resolution) and use it as input data for the flipsub script:

Command:         flipsub  4F7O.mtz  DANO="DANO_x1"  name=CSN5

The substructure is solved in P1 by SUPERFLIP, using the symmetry score to detect convergence (see log file).

The averaged HA map (best densities # 1, 6, 7 and 9) was used to extract heavy-atom sites.

Using phenix.autosol from flipsub to solve the SAD phase problem and build a model:
Download the CSN5 protein sequence csn5.fasta and restart flipsub with the parameters for phenix.autosol: 22 heavy-atom sites predicted for two molecules in the asymmetric unit (copies=2) and the wavelength in Å. The solve flag indicates that the heavy-atom sites (bestxx-CSN5-au.pdb file) will be used for SAD phasing and model building (see directory Autosol_x).

Command:         flipsub  4F7O.mtz  DANO="DANO_x1"  name=CSN5p  sites=22  atnam=Se  copies=2  seq_file=csn5.fasta  lambda=0.9887  solve


Other links:

RCSB Protein Data Bank: atomic coordinate files and structure factors of biological macromolecules;
CCP4: software suite for macromolecular crystallography;
SHARP: software suite for experimental phasing of macromolecular crystal structures;
Phenix: software suite for the automated determination of macromolecular crystal structures;
SHELX: software suite for crystal structure determination from single-crystal diffraction data;
Chimera: visualisation of electron density maps;
Coot: visualisation of electron density maps and model building;
Uppsala Software Factory: software for macromolecular crystallography;
ARP/wARP: interpretation of electron density maps and automatic construction of macromolecular models.



Contact information                    Christian.Dumas @ cbs.cnrs.fr     or      avderlee @ univ-montp2.fr


Last modifications: December 23, 2015