Bayesian polishing

relion also implements a Bayesian approach to per-particle, reference-based beam-induced motion correction. This approach aims to optimise a regularised likelihood: each hypothetical set of particle trajectories is associated with a prior probability that favours spatially coherent and temporally smooth motion, without imposing any hard constraints. The smoothness prior requires three parameters that describe the statistics of the observed motion. To estimate the prior that yields the best motion tracks for this particular data set, we first run the program in ‘training mode’. Once the estimates have been obtained, one can run the program again to fit tracks for the motion of all particles in the data set and to produce adequately weighted averages of the aligned movie frames.
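Schematically, and glossing over the exact functional forms used in the implementation, the quantity being optimised combines a data term with this smoothness prior:

\[
\log P(\{\mathbf{s}\} \mid \mathrm{data}) = \underbrace{\sum_{p,f}\log P\!\left(X_{p,f}\mid \mathbf{s}_{p,f}\right)}_{\text{agreement with the movie frames}} + \log P\!\left(\{\mathbf{s}\}\mid \sigma_{\text{vel}},\sigma_{\text{div}},\sigma_{\text{acc}}\right) + \mathrm{const},
\]

where \(\mathbf{s}_{p,f}\) denotes the position of particle \(p\) in movie frame \(f\). The prior penalises fast motion (\(\sigma_{\text{vel}}\)), differences between the motion of nearby particles (\(\sigma_{\text{div}}\)) and changes of velocity over time (\(\sigma_{\text{acc}}\)); these are the three parameters estimated in training mode. Take this as a sketch of the structure of the objective only, not the exact expression used in the code.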

Running in training mode

Using 16 threads in parallel, this job took 1 hour and 15 minutes on our computer. If you do not want to wait for this, you can proceed directly to the next section and use the sigma values from our pre-calculated results, which are given there.

If you do want to run this job yourself, on the I/O tab of the Bayesian polishing job-type set:

Micrographs (from MotionCorr):

MotionCorr/relioncor2/corrected_micrographs.star

(It is important that this Motion correction job was run in relion-3.0 or above. Motion correction jobs run in relion-2.1 or below will NOT work, as the required metadata about the motion correction is not written out.)

Particles (from Refine3D or CtfRefine):

Refine3D/ctfrefined/particles_ctf_refine.star

(These particles will be polished)

Postprocess STAR file:

PostProcess/ctfrefined/postprocess.star

(the mask and FSC curve from this job will be used in the polishing procedure.)

First movie frame:

1

Last movie frame:

-1

(Some people throw away the first or last frames from their movies. Note that this is not recommended when performing Bayesian polishing in relion. The B-factor weighting of the movie frames will automatically optimise the signal-to-noise ratio in the shiny particles, so it is best to include all movie frames.)
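For intuition, the B-factor weighting mentioned above takes the familiar relative B-factor form: each frame \(f\) receives a frequency-dependent weight that is, schematically,

\[
w_f(k) \;\propto\; C_f \, e^{-B_f k^2/4},
\]

where \(k\) is the spatial frequency, and the scale factor \(C_f\) and B-factor \(B_f\) are estimated from the data for each frame. Frames that carry little high-resolution signal, such as late, radiation-damaged frames, receive weights that fall off steeply with resolution, so they still contribute at low resolution instead of being discarded outright. Take this as a sketch of the functional form only; the values actually estimated for this data set are plotted in the logfile discussed below.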

On the Train tab set:

Train optimal parameters?:

Yes

Fraction of Fourier pixels for testing:

0.5

(Just leave the default here)

Use this many particles:

4000

(That’s almost all we have anyway. Note that the more particles you use, the more RAM this program will need. If you run out of memory, try training with fewer particles, although using far fewer than 4000 particles is not recommended.)

On the Polish tab make sure you set:

Perform particle polishing?:

No

Note that the training step of this program has not been MPI-parallelised. Therefore, make sure you use only a single MPI process. We ran the program with 16 threads to speed it up. Still, the calculation took more than 1 hour. We used an alias of train.
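If you prefer the command line to the GUI, note that the exact command the GUI runs is recorded in a note.txt file inside the job directory. A run equivalent to the training job above could be scripted along the following lines; the relion_motion_refine flag names here are our assumptions based on relion-3.x and should be verified against relion_motion_refine --help or your own note.txt:

# Minimal sketch of launching the training job from Python rather than the GUI.
# All relion_motion_refine flags below are assumptions for relion-3.x; verify
# them against `relion_motion_refine --help` or the note.txt of a GUI-run job.
import subprocess

subprocess.run([
    "relion_motion_refine",
    "--i", "Refine3D/ctfrefined/particles_ctf_refine.star",   # particles to train on
    "--f", "PostProcess/ctfrefined/postprocess.star",         # mask and FSC curve
    "--corr_mic", "MotionCorr/relioncor2/corrected_micrographs.star",
    "--first_frame", "1", "--last_frame", "-1",
    "--o", "Polish/train/",      # output directory, matching our 'train' alias
    "--min_p", "4000",           # number of particles used for training
    "--eval_frac", "0.5",        # fraction of Fourier pixels used for testing
    "--params3",                 # train all three sigma parameters
    "--j", "16",                 # 16 threads; training is not MPI-parallelised
], check=True)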

Running in polishing mode

Once the training step is finished, the program will write out a text file called Polish/train/opt_params.txt. To use these parameters to polish your particles, click on the job-type menu on the left to select a new Bayesian polishing job. Keep the parameters on the I/O tab the same as before, and on the Train tab, make sure you switch the training off. Then, on the Polish tab set:

Perform particle polishing?:

Yes

Optimised parameter file:

Polish/train/opt_params.txt

OR use your own parameters?:

No

Minimum resolution for B-factor fit (A):

20

Maximum resolution for B-factor fit (A):

-1

(just leave the defaults for these last two parameters)

Alternatively, if you decided to skip the training step, you can fill in the Polish tab with the sigma parameters that we obtained in our run:

Perform particle polishing?:

Yes

Optimised parameter file:

(leave this empty; the optimised parameters from our run are entered explicitly below.)

OR use your own parameters?:

Yes

Sigma for velocity (A/dose):

0.42

Sigma for divergence (A):

1600

Sigma for acceleration (A/dose):

2.61

Minimum resolution for B-factor fit (A):

20

Maximum resolution for B-factor fit (A):

-1

(just leave the defaults for these last two parameters)

This part of the program is MPI-parallelised. Using 3 MPI processes, each with 16 threads, our run finished in two minutes. We used an alias of polish.
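As with training, the command the GUI ran is recorded in the job's note.txt. For completeness, a scripted equivalent of the polishing run with explicit sigma values might look as follows; again, the program and flag names are our assumptions for relion-3.x and should be checked against relion_motion_refine --help:

# Minimal sketch of the polishing run with explicit sigma values, mirroring the
# GUI settings above. Program and flag names are assumptions for relion-3.x;
# check `relion_motion_refine --help` or the note.txt of a GUI-run job.
import subprocess

subprocess.run([
    "mpirun", "-n", "3", "relion_motion_refine_mpi",   # polishing is MPI-parallelised
    "--i", "Refine3D/ctfrefined/particles_ctf_refine.star",
    "--f", "PostProcess/ctfrefined/postprocess.star",
    "--corr_mic", "MotionCorr/relioncor2/corrected_micrographs.star",
    "--first_frame", "1", "--last_frame", "-1",
    "--o", "Polish/polish/",                           # output directory, 'polish' alias
    "--s_vel", "0.42", "--s_div", "1600", "--s_acc", "2.61",  # sigma values from our training run
    "--combine_frames",                                # write out the polished (shiny) particles
    "--bfac_minfreq", "20", "--bfac_maxfreq", "-1",    # resolution range for the B-factor fit
    "--j", "16",
], check=True)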

Analysing the results

The Bayesian polishing job outputs a STAR file with the polished particles called shiny.star and a PDF logfile. The latter contains plots of the scale and B-factors used for the radiation-damage weighting, plus plots of the refined particle tracks for all included particles on all micrographs. Looking at the plots for this data set, it appeared that the stage was a bit drifty: almost all particles move from the top right to the bottom left during the movies.
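If you want to inspect the polished particle metadata programmatically rather than through the GUI, the STAR file can be loaded into Python. The sketch below assumes the third-party starfile package (installable with pip install starfile), which is not part of relion, and a path that assumes our polish alias:

# Minimal sketch for loading the polished particle table with the third-party
# 'starfile' package (pip install starfile); this package is not part of relion.
import starfile

star = starfile.read("Polish/polish/shiny.star")   # path assumes our 'polish' alias
# relion-3.1 style STAR files contain separate 'optics' and 'particles' blocks;
# older files may load as a single table.
particles = star["particles"] if isinstance(star, dict) else star
print(len(particles), "polished particles")
print(list(particles.columns))                     # metadata labels, e.g. rlnImageName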

After polishing, the signal-to-noise ratio in the particles has improved, and one should submit a new 3D auto-refine job and a corresponding Post-processing job. We chose to run the 3D auto-refine job with the shiny particles using the following option on the I/O tab:

Reference mask (optional):

MaskCreate/first3dref/mask.mrc

(this is the mask we made for the first Post-processing job. Using this option, the solvent will be set to zero for all pixels outside the mask. This reduces noise in the reference, and thus leads to better orientation assignments and hence better reconstructions.)

and this option on the Optimisation tab:

Use solvent-flattened FSCs?:

Yes

(Using this option, the refinement will apply a solvent correction to the gold-standard FSC curve at every iteration, much like the one used in the Post-processing job-type. This option is particularly useful when the protein occupies a relatively small volume inside the particle box, e.g. with very elongated molecules, or when one focuses refinement on a small part using a mask. The default way of calculating FSCs in 3D auto-refinement is without masking the (gold-standard) half-maps, which systematically under-estimates the resolution during refinement. This is remedied by calculating phase-randomised, solvent-corrected FSC curves at every iteration, which generally leads to a noticeable improvement in resolution.)
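To our understanding, the correction referred to here is the same phase-randomisation test used in the Post-processing job: the half-map phases beyond a chosen resolution are randomised, the masked FSC of these randomised maps, \(\mathrm{FSC}_{\mathrm{rand}}\), measures the correlation induced by the mask alone, and the masked FSC is corrected as

\[
\mathrm{FSC}_{\mathrm{true}}(k) = \frac{\mathrm{FSC}_{\mathrm{masked}}(k) - \mathrm{FSC}_{\mathrm{rand}}(k)}{1 - \mathrm{FSC}_{\mathrm{rand}}(k)}
\]

at frequencies beyond the phase-randomisation resolution. This estimates what the FSC would have been without the artificial correlations introduced by masking.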

As you can see in the pre-calculated results, we obtained a final resolution just beyond 2.8 Å. Not bad for 3GB of data, right?

When and how to run CTF refinement and Bayesian polishing

Both Bayesian polishing and CTF refinement, which comprises per-particle defocus, magnification and higher-order aberration estimation, may improve the resolution of the reconstruction. This raises the question of which to apply first. In this example, we first refined the aberrations and the magnification, and then the per-particle defocus values. We then followed up with polishing, but we could also have performed the polishing before any of the CTF refinements. Both approaches benefit from higher-resolution models, so an iterative procedure may be beneficial; for example, one could repeat the CTF refinement after the Bayesian polishing. In general, it is probably best to tackle the biggest problem first, and some trial and error may be necessary.

Moreover, we have seen for some cases that the training procedure of Bayesian polishing yields inconsistent results, i.e. multiple runs yield very different sigma values. However, we have also observed that the actual sigma values used for the polishing often do not matter much for the resolution of the map after re-refining the shiny particles. Therefore, and also because the training is computationally expensive, it may be just as well to run the polishing directly with the default parameters (\(\sigma_{\text{vel}}=0.2;\ \sigma_{\text{div}}=5000;\ \sigma_{\text{acc}}=2\)), i.e. without training on your specific data set.