FARFAR2 Server Documentation





Overview:

FARFAR2 is a tool for building a 'full model' of a medium-size noncoding RNA. The goal of FARFAR2 is to provide a flexible and extensible fragment assembly protocol, capable of stitching together multiple subsections of an RNA structure or building a model entirely from scratch. We have exposed options sufficient to allow the user to reproduce any benchmark case run in (Watkins and Das, 2020), with helix flexibility provided in a kinematically realistic way using libraries of base pair steps.

We've provided two interfaces to the FARFAR2 webserver. We expect the majority of our users will be happy just to input a sequence and a secondary structure -- the simple interface -- but we want to make sure that many more options are available for users with more complicated modeling tasks. When providing a secondary structure (in "dot-bracket" notation), confirm that all the base pairs you specify are canonical: Watson-Crick base pairs or wobbles. Noncanonical base pairs may be supplied via the advanced interface as part of a "general" secondary structure. Finally, while it is possible to model multi-chain RNAs with the "simple" interface (just separate distinct chains in both your sequence and secondary structure with commas), all the chains need to be connected by base pairs. (It is also a little easier to model multi-chain RNAs by providing a FASTA file, since that way you can control what the chain letters and numbering will be after modeling.)

The FARFAR2 advanced interface provides many additional input options. You may supply local templates as PDB files: these would be previously solved regions of structure that you suspect will reliably fold into the same conformation in this modeling problem as well. (Classic examples where this strategy works well include well-defined motifs like kink-turns or loop-E motifs, as well as highly conserved ligand binding sites like the S-adenosyl methionine four-way junction binding site common to multiple SAM riboswitches.) Conversely, if you know the approximate relative orientation of some segments of RNA structure -- suppose you know which pairs of helices are stacked around a multi-way junction -- you can supply an "alignment" file to constrain the helices in question to that relative orientation.

We provide three options for scoring functions. The first is the original method used by FARFAR (Das, 2010). The second is the optimized method from the FARFAR2 paper (Watkins, 2020), and is the default. The third is a 'beta' method that may be of some interest to specific researchers.

Using the advanced interface, you may also select different fragment sources. We suggest the most up-to-date fragment library, created for the FARFAR2 publication, but older sources remain available as controls. We also provide a number of options that control how the fragment library is used. For example, we have developed a feature that allows users to make predictions of previously deposited structures under "like-blind" conditions, which of course requires any RNAs that are too similar to be eliminated from the fragment library. We allow fragments from the desired library to be eliminated if they are locally too similar in conformation to the native; users of this server may select that RMSD radius, as well as the stringency with which the fragment sequence is matched to the native. Finally, we allow the generation of additional random fragment samples within a customizable dihedral distance of the library fragment.

To specify the actual input modeling problem, the advanced interface allows the provision of a FASTA file. This matters because it allows the user to number input helices or other local templates so that they correspond to the overall modeling problem. Don't supply a completed structure as the input to FARFAR2 -- it will notice that your structure is identical to your supplied FASTA and it will just do no work! Instead, supply the residues you do know, and FARFAR2 will fill in the rest.

If you provide a native structure, then FARFAR2 will compute the RMSD to that native structure for each decoy. (Make sure that the chains and residue numbering of your native match the FASTA file you provided, so that FARFAR2 can build a correspondence from models to the native and calculate RMSD.) If no native is provided (perhaps none has been solved yet!), then the server will instead determine the lowest-energy model and compute RMSDs to that model. The resulting folding funnel can be a useful visual reference for how well the models are converging, and thus how likely your predictions are to be correct.
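The no-native case can be illustrated with a minimal Python sketch: pick the lowest-energy model as the reference, then pair each model's RMSD to that reference with its energy. The function names and the `(energy, coords)` input layout are hypothetical, and the sketch assumes the models are already in a common coordinate frame with atoms in matching order. It performs no superposition, which a real RMSD calculation would normally include.

```python
import math


def rmsd(coords_a, coords_b):
    """Plain coordinate RMSD between two equal-length lists of (x, y, z).

    Assumption: both models are already superimposed; no fitting is done here.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and the same length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def funnel_points(models):
    """Return (rmsd_to_lowest_energy_model, energy) for each model.

    `models` is a list of (energy, coords) tuples (hypothetical layout).
    """
    reference = min(models, key=lambda model: model[0])[1]
    return [(rmsd(coords, reference), energy) for energy, coords in models]
```

Plotting energy against RMSD for these points gives the funnel: a tight cluster of low-energy models at low RMSD suggests convergence, while low-energy models scattered across a wide RMSD range suggest the opposite.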

Finally, FARFAR2 has a number of methods for incorporating experimental data. If you have an RDAT file describing chemical mapping data, or a file of NMR chemical shifts, either can guide modeling.

Similarly, if you have experimental knowledge that could translate into restraints, it may be provided as a Rosetta constraint file. A simple example might be knowledge of the distance between a pair of atoms or the likely value of a dihedral angle. (You may intend something subtle here: for example, rather than believing that a certain atom pair distance is necessarily true, you may be interested in answering the question "what would the resulting structural ensemble look like if these two atoms had to be at this distance?" and comparing the properties of that ensemble to one predicted without restraints. If the imposition of a restraint results in an ensemble of (heuristically) terrible structures, that restraint may be implausible.) The constraint file format is described in the Rosetta documentation. Restraints may be applied all at once or progressively based on the primary sequence separation of the residues in question -- this "staging" of restraints is recommended for one particular experimental application.

We have found that restraints are an excellent way to encode the results of MOHCA-seq experiments, and that the resulting simulations produce accurate predictions of experimental structures. MOHCA-seq experiments produce "strong" and "weak" signals of nucleotide-nucleotide proximity whose functional form may be expressed as the sum of two functions, whose weights are given by the strength of the constraint. A "strong" restraint between residues 2 and 38 would be specified via:

    AtomPair O2' 2 C4' 38 FADE   0 30 15 -4.00  4.00
    AtomPair O2' 2 C4' 38 FADE -99 60 30 -36.00 36.00
    
while a "weak" restraint would be:

    AtomPair O2' 2 C4' 38 FADE   0 30 15 -0.80  0.80
    AtomPair O2' 2 C4' 38 FADE -99 60 30 -7.20  7.20
    
that is, one-fifth the strength. For a specific example of a simulation run on ROSIE using MOHCA-seq style constraints, please see this repository.
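When many MOHCA-seq proximities need to be encoded, the two-line pattern above can be generated programmatically. The following Python sketch simply reproduces the numbers shown in the examples; the function name `mohca_restraint_lines` is hypothetical, and the output uses single spaces between fields on the assumption that the column alignment in the examples is cosmetic.

```python
# Hypothetical helper: reproduces the strong/weak MOHCA-seq restraint pattern
# shown above. Parameter values are copied from those examples.
FADE_TERMS = [
    # (lower bound, upper bound, fade zone, strong-restraint weight)
    (0, 30, 15, 4.00),
    (-99, 60, 30, 36.00),
]
STRENGTH_SCALE = {"strong": 1.0, "weak": 0.2}  # weak is one-fifth of strong


def mohca_restraint_lines(res_i, res_j, strength="strong"):
    """Return the two AtomPair FADE lines for one residue pair."""
    scale = STRENGTH_SCALE[strength]
    lines = []
    for lower, upper, fade, weight in FADE_TERMS:
        w = weight * scale
        lines.append(
            f"AtomPair O2' {res_i} C4' {res_j} "
            f"FADE {lower} {upper} {fade} {-w:.2f} {w:.2f}"
        )
    return lines


if __name__ == "__main__":
    for line in mohca_restraint_lines(2, 38, "strong"):
        print(line)
```

Concatenating the lines for every residue pair gives the contents of a constraint file to upload through the advanced interface.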

A few settings that were unique to the original FARFAR webserver are no longer supported in FARFAR2. If for some reason (benchmarking consistency or a special use case) you would very much like to use the 2012 force field, a particular bulge entropy score term, or variable bond lengths and angles, feel free to use that webserver instead.



Please cite the following articles when referring to results from our ROSIE server:

  1. Watkins, A. M.; Rangan, R.; Das, R. "FARFAR2: Improved de novo Rosetta prediction of complex global RNA folds." Structure, 2020, 28: 963-976. doi: https://doi.org/10.1016/j.str.2020.05.011
  2. Watkins, A. M.; Das, R. "RNA 3D modeling with FARFAR2, online." bioRxiv 2020.11.26.399451. doi: https://doi.org/10.1101/2020.11.26.399451
  3. Lyskov S, Chou FC, Conchúir SÓ, Der BS, Drew K, Kuroda D, Xu J, Weitzner BD, Renfrew PD, Sripakdeevong P, Borgo B, Havranek JJ, Kuhlman B, Kortemme T, Bonneau R, Gray JJ, Das R. "Serverification of Molecular Modeling Applications: The Rosetta Online Server That Includes Everyone (ROSIE)." PLoS One. 2013 May 22;8(5):e63906. doi: https://doi.org/10.1371/journal.pone.0063906

We welcome scientific and technical comments on our server. For support please contact us at Rosetta Forums with any comments, questions or concerns.


Modeling tools developed by the Das Lab at Stanford University. The ROSIE implementation was developed by Andrew Watkins and Sergey Lyskov.