ConAn: understanding MD simulations through contact map analysis: Contact map-based alternatives for common measures

Contact maps can provide alternatives for widely used statistics on MD simulations. These can be defined in several different ways; here I present the way ConAn implements them.

RMSF: This is commonly defined as the fluctuation of a residue around its average position, which means one fits the whole protein onto a reference structure and measures the distance of each residue from its position in that reference. However, if large-scale conformational changes happen, these average positions can be meaningless and the fits particularly unreliable. If we instead measure root-mean-square fluctuations of inter-residue distances, we are using a more robust metric that is unaffected by any fitting. A further plus is that we (possibly) gain more insight into the relevant motions (which distances fluctuate and by how much?). This is a comparison of ubiquitin RMSF with fluctuations of the contact map:
In this case, we learn very similar information to the "old way" (the linker of the final few residues dominates the RMSF), but we now also know the most important distances that change (roughly, linker - β3).
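As a rough illustration of the idea (not ConAn's actual code), the fluctuation of every inter-residue distance can be computed directly from per-residue coordinates, with no fitting step. The function names and the `(n_frames, n_res, 3)` array layout are assumptions made for this sketch; how the per-residue positions are obtained (e.g., Cα atoms or closest heavy atoms) is left open.

```python
import numpy as np

def distance_matrices(coords):
    """Pairwise inter-residue distances for every frame.

    coords: array of shape (n_frames, n_res, 3), one position per residue
            (an assumption of this sketch).
    Returns an array of shape (n_frames, n_res, n_res).
    """
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    return np.linalg.norm(diff, axis=-1)

def distance_rmsf(coords):
    """Root-mean-square fluctuation of each inter-residue distance
    around its own time average. No structural fit is involved."""
    d = distance_matrices(coords)
    # The population standard deviation over frames is exactly
    # sqrt(mean((d - <d>)^2)), i.e. the RMSF of each distance.
    return d.std(axis=0)

# Toy example: two residues whose separation alternates between 1 and 3.
toy = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]],
])
rmsf_map = distance_rmsf(toy)  # off-diagonal fluctuation is 1.0
```

Because only distances enter, rotating or translating individual frames leaves the result unchanged, which is the robustness argument made above.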
RMSD: the root-mean-square deviation shows how far our structure is from some reference (often, the initial frame). This again requires a fit, and we can again avoid the fitting by taking the individual differences in inter-residue distances. This way, we ignore large collective motions and catch changes in small distances (this is a trade-off, not an obvious plus!). This is the RMSD on the contact map for our 50 ns simulation of ubiquitin:

These values are quite a bit smaller than normal RMSDs since we "fit" every pair separately (i.e., we don't fit them at all!).
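A minimal sketch of a contact-map RMSD, under the assumption that it is the root-mean-square difference of all unique inter-residue distances between each frame and a reference frame. Whether ConAn averages over unique pairs or the full matrix is not stated here, so treat the normalization as an assumption; the function names are made up for the example.

```python
import numpy as np

def distance_matrices(coords):
    """(n_frames, n_res, 3) coordinates -> (n_frames, n_res, n_res) distances."""
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    return np.linalg.norm(diff, axis=-1)

def contact_map_rmsd(coords, ref_frame=0):
    """RMS difference of inter-residue distances to a reference frame.

    Returns one value per frame. Only the unique pairs (upper triangle)
    are averaged, which is an assumption of this sketch.
    """
    d = distance_matrices(coords)
    rows, cols = np.triu_indices(d.shape[1], k=1)
    pairs = d[:, rows, cols]                 # (n_frames, n_pairs)
    deviation = pairs - pairs[ref_frame]     # difference to the reference
    return np.sqrt((deviation ** 2).mean(axis=1))

# Toy example: two residues at separations 1, 2 and 4 in three frames.
toy = np.zeros((3, 2, 3))
toy[:, 1, 0] = [1.0, 2.0, 4.0]
rmsd_trace = contact_map_rmsd(toy)  # deviations from frame 0
```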
Inter-residue cross-correlation: getting the cross-correlation of pairs with other pairs would leave us with a 4D object, which is a bit too much to visualize. However, we can project distances on residues using some formula. In the current implementation of ConAn, we use the simple formula:

If we then correlate the time series of this measure for various residue pairs, we can get an idea of where the protein is more or less folded.

This measure could be thought of as an alternative to the relatively well-known cross-correlation analysis that correlates the time series of deviation from average positions (there is that concept again!).
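Whatever projection is used to turn distances into one value per residue per frame, the correlation step itself can be sketched as a plain Pearson correlation between the per-residue time series. The code below assumes the projected series is already available as an `(n_frames, n_res)` array; the function name is hypothetical and the toy data is synthetic, not taken from any simulation.

```python
import numpy as np

def residue_cross_correlation(per_residue_series):
    """Pearson correlation between per-residue time series.

    per_residue_series: array of shape (n_frames, n_res), one projected
                        value per residue and frame (assumed input).
    Returns an (n_res, n_res) matrix with entries in [-1, 1].
    Residues whose series is constant yield NaN entries.
    """
    return np.corrcoef(per_residue_series, rowvar=False)

# Synthetic toy series: residue 1 mirrors residue 0, residue 2 is unrelated.
frames = np.arange(100)
signal = np.sin(frames / 10.0)
series = np.column_stack([
    signal,                   # residue 0
    -signal,                  # residue 1: perfectly anti-correlated with 0
    np.cos(frames / 3.0),     # residue 2: a different motion
])
corr = residue_cross_correlation(series)
```

In this toy case `corr[0, 1]` is -1, the kind of strong anti-correlation that the competitive-binding picture would produce.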

If two residues correlate in our measure, it is likely the case that they are directly linked or their environments change concomitantly. Anti-correlation would be characteristic of competitive binding to a third residue (which would then correlate with both of the external ones). Without further ado, this is the inter-residue cross-correlation of our old friend ubiquitin:

Most of the correlation is indeed where we would expect it to be (along the main diagonal), but there are also some interesting anti-correlations. If we look at Arg42, for example, we see that it correlates with its direct neighborhood, then anti-correlates strongly with Asp58, then correlates again:

Now that we know what we are looking for, we can easily identify a major shift in interaction networks: Arg42 interacted with Gln48 (orange) but after the linker changed conformation, it is Asp58 (blue) that gains an interaction partner in Lys49. Similar anti-correlations can often yield interesting insight into the dynamics of the protein (not necessarily time-evolution, but possible interaction networks).

The initial conformation: Arg42 interacts with Gln48.

The final conformation: Asp58 interacts with Lys49, making the previous interaction impossible.

Cluster analysis on residues: this can be an interesting way of finding clusters of interacting partners. (soon to be described/expanded...) For example, this is what three clusters of residues of ubiquitin look like, where each cluster is characterized by a high interaction lifetime (described in the previous post) to a central residue (represented here by opaque licorice). This is done by a k-medoid method...

The inter-residue "distance metric" in this case is a % of interaction time. For cluster analysis, we will use 100%-(interaction time).

The three identified clusters, along with their central residues (in opaque).
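To make the residue clustering concrete, here is a minimal k-medoids sketch that works on a precomputed dissimilarity matrix, using 100% - (interaction time) as described above. It uses the simple alternating assign/update scheme, which is only one of several k-medoids variants and is not necessarily the one ConAn uses; the function name, parameters and toy interaction table are all made up for illustration.

```python
import numpy as np

def k_medoids(dist, k, n_iter=100, seed=0):
    """Alternating k-medoids on a precomputed (n, n) dissimilarity matrix.

    Returns (medoids, labels): indices of the central items and, for each
    item, the index of the cluster it was assigned to.
    """
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assignment step: each item joins its closest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if a cluster is empty
            # Update step: pick the member with the smallest total
            # dissimilarity to the other members of its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

# Toy interaction-time table (in %): residues 0-2 and 3-5 form two groups.
interaction_time = np.full((6, 6), 5.0)
interaction_time[:3, :3] = 90.0
interaction_time[3:, 3:] = 90.0
np.fill_diagonal(interaction_time, 100.0)

medoids, labels = k_medoids(100.0 - interaction_time, k=2)
```

The returned medoids play the role of the "central residues" in the figure: every other member of a cluster is characterized by its interaction time with that residue.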

Cluster analysis on the trajectory: the same as normal cluster analysis, but (as usual) without fitting structures! The distance metric in this case is inter-frame RMSD. This is what the inter-frame RMSD looks like in this case:

and the clusters identified:

There is again the same trade-off as mentioned in the RMSD section. Deviations in the contact map could be more interesting for specific, important interactions, while "normal RMSD" will detect large-scale conformational changes. Cluster analysis based on inter-residue distances can be useful if one is interested in intra-domain motion and perhaps wants to ignore relatively large "macro-changes" that do not change the energy too much (e.g., if the system contains a flexible linker).
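The inter-frame RMSD used as the distance metric for trajectory clustering can be sketched in the same fitting-free way: for every pair of frames, take the root-mean-square difference of their inter-residue distances. The array layout and function names are assumptions of this example, and averaging over unique residue pairs is again a choice made here rather than something confirmed for ConAn.

```python
import numpy as np

def distance_matrices(coords):
    """(n_frames, n_res, 3) coordinates -> (n_frames, n_res, n_res) distances."""
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    return np.linalg.norm(diff, axis=-1)

def inter_frame_rmsd(coords):
    """Frame-by-frame RMSD matrix computed on inter-residue distances.

    Returns an (n_frames, n_frames) symmetric matrix with a zero diagonal.
    Memory scales as n_frames^2 * n_pairs, so long trajectories would
    need striding or chunking.
    """
    d = distance_matrices(coords)
    rows, cols = np.triu_indices(d.shape[1], k=1)
    pairs = d[:, rows, cols]                          # (n_frames, n_pairs)
    diff = pairs[:, None, :] - pairs[None, :, :]      # all frame pairs
    return np.sqrt((diff ** 2).mean(axis=-1))

# Toy example: two residues at separations 1, 2 and 4 in three frames.
toy = np.zeros((3, 2, 3))
toy[:, 1, 0] = [1.0, 2.0, 4.0]
frame_rmsd = inter_frame_rmsd(toy)
```

The resulting matrix can be passed to any clustering method that accepts precomputed distances.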
Principal component analysis: in this case, we do not try to decompose deviations from an average structure as principal components, but rather deviations from an average contact map. Average contact maps can be a better measure than an average structure, particularly if the conformation of the protein drastically changes. These are the first three principal components of the dynamical contact map of ubiquitin (always ordered to have a positive time correlation):

And the projections are:
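A sketch of how such components and projections might be computed: flatten each frame's distance matrix into a vector of unique pairs, subtract the average contact map, and take the singular value decomposition. The sign of each component is then flipped where needed so that its projection correlates positively with time, following the convention mentioned above. Function names and the array layout are assumptions of this example, not ConAn's interface.

```python
import numpy as np

def distance_matrices(coords):
    """(n_frames, n_res, 3) coordinates -> (n_frames, n_res, n_res) distances."""
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    return np.linalg.norm(diff, axis=-1)

def contact_map_pca(coords, n_components=3):
    """PCA of deviations from the average contact (distance) map.

    Returns (components, projections, explained):
      components  : (n_components, n_pairs) principal directions
      projections : (n_frames, n_components) time series along them
      explained   : fraction of total variance per component
    """
    d = distance_matrices(coords)
    rows, cols = np.triu_indices(d.shape[1], k=1)
    pairs = d[:, rows, cols]                      # (n_frames, n_pairs)
    centered = pairs - pairs.mean(axis=0)         # deviation from average map
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components].copy()
    projections = centered @ components.T
    # Flip signs so each projection correlates positively with frame index.
    time = np.arange(len(pairs)) - (len(pairs) - 1) / 2.0
    signs = np.sign(projections.T @ time)
    signs[signs == 0] = 1.0
    components *= signs[:, None]
    projections *= signs
    explained = s[:n_components] ** 2 / (s ** 2).sum()
    return components, projections, explained

# Toy example: three residues on a line, the third drifting away over time.
n_frames = 10
toy = np.zeros((n_frames, 3, 3))
toy[:, 1, 0] = 1.0
toy[:, 2, 0] = 2.0 + np.arange(n_frames)
components, projections, explained = contact_map_pca(toy, n_components=1)
```

In the toy case a single component carries essentially all of the variance, and its projection increases monotonically with time.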


Thursday, March 23, 2017
