One common protein-contact prediction is that, if the side chain of one member of a pair of amino acids brought close together by folding is long, then that of the other member will be short, and vice versa.
In other words, the sum of the two lengths is constant.
If you have but a single protein sequence available, knowing this is not much use.
Recent developments in genomics, however, mean that the DNA sequences of lots of different species are now available.

Since DNA encodes the amino-acid sequences of an organism’s proteins, the composition of those species’ proteins is now known, too.
That means slightly different versions, from related species, of what is essentially the same protein can be compared.
The latest version of Rosetta does so, looking for co-variation (e.g., in this case, two places along the proteins’ chains where a shortening of the amino acid’s side chain at one is always accompanied by a lengthening of the side chain at the other).
In this way, it can identify parts of the folded structure that are close together.
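The co-variation idea can be illustrated with a toy calculation. The sketch below is not Rosetta's method, which the article does not describe in detail. It assumes an invented five-sequence alignment (`TOY_ALIGNMENT`), uses the number of non-hydrogen atoms in each side chain as a crude stand-in for side-chain length, and flags pairs of positions whose sizes are strongly anti-correlated across the sequences. The function names and the threshold are likewise hypothetical.

```python
from itertools import combinations
from math import sqrt

# Non-hydrogen atoms in each amino acid's side chain: a crude proxy for length.
SIDE_CHAIN_SIZE = {
    "G": 0, "A": 1, "S": 2, "C": 2, "V": 3, "T": 3, "P": 3,
    "I": 4, "L": 4, "D": 4, "N": 4, "M": 4, "E": 5, "Q": 5,
    "K": 5, "H": 6, "F": 7, "R": 7, "Y": 8, "W": 10,
}

# Invented alignment: each row stands for the "same" protein in a related species.
TOY_ALIGNMENT = ["MGKSFE", "MAKSHE", "MVKSLE", "MLKSVE", "MFKSGE"]


def pearson(xs, ys):
    """Pearson correlation, or None if either series does not vary."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def covarying_pairs(alignment, threshold=-0.9):
    """Return (i, j, r) for position pairs whose side-chain sizes compensate."""
    columns = zip(*alignment)  # one tuple of residues per alignment position
    sizes = [[SIDE_CHAIN_SIZE[aa] for aa in column] for column in columns]
    hits = []
    for i, j in combinations(range(len(sizes)), 2):
        r = pearson(sizes[i], sizes[j])
        # A strongly negative r means one side chain shortens as the other lengthens.
        if r is not None and r <= threshold:
            hits.append((i, j, r))
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    for i, j, r in covarying_pairs(TOY_ALIGNMENT):
        print(f"positions {i} and {j} co-vary (r = {r:.2f})")
```

In the invented alignment, positions 1 and 4 (counting from zero) always have side-chain sizes that sum to seven, so they are the only pair reported as a candidate contact. With only a handful of sequences, a correlation like this could easily arise by chance, which is why the approach depends on having many related sequences to compare.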
Though it is still early days, the method seems to work.
None of the 614 structures Dr Baker modelled most recently has yet been elucidated by crystallography or nuclear magnetic resonance (NMR), but six of the previous 58 have.
In each case the prediction closely matched reality.
Moreover, when used to “hindcast” the shapes of 81 proteins with known structures, the protein-contact-prediction version of Rosetta got them all right.
There is a limitation, though.
Of the genomes known well enough to use for this trick, 88,000 belong to bacteria, the most speciose type of life on Earth.
Only 4,000 belong to eukaryotes, the branch of life, made of complex cells, that includes plants, fungi and animals.
There are, then, not yet enough relatives of human beings in the mix to look for the co-variation Dr Baker’s method relies on.
Others think they have an answer to that problem.
They are trying to extend protein-contact prediction to look for relationships between more than two amino acids in a chain.
This would reduce the number of related proteins needed to draw structural inferences and might thus bring human proteins within range of the technique.
But to do so, you need a different computational approach.
Those attempting it are testing out the branch of artificial intelligence known as deep learning.
Deep learning employs pieces of software called artificial neural networks to fossick out otherwise-abstruse patterns.
It is the basis of image- and speech-recognition programs, and also of the game-playing programs that have recently beaten human champions at Go and poker.
Jianlin Cheng, of the University of Missouri, in Columbia, who was one of the first to apply deep learning in this way, says such programs should be able to spot correlations between three, four or more amino acids, and thus need fewer related proteins to predict structures.
Jinbo Xu, of the Toyota Technological Institute in Chicago, claims to have achieved this already.
He and his colleagues published their method in PLOS Computational Biology, in January, and it is now being tested.
If the deep-learning approach to protein folding lives up to its promise, the number of known protein structures should multiply rapidly.
More importantly, so should the number that belong to human proteins.
That will be of immediate value to drug makers.
It will also help biologists understand better the fundamental workings of cells and thus what, at a molecular level, it truly means to be alive.