And the problem of how to go about this is quite obvious.
至于要如何做的問題,也是相當簡單明了。
It's not like Plato woke up one day and then he wrote,
但我的意思并非,比如,柏拉圖有一天突然醒來說,
"Hello, I'm Plato, and as of today, I have a fully introspective consciousness."
“哈啰!我是柏拉圖,我今天,擁有完整的自省意識了” 那樣的簡單而已。
And this actually tells us what the essence of the problem is.
而這告訴我們,我們要找出,問題的本質為何。
We need to find the emergence of a concept that's never said.
我們必須找到從來沒有被談論過的概念。
The word introspection does not appear a single time in the books we want to analyze.
在這些書本中從未出現過一次“自省”這個字,
So our way to solve this is to build the space of words.
所以為了解決這個問題,我們要建立一個文字的空間。
This is a huge space that contains all words
在這個大空間里,包含了相當多的字,
in such a way that the distance between any two of them is indicative of how closely related they are.
用這種方式可以量測出兩個字彼此之間的關聯性程度。

So for instance, you want the words "dog" and "cat" to be very close together,
舉個例子,你會想,“狗”、“貓” 應該是比較有關聯性的,
but the words "grapefruit" and "logarithm" to be very far away.
但“葡萄柚”和“對數” 就沒甚么關聯了。
And this has to be true for any two words within the space.
而在這個空間里的任何兩個字,都必須是可以被量測出來的。
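As a rough illustration of what "distance in the space of words" can mean, here is a minimal sketch that assumes each word is represented by a vector and uses cosine distance. The three-dimensional vectors and the names `TOY_VECTORS`, `cosine_distance` and `word_distance` are invented for this example; they are not taken from the talk, and a real word space would have far more dimensions learned from text.

```python
import math

# Toy 3-dimensional vectors, invented purely for illustration.
# A real word space would be learned from a large body of text.
TOY_VECTORS = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.8, 0.9, 0.1],
    "grapefruit": [0.1, 0.2, 0.9],
    "logarithm": [0.9, -0.7, -0.2],
}


def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v); 0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def word_distance(w1, w2, vectors=TOY_VECTORS):
    """Distance between two words that both have a vector in the space."""
    return cosine_distance(vectors[w1], vectors[w2])


# Related words should end up closer together than unrelated ones.
print(word_distance("dog", "cat") < word_distance("grapefruit", "logarithm"))
```

Cosine distance is only one possible choice; any measure that makes related words close and unrelated words far would fit the description above.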
And there are different ways that we can construct the space of words.
而我們有很多方式可以建立起這些字的空間架構。
One is just asking the experts, a bit like we do with dictionaries.
方法一是只要請教專家就行了,有點類似查字典。
Another possibility is following the simple assumption that when two words are related,
另一個可行的方法是,當兩個字出現關聯性時,去追蹤它們的預設狀況,
they tend to appear in the same sentences, in the same paragraphs, in the same documents,
它們可能會出現在同一句、同一段落、或同一文件中,
more often than would be expected just by pure chance.
多于“偶然”地出現。
And this simple hypothesis, this simple method,
在這個簡單的前提下,
with some computational tricks that have to do with the fact
這個單純且帶有運算技巧的方法必須好用,
that this is a very complex and high-dimensional space, turns out to be quite effective.
而這個復雜且高維度的空間,事后證明,相當有效。
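The "more often than expected by chance" idea can be sketched with pointwise mutual information (PMI) over sentences. This is a simplified stand-in chosen for illustration, not necessarily the method used in the talk, and it leaves out the dimensionality-reduction tricks the speaker alludes to. The tiny corpus `SENTENCES` and the helper `pmi_table` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# A tiny made-up corpus; a real analysis would use many documents.
SENTENCES = [
    "the dog chased the cat",
    "the cat ignored the dog",
    "a dog and a cat slept",
    "the logarithm of a product is a sum",
    "she ate a grapefruit",
    "he peeled the grapefruit slowly",
]


def pmi_table(sentences):
    """Return a function pmi(a, b) based on sentence-level co-occurrence."""
    n = len(sentences)
    word_counts = Counter()  # number of sentences containing each word
    pair_counts = Counter()  # number of sentences containing both words
    for sentence in sentences:
        words = set(sentence.split())
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))

    def pmi(a, b):
        joint = pair_counts[frozenset((a, b))] / n
        if joint == 0:
            return float("-inf")  # the two words never share a sentence
        expected = (word_counts[a] / n) * (word_counts[b] / n)
        return math.log(joint / expected)

    return pmi


pmi = pmi_table(SENTENCES)
# Positive PMI: the pair co-occurs more often than chance would predict.
print(pmi("dog", "cat") > 0)
```

A positive score means the two words share sentences more often than their individual frequencies would predict; a score of negative infinity means they never appear together in this corpus.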
And just to give you a flavor of how well this works,
向各位介紹一下它多有效,
this is the result we get when we analyze this for some familiar words.
我們分析了一些經常用到的字,
And you can see first that words automatically organize into semantic neighborhoods.
首先你可以看到,這些詞匯會自動地歸納成語義相近的相鄰群組,
So you get the fruits, the body parts, the computer parts, the scientific terms and so on.
所以你可看到,水果跟身體部位,計算機與科學字匯等等。
The algorithm also identifies that we organize concepts in a hierarchy.
算法也可以把我們要整理的概念分門別類出來。
So for instance, you can see that
舉個例子,你可以看到,
the scientific terms break down into two subcategories: the astronomical terms and the physics terms.
科學的字匯被拆解成兩個子類,分別是太空與物理的詞匯。
And then there are very fine things.
然后你會發現一件好玩的事,
For instance, the word astronomy, which seems a bit bizarre where it is,
舉個例子,“天文學”這個詞匯,它應該擺的位置
is actually exactly where it should be,
與它現在的位置好像不太搭嘎,
between what it is, an actual science, and what it describes, the astronomical terms.
它現在介于真實科學與天文學之間,偏向科學的位置,而它自己卻是一個天文學的字匯。
And we could go on and on with this.
我們可以持續尋找其它類似的情況。
Actually, if you stare at this for a while,
實際上,如果你盯著這些字一陣子,
and you just build random trajectories,
然后隨機搭配連結一下這些字,
you will see that it actually feels a bit like doing poetry.
你會覺得好像自己在吟詩。
And this is because, in a way, walking in this space is like walking in the mind.
那是因為在某種程度上,在這些空間字匯里漫游就像是在腦海中吟詩一樣。
And the last thing is that this algorithm also identifies our intuitions
最后,算法也能辨識出人類的直覺字匯,
about which words should lie in the neighborhood of introspection.
并歸納到內省的相鄰字匯中。
So for instance, words such as "self," "guilt," "reason," "emotion," are very close to "introspection,"
舉個例子,像是自我、內疚、理由、情緒與內省相關的字匯非常接近,
but other words, such as "red," "football," "candle," "banana," are just very far away.
但其它的字,像是紅色、足球、蠟燭、香蕉就差很遠了。
And so once we've built the space, the question of the history of introspection,
所以一旦我們建立起這樣的詞匯空間,
or of the history of any concept which before could seem abstract and somehow vague,
有關于內省的歷史,有關與任何概念的歷史,以前被認為是抽象或是有點模糊的字匯,
becomes concrete, becomes amenable to quantitative science.
都可以變成扎扎實實可以被量化的科學。
All that we have to do is take the books, digitize them,
而我們要做的就是,拿起這些書把它們數字化,
and take this stream of words as a trajectory and project it into the space,
然后把這些字像子彈一樣射到這些字匯空間里面,
and then we ask whether this trajectory spends significant time circling close to the concept of introspection.
然后我們問計算機這些字匯所行經的軌跡花了多少的時間才達到內省概念的字匯中。
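One way to picture this last step is as a sliding window over the text: each window of words is averaged into a point in the space, and we count how many of those points fall near the target concept. The sketch below assumes toy two-dimensional vectors, a window of three words and a similarity threshold of 0.9; all of these values, and the names `trajectory` and `time_near_concept`, are illustrative assumptions rather than details given in the talk.

```python
import math

# Toy 2-dimensional vectors, invented for illustration only.
TOY_VECTORS = {
    "introspection": [1.0, 0.0],
    "self": [0.9, 0.1],
    "guilt": [0.8, 0.2],
    "reason": [0.85, 0.15],
    "banana": [0.0, 1.0],
    "football": [0.1, 0.9],
    "candle": [0.05, 0.95],
}


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def trajectory(words, vectors, window=3):
    """Mean vector of each sliding window; words without a vector are skipped."""
    known = [vectors[w] for w in words if w in vectors]
    points = []
    for i in range(len(known) - window + 1):
        chunk = known[i:i + window]
        points.append([sum(dim) / window for dim in zip(*chunk)])
    return points


def time_near_concept(words, concept, vectors, window=3, threshold=0.9):
    """Fraction of trajectory points whose similarity to the concept >= threshold."""
    points = trajectory(words, vectors, window)
    if not points:
        return 0.0
    target = vectors[concept]
    near = sum(1 for p in points if cosine_similarity(p, target) >= threshold)
    return near / len(points)


text_a = "self guilt reason self guilt".split()
text_b = "banana football candle banana".split()
print(time_near_concept(text_a, "introspection", TOY_VECTORS))
print(time_near_concept(text_b, "introspection", TOY_VECTORS))
```

Comparing this fraction across books written at different times gives a concrete, if simplified, version of the question of whether a text "circles" the concept.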