My focus and my aim were to capture the phase
我的研究重點以及目標在于
up to the two-word utterance
捕捉逐漸形成"雙詞"的過程
and that could happen anywhere
這一過程可能出現在
between the second and third birthday.
兩周歲到三周歲之間的任何時間
It turned out my son was an early talker
事實證明我兒子屬于學說話很早的孩子
so by the time his second birthday arrived
所以到他兩歲生日的時候
we had the main data set we wanted.
就已獲取到了我們所需的主要數據
By the time recording was complete,
錄制過程結束的時候
more than 240,000 hours of information
他們收集了超過二十四萬小時的信息
and 16 million words had been collected.
以及一千六百萬個單詞
It's a lot of data but in its raw form it's useless
數據很多 可原始數據沒什么用
and so the challenge this now sets up for us is
因此現階段我們面臨的挑戰是
how do you start extracting the right kind of metadata,
如何著手提取出有用的元數據
transcripts of who said what,
誰說過哪些話的文字記錄
annotations of where those people were,
那些人身處何方的注解
annotations of how they're moving
他們如何移動的注解
and the relationships that they were in as they were speaking.
還有講話人處于怎樣的關系之中
And these are the tools
這些都是我們正在制作的
that we are now building to analyse the raw data,
用來分析原始數據的工具
and from that, we're starting to see some
之后我們就可以開始觀察
early insights into the patterns of language development.
語言發展的早期模式了