The availability of large databases of written texts allows for improved quantitative investigations of dynamical aspects of language usage. Language shows remarkable statistical properties, such as long-range correlations in long literary texts and the sub-linear growth of the vocabulary size (the number of distinct words) with database size. This talk will show how simple models explain these observations and lead to a better understanding of how language usage changes, both on the scale of individual texts and on historical time scales.
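As a rough illustration of what sub-linear vocabulary growth means, the sketch below counts the number of distinct words seen after each token of a text. It is not the stochastic model from the papers listed here. The helper names `vocabulary_growth` and `zipf_sample` are hypothetical, and the synthetic "text" is an assumption: tokens are drawn independently from a Zipf-like rank-frequency distribution with arbitrarily chosen parameters.

```python
import random


def vocabulary_growth(tokens):
    """Return a list whose i-th entry is the number of distinct
    tokens among the first i + 1 tokens."""
    seen = set()
    growth = []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    return growth


def zipf_sample(n_tokens, n_types=50_000, exponent=1.0, seed=0):
    """Draw n_tokens word ranks independently, with the probability of
    rank r proportional to 1 / r**exponent.

    Assumption: this is only a toy stand-in for a real text; the
    parameter values are arbitrary.
    """
    rng = random.Random(seed)
    weights = [1.0 / (r ** exponent) for r in range(1, n_types + 1)]
    return rng.choices(range(n_types), weights=weights, k=n_tokens)


tokens = zipf_sample(20_000)
growth = vocabulary_growth(tokens)

# Compare the vocabulary size at half the text with the size at the end.
half = growth[len(growth) // 2 - 1]
full = growth[-1]
print(f"distinct words after {len(tokens) // 2} tokens: {half}")
print(f"distinct words after {len(tokens)} tokens: {full}")
print(f"ratio: {full / half:.2f} (a ratio of 2 would mean linear growth)")
```

Doubling the length of the sample increases the vocabulary by a factor smaller than two, which is the sub-linear behaviour described above. For a real corpus, `tokens` would be replaced by the list of words in the text.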
- M. Gerlach and E. G. Altmann, "Stochastic model for the vocabulary growth in natural languages", Phys. Rev. X 3, 021006 (2013)
- E. G. Altmann, G. Cristadoro, and M. Degli Esposti, "On the origin of long-range correlations in texts", Proc. Natl. Acad. Sci. USA 109, 11582 (2012)