Identifying Quantum Mechanical Statistics in Italian Corpora

Aerts, Diederik; Aerts Arguëlles, Jonito; Beltran, Lester; Sassoli De Bianchi, Massimiliano; Sozzo, Sandro

doi:10.1007/s10773-025-06006-5

We present a theoretical and empirical investigation of the statistical behaviour of words in texts produced by human language. To this end, we analyse the word distribution of several Italian-language texts selected from a specific literary corpus. We first generalise a theoretical framework that we developed previously to identify 'quantum mechanical statistics' in large texts. We then show that, in all analysed texts, words distribute according to 'Bose-Einstein statistics' and deviate significantly from 'Maxwell-Boltzmann statistics'. Next, we introduce an effect of 'word randomization', under which the difference between the two statistical models is less pronounced than in the original texts. These results confirm the empirical patterns obtained for English-language texts and strongly indicate that identical words tend to 'clump together' as a consequence of their meaning, which can be explained as an effect of 'quantum entanglement' produced through a phenomenon of 'contextual updating'. Moreover, word randomization can be seen as the linguistic-conceptual equivalent of an increase in temperature, which destroys 'coherence' and makes classical statistics prevail over quantum statistics. Finally, we provide some insights into the origin of quantum statistics in physics.
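
For orientation, the two statistical models named in the abstract have the following standard functional forms in physics. This is only a sketch: the symbols $A$, $B$, $C$, $D$ and the energy levels $E_i$ are notation assumed here for illustration and do not appear in the abstract, and assigning an energy level $E_i$ to the word of frequency rank $i$ is an assumption about how such a comparison might be set up.

\[
N_{\mathrm{BE}}(E_i) = \frac{1}{A\,e^{E_i/B} - 1},
\qquad
N_{\mathrm{MB}}(E_i) = \frac{1}{C\,e^{E_i/D}}
\]

Under this reading, $N(E_i)$ would be the number of occurrences of the word at level $E_i$, and the constants of each model would be fixed by the data, for example by requiring $\sum_i N(E_i)$ to equal the total number of words in the text. The $-1$ in the Bose-Einstein denominator is what allows occupation numbers to grow large at low energy levels, which is the formal counterpart of identical words 'clumping together'.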