Identifying Quantum Mechanical Statistics in Italian Corpora

Sandro Sozzo
2025-01-01

Abstract

We present a theoretical and empirical investigation of the statistical behaviour of words in texts produced by human language. To this end, we analyse the word distribution of several Italian-language texts selected from a specific literary corpus. We first generalise a theoretical framework that we previously developed to identify `quantum mechanical statistics' in large texts. We then show that, in all analysed texts, words distribute according to `Bose-Einstein statistics' and deviate significantly from `Maxwell-Boltzmann statistics'. Next, we introduce an effect of `word randomization', under which the difference between the two statistical models is less pronounced than in the original texts. These results confirm the empirical patterns obtained in English-language texts and strongly indicate that identical words tend to `clump together' as a consequence of their meaning, which can be explained as an effect of `quantum entanglement' produced through a phenomenon of `contextual updating'. Moreover, word randomization can be seen as the linguistic-conceptual equivalent of an increase in temperature, which destroys `coherence' and makes classical statistics prevail over quantum statistics. Finally, we provide some insights into the origin of quantum statistics in physics.

Files in this record:
2412.07919v1.pdf (open access)
Type: Pre-print document
License: Creative Commons
Size: 1.07 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11390/1296987