This article is an attempt to answer the most basic of questions: are the spaces between Voynich words arbitrary or purposeful?
Despite the essential simplicity of this question, it’s still a burning issue. If we can prove that spaces are arbitrary, then it’s a push towards the theories that the text is encoded or gibberish. But if we can prove that the spaces are purposeful, that they separate words in the same way as our modern usage, then it’s a push towards a natural or artificial language.
But how can we prove this either way?
Are words “words”?
As I have argued before, the text of the manuscript is divided up into clearly defined word-like glyph groups (what we would call words if we could assign a sense unit to each glyph group). These glyph groups have a non-trivial internal structure which is manifest in the severe restrictions imposed upon the positioning of glyphs within the glyph groups. From now on I will refer to these glyph groups as “words” (I am not a fan of Stolfi’s terminology of token as I find it confuses people).
Voynichese has a very strict phototactic structure for glyphs that appears to indicate that these words are assembled intentionally. They are bound together as if they were words.
We are used to the paradigm that words form a sentence with spaces between the words. The Voynich corpus (with the exception of labels, single words that are attached to images) appears to follow this paradigm (albeit with no punctuation). But it is possible that this is a deception. The spaces between words could be an encoded null character, or an arbitrary sp acet om akei t mo rediff cult for the uniniti atedto read.
If this were so, we would expect the words to have a low repeat value. Words would be broken up into sub-sections, or jumbled around, and this would mean that they would not repeat very often. On the other hand, if spaces are separating words, then we would expect words to be repeated throughout the corpus.
Knight and Reddy (What we know about the Voynich manuscript) prove that words are repeated throughout the VMS, and that furthermore the word frequency distribution of the manuscript follows Zipf’s law.
Furthermore, they note that Landini (2001) found that the corpus follows Zipf’s law of word lengths: there is an inverse relationship between the frequency and the length of a word.
From a slightly different angle, let us look at how often labels repeat within the corpus, as this allows us to see if words are repeated through different contexts. If the labels truly function as “labels”, ie sense units denominating illustrations or objects, we would expect a fair number of them to be repeated within the main corpus. And indeed we do: MarcoP found that 70% of all labels appear within the main corpus (study here).
So we find that voynich “words” obey frequency distribution laws; are repeated with a frequency which is normal for language; and furthermore that they are oft-repeated throughout different contexts.
These conclusions lead us towards the assertion that glyph groups are indeed words, and the spaces between said words are significant, serving to separate sense units.