A study upon word spaces in the Voynich manuscript

This article is an attempt to answer the most basic of questions: are the spaces between Voynich words arbitrary or purposeful?

Despite the essential simplicity of this question, it’s still a burning issue. If we can prove that spaces are arbitrary, then it’s a push towards the theories that the text is encoded or gibberish. But if we can prove that the spaces are purposeful, that they separate words in the same way as our modern usage, then it’s a push towards a natural or artificial language.

But how can we prove this either way?

Are words “words”?

As I have argued before, the text of the manuscript is divided up into clearly defined word-like glyph groups (what we would call words if we could assign a sense unit to each glyph group). These glyph groups have a non-trivial internal structure which is manifest in the severe restrictions imposed upon the positioning of glyphs within the glyph groups. From now on I will refer to these glyph groups as “words” (I am not a fan of Stolfi’s terminology of token as I find it confuses people).

Voynichese has a very strict phonotactic structure for glyphs that appears to indicate that these words are assembled intentionally. They are bound together as if they were words.

We are used to the paradigm that words form a sentence with spaces between the words. The Voynich corpus (with the exception of labels, single words that are attached to images) appears to follow this paradigm (albeit with no punctuation). But it is possible that this is a deception. The spaces between words could be an encoded null character, or an arbitrary sp acet om akei t mo rediff cult for the uniniti atedto read.

If this were so, we would expect the words to have a low repeat value. Words would be broken up into sub-sections, or jumbled around, and this would mean that they would not repeat very often. On the other hand, if spaces are separating words, then we would expect words to be repeated throughout the corpus.
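As a rough illustration of this expectation, here is a minimal Python sketch that compares how often words repeat under the original spacing and after the spaces have been moved to random positions. The function names (repeat_rate, respace) and the sample word list are my own placeholders for illustration; the sample is not a real transcription, and a proper test would need a full transliteration of the corpus.

```python
import random
from collections import Counter

def repeat_rate(words):
    """Fraction of tokens whose word type occurs more than once."""
    if not words:
        return 0.0
    counts = Counter(words)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(words)

def respace(words, rng):
    """Strip the original spaces and reinsert the same number at random positions."""
    stream = "".join(words)
    n_breaks = len(words) - 1
    # Pick distinct cut points strictly inside the glyph stream.
    cuts = sorted(rng.sample(range(1, len(stream)), n_breaks))
    bounds = [0] + cuts + [len(stream)]
    return [stream[a:b] for a, b in zip(bounds, bounds[1:])]

# Placeholder word list for illustration only (hypothetical, not a transcription).
sample = ("daiin chol shol daiin chedy qokeedy chol daiin shedy qokeedy "
          "chol chedy shol daiin qokeedy chedy").split()

rng = random.Random(0)  # fixed seed so the run is repeatable
print("original spacing:", repeat_rate(sample))
print("random spacing:  ", repeat_rate(respace(sample, rng)))
```

If the spaces were arbitrary, the two figures should be of similar size on a real transcription; if the spaces mark genuine word boundaries, the original spacing should show a markedly higher repeat rate.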

Reddy and Knight (What We Know About the Voynich Manuscript) show that words are repeated throughout the VMS, and that the word frequency distribution of the manuscript follows Zipf’s law.
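For readers who want to check the Zipf claim themselves, the usual approach is to rank words by frequency and look at the slope of log(frequency) against log(rank); a slope near -1 is the classic Zipfian signature. The sketch below is my own minimal version using an ordinary least-squares fit, not the method used in the paper, and zipf_slope is a name I have made up for the illustration.

```python
import math
from collections import Counter

def zipf_slope(words):
    """Least-squares slope of log(frequency) against log(rank)."""
    counts = sorted(Counter(words).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct words to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic word list built so that count = 60 / rank, i.e. an ideal Zipf curve.
synthetic = []
for rank, count in enumerate([60, 30, 20, 15, 12], start=1):
    synthetic += [f"w{rank}"] * count

print(zipf_slope(synthetic))  # very close to -1 for this idealised input
```

A simple log-log fit like this is sensitive to the long tail of words that occur only once, so it is best read as a quick sanity check rather than a rigorous test.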

They also note that Landini (2001) found that the corpus follows Zipf’s law of word lengths: there is an inverse relationship between the frequency and the length of a word, so the more often a word occurs, the shorter it tends to be.
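One simple way to look for this inverse relationship is to correlate each distinct word’s length with the logarithm of its frequency; a clearly negative value points the same way as the law. This is only a sketch under my own assumptions (a Pearson correlation over word types, with a made-up function name), not Landini’s procedure.

```python
import math
from collections import Counter

def length_frequency_correlation(words):
    """Pearson correlation between word length and log frequency, over word types."""
    counts = Counter(words)
    lengths = [len(w) for w in counts]
    log_freqs = [math.log(c) for c in counts.values()]
    n = len(lengths)
    mean_l = sum(lengths) / n
    mean_f = sum(log_freqs) / n
    var_l = sum((l - mean_l) ** 2 for l in lengths)
    var_f = sum((f - mean_f) ** 2 for f in log_freqs)
    if var_l == 0 or var_f == 0:
        raise ValueError("lengths and frequencies must both vary")
    cov = sum((l - mean_l) * (f - mean_f) for l, f in zip(lengths, log_freqs))
    return cov / math.sqrt(var_l * var_f)

# Toy input: each extra glyph halves the frequency, so the correlation is negative.
toy = ["a"] * 8 + ["bb"] * 4 + ["ccc"] * 2 + ["dddd"]
print(length_frequency_correlation(toy))
```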

From a slightly different angle, let us look at how often labels repeat within the corpus, as this allows us to see whether words are repeated across different contexts. If the labels truly function as “labels”, i.e. sense units denominating illustrations or objects, we would expect a fair number of them to be repeated within the main corpus. And indeed we do: MarcoP found that 70% of all labels appear within the main corpus (study here).
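The label check boils down to a set-membership count. The sketch below shows one way to compute it; I am assuming the figure is taken over distinct labels rather than over every label occurrence, which may not match how the 70% was obtained, and both the function name and the sample words are placeholders.

```python
def label_coverage(labels, main_text_words):
    """Share of distinct labels that also occur somewhere in the main text."""
    vocabulary = set(main_text_words)
    distinct_labels = set(labels)
    if not distinct_labels:
        return 0.0
    found = sum(1 for label in distinct_labels if label in vocabulary)
    return found / len(distinct_labels)

# Placeholder data for illustration only (hypothetical, not a transcription).
labels = ["otol", "okal", "otaly", "zzz"]
main_text = "daiin otol chedy okal otaly daiin".split()
print(label_coverage(labels, main_text))
```

Counting every label occurrence instead of distinct labels would only need the two sets replaced by the raw lists.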

So we find that Voynich “words” obey frequency distribution laws, are repeated with a frequency which is normal for language, and are repeated across different contexts.

These conclusions lead us towards the assertion that glyph groups are indeed words, and the spaces between said words are significant, serving to separate sense units.


3 thoughts on “A study upon word spaces in the Voynich manuscript”

  1. Historically, there is no indication that anybody thought to hide word-spaces until 1520 or so (in Venice).

    Hence I would have thought that anyone who thinks that we are looking at a fifteenth century origin for the Voynich Manuscript should sensibly start from the expectation that its words are very probably indeed words, unless you can prove otherwise.

    Of course, whether those words are enciphered, encrypted, abbreviated, transposed or whatever is another matter entirely, but they are very probably words. 🙂

    What is interesting for me is that the “or or oro r” sequence (on folio 15v) leaves the final “r” adrift at the start of a word, a position where “r” never normally appears. And so this would appear to be the first glimmerings of what I have called a “space transposition cipher”, shuffling spaces around to try to hide the “orororor” structure that might otherwise be too much of a giveaway.

    1. A good point Nick. I didn’t bring it up because you and I both know the counter-argument from smartasses: if the scribe was clever enough to devise a code this good he would have thought of spaces, etc etc! 8)

      1. Ah, but my point is that every time someone makes an assertion about Voynichese that implicitly asks us to rewrite our cryptographic history books, we should be extremely slow to agree with it. 🙂

