A study upon word spaces in the Voynich manuscript

This article is an attempt to answer the most basic of questions: are the spaces between Voynich words arbitrary or purposeful?

Despite the essential simplicity of this question, it’s still a burning issue. If we can prove that spaces are arbitrary, then it’s a push towards the theories that the text is encoded or gibberish. But if we can prove that the spaces are purposeful, that they separate words in the same way as our modern usage, then it’s a push towards a natural or artificial language.

But how can we prove this either way?

Are words “words”?

As I have argued before, the text of the manuscript is divided up into clearly defined word-like glyph groups (what we would call words if we could assign a sense unit to each glyph group). These glyph groups have a non-trivial internal structure which is manifest in the severe restrictions imposed upon the positioning of glyphs within the glyph groups. From now on I will refer to these glyph groups as “words” (I am not a fan of Stolfi’s terminology of token as I find it confuses people).

Voynichese has a very strict phonotactic structure for glyphs that appears to indicate that these words are assembled intentionally. They are bound together as if they were words.

We are used to the paradigm that words form a sentence with spaces between the words. The Voynich corpus (with the exception of labels, single words that are attached to images) appears to follow this paradigm (albeit with no punctuation). But it is possible that this is a deception. The spaces between words could be an encoded null character, or an arbitrary sp acet om akei t mo rediff cult for the uniniti atedto read.

If this were so, we would expect the words to have a low repeat value. Words would be broken up into sub-sections, or jumbled around, and this would mean that they would not repeat very often. On the other hand, if spaces are separating words, then we would expect words to be repeated throughout the corpus.

Knight and Reddy (What we know about the Voynich manuscript) prove that words are repeated throughout the VMS, and that furthermore the word frequency distribution of the manuscript follows Zipf’s law.
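To make the idea of a Zipf check concrete, here is a minimal Python sketch. It is not Knight and Reddy's method, only an illustration under simple assumptions: the input is a plain list of words already split on spaces, and the function names (`rank_frequency`, `zipf_slope`, `repeat_ratio`) are hypothetical helpers invented for this example.

```python
from collections import Counter
import math


def rank_frequency(words):
    """Return (rank, word, count) tuples, most frequent word first."""
    counts = Counter(words)
    # Sort by descending count, then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, w, c) for rank, (w, c) in enumerate(ranked, start=1)]


def zipf_slope(words):
    """Least-squares slope of log(count) against log(rank).

    A slope close to -1 is the classic Zipfian signature.
    """
    table = rank_frequency(words)
    if len(table) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(count) for _, _, count in table]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def repeat_ratio(words):
    """Fraction of word tokens whose word occurs more than once."""
    if not words:
        raise ValueError("empty word list")
    counts = Counter(words)
    return sum(c for c in counts.values() if c > 1) / len(words)
```

If spaces were arbitrary, we would expect `repeat_ratio` to be low and the rank-frequency line to be poorly fitted; a real test would of course need the full transcription rather than a toy list.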

Furthermore, they note that Landini (2001) found that the corpus follows Zipf’s law of word lengths: there is an inverse relationship between the frequency and the length of a word.

From a slightly different angle, let us look at how often labels repeat within the corpus, as this allows us to see if words are repeated through different contexts. If the labels truly function as “labels”, ie sense units denominating illustrations or objects, we would expect a fair number of them to be repeated within the main corpus. And indeed we do: MarcoP found that 70% of all labels appear within the main corpus (study here).
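The label comparison boils down to a set intersection. The sketch below is only an illustration and not MarcoP's procedure: I do not know whether that study counted distinct labels or every occurrence, so this version counts distinct labels, and the function name `label_overlap` and the toy inputs are hypothetical.

```python
def label_overlap(labels, main_text_words):
    """Fraction of distinct labels that also occur in the main text.

    Assumption: labels and main-text words come from the same
    transcription alphabet and are compared as exact strings.
    """
    unique_labels = set(labels)
    if not unique_labels:
        raise ValueError("no labels given")
    vocabulary = set(main_text_words)
    return len(unique_labels & vocabulary) / len(unique_labels)


# Toy data for illustration only, not taken from the manuscript.
toy_labels = ["otedy", "okal", "xyz"]
toy_text = ["okal", "otedy", "daiin", "chedy"]
share = label_overlap(toy_labels, toy_text)
```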

So we find that Voynich “words” obey frequency distribution laws, are repeated with a frequency which is normal for language, and furthermore are oft-repeated throughout different contexts.

These conclusions lead us towards the assertion that glyph groups are indeed words, and the spaces between said words are significant, serving to separate sense units.


On what the Voynich recipe section could be…

The “Recipe” section of the Voynich has many interesting properties, but since it is a purely textual part of the manuscript, it tends to get passed over.

Let’s have a quick overview of this section.

It is self-contained within the very last quire of the manuscript. It comprises many sequential paragraphs of text, with many paragraphs being illustrated (or marked) with stars. There are no drawings other than these stars.

The stars are not all the same. They have 7 or 8 arms. Their colouring also changes; we find:

  • full dark shaded stars (example)
  • full light shaded stars (example)
  • only central shaded stars (example)
  • central shaded stars with a central dot, with different colours (example)
  • stars with no shading but a central dot, with different colours (example)
  • empty stars

The stars appear to be roughly in a sequential pattern. Therefore we see different patterns emerging, ie dark spot, light spot, dark spot, light spot etc.

Most of the stars have tails, which seem to have the function of marking several lines of text underneath the stars.

Sometimes we find an aberrant star, such as this tiny one here.

The obvious solution is that the stars are marking paragraphs.

I would suggest that the colour coding of the stars, along with the number of arms, indicates topic. Most likely, the type of colouring indicates a topic; the colour within then indicates a sub-category.

That the section is a “recipe book” is an old suggestion that appears to come from Currier’s original classification of the book. I understand that he was not suggesting food recipes, but rather pharmaceutical recipes, to go with the assumed medicinal purpose of the book.

However, I would suggest a different purpose to this section – it’s a florilegium.

In medieval works, a florilegium was a compendium of extracts and maxims derived from the great writers of the past. It then started to develop into other topics, bringing together maxims on different subjects such as ethical topics, civic behaviour, vices and virtues, and the like.

Florilegiums were common in scholarly circles. Prof Mary Carruthers says that they were essentially aides-mémoire for students: brief dictiones summarising the topic, presented either ad verba (verbatim) or ad res (a summary).

One of the best known scholar’s florilegiums is De universo, a Carolingian encyclopaedia compiled by Hrabanus Maurus. Of course, this dates from the early Middle Ages, long before this book.

By the late Middle Ages / early Renaissance, the florilegium had developed into more of a layman’s treatise: short homilies for personal study, designed to remind the reader of the greater text.

They were short summaries of wisdom, designed to make the reader recall the main body of information.

This florilegium theory explains the short abbreviated nature of the paragraphs, together with the stars, which here serve as topic markers.

It doesn’t get us any closer to what the content actually means, but it does provide a new perspective for further analysis.

Incidentally, the term florilegium (which essentially means to gather flowers) was adopted afresh in the Renaissance to describe herbals.

First authorised copy of the Voynich has been commissioned by Yale

It seems the Beinecke has authorised the specialist manuscript producers “Siloé” from Spain to make the first ever authorised copy of the Voynich.

The project will start in February, when the specialists of the company will be given access for a whole week to make their own photos of the book and get “the feel” for it.

They will then start producing handdrawn exact copies on vellum for sale.

Siloé is one of the world’s premier manuscript makers, based in Burgos (Spain), and has made 34 official copies of ancient manuscripts in the last two decades, 14 of which have won international awards. They’ve been pestering Yale for the last decade to allow them access to the Voynich.

It seems Yale opened a selection process last year, and has this week confirmed Siloé has won it.

23 professionals will be working on the process, and the reproduction will be “100% identical”, promises the firm’s director.

However, the first copy is not expected to be released until 2018.

No news on how much the copies will sell for – some of Siloé’s works sell for over €10,000. I understand the project is being financed by crowdfunding.

In all, 898 numbered editions will be issued.

Evidence of repair and trimming on final folio

I have written before on my blog about the repair carried out to the final folio of the VM, the “curse” page.

In short, we decided that the top right tear in the final folio was repaired by the parchment maker whilst it was still on its frame (note oval needle holes, a clue that the string used to stitch the parchment was under pressure).

But there is a further consideration to make from this which I am only just starting to think about.

Namely, the wormhole in the top corner (note the two dark circles on each folio):

Pab f116r

If the folio is spread out to be the same size as the preceding one, then the hole corresponds exactly with the wormhole on the previous folio, as is to be expected. So that suggests that at some point, both folios extended out to the same length, ie, their corners corresponded, and at this point the worm went through both folios.

Which logically means the stitch in the repair had been removed at this point, allowing the page to come out. Probably the string broke.

Now, there is evidence of this repair being again repaired – if you look at the recto side of the folio you can see smaller needle holes amongst the oval ones, holes that weren’t subjected to the same pressure as the original ones. It’s possible that the hair in the folio was thus stuffed back inside its hole and the stitch replaced at some point in its history. The stitch was certainly done before the text on the recto side of this folio was added, because the text takes the repair work into consideration.

Now this leads us to the question of the trimming. On this page we see writing that extends quite naturally to the very edge of the page. It has always been my contention that the writing was made on a full size folio, which was later trimmed to the very edge of the text (see my analysis of the supposed curse on this page for more).

See the second line?
Note how the top line and the images on the left are flush with the margin, yet show no sign of having been carefully squeezed in, which suggests that they were written when the margin was wider.

The trimming was carried out to correspond with the position of the corner of the folio when the stitch is in place, ie, the top left corner is dragged into its current position. We can postulate that the trimming is not original but carried out by a later owner of the book, one who also repaired the stitch with a quick job.

I’m suggesting that originally the final folio was the same size as the preceding one, with the top outermost corner being pulled in by the stitch, but the bottom outside corner still corresponding with the folio below. At some point the stitch came undone and the corner drifted back to its original position, at which point the wormhole was made. Now, if the lefthand margin was cut to its current point before that moment, the tops of both folios would not correspond: the top of f116 would be dragged downwards because there is not enough give in the parchment to allow it to correspond with f115, and the wormhole would not be where it is. So we can say the opposite: both folios were originally the same size, with the top outermost corner being dragged in. Take a piece of paper and experiment.

So it seems that when the stitch was repaired, after the inscription on f116v was made, the sewer decided it looked a mess and trimmed this folio to its current size, corresponding with the new location of the corners of the folios. When we look at the preceding folio we see a number of wormholes on the outermost margin that don’t exist in the folio in question; they were most probably the reason it was all cut away, since the whole outermost margin was damaged anyway. Other wormholes inside the content of the folio do correspond with holes below, showing the rest of the page lay in its current position quite happily.

And there is one final piece of evidence. The top of the final folio shows water damage, the brownish stain across the whole of the top which has slightly blurred the ink in the first line of text. If the folio had been cut at this point to its current size, the water spill would have carried through to the visible portion of the preceding folio; but it hasn’t, which suggests the final folio was larger at the time of the spill, thus protecting the underlying folio. (Although we can say that possibly the folio was lying open when the spill occurred, away from the rest of the quire. But then we would expect to see some damage to the recto side, which would have been open, and this doesn’t seem to have happened).

In summary:

The parchment of this final folio had an imperfection that was made good before the text was written on it. This caused the top outer corner of the folio to be dragged towards the centre, twisting the parchment. Both folios were approximately the same size, as we see elsewhere in the manuscript.

The folio then had text written on it on both sides. But at a later date the repair broke and the corner drifted back to its original position. At some point after this happened, a worm bored a hole through the corner of this folio and the one underneath it.

After this had happened, an owner of the VM repaired the imperfection, and noticed water had been spilt upon the folio. He then trimmed the folio down to the bare margins of the text, cutting away damaged parchment.

Exploring evolving epizeuxis within the Voynich Manuscript text

There is a unique feature within the Voynich Manuscript, namely the occurrence of very similar repeated sequences of text. These are a progression from Timms Pairs, an effect defined as two very similar words appearing within the same paragraph, usually differing by an additional suffix or prefix. Evolving epizeuxis sequences, which I call Jackson sequences, are fragments of sentences in which a word appears to be repeated several times with slight modifications. The effect has been described before, with D’Imperio and Currier providing the first comments on it. However, as far as I am aware there has been no attempt to develop or analyse the effect. This is not an attempt to formally describe the effect but a quick overview of the characteristics that form the phenomenon, together with a suggestion for automatically detecting these sequences in the transcription files. It builds up to a more formal description of the reasons driving the scribe who first penned this work.

Epizeuxis is a term from formal rhetoric which describes the rapid repetition of a single word with no other words in between, usually for the sake of emphasis. A classic example is from Macbeth: “O horror, horror, horror!”. There is, naturally enough, no term for repeating the same word with differences of spelling, as this makes no sense in natural language outside of “stream of consciousness” writings which alliterate words in a quasi-poetic style: crumble trumble bumble mumble… Or, of course, one of those clever poems designed to show how difficult it is to speak English:

Just compare heart, beard, and heard,
Dies and diet, lord and word,
Sword and sward, retain and Britain.
(Mind the latter, how it’s written.) [Excerpted from here]

Notwithstanding that, the Voynich manuscript contains many examples of Jackson sequences, especially in the text heavy pages towards the back of the manuscript. They are infrequent or non-existent in the light illustrated pages (although Timms Pairs do appear there) but appear numerous times on the later pages. They appear much more often in Currier B pages, although this may just be because the text heavy pages are written in B.

Note: All EVA transcriptions are taken from the reading available in the VIB. However, the readings make more sense in the original document.

Let us look at a few examples:
<f112r.P.6;H> chedal.oteedy.okeey.qokeedy.olkeedy.oteey.oram
<f111r.P.3;H> dsheedy.lkeedy.chckhy.lchedy.qokeey.qokear.chal.qokeeas.cheokedy.sal.lokam

This example makes less sense in EVA, but if you start with the second word in this line and read along you can clearly see how the sequence evolves in a binary sequence. Words 1, 3, 6 form what is essentially a Jackson sequence, as do words 2, 4, 5, 7.

Here is another example from the same page. I have copied two lines here because there are some beautiful examples of Timms Pairs (qokeey) here!

In this example we see how okeeo morphs into olchedy, lchedy, qokeey, okeeedy, okain, followed by a repeated chedy.
<f111r.P.9;H> ycheeodai!n.okeeo.olchedy.lchedy.qokeey.okeeedy.okai!n.chedy.chedy.teey.dal.lam
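For anyone wanting to experiment, lines in this format are easy to split into a locus tag and a list of words. The following Python sketch is only a hedged illustration: `parse_line` is a hypothetical helper, and it assumes that "." separates words and that "!" can simply be dropped (which is how the prose above reads okai!n as okain). Other markup found in the transcription files is not handled.

```python
import re

# "<locus> text" where locus is everything between the angle brackets.
LINE_RE = re.compile(r"^<(?P<locus>[^>]+)>\s*(?P<text>.*)$")


def parse_line(line):
    """Split one transcription line into (locus, [words]).

    Assumptions: '.' separates words and '!' is ignorable filler.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("line does not start with a <locus> tag")
    words = [w.replace("!", "") for w in match.group("text").split(".")]
    return match.group("locus"), [w for w in words if w]


locus, words = parse_line(
    "<f111r.P.9;H> ycheeodai!n.okeeo.olchedy.lchedy.qokeey."
    "okeeedy.okai!n.chedy.chedy.teey.dal.lam"
)
```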


There are many other such examples, but they are omitted here for brevity’s sake – a quick visual search on any of the text rich pages in Currier B will quickly bring your attention to them.

What we are clearly seeing here are words which are being repeated with modifications as the scribe writes. Duplicate pairs such as the chedy chedy cited in the above example can be dismissed as scribal errors, but this explanation does not explain away the nature of Jackson sequences.

Before delving into possible reasons for these sequences, is it possible to automatically detect them in the transcription files?

Well, the above sequences all share the same features. They are a succession of words, usually in a linear sequence, that are very similar. In a sentence where w x y z forms the Jackson sequence, the shorter of any two adjacent words will usually share at least 80% of its glyphs with its larger partner, and y, z will normally remove or insert the most visually striking glyph present or missing in w x.
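As a rough sketch of how the 80% part of this rule could be automated, the Python below chains adjacent words whose shorter member shares at least 80% of its characters with the longer one. Several things here are assumptions rather than anything established: each EVA character is treated as one glyph, "sharing" is measured as a multiset overlap that ignores position, exact duplicates are allowed to chain, and the "most visually striking glyph" condition is not modelled at all. The names `shared_fraction` and `find_runs` are hypothetical.

```python
from collections import Counter


def shared_fraction(a, b):
    """Share of the shorter word's characters also found in the longer word.

    Uses a multiset overlap, so character order is ignored.
    """
    short, long_ = sorted((a, b), key=len)
    if not short:
        return 0.0
    overlap = sum((Counter(short) & Counter(long_)).values())
    return overlap / len(short)


def find_runs(words, threshold=0.8, min_len=3):
    """Return (start, end) index pairs, end exclusive, of candidate runs.

    A run is a stretch of at least min_len words in which every
    adjacent pair meets the similarity threshold.
    """
    runs = []
    start = 0
    for i in range(1, len(words) + 1):
        chained = (
            i < len(words)
            and shared_fraction(words[i - 1], words[i]) >= threshold
        )
        if not chained:
            if i - start >= min_len:
                runs.append((start, i))
            start = i
    return runs


# Words of f112r.P.6 as quoted in the examples above.
line = ["chedal", "oteedy", "okeey", "qokeedy", "olkeedy", "oteey", "oram"]
runs = find_runs(line)  # → [(1, 6)], i.e. oteedy through oteey
```

On this line the sketch picks out the five middle words and leaves chedal and oram outside the run. An alternating pattern like the one in f111r.P.3 would need pairs compared at a distance of two, which this version does not attempt.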

The difference between words quickly forms a pattern. A word is generated. The next word either omits or adds a letter (or bigram), usually at the front or back of the word: d becomes qo, for example. The subsequent word has a glyph modified (a becomes d, or ee becomes a benched gallows, for example), and if a prominent glyph is present this is dropped. The last word has any suffix dropped. Usually after four or five words the Jackson sequence is abandoned.

This rule suggests that Jackson sequences can be automatically found. However, the transcription files do make arbitrary differences between glyphs that can confuse the parser. For example, ch & ee are both visually very similar bigrams which should be treated as the same glyph by any parser.
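One hedged way of handling this is to normalise each word before comparison, collapsing the bigrams that are to be treated as the same glyph into a single placeholder. In the Python sketch below, `normalise` is a hypothetical helper and the uppercase placeholder "C" is an arbitrary choice that relies on EVA words being written in lowercase; only ch and ee are merged, as suggested above.

```python
def normalise(word, merged=("ch", "ee"), symbol="C"):
    """Replace each bigram in `merged` with one placeholder symbol.

    Assumption: the input is lowercase EVA, so the uppercase
    placeholder cannot collide with a real transcription character.
    """
    for bigram in merged:
        word = word.replace(bigram, symbol)
    return word


# Two spellings that differ only in ch versus ee compare as equal.
same = normalise("okeedy") == normalise("okchdy")
```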

However, the discovery of a supposed pattern does not imply that there is a technical reason behind the formation of Jackson sequences. The short nature of these words, averaging 5.5 glyphs a word, means that removing or inserting just one or two glyphs is enough to satisfy the 80% rule.

The following possible reasons occur to me:

  1. This is an effect of any possible encryption process. For example, a similar effect would be found if a simple substitution or letter-rearranging scheme were applied to similar words, eg the atbash cipher (a letter-for-letter substitution) or “pig latin” code.
  2. The lines are poetic in nature and we are seeing alliteration in action.
  3. The text is actually phonetic in nature and we are seeing similar sounding words, as in the extract from the poem above.
  4. The text is random, and this is an unintended effect created by the scribe. It would thus be the same phenomenon that produces Timms Pairs – the scribe, intending to produce natural language like text, is copying previous words and modifying them as he goes along to make them look different. This would explain the appearance or disappearance of the most visually striking glyphs.
  5. Anton Alipov makes an interesting suggestion below.
  6. Conjugation of verbs (see more below).
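To illustrate the first suggestion, here is a small Python sketch showing that simple schemes keep similar words similar. It uses the Latin alphabet and English words purely as a stand-in; nothing here is a claim about how the Voynich text was produced. The `pig_latin` function is a toy version (leading consonants moved to the end, plus "ay"), and both function names are my own choices for this example.

```python
import string

# Atbash maps a<->z, b<->y, and so on: a letter-for-letter substitution.
_ATBASH = str.maketrans(
    string.ascii_lowercase, string.ascii_lowercase[::-1]
)


def atbash(word):
    """Apply the atbash substitution to a lowercase Latin-alphabet word."""
    return word.lower().translate(_ATBASH)


def pig_latin(word):
    """Toy pig latin: move leading consonants to the end and add 'ay'."""
    for i, letter in enumerate(word):
        if letter in "aeiou":
            return word[i:] + word[:i] + "ay"
    return word + "ay"


# Similar plaintext words stay similar after either transformation.
pairs = [(w, atbash(w), pig_latin(w)) for w in ("heard", "beard")]
```

Because atbash changes each letter independently, two words that differ by one letter still differ by exactly one letter afterwards, so a run of near-identical words would survive such an encoding.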

Comments welcome!

It’s worth mentioning that, to counterbalance the above, there are repeating sequences throughout the Voynich. The longest are as follows (from Petr Kazil) and are added here for future contemplation:

The text contains a sequence which is repeated not just twice, but four
or more times. Significantly, all of the occurrences are in ``Author B''
<f83r.7> 2OEZC8.EZCC89.4CCC89.4OF9.O4OE.RZCC89.4OFC89.4OPCC89.4OPCC89-

There are also a few twice-only repeats:


<f26r.4> 4OFC89.SCO2.9PC89.4OFC89.9PC89.SCFC89.8AM.O8AJ.2AE89-
<f81v.12>  4OE.OE.S89.ZC89.4OFC89.9PC89.SCPC89.EFC8C9.9PC89-
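Exact repeats like these can be searched for mechanically. The Python sketch below is a minimal illustration and not Petr Kazil's method: it collects every run of n consecutive words in each line and reports the runs that turn up in more than one locus. The function name `shared_ngrams` is hypothetical, the two lines are the ones quoted above with the trailing hyphen dropped, and only exact matches are found.

```python
from collections import defaultdict


def shared_ngrams(lines, n=2):
    """Map each n-word sequence to the loci it appears in.

    `lines` maps a locus name to its list of words. Only sequences
    found in more than one locus are returned.
    """
    seen = defaultdict(set)
    for locus, words in lines.items():
        for i in range(len(words) - n + 1):
            seen[tuple(words[i:i + n])].add(locus)
    return {gram: sorted(loci) for gram, loci in seen.items() if len(loci) > 1}


lines = {
    "f26r.4": "4OFC89.SCO2.9PC89.4OFC89.9PC89.SCFC89.8AM.O8AJ.2AE89".split("."),
    "f81v.12": "4OE.OE.S89.ZC89.4OFC89.9PC89.SCPC89.EFC8C9.9PC89".split("."),
}
repeats = shared_ngrams(lines, n=2)
```

With n=2 the only pair shared by these two lines is 4OFC89 followed by 9PC89; the words after it (SCFC89 and SCPC89) differ by one character, so a stricter or looser matching rule would give different results.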


Anton Alipov suggests that:

Don’t know how in English with its quite simple grammar, but, for example, in Russian this kind of repetition well might be not for the sake of emphasis, but just in the regular course of declension. For example:

Косил косой косой косой.

Here we have three identically looking words, and the first word is also similar to three others. This is a valid sentence and it means: “The boss-eyed [person] mowed [something] with a crooked scythe”. The first word “косил” is past tense, masculine gender for the verb “косить” (to mow). The second word “косой” is a designation of a person (like a nickname) and actually means “one who is suffering from strabismus”. The fourth word “косой” is ablative case, singular number for the feminine gender noun “коса” (scythe). The third word “косой” is feminine gender, singular case adjective “косая” relating to the noun and thus put in ablative. Probably all four words share the common etymology, but actually they are all different in terms of meaning, except that the adjective “косая” and the nickname “косой” share the common meaning like “not straight”, and also the verb “косить” and the noun “коса” are semantically related: you usually mow (“косить”) with a scythe (“коса”) and, alternatively, what you usually do with the help of the scythe (“коса”) is that you mow (“косить”) with it.

Russian is not like the Voynich Manuscript in terms of abundance of such repetitions, but again this is a valid and natural linguistic example.

Anton’s comments also made me think of declensions. Conjugation of verbs would show a similar effect, not so much in English (run, run, ran), but in most of the Romance languages or indeed Latin itself, which has four main patterns of conjugation (eg currō, currere, cucurrī, cursus: to run, to race).

Is it worth trying to work out what the plants in the Voynich Manuscript are?

There are many “plants” (herbs if you will, although I doubt all of them are) in the Voynich Manuscript. Is it worthwhile trying to identify them?
For any identification attempt is a two edged sword that can easily lead us astray.
First off, we have to consider whether
a) the plants are drawn in the traditional sense or
b) are the results of an individual working off their own experience.
or c) that they don’t actually have a meaning.

If a), then they are being copied from earlier sources, and hence will correspond to the bulk of the literary tradition in Europe. If we assume they are, then there will be many clues that give us access to their identifications as their use will be symbolised. Remember that there are many herbals in existence – most of them, as Don on the mailing list has been discovering, are just copies of earlier or contemporaneous works, following set patterns, even if the individual monastery did add commentaries to the “official” text.

People simply did not want innovation in their herbs – we are talking about medicine here. Without going deeply into the subject, the literary tradition of medicine was institutionalised, it was traditional. Herbals were part of a tradition from the past, based usually on the doctrine of signatures, medicine that was assumed to work, and nobody wanted to be the guinea pig for some quack with new ideas.
Herbals of the age followed the tradition. We obviously cannot know what local doctors (wise women or men, leeches, hedge magicians, call them what you will) knew or thought, for they left no written record, but it seems a safe bet that oral teaching would filter out from the monasteries, communicating their knowledge, and that this knowledge would be passed between villages and medics. We know that the common name for herbs changes drastically from region to region, even village to village in old England, but their essential purpose remains the same.
As an example, the Old English Herbarium, an Anglo-Saxon turn of the millennium work, is a herbal written in Old English in the continental style, translating the original continental works. However, most of the herbs depicted are unrecognisable, which led scholars to assume that the scribes who translated the work didn’t have access to any original illustrations (many of the herbs are, in any case, not native to the British Isles). The assumption was that the scribes had no real life models, and so after several editions of the work had been copied, the original illustrations had morphed unrecognisably. Not so: Voigts in her 1979 work proved that the herbs are depicted in their dried form, the only way that Brits would have had access to them (via trade with central and southern Europe), and far more useful a depiction to them than their fresh form. The scribes had kept the knowledge and power of the authoritative written text, but had changed the illustration to fit their needs.
But the symbolised “clues” are still there. Basilicia (adderwort), a herb assumed to protect against adders, continues to have its association with the three snakes and so can be recognised. Adderwort without the snake & basilisk association serves no point!
So if we assume a), we can then go ahead and look for symbolic clues in the Voynich. Let us look at 49r. A plant with multi colour golden (well, reddish) bulbs and snakes around the roots. Ah ha! It’s Adderwort.
Or is it?
Well, adderwort traditionally has three snakes, not two as depicted in the Voynich. The snakes are usually called Eriseos, Stillatus & Hematites (or Crysofalus) according to Pollington, at least in the Old English tradition, with their associated characteristics that give the plant its power (I skip over the details here). So why does the Voynich only have two? And are they really snakes? Where are their fangs, or the vertical stripes showing that these are indeed the poisontooth snakes of antiquity, the adder family?
So the symbology does not help us. Either the symbology is adhered to as per tradition, or it is thrown out of the window and a new schematic is inserted. We cannot pick on one half recognised detail and expand it to the rest of the material without proof.
Let’s consider b).

The Voynich is the work of someone not following the traditional patterns.

Well, in this case, we cannot assume. We must be sure. And how can we be sure if the text is not there to describe what we are seeing?
Ah ha! We think. This is a rose. No, replies the author, it’s a dog rose, or a badly drawn daisy. How dare you think it is a rose.
Ah ha! This is Adderwort. Look at the snakes. No, replies the author, for that is the medicine of the old guard, not the new exciting stuff I am developing and anyway those are worms showing that these flowers grow in the decay of waste, signifying a phoenix like revival from the ashes of our waste. Or whatever.
We cannot match these illustrations to plants, for the simple reason that the genre is just too large.
Yes, it looks like a red onion. But why should it be a red onion? It could be that the author is referring to a specific type of potato… no wait, potatoes came in later. You know what I mean. Maybe a fat carrot or any other tuber of a specific shape.
But there is a further problem with b): it doesn’t fit in with our accepted understanding of how later medieval medicine would work.
Early / middle medieval thought discarded original thought. Biblical teachings said that the Ancients possessed all knowledge as granted by God, and that human hubris had led to this information being lost. Therefore, there was no point in poking around thinking up new things for yourself; you had to rely on the teachings of the Ancients.

That’s not to say that people weren’t curious, of course they were. It’s to say that in “formal” discussion and argument, rhetoric based on the arguments of the ancients was standardised and would overturn any original thought, even when the ancient information was clearly wrong. There is a story that Aristotle claimed the honeybee has eight legs, when any fool can see that it only has six – but this was accepted as fact right up until the Renaissance!

Monasteries copied books because they, in some way, transmitted information as revealed by God in the past and it was their duty to do so. They modified the useful bits of them as they went along, but the essential knowledge was protected – it was their duty to protect the holy knowledge of times past, and of course, they believed implicitly in it.

That’s part of the reason Rudolph II was revered by the early European intellectual: he was the original Renaissance patron, hunting out new information. He was living right at the time when new access to information and greater literacy was starting to evolve thought into the Renaissance, but the old regime continued with their medieval mindset elsewhere. His Spanish uncle for example was most dismissive of his nephew and his intellectual mindset; it wasn’t something that was “done”. The Italian princes had been doing it for years, by the way, but they were never Holy Roman Emperors; Rudolph main-streamed this rather eccentric pastime.

And look at Paracelsus. He is known now not for any innovation in medicine (his cures were as claptrap as the ones they were replacing) but because he broke with tradition and urged innovation, trial and error, experimentation and actually discarding old knowledge that didn’t lead anywhere. That’s why he was revolutionary. He was the first figure to become famous for such work, in the same way that his contemporaries such as Martin Luther would become famous for defying the Catholic Church. OK, neither of them was the first to advocate such a movement, but they were the first to actually create movements. Which, I understand, does not imply that the VM cannot have been an earlier attempt, some visionary who realised that medicine was claptrap and attempted to create his own medicine. But this is a circular argument, for since we cannot read the text, we return to the beginning of this argument!
But all this came after the VM, in the middle 16th century.
There is a c).

That the content in the book simply doesn’t lead anywhere. That the illustrator had access to herbals but no understanding of (or interest in) medicine or their purpose, and so just used them as a basis for his work as he went along. This would explain why we only have two snakes instead of three: the illustrator was unaware of the significance of three snakes.


No matter which of the three arguments we choose, there isn’t a lot of point in trying to identify the plants in the VM, since we know (after decades of trying) that they aren’t real life representations.

We can build up logical arguments pointing to this plant or the other, but we cannot be sure. We cannot know the true intention of the artist, because we have no textual confirmation. And so far, we have never been able (Prof Bax aside, ahem) to use a plant ID to identify words.