A study upon word spaces in the Voynich manuscript

This article is an attempt to answer the most basic of questions: are the spaces between Voynich words arbitrary or purposeful?

Despite the essential simplicity of this question, it’s still a burning issue. If we can prove that spaces are arbitrary, then it’s a push towards the theories that the text is encoded or gibberish. But if we can prove that the spaces are purposeful, that they separate words in the same way as our modern usage, then it’s a push towards a natural or artificial language.

But how can we prove this either way?

Are words “words”?

As I have argued before, the text of the manuscript is divided up into clearly defined word-like glyph groups (what we would call words if we could assign a sense unit to each glyph group). These glyph groups have a non-trivial internal structure which is manifest in the severe restrictions imposed upon the positioning of glyphs within the glyph groups. From now on I will refer to these glyph groups as “words” (I am not a fan of Stolfi’s terminology of token as I find it confuses people).

Voynichese has a very strict phonotactic structure for glyphs that appears to indicate that these words are assembled intentionally. They are bound together as if they were words.

We are used to the paradigm that words form a sentence with spaces between the words. The Voynich corpus (with the exception of labels, single words that are attached to images) appears to follow this paradigm (albeit with no punctuation). But it is possible that this is a deception. The spaces between words could be an encoded null character, or an arbitrary sp acet om akei t mo rediff cult for the uniniti atedto read.

If this were so, we would expect the words to have a low repeat value. Words would be broken up into sub-sections, or jumbled around, and this would mean that they would not repeat very often. On the other hand, if spaces are separating words, then we would expect words to be repeated throughout the corpus.

Knight and Reddy (What we know about the Voynich manuscript) prove that words are repeated throughout the VMS, and that furthermore the word frequency distribution of the manuscript follows Zipf’s law.

Furthermore, they note that Landini (2001) found that the corpus follows Zipf’s law of word lengths: there is an inverse relationship between the frequency and the length of a word.
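The repeat test described above is easy to try on a transliteration. The following is only a minimal sketch, not the method used by Knight and Reddy: it assumes EVA-style lines in which a full stop marks each word space, and the helper names (split_words, repeat_rate, rank_frequency) are my own. The two sample lines are f112r.P.6 and f111r.P.9 from the interlinear file; a real test would of course need the whole corpus.

```python
from collections import Counter

def split_words(line: str) -> list[str]:
    """Split an EVA-style line into words (assumes '.' marks a word space)."""
    return [w for w in line.strip().split(".") if w]

def repeat_rate(words: list[str]) -> float:
    """Share of word tokens whose word form occurs more than once."""
    counts = Counter(words)
    repeated = sum(n for n in counts.values() if n > 1)
    return repeated / len(words) if words else 0.0

def rank_frequency(words: list[str]) -> list[tuple[int, str, int]]:
    """(rank, word, count) rows; under Zipf's law rank * count stays roughly constant."""
    ordered = Counter(words).most_common()
    return [(rank, w, n) for rank, (w, n) in enumerate(ordered, start=1)]

# Toy input only: two transcription lines, copied verbatim.
sample = [
    "chedal.oteedy.okeey.qokeedy.olkeedy.oteey.oram",
    "ycheeodai!n.okeeo.olchedy.lchedy.qokeey.okeeedy.okai!n.chedy.chedy.teey.dal.lam",
]
words = [w for line in sample for w in split_words(line)]
print(f"repeat rate: {repeat_rate(words):.2f}")
print(rank_frequency(words)[:3])
```

If the spaces were arbitrary we would expect the repeat rate over the full corpus to be low; if they separate real words, it should be comparable with that of a natural language text of similar length.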

From a slightly different angle, let us look at how often labels repeat within the corpus, as this allows us to see if words are repeated through different contexts. If the labels truly function as “labels”, ie sense units denominating illustrations or objects, we would expect a fair number of them to be repeated within the main corpus. And indeed we do: MarcoP found that 70% of all labels appear within the main corpus (study here).
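The label check is a simple set comparison. The sketch below is my own illustration of the idea, not MarcoP's actual procedure; the function name label_overlap is hypothetical and the word lists are invented toy data, so the figure it prints says nothing about the manuscript.

```python
def label_overlap(labels: list[str], corpus_words: list[str]) -> float:
    """Fraction of distinct labels that also occur as words in the running text."""
    label_set = set(labels)
    if not label_set:
        return 0.0
    found = label_set & set(corpus_words)
    return len(found) / len(label_set)

# Invented toy data, purely for illustration.
labels = ["otedy", "okal", "oram"]
corpus_words = "chedal oteedy okeey oram okal".split()
print(f"{label_overlap(labels, corpus_words):.0%} of labels recur in the text")
```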

So we find that Voynich “words” obey frequency distribution laws, are repeated with a frequency which is normal for language, and furthermore are oft-repeated throughout different contexts.

These conclusions lead us towards the assertion that glyph groups are indeed words, and the spaces between said words are significant, serving to separate sense units.


On what the Voynich recipe section could be…

The “Recipe” section of the Voynich has many interesting properties, but since it is a purely textual part of the manuscript, it tends to get passed over.

Let’s have a quick overview of this section.

It is self-contained within the very last quire of the manuscript. It comprises many sequential paragraphs of text, with many paragraphs being illustrated (or marked) with stars. There are no drawings other than these stars.

The stars are not all the same. They have 7 or 8 arms, and their colouring varies. We find:

  • full dark shaded stars (example)
  • full light shaded stars (example)
  • only central shaded stars (example)
  • central shaded stars with a central dot, with different colours (example)
  • stars with no shading but a central dot, with different colours (example)
  • empty stars

The stars appear to be roughly in a sequential pattern. Therefore we see different patterns emerging, ie dark spot, light spot, dark spot, light spot etc.

Most of the stars have tails, which seem to have the function of marking several lines of text underneath the stars.

Sometimes we find an aberrant star, such as this tiny one here.

The obvious solution is that the stars are marking paragraphs.

I would suggest that the colour coding of the stars, along with the number of arms, indicates topic. Most likely, the type of colouring indicates a topic; the colour within then indicates a sub-category.

That the section is a “recipe book” is an old suggestion that appears to come from Currier’s original classification of the book. I understand that he was not suggesting food recipes, but rather pharmaceutical recipes, to go with the assumed medicinal purpose of the book.

However, I would suggest a different purpose to this section – it’s a florilegium.

In medieval works, a florilegium was a compendium of extracts and maxims derived from the great writers of the past. It then started to develop into other topics, bringing together maxims on different subjects such as ethical topics, civic behaviour, vices and virtues, and the like.

Florilegia were common in scholarly circles. Prof Mary Carruthers says that they were essentially aides-mémoire for students: brief dictiones summarising the topic, presented either ad verba (verbatim) or ad res (a summary).

One of the best known scholarly florilegia is De universo, a Carolingian encyclopaedia compiled by Hrabanus Maurus. Of course, this was the early Middle Ages, long before this book.

By the late Middle Ages / early Renaissance, the florilegium had developed into more of a layman’s treatise: short homilies for personal study, designed to remind the reader of the greater text.

They were short summaries of wisdom, designed to make the reader recall the main body of information.

This florilegium theory explains the short abbreviated nature of the paragraphs, together with the stars, which here serve as topic markers.

It doesn’t get us any closer to what the content actually means, but it does provide a new perspective for further analysis.

Incidentally, the term florilegium (which essentially means to gather flowers) was adopted afresh in the Renaissance to describe herbals.

First authorised copy of the Voynich has been commissioned by Yale

It seems the Beinecke has authorised the specialist manuscript producers “Siloé” from Spain to make the first ever authorised copy of the Voynich.

The project will start in February, when the specialists of the company will be given access for a whole week to make their own photos of the book and get “the feel” for it.

They will then start producing hand-drawn exact copies on vellum for sale.

Siloé, based in Burgos (Spain), is one of the world’s premier manuscript makers and has made 34 official copies of ancient manuscripts in the last two decades, 14 of which have won international awards. They’ve been pestering Yale for the last decade to allow them access to the Voynich.

It seems Yale opened a selection process last year, and has this week confirmed Siloé has won it.

23 professionals will be working on the process, and the reproduction will be “100% identical”, promises the firm’s director.

However, the first copy is not expected to be released until 2018.

No news on how much the copies will sell for – some of Siloé’s works sell for over €10,000. I understand the project is being financed by crowdfunding.

In all, 898 numbered editions will be issued.

Is the Voynich a natural language?

This article is a work in progress. Comments and feedback are enthusiastically welcomed!

First off, let’s discuss what we mean by a natural language.

A natural language is one that has evolved spontaneously amongst a group of people (I include creoles, pidgins and other bricolage in this study) or an artificial language that is capable of being used as a primary source of transmitting information in a natural way (think Esperanto or other a posteriori languages).

In short, I here define a natural language as one that any cognitively normal human being is able to learn, understand and use without recourse to artificial means. (As opposed to the a priori code based artificial languages that require the memorisation of thousands of ciphers; these would be artificial languages under my definition).

Shorthand (Tironian notes) or notarial code are banished to the “artificial languages” page when they occupy most of the text; I make a short discussion of their limited use within a written natural language below.

Oneiric languages (basically those spontaneous languages such as the languages of the insane, or glossolalia) are consigned to the “gibberish” pigeon hole.

How many glyphs are there in the Voynich alphabet?

Note: This page is a work in edit. Still fiddling. Comments and feedback more than welcome, they’re encouraged!

The very question itself is imbued with menace. Before we even get going, we have to first define the very semiotic basis of “glyph” within the manuscript. We first need to define a paradigm for what a glyph actually is.

Note: This page is mainly a compilation of work that is already out there. I wished to collate and to define the very basics of Voynichese before delving into some more complex topics, and to re-examine the assumptions that underpin all of our bigger theories. Most of this is NOT based on my own examination of Voynichese, but upon a compilation of what other people / working groups have observed, with sources, although I do attempt an analysis of certain combination glyphs further on. Certainly none of this is written “in stone” – the question, by its very nature, is subjective. And I can only work from previous work, from the transcription alphabets and the transcription corpus.

The alphabet of a language is the set of symbols, letters, or tokens (which in Voynichese are called glyphs) from which the strings of the language may be formed. The content strings (the signifier) formed from this alphabet are called words. A formal language is often defined by means of a formal grammar such as a regular grammar or context-free grammar, also called its formation rule.

Evidence of repair and trimming on final folio

I have written before on my blog about the repair carried out to the final folio of the VM, the “curse” page.

In short, we decided that the top right tear in the final folio was repaired by the parchment maker whilst it was still on its frame (note oval needle holes, a clue that the string used to stitch the parchment was under pressure).

But there is a further consideration to make from this which I am only just starting to think about.

Namely, the wormhole in the top corner (note the two dark circles on each folio):

[Image: Pab f116r]

If the folio is spread out to be the same size as the preceding one, then the hole corresponds exactly with the wormhole on the previous folio, as is to be expected. So that suggests that at some point, both folios extended out to the same length, ie, their corners corresponded, and at this point the worm went through both folios.

Which logically means the stitch in the repair had been removed at this point, allowing the page to come out. Probably the string broke.

Now, there is evidence of this repair being again repaired – if you look at the recto side of the folio you can see smaller needle holes amongst the oval ones, holes that weren’t subjected to the same pressure as the original ones. It’s possible that the hair in the folio was thus stuffed back inside its hole and the stitch replaced at some point in its history. The stitch was certainly done before the text on the recto side of this folio was added, because the text takes the repair work into consideration.

Now this leads us to the question of the trimming. On this page we see writing that extends quite naturally to the very edge of the page. It has always been my contention that the writing was made on a full size folio, which was later trimmed to the very edge of the text (see my analysis of the supposed curse on this page for more).

See the second line?
Note how top line and images on left are flush with margin, but show no evidence of having been carefully written, evidence that they were written when the margin was wider.

The trimming was carried out to correspond with the position of the corner of the folio when the stitch is in place, ie, the top left corner is dragged into its current position. We can postulate that the trimming is not original but carried out by a later owner of the book, one who also repaired the stitch with a quick job.

I’m suggesting that originally the final folio was the same size as the preceding one, with the top outermost corner being pulled in by the stitch, but the bottom outside corner still corresponding with the folio below. At some point the stitch came undone and the corner drifted back to its original position, at which point the wormhole was made. Now, if the lefthand margin had been cut to its current point before that moment, the tops of both folios would not correspond: the top of f116 would be dragged downwards, because there is not enough give in the parchment to allow it to correspond with f115, and the wormhole would not be where it is. So we can say the opposite: both folios were originally the same size, with the top outermost corner being dragged in. Take a piece of paper and experiment.

So it seems that when the stitch was repaired, after the inscription on f116v was made, the repairer decided it looked a mess and trimmed this folio to its current size, corresponding with the new location of the corners of the folios. When we look at the preceding folio we see a number of wormholes on the outermost margin that don’t exist in the folio in question. They were most probably the reason it was all cut away: the whole outermost margin was damaged anyway. Other wormholes inside the content of the folio do correspond with holes below, showing the rest of the page lay in its current position quite happily.

And there is one final piece of evidence. The top of the final folio shows water damage, the brownish stain across the whole of the top which has slightly blurred the ink in the first line of text. If the folio had been cut at this point to its current size, the water spill would have carried through to the visible portion of the preceding folio; but it hasn’t, which suggests the final folio was larger at the time of the spill, thus protecting the underlying folio. (Although we can say that possibly the folio was lying open when the spill occurred, away from the rest of the quire. But then we would expect to see some damage to the recto side, which would have been open, and this doesn’t seem to have happened).

In summary:

The parchment of this final folio had an imperfection that was made good before the text was written on it. This caused the top outer corner of the folio to be dragged towards the centre, twisting the parchment. Both folios were approximately the same size, as we see elsewhere in the manuscript.

The folio then had text written on it on both sides. But at a later date the repair broke and the corner drifted back to its original position. At some point after this happened, a worm bored a hole through the corner of this folio and the one underneath it.

After this had happened, an owner of the VM repaired the imperfection, and noticed water had been spilt upon the folio. He then trimmed the folio down to the bare margins of the text, cutting away damaged parchment.

Exploring evolving epizeuxis within the Voynich Manuscript text

There is a unique feature within the Voynich Manuscript, namely the occurrence of very similar repeated sequences of text. These are a progression from Timms Pairs, an effect which is defined as two very similar words appearing within the same paragraph, usually with an additional suffix or prefix. Evolving epizeuxis sequences, which I call Jackson sequences, are fragments of sentences in which a word appears to be repeated several times with slight modifications. The effect has been described before, with D’Imperio and Currier providing the first comments on it. However, as far as I am aware there has been no attempt to develop or analyse the effect. This is not an attempt to formally describe the effect but a quick overview of the characteristics that form the phenomenon and a suggestion for automatically detecting these sequences in the transcription files, building up to a more formal description of what was driving the scribe who first penned this work.

Epizeuxis is a term from formal rhetoric which describes the rapid repetition of a single word with no other words in between, usually for the sake of emphasis. A classic example is from Macbeth: “O horror, horror, horror!”. There is, naturally enough, no term for repeating the same word with differences of spelling, as this makes no sense in natural language outside of “stream of consciousness” writings which alliterate words in a quasi-poetic style (crumble trumble bumble mumble…) or, of course, one of those clever poems designed to show how difficult it is to speak English:

Just compare heart, beard, and heard,
Dies and diet, lord and word,
Sword and sward, retain and Britain.
(Mind the latter, how it’s written.) [Excerpted from here]

Notwithstanding that, the Voynich manuscript contains many examples of Jackson sequences, especially in the text heavy pages towards the back of the manuscript. They are infrequent or non-existent in the light illustrated pages (although Timms Pairs do appear there) but appear numerous times on the later pages. They appear much more often in Currier B pages, although this may just be because the text heavy pages are written in B.

Note: All EVA transcriptions are taken from the reading available in the VIB. However, the readings make more sense in the original document.

Let us look at a few examples:
<f112r.P.6;H> chedal.oteedy.okeey.qokeedy.olkeedy.oteey.oram
<f111r.P.3;H> dsheedy.lkeedy.chckhy.lchedy.qokeey.qokear.chal.qokeeas.cheokedy.sal.lokam

This example makes less sense in EVA, but if you start with the second word in this line and read along you can clearly see how the sequence evolves in a binary sequence. Words 1, 3, 6 form what is essentially a Jackson sequence, as do words 2, 4, 5, 7.

Here is another example from the same page. I have copied two lines here because there are some beautiful examples of Timms Pairs (qokeey) here!

In this example we see how okeeo morphs into olchedy, lchedy, qokeey, okeeedy, okain, followed by a repeated chedy.
<f111r.P.9;H> ycheeodai!n.okeeo.olchedy.lchedy.qokeey.okeeedy.okai!n.chedy.chedy.teey.dal.lam


There are many other such examples, but they are omitted here for brevity’s sake. A quick visual search on any of the text rich pages in Currier B will quickly bring your attention to them.

What we are clearly seeing here are words which are being repeated with modifications as the scribe writes. Duplicate pairs such as the chedy chedy cited in the above example can be dismissed as scribal errors, but this explanation does not explain away the nature of Jackson sequences.

Before delving into possible reasons for these sequences, is it possible to automatically detect them in the transcription files?

Well, the above sequences all share the same features. They are a succession of words, usually in a linear sequence, that are very similar. In a sentence where w x y z forms the Jackson sequence, the shorter of any two adjacent words will usually share at least 80% of its glyphs with its longer partner, and y, z will normally remove or insert the most visually striking glyph present or missing in w, x.
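To make the 80% figure concrete, here is one possible way of scoring a pair of words. It is a sketch under my own assumptions: each EVA character is counted as one glyph, glyph order is ignored (a multiset overlap), and the function name shared_share is hypothetical.

```python
from collections import Counter

def shared_share(a: str, b: str) -> float:
    """Share of the shorter word's glyphs that also occur in the longer word."""
    short, long_ = sorted((a, b), key=len)
    if not short:
        return 0.0
    common = Counter(short) & Counter(long_)  # multiset overlap, ignores order
    return sum(common.values()) / len(short)

# Score every adjacent pair on the f112r line quoted above.
line = "chedal.oteedy.okeey.qokeedy.olkeedy.oteey.oram"
words = line.split(".")
for left, right in zip(words, words[1:]):
    print(f"{left:8} {right:8} {shared_share(left, right):.2f}")
```

On this line the five words from oteedy to oteey each score 0.80 or more against their neighbour, while chedal and oram at either end score well below that.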

The difference between words quickly forms a pattern. A word is generated. The next word either omits or adds a letter (or bigram), usually at the front or back of the word: d becomes qo, for example. The subsequent word has a glyph modified (a becomes d, or ee becomes a benched gallows, for example), and if a prominent glyph is present this is dropped. The last word has any suffix dropped. Usually after four or five words the Jackson sequence is abandoned.

This rule suggests that Jackson sequences can be automatically found. However, the transcription files do make arbitrary differences between glyphs that can confuse the parser. For example, ch & ee are both visually very similar bigrams which should be treated as the same glyph by any parser.
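A rough detector along these lines might look as follows. It is a sketch, not a tested tool: the threshold comes from the 80% figure above, but the minimum run length of three words, the choice to rewrite ch as ee before comparing, and the order-blind overlap measure are all assumptions of mine, as are the names normalise, similar and find_runs.

```python
from collections import Counter

THRESHOLD = 0.8  # the "80%" figure from the text
MIN_RUN = 3      # hypothetical minimum run length, my own choice

def normalise(word: str) -> str:
    """Treat the look-alike bigrams as one: rewrite 'ch' as 'ee' (an assumption)."""
    return word.replace("ch", "ee")

def similar(a: str, b: str) -> bool:
    """True if the shorter word shares at least THRESHOLD of its glyphs with the longer."""
    short, long_ = sorted((normalise(a), normalise(b)), key=len)
    if not short:
        return False
    common = Counter(short) & Counter(long_)  # multiset overlap, ignores order
    return sum(common.values()) / len(short) >= THRESHOLD

def find_runs(line: str) -> list[list[str]]:
    """Return runs of MIN_RUN or more consecutive words in which each neighbour pair is similar."""
    words = [w for w in line.split(".") if w]
    runs, current = [], words[:1]
    for prev, word in zip(words, words[1:]):
        if similar(prev, word):
            current.append(word)
        else:
            if len(current) >= MIN_RUN:
                runs.append(current)
            current = [word]
    if len(current) >= MIN_RUN:
        runs.append(current)
    return runs

print(find_runs("chedal.oteedy.okeey.qokeedy.olkeedy.oteey.oram"))
```

On the f112r line this picks out the run from oteedy to oteey. Because the measure ignores glyph order and only compares neighbours, it will miss interleaved sequences like the one on f111r, so it should be read as a starting point only.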

However, the discovery of a supposed pattern does not imply that there is a technical reason behind the formation of Jackson sequences. The short nature of these words, averaging 5.5 glyphs a word, means that removing or inserting just one or two glyphs is enough to satisfy the 80% rule. For example, a five-glyph word with one glyph changed still shares four of its five glyphs (80%) with the original.

The following possible reasons occur to me:

  1. This is an effect of any possible encryption process. For example, a similar effect would be found when a simple substitution or letter-shuffling scheme is applied to similar words, eg the atbash cipher or “pig latin” code.
  2. The lines are poetic in nature and we are seeing alliteration in action.
  3. The text is actually phonetic in nature and we are seeing similar sounding words, as in the extract from the poem above.
  4. The text is random, and this is an unintended effect created by the scribe. It would thus be the same phenomenon that produces Timms Pairs – the scribe, intending to produce natural language like text, is copying previous words and modifying them as he goes along to make them look different. This would explain the appearance or disappearance of the most visually striking glyphs.
  5. Anton Alipov makes an interesting suggestion below.
  6. Conjugation of verbs (see more below).
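The first reason in the list can be illustrated with a toy case. The sketch below applies atbash (each letter swapped with its mirror in the alphabet) and a simplified pig latin rule to four similar English words from the poem quoted earlier. Both functions are my own minimal versions, written only to show that similar plaintext words come out as similar encoded words; nothing here is a claim about how the manuscript was produced.

```python
import string

# Atbash: a<->z, b<->y, c<->x and so on.
ATBASH = str.maketrans(string.ascii_lowercase, string.ascii_lowercase[::-1])

def atbash(word: str) -> str:
    """Replace each lowercase letter with its mirror in the alphabet."""
    return word.translate(ATBASH)

def pig_latin(word: str) -> str:
    """Move the leading consonants to the end and add 'ay' (simplified rule)."""
    for i, ch in enumerate(word):
        if ch in "aeiou":
            return word[i:] + word[:i] + "ay"
    return word + "ay"

plain = ["sword", "sward", "word", "lord"]
print([atbash(w) for w in plain])
print([pig_latin(w) for w in plain])
```

In both outputs the encoded words still differ from one another by only a glyph or two, which is the kind of family resemblance seen in the sequences above.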

Comments welcome!

To counterbalance the above, it’s worth mentioning that there are also repeating sequences throughout the Voynich. The longest are as follows (from Petr Kazil) and are added here for future contemplation:

The text contains a sequence which is repeated not just twice, but four
or more times. Significantly, all of the occurrences are in “Author B”.
<f83r.7> 2OEZC8.EZCC89.4CCC89.4OF9.O4OE.RZCC89.4OFC89.4OPCC89.4OPCC89-

There are also a few twice-only repeats:


<f26r.4> 4OFC89.SCO2.9PC89.4OFC89.9PC89.SCFC89.8AM.O8AJ.2AE89-
<f81v.12>  4OE.OE.S89.ZC89.4OFC89.9PC89.SCPC89.EFC8C9.9PC89-
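Repeats of this kind can be looked for mechanically by collecting every run of n consecutive words and noting where each one occurs. The sketch below is my own and assumes the format of the lines above: full stops between words and a trailing hyphen at the end of the line. The function name repeated_sequences is hypothetical, and matching is exact, so near-repeats that differ by a single glyph are not caught.

```python
from collections import defaultdict

def repeated_sequences(lines: dict[str, str], n: int = 2) -> dict[tuple[str, ...], list[str]]:
    """Map each n-word sequence that occurs more than once to the loci where it occurs."""
    seen = defaultdict(list)
    for locus, text in lines.items():
        words = [w for w in text.rstrip("-").split(".") if w]
        for i in range(len(words) - n + 1):
            seen[tuple(words[i:i + n])].append(locus)
    return {seq: loci for seq, loci in seen.items() if len(loci) > 1}

lines = {
    "f26r.4": "4OFC89.SCO2.9PC89.4OFC89.9PC89.SCFC89.8AM.O8AJ.2AE89-",
    "f81v.12": "4OE.OE.S89.ZC89.4OFC89.9PC89.SCPC89.EFC8C9.9PC89-",
}
print(repeated_sequences(lines, n=2))
print(repeated_sequences(lines, n=3))
```

With exact matching, the only two-word sequence these lines share is 4OFC89.9PC89, and no three-word sequence is shared, because the following word differs by one glyph (SCFC89 against SCPC89).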


Anton Alipov suggests that:

Don’t know how in English with its quite simple grammar, but, for example, in Russian this kind of repetition well might be not for the sake of emphasis, but just in the regular course of declension. For example:

Косил косой косой косой.

Here we have three identically looking words, and the first word is also similar to three others. This is a valid sentence and it means: “The boss-eyed [person] mowed [something] with a crooked scythe”. The first word “косил” is past tense, masculine gender for the verb “косить” (to mow). The second word “косой” is a designation of a person (like a nickname) and actually means “one who is suffering from strabismus”. The fourth word “косой” is ablative case, singular number for the feminine gender noun “коса” (scythe). The third word “косой” is feminine gender, singular case adjective “косая” relating to the noun and thus put in ablative. Probably all four words share the common etymology, but actually they are all different in terms of meaning, except that the adjective “косая” and the nickname “косой” share the common meaning like “not straight”, and also the verb “косить” and the noun “коса” are semantically related: you usually mow (“косить”) with a scythe (“коса”) and, alternatively, what you usually do with the help of the scythe (“коса”) is that you mow (“косить”) with it.

Russian is not like the Voynich Manuscript in terms of abundance of such repetitions, but again this is a valid and natural linguistic example.

Anton’s comments also made me think of declensions and conjugations. Conjugation of verbs would show a similar effect, not in English (run, run, ran), but in most of the Romance languages or indeed Latin itself, where each verb has four principal parts (eg currō, currere, cucurrī, cursus: to run, to race).