How many glyphs are there in the Voynich alphabet?
Note: This page is a work in progress. Still fiddling. Comments and feedback are more than welcome; they’re encouraged!
The very question itself is imbued with menace. Before we even get going, we first have to define the semiotic basis of “glyph” within the manuscript: a paradigm for what a glyph actually is.
Note: This page is mainly a compilation of work that is already out there. I wished to collate and to define the very basics of Voynichese before delving into some more complex topics, and to re-examine the assumptions that underpin all of our bigger theories. Most of this is NOT based on my own examination of Voynichese, but upon a compilation of what other people / working groups have observed, with sources, although I do attempt an analysis of certain combination glyphs further on. Certainly none of this is written “in stone” – the question, by its very nature, is subjective. And I can only work from previous work, from the transcription alphabets and the transcription corpus.
The alphabet of a language is the set of symbols, letters, or tokens (which in Voynichese are called glyphs) from which the strings of the language may be formed. The content strings (the signifier) formed from this alphabet are called words. A formal language is often defined by means of a formal grammar such as a regular grammar or context-free grammar, also called its formation rule.
Tonight I’ve been running Sukhotin’s algorithm against some of the Voynich transcriptions.
In brief, Sukhotin’s algorithm identifies vowels in text. It accepts text, counts how often each pair of letters occurs side by side, and then repeatedly picks out as a vowel the letter that most often sits next to consonants, on the assumption that vowels and consonants tend to alternate.
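As a rough illustration, here is a minimal Python sketch of Sukhotin’s algorithm as it is usually described. It is not the script I ran: the function name sukhotin_vowels is made up for this example, and it assumes that every character of a transcribed word counts as one letter, which is itself a debatable assumption for Voynichese.

```python
from collections import defaultdict

def sukhotin_vowels(words):
    """Return the letters classified as vowels, in the order they are found."""
    # Symmetric adjacency counts; pairs of identical letters are ignored.
    adj = defaultdict(lambda: defaultdict(int))
    letters = set()
    for word in words:
        letters.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:
                adj[a][b] += 1
                adj[b][a] += 1

    # Every letter starts out as a consonant with its adjacency row sum.
    sums = {c: sum(adj[c].values()) for c in letters}
    remaining = set(letters)
    vowels = []
    while remaining:
        # sorted() only makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda c: sums[c])
        if sums[best] <= 0:
            break  # no remaining consonant looks like a vowel
        vowels.append(best)
        remaining.discard(best)
        # Contacts with the new vowel now count against being a vowel.
        for c in remaining:
            sums[c] -= 2 * adj[c][best]
    return vowels

print(sukhotin_vowels(["banana"]))
```

On a real run the input would be the list of words from a transcription file rather than a toy word.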
There is a unique feature within the Voynich Manuscript, namely the occurrence of very similar repeated sequences of text. These are a progression from Timms Pairs, an effect defined as two very similar words appearing within the same paragraph, usually differing by an additional suffix or prefix. An evolving epizeuxis sequence, which I call a Jackson sequence, is a fragment of a sentence in which a word appears to be repeated several times with slight modifications. The effect has been described before, with D’Imperio and Currier providing the first comments on it. However, as far as I am aware there has been no attempt to develop or analyse it. This is not an attempt to describe the effect formally, but a quick overview of the characteristics that make up the phenomenon and a suggestion for automatically detecting these sequences in the transcription files, building up to a more formal account of what was driving the scribe who first penned this work.
Epizeuxis is a term from formal rhetoric which describes the rapid repetition of a single word with no other words in between, usually for the sake of emphasis. A classic example is from Macbeth: “O horror, horror, horror!”. There is, naturally enough, no term for repeating the same word with differences of spelling, as this makes no sense in natural language outside of “stream of consciousness” writing that plays on similar-sounding words in a quasi-poetic style (crumble trumble bumble mumble…) or, of course, one of those clever poems designed to show how difficult it is to speak English:
Just compare heart, beard, and heard,
Dies and diet, lord and word,
Sword and sward, retain and Britain.
(Mind the latter, how it’s written.)
[Excerpted from here]
Notwithstanding that, the Voynich manuscript contains many examples of Jackson sequences, especially in the text-heavy pages towards the back of the manuscript. They are infrequent or non-existent in the illustrated pages that carry little text (although Timms Pairs do appear there) but appear numerous times on the later pages. They appear much more often in Currier B pages, although this may just be because the text-heavy pages are written in B.
Note: All EVA transcriptions are taken from the reading available in the VIB. However, the readings make more sense in the original document.
Let us look at a few examples:
This example makes less sense in EVA, but if you start with the second word in this line and read along you can clearly see how the sequence evolves in a binary sequence. Words 1, 3, 6 form what is essentially a Jackson sequence, as do words 2, 4, 5, 7.
Here is another example from the same page. I have copied two lines here because there are some beautiful examples of Timms Pairs (qokeey) here!
In this example we see how okeeo morphs into olchedy, lchedy, qokeey, okeeedy, okain, followed by a repeated chedy.
There are many other such examples, but they are omitted here for brevity’s sake; a visual search on any of the text-rich pages in Currier B will quickly bring them to your attention.
What we are clearly seeing here are words which are being repeated with modifications as the scribe writes. Duplicate pairs such as the chedy chedy cited in the above example can be dismissed as scribal errors, but that does not explain away the nature of Jackson sequences.
Before delving into possible reasons for these sequences, is it possible to automatically detect them in the transcription files?
Well, the above sequences all share the same features. They are a succession of words, usually in a linear sequence, that are very similar. In a sentence where w x y z forms the Jackson sequence, the shorter of any two adjacent words will usually share at least 80% of its glyphs with its longer partner, and y and z will normally remove or insert the most visually striking glyph present or missing in w and x.
The difference between words quickly forms a pattern. A word is generated. The next word either omits or adds a letter (or bigram), usually at the front or back of the word: d becomes qo, for example. The subsequent word has a glyph modified (a becomes d, or ee becomes a benched gallows, for example), and if a prominent glyph is present it is dropped. The last word has any suffix dropped. Usually after four or five words the Jackson sequence is abandoned.
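One way to put a number on the 80% rule is sketched below in Python. This is only my reading of the rule, with some loud assumptions: each EVA character is treated as one glyph, the position of a glyph within the word is ignored, and the function name shared_fraction is invented for this example.

```python
from collections import Counter

def shared_fraction(a, b):
    """Fraction of the shorter word's glyphs that also occur in the longer word."""
    short, long_ = sorted((a, b), key=len)
    if not short:
        return 0.0
    # Multiset intersection: a repeated glyph only counts as often as both words have it.
    overlap = Counter(short) & Counter(long_)
    return sum(overlap.values()) / len(short)

# Two adjacent pairs named in the second example above.
print(shared_fraction("olchedy", "lchedy"))   # every glyph of lchedy is in olchedy
print(shared_fraction("qokeey", "okeeedy"))   # five of the six glyphs of qokeey
```

Ignoring position makes the measure forgiving; a stricter version could use an edit distance instead.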
This rule suggests that Jackson sequences can be automatically found. However, the transcription files do make arbitrary distinctions between glyphs that can confuse a parser. For example, ch and ee are visually very similar bigrams which should be treated as the same glyph by any parser.
However, the discovery of a supposed pattern does not imply that there is a technical reason behind the formation of Jackson sequences. The short nature of these words, averaging 5.5 glyphs a word, means that removing or inserting just one or two glyphs is enough to satisfy the 80% rule: a five-glyph word with a single glyph changed still shares four of its five glyphs (80%) with its neighbour.
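Putting the pieces together, a first-pass detector might look like the following Python sketch. It is a suggestion under stated assumptions rather than a tested tool: the names normalise, similar and candidate_runs are hypothetical, the capital E is an arbitrary placeholder standing in for both ch and ee, the minimum run length of three words is my own choice, and the sample words are invented for illustration rather than copied from a transcribed line.

```python
from collections import Counter

def normalise(word):
    # Treat the visually similar bigrams "ch" and "ee" as one glyph.
    # "E" is an arbitrary placeholder, assumed not to occur in the input.
    return word.replace("ch", "E").replace("ee", "E")

def similar(a, b, threshold=0.8):
    """True if the shorter word shares at least `threshold` of its glyphs."""
    short, long_ = sorted((normalise(a), normalise(b)), key=len)
    if not short:
        return False
    shared = sum((Counter(short) & Counter(long_)).values())
    return shared / len(short) >= threshold

def candidate_runs(words, min_len=3, threshold=0.8):
    """Return runs of at least `min_len` consecutive, mutually similar words."""
    runs, current = [], list(words[:1])
    for prev, word in zip(words, words[1:]):
        if similar(prev, word, threshold):
            current.append(word)
        else:
            if len(current) >= min_len:
                runs.append(current)
            current = [word]
    if len(current) >= min_len:
        runs.append(current)
    return runs

line = ["qokeedy", "okeedy", "okeey", "dar", "chedy", "shedy"]
print(candidate_runs(line))
```

Note that exact repeats such as chedy chedy also pass this test, and that interleaved sequences (words 1, 3, 6 and 2, 4, 5, 7, as in the first example) would need each word compared with more than just its immediate neighbour.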
The following possible reasons occur to me:
This is an effect of an encryption process. For example, a similar effect would be found when a simple cipher is applied to similar words, e.g. the Atbash substitution cipher or a “pig latin” code.
The lines are poetic in nature and we are seeing alliteration in action.
The text is actually phonetic in nature and we are seeing similar sounding words, as in the extract from the poem above.
The text is random, and this is an unintended effect created by the scribe. It would thus be the same phenomenon that produces Timms Pairs: the scribe, intending to produce text that looks like natural language, is copying previous words and modifying them as he goes along to make them look different. This would explain the appearance or disappearance of the most visually striking glyphs.
Anton Alipov makes an interesting suggestion below.
Conjugation of verbs (see more below).
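To illustrate the first possibility in the list above, here is a small Python sketch of Atbash, which swaps each letter for its mirror image in the alphabet (a for z, b for y, and so on). The helper names are invented for this example, and the plaintext words are simply borrowed from the poem quoted earlier. The point is only that words which differ by one letter before enciphering still differ by exactly one letter afterwards, so a run of similar plaintext words would come out as a run of similar ciphertext words.

```python
import string

# Map a..z onto z..a.
ATBASH = str.maketrans(string.ascii_lowercase, string.ascii_lowercase[::-1])

def atbash(word):
    """Encipher (or decipher, the cipher is its own inverse) a word with Atbash."""
    return word.lower().translate(ATBASH)

def differing_positions(a, b):
    """Count positions at which two words differ, plus any difference in length."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

for plain in ["heard", "beard", "heart"]:
    print(plain, "->", atbash(plain))

print(differing_positions("heard", "beard"))
print(differing_positions(atbash("heard"), atbash("beard")))
```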
To counterbalance the above, it’s worth mentioning that there are also exactly repeating sequences throughout the Voynich. The longest are as follows (from Petr Kazil) and are added here for future contemplation:
The text contains a sequence which is repeated not just twice, but four or more times. Significantly, all of the occurrences are in “Author B”.
There are also a few twice-only repeats:
Anton Alipov suggests that:
Don’t know how in English with its quite simple grammar, but, for example, in Russian this kind of repetition well might be not for the sake of emphasis, but just in the regular course of declension. For example:
Косил косой косой косой.
Here we have three identical-looking words, and the first word is also similar to the three others. This is a valid sentence and it means: “The boss-eyed [person] mowed [something] with a crooked scythe”. The first word “косил” is the past tense, masculine gender, of the verb “косить” (to mow). The second word “косой” is a designation of a person (like a nickname) and actually means “one who is suffering from strabismus”. The fourth word “косой” is the instrumental case, singular number, of the feminine gender noun “коса” (scythe). The third word “косой” is the feminine singular adjective “косая”, agreeing with the noun and thus also put in the instrumental. Probably all four words share a common etymology, but they are all different in terms of meaning, except that the adjective “косая” and the nickname “косой” share the common meaning “not straight”, and the verb “косить” and the noun “коса” are semantically related: you usually mow (“косить”) with a scythe (“коса”).
Russian is not like the Voynich Manuscript in terms of abundance of such repetitions, but again this is a valid and natural linguistic example.
Anton’s comments also made me think of inflection more generally. Conjugation of verbs would show a similar effect, not so much in English (run, run, ran), but in most of the Romance languages or indeed Latin itself, which has four main patterns of conjugation and whose verb forms visibly share a stem (e.g. currō, currere, cucurrī, cursus: to run, to race).