March 8, 2026

Breaking the "Unbreakable Cipher": A Full Walkthrough of the Kasiski Attack on Vigenere

A cipher got called "unbreakable" for three hundred years, then a retired military officer cracked it with pen and paper — that's the Vigenere story.

Key Length Estimation and Key Recovery in CrypTool-2 Source Code


Abstract

A cipher got called "unbreakable" for three hundred years, then a retired military officer cracked it with pen and paper — that's the Vigenere story. This article digs through the CrypTool-2 (Kopal, 2018) source code to take apart the entire attack chain: how the Kasiski test guesses the key length, how the Friedman test and autocorrelation function help cross-validate, and how hill climbing recovers the key content one letter at a time. We'll also talk about when these techniques fall flat, and some interesting engineering choices CrypTool-2 makes along the way.


1. Introduction: From Caesar to Vigenere

The Caesar cipher is the simplest monoalphabetic substitution — every letter gets shifted by the same value. The weakness is obvious: the ciphertext preserves the plaintext's letter frequency distribution. In English, E makes up 12.7% of all letters. If H is the most frequent letter in the ciphertext? The shift is probably 3.

Vigenere's improvement is conceptually simple: instead of one shift value, use a whole set of them in rotation. A key of KEY means the 1st letter shifts by 10 (K=10), the 2nd by 4 (E=4), the 3rd by 24 (Y=24), then back to 10 for the 4th... The same plaintext letter at different positions becomes different ciphertext letters, and frequency analysis goes out the window — at least on the surface.

This thing went unbroken for three hundred years after it was proposed in the 16th century. Then in 1863, retired Prussian officer Friedrich Kasiski published a technique in Die Geheimschriften und die Dechiffrir-kunst: look for repeated n-gram fragments in the ciphertext, measure the distances between them, and work backwards to the key length. Once the key length is exposed, the polyalphabetic cipher degrades into multiple independent Caesar ciphers — and you can pick them off one by one.


2. The Math Behind Vigenere

2.1 Encryption Formula

Let plaintext be $P = P_0 P_1 P_2 \ldots$, key be $K = K_0 K_1 \ldots K_{m-1}$ (length $m$), ciphertext be $C$:

Vigenere encryption:
$$C_i = (P_i + K_{i \bmod m}) \bmod n$$

Vigenere decryption:
$$P_i = (C_i - K_{i \bmod m} + n) \bmod n$$

Where $n$ is the alphabet size (standard English: $n = 26$).
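The two formulas translate almost directly into code. Here's a minimal Python sketch of them (illustrative only, A–Z alphabet, not CrypTool-2 code):

```python
# Minimal sketch of the formulas above (illustrative, A-Z only, n = 26).
def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """C_i = (P_i + K_{i mod m}) mod 26, or the inverse when decrypt=True."""
    sign = -1 if decrypt else 1
    shifts = [ord(k) - ord('A') for k in key]
    return ''.join(
        chr((ord(ch) - ord('A') + sign * shifts[i % len(shifts)]) % 26 + ord('A'))
        for i, ch in enumerate(text))
```

For example, `vigenere("ATTACKATDAWN", "LEMON")` produces `"LXFOPVEFRNHR"`, and decrypting that with the same key round-trips back to the plaintext.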

2.2 CrypTool-2 Implementation: Four Cipher Modes

CrypTool-2's Vigenere.cs doesn't just implement standard Vigenere — it packs three variants in as well. The internal enum defines 8 operation modes (lines 37–47):

```csharp
private enum CipherMode
{
    VigenereEncrypt,
    VigenereDecrypt,
    VigenereAutokeyEncrypt,
    VigenereAutokeyDecrypt,
    BeaufortEncrypt,
    BeaufortDecrypt,
    BeaufortAutokeyEncrypt,
    BeaufortAutokeyDecrypt
};
```

The mathematical relationships between the four cipher modes:

| Mode | Encryption | Decryption | Key Cycling |
|---|---|---|---|
| Vigenere | $C_i = (P_i + K_{i \bmod m}) \bmod n$ | $P_i = (C_i - K_{i \bmod m}) \bmod n$ | Key repeats cyclically |
| Vigenere Autokey | Same, but plaintext extends the key after exhaustion | Same | Key + plaintext |
| Beaufort | $C_i = (K_{i \bmod m} - P_i) \bmod n$ | Same as encryption (involution) | Key repeats cyclically |
| Beaufort Autokey | Same, but plaintext extends the key after exhaustion | Same | Key + plaintext |

The core Crypt() method (lines 188–395) handles all 8 modes with one big switch. Here's the standard Vigenere encryption part (lines 219–231):

```csharp
case CipherMode.VigenereEncrypt:
    cpos = (ppos + cfg.ShiftKey[shiftPos]) % alphabet.Length;
    shiftPos++;
    if (shiftPos >= cfg.ShiftKey.Length)
    {
        shiftPos = 0;  // Key wraps around
    }
    break;
```

Beaufort has a fun property: encryption and decryption use the same formula — mathematically, it's an involution (lines 247–259):

```csharp
case CipherMode.BeaufortEncrypt:
case CipherMode.BeaufortDecrypt:
    cpos = Mod((cfg.ShiftKey[shiftPos] - ppos), alphabet.Length);
    break;
```

Note the custom Mod() method (lines 397–405) instead of C#'s native % operator — because C#'s % returns negative values for negative inputs, and cryptography needs non-negative modular results:

```csharp
private static int Mod(int number, int module)
{
    int result = number % module;
    if ((result < 0 && module > 0) || (result > 0 && module < 0))
    {
        result += module;
    }
    return result;
}
```

2.3 Autokey Mode

The key difference with Autokey: the key doesn't cycle. When it runs out, plaintext letters step in as the key. Here's Vigenere Autokey encryption (lines 261–283):

```csharp
case CipherMode.VigenereAutokeyEncrypt:
    if (shiftPos < cfg.ShiftKey.Length)
    {
        // Key not exhausted: normal Vigenere encryption
        cpos = (ppos + cfg.ShiftKey[shiftPos]) % alphabet.Length;
        shiftPos++;
    }
    else
    {
        // Key exhausted: use plaintext letters as key
        int pkey = alphabet.IndexOf(char.ToUpper(_inputString[autopos]));
        while (pkey < 0)
        {
            autopos++;
            pkey = alphabet.IndexOf(char.ToUpper(_inputString[autopos]));
        }
        cpos = (ppos + pkey) % alphabet.Length;
        autopos++;
    }
    break;
```

Keep this mode in mind — it'll be a key player when we talk about when Kasiski attacks fall apart.
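To make the mechanism concrete, here is a Python sketch of Autokey (illustrative only, A–Z letters, not the C# code above). Note how decryption has to rebuild the keystream from its own recovered output:

```python
# Autokey sketch (illustrative, A-Z only): the keystream is the key followed
# by the plaintext itself, so there is no period to exploit.
def autokey_encrypt(plaintext: str, key: str) -> str:
    keystream = key + plaintext
    return ''.join(
        chr((ord(p) + ord(k) - 2 * ord('A')) % 26 + ord('A'))
        for p, k in zip(plaintext, keystream))

def autokey_decrypt(ciphertext: str, key: str) -> str:
    keystream, out = list(key), []
    for i, c in enumerate(ciphertext):
        p = chr((ord(c) - ord(keystream[i])) % 26 + ord('A'))
        out.append(p)
        keystream.append(p)  # recovered plaintext extends the keystream
    return ''.join(out)
```

With key "QUEEN", `autokey_encrypt("ATTACKATDAWN", "QUEEN")` yields `"QNXEPKTMDCGN"`, and decryption reverses it letter by letter.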


3. The Kasiski Attack: Step One — Finding the Key Length

3.1 The Core Insight

The starting point of the Kasiski attack is actually pretty simple:

If two identical plaintext fragments happen to line up with the same key positions during encryption, they'll produce identical ciphertext fragments too. And the distance between those two repeats must be a multiple of the key length.

An example makes it clear:

```text
Plaintext:  THE SUN AND THE MOON
Key:        KEY KEY KEY KEY KEYK
Ciphertext: DLC CYL KRB DLC WSMX
```

"THE" appears at letter positions 1 and 10 (counting letters only — spaces are skipped during encryption), both times aligned with "KEY," so "DLC" repeats in the ciphertext. Distance = 10 - 1 = 9, and 9 = 3 × 3 — key length 3 is a factor of 9.

Of course, coincidences happen — two different plaintext fragments encrypted by different key positions might accidentally produce the same ciphertext. But statistics wins out: the true key length factor will show up more frequently than any others across all the distances.


3.2 CrypTool-2 Implementation: KasiskiTest.cs

Source file: CrypPlugins/KasiskiTest/KasiskiTest.cs (293 lines)

Three steps: clean the text → find repeated substrings → factorize the distances.

Step 1: Text preprocessing (lines 109–140)

First, clean up the text — whether to strip non-letter characters and whether to normalize case depends on user settings:

```csharp
string workString = stringInput;
string workstring2 = "";

if (settings.unknownSymbolHandling == 1) // discard unknown symbols
{
    char[] validchars = new char[] { 'a', 'b', 'c', ... 'Z' };
    string strValidChars = new string(validchars);
    StringBuilder workstring1 = new StringBuilder();
    foreach (char c in workString.ToCharArray())
    {
        if (strValidChars.IndexOf(c) >= 0)
        {
            workstring1.Append(c);
        }
    }
    workstring2 = workstring1.ToString();
}

if (settings.caseSensitivity == 0)
{
    workstring2 = workstring2.ToUpper();
}
```

Step 2: Repeated substring search (lines 154–174)

This is where the real work happens. Three nested loops brute-force the search: outer loop scans n-gram lengths (from 3 to the user-set max), middle loop scans starting positions, inner loop looks forward for matches:

```csharp
for (int d = 3; d <= settings.grammLength; d++)       // n-gram length
{
    for (int i = 0; i <= workstring2.Length - settings.grammLength; i++)   // start position
    {
        grammToSearch = workstring2.Substring(i, d);   // extract current n-gram

        for (int n = i + settings.grammLength; n <= workstring2.Length - settings.grammLength; n++)
        {
            if (grammToSearch == workstring2.Substring(n, d))  // found a repeat
            {
                distances.Add(n - i);             // record the distance
                checkedGramms.Add(grammToSearch); // record the n-gram
                break;                            // only record the first match
            }
        }
    }
}
```

Notice that break — the inner loop bails out after the first match. So if the same n-gram appears three times (positions A, B, C), the scan starting at A records only A→B, and the later scan starting at B records B→C; the long A→C distance is never recorded. Crude, but it also reduces noise.

Step 3: Distance factorization (lines 190–212)

Here's an interesting design choice — CrypTool-2 doesn't do standard prime factorization; it does divisibility testing:

```csharp
for (int z = 0; z <= distances.Count - 1; z++)       // iterate over each distance
{
    int numberToFactorize = Convert.ToInt32(distances[z]);
    x = 0;

    for (int y = 2; y <= settings.factorSize; y++)    // from 2 to max factor
    {
        if (numberToFactorize == 0) { break; }
        if (numberToFactorize % y == 0)               // divisible = it's a factor
        {
            factors[z, x] = y;
            x++;
            factorCounter[y]++;                       // increment factor count
        }
    }
}
```

Check out this commented-out code (lines 204–207):

```csharp
// while (numberToFactorize % y == 0)
// {
//     numberToFactorize = numberToFactorize / y;
// }
```

The standard approach would repeatedly divide by each factor until it no longer divides evenly. But CrypTool-2 commented that out in favor of pure divisibility testing. Take distance 12:

  • Prime factorization yields: 2, 2, 3 (12 = 2² × 3)
  • Divisibility testing yields: 2, 3, 4, 6 (and 12 itself, if it falls within the max factor bound)

Why is divisibility testing actually better here? Because the key length isn't necessarily prime. If the key length is 6, then distances 12, 18, and 24 are all divisible by 6, so "6" keeps racking up points in factorCounter. With pure prime factorization, you'd get 2 and 3, and you'd need an extra combination step to guess that the key might be 6 — an unnecessary detour.
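The whole pipeline — find repeats, take distances, vote on divisors — fits in a short Python sketch (illustrative only; it mirrors the consecutive-pairs break and the divisibility-test choice, using trigrams):

```python
# Kasiski sketch (illustrative): trigram repeats -> distances -> divisor votes.
from collections import Counter, defaultdict

def kasiski_scores(ciphertext: str, max_factor: int = 20) -> Counter:
    positions = defaultdict(list)
    for i in range(len(ciphertext) - 2):
        positions[ciphertext[i:i + 3]].append(i)   # index every trigram
    scores = Counter()
    for pos in positions.values():
        for a, b in zip(pos, pos[1:]):             # consecutive pairs only,
            distance = b - a                       # like the break above
            for f in range(2, max_factor + 1):
                if distance % f == 0:              # divisibility, not prime factors
                    scores[f] += 1
    return scores
```

On the toy ciphertext from section 3.1 (spaces removed, "DLCCYLKRBDLCWSMX"), the only repeat is "DLC" at distance 9, so factors 3 and 9 each get one vote.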

In the end, the highest-scoring factor in factorCounter is the most likely key length. Results are output as a histogram (lines 226–231), and in CrypTool-2's UI the relative scores are immediately obvious.


4. The Friedman Test: Statistics Lends a Hand

4.1 The Math

The Friedman test (also called the Kappa test), proposed by William Friedman in 1922, takes a completely different approach from Kasiski — it uses the Index of Coincidence (IC) to estimate key length.

The Index of Coincidence is the probability that two randomly picked letters from a text happen to be the same:

$$\kappa = \frac{\sum_{i=1}^{c} n_i(n_i - 1)}{N(N-1)}$$

  • Natural English: $\kappa_p \approx 0.0667$
  • Uniform random distribution: $\kappa_r = 1/c \approx 0.0385$ (26 letters)

For a Vigenere ciphertext with key length $m$, the IC falls somewhere between natural language and random. You can rearrange to estimate the key length:

$$m \approx \frac{(\kappa_p - \kappa_r) \cdot N}{(N-1) \cdot \kappa_{\text{ciphertext}} - \kappa_r \cdot N + \kappa_p}$$
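Both formulas are one-liners; here's a Python sketch (illustrative names, English $\kappa_p = 0.0667$):

```python
# Friedman sketch (illustrative; English kappa_p = 0.0667).
from collections import Counter

def index_of_coincidence(text: str) -> float:
    n = len(text)
    counts = Counter(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def friedman_keylen(text: str, kp: float = 0.0667, kr: float = 1 / 26) -> float:
    n = len(text)
    ic = index_of_coincidence(text)
    return (kp - kr) * n / ((n - 1) * ic - kr * n + kp)
```

A text whose IC equals $\kappa_p$ exactly yields an estimate of 1 (monoalphabetic); as the IC drops toward $\kappa_r$, the estimate grows.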

4.2 CrypTool-2 Implementation: FriedmanTest.cs

Source file: CrypPlugins/FriedmanTest/FriedmanTest.cs (186 lines)

Multi-language Kappa values (lines 110–118) — hard-coded IC reference values for 6 languages:

```csharp
switch (settings.Kappa)
{
    case 1: Kp = 0.0762; break;     // German
    case 2: Kp = 0.0778; break;     // French
    case 3: Kp = 0.0770; break;     // Spanish
    case 4: Kp = 0.0738; break;     // Italian
    case 5: Kp = 0.0745; break;     // Portuguese
    default: Kp = 0.0667; break;    // English
}
```

Core calculation (lines 129–136) — plugs straight into the formula:

```csharp
double cipherTextLength = arrayInput.Sum();
double countDoubleCharacters = arrayInput.Sum(i => i * ((double)i - 1));

double Keqdist = 1.0 / 26;  // IC for uniform random distribution
kappaCiphertext = countDoubleCharacters / (cipherTextLength * (cipherTextLength - 1));
keyLength = ((Kp - Keqdist) * cipherTextLength) /
            ((cipherTextLength - 1) * kappaCiphertext - (Keqdist * cipherTextLength) + Kp);
```

Mono/poly determination (line 138):

```csharp
string ciphermode = (Math.Abs(Kp - kappaCiphertext) > 0.01)
    ? "polyalphabetic"
    : "monoalphabetic/cleartext";
```

Pretty blunt logic: if the ciphertext IC differs from the target language IC by more than 0.01, it's polyalphabetic. Otherwise it's monoalphabetic or plaintext. The 0.01 threshold is purely empirical.

4.3 Where Friedman Fits In

Friedman gives you a continuous estimate — it might spit out 4.73. Kasiski gives you a discrete candidate set — 2, 3, 6, that sort of thing. Put them together for cross-validation and confidence goes up: Friedman says "somewhere around 5," Kasiski says "6 has the highest score" — the key length is probably 5 or 6.

But the Friedman test has its weak spots: it assumes key letters are uniformly distributed and that single-letter frequencies are consistent across all positions. With short ciphertexts, the estimate wobbles all over the place and isn't very reliable.


5. Autocorrelation: A Third Path

5.1 How It Works

The autocorrelation approach is more intuitive than both Kasiski and Friedman: shift the ciphertext against itself and count how many positions have matching letters.

Why does this work? If the shift happens to equal the key length (or a multiple of it), the original and shifted texts use the same key letter at each position — you're essentially comparing two monoalphabetic substitution ciphertexts. The probability of matching letters jumps back to natural language IC levels (~0.067), well above the random case (~0.038).
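The whole method fits in a few lines of Python (illustrative sketch, with the same hard 31-shift cap as the plugin):

```python
# Autocorrelation sketch (illustrative; same 31-shift cap as the plugin).
MAX_TESTED_SHIFTS = 31

def autocorrelation(text: str) -> list[int]:
    values = [0] * MAX_TESTED_SHIFTS
    for shift in range(1, MAX_TESTED_SHIFTS):   # shift 0 always matches itself
        values[shift] = sum(
            1 for i in range(len(text) - shift) if text[i] == text[i + shift])
    return values

def estimate_keylen(text: str) -> int:
    values = autocorrelation(text)
    return max(range(1, MAX_TESTED_SHIFTS), key=lambda s: values[s])
```

On a perfectly periodic input like "ABCABCABCABC" the peak sits exactly at the period, 3.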

5.2 CrypTool-2 Implementation: AutocorrelationFunction.cs

Source file: CrypPlugins/AutocorrelationFunction/AutocorrelationFunction.cs (227 lines)

Maximum test shift is hard-coded to 31 (line 32):

```csharp
private const int MAX_TESTED_SHIFTS = 31;
```

Text preprocessing (lines 183–196) — the usual: strip non-letter characters, convert to uppercase:

```csharp
private string PrepareForAnalysis(string text)
{
    StringBuilder stringBuilder = new StringBuilder();
    text = text.ToUpper();
    for (int offset = 0; offset < text.Length; offset++)
    {
        if (Alphabet.Contains(text.Substring(offset, 1)))
        {
            stringBuilder.Append(text[offset]);
        }
    }
    return stringBuilder.ToString();
}
```

Shift matching (lines 103–124):

```csharp
for (int shift = 0; shift < MAX_TESTED_SHIFTS; shift++)
{
    int match = 0;
    for (int offset = 0; offset < _inputText.Length - shift; offset++)
    {
        if (_inputText[offset] == _inputText[offset + shift])
        {
            match++;
        }
    }
    _autocorrelationValues[shift] = match;
}
```

Finding the peak (lines 128–136) — starts at shift=1 (shift=0 is comparing the text against itself, 100% match, meaningless):

```csharp
for (int i = 1; i < MAX_TESTED_SHIFTS; i++)
{
    if (_autocorrelationValues[i] > _probablekorr)
    {
        _probablekorr = _autocorrelationValues[i];
        _probableLength = i;
    }
}
```

The shift with the highest match count is the estimated key length. Plot it as a histogram and the peak jumps right out.

5.3 Limitations

Max shift is hard-coded at 31, so key lengths above 30 are invisible. Good enough for most educational scenarios, but long keys are out of reach.


6. Key Recovery: Hill Climbing

Once you have the key length, the next step is digging out the actual key content. CrypTool-2's VigenereAnalyzer uses hill climbing for this.

6.1 Hill Climbing Overview

Source file: CrypPlugins/VigenereAnalyzer/VigenereAnalyzer.cs (816 lines)

The core logic lives in the Hillclimb() method (lines 246–390). The idea is dead simple: for each candidate key length, generate a random starting key, then try every position one at a time — swap position i to letter j, decrypt, score with N-grams. If the score goes up, keep the change; if not, revert. Repeat until no improvement is found. Do this whole thing many times with different random starting points, and dump all the best results into a leaderboard.
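The loop structure can be condensed into a Python sketch. For self-containment it scores with single-letter English frequencies instead of CrypTool-2's N-gram models — with a decomposable score like this, each key position can be optimized independently, which is enough to illustrate the climb (all names are illustrative):

```python
# Hill-climbing sketch (illustrative). Single-letter frequency scoring stands
# in for CrypTool-2's N-gram cost function.
import random

FREQ = dict(zip("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                [8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.77,
                 4.0, 2.4, 6.7, 7.5, 1.9, 0.095, 6.0, 6.3, 9.1, 2.8, 0.98,
                 2.4, 0.15, 2.0, 0.074]))

def decrypt(ciphertext: str, key: list) -> str:
    return ''.join(chr((ord(c) - ord('A') - key[i % len(key)]) % 26 + ord('A'))
                   for i, c in enumerate(ciphertext))

def score(text: str) -> float:
    return sum(FREQ[c] for c in text)

def hillclimb(ciphertext: str, keylength: int, restarts: int = 5, seed: int = 1) -> str:
    rng = random.Random(seed)
    best_key, best_cost = None, float('-inf')
    for _ in range(restarts):                      # random restarts
        key = [rng.randrange(26) for _ in range(keylength)]
        improved = True
        while improved:
            improved = False
            for i in range(keylength):             # each key position
                for j in range(26):                # each candidate letter
                    old = key[i]
                    key[i] = j
                    cost = score(decrypt(ciphertext, key))
                    if cost > best_cost:
                        best_cost, best_key = cost, key[:]
                        improved = True
                    else:
                        key[i] = old               # revert
    return ''.join(chr(k + ord('A')) for k in best_key)
```

On a 100-character English sample encrypted with a short key, this typically converges within the first restart; the recovered key's decryption always scores at least as high as the true plaintext under this cost function.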

6.2 Initial Key Generation

Lines 263–267 — randomly pick a letter from the alphabet for each key position:

```csharp
for (int i = 0; i < runkey.Length; i++)
{
    runkey[i] = numalphabet[random.Next(alphabetlength)];
}
```

6.3 Position-by-Position Optimization

Lines 296–363 — two nested loops, outer over key positions, inner over the alphabet:

```csharp
do
{
    foundbetter = false;
    for (int i = 0; i < keylength; i++)          // each key position
    {
        for (int j = 0; j < alphabetlength; j++) // each letter in the alphabet
        {
            int oldLetter = runkey[i];
            runkey[i] = j;
            plaintext = decryptFunction(plaintext, runkey, numvigalphabet, i, ciphertext);

            double costvalue = 0;
            if (_settings.KeyStyle == KeyStyle.Random)
            {
                costvalue = _grams.CalculateCost(plaintext);
            }
            else
            {
                var plaintext_and_key = AppendIntegerArrays(plaintext, runkey);
                costvalue = _grams.CalculateCost(plaintext_and_key);
            }

            if (costvalue > bestkeycost)
            {
                bestkeycost = costvalue;
                bestkey = (int[])runkey.Clone();
                foundbetter = true;
            }
            else
            {
                runkey[i] = oldLetter;  // revert key
                if (j == alphabetlength - 1)
                {
                    // tried all letters with no improvement, restore original decryption
                    plaintext = decryptFunction(plaintext, runkey, numvigalphabet, i, ciphertext);
                }
            }
        }
    }
    runkey = (int[])bestkey.Clone();
} while (foundbetter);
```

6.4 "InPlace" Optimization: Don't Re-decrypt Everything

Here's a neat optimization. You changed position $i$ of the key — do you really need to re-decrypt the entire ciphertext? Of course not. Only plaintext positions $i, i+m, i+2m, \ldots$ are affected; everything else stays exactly the same.

Here's the InPlace decryption for standard Vigenere (lines 542–557):

```csharp
public static int[] DecryptVigenereOwnAlphabetInPlace(
    int[] plaintext, int[] key, int[] alphabet, int offset, int[] oldciphertext)
{
    int plaintextlength = plaintext.Length;
    int keylength = key.Length;
    int alphabetlength = alphabet.Length;
    int[] lookup = new int[alphabetlength];
    for (int position = 0; position < alphabetlength; position++)
    {
        lookup[alphabet[position]] = position;
    }
    // Key: only update positions offset, offset+keylength, offset+2*keylength, ...
    for (int position = offset; position < plaintextlength; position += keylength)
    {
        plaintext[position] = alphabet[
            (lookup[oldciphertext[position]] - lookup[key[position % keylength]] + alphabetlength) % alphabetlength];
    }
    return plaintext;
}
```

Compare this with the full decryption method (lines 490–506) and the difference is clear: the loop stride goes from 1 to keylength. For a ciphertext of 1000 characters with key length 5, InPlace only computes 200 positions per call versus 1000 for the full version. The hill climbing inner loop runs "key positions × alphabet size" iterations, saving 80% of decryption work each time — that adds up fast.
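The same trick in miniature (a Python sketch over an integer alphabet 0..25, illustrative only): after changing key position `offset`, refresh only every m-th plaintext slot, and the result matches a full re-decryption.

```python
# In-place sketch (illustrative): stride m instead of stride 1.
def decrypt_full(ciphertext: list, key: list) -> list:
    m = len(key)
    return [(c - key[i % m]) % 26 for i, c in enumerate(ciphertext)]

def decrypt_in_place(plaintext: list, ciphertext: list, key: list, offset: int) -> list:
    m = len(key)
    for pos in range(offset, len(ciphertext), m):   # only affected positions
        plaintext[pos] = (ciphertext[pos] - key[offset]) % 26
    return plaintext
```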

6.5 When the Key Is Also "Real Language": The KeyStyle Trick

VigenereAnalyzerSettings.cs lines 38–42 define two key styles:

```csharp
public enum KeyStyle
{
    Random = 0,
    NaturalLanguage = 1
}
```

When KeyStyle is set to NaturalLanguage, scoring doesn't just evaluate the plaintext — the key gets scored too (lines 315–319):

```csharp
var plaintext_and_key = AppendIntegerArrays(plaintext, runkey);
costvalue = _grams.CalculateCost(plaintext_and_key);
```

Why? A lot of the time, the key itself is a meaningful word — "LEMON," "SECRET," that sort of thing. Concatenating plaintext and key before scoring gives you an extra signal source. Especially useful for short ciphertexts: the plaintext statistics alone might be too noisy to be stable, but adding the key's language features helps hill climbing find the right direction.

6.6 Random Restarts and BestList

Hill climbing's classic weakness: it gets stuck in local optima. The fix? Run it multiple times. The while (restarts > 0) loop at line 261 restarts with a different random key each time.

All the good results get tossed into a sorted list (lines 412–481), capped at 100 entries (MaxBestListEntries = 100, line 41). Ranked by N-gram score from high to low, users can browse the results in the UI and pick the right one.

The insertion logic (lines 461–462) maintains sort order:

```csharp
int insertIndex = _presentation.BestList.TakeWhile(e => e.Value > entry.Value).Count();
_presentation.BestList.Insert(insertIndex, entry);
```

7. When Does This Whole Thing Fall Apart?

The Kasiski attack looks slick, but it relies on a set of assumptions. Break any of them and the entire attack chain collapses.

7.1 Ciphertext Too Short — Not Enough Data

The Kasiski test needs enough repeated n-grams in the ciphertext. A few dozen characters? The probability of 3-gram repeats is vanishingly small — you can't even gather enough distance samples for meaningful factor analysis.

Friedman is the same story — IC needs sufficient data to stabilize. Short ciphertext IC values wobble around, and the estimated key length might be off by a factor of several.

Autocorrelation gets it even worse with short texts: "peaks" everywhere, no way to separate real periodic signal from noise.

7.2 Key as Long as the Plaintext — That's a One-Time Pad

Key length $m$ equals plaintext length $N$? Congratulations, that's a One-Time Pad — Shannon (1949) proved it to be information-theoretically unbreakable, the perfect secrecy system.

The key never cycles, so the premise of "same key positions encrypting same plaintext produce repeats" simply doesn't exist. Kasiski can't find meaningful repeated n-grams. Friedman's formula degenerates into an indeterminate form as $m \to N$.

Even if the key is just "close to" plaintext length (say $m > N/2$), it's rough going. Each key position covers only 1–2 characters — frequency analysis has nothing to work with.

7.3 Plaintext Isn't Natural Language — Compressed/Random Data

All these statistical attacks — Kasiski, Friedman, autocorrelation, N-gram hill climbing — share an implicit assumption: the plaintext is natural language with non-uniform letter distribution and predictable N-gram patterns.

If the plaintext is:

  • Compressed data (near-random distribution, IC ≈ 0.038)
  • Already encrypted data (encrypted with another algorithm before Vigenere)
  • Random byte sequences

— then you're stuck. Even if you know the key length, hill climbing can't tell the right key from a wrong one. Every decryption result looks equally "uniform" statistically.

7.4 Autokey Mode — No More Cycling

Remember Autokey from earlier (Vigenere.cs lines 261–283)? After the key runs out, plaintext letters become the key. The effective key sequence becomes $K_0, K_1, \ldots, K_{m-1}, P_0, P_1, \ldots$ — no periodicity whatsoever.

Kasiski's lifeline — "same plaintext meeting same key position produces repeated ciphertext" — is severed under Autokey. After the $m$-th letter, the "key" is the plaintext itself; there's nothing to cycle.

CrypTool-2's VigenereAnalyzer can still attempt hill climbing on Autokey ciphers (it has a dedicated DecryptVigenereAutokeyOwnAlphabet method), but the Kasiski and Friedman key length estimation step becomes completely useless.

7.5 Non-Standard Alphabets — Large Alphabets Dilute Frequency Signatures

CrypTool-2 supports custom alphabets (Vigenere.cs lines 96–108, AlphabetSymbols property) — users can pass in any character sequence.

But if you use a 256-character alphabet (like full ASCII), each character's frequency gets spread very thin. IC approaches $1/256 \approx 0.0039$, virtually indistinguishable from random. N-gram statistics are even worse — the frequency table size is the alphabet length raised to the N-th power, and $256^5$ entries will blow up your memory. High-order N-gram scoring is simply not feasible.

7.6 Key Length Equals Plaintext Length with Random Content

Even if it's not a strict One-Time Pad (the key might have some structure), as long as the key length equals the plaintext length and the content is random, the attack is effectively hopeless. This is just the generalized version of 7.2.

7.7 Wrong Language or Mixed Languages

N-gram scoring must use the correct language model. CrypTool-2's FriedmanTest provides IC reference values for 6 languages, and LanguageStatisticsLib supports N-gram data for 15 languages. But if the plaintext uses a language that's not in the list (Arabic, Chinese, etc.), or mixes several languages, the scoring function will give misleading results.

The classic failure mode: you score a German ciphertext with English N-grams, hill climbing happily converges to an "English-looking" key — which is completely wrong.


8. CrypTool-2's Engineering Details

8.1 Four Cipher Modes, One Codebase

Vigenere.cs handles all 8 encryption/decryption operations with a single Crypt() method and a CipherMode enum. The analyzer follows the same pattern — VigenereAnalyzer.cs uses a DecryptFunction delegate (line 33) plus a Mode enum (VigenereAnalyzerSettings.cs lines 23–29) for unified dispatch:

```csharp
// Delegate definition (VigenereAnalyzer.cs line 33)
public delegate int[] DecryptFunction(
    int[] plaintext, int[] key, int[] alphabet, int offset, int[] oldciphertext);

// Enum (VigenereAnalyzerSettings.cs lines 23–29)
public enum Mode
{
    Vigenere = 0,
    VigenereAutokey = 1,
    Beaufort,
    BeaufortAutokey,
};
```

In the hill climbing code, the decryption function is bound dynamically via delegate (lines 275–294), letting the same climbing logic switch between four cipher modes without touching a single line of code — textbook Strategy Pattern.

8.2 Custom Alphabets — Permuted Alphabets Welcome

VigenereAnalyzer accepts an externally provided alphabet (the VigenereAlphabet property, lines 78–79), with deduplication and sorting in Execute() (line 152):

```csharp
Alphabet = string.Join("", VigenereAlphabet.Distinct().OrderBy(q => q).ToArray());
_grams.ReduceAlphabet(Alphabet);
_grams.Normalize(10_000_000);
```

This means if the Vigenere cipher uses a permuted alphabet (say QWERTYUIOPASDFGHJKLZXCVBNM instead of ABCDEFGHIJKLMNOPQRSTUVWXYZ), just feed in the right alphabet and the analyzer works just the same.

8.3 N-gram Statistics for 15 Languages

VigenereAnalyzer.cs line 147 loads the target language's N-gram data:

```csharp
_grams = LanguageStatistics.CreateGrams(
    _settings.Language,
    DirectoryHelper.DirectoryLanguageStatistics,
    (GramsType)(_settings.GramsType + 1),
    false);
```

Coverage spans most European languages (English, German, French, Spanish, Italian, Portuguese, Dutch, Swedish, Polish, Czech, Hungarian, Latin, Russian, Greek, Turkish), with independent N-gram frequency files for each, from 1-gram through 6-gram.

8.4 InPlace Decryption Optimization

Covered in section 6.4 already. Each cipher mode has two decryption methods:

  • Full version (e.g., DecryptVigenereOwnAlphabet, lines 490–506): decrypts the entire ciphertext
  • InPlace version (e.g., DecryptVigenereOwnAlphabetInPlace, lines 542–557): only updates positions affected by the key change

Full version handles initialization and final output; InPlace version powers the hill climbing inner loop. Both versions have identical logic — only the loop stride differs (1 vs keylength), so they never produce different results.

8.5 BestList: Top 100 Only

MaxBestListEntries = 100 (line 41) caps the list size. The AddNewBestListEntry method (lines 412–481) handles insertion and trimming:

```csharp
// Skip results that aren't good enough
if (_presentation.BestList.Count > 0 && value <= _presentation.BestList.Last().Value)
{
    return;
}

// Sorted insertion
int insertIndex = _presentation.BestList.TakeWhile(e => e.Value > entry.Value).Count();
_presentation.BestList.Insert(insertIndex, entry);

// Trim
if (_presentation.BestList.Count > MaxBestListEntries)
{
    _presentation.BestList.RemoveAt(MaxBestListEntries);
}
```

The list stays sorted by N-gram score from high to low. Anything beyond 100 entries gets the lowest scorer kicked out. Users browse the Top 100 in the UI and pick the winner themselves.
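A Python sketch of the same bounded-leaderboard idea, using bisect for the sorted insertion (illustrative and slightly simplified relative to the C# logic — here a sub-threshold entry is only rejected once the list is full):

```python
# Bounded leaderboard sketch (illustrative): sorted high-to-low, capped size.
import bisect

MAX_BEST_LIST_ENTRIES = 100

def add_best(bestlist: list, value: float, payload) -> None:
    """Insert (value, payload) keeping bestlist sorted by value, highest first."""
    if len(bestlist) >= MAX_BEST_LIST_ENTRIES and value <= bestlist[-1][0]:
        return                                   # not good enough to enter
    idx = bisect.bisect_right([-v for v, _ in bestlist], -value)
    bestlist.insert(idx, (value, payload))
    if len(bestlist) > MAX_BEST_LIST_ENTRIES:
        bestlist.pop()                           # evict the lowest scorer
```

Negating the scores gives bisect the ascending order it expects while the list itself stays descending.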

8.6 Kasiski Factorization Design Choices

As discussed in section 3.2: CrypTool-2 uses divisibility testing rather than prime factorization. The commented-out code (KasiskiTest.cs lines 204–207) shows the developers tried the standard approach before deliberately choosing divisibility testing — better suited for key length estimation.

And that break (line 167) isn't laziness either. Recording only the first forward match from each position actually serves as noise reduction — if "THE" appears 10 times in the ciphertext (many probably coincidental), without the break you'd generate all 45 pairwise distances instead of 9 consecutive ones and pollute the factor statistics.

8.7 Lookup Arrays — Trading Space for Speed

Every decryption method in VigenereAnalyzer uses lookup arrays to speed up alphabet index queries. Here's the standard Vigenere version (lines 496–503):

```csharp
int[] lookup = new int[alphabetlength];
for (int position = 0; position < alphabetlength; position++)
{
    lookup[alphabet[position]] = position;
}
for (int position = 0; position < plaintextlength; position++)
{
    plaintext[position] = alphabet[
        (lookup[ciphertext[position]] - lookup[key[position % keylength]] + alphabetlength) % alphabetlength];
}
```

The lookup array precomputes the mapping from letter values to indices. In the loop, it's a direct array access instead of repeated IndexOf calls. O(1) array lookup vs O(n) linear search — in the hill climbing hot loop, that difference is significant.


9. The Complete Attack Flow

Putting all these pieces together, CrypTool-2's full Vigenere attack goes roughly like this:

Given a ciphertext, you first run three tools in parallel: Kasiski outputs a factor distribution histogram, Friedman gives a continuous key length estimate, and autocorrelation produces a shift-match peak chart. Cross-validate the three results to nail down the key length.

Once the key length is set, hand the ciphertext and candidate length to VigenereAnalyzer. Hill climbing goes to work: random starting point, letter-by-letter testing, N-gram scoring, keep the best, restart many times. When it's done, the BestList holds the Top 100 candidates — the user browses through and picks the correct key and plaintext.

But this pipeline only works when several conditions are all met simultaneously:

  1. Ciphertext is long enough — provides statistical samples for Kasiski/Friedman
  2. Key cycles repeatedly — Autokey and One-Time Pad don't qualify
  3. Plaintext is natural language — N-gram scoring needs something to differentiate
  4. Language is guessed correctly — the cost function needs the right language model
  5. Alphabet is standard — N-gram statistics data must be applicable

10. Key Source File Index

| File Path | Lines | Core Function |
|---|---|---|
| CrypPlugins/Vigenere/Vigenere.cs | 502 | Encryption/decryption for four Vigenere variants |
| CrypPlugins/KasiskiTest/KasiskiTest.cs | 293 | Kasiski test: repeated n-gram search + divisibility factor detection |
| CrypPlugins/FriedmanTest/FriedmanTest.cs | 186 | Friedman test: IC calculation + key length estimation (6 languages) |
| CrypPlugins/AutocorrelationFunction/AutocorrelationFunction.cs | 227 | Autocorrelation: shift match counting (max shift 31) |
| CrypPlugins/VigenereAnalyzer/VigenereAnalyzer.cs | 816 | Hill climbing key recovery: InPlace optimization + BestList Top 100 |
| CrypPlugins/VigenereAnalyzer/VigenereAnalyzerSettings.cs | | Analyzer config: Mode/KeyStyle/GramsType enums |
| LibSource/LanguageStatisticsLib/LanguageStatistics.cs | | Language statistics facade: 15 language alphabets + N-gram factory |
| LibSource/LanguageStatisticsLib/Grams/Grams.cs | | N-gram scoring abstract base class |
| CrypPlugins/CostFunction/CostFunction.cs | | General cost function (IoC/Entropy/NGram/RegEx) |

References

Friedman, W. F. (1922). The index of coincidence and its applications in cryptography (Riverbank Publication No. 22). Riverbank Laboratories.

Kasiski, F. W. (1863). Die Geheimschriften und die Dechiffrir-Kunst. E. S. Mittler und Sohn.

Kopal, N. (2018). Solving classical ciphers with CrypTool 2. In Proceedings of the 1st International Conference on Historical Cryptology (HistoCrypt 2018) (Vol. 149, pp. 29–38). Linköping University Electronic Press.

Shannon, C. E. (1949). Communication theory of secrecy systems. Bell System Technical Journal, 28(4), 656–715.
