Voice scramblers and encryption devices
This section deals with secure speech equipment, such as voice encryption
devices, from a variety of manufacturers. Such devices come in many flavours,
ranging from simple voice scramblers to digital voice encryptors.
Most of the devices shown below are also featured elsewhere on this website,
as they fall into multiple categories.
Secure telephones are a class of their own,
but since they also belong to the group of voice encryption devices,
they are linked from this page as well.
|
 |
Voice encryption units on this website
|
 |
 |
Secure speech systems are known by various names, such as Voice Privacy
Unit, Secure Speech System, Voice Protection Device, Speech Encryptor,
Voice Encryption Device, Secure Voice Device, etc.
Basically, there are only two approaches to voice protection,
analogue scrambling (A) and digital encryption (B):
|
- A.1 - Frequency domain voice scrambler
In this analogue system, the frequency domain of the human speech is
mirrored and/or transposed around a given center frequency, so that
it becomes unintelligible. Such systems can easily be broken, even if
the audio band is split into multiple smaller bands first.
- A.2 - Time domain voice scrambler
In this system, speech is first stored in memory,
after which the individual parts are scrambled in the time domain.
It is more secure than a frequency domain scrambler, but can still be
broken as the individual sound samples still bear the properties of speech.
- A.3 - Frequency and Time Domain voice scrambler
This system, also known as an F/T Scrambler, is a combination of the
above methods. It is the most complex one, but can still be broken with
the right equipment, no matter how complex the randomizer is, as the
individual samples still bear the properties of speech.
- B - Digital Encryption
This method uses a digital representation of the analogue voice signal
(samples), which is mixed with a digital key stream. This method is much
safer than the ones above and is the only one that can really be called
encryption.
|
Before digital speech encryption became widely available,
analogue techniques were used to protect voice transmissions.
This technique is commonly known as voice scrambling and comes
in three flavours which are further explained below.
Scramblers are inherently insecure and only provide protection against
an occasional eavesdropper, such as the telephone exchange operator.
|
 |
A.1 — Frequency domain scrambling
FD
|
 |
 |
The oldest method uses frequency inversion
and is also known as voice inversion.
It is based on mirroring of the audio frequency spectrum around a given
center frequency, and can be applied to a discrete number of sub-bands.
This principle is best explained using a simplified model:
The audio spectrum of the voice data (1) is mixed with a fixed
carrier frequency fc (2).
This results in two spectra: one that is the sum
of the original spectrum and the carrier (3),
and one that is the difference of the two signals (4).
A low-pass filter (LPF) is then applied to filter off the
sum and leave only the difference, effectively resulting in a mirrored
audio band (5).
At the receiving end, this process of mirroring the spectrum is repeated
to make the speech intelligible again.
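The mixing and filtering steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rate (`FS`), the carrier (`CARRIER`), the filter length and the 500 Hz test tone are arbitrary choices, and a single tone stands in for speech. The carrier is given a gain of 2 to make up for the amplitude that is lost when the sum spectrum is filtered off.

```python
import math

FS = 16000       # sample rate in Hz (assumed for this sketch)
CARRIER = 3000   # carrier frequency fc in Hz (assumed)

def mix(samples, carrier_hz, fs):
    """Multiply by the carrier, producing the sum and difference spectra."""
    return [2.0 * s * math.cos(2 * math.pi * carrier_hz * n / fs)
            for n, s in enumerate(samples)]

def lowpass(samples, cutoff_hz, fs, taps=201):
    """Windowed-sinc FIR low-pass filter (Hamming window, zero-padded edges)."""
    half = taps // 2
    fc = cutoff_hz / fs
    h = []
    for k in range(taps):
        m = k - half
        ideal = 2 * fc if m == 0 else math.sin(2 * math.pi * fc * m) / (math.pi * m)
        h.append(ideal * (0.54 - 0.46 * math.cos(2 * math.pi * k / (taps - 1))))
    gain = sum(h)
    h = [c / gain for c in h]          # unity gain at 0 Hz
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k in range(taps):
            j = i + half - k
            if 0 <= j < len(samples):
                acc += h[k] * samples[j]
        out.append(acc)
    return out

def invert(samples, carrier_hz=CARRIER, fs=FS):
    """Mirror the band below the carrier: a tone at f moves to carrier_hz - f."""
    return lowpass(mix(samples, carrier_hz, fs), carrier_hz, fs)

def tone_level(samples, freq_hz, fs):
    """Amplitude of one frequency component (a single DFT bin)."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / fs) for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / fs) for n, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)

clear = [math.sin(2 * math.pi * 500 * n / FS) for n in range(3200)]  # 500 Hz tone
scrambled = invert(clear)      # tone now sits near 2500 Hz
restored = invert(scrambled)   # the same operation brings it back to 500 Hz
```

Because the same `invert` function is used at both ends, the sketch also shows why an eavesdropper only needs to guess the carrier frequency to undo the scrambling.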
The advantage of this technique is that it completely takes place within the
audio bandwidth of a channel, whereas digital encryption generally requires
more space. This allows scrambling to be used in existing systems.
At the time, scramblers were also cheaper than
digital encryptors, which is why scramblers were used by the police
in many countries from the 1970s well into the 1990s.
The disadvantage of this method is that an eavesdropper can easily reverse
the mirroring process with a simple electronic circuit.
In addition, experienced listeners could sometimes even extract useful
information from the seemingly garbled speech directly, without a descrambling
circuit.
Although voice inversion is commonly achieved by using an electronic
diode-based ring mixer, the French inventor and engineer
Jules Carpentier showed in 1919
that the same effect can be obtained mechanically, by
using a motor-driven commutator running at the centre frequency.
In a more complex scheme, one could vary the carrier frequency and also
split up the audio band into several (e.g. five) smaller bands that are then
mirrored individually. In addition, the individual frequency bands can be
swapped as shown in the rightmost diagram above.
Continuously varying these parameters by putting them
under digital control can make it harder to decode the signal.
|
 |
Examples of frequency domain scramblers
|
 |
 |
 |
A.2 — Time domain scrambling
TD
|
 |
 |
Another method for speech protection is the so-called time-division or
time-domain (TD) speech scrambling.
This method is more secure than the simpler
frequency-inversion system, but far less secure than modern
digital speech encryptors.
The simplified diagram below shows how it works.
Sampled speech data is cut into a number of small fragments which are then
scrambled in an ever-changing order. The order in which the fragments are
scrambled is determined by a pseudo-random number generator (PRNG)
which is seeded (initialised) by the user by means of a
secret KEY.
In the diagram above, the top row shows the clear speech (input) in time.
The second row shows the speech after it is scrambled.
The bottom row finally shows the speech once it is descrambled again (output).
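As a rough sketch of this principle, the fragments of each block can be permuted with a PRNG that is seeded with the key at both ends. Everything specific here is an assumption made for illustration: the block size (`FRAMES_PER_BLOCK`), the key value, the use of letters in place of audio fragments, and Python's `random.Random`, which is not a cryptographic generator and is not what any real scrambler uses.

```python
import random

FRAMES_PER_BLOCK = 8   # fragments per scrambling block (assumed)

def frame_order(prng, count):
    """Next pseudo-random ordering of `count` fragments."""
    order = list(range(count))
    prng.shuffle(order)
    return order

def scramble(frames, key, block=FRAMES_PER_BLOCK):
    prng = random.Random(key)            # key seeds the PRNG
    out = []
    for start in range(0, len(frames), block):
        chunk = frames[start:start + block]
        order = frame_order(prng, len(chunk))
        out.extend(chunk[i] for i in order)   # a new order for every block
    return out

def descramble(frames, key, block=FRAMES_PER_BLOCK):
    prng = random.Random(key)            # same key, so same sequence of orders
    out = []
    for start in range(0, len(frames), block):
        chunk = frames[start:start + block]
        order = frame_order(prng, len(chunk))
        restored = [None] * len(chunk)
        for position, original_index in enumerate(order):
            restored[original_index] = chunk[position]
        out.extend(restored)
    return out

KEY = 0x5EC2E7                                  # hypothetical secret key
clear = list("ABCDEFGHIJKLMNOPQRSTUVWX")        # 24 fragments, i.e. 3 blocks
scrambled = scramble(clear, KEY)
restored = descramble(scrambled, KEY)
```

The descrambler can only start on a block once all of its fragments have arrived, which is where the delay of this method comes from.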
The whole process of scrambling and descrambling causes a noticeable delay
which is typically in the range of 0.3 to 0.6 seconds or even longer.
Delays like this are often unacceptable as they can lead to confusion.
As the time segments are scrambled in an ever changing pattern, it is important
that transmitter and receiver are correctly synchronised. To ensure that both
ends are kept in sync, a pilot signal is transmitted with the
scrambled speech by means of Audio Frequency Shift Keying (AFSK).
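To give an idea of how AFSK carries data inside the audio band, the sketch below sends each bit as a short burst of one of two tones and recovers it by comparing the energy at both frequencies. The tone pair, bit rate and sample rate are assumptions picked for the example; the actual pilot format of any particular scrambler is not described here.

```python
import math

FS = 48000                       # sample rate in Hz (assumed)
BAUD = 300                       # bits per second (assumed)
MARK_HZ, SPACE_HZ = 1200, 2200   # tones for 1 and 0 (assumed pair)
SAMPLES_PER_BIT = FS // BAUD

def afsk_modulate(bits):
    """One tone burst per bit, with continuous phase between bits."""
    phase, out = 0.0, []
    for bit in bits:
        freq = MARK_HZ if bit else SPACE_HZ
        for _ in range(SAMPLES_PER_BIT):
            out.append(math.sin(phase))
            phase += 2 * math.pi * freq / FS
    return out

def tone_energy(window, freq_hz):
    """Energy of one frequency component within a bit period."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / FS) for n, s in enumerate(window))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / FS) for n, s in enumerate(window))
    return re * re + im * im

def afsk_demodulate(samples):
    bits = []
    for start in range(0, len(samples), SAMPLES_PER_BIT):
        window = samples[start:start + SAMPLES_PER_BIT]
        mark = tone_energy(window, MARK_HZ)
        space = tone_energy(window, SPACE_HZ)
        bits.append(1 if mark > space else 0)
    return bits

sync_word = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical synchronisation pattern
received = afsk_demodulate(afsk_modulate(sync_word))
```

The demodulator assumes that it already knows where each bit starts; a real receiver also has to recover the bit timing.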
An example of a speech scrambler that uses Time Domain Scrambling is the
BBC Cryptophon 1100.
Although scramblers of this type are not safe, many police and other
law enforcement agencies around the world used this method for securing
their conversations for many years, as it has the advantage that it can be
used on existing narrow-band FM radio channels.
Although even an experienced listener cannot make sense
of the garbled audio, the system is prone to cryptanalytic attacks.
It is possible to reconstruct the original signal
without knowing the key or the PRNG,
by using a computer to find the discontinuities at the frame boundaries
and then reordering the frames so that they join smoothly.
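A toy version of such an attack is sketched below. It tries every possible frame order and keeps the one with the smallest total jump in value and slope at the frame boundaries. The assumptions are significant: a single slow sine wave stands in for speech, the block has only six frames (so that an exhaustive search is feasible), and the frame boundaries are known. An attack on real speech would need a more robust measure of continuity and a smarter search.

```python
import itertools
import math

FRAME_LEN = 60   # samples per frame (assumed)

def split_frames(samples, frame_len=FRAME_LEN):
    return [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]

def boundary_cost(left, right):
    """Size of the jump in value and slope when `right` follows `left`."""
    left_slope = left[-1] - left[-2]
    value_jump = abs(right[0] - (left[-1] + left_slope))
    slope_jump = abs((right[1] - right[0]) - left_slope)
    return value_jump + slope_jump

def reorder_by_continuity(frames):
    """Return the frames in the order with the smoothest joins (no key needed)."""
    def total_cost(order):
        return sum(boundary_cost(frames[a], frames[b])
                   for a, b in zip(order, order[1:]))
    best = min(itertools.permutations(range(len(frames))), key=total_cost)
    return [frames[i] for i in best]

clear = [math.sin(2 * math.pi * n / 480) for n in range(360)]   # smooth test signal
frames = split_frames(clear)
scrambled = [frames[i] for i in (3, 0, 5, 1, 4, 2)]   # order unknown to the attacker
recovered = [s for frame in reorder_by_continuity(scrambled) for s in frame]
```

The attack never looks at the key or the PRNG; it relies only on the fact that the scrambled fragments are still pieces of a continuous waveform.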
|
 |
Examples of time domain scramblers
|
 |
 |
 |
A.3 — Frequency and Time domain scrambling
F/T
|
 |
 |
The third and most complex type of voice scrambler is the so-called
Frequency and Time Domain Scrambler, also known as the F/T Scrambler,
which is basically a combination of the two methods explained above.
This solution is also known as two-dimensional voice scrambling.
Although scrambling and descrambling with this method is much more complex,
the system is just as prone to cryptanalysis as the previous ones.
Any kind of analogue scrambling is inherently insecure.
The diagram above shows roughly how it works. The audio spectrum is
divided into a number of discrete sub-bands (here shown in different
colours and numbered I to VIII). The sub-bands are then sampled
individually, after which they are stored in a memory buffer
in a pseudo random pattern that transposes their place in the sub-band
order (i.e. the frequency domain) as well as their place in time
(i.e. the time domain). The resulting pattern now looks something like this:
The samples for each of the sub-bands are then read from memory,
and combined into a new (scrambled) voice signal that still fits the original
3 kHz bandwidth. The black curve in the diagrams above illustrates how the
signal has changed from its original shape.
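The two-dimensional transposition can be modelled as a permutation of the cells of a grid, with one axis for the sub-band and one for the time slot. The sketch below only moves labels around: the band splitting and the frequency translation needed to actually move audio from one sub-band to another are left out. The grid size (`SLOTS`), the key value and the use of Python's non-cryptographic `random.Random` are assumptions for illustration.

```python
import random

BANDS = 8   # sub-bands I to VIII
SLOTS = 4   # time slots per block (assumed)

def cell_permutation(key, bands, slots):
    """Key-dependent mapping from each (slot, band) cell to its new position."""
    cells = [(slot, band) for slot in range(slots) for band in range(bands)]
    shuffled = cells[:]
    random.Random(key).shuffle(shuffled)
    return dict(zip(cells, shuffled))

def scramble_block(grid, key):
    mapping = cell_permutation(key, len(grid[0]), len(grid))
    out = [[None] * len(grid[0]) for _ in grid]
    for (slot, band), (new_slot, new_band) in mapping.items():
        out[new_slot][new_band] = grid[slot][band]   # moves in time and frequency
    return out

def descramble_block(grid, key):
    mapping = cell_permutation(key, len(grid[0]), len(grid))
    out = [[None] * len(grid[0]) for _ in grid]
    for (slot, band), (new_slot, new_band) in mapping.items():
        out[slot][band] = grid[new_slot][new_band]   # undo the same mapping
    return out

KEY = 1234   # hypothetical secret key
grid = [[f"t{slot}/b{band + 1}" for band in range(BANDS)] for slot in range(SLOTS)]
scrambled = scramble_block(grid, KEY)
restored = descramble_block(scrambled, KEY)
```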
In older voice scramblers, band splitting is generally achieved with discrete
filters, mixers and other electronic parts, whereas in modern devices this
is commonly done with a Digital Signal Processor (DSP).
|
Below are some examples of scrambled speech.
These samples were recorded by Barry Wels [1] from the built-in analogue
voice scrambler of the Icom IC-H11 radio. If you listen carefully to the
scrambled audio, you may be able to descramble some of it yourself
with a little practice.
|
 |
Examples of frequency and time domain (F/T) scramblers
|
 |
 |
In its simplest form, a digital voice encryption device digitizes the
voice signal by means of an Analog-to-Digital Converter (ADC).
The resulting data stream is then 'mixed' by means of an
XOR-operation with a key stream that is generated by
a Pseudo-Random Number Generator (PRNG). In this context, the latter is
also known as a Key Generator (KG). The resulting ciphertext is then
converted back to the analogue domain by means of a Digital-to-Analog Converter
(DAC).
The Key Generator (KG, PRNG) is seeded by a KEY that is
entered manually or by means of a key transfer device or fill gun.
Modern systems sometimes use Public Key Encryption (PKE) to pass
the key over an insecure channel (e.g. Diffie-Hellman key exchange).
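The idea behind a Diffie-Hellman exchange can be shown with deliberately tiny numbers. The modulus, generator and private values below are textbook-sized assumptions chosen so that the arithmetic can be checked by hand; real systems use very large parameters and randomly generated private values.

```python
P = 23   # public prime modulus (toy value)
G = 5    # public generator (toy value)

alice_private = 4   # never transmitted
bob_private = 3     # never transmitted

# Only these two values travel over the insecure channel.
alice_public = pow(G, alice_private, P)
bob_public = pow(G, bob_private, P)

# Each side combines its own private value with the other's public value.
alice_shared = pow(bob_public, alice_private, P)
bob_shared = pow(alice_public, bob_private, P)
```

Both ends arrive at the same shared value, which could then be used to seed the key generator, while an eavesdropper only sees `alice_public` and `bob_public`.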
Because the digital XOR-operation (modulo-2 addition)
is used for mixing the plaintext with the key stream, the same operation
can be used for decryption. This principle is also known as the
Vernam Cipher.
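The symmetry of the XOR-operation is easy to demonstrate. In the sketch below, the key generator is a stand-in built from SHA-256 in a counter arrangement; this is an assumption made purely for illustration and not the generator of any actual device, nor a vetted cipher design. The `samples` bytes stand in for digitized speech.

```python
import hashlib

def key_stream(key: bytes, length: int) -> bytes:
    """Illustrative key generator: hash of key plus a running counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])

def xor_with_key_stream(data: bytes, key: bytes) -> bytes:
    """Modulo-2 addition of data and key stream; used for both directions."""
    return bytes(d ^ k for d, k in zip(data, key_stream(key, len(data))))

KEY = b"hypothetical key"
samples = bytes(range(0, 200, 5))               # stand-in for ADC output
ciphertext = xor_with_key_stream(samples, KEY)  # encrypt
recovered = xor_with_key_stream(ciphertext, KEY)  # decrypt with the same call
```

Note that the same key stream must never be used for two different messages, as XOR-ing the two ciphertexts would then cancel the key stream out.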
|
Before speech can be encrypted, it must be converted from the analogue to the
digital domain by means of a sampler, also known as a digitizer or
Analog-to-Digital Converter (ADC).
In the 1970s, devices like KY-57 (VINSON) and
Spendex 10
used Continuously Variable Slope Delta modulation (CVSD) to convert
speech to digital data. This wide-band solution is only suitable
for VHF and UHF radio.
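The principle of CVSD can be sketched as follows: each sample is reduced to a single bit that says whether the signal is above or below a running estimate, and the step size of that estimate grows when several identical bits occur in a row. This is a simplified model under assumed parameters (step limits, growth factor, run length and test tone are all arbitrary), and it omits the leaky integrators found in practical CVSD designs.

```python
import math

MIN_STEP, MAX_STEP = 0.01, 0.2   # step size limits (assumed)
GROWTH = 1.5                     # step adaptation factor (assumed)
RUN = 3                          # identical bits that trigger a larger step

def next_step(step, history):
    """Grow the step after RUN identical bits, otherwise let it shrink."""
    if len(history) == RUN and len(set(history)) == 1:
        return min(step * GROWTH, MAX_STEP)
    return max(step / GROWTH, MIN_STEP)

def cvsd_encode(samples):
    bits, estimate, step, history = [], 0.0, MIN_STEP, []
    for sample in samples:
        bit = 1 if sample >= estimate else 0   # one bit per sample
        bits.append(bit)
        history = (history + [bit])[-RUN:]
        step = next_step(step, history)
        estimate += step if bit else -step
    return bits

def cvsd_decode(bits):
    out, estimate, step, history = [], 0.0, MIN_STEP, []
    for bit in bits:
        history = (history + [bit])[-RUN:]
        step = next_step(step, history)        # mirrors the encoder exactly
        estimate += step if bit else -step
        out.append(estimate)
    return out

speech = [0.5 * math.sin(2 * math.pi * 200 * n / 16000) for n in range(800)]
bits = cvsd_encode(speech)
decoded = cvsd_decode(bits)
mean_error = sum(abs(a - b) for a, b in zip(speech, decoded)) / len(speech)
```

With one bit per sample, the bit rate equals the sample rate, which illustrates why this kind of coding needs a wide channel.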
Generally speaking, a digital signal needs more bandwidth than its analogue
equivalent (typically twice the bandwidth), but methods have been developed
to compress the data, allowing it to be sent over a narrowband (3 kHz)
channel. At the other end, the data must then be decompressed before it
can be used. Such a compressor/decompressor is commonly known as a CODEC.
An example of a voice compression algorithm, also known as a VOCODER, is
Linear Predictive Coding (LPC-10e), developed in the 1970s
by the US Department of Defense. It analyses the voice data and converts
it to a set of coefficients, which are then sent as numeric values.
At the receiving end, these coefficients are used to reconstruct,
or synthesize, the original sound. LPC-10 allows voice
data to be sent at 2400 bit/s, and LPC-10e can even be
used at 800 bit/s. The first vocoder, named VODER,
was developed at Bell Labs in 1939. Its principle was first used during
WWII on SIGSALY — the transatlantic secure telephone
line between Washington and London.
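The analysis step of linear prediction can be sketched with the textbook autocorrelation method and the Levinson-Durbin recursion. This shows the general technique only and is not the LPC-10 standard itself, which adds pitch and voicing detection, quantisation and framing. The test frame is an assumption: a decaying exponential for which the ideal predictor is known to be 0.9 times the previous sample.

```python
def autocorrelation(samples, max_lag):
    n = len(samples)
    return [sum(samples[i] * samples[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc_coefficients(samples, order):
    """Levinson-Durbin recursion.

    Returns a[1..order] such that x[n] is approximated by
    a[1]*x[n-1] + ... + a[order]*x[n-order].
    """
    r = autocorrelation(samples, order)
    a = [0.0] * (order + 1)
    error = r[0]                      # prediction error energy so far
    for i in range(1, order + 1):
        reflection = (r[i] - sum(a[k] * r[i - k] for k in range(1, i))) / error
        updated = a[:]
        updated[i] = reflection
        for k in range(1, i):
            updated[k] = a[k] - reflection * a[i - k]
        a = updated
        error *= 1.0 - reflection * reflection
    return a[1:]

frame = [0.9 ** n for n in range(200)]          # each sample is 0.9 x the previous
coefficients = lpc_coefficients(frame, order=2)  # close to [0.9, 0.0]
```

Only a handful of such coefficients has to be sent per frame instead of the samples themselves, which is what makes the low bit rates possible.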
➤ Different CODECs
|
Once analogue speech has been digitized, it can be encrypted digitally,
by means of a variety of encryption algorithms. Some devices use
publicly available algorithms such as DES,
Triple-DES (3DES) or AES, but others use proprietary
encryption algorithms that are kept secret. An example of the latter
is SAVILLE, which was jointly developed by GCHQ (UK)
and NSA (USA), and is still widely used in US/NATO military
equipment today.
|
Below are some sound samples of digitally encrypted speech,
recorded from an Icom IC-H10SR radio by Barry Wels [1].
The first file contains the original audio file. The second file plays
the encrypted audio. The last file finally contains the resulting
audio once it has been decrypted.
|
 |
Examples of digital voice encryptors
|
 |
 |
© Crypto Museum. Created: Tuesday 04 August 2009. Last changed: Friday, 14 June 2024 - 08:49 CET.