ITA-5, colloquially known as ASCII, is a 7-channel (later: 8-channel)
binary character-encoding scheme, derived from the older 5-channel
ITA2 standard
(also known as Murray code or
Baudot).
ASCII is the abbreviation of
American Standard Code for Information Interchange.
The standard defines the 128 codes that can be made with 7 bits, all based
on the English (Latin) alphabet. As there are variations between
different countries, the common standard is known as US-ASCII.
It was also the most common character encoding on the
World Wide Web, until it was surpassed by UTF-8 in 2007.
Compared to the 5-channel ITA2 standard (Baudot-Murray),
where the characters are
sorted in such a way that they cause minimum (mechanical)
stress on the equipment, characters in the ASCII
table are sorted in the logical order of the alphabet.
ASCII is commonly used by computers
for storing programs (software) and information (data).
Initially, computer programs and data were stored on 7-channel punched
paper tape.
The first ASCII standard was published in 1963, with a major revision in 1967.
The most recent update was published in 1986.
Initially, ASCII was a 7-bit format, with the 7 channels on the punched paper tape
numbered from 1 to 7 (c1-c7).
For error checking, a so-called parity bit (P) was later added to the tape.
This increased the number of channels (now called bits) to 8.
Bits are generally numbered from 0 onwards (b0-b7).
In many computer systems, the parity bit was later dropped in order to increase
the total number of possible characters from 128 to 256.
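As a minimal sketch of the parity scheme described above, the following Python function computes a parity bit for a 7-bit ASCII code and places it in b7. Even parity is an assumption made for illustration, as the text does not say which convention a given system used, and the function name `add_even_parity` is hypothetical.

```python
def add_even_parity(code: int) -> int:
    """Return an 8-bit value: the 7-bit code with an even-parity bit in b7."""
    if not 0 <= code <= 0x7F:
        raise ValueError("ASCII codes are limited to 7 bits (0-127)")
    ones = bin(code).count("1")      # number of holes (1-bits) in the 7 data bits
    parity = ones & 1                # 1 if that count is odd, otherwise 0
    return code | (parity << 7)      # the total number of 1-bits is now even


# 'A' is 100 0001 (two 1-bits), so the parity bit stays 0.
print(f"{add_even_parity(ord('A')):08b}")  # 01000001
# 'C' is 100 0011 (three 1-bits), so the parity bit becomes 1.
print(f"{add_even_parity(ord('C')):08b}")  # 11000011
```

A receiver using the same convention can detect any single-bit error by checking that the number of 1-bits in the received byte is even.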
The ASCII standard defines only the bottom 128 characters.
Many different encoding schemes for 8-bit data exist, but the majority of them
use ASCII to define the lower 128 characters. The upper half
– commonly known as the top-bit-set characters – are often used
for language-dependent encodings, such as ISO-8859-1 (Latin1) and
its Microsoft variant Windows-1252.
The definition of the lower 128 ASCII characters
is given below.
| Hex | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT | LF | VT | FF | CR | SO | SI |
| 1 | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
| 2 | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4 | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5 | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6 | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7 | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |

Rows give the high nibble and columns the low nibble of each hexadecimal code.
The first 32 characters are unprintable. They are known as the control
characters and are mainly used for text formatting on teleprinters and
on the first generations of video terminals. Character 127 is also
special. It is used to delete a character, which – in 7-channel paper
tape – was done by punching all 7 holes. Control characters are often
written as ^@ (NUL), ^A (SOH), ^B (STX), etc.
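The caret notation follows a simple rule: the character after the caret is the one whose code differs from the control code only in b6 (value 64). A minimal sketch in Python, in which the function name `caret_notation` is a hypothetical helper for illustration:

```python
def caret_notation(code: int) -> str:
    """Return the caret form (e.g. '^A') of an ASCII control character."""
    if not (0 <= code <= 0x1F or code == 0x7F):
        raise ValueError("not an ASCII control character")
    # Inverting b6 (value 64) maps 0-31 onto '@'..'_' and 127 (DEL) onto '?'.
    return "^" + chr(code ^ 0x40)


print(caret_notation(0x00))  # ^@
print(caret_notation(0x1B))  # ^[
print(caret_notation(0x7F))  # ^?
```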
Control characters (C0)

| Dec | Hex | Binary | Char | Name | Description |
|---|---|---|---|---|---|
| 0 | 00 | 0000 0000 | ^@ | NUL | Null character (nothing) |
| 1 | 01 | 0000 0001 | ^A | SOH | Start of header |
| 2 | 02 | 0000 0010 | ^B | STX | Start of text |
| 3 | 03 | 0000 0011 | ^C | ETX | End of text |
| 4 | 04 | 0000 0100 | ^D | EOT | End of transmission |
| 5 | 05 | 0000 0101 | ^E | ENQ | Enquiry |
| 6 | 06 | 0000 0110 | ^F | ACK | Acknowledgment |
| 7 | 07 | 0000 0111 | ^G | BEL | Bell (acoustic signal) |
| 8 | 08 | 0000 1000 | ^H | BS | Backspace |
| 9 | 09 | 0000 1001 | ^I | HT | Horizontal Tab |
| 10 | 0A | 0000 1010 | ^J | LF | Line feed |
| 11 | 0B | 0000 1011 | ^K | VT | Vertical Tab |
| 12 | 0C | 0000 1100 | ^L | FF | Form feed (clear screen) |
| 13 | 0D | 0000 1101 | ^M | CR | Carriage return (enter) |
| 14 | 0E | 0000 1110 | ^N | SO | Shift Out |
| 15 | 0F | 0000 1111 | ^O | SI | Shift In |
| 16 | 10 | 0001 0000 | ^P | DLE | Data Link Escape |
| 17 | 11 | 0001 0001 | ^Q | DC1 | Device Control 1 (XON) |
| 18 | 12 | 0001 0010 | ^R | DC2 | Device Control 2 |
| 19 | 13 | 0001 0011 | ^S | DC3 | Device Control 3 (XOFF) |
| 20 | 14 | 0001 0100 | ^T | DC4 | Device Control 4 |
| 21 | 15 | 0001 0101 | ^U | NAK | Negative Acknowledgment |
| 22 | 16 | 0001 0110 | ^V | SYN | Synchronous idle |
| 23 | 17 | 0001 0111 | ^W | ETB | End of Transmission Block |
| 24 | 18 | 0001 1000 | ^X | CAN | Cancel |
| 25 | 19 | 0001 1001 | ^Y | EM | End of Medium |
| 26 | 1A | 0001 1010 | ^Z | SUB | Substitute (also: End of File, EOF) |
| 27 | 1B | 0001 1011 | ^[ | ESC | Escape |
| 28 | 1C | 0001 1100 | ^\ | FS | File Separator |
| 29 | 1D | 0001 1101 | ^] | GS | Group Separator |
| 30 | 1E | 0001 1110 | ^^ | RS | Record Separator |
| 31 | 1F | 0001 1111 | ^_ | US | Unit Separator |

Printable characters

| Dec | Hex | Binary | Char | Name | Description |
|---|---|---|---|---|---|
| 32 | 20 | 0010 0000 | | SP | SPACE character (printable) |
| 33 | 21 | 0010 0001 | ! | | Exclamation mark (pling) |
| 34 | 22 | 0010 0010 | " | | Quote |
| 35 | 23 | 0010 0011 | # | | Hash |
| 36 | 24 | 0010 0100 | $ | | Dollar |
| 37 | 25 | 0010 0101 | % | | Percent |
| 38 | 26 | 0010 0110 | & | | Ampersand |
| 39 | 27 | 0010 0111 | ' | | Apostrophe |
| 40 | 28 | 0010 1000 | ( | | Left bracket |
| 41 | 29 | 0010 1001 | ) | | Right bracket |
| 42 | 2A | 0010 1010 | * | | Star |
| 43 | 2B | 0010 1011 | + | | Plus |
| 44 | 2C | 0010 1100 | , | | Comma |
| 45 | 2D | 0010 1101 | - | | Minus |
| 46 | 2E | 0010 1110 | . | | Full stop, point |
| 47 | 2F | 0010 1111 | / | | Slash, forward slash |
| 48 | 30 | 0011 0000 | 0 | | Zero |
| 49 | 31 | 0011 0001 | 1 | | One |
| ··· | | | | | |
| 127 | 7F | 0111 1111 | ^? | DEL | Delete |

Traditionally, in teleprinter jargon, the least significant bit (lsb) is called
channel 1 (c1), whilst the highest bit is channel 7 (c7) or, in the case of
8-bit ASCII, channel 8 (c8). This numbering scheme stems from
the telegraph era in which the five channels of the paper tape were numbered
1 to 5.
In the digital world, however, it is more common to start numbering from 0, which is why in computer jargon the bits of an 8-bit word
(i.e. a byte) are commonly referred to as b0 to b7.
|
Bit order on an 8-channel paper tape (old channel assignment shown in brackets)
|
In binary notation, each hole is represented by a '1'. The absence of a hole
is represented by a '0'.
With 8-bit data, a single character is also known as a byte. When written out in
individual bits, it is usually written with the least significant bit (lsb), i.e.
bit 0, at the right. The decimal value of each bit (n) is calculated
as 2^n (2^0 = 1, 2^1 = 2, ... 2^7 = 128). It doubles with each bit.
For example: the + sign (binary 0010 1011) has a decimal value of 32+8+2+1 = 43.
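The calculation above can be reproduced in a few lines of Python. This is only an illustrative sketch; the helper name `bit_weights` is hypothetical.

```python
def bit_weights(byte: int) -> list[int]:
    """Return the decimal weights (2**n) of all set bits, highest bit first."""
    return [1 << n for n in range(7, -1, -1) if byte & (1 << n)]


plus = 0b0010_1011                 # the + sign from the example above
print(bit_weights(plus))           # [32, 8, 2, 1]
print(sum(bit_weights(plus)))      # 43
print(chr(sum(bit_weights(plus)))) # +
```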
A byte can also be specified as a hexadecimal value, in which case it always consists
of two digits. This is more common in computer programming. In hexadecimal notation,
each byte is split into two nibbles of 4 bits each. Each nibble has a decimal value
between 0 and 15. In hexadecimal notation these values are represented by the
numbers 0-9 and the letters A-F, in which A = 10 and F = 15.
E.g.: the + sign (0010 1011) has the values 2 and 8+2+1 = 11,
which is written as 2B.
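The split into nibbles can be sketched in Python as follows. The function name `to_hex` is a hypothetical helper; in practice the built-in `format(value, "02X")` gives the same result.

```python
HEX_DIGITS = "0123456789ABCDEF"


def to_hex(byte: int) -> str:
    """Convert a byte (0-255) to two hexadecimal digits, one per nibble."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("a byte must be in the range 0-255")
    high = byte >> 4        # upper nibble: 0010 -> 2
    low = byte & 0x0F       # lower nibble: 1011 -> 8+2+1 = 11
    return HEX_DIGITS[high] + HEX_DIGITS[low]


print(to_hex(0b0010_1011))  # 2B
```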
In practice, a hexadecimal value is commonly given a prefix or a suffix, in order
to discriminate it from a decimal value. Common prefixes are '0x' and '&'. A common
suffix is 'H' or 'h'.
The value of the + sign in the above example (2B) can thus be written as
'0x2B', or '&2B' or '2BH' or '2Bh'.
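A program that accepts hexadecimal input may have to recognise several of these notations. The sketch below handles only the four forms mentioned above; the function name `parse_hex` is hypothetical, and other notations that exist in practice are deliberately not covered.

```python
def parse_hex(text: str) -> int:
    """Parse a hexadecimal value written as 0x2B, &2B, 2BH or 2Bh."""
    s = text.strip()
    if s[:2].lower() == "0x":        # prefix '0x'
        digits = s[2:]
    elif s.startswith("&"):          # prefix '&'
        digits = s[1:]
    elif s[-1:].lower() == "h":      # suffix 'H' or 'h'
        digits = s[:-1]
    else:
        raise ValueError(f"no hexadecimal prefix or suffix in {text!r}")
    return int(digits, 16)           # raises ValueError on invalid digits


for notation in ("0x2B", "&2B", "2BH", "2Bh"):
    print(notation, parse_hex(notation))   # each line ends in 43
```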
|
ISO-8859-1, also known as Latin 1, is a superset of US-ASCII, first
published in 1987 by the International Organization for Standardization (ISO) as part of
the ISO/IEC 8859 series of ASCII-based encodings. It defines 191 characters of the
Latin script and is commonly used in the Americas, Western Europe, Oceania and most of
Africa. It is the basis for many 8-bit character sets and for the first two blocks
of characters in Unicode [2].
Characters in the 80h-9Fh range are reserved for C1 control characters [4] as defined
in ISO/IEC 6429 [5]. In most cases however, this range is used for country- or
vendor-specific characters. A well-known and popular variant is
Windows-1252.
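Because ISO-8859-1 forms the first two blocks of Unicode, each byte value equals the Unicode code point of the character it encodes. The sketch below checks this with Python's built-in `latin-1` codec. Note that this codec maps the 80h-9Fh range onto the C1 control code points rather than treating it as undefined.

```python
# Decode every possible byte value as ISO-8859-1 (Latin 1).
data = bytes(range(256))
text = data.decode("latin-1")

# Each byte value is identical to the code point of the decoded character.
assert all(ord(ch) == b for ch, b in zip(text, data))

print(hex(ord("é")), "é".encode("latin-1"))  # 0xe9 b'\xe9'
```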
| Hex | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT | LF | VT | FF | CR | SO | SI |
| 1 | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
| 2 | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4 | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5 | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6 | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7 | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |
| 8 | | | | | | | | | | | | | | | | |
| 9 | | | | | | | | | | | | | | | | |
| A | HS | ¡ | ¢ | £ | ¤ | ¥ | ¦ | § | ¨ | © | ª | « | ¬ | SHY | ® | ¯ |
| B | ° | ± | ² | ³ | ´ | µ | ¶ | · | ¸ | ¹ | º | » | ¼ | ½ | ¾ | ¿ |
| C | À | Á | Â | Ã | Ä | Å | Æ | Ç | È | É | Ê | Ë | Ì | Í | Î | Ï |
| D | Ð | Ñ | Ò | Ó | Ô | Õ | Ö | × | Ø | Ù | Ú | Û | Ü | Ý | Þ | ß |
| E | à | á | â | ã | ä | å | æ | ç | è | é | ê | ë | ì | í | î | ï |
| F | ð | ñ | ò | ó | ô | õ | ö | ÷ | ø | ù | ú | û | ü | ý | þ | ÿ |

Rows 8 and 9 (80h-9Fh) are undefined in the ISO-8859-1 standard.
SP = Space, HS = Hard space (non-breaking), SHY = Soft hyphen
|
ISO-8859-1 is known under the following names:
|
- ISO-8859-1
- ISO/IEC 8859-1
- iso-ir-100
- csISOLatin1
- Latin 1
- Latin alphabet No. 1
- l1
- IBM819
- CP819
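Many software libraries accept several of these names as aliases for the same encoding. As an illustration, the sketch below asks Python's standard `codecs` module to resolve a selection of them. Which aliases are recognised depends on the library, so the list used here is an assumption about Python's alias table rather than something taken from the standard itself.

```python
import codecs

# A selection of the names listed above, as accepted by Python.
names = ["ISO-8859-1", "iso-ir-100", "csISOLatin1", "latin1", "l1", "IBM819", "CP819"]

# codecs.lookup() normalises the spelling and returns the codec's own name.
canonical = {codecs.lookup(name).name for name in names}
print(canonical)   # a single entry if all names refer to the same codec
```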
|
Windows-1252, also known as Code Page 1252 (CP-1252), is a superset of
ISO-8859-1 (Latin 1), in which most of the characters
in the 80h-9Fh range are defined. It is the default encoding for 8-bit text
on Windows™, as well as in all modern browsers (when
ISO-8859-1 is specified) [3].
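The difference between the two encodings only shows up in the 80h-9Fh range. The following sketch uses Python's built-in `cp1252` and `latin-1` codecs to illustrate this; the sample bytes are made up for the example.

```python
raw = b"\x80 \x93quote\x94"              # hypothetical sample data

print(raw.decode("cp1252"))              # € “quote”
print(repr(raw.decode("latin-1")))       # C1 control code points instead

# Outside the 80h-9Fh range, both codecs give identical results.
common = bytes(b for b in range(256) if not 0x80 <= b <= 0x9F)
assert common.decode("cp1252") == common.decode("latin-1")

# Python's cp1252 codec rejects the bytes that Windows-1252 leaves undefined.
undefined = []
for b in range(0x80, 0xA0):
    try:
        bytes([b]).decode("cp1252")
    except UnicodeDecodeError:
        undefined.append(f"{b:02X}")
print(undefined)                         # ['81', '8D', '8F', '90', '9D']
```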
| Hex | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT | LF | VT | FF | CR | SO | SI |
| 1 | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
| 2 | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4 | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5 | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6 | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7 | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |
| 8 | € | | ‚ | ƒ | „ | … | † | ‡ | ˆ | ‰ | Š | ‹ | Œ | | Ž | |
| 9 | | ‘ | ’ | “ | ” | • | – | — | ˜ | ™ | š | › | œ | | ž | Ÿ |
| A | HS | ¡ | ¢ | £ | ¤ | ¥ | ¦ | § | ¨ | © | ª | « | ¬ | SHY | ® | ¯ |
| B | ° | ± | ² | ³ | ´ | µ | ¶ | · | ¸ | ¹ | º | » | ¼ | ½ | ¾ | ¿ |
| C | À | Á | Â | Ã | Ä | Å | Æ | Ç | È | É | Ê | Ë | Ì | Í | Î | Ï |
| D | Ð | Ñ | Ò | Ó | Ô | Õ | Ö | × | Ø | Ù | Ú | Û | Ü | Ý | Þ | ß |
| E | à | á | â | ã | ä | å | æ | ç | è | é | ê | ë | ì | í | î | ï |
| F | ð | ñ | ò | ó | ô | õ | ö | ÷ | ø | ù | ú | û | ü | ý | þ | ÿ |

The characters in rows 8 and 9 (80h-9Fh) are Windows-1252 specific.
Empty cells in these rows (81h, 8Dh, 8Fh, 90h and 9Dh) are undefined in the Windows-1252 standard.
SP = Space, HS = Hard space (non-breaking), SHY = Soft hyphen
|
Windows-1252 is known under the following names:
|
- Windows-1252
- Code Page 1252
- CP-1252
- ansinew (LaTeX)
- WE8MSWIN1252 (Oracle)
|
© Crypto Museum. Created: Thursday 21 May 2015. Last changed: Monday, 20 May 2024 - 06:30 CET.