ITA-5   ASCII
ITA-5, colloquially known as ASCII, is a 7-channel (later: 8-channel) binary character-encoding scheme, derived from the older 5-channel ITA-2 standard (also known as Murray code or Baudot). ASCII is the abbreviation of American Standard Code for Information Interchange. The standard defines the 128 codes that can be made with 7 bits, all based on the English (Latin) alphabet. As there are variations between different countries, the common standard is known as US-ASCII. It was also the most common character encoding on the World Wide Web, until it was surpassed by UTF-8 in 2007.

Compared to the 5-channel ITA-2 standard (Baudot-Murray), where the characters are arranged to cause minimum (mechanical) stress on the equipment, characters in the ASCII table are sorted in the logical order of the alphabet. ASCII is commonly used by computers for storing programs (software) and information (data). Initially, computer programs and data were stored on 7-channel punched paper tape, with the 7 channels numbered from 1 to 7 (c1-c7), which made ASCII a 7-bit format. The first ASCII standard was published in 1963, with a major revision in 1967. The most recent update was published in 1986.

For error checking, a so-called parity bit (P) was later added to the tape. This increased the number of channels (now called bits) to 8. Bits are generally numbered from 0 onwards (b0-b7). In many computer systems, the parity bit was later dropped in order to increase the total number of possible characters from 128 to 256. The ASCII standard defines only the bottom 128 characters.
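
As a rough illustration of the parity bit, the sketch below computes an even-parity bit for a 7-bit ASCII code and stores it in the eighth bit (b7). The function name add_even_parity and the choice of even parity in b7 are assumptions made for this example; actual equipment may use odd parity or another convention.

```python
def add_even_parity(code: int) -> int:
    """Return an 8-bit value: the 7-bit code plus an even-parity bit in b7.

    Illustrative helper (hypothetical name), assuming even parity.
    """
    if not 0 <= code <= 0x7F:
        raise ValueError("expected a 7-bit ASCII code (0-127)")
    ones = bin(code).count("1")   # number of '1' bits (holes) in the code
    parity = ones % 2             # 1 if that count is odd, making the total even
    return (parity << 7) | code


# 'A' is 100 0001: two '1' bits, so the parity bit stays 0
print(format(add_even_parity(ord("A")), "08b"))   # 01000001
# 'C' is 100 0011: three '1' bits, so the parity bit becomes 1
print(format(add_even_parity(ord("C")), "08b"))   # 11000011
```

A receiver can count the '1' bits of the full 8-bit value: with even parity, an odd count indicates that a single bit was corrupted.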


Many different encoding schemes for 8-bit data exist, but the majority of them use ASCII to define the lower 128 characters. The upper half, commonly known as the top-bit-set characters, is often used for language-dependent encodings, such as ISO-8859-1 (Latin 1) and its Microsoft variant Windows-1252. The definition of the lower 128 ASCII characters is given below.

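Hex  0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
0    NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
1    DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
2    SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3    0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4    @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5    P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6    `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7    p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL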
The first 32 characters are unprintable. They are known as the control characters and are mainly used for text formatting on teleprinters and on the first generations of video terminals. Character 127 is also special. It is used to delete a character, which – in 7-channel paper tape – was done by punching all 7 holes. Control characters are often written as ^@ (NUL), ^A (SOH), ^B (STX), etc.
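
The caret notation follows a simple rule: the letter after the caret is the character whose code is 64 higher than the control code (or, for DEL, 64 lower). A minimal sketch of that rule, in which the helper name caret_notation is an assumption for this example:

```python
def caret_notation(code: int) -> str:
    """Return the caret form (e.g. '^C') of an ASCII control code.

    Illustrative helper (hypothetical name).
    """
    if not (0 <= code <= 31 or code == 127):
        raise ValueError("not an ASCII control character")
    # Flipping the bit with value 64 (b6) maps 0-31 onto '@'..'_' and 127 onto '?'
    return "^" + chr(code ^ 0x40)


for code in (0, 1, 3, 27, 127):
    print(code, caret_notation(code))
# 0 ^@
# 1 ^A
# 3 ^C
# 27 ^[
# 127 ^?
```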

Dec  Hex  Binary     Char  Name  Description
Control characters (C0)
0    00   0000 0000  ^@    NUL   Null character (nothing)
1    01   0000 0001  ^A    SOH   Start of header
2    02   0000 0010  ^B    STX   Start of text
3    03   0000 0011  ^C    ETX   End of text
4    04   0000 0100  ^D    EOT   End of transmission
5    05   0000 0101  ^E    ENQ   Enquiry
6    06   0000 0110  ^F    ACK   Acknowledgment
7    07   0000 0111  ^G    BEL   Bell (acoustic signal)
8    08   0000 1000  ^H    BS    Backspace
9    09   0000 1001  ^I    HT    Horizontal Tab
10   0A   0000 1010  ^J    LF    Line feed
11   0B   0000 1011  ^K    VT    Vertical Tab
12   0C   0000 1100  ^L    FF    Form feed (clear screen)
13   0D   0000 1101  ^M    CR    Carriage return (enter)
14   0E   0000 1110  ^N    SO    Shift Out
15   0F   0000 1111  ^O    SI    Shift In
16   10   0001 0000  ^P    DLE   Data Link Escape
17   11   0001 0001  ^Q    DC1   Device Control 1 (XON)
18   12   0001 0010  ^R    DC2   Device Control 2
19   13   0001 0011  ^S    DC3   Device Control 3 (XOFF)
20   14   0001 0100  ^T    DC4   Device Control 4
21   15   0001 0101  ^U    NAK   Negative Acknowledgment
22   16   0001 0110  ^V    SYN   Synchronous idle
23   17   0001 0111  ^W    ETB   End of Transmission Block
24   18   0001 1000  ^X    CAN   Cancel
25   19   0001 1001  ^Y    EM    End of Medium
26   1A   0001 1010  ^Z    SUB   Substitute (also: End of File, EOF)
27   1B   0001 1011  ^[    ESC   Escape
28   1C   0001 1100  ^\    FS    File Separator
29   1D   0001 1101  ^]    GS    Group Separator
30   1E   0001 1110  ^^    RS    Record Separator
31   1F   0001 1111  ^_    US    Unit Separator
Printable characters
32   20   0010 0000        SP    SPACE character (printable)
33   21   0010 0001  !           Exclamation mark (pling)
34   22   0010 0010  "           Quote
35   23   0010 0011  #           Hash
36   24   0010 0100  $           Dollar
37   25   0010 0101  %           Percent
38   26   0010 0110  &           Ampersand
39   27   0010 0111  '           Apostrophe
40   28   0010 1000  (           Left bracket
41   29   0010 1001  )           Right bracket
42   2A   0010 1010  *           Star
43   2B   0010 1011  +           Plus
44   2C   0010 1100  ,           Comma
45   2D   0010 1101  -           Minus
46   2E   0010 1110  .           Full stop, point
47   2F   0010 1111  /           Slash, forward slash
48   30   0011 0000  0           Zero
49   31   0011 0001  1           One
···
127  7F   0111 1111  ^?    DEL   Delete

Bit order
Traditionally, in teleprinter jargon, the least significant bit (lsb) is called channel 1 (c1), whilst the highest bit is channel 7 (c7) or, in the case of 8-bit ASCII, channel 8 (c8). This numbering scheme stems from the telegraph era in which the five channels of the paper tape were numbered 1 to 5. In the digital world however, it is more common to start numbering from 0 onwards, which is why in computer jargon, the bits of an 8-bit word (i.e. a byte) are commonly referred to as b0 to b7.

Bit order on an 8-channel paper tape (old channel assignment shown in brackets)

In binary notation, each hole is represented by a '1'. The absence of a hole is represented by a '0'. With 8-bit data, a single character is also known as a byte. When written out in individual bits, it is usually written with the least significant bit (lsb), i.e. bit 0, at the right, as shown below. The decimal value of each bit (n) is calculated as 2ⁿ (2⁰ = 1, 2¹ = 2, ... 2⁷ = 128). It doubles with each bit. For example: the + sign (binary 0010 1011) has a decimal value of 32+8+2+1 = 43.
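
The same calculation can be sketched in a few lines of Python. The helper name bit_weights is an assumption for this example; it lists the decimal weight of every bit that is set, which for the + sign gives the sum quoted above.

```python
def bit_weights(value: int, width: int = 8) -> list[int]:
    """Return the decimal weights (2**n) of the bits that are set, highest first.

    Illustrative helper (hypothetical name).
    """
    return [1 << n for n in range(width - 1, -1, -1) if value & (1 << n)]


plus = 0b0010_1011                     # the + sign
weights = bit_weights(plus)
print(weights, "=", sum(weights))      # [32, 8, 2, 1] = 43
```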


A byte can also be specified as a hexadecimal value, in which case it always consists of two digits. This is more common in computer programming. In hexadecimal notation, each byte is split into two nibbles of 4 bits each. Each nibble has a decimal value between 0 and 15. In hexadecimal notation these values are represented by the numbers 0-9 and the letters A-F, in which A = 10 and F = 15. E.g.: the + sign (0010 1011) has the values 2 and 8+2+1 = 11, which is written as 2B.
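
A minimal sketch of the nibble split, with the helper name nibbles and the lookup string HEX_DIGITS chosen for this example only:

```python
HEX_DIGITS = "0123456789ABCDEF"        # digit used for each nibble value 0-15


def nibbles(byte: int) -> tuple[int, int]:
    """Split a byte into its high and low 4-bit nibbles (hypothetical helper)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value between 0 and 255")
    return byte >> 4, byte & 0x0F


high, low = nibbles(0b0010_1011)       # the + sign
print(high, low)                       # 2 11
print(HEX_DIGITS[high] + HEX_DIGITS[low])   # 2B
```

Python's built-in format(value, "02X") produces the same two-digit result.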


In practice, a hexadecimal value is commonly given a prefix or a suffix, in order to discriminate it from a decimal value. Common prefixes are '0x' and '&'. A common suffix is 'H' or 'h'. The value of the + sign in the above example (2B) can thus be written as '0x2B', or '&2B' or '2BH' or '2Bh'.
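
To show that these notations all describe the same value, the sketch below strips the prefix or suffix and converts what remains. The function name parse_hex is an assumption for this example, and it only handles the four forms mentioned above.

```python
def parse_hex(text: str) -> int:
    """Parse a hexadecimal value written as 0x2B, &2B, 2BH or 2Bh.

    Illustrative helper (hypothetical name); other notations are not handled.
    """
    s = text.strip()
    if s[:2].lower() == "0x":          # prefix '0x'
        s = s[2:]
    elif s.startswith("&"):            # prefix '&'
        s = s[1:]
    elif s[-1:] in ("H", "h"):         # suffix 'H' or 'h'
        s = s[:-1]
    return int(s, 16)


for notation in ("0x2B", "&2B", "2BH", "2Bh"):
    print(notation, "->", parse_hex(notation))   # each line ends in 43
```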


ISO-8859-1   Latin 1
ISO-8859-1, also known as Latin 1, is a superset of US-ASCII, first published in 1987 by the International Organization for Standardization (ISO) as part of the ISO/IEC 8859 series of ASCII-based encodings. It defines 191 characters of the Latin script and is commonly used in the Americas, Western Europe, Oceania and most of Africa. It is the basis for many 8-bit character sets and for the first two blocks of characters in Unicode [2]. Characters in the 80h-9Fh range are reserved for C1 control characters [4] as defined in ISO/IEC 6429 [5]. In most cases however, this range is used for country or vendor specific characters. A well-known and popular variant is Windows-1252.
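
Because ISO-8859-1 forms the start of Unicode, each byte value corresponds to the Unicode code point with the same number. The sketch below checks this with Python's built-in latin-1 codec. Note that this codec, as an implementation choice, maps the 80h-9Fh range to the C1 control code points rather than leaving it undefined.

```python
data = bytes(range(256))               # every possible byte value
text = data.decode("latin-1")          # ISO-8859-1 as implemented by Python

# Each byte value maps to the Unicode code point with the same number
identical = all(ord(ch) == value for ch, value in zip(text, data))
print(identical)                       # True

# Example: byte E9h is the letter e with acute accent, U+00E9
print(b"\xe9".decode("latin-1") == "\u00e9")   # True
```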

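Hex  0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
0    NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
1    DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
2    SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3    0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4    @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5    P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6    `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7    p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL
8    --   --   --   --   --   --   --   --   --   --   --   --   --   --   --   --
9    --   --   --   --   --   --   --   --   --   --   --   --   --   --   --   --
A    HS   ¡    ¢    £    ¤    ¥    ¦    §    ¨    ©    ª    «    ¬    SHY  ®    ¯
B    °    ±    ²    ³    ´    µ    ¶    ·    ¸    ¹    º    »    ¼    ½    ¾    ¿
C    À    Á    Â    Ã    Ä    Å    Æ    Ç    È    É    Ê    Ë    Ì    Í    Î    Ï
D    Ð    Ñ    Ò    Ó    Ô    Õ    Ö    ×    Ø    Ù    Ú    Û    Ü    Ý    Þ    ß
E    à    á    â    ã    ä    å    æ    ç    è    é    ê    ë    ì    í    î    ï
F    ð    ñ    ò    ó    ô    õ    ö    ÷    ø    ù    ú    û    ü    ý    þ    ÿ

-- = Undefined in the ISO-8859-1 standard.

SP = Space, HS = Hard space (non-breaking), SHY = Soft hyphen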

Nomenclature
ISO-8859-1 is known under the following names:

  • ISO-8859-1
  • ISO/IEC 8859-1
  • iso-ir-100
  • csISOLatin1
  • Latin 1
  • Latin alphabet No. 1
  • l1
  • IBM819
  • CP819
Windows-1252
Windows-1252, also known as Code Page 1252 (CP-1252), is a superset of ISO-8859-1 (Latin 1), in which most of the characters in the 80h-9Fh range are defined. It is the default encoding for 8-bit text on Windows™, and all modern browsers also use it when ISO-8859-1 is specified [3].
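
The difference between the two encodings can be explored with Python's built-in cp1252 and latin-1 codecs, as in the sketch below. The variable names are chosen for this example. It relies on the behaviour of Python's cp1252 codec, which raises an error for the few byte values in the 80h-9Fh range that have no character assigned; other implementations may treat those bytes differently.

```python
defined = {}        # byte value -> character, for the 80h-9Fh range
undefined = []      # byte values without a character in Python's cp1252 table

for value in range(0x80, 0xA0):
    try:
        defined[value] = bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        undefined.append(value)

print(defined[0x80] == "\u20ac")                   # True: 80h is the Euro sign
print([format(v, "02X") for v in undefined])       # ['81', '8D', '8F', '90', '9D']

# Outside the 80h-9Fh range both encodings give the same characters
outside = list(range(0x80)) + list(range(0xA0, 0x100))
same = all(
    bytes([v]).decode("cp1252") == bytes([v]).decode("latin-1") for v in outside
)
print(same)                                        # True
```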

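Hex  0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
0    NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
1    DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
2    SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3    0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4    @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5    P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6    `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7    p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL
8    €    --   ‚    ƒ    „    …    †    ‡    ˆ    ‰    Š    ‹    Œ    --   Ž    --
9    --   ‘    ’    “    ”    •    –    —    ˜    ™    š    ›    œ    --   ž    Ÿ
A    HS   ¡    ¢    £    ¤    ¥    ¦    §    ¨    ©    ª    «    ¬    SHY  ®    ¯
B    °    ±    ²    ³    ´    µ    ¶    ·    ¸    ¹    º    »    ¼    ½    ¾    ¿
C    À    Á    Â    Ã    Ä    Å    Æ    Ç    È    É    Ê    Ë    Ì    Í    Î    Ï
D    Ð    Ñ    Ò    Ó    Ô    Õ    Ö    ×    Ø    Ù    Ú    Û    Ü    Ý    Þ    ß
E    à    á    â    ã    ä    å    æ    ç    è    é    ê    ë    ì    í    î    ï
F    ð    ñ    ò    ó    ô    õ    ö    ÷    ø    ù    ú    û    ü    ý    þ    ÿ

Rows 8 and 9 = Windows-1252 specific characters.

-- = Undefined in the Windows-1252 standard.

SP = Space, HS = Hard space (non-breaking), SHY = Soft hyphen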

Nomenclature
Windows-1252 is known under the following names:

  • Windows-1252
  • Code Page 1252
  • CP-1252
  • ansinew (LaTeX)
  • WE8MSWIN1252 (Oracle)
Data format
ITA-5 is sent as a serial signal, which means that each data word is sent one bit at a time, starting with the least significant bit (b0). The signal can be '0' (low, SPACE) or '1' (high, MARK). Each character consists of 7 or 8 data bits, preceded by one start bit and followed by one or two stop bits. Furthermore, a parity bit can be inserted after the data bits for even or odd parity. Here are some examples:

  • 9 bits: 1 start-bit, 7 data-bits, no parity, 1 stop-bit (7N1)
  • 10 bits: 1 start-bit, 7 data-bits, 1 parity-bit, 1 stop-bit (7E1 or 7O1)
  • 10 bits: 1 start-bit, 8 data-bits, no parity, 1 stop-bit (8N1)
  • 11 bits: 1 start-bit, 7 data-bits, 1 parity-bit, 2 stop-bits (7E2 or 7O2)
  • 11 bits: 1 start-bit, 8 data-bits, 1 parity-bit, 1 stop-bit (8E1 or 8O1)
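
The frame layouts listed above can be sketched as a small function that returns the bits in the order in which they appear on the line. The function name frame_bits and its parameters are assumptions for this example. It also assumes the usual asynchronous convention in which the start bit is a '0' (SPACE) and each stop bit is a '1' (MARK).

```python
def frame_bits(code: int, data_bits: int = 8, parity: str = "N",
               stop_bits: int = 1) -> list[int]:
    """Return the bits of one character in transmission order.

    Illustrative helper (hypothetical name). parity is 'N' (none),
    'E' (even) or 'O' (odd).
    """
    if parity not in ("N", "E", "O"):
        raise ValueError("parity must be 'N', 'E' or 'O'")
    if not 0 <= code < (1 << data_bits):
        raise ValueError("code does not fit in the given number of data bits")

    data = [(code >> n) & 1 for n in range(data_bits)]   # b0 is sent first
    bits = [0] + data                  # assumed: start bit is '0' (SPACE)
    if parity != "N":
        odd_count = sum(data) % 2      # 1 if the data holds an odd number of '1's
        bits.append(odd_count if parity == "E" else odd_count ^ 1)
    bits += [1] * stop_bits            # assumed: stop bits are '1' (MARK)
    return bits


plus = 0x2B                            # the + sign
print("".join(map(str, frame_bits(plus, 7, "N", 1))))   # 011010101   (7N1)
print("".join(map(str, frame_bits(plus, 7, "O", 1))))   # 0110101011  (7O1)
print("".join(map(str, frame_bits(plus, 8, "N", 1))))   # 0110101001  (8N1)
```

The frame lengths match the list: 9 bits for 7N1, 10 bits for 7E1, 7O1 and 8N1, and 11 bits for 7E2, 7O2, 8E1 and 8O1.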



References
  1. Wikipedia, ASCII
    Visited 19 May 2024.

  2. Wikipedia, ISO/IEC 8859-1
    Visited 19 May 2024.

  3. Wikipedia, Windows-1252
    Visited 19 May 2024.

  4. Wikipedia, C0 and C1 control codes
    Visited 19 May 2024.

  5. Wikipedia, ISO/IEC 6429
    Visited 19 May 2024.
© Crypto Museum. Created: Thursday 21 May 2015. Last changed: Thursday, 30 January 2025 - 16:18 CET.