ASCII (American Standard Code for Information Interchange) allows 128 characters to be encoded as 7-bit binary numbers. The 128 characters are from the Latin script and include uppercase and lowercase letters, digits, punctuation, some symbols and control characters. Because each character needs only 7 bits (usually stored in a single 8-bit byte), documents that use ASCII take up little storage. However, the basic Latin script means that only a few languages, such as English, can be represented.
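
To make the 7-bit idea concrete, here is a minimal Python sketch that prints the ASCII code and 7-bit pattern for a few characters. The helper name `to_7bit` is made up for this illustration and is not part of any library.

```python
def to_7bit(ch: str) -> str:
    """Return the 7-bit binary pattern for a single ASCII character."""
    code = ord(ch)  # numeric code of the character
    if code > 127:
        # Anything above 127 does not fit in 7 bits, so it is not ASCII
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code, "07b")  # pad with leading zeros to 7 digits


# Show the code number and bit pattern for a letter, a digit and a symbol
for ch in "Az9!":
    print(ch, ord(ch), to_7bit(ch))
```

Passing a character outside the 128 ASCII characters, such as "é", raises an error, which shows the limitation described above.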


Unicode builds on the ASCII concept (its first 128 code points match ASCII) and allows many more characters from different scripts, and even emojis, to be represented. Unicode can represent over 128,000 different characters and is regularly updated to include more. Because so many characters are available, each one may need more than one byte (up to four in encodings such as UTF-8), so documents that use Unicode can take up considerably more storage space.
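
As a rough illustration of the storage point, the Python sketch below counts how many bytes a few characters need when saved as UTF-8, one common Unicode encoding. The helper name `utf8_size` and the sample characters are chosen only for this example; other encodings such as UTF-16 or UTF-32 give different sizes.

```python
def utf8_size(text: str) -> int:
    """Return the number of bytes needed to store text as UTF-8."""
    return len(text.encode("utf-8"))


# A Latin letter, an accented letter, a currency symbol and an emoji
samples = ["A", "é", "€", "😀"]

for s in samples:
    # ord() gives the Unicode code point; hex() shows it in the usual U+ style base
    print(s, hex(ord(s)), utf8_size(s), "byte(s)")
```

In this encoding the ASCII letter still needs one byte, while the other characters need two, three and four bytes, so how much extra space a document takes depends on which characters it contains.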


Thanks Vincent and Harrison for spotting the ‘deliberate’ errors on this page.
