# base64

## Prerequisites

### base64

#### Introduction

• Data can be split into two types: binary and text.
• However, some media only accept text data, for example:
• A database column which is set to store a string.
• A markup language, such as HTML, which only allows text data as the source of an image.
• A protocol which only allows the transfer of text data, e.g. data inside an XML node.
• If we try to interpret binary data as text data, errors will occur.
• This is because some bytes will be interpreted as control characters, e.g. `0x00` will be treated as an end-of-string marker.
• Base64 is a method of encoding binary data such that, when it is interpreted as text data, only printable characters will be found.
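
As a small illustration of the problem, the sketch below checks whether some raw bytes, and their base64 form, consist only of printable ASCII characters. It uses Python's standard `base64` module; the sample bytes and the helper `is_printable_ascii` are made up for this example.

```python
import base64
import string

# Hypothetical sample: binary data containing a NUL byte and other non-text bytes
raw = bytes([0x00, 0x12, 0x15, 0xFF])

def is_printable_ascii(data: bytes) -> bool:
    """Return True if every byte is a printable, non-whitespace ASCII character."""
    allowed = set((string.ascii_letters + string.digits + string.punctuation).encode("ascii"))
    return all(b in allowed for b in data)

encoded = base64.b64encode(raw)

print(is_printable_ascii(raw))      # False
print(is_printable_ascii(encoded))  # True
```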

#### Encoding Base64 Data

• Let's start by encoding the following bytes: `0x12, 0x15, 0x15, 0x01, 0x66, 0x22`
• We can represent these in binary, using a pipe character to separate each byte: `00010010|00010101|00010101|00000001|01100110|00100010`
• In the above example the bits were grouped into blocks of 8; however, there is no reason why we can't change the size of each group. E.g. we can change them to be groups of 6: `000100|100001|010100|010101|000000|010110|011000|100010`
• If we convert these back into numbers, we get: `4, 33, 20, 21, 0, 22, 24, 34`
• We can define a few properties of these numbers:
1. The smallest possible value is `000000` = 0.
2. The largest possible value is `111111` = 63.
3. The number of distinct values is 2^6 = 64.
• We have gone from blocks of 8 bits, with 256 possible values, down to blocks of 6 bits, with 64 possible values. The important thing about this is that each one of the 64 values can be given a unique printable character:
• If the value is between 0 and 25 (inclusive): the character is between A and Z (uppercase).
• If the value is between 26 and 51 (inclusive): the character is between a and z (lowercase).
• If the value is between 52 and 61 (inclusive): the character is between 0 and 9.
• Finally, for the last two values, we use a '+' and a '/':
• 62 = +
• 63 = /
• Assigning a printable character to each block of 6 bits gives: `E|h|U|V|A|W|Y|i`
• And finally we can join these together to get a base64 string of: `EhUVAWYi`
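
The regrouping steps above can be reproduced by hand in a few lines of Python. This is only a sketch for inputs whose bit count is divisible by 6 (no padding is handled), and it assumes the standard base64 alphabet, in which value 62 is `+` and value 63 is `/`. The names `ALPHABET`, `bits`, `groups` and `values` are chosen for the example.

```python
import string

data = bytes([0x12, 0x15, 0x15, 0x01, 0x66, 0x22])

# Standard base64 alphabet: A-Z, a-z, 0-9, then '+' and '/'
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Write every byte as 8 bits and join them into one long bit string
bits = "".join(f"{byte:08b}" for byte in data)

# Regroup the same bits into blocks of 6
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]

# Convert each block back into a number (0..63) and look up its character
values = [int(group, 2) for group in groups]
encoded = "".join(ALPHABET[v] for v in values)

print("|".join(groups))  # 000100|100001|010100|010101|000000|010110|011000|100010
print(values)            # [4, 33, 20, 21, 0, 22, 24, 34]
print(encoded)           # EhUVAWYi
```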

• The above example worked well because there were 48 bits in total. This could be split into 8 blocks of 6 bits.
• However, if your total number of bits is not divisible by 6 then you will need to add some padding.
• For example, if we encode: `0x12, 0x15, 0x15, 0x01, 0x66`
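• We get: `00010010|00010101|00010101|00000001|01100110`
• If we group these bits into blocks of 6 bits we get: `000100|100001|010100|010101|000000|010110|0110`
• The last group only has 4 bits, so it is filled out with two zero bits to give `011000`.
• When converting to text, a `=` is added to the end of the string so that its length is a multiple of 4: `EhUVAWY=`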
• Multiple padding characters might be used:
• For example: `0x12, 0x15, 0x15, 0x01`
• Is encoded to: `EhUVAQ==`
• To calculate the number of padding characters needed:
• When encoding bytes, 3 bytes produce 4 characters.
• So the length of the final string will be a multiple of 4.
• E.g. if the unpadded string is 6 characters long, then two padding characters are added to bring it up to 8.
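
The padding rule can be checked against Python's standard `base64` module. The helper `padding_needed` below is a hypothetical name introduced for this sketch; it derives the number of `=` characters from the input length in bytes.

```python
import base64

def padding_needed(num_bytes: int) -> int:
    """Number of '=' characters appended for an input of num_bytes bytes."""
    # Bytes are consumed in groups of 3; a short final group is padded out
    return (3 - num_bytes % 3) % 3

for hex_string in ("121515016622", "1215150166", "12151501"):
    data = bytes.fromhex(hex_string)
    encoded = base64.b64encode(data).decode("ascii")
    print(len(data), padding_needed(len(data)), encoded)
# 6 0 EhUVAWYi
# 5 1 EhUVAWY=
# 4 2 EhUVAQ==
```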

#### Code (Python)

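```python
import base64

hex_string = '121515016622'
# b64encode takes bytes and returns bytes, so decode to get a str
encoded = base64.b64encode(bytes.fromhex(hex_string)).decode('ascii')
print(encoded)  # prints EhUVAWYi

decoded = base64.b64decode(encoded).hex()
print(decoded)  # prints 121515016622
```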

#### Size Increase

• Converting to base64 increases the size of the data.
• Every 3 bytes need 4 characters to represent them.
• The size will therefore increase by about 1/3 (slightly more when padding characters are added).
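
As a rough check of the size increase, the sketch below compares a predicted length with the length actually produced by `base64.b64encode`. The helper `encoded_length` is a hypothetical name for this example; it assumes padded output with no line breaks.

```python
import base64
import math

def encoded_length(num_bytes: int) -> int:
    """Length of the padded base64 string for num_bytes bytes of input."""
    # Every (possibly partial) group of 3 bytes becomes 4 characters
    return 4 * math.ceil(num_bytes / 3)

for n in (3, 4, 5, 6, 300):
    actual = len(base64.b64encode(bytes(n)))  # bytes(n) is n zero bytes
    print(n, encoded_length(n), actual)
# 3 4 4
# 4 8 8
# 5 8 8
# 6 8 8
# 300 400 400
```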