base64

Prerequisites

Binary and Text Data


Introduction

  • Data can be split into two types: binary and text.
  • However, some media only accept text data, for example:
    • A database column which is set to store a string.
    • A markup language, such as HTML, which only allows text data as the source of an image.
    • A protocol which only allows the transfer of text data, e.g. data inside an XML node.
  • If we try to interpret binary data as text data, errors can occur.
    • This is because some bytes will be interpreted as control characters, e.g. 0x00 is treated as an end-of-string marker in C-style strings.
  • Base64 is a method of encoding binary data such that, when it is interpreted as text data, only printable characters will be found.
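As a small illustration of this point, the sketch below uses Python's standard `base64` module to encode a few bytes that are not printable text (including 0x00), then checks whether each byte sequence consists only of printable ASCII characters (0x20 to 0x7E). The helper name `is_printable_ascii` and the sample bytes are hypothetical, chosen only for this example.

```python
import base64

# Raw binary data containing bytes that are not printable text,
# including 0x00, which C-style strings treat as an end-of-string marker.
raw = bytes([0x00, 0x01, 0x1B, 0x7F, 0xFF])

def is_printable_ascii(data: bytes) -> bool:
    """Hypothetical helper: True if every byte is printable ASCII (0x20 to 0x7E)."""
    return all(0x20 <= b <= 0x7E for b in data)

encoded = base64.b64encode(raw)

print(is_printable_ascii(raw))      # False
print(is_printable_ascii(encoded))  # True
print(encoded.decode("ascii"))      # AAEbf/8=
```

The encoded form can be safely stored in a text column or embedded in markup, while the raw bytes cannot.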

Encoding Base64 Data

  • Let's start by encoding the following bytes: 0x12, 0x15, 0x15, 0x01, 0x66, 0x22
  • We can represent these in binary, using a pipe character to separate each byte:
    00010010|00010101|00010101|00000001|01100110|00100010
  • In the above example the bits were grouped into blocks of 8; however, there is no reason why we can't change the size of each group. For example, we can regroup them into blocks of 6:
    000100|100001|010100|010101|000000|010110|011000|100010
  • If we convert these 6-bit blocks back into numbers, we get: 4, 33, 20, 21, 0, 22, 24, 34. We can define a few properties of these numbers:
    1. The smallest possible value is 000000 = 0.
    2. The largest possible value is 111111 = 63.
    3. The number of distinct values is 2^6 = 64.
  • We have gone from blocks of 8 bits, with 256 possible values, down to blocks of 6 bits, with 64 possible values. The important thing about this is that each one of the 64 values can be given a unique printable character:
    • If the value is between 0 and 25 (inclusive): the character is between A and Z (uppercase).
    • If the value is between 26 and 51 (inclusive): the character is between a and z (lowercase).
    • If the value is between 52 and 61 (inclusive): the character is between 0 and 9.
    • Finally, for the last two values, we use a '+' and a '/':
      • 62 = +
      • 63 = /
  • Assigning a printable character to each block of 6 bits gives: E, h, U, V, A, W, Y, i
  • And finally, we can join these together to get the base64 string: EhUVAWYi
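The steps above can be sketched in Python as a from-scratch encoder. This is a minimal illustration, assuming the input length is a multiple of 3 bytes so that no padding is needed. The names `ALPHABET` and `encode_no_padding` are hypothetical, and the result is compared against Python's standard `base64.b64encode`.

```python
import base64

# Standard base64 alphabet: A-Z (0-25), a-z (26-51), 0-9 (52-61), '+' (62), '/' (63)
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def encode_no_padding(data: bytes) -> str:
    """Hypothetical helper: base64-encode data whose length is a multiple of 3."""
    if len(data) % 3 != 0:
        raise ValueError("this sketch only handles lengths that are a multiple of 3")
    # Step 1: write every byte as 8 bits and join them into one bit string.
    bits = "".join(format(b, "08b") for b in data)
    # Step 2: regroup the bits into blocks of 6.
    groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
    # Step 3: convert each 6-bit block to a number between 0 and 63.
    values = [int(g, 2) for g in groups]
    # Step 4: map each number to its printable character and join them.
    return "".join(ALPHABET[v] for v in values)

data = bytes([0x12, 0x15, 0x15, 0x01, 0x66, 0x22])
print(encode_no_padding(data))                         # EhUVAWYi
print(base64.b64encode(data).decode("ascii"))          # EhUVAWYi
```

Working with a string of '0' and '1' characters is slow, but it mirrors the pen-and-paper steps directly; real implementations use bit shifts instead.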

Padding

  • The above example worked well because there were 48 bits in total. This could be split into 8 blocks of 6 bits.
  • However, if the total number of bits is not divisible by 6 (which happens whenever the number of bytes is not a multiple of 3), then padding is needed.
  • For example, if we encode: 0x12, 0x15, 0x15, 0x01, 0x66
  • We get: 00010010|00010101|00010101|00000001|01100110
  • If we split these bits into blocks of 6 bits, we get: 000100|100001|010100|010101|000000|010110|0110
  • The last group only has 4 bits
  • To complete the last block, two zero bits are appended to it (011000 = 24, which is Y). When converting to text, a = padding character is then added to the end of the string: EhUVAWY=
  • Multiple padding characters might be used:
    • For example: 0x12, 0x15, 0x15, 0x01
    • Is encoded to: EhUVAQ==
  • To calculate the number of padding characters needed:
    • When encoding bytes, 3 bytes produce 4 characters.
    • So the length of the final string will be a multiple of 4.
    • Padding characters are added to ensure this.
    • I.e. if the string is 6 characters, then two padding characters are added.
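The padding rules can be added to the earlier sketch: fill the last short block with zero bits, then append = characters until the length is a multiple of 4. This is a minimal illustration; `ALPHABET` and `encode_with_padding` are hypothetical names, and the outputs match Python's standard `base64.b64encode` for the three examples in this section.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_with_padding(data: bytes) -> str:
    """Hypothetical helper: base64-encode bytes of any length, including '=' padding."""
    bits = "".join(format(b, "08b") for b in data)
    # If the last block is short, fill it with zero bits to make 6 bits.
    if len(bits) % 6:
        bits += "0" * (6 - len(bits) % 6)
    chars = "".join(ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6))
    # Append '=' until the length is a multiple of 4.
    chars += "=" * (-len(chars) % 4)
    return chars

for hex_str in ["121515016622", "1215150166", "12151501"]:
    print(encode_with_padding(bytes.fromhex(hex_str)))
# EhUVAWYi
# EhUVAWY=
# EhUVAQ==
```

Note that the = characters do not stand in for the missing bits; the zero-fill completes the last block, and the = characters only bring the length up to a multiple of 4.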

code (Python)

import base64

hex_str = '121515016622'
encoded = base64.b64encode(bytes.fromhex(hex_str))
print(encoded.decode('ascii'))  # prints EhUVAWYi

decoded = base64.b64decode(encoded).hex()
print(decoded)  # prints 121515016622
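Decoding reverses the encoding steps. The sketch below is a minimal illustration, assuming well-formed input in the standard alphabet (it does no validation): strip the = padding, turn each character back into 6 bits, and regroup into bytes, discarding the leftover zero bits that were added during encoding. The names `ALPHABET` and `decode_sketch` are hypothetical; the results are compared against `base64.b64decode`.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def decode_sketch(text: str) -> bytes:
    """Hypothetical helper: reverse the encoding steps (no validation of bad input)."""
    # Padding characters carry no data, so drop them.
    text = text.rstrip("=")
    # Turn each character back into its 6-bit value.
    bits = "".join(format(ALPHABET.index(c), "06b") for c in text)
    # Any leftover bits (fewer than 8) were zero-fill added during encoding.
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))

print(decode_sketch("EhUVAWY=").hex())  # 1215150166
print(base64.b64decode("EhUVAWY=").hex())  # 1215150166
```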

Size Increase

  • Converting to base64 increases the size of the data.
  • Every 3 bytes need 4 characters to represent them.
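  • The encoded output is therefore about 4/3 the size of the input (an increase of roughly 33%), rounded up to a multiple of 4 characters because of padding.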
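The size rule can be written as a formula: n bytes of input produce 4 * ceil(n / 3) characters of padded output. The sketch below computes this with integer arithmetic and compares it against the actual length produced by Python's `base64.b64encode`; `encoded_length` is a hypothetical helper name.

```python
import base64

def encoded_length(n_bytes: int) -> int:
    """Hypothetical helper: length of padded base64 output for n_bytes of input."""
    # Every started group of 3 bytes becomes 4 characters: 4 * ceil(n / 3).
    return 4 * ((n_bytes + 2) // 3)

for n in [3, 4, 5, 6, 300]:
    actual = len(base64.b64encode(bytes(n)))
    print(n, encoded_length(n), actual)
# 3 4 4
# 4 8 8
# 5 8 8
# 6 8 8
# 300 400 400
```

For small inputs the padding makes the relative increase larger than 1/3 (4 bytes become 8 characters); for large inputs it approaches 1/3.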