Python: Encode UTF-8 to Base64

The Hard Way

nick3499
2 min read · Sep 18, 2020
Screen capture from IDE

The Python class in the screen capture above converts a text string from UTF-8 to Base64, but perhaps you want to get down to the nitty-gritty.

A single Base64 digit represents 6 bits. For example, lowercase a sits at index 26 in the Base64 table, which is 011010 in binary, since 0 + 16 + 8 + 0 + 2 + 0 = 26.
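As a quick check of that index, here is a minimal sketch that builds the standard Base64 table (A-Z, a-z, 0-9, then + and /) from Python's string constants. The names BASE64_ALPHABET, index and bits are only illustrative and do not come from the class in the screen capture.

```python
import string

# The standard Base64 alphabet: A-Z, a-z, 0-9, then '+' and '/'.
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)

index = BASE64_ALPHABET.index("a")  # position of 'a' in the table
bits = format(index, "06b")         # the same index as six binary digits

print(index, bits)
```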

For this example, each UTF-8 character in the string abc is written as an 8-bit binary number, the three numbers are concatenated, and the result is split again into Base64’s 6-bit numbers. The binary numbers 01100001 01100010 01100011 concatenate to 011000010110001001100011, which then separates into 011000 010110 001001 100011.

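In the Python interactive shell, the conversion can be done by hand, starting with the code point of each character (for ASCII characters such as these, the code point equals the single UTF-8 byte value):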

>>> ord('a')
97
>>> ord('b')
98
>>> ord('c')
99

Python’s bin() builtin returns a string with a 0b prefix and no leading zeros, so the prefix is sliced off with [2:] and each number is padded with leading zeros to eight digits. Below, the three padded binary numbers are concatenated:

>>> bin(97)[2:].rjust(8, '0') + bin(98)[2:].rjust(8, '0') + bin(99)[2:].rjust(8, '0')
'011000010110001001100011'
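The same concatenation can be sketched without typing each number by hand. This version encodes the string to UTF-8 bytes first and uses format() with the "08b" spec in place of bin() plus rjust(). The variable names text and bit_string are illustrative.

```python
text = "abc"

# Encode to UTF-8 bytes, then render each byte as exactly 8 binary digits.
bit_string = "".join(format(byte, "08b") for byte in text.encode("utf-8"))

print(bit_string)
```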

That 24-digit binary string is then split into four 6-bit pieces, and each piece is used as an index into the Base64 table to find its character:

  • Y = 011000
  • W = 010110
  • J = 001001
  • j = 100011
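The split and lookup shown in the list can be sketched as follows. It assumes the standard Base64 table and a bit string whose length is a multiple of six, as it is here. The names BASE64_ALPHABET, groups and encoded are illustrative.

```python
import string

# The standard Base64 alphabet: A-Z, a-z, 0-9, then '+' and '/'.
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)

bit_string = "011000010110001001100011"

# Slice the bit string into consecutive 6-bit groups.
groups = [bit_string[i:i + 6] for i in range(0, len(bit_string), 6)]

# Read each group as a base-2 integer and use it as an index into the table.
encoded = "".join(BASE64_ALPHABET[int(group, 2)] for group in groups)

print(groups, encoded)
```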

That is how UTF-8 abc converts to Base64 YWJj, the hard way.
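For comparison, the result can be checked against the standard library's base64 module. This is only a cross-check, not the class from the screen capture.

```python
import base64

# b64encode works on bytes, so encode the text to UTF-8 first,
# then decode the Base64 bytes back to a str for display.
encoded = base64.b64encode("abc".encode("utf-8")).decode("ascii")

print(encoded)
```

Note that abc is exactly three bytes, so its 24 bits divide evenly into four 6-bit groups. Inputs whose byte length is not a multiple of three need = padding, which the walk-through above does not cover.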
