Let's build a URL shortener in Go - Part III : Short Link Generator
Recall that in the previous part, we were able to set up, build and test the storage layer of our link shortener. During this part we are going to work solely on the algorithm we will be using to hash and process the initial input or the long url into a smaller and shorter mapping that corresponds to it.
When doing the choice for the algorithm we do have a number of objectives to keep in mind :
- The final input should be shorter : Maximum 8 characters
- Should be easily human readable, avoid confusing characters mix up, characters that often similar in most fonts.
- The entropy should be fairly large to avoid collision in short link generation.
III. 1. Generator Algorithm
During this implementation we are going to use two main schemes : a hash function and a binary to text encoding algorithm.
So let's go ahead and fire up our IDE or whatever editor you are currently using, we are gonna have to create 2 files, shorturl_generator.go
and shorturl_generator_test.go
, and put them under the a folder called shortener
After creating those 2 files and putting them under the folder above, our project directory structure should look like the tree below :
├── go.mod
├── go.sum
├── main.go
├── shortener
│ ├── shorturl_generator.go
│ └── shorturl_generator_test.go
└── store
├── store_service.go
└── store_service_test.go
III. 2. Shortener Implementation
Recall in the previous paragraph we talked about using a hash function and a binary to text encoding for our shortener algorithm, in this section we are going to implement those 2 basic building blocks :
III. 2. 1. SHA256
We will be using SHA256 to hash the initial inputs during the process. So let's go and open the shorturl_generator.go
file and add the SHA256 code below, here we are going to use the Golang built-in implementation of this hash function.
III. 2. 2. BASE58 .
This binary to text encoding will be used to provide the final output of the process.
People familiar with binary to text encodings might ask why we are not using Base64, which is probably one of the most popular encoding scheme of this class of algorithms.
The reasons are mostly related to the user experience, Base58 reduces confusion in character output. Satoshi Nakamoto the anonymous early developer of the Bitcoin protocol and creator of the encoding scheme gave some good reasons on why Base58.
Those reasons are still valid even today :
- The characters 0,O, I, l are highly confusing when used in certain fonts and are even quite harder to differentiate for people with visuality issues .
- Removing ponctuations characters prevent confusion for line breakers.
- Double-clicking selects the whole number as one word if it's all alphanumeric.
As we now understand why we are using BASE58 encoding, let's go ahead and add its implementation.
First thing first, let's load the base58 dependency library :
Now, let's add the implementation in the code.
III. 2. 3. Final Algorithm
The final algorithm will be super straightforward now as we now have our 2 main building blocks already setup, it will go as follow :
- Hashing
initialUrl + userId
url with sha256. HereuserId
is added to prevent providing similar shortened urls to separate users in case they want to shorten exact same link, it's a design decision, so some implementations do this differently. - Derive a big integer number from the hash bytes generated during the hasing.
- Finally apply base58 on the derived big integer value and pick the first 8 characters
At this stage we have now completed the implementation part, next step should be about testing.
III. 3. Shortener Unit Tests
Our algorithm implementation is now fully complete, let's do unit testing as it should, for any custom logic/algorithm implementation.
In the top paragraph above we created our test file shorturl_generator_test.go
, we will be now filling it with some pieces of unit tests.
Running the tests above and should all pass (✅).
III. 4. Conclusion & Next Steps
During this part III, we were able to cover the implementation of our shortener algorithm and the choices behind the design decisions in hash function and byte to text encoder.
Next part we will covering the section where we will expose an Rest API endpoint for shortening and handle the actual URL forwarding.
Ciao ! 👋