This is the interesting part where the actual URL encoding and decoding happens.
Before we jump into the code development, let's explore the basic theory behind URL shortening. Our requirement is as follows:
Given a long URL such as http://stackoverflow.com/questions/tagged/node.js, generate a unique URL that is shorter, such as localhost:3000/3Ys.
First, let's look at actually generating a unique key from our original URL, in this case 3 years, which is at the end of our localhost URL.
To satisfy the uniqueness requirement, we can use a special class of functions called bijective (https://en.wikipedia.org/wiki/Bijection) functions, which guarantee a 1-to-1 mapping. For explanation purposes, a bijective function basically says:
We are going to use a base58 algorithm to fulfil our URL shortening and decoding task.
We are going to convert a unique integer ID (which is in base10) to its equivalent in base58. The base58 alphabet we will be using is as follows:
123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ
It is just the numbers 1-9, a-z, and A-Z, giving us a total of 58 characters, hence the 58 in base58. We are not including 0, l, or O to avoid confusion when sharing the URL over the phone or copying it manually.
Now, to put it all together to generate a shorter URL, our solution will be as follows:
- Create a global auto incremented integer
- Every time a new URL is shortened and added to our database, we'll increment that global number (base 10) and use it as our unique ID for that entry in the DB
- Then, we will base58 encode that unique ID to generate a unique, shorter URL
For example, an entry with the unique ID 100022 (base 10) will result in a base58 encoding of 3Ys. So if you store a lot of URLs, say 100,000,000, they would generate a shortened hash:
9QwvW
Let's develop the code!