Rusly: Rust URL Shortener System Design

Rusly is a URL shortener built using the Rocket framework in Rust.

Check out Rusly
Check out the Rusly GitHub repo

This blog lays out the thought process I followed while designing and developing the system.

Read about the Rust-specific development details here.


Why Rust?

I decided to use Rust for the following reasons:

  1. I wanted to apply what I had learned by building a project

  2. I wanted to put my system design knowledge into practice

  3. Rust promises features like fearless concurrency and memory safety. I wanted to see how viable they are when implementing production-grade systems.


API Endpoints

Let's take a quick look at the available endpoints.

| Route | Description | Request Body | Response |
| / GET | The default home route | None | Hello Heckerr |
| /v1/shorten POST | Takes in the URL to shorten and returns the shortened URL. | url_to_shorten: String URL to shorten; custom_link: Optional, strictly 7-character alphabetic custom shortened URL string | shortened_url: A shortened string URL; error: Error message string |
| /<short-url> GET | Permanently redirects to the full URL stored for the specified short URL string | None | None |

Why only 2 request params for /shorten?

I studied the UX of some popular URL shortener sites and decided to stick to the core purpose of just letting the user shorten the URL along with a custom URL for convenience.

What happens when a request to shorten a URL hits?

  1. The URL validity is checked

  2. If the custom_link param is passed, its validity is checked

  3. A random string of 7 alphabetic characters is generated

  4. The string generated in step 3 is stored in the database along with the URL to shorten and a UNIX timestamp
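The steps above can be sketched roughly as follows. This is a minimal, dependency-free sketch and not Rusly's actual code: the names `NewLink`, `is_valid_url`, `is_valid_custom_link`, and `prepare_link` are hypothetical, the URL check is deliberately simplistic, and I am assuming that a valid custom link is used in place of the random string. The random generator is passed in as a closure so the sketch does not depend on any crate.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Row that would be handed to the insert query (hypothetical type).
#[derive(Debug, PartialEq)]
struct NewLink {
    id: String,
    full_url: String,
    timestamp: u64,
}

/// Simplistic URL check (assumption): an http(s) scheme followed by something.
fn is_valid_url(url: &str) -> bool {
    ["http://", "https://"]
        .iter()
        .any(|scheme| url.strip_prefix(*scheme).map_or(false, |rest| !rest.is_empty()))
}

/// Custom links must be exactly 7 ASCII alphabetic characters.
fn is_valid_custom_link(link: &str) -> bool {
    link.len() == 7 && link.chars().all(|c| c.is_ascii_alphabetic())
}

/// Steps 1 to 4: validate, pick the short string, attach a UNIX timestamp.
fn prepare_link(
    url_to_shorten: &str,
    custom_link: Option<&str>,
    generate: impl Fn() -> String,
) -> Result<NewLink, String> {
    if !is_valid_url(url_to_shorten) {
        return Err("invalid URL".to_string());
    }
    let id = match custom_link {
        Some(link) if is_valid_custom_link(link) => link.to_string(),
        Some(_) => return Err("custom link must be 7 alphabetic characters".to_string()),
        // Assumption: the random string is only needed when no custom link is given.
        None => generate(),
    };
    let timestamp = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map_err(|e| e.to_string())?
        .as_secs();
    Ok(NewLink { id, full_url: url_to_shorten.to_string(), timestamp })
}
```

The returned `NewLink` is what would then be written to the database; a failed primary key insert would surface as the `error` field of the response.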


Database

Currently, SQLite is used as an embedded database.

Database Schema

| Column | Type |
| id | VARCHAR(7) PRIMARY KEY |
| fullUrl | VARCHAR(1024) NOT NULL |
| timestamp | INTEGER NOT NULL |
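For reference, here is how the schema might look from the Rust side. This is only a sketch: the table name `links`, the `LinkRow` struct, and the `fits_schema` helper are my own illustrative names. One thing worth knowing is that SQLite does not enforce the length in `VARCHAR(n)`, so any length limit has to be checked in application code.

```rust
/// SQL for the single table. The table name `links` is an assumption.
const CREATE_TABLE_SQL: &str = "CREATE TABLE IF NOT EXISTS links (
    id VARCHAR(7) PRIMARY KEY,
    fullUrl VARCHAR(1024) NOT NULL,
    timestamp INTEGER NOT NULL
)";

/// In-memory counterpart of one row (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct LinkRow {
    id: String,
    full_url: String,
    timestamp: i64,
}

impl LinkRow {
    /// Checks the limits the column types express, since SQLite will not.
    fn fits_schema(&self) -> bool {
        self.id.len() == 7 && !self.full_url.is_empty() && self.full_url.len() <= 1024
    }
}
```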

Schema Description

id: The randomly generated string that acts as the shortened URL string. Because it is the primary key, there can be no duplicate short URLs.

The same applies to custom short URLs: a custom link that already exists is rejected.

Since the short URL is 7 lowercase letters long, there are over 8 billion (26^7) possible combinations, which is sufficient, and uniqueness is maintained by the primary key.

fullUrl: The full URL string. It is fetched when a short URL is requested, and the user is permanently redirected to it.

timestamp: The UNIX timestamp of the entry. It is generated in the Rust code and passed to the insert query, so it may differ slightly from the time the record is actually committed.
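The "over 8 billion" figure for the id column is easy to verify. The helpers below are illustrative names of my own, and the collision estimate assumes strings are generated uniformly at random.

```rust
/// Number of distinct strings of `length` symbols over an alphabet of `alphabet_size`.
fn keyspace(alphabet_size: u64, length: u32) -> u64 {
    alphabet_size.pow(length)
}

/// Chance that one freshly generated string matches one of `stored` existing ids,
/// assuming every string is equally likely.
fn collision_chance(stored: u64, alphabet_size: u64, length: u32) -> f64 {
    stored as f64 / keyspace(alphabet_size, length) as f64
}
```

`keyspace(26, 7)` gives 8,031,810,176. With one million links already stored, a new random string would collide with an existing one roughly 0.012% of the time, and the primary key would reject that insert.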


Why not store the full short URL?

To save storage space, only the short string is stored.

The host URL may change, and having it baked into every stored row would complicate that change.

Additional database update operations and compute would be required to rewrite every fully stored short URL.

If this operation fails midway, data inconsistencies will arise.

By storing only the short URL string, a change in the host URL can be handled without touching the data.

If the service is modified in the future, existing URLs can still be accommodated.
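With this approach the full short URL is only assembled when a response is built. A minimal sketch, where `build_short_url` and the host values are hypothetical:

```rust
/// Joins the current host with the stored short string.
/// A trailing slash on the host is tolerated.
fn build_short_url(host: &str, id: &str) -> String {
    format!("{}/{}", host.trim_end_matches('/'), id)
}
```

If the host changes from, say, `https://old.example` to `https://new.example`, only the value passed as `host` changes; no rows need to be rewritten.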

Why SQLite?

SQLite is an excellent embedded database, and given the minimal schema with only one table, it is a great choice.

It removes the need to manage a dedicated database server, which saves cost.

The following text is from the SQLite docs on when to use SQLite:

Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite. The 100K hits/day figure is a conservative estimate, not a hard upper bound. SQLite has been demonstrated to work with 10 times that amount of traffic.

SQLite will handle the load

Learn more about SQLite performance here.

Scaling SQLite

Potential Issues with Scaling

Scaling SQLite across replicas seems complex to me and can cause data inconsistency and synchronization issues.

If SQLite is set up in a replicated environment, each replica will have its own database file and its own copy of the data.

If a /shorten write request is handled by replica A and a subsequent read request for that data goes to replica B, the user will see inconsistent results. The alternative is to always route such requests to the same replica, which can overload it.
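The problem can be shown with a toy model in which each replica's embedded database is just a `HashMap`. The `Replica` type is purely illustrative and not part of Rusly.

```rust
use std::collections::HashMap;

/// One backend instance with its own private embedded database.
struct Replica {
    links: HashMap<String, String>,
}

impl Replica {
    fn new() -> Self {
        Replica { links: HashMap::new() }
    }

    /// Handles a write: only this replica's copy is updated.
    fn shorten(&mut self, id: &str, full_url: &str) {
        self.links.insert(id.to_string(), full_url.to_string());
    }

    /// Handles a read against this replica's copy.
    fn resolve(&self, id: &str) -> Option<&str> {
        self.links.get(id).map(String::as_str)
    }
}
```

A link written through replica A resolves on A but returns `None` on replica B, because nothing propagates the write between the two copies.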

Potential Solution for Scaling

If SQLite needs to be scaled, one possible approach is to put the database file on shared network storage, mount it in every replica, and let the Rust backend replicas access it over the network.

This will increase response time as the file needs to be accessed over a network. File locking on network filesystems is also known to be unreliable, which the SQLite documentation warns about.

Each replica can have its own cache to avoid database reads over the network.

But then again, the cache is yet another complex component to manage.
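A per-replica cache could be as simple as a read-through map in front of the database lookup. The sketch below is hypothetical (`CachedResolver` and `fetch_from_db` are my own names) and assumes a stored mapping never changes once written, so cached entries never need invalidation.

```rust
use std::collections::HashMap;

/// Read-through cache in front of a (slow, remote) database lookup.
struct CachedResolver<F: Fn(&str) -> Option<String>> {
    cache: HashMap<String, String>,
    fetch_from_db: F,
    /// Counts how often the database was actually consulted.
    db_reads: usize,
}

impl<F: Fn(&str) -> Option<String>> CachedResolver<F> {
    fn new(fetch_from_db: F) -> Self {
        CachedResolver { cache: HashMap::new(), fetch_from_db, db_reads: 0 }
    }

    fn resolve(&mut self, id: &str) -> Option<String> {
        // Serve from the local cache when possible.
        if let Some(url) = self.cache.get(id) {
            return Some(url.clone());
        }
        // Otherwise go to the database and remember a successful result.
        self.db_reads += 1;
        let url = (self.fetch_from_db)(id)?;
        self.cache.insert(id.to_string(), url.clone());
        Some(url)
    }
}
```

Repeated lookups of the same short string hit the database once. Misses are not cached here, so a link created later on another replica becomes visible on the next lookup.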


Understanding the random short URL string

The random short URL is a string of 7 lowercase alphabetic characters.

Before this, I considered using ULIDs and UUIDs as short URL strings but didn't proceed due to the following factors.

Both ULIDs and UUIDs are 128 bits long, so the generated full-length string is too long to be used as a short URL. If only a substring is used, the computation spent on generating the full-length string is wasted.

Both UUID and ULID strings consist of alphanumeric characters which didn't seem ideal for use as a short URL.

Personally, I feel there's nothing wrong with using an alphanumeric short string, but upon examining popular URL shorteners, I noticed that they don't use alphanumeric characters and thus decided to do the same. My best guess is that they do this from the perspective of user experience.
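To put the size difference in numbers, the sketch below compares the randomness in a 7-letter string with a 128-bit identifier. The function and constant names are my own; the lengths refer to the canonical text encodings (a hyphenated UUID and a Crockford base32 ULID).

```rust
/// Bits of randomness in a uniformly random string of `length` symbols
/// drawn from an alphabet of `alphabet_size` symbols.
fn entropy_bits(alphabet_size: u32, length: u32) -> f64 {
    f64::from(length) * f64::from(alphabet_size).log2()
}

/// Canonical UUID text form: 32 hex digits plus 4 hyphens.
const UUID_STRING_LEN: usize = 36;

/// Canonical ULID text form: 26 Crockford base32 characters.
const ULID_STRING_LEN: usize = 26;
```

`entropy_bits(26, 7)` comes out at about 32.9 bits, compared with the 128 bits a UUID or ULID carries, and both of their text forms are several times longer than 7 characters.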

Existing method for generating the random string

The following function generates a string of alphabetic characters of a given length. It's Rust code but fairly easy to understand.

It holds a set of alphabetic characters; for each position it generates a random index and maps it to the character at that index.

use rand::Rng;

fn generate_shortened_url(length: usize) -> String {
    const CHARSET: &[u8] = b"abcdefghijklmnopqrstuvwxyz";
    let mut rng = rand::thread_rng();

    // Run the loop `length` times
    (0..length)
        .map(|_| {
            // Generate a random index in 0..CHARSET.len() (upper bound exclusive),
            // so that every letter, including 'a', can be picked
            let index = rng.gen_range(0..CHARSET.len());

            // Pick the character at the randomly generated index
            CHARSET[index] as char
        })
        .collect()
}

Performance: ULID/UUID vs Random String Generator

Generating a 7-character random alphabetic string with the Rust function should be faster than generating a UUID or a ULID.

This is because the function only selects characters from a predefined character set, which is a simple operation of picking a random index and converting the byte at that index to a character.

In contrast, generating a UUID/ULID involves producing a 128-bit value (fully or partly random, depending on the scheme) and encoding it in a specific format.


Current Deployment of Rusly

Rusly is currently deployed on Railway directly from its GitHub repo.

Performance Metrics

Metrics are on their way. If you can help with benchmarking, reach out to me.


Conclusion

Kudos to you for reading till the end. That's all for this blog.

I can go on and on about system design, so reach out to me if you have something in mind.

Check out my other blogs at blog.wilfredalmeida.com/
