How To Model Relationships In DynamoDB Without Joins

This is one of the most crucial elements of DynamoDB table design.

Oct 30, 2024

The reason modeling relationships is important in DynamoDB (and in most NoSQL databases) is the lack of joins.

However, NoSQL doesn’t actually lack joins: it doesn’t need them.

Let me explain.

With SQL, we typically store each data entity in its own table and use different join operations to query related data together. This is the core tenet of SQL.

However, DynamoDB has no join operation.

So instead we must pre-join this data.

In DynamoDB we typically model our data in such a way that we do not have a need for joins.

In essence, NoSQL databases aren’t missing joins; they don’t need them!

And in this article, I’ll demonstrate why that is.

DynamoDB’s Data Model

https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/

DynamoDB items (similar to SQL records) differ from SQL records in 2 main ways.

1. Primary keys

A primary key is made up of a partition key and optionally a sort key. If the sort key is specified, the primary key becomes a composite key.

The core difference lies in the purpose of the primary keys. A partition key is defined by the user and dictates in what partition an item will be stored.

This allows your items in a DynamoDB table to scale horizontally. It also enables low-latency data access.

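The sort key is what distinguishes multiple items that share the same partition key. Items within a partition are stored in sort key order, which enables powerful filtering on the sort key, such as prefix matches (begins_with) and range conditions (between).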
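To make the roles of the two keys concrete, here is a minimal in-memory sketch. It is not DynamoDB itself: `ToyTable` and its methods are hypothetical names I made up for illustration, and the example keys and attributes are assumptions.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with a composite primary key (illustration only)."""

    def __init__(self):
        # partition key -> list of (sort key, item), kept ordered by sort key
        self._partitions = defaultdict(list)

    def put(self, pk, sk, **attrs):
        rows = self._partitions[pk]
        # The same pk + sk identifies the same item, so a second put overwrites it.
        rows[:] = [row for row in rows if row[0] != sk]
        rows.append((sk, {"pk": pk, "sk": sk, **attrs}))
        rows.sort(key=lambda row: row[0])

    def query(self, pk, sk_prefix=""):
        # Only one partition is touched; the sort key narrows results inside it.
        return [
            item
            for sk, item in self._partitions.get(pk, [])
            if sk.startswith(sk_prefix)
        ]


table = ToyTable()
table.put("user#101", "order-302", total=20)
table.put("user#101", "order-301", total=10)
table.put("user#101", "user#101", name="Ana")
table.put("user#102", "order-400", total=5)

# Everything in user#101's partition, in sort key order
everything = [item["sk"] for item in table.query("user#101")]
# Only the items whose sort key starts with "order-"
orders_only = [item["sk"] for item in table.query("user#101", sk_prefix="order-")]
```

The point of the sketch is that a query names exactly one partition, and the sort key only filters and orders items inside that partition.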

I cover this in detail in this article.

2. Attributes Flexibility

The second core difference between a DynamoDB item compared to an SQL record is the flexibility of its attributes.

In SQL, a record’s attributes are traditionally limited to scalar data types (string, number, boolean, etc).

In DynamoDB, an item can have scalar attributes (string, number, binary, boolean, null) as well as document types (lists and maps) and sets.

This allows you to store more complex data like objects and arrays and enables the practice of denormalizing complex data.
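As a rough illustration, here is how a nested item maps onto the typed JSON format that DynamoDB's low-level API uses (`S`, `N`, `BOOL`, `L`, `M`, `SS`, and so on). The `to_attribute_value` helper is a hypothetical, simplified converter written for this post (it only handles integers and string sets), and the attribute names are assumptions.

```python
def to_attribute_value(value):
    """Convert a Python value into DynamoDB's typed JSON (illustrative helper)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # must precede the int check: bool subclasses int
        return {"BOOL": value}
    if isinstance(value, int):
        return {"N": str(value)}  # numbers are sent as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        return {"B": value}
    if isinstance(value, (set, frozenset)):
        if value and all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}  # string set; sorted only for stable output
        raise TypeError("this sketch only handles non-empty sets of strings")
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


user_item = {
    "pk": "user#101",
    "sk": "user#101",
    "name": "Ana",
    "interests": {"books", "audio"},  # a set
    "addresses": [{"city": "Montreal", "primary": True}],  # a list of maps
}

# The top-level item is a map of attribute name -> typed value
wire_item = to_attribute_value(user_item)["M"]
```

In practice you would rarely write this yourself, since the AWS SDKs ship their own serializers. The sketch is only meant to show that lists, maps, and sets are first-class attribute types.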

Techniques for Modeling Relationships Without Joins

Pre-joining with Composite Keys

Imagine an e-commerce application.

We need to fetch orders made by a certain user.

In SQL, you would join the “users” table with the “orders” table on a foreign key. The orders table would reference the users table through a “userID” column.

In DynamoDB, instead, we can “pre-join” the data at write time and then query it all together.

We can store a user item by defining the partition and sort key values as the user’s ID (e.g. user#101).

That item can contain all of the data relevant to that user such as name, email, address, etc.

Then under the same partition key (user#101), we can add an order item by defining the sort key as the ID of the order (e.g. order-301).

Every time that user places an order on our e-commerce store, we write a new order item under the user’s partition key and generate a random orderID for the sort key.

And like this, we’ve enabled a one-to-many relationship data model.

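Because a user and all of their orders live in the same partition, a single query can return them together with no join at read time. This keeps queries fast and the data model simple.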

We can query for all orders by a given userID and perform further filtering on those orders (see previously linked article).
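Here is a minimal sketch of that layout. The key formats (user#101, order-301) come from the example above; the table name `ecommerce`, the attribute names `pk` and `sk`, and the helper functions are assumptions made for illustration. The last function builds the parameters for a low-level `Query` call rather than calling AWS, so the sketch runs without credentials.

```python
import uuid

TABLE_NAME = "ecommerce"  # hypothetical table name


def make_user_item(user_id, name, email):
    # The user item uses the user's ID for both the partition key and the sort key.
    key = f"user#{user_id}"
    return {"pk": key, "sk": key, "name": name, "email": email}


def make_order_item(user_id, total, order_id=None):
    # Each order is written under the user's partition with a random order ID.
    order_id = order_id or uuid.uuid4().hex
    return {"pk": f"user#{user_id}", "sk": f"order-{order_id}", "total": total}


def orders_query_params(user_id):
    # Parameters for a Query that reads only the order items in one user's partition.
    return {
        "TableName": TABLE_NAME,
        "KeyConditionExpression": "pk = :pk AND begins_with(sk, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"user#{user_id}"},
            ":prefix": {"S": "order-"},
        },
    }


user = make_user_item(101, "Ana", "ana@example.com")
order = make_order_item(101, total=42, order_id="301")
params = orders_query_params(101)
```

With boto3 installed and credentials configured, these parameters could be passed as `boto3.client("dynamodb").query(**params)`. Dropping the `begins_with` condition would return the user item and the orders in one call.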

Using GSIs for Many-to-Many Relationships

Imagine you have an e-learning application. You have two entities: students, and courses.

Say you need to model many-to-many relationships between students and courses.

For example, you want to perform the following queries:

Get all courses that a given student is enrolled in
Get all students enrolled in a given course

In SQL, you would typically create a third table (a junction table) that links students to courses, and join through it to satisfy the above queries.

In DynamoDB, we can use what is called a “reverse lookup” using our base table and a secondary index.

Our base table will serve our primary data access pattern, which is getting all courses for a student.

Hence we store each enrollment as an item inside the student’s partition: the partition key is the student’s ID and the sort key is the course’s ID (e.g. COURSE#301).

This table will let us satisfy the first query: get all courses that a given student is enrolled in.

(In this example there are just 2 students, and both are enrolled in course “COURSE#301”.)

Now to satisfy the second access pattern, “get all students enrolled in a given course”, we simply need to invert the pk and the sk.

We can do this by creating a global secondary index and using the sk as the pk and the pk as the sk.

In the GSI, each course ID becomes a partition, and the students enrolled in that course become the items sorted under it.

Now we can easily query for all students enrolled in COURSE#301 for example.
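The inversion can be sketched in plain Python. The enrollment items below mirror the example (2 students, both in COURSE#301), but the student IDs, the table name `elearning`, the index name `inverted-index`, and the helper functions are all assumptions made for illustration.

```python
# Base table items: pk = student, sk = course
enrollments = [
    {"pk": "STUDENT#101", "sk": "COURSE#301"},
    {"pk": "STUDENT#102", "sk": "COURSE#301"},
    {"pk": "STUDENT#101", "sk": "COURSE#302"},
]


def query(items, partition_attr, sort_attr, partition_value):
    """Toy query: match one partition value, return items ordered by the sort attribute."""
    matches = [item for item in items if item[partition_attr] == partition_value]
    return sorted(matches, key=lambda item: item[sort_attr])


# Base table view: partition on pk, sort on sk
courses_of_student = [i["sk"] for i in query(enrollments, "pk", "sk", "STUDENT#101")]

# GSI view: the same items, but partitioned on sk and sorted on pk
students_in_course = [i["pk"] for i in query(enrollments, "sk", "pk", "COURSE#301")]


def students_in_course_params(course_id):
    # Parameters for a low-level Query against the (hypothetical) inverted index.
    return {
        "TableName": "elearning",
        "IndexName": "inverted-index",
        "KeyConditionExpression": "sk = :course",
        "ExpressionAttributeValues": {":course": {"S": f"COURSE#{course_id}"}},
    }
```

Note that we never write to the index ourselves. DynamoDB maintains the GSI from writes to the base table, and reads from a GSI are eventually consistent, so a new enrollment may take a moment to appear in the second query.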

Note that, while SQL may have a simpler method, joins get slower as tables grow and do not scale as well. The method used above in DynamoDB reads from a single partition, so it is designed to offer the same low latency at any scale.

Embedding Denormalized Data

The last strategy for modeling relationships that we’ll look at in this article is about denormalizing data.

In SQL, it is recommended to normalize data across tables. This ensures data consistency and synchronization.

A single source of truth is always best.

However, this idea comes from the fact that when SQL was growing, storage was expensive. Hence, optimizing storage was critical.

With DynamoDB today, storage is cheap, so we do not need to focus on optimizing it. Instead, we focus on optimizing compute power.

Therefore denormalizing data is an effective way of modeling relationships in data without needing join operations.

Say we want to query for a user’s addresses (they may have several).

In a normalized SQL schema, we would typically create an addresses table and join that table with the users table.

But that is a costly operation just to fetch something as simple as a user’s addresses.

In DynamoDB, we can denormalize this addresses data into an array and store it directly in the user’s item.
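As a sketch, the user item could look like the dictionary below, with the addresses embedded as a list of maps. The attribute names, the table name `ecommerce`, and the `add_address_params` helper are assumptions for illustration; the helper only builds the parameters for a low-level `UpdateItem` call, and it assumes the `addresses` attribute already exists on the item.

```python
# One read of this item returns the user together with every address.
user = {
    "pk": "user#101",
    "sk": "user#101",
    "name": "Ana",
    "addresses": [
        {"label": "home", "city": "Montreal"},
        {"label": "work", "city": "Toronto"},
    ],
}

cities = [address["city"] for address in user["addresses"]]


def add_address_params(user_id, address):
    """Build UpdateItem parameters that append one address to the embedded list."""
    key = {"S": f"user#{user_id}"}
    new_address = {"M": {name: {"S": value} for name, value in address.items()}}
    return {
        "TableName": "ecommerce",
        "Key": {"pk": key, "sk": key},
        # list_append concatenates two lists, so the new address is wrapped in a list
        "UpdateExpression": "SET addresses = list_append(addresses, :new)",
        "ExpressionAttributeValues": {":new": {"L": [new_address]}},
    }


params = add_address_params(101, {"label": "cottage", "city": "Quebec"})
```

With boto3, these parameters could be passed as `boto3.client("dynamodb").update_item(**params)`. Keep in mind that a single DynamoDB item is limited to 400 KB, so embedding works best for lists that stay small, like a handful of addresses.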

We can do the same to store user orders and transactions in an e-commerce application.

There’s a lot to be said on when you should and shouldn’t denormalize data in DynamoDB.

If you want to learn more about this, I encourage you to read this post.

Conclusion

In DynamoDB, modeling relationships without joins is not a limitation but an opportunity to design for scalability and performance.

By pre-joining data through composite keys, leveraging GSIs for many-to-many relationships, and embedding denormalized data, we can create efficient and highly optimized access patterns that are designed for DynamoDB, ensuring low-latency queries at any scale.

👋 My name is Uriel Bitton and I’m committed to helping you master Serverless, Cloud Computing, and AWS.

🚀 If you want to learn how to build serverless, scalable, and resilient applications, you can also follow me on LinkedIn for valuable daily posts.

Thanks for reading and see you in the next one!
