The 7 DynamoDB Sins
Avoid these 7 bad practices and learn how to build scalable, efficient databases with DynamoDB.
Do you often find yourself fighting with DynamoDB, getting hit with surprise costs, or struggling to craft the perfect query?
I’ve seen many bad uses of DynamoDB.
Some people treat it as an SQL database, others like a regular document-oriented NoSQL database.
To be able to use DynamoDB to its fullest potential — infinite scalability, low latency, and high availability — you need to understand how it works.
Oftentimes this starts with understanding the bad practices (or sins, as I call them).
Being aware of these and avoiding them will empower you to build powerful, scalable, and highly efficient databases for your applications.
Here are my 7 DynamoDB sins.
1. Ignoring Partition Key Design (Pride)
This is perhaps the most important thing to get right with DynamoDB.
If you don’t, you’re doing everything wrong.
Why?
DynamoDB’s entire premise rests on primary key design.
The only reason it is hyper-scalable and can support millions or billions of reads per second is how well you design your primary keys.
Neglect that, and it performs worse than a typical NoSQL database.
By carefully designing the partition key, you’ll allow DynamoDB to partition your data, automatically sharding it for rapid data access.
This is where high cardinality comes into play and is very important.
By carefully designing the sort key, you’ll enable powerful filtering and hierarchical data access patterns.
This is also why the single table design can be so powerful and hyper-efficient.
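To make this concrete, here is a minimal sketch of a composite key design using Python and boto3. The table name, key names, and entity prefixes are hypothetical choices of mine, not a prescription: the point is that a high-cardinality partition key spreads load evenly, while a structured sort key unlocks hierarchical queries.

```python
import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical single-table layout: PK is high-cardinality (one value per
# customer), SK encodes a hierarchy that begins_with() can drill into.
table = boto3.resource("dynamodb").Table("AppTable")

table.put_item(Item={"PK": "CUSTOMER#42", "SK": "PROFILE", "name": "Ada"})
table.put_item(Item={"PK": "CUSTOMER#42", "SK": "ORDER#2024-06-01#1001", "total": 99})
table.put_item(Item={"PK": "CUSTOMER#42", "SK": "ORDER#2024-06-15#1002", "total": 45})

# One Query returns the whole customer record set, or just their June orders
june_orders = table.query(
    KeyConditionExpression=Key("PK").eq("CUSTOMER#42")
    & Key("SK").begins_with("ORDER#2024-06")
)
```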
I discuss more about primary key design and how to implement it in this article.
Remember, if you’re not carefully designing your primary keys like this, using DynamoDB will do you more harm than good.
2. Querying Without Projection Expressions (Greed)
The second DynamoDB sin.
You’re probably missing this one from your skillset.
Not using projection expressions will lead to higher costs and higher latency.
But what are projection expressions?
When you query for data in your DynamoDB table, sometimes your items can have a lot of data.
Most of the time you are not using all of the data returned.
Take this example:
If you query a user item by its userID, you rarely need every attribute stored on it: email, name, address (which may be large), and other data attributes.
Fetching all of this data when you need only one or two attributes is wasteful and drives up costs.
Using projection expressions eliminates this.
You simply specify the attributes you want to get back in the query, making your queries super efficient (similar to GraphQL).
You get back nothing more and nothing less.
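Here is a minimal sketch with boto3 (the table and attribute names are hypothetical). Note that name happens to be a DynamoDB reserved word, so it goes through an ExpressionAttributeNames placeholder:

```python
import boto3

table = boto3.resource("dynamodb").Table("Users")  # hypothetical table

# Return ONLY email and name for this user, instead of the whole item
response = table.get_item(
    Key={"userId": "user-123"},
    ProjectionExpression="email, #n",
    ExpressionAttributeNames={"#n": "name"},  # "name" is a reserved word
)
user = response.get("Item")  # {'email': ..., 'name': ...}, nothing more
```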
Want to learn how projection expressions work?
Find out more here.
3. Abusing Scans Instead Of Queries (Lust)
The third DynamoDB sin.
The Scan method gives you unbounded querying and filtering capabilities. However, most of the time you shouldn’t use it.
But let’s understand why.
Scans ignore DynamoDB’s entire premise of querying efficiency.
Data in DynamoDB is partitioned across nodes and structured as B-trees inside each of these nodes. This enables super-fast data access for any item — O(log n).
The Query method respects this: it jumps straight to the right partition using the partition key, then traverses that partition’s B-tree for fast lookups.
The Scan method on the other hand ignores all of these concepts.
It “scans” every item from the start all the way to the end and returns the items that match the Scan query.
The disadvantages:
higher query latency
higher costs to query more items
very slow for large data sets
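To see the difference side by side, here is a hedged sketch (table, key, and attribute names are hypothetical):

```python
import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("Orders")  # hypothetical

# Query: jumps to one partition, then walks its sorted index -- fast
fast = table.query(
    KeyConditionExpression=Key("customerId").eq("cust-42")
    & Key("orderDate").begins_with("2024-")
)

# Scan: reads EVERY item in the table, filtering only after the read --
# slow on large data sets, and you pay for everything it reads
slow = table.scan(FilterExpression=Attr("customerId").eq("cust-42"))
```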
But should you always be using the Query method and never use Scan?
Not really. There are a few fitting use cases when it's preferable to use Scan (like in this app I recently built).
I explain the use cases for Scans and Query in this article.
4. Mimicking Relational Databases (Envy)
The 4th DynamoDB sin.
DynamoDB isn’t a relational database.
There are many design concepts developers use that are anti-patterns in DynamoDB.
If you’re guilty of any of these below, you need to fix them now.
These anti-patterns are:
Normalizing data: Data in DynamoDB should be denormalized for the most part. Relationships between data should be stored in the same table. There are no joins in Dynamo, so every query should be self-contained.
Misunderstanding Primary Key design: Primary key design is drastically different in Dynamo than in SQL. Unlike in SQL, primary keys must be designed around specific data access patterns, and they are what scalability and performance rest on.
Using Filters instead of Indexing: FilterExpressions come with a caveat: DynamoDB first reads every item matching the key condition (and charges you for those reads), and only then applies the filter. Rely on secondary indexes instead (see the sketch after this list).
Ignoring DynamoDB’s Event-Driven nature: Don’t treat DynamoDB as a static database the way you would SQL; use Streams to react to changes. This makes your data dynamic by enabling event-driven capabilities.
Overusing Multi-item transactions: While DynamoDB supports transactions, it isn’t ideal for complex multi-item transactions the way SQL is. Instead, model your data to avoid needing them.
There are many more, but those are the anti-patterns I see most often.
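To illustrate the filter anti-pattern and its fix, here is a minimal sketch. The table and the status-index GSI are assumptions for the example, not existing resources:

```python
import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("Orders")  # hypothetical

# Anti-pattern: reads (and bills for) the whole table, then filters
pending = table.scan(FilterExpression=Attr("status").eq("PENDING"))

# Better: Query a GSI keyed on the attribute you keep filtering by
pending = table.query(
    IndexName="status-index",  # assumed GSI with "status" as partition key
    KeyConditionExpression=Key("status").eq("PENDING"),
)
```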
5. Over-Provisioning Read/Write Capacity (Gluttony)
The 5th DynamoDB sin.
Many DynamoDB users are guilty of this one.
There are two issues here:
Choosing between On-demand and Provisioned capacity mode.
Right-sizing capacity based on observed traffic.
Some developers will aim to avoid throttling and over-provision their DynamoDB tables.
This is never a good solution as it leads to wasting resources and a high monthly bill. Instead, focus on more efficient primary key design.
High cardinality will avoid hot partitions which in turn will help you avoid over-provisioning.
Other solutions such as caching (DAX or client-side) can help reduce the need for excessive capacity provisioning.
The immediately obvious solution is, of course, on-demand capacity mode. But you have to know when it is appropriate to use it.
Use on-demand mode when you don’t know the traffic patterns of your application or they tend to be spiky at random times.
However, using on-demand mode comes at a cost. It is usually more beneficial to understand your traffic pattern and provision capacity accordingly.
How can you understand your app’s traffic behaviour?
Metrics! Monitor your DynamoDB tables for patterns and identify the high and low tides.
From there you can make the best provisioning estimate.
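As a rough sketch of both options (the table name and numbers are hypothetical, and note that AWS only lets you switch billing modes once per 24 hours):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Option A: spiky or unpredictable traffic -- pay per request
dynamodb.update_table(TableName="Orders", BillingMode="PAY_PER_REQUEST")

# Option B: well-understood traffic -- provision to the observed peak
# plus headroom (shown commented out, since modes switch once per 24h)
# dynamodb.update_table(
#     TableName="Orders",
#     BillingMode="PROVISIONED",
#     ProvisionedThroughput={"ReadCapacityUnits": 50, "WriteCapacityUnits": 10},
# )
```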
6. Ignoring DynamoDB Limitations (Wrath)
The 6th DynamoDB sin.
DynamoDB isn’t ideal for storing large items, here’s why:
DynamoDB’s promise of low latency at any scale relies on the intentional 400 KB limit it imposes on a single item (record).
While it may not seem like much compared to MongoDB’s 16MB limit or Cassandra’s 16GB, it is enough for most use cases.
But those databases don’t promise single-digit millisecond latency. To optimize for latency and costs, keep item sizes small.
Here are some tips:
store blobs/large content in S3 (see the sketch after this list)
shorten attribute names for data at scale (attribute names count toward item size)
don’t store null values (an attribute you omit takes up no space at all)
avoid large arrays or deeply nested objects
use sparse indexes, where only items with a specific attribute are included
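For the first tip, a common pattern is to offload the blob to S3 and keep only a pointer item in DynamoDB. A minimal sketch, with hypothetical bucket and table names:

```python
import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("Documents")  # hypothetical

report_bytes = b"%PDF..." * 100_000  # stand-in for a blob well over 400 KB

# The heavy payload goes to S3...
s3.put_object(Bucket="my-app-blobs", Key="reports/123.pdf", Body=report_bytes)

# ...and DynamoDB stores only a small pointer item
table.put_item(Item={
    "docId": "123",
    "s3Key": "reports/123.pdf",
    "sizeBytes": len(report_bytes),
})
```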
Additionally, being aware of DynamoDB’s other limitations is very important.
a 1 MB cap per Query/Scan page (paginate beyond that)
per-partition throughput limits (3,000 RCUs / 1,000 WCUs)
limits on secondary indexes
eventually consistent reads by default
size limits on transactions and batches
a 24-hour retention window for Streams
The takeaway: being aware of DynamoDB’s size limits and other constraints will save you a lot of time and frustration.
Consequently, knowing these will help you design better and more efficient tables.
I wrote a dedicated article here to help you learn about DynamoDB’s limitations.
7. Neglecting To Monitor (Sloth)
The 7th DynamoDB sin.
Setting up your table and never reviewing it to optimize access patterns, indexes, and capacity is a sin in almost any database system.
Monitoring your tables regularly can help you identify areas to optimize.
Over time, as your data grows, your access patterns evolve. A static design will lead to inefficiencies and higher costs.
How can you fix this?
Regularly monitor your tables
Use CloudWatch metrics (see the sketch further below)
Use AWS Cost Explorer to keep track of usage and costs
Monitoring will allow you to:
track read and write capacity usage
identify hot partitions
optimize query performance
view throttled requests
get a detailed view of table costs
track errors
track IAM permissions usage
many more benefits
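As a starting point, here is a hedged sketch that pulls one of those CloudWatch metrics with boto3 (the table name is hypothetical). A day of hourly consumed capacity is often enough to spot the high and low tides mentioned earlier:

```python
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

# Hourly consumed read capacity for the last 24 hours
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/DynamoDB",
    MetricName="ConsumedReadCapacityUnits",
    Dimensions=[{"Name": "TableName", "Value": "Orders"}],  # hypothetical
    StartTime=datetime.utcnow() - timedelta(days=1),
    EndTime=datetime.utcnow(),
    Period=3600,
    Statistics=["Sum"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])
```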
Don’t neglect monitoring. Learn how to monitor like a pro in my detailed article here.
Conclusion
Did I miss any bad practices? Probably. The aim was to distill them into 7 “sins”.
Share some bad practices you’ve seen below!
👋 My name is Uriel Bitton and I’m committed to helping you master Serverless, Cloud Computing, and AWS.
🚀 If you want to learn how to build serverless, scalable, and resilient applications, you can also follow me on LinkedIn for valuable daily posts.
Thanks for reading and see you in the next one!