TTL stands for ‘Time to Live’ . It’s a common functionality provided by many of new age DBs.
Modern DBs are designed for speed and scale and much of them have some constrained to achieve that, almost all major Nosql DBs let go of joins to provide a faster and scalable persistence system. Much of their extensible functionalities are at a nascent stage. Triggers, Stored Procedures, Views, Functions, Sessions, CDC etc.
However, most of them do provide ways or alternate ways to support some of these functionalities. TTL is one such tool.
It’s a simple way of telling that for how long a data is valid. That’s the core of it but perhaps more important is who decides as how long data is valid, What do we mean by validity of data. What should happen to invalid data?
Once we understand those questions we will be diving into how to leverage TTL to accomplish our own requirement.
If I have to define TTL in a SQL world, I would think in a manner that you add a trigger on INSERT or UPDATE, what this trigger does is that it takes that particular row and starts a timer of set time let’s say 1 hour, After 1 hour the row would be deleted by itself unless you update the records in between.
It’s half baked example since there are multiple nuances to it. However much of it is true for MongoDB TTL index, You can create a predefined TTL on any field (column) with set time and any record inserted with a value in that field. It would delete the record after set time after the value in that field.
So obviously that field should be of type date or timestamp. So suppose a ttl of 86400 seconds is applied on field startsAt, and we insert a record with the value for startsAt as 2018–12–27 12:00:000 , that particular records will be deleted at 2018–12–28 12:00:000. But wait what if in between that period I update startsAt as 2018–12–28 06:00:000, then the record will be up for deletion on 2018–12–29 06:00:000, We can keep a record alive indefinitely if we keep updating the value just in time.
What possible scenario we will use it, The sky is the limit, one most obvious case is for maintaining a user session. set a ttl of 3600, whenever the request comes you update the time with current time and reset the clock, if the request comes after 1 hour, obviously the records will be not available and you can initiate the process of re login indicating the session has expired.
Using the DB for maintaining a session, helps you to alleviate the problem in a cluster scenario, A call can go to any instance and we do not have to rely on sticky sessions. prevention is better than cure.
But we are deviating from the topic.
- To send a reset password link valid for 1 day.
- OTP validity for set time
- Opening of a bid or offer for certain period and which will be closed post that
Cassandra is one of the most reliable and widely used No SQL DB. It’s a column based NoSQL which is different from MongoDB which is documents based.
And since their bases are different their TTLs are different. Conceptually they do the same thing but in different ways.
This is not a comparison study of both, It’s rather understanding the difference of TTL in both DBs and explore most of Cassandra TTLs behavior.
Since MongoDB is document based hence TTL alters the whole document, you lose the whole document (row if you may). Even though TTL is defined on a single field.
Cassandra on the other hand is Column based storage, so the TTL their alters the data in the particular column. But wait till you read till the end.
Let’s start with an example
— Assuming you have cassandra running and you can have cqlsh terminal open
cqlsh> CREATE KEYSPACE IF NOT EXISTS examples
WITH REPLICATION = {
‘class’ : ‘SimpleStrategy’,
‘replication_factor’ : 1 };cqlsh> USE examples;cqlsh:examples>CREATE TABLE IF NOT EXISTS examples.chat (
id int PRIMARY KEY,
message text,
message_by text,
time timestamp
);cqlsh:examples> INSERT INTO examples.chat (id, message, message_by, time) VALUES (1,'Hi There','John',toTimestamp(now()));cqlsh:examples> select * from examples.chat;id | message | message_by | time
----+----------+------------+---------------------------------
1 | Hi There | John | 2018-12-27 10:00:52.197000+0000(1 rows)
Till now everything should be self explanatory, since we have not used TTL, so let’s create a row using TTL.
cqlsh:examples> INSERT INTO examples.chat (id, message, message_by, time) VALUES (2,'Hello','Mary',toTimestamp(now())) USING TTL 30;
cqlsh:examples> select * from examples.chat;id | message | message_by | time
----+----------+------------+---------------------------------
1 | Hi There | John | 2018-12-27 10:09:43.972000+0000
2 | Hello | Mary | 2018-12-27 10:10:57.275000+0000(2 rows)
We created a new row with TTL 30 (in seconds). Hence the particular row with id 2 will be removed. Below query is after 30 seconds passes
cqlsh:examples> select * from examples.chat;id | message | message_by | time
----+----------+------------+---------------------------------
1 | Hi There | John | 2018-12-27 10:09:43.972000+0000(1 rows)
The first row exists since there was no TTL on that. It looks like row gets expired after TTL passes. But wait till you read till the end.
Now we will re-insert the deleted row using TTL 60 and this time we will try to find out how many time is left till the row gets deleted.
cqlsh:examples> INSERT INTO examples.chat (id, message, message_by, time) VALUES (2,'Hello','Mary',toTimestamp(now())) USING TTL 60;
cqlsh:examples> select id,message,message_by,time,ttl(message) from examples.chat;id | message | message_by | time | ttl(message)
----+----------+------------+---------------------------------+--------------
1 | Hi There | John | 2018-12-27 10:09:43.972000+0000 | null
2 | Hello | Mary | 2018-12-27 10:19:23.953000+0000 | 53(2 rows)
cqlsh:examples> select id,message,message_by,time,ttl(message) as ttl from examples.chat;id | message | message_by | time | ttl
----+----------+------------+---------------------------------+------
1 | Hi There | John | 2018-12-27 10:09:43.972000+0000 | null
2 | Hello | Mary | 2018-12-27 10:19:23.953000+0000 | 51(2 rows)
cqlsh:examples> select id,message,message_by,time,ttl(message) as ttl from examples.chat;id | message | message_by | time | ttl
----+----------+------------+---------------------------------+------
1 | Hi There | John | 2018-12-27 10:09:43.972000+0000 | null
2 | Hello | Mary | 2018-12-27 10:19:23.953000+0000 | 35(2 rows)
It can be seen from above example, We can call ttl on certain column, in the above case I am calling on ‘message’ column. It tells how many time is left when the ttl removes the data.
But does ttl(message) will be different from ttl(message_by) and will they both be different for ttl(time)?
Yes they can be, but in the above example they all would be same. Let us explore it further.
cqlsh:examples> INSERT INTO examples.chat (id, message, message_by, time) VALUES (2,'Hello','Mary',toTimestamp(now())) USING TTL 300;
cqlsh:examples> select id,message_by,ttl(message) as m_ttl, ttl(message_by) as mb_ttl, ttl(time) as t_ttl from examples.chat;id | message_by | m_ttl | mb_ttl | t_ttl
----+------------+-------+--------+-------
1 | John | null | null | null
2 | Mary | 297 | 297 | 297(2 rows)
cqlsh:examples> select id,message_by,ttl(message) as m_ttl, ttl(message_by) as mb_ttl, ttl(time) as t_ttl from examples.chat;id | message_by | m_ttl | mb_ttl | t_ttl
----+------------+-------+--------+-------
1 | John | null | null | null
2 | Mary | 291 | 291 | 291(2 rows)
Now we will try to update one column with different TTL value and see.
cqlsh:examples> UPDATE examples.chat USING TTL 60 SET message_by='Mary S' WHERE id=2;
cqlsh:examples> select id,message_by,ttl(message) as m_ttl, ttl(message_by) as mb_ttl, ttl(time) as t_ttl from examples.chat;id | message_by | m_ttl | mb_ttl | t_ttl
----+------------+-------+--------+-------
1 | John | null | null | null
2 | Mary S | 153 | 57 | 153(2 rows)
cqlsh:examples> select id,message_by,ttl(message) as m_ttl, ttl(message_by) as mb_ttl, ttl(time) as t_ttl from examples.chat;id | message_by | m_ttl | mb_ttl | t_ttl
----+------------+-------+--------+-------
1 | John | null | null | null
2 | Mary S | 137 | 41 | 137(2 rows)
So we are having different TTL on different columns. so what will happen when message_by ttl expires and other don’t?
cqlsh:examples> select id,message_by,ttl(message) as m_ttl, ttl(message_by) as mb_ttl, ttl(time) as t_ttl from examples.chat;id | message_by | m_ttl | mb_ttl | t_ttl
----+------------+-------+--------+-------
1 | John | null | null | null
2 | null | 18 | null | 18(2 rows)
It removed the data for that particular column, let us see what happens if rest of ttl expires.
cqlsh:examples> select id,message_by,ttl(message) as m_ttl, ttl(message_by) as mb_ttl, ttl(time) as t_ttl from examples.chat;id | message_by | m_ttl | mb_ttl | t_ttl
----+------------+-------+--------+-------
1 | John | null | null | null(1 rows)
It is safe to assume that when all the ttl is expired the row is getting deleted. But now it is clear that TTL is on column level and not row level, Initially it did look like we are putting TTL for all row when in fact we were putting TTLs on all the columns mentioned in the insert statement.
So whenever a ttl is expiring, it verifies for all other (non-primary) columns values, If all of them are null, the record is removed automatically.
But we are not done yet. Let’s try something else. Insert record with ttl on limited number of column and then update the other column without TTL.
cqlsh:examples> INSERT INTO examples.chat (id, message_by, time) VALUES (2,'Mary',toTimestamp(now())) USING TTL 60;
cqlsh:examples> select id,message,message_by,ttl(message) as m_ttl, ttl(message_by) as mb_ttl, ttl(time) as t_ttl from examples.chat;id | message | message_by | m_ttl | mb_ttl | t_ttl
----+----------+------------+-------+--------+-------
1 | Hi There | John | null | null | null
2 | null | Mary | null | 32 | 32(2 rows)
cqlsh:examples> UPDATE examples.chat SET message='Hello' WHERE id=2;
cqlsh:examples> select id,message,message_by,ttl(message) as m_ttl, ttl(message_by) as mb_ttl, ttl(time) as t_ttl from examples.chat;id | message | message_by | m_ttl | mb_ttl | t_ttl
----+----------+------------+-------+--------+-------
1 | Hi There | John | null | null | null
2 | Hello | Mary | null | 7 | 7(2 rows)
cqlsh:examples> select id,message,message_by,ttl(message) as m_ttl, ttl(message_by) as mb_ttl, ttl(time) as t_ttl from examples.chat;id | message | message_by | m_ttl | mb_ttl | t_ttl
----+----------+------------+-------+--------+-------
1 | Hi There | John | null | null | null
2 | Hello | null | null | null | null
Here the TTL expired and it removed the column data but then message column did had non-null data, which prevented the whole records from getting deleted. Basically it proves that even if there is a single column with non null value the record stays.
What if later on we make that rebel column to null?
cqlsh:examples> select id,message,message_by,ttl(message) as m_ttl, ttl(message_by) as mb_ttl, ttl(time) as t_ttl from examples.chat;id | message | message_by | m_ttl | mb_ttl | t_ttl
----+----------+------------+-------+--------+-------
1 | Hi There | John | null | null | null
2 | Hello | null | null | null | null(2 rows)
cqlsh:examples> UPDATE examples.chat SET message=null WHERE id=2;
cqlsh:examples> select id,message,message_by,ttl(message) as m_ttl, ttl(message_by) as mb_ttl, ttl(time) as t_ttl from examples.chat;id | message | message_by | m_ttl | mb_ttl | t_ttl
----+----------+------------+-------+--------+-------
1 | Hi There | John | null | null | null
Well, well even the TTL has expired, if the value is set to null later on, It still deletes the record. Frankly I didn’t expect this behavior. so even if ttl is expired it does check for all column null rule at later events.
In contrast let us try to update the value to null for all the columns to the first row with id 1 which never had any TTL
cqlsh:examples> select id,message,message_by,ttl(message) as m_ttl, ttl(message_by) as mb_ttl, ttl(time) as t_ttl from examples.chat;id | message | message_by | m_ttl | mb_ttl | t_ttl
----+----------+------------+-------+--------+-------
1 | Hi There | John | null | null | null(1 rows)
cqlsh:examples> UPDATE examples.chat SET message=null,message_by=null,time=null WHERE id=1;
cqlsh:examples> select id,message,message_by,ttl(message) as m_ttl, ttl(message_by) as mb_ttl, ttl(time) as t_ttl from examples.chat;id | message | message_by | m_ttl | mb_ttl | t_ttl
----+---------+------------+-------+--------+-------
1 | null | null | null | null | null(1 rows)
cqlsh:examples> select * from examples.chat;id | message | message_by | time
----+---------+------------+------
1 | null | null | null
We can assume that TTL enables the record for deletion at events whenever it fulfill all null requirements
One more important thing to consider is when you insert records with ttl and again update it without ttl, the ttls will be gone on the columns you are updating.
cqlsh:examples> INSERT INTO examples.chat (id, message, message_by, time) VALUES (2,'Hello','Mary',toTimestamp(now())) USING TTL 60;
cqlsh:examples> select id,message,message_by,ttl(message) as m_ttl, ttl(message_by) as mb_ttl, ttl(time) as t_ttl from examples.chat;id | message | message_by | m_ttl | mb_ttl | t_ttl
----+----------+------------+-------+--------+-------
1 | Hi There | John | null | null | null
2 | Hello | Mary | 58 | 58 | 58(2 rows)
cqlsh:examples> INSERT INTO examples.chat (id,message,time) VALUES (2,'Hola',toTimestamp(now()));
cqlsh:examples> select id,message,message_by,ttl(message) as m_ttl, ttl(message_by) as mb_ttl, ttl(time) as t_ttl from examples.chat;id | message | message_by | m_ttl | mb_ttl | t_ttl
----+----------+------------+-------+--------+-------
1 | Hi There | John | null | null | null
2 | Hola | Mary | null | 35 | null
It’s time to summarize the points we noticed and few more.
- TTL is used for expiring records or data.
- Cassandra TTL are applied at column level.
- If a record is inserted with TTL, separate TTL will be applied on all the columns mentioned in the insert statement.
- Whenever a TTL is expired in Cassandra for a column it checks for all the non primary column values in the record, if all values are null, record gets deleted.
- Even after TTL is expired in Cassandra and at later point all the non primary column value turns null, the record gets deleted.
- TTL is not applicable on primary columns in Cassandra.
- When columns with ongoing TTL are updated without TTL values, the existing TTL gets removed from those columns. In other words, One cannot update the value of a column with TTL without altering the TTL, either by new TTL or by removing it.
- TTL is not applicable for Cassandra column type counter.
As closing words I would like to put that, In Cassandra, TTL should be used with caution otherwise it can lead to erratic behavior. The way common examples are provided, it looks like in Cassandra TTL is applied at row level even though it clearly mentions that it is actually column level.
Compared to MongoDB which have predefined TTL value at collection level, Cassandra provides more granularity with TTL as different records and column can have different TTL. One key difference however is that Mongo compares the TTL with the field values whereas Cassandra calculates TTL from insert or update time.
Above examples were run at Cassandra version 3.11