I am new to Cassandra's DB, and I'm creating a database structure and I wonder whether my pick is optimal for my requirements.
I need to get information on unique users, and each unique user will have multiple page views.
My two options are de-normalizing my data into one big table, or create two different tables.
My most used queries would be:
- Searching for all of the pages with a certain cookie_id.
- Searching for all of the pages with a certain cookie_id and a client_id. If a cookie doesn't have a client, it would be marked client_id=0 (that would be most of the data).
- Find the first cookie_id with extra data (for example data_type_1 + data_type_2).
My two suggested schemas are these: Two tables - One for users and one for visited pages. This would allow me to save a new user on a different table, and keep every recurring request in another table.
CREATE TABLE user_tracker.pages ( cookie_id uuid, created_uuid timeuuid, data_type_3 text, data_type_4 text, PRIMARY KEY (cookie_id, created_uuid) ); CREATE TABLE user_tracker.users ( cookie_id uuid, client_id id, data_type_1 text, data_type_2 text, created_uuid timeuuid, PRIMARY KEY (cookie_id, client_id, created_uuid) );
This data is normalized as I don't enter the user's data for every request.
One table - For all of the data saved, and the first request as the key. First request would have data_type_1 and data_type_2. I could also save "data_type_1" and "data_type_2" as a hashmap, as they represent multiple columns and they will always be in a single data set (is it considered to be better?). The column "first" would be a secondary index.
CREATE TABLE user_tracker.users_pages ( cookie_id uuid, client_id id, data_type_1 text, data_type_2 text, data_type_3 text, data_type_4 text, first boolean, created_uuid timeuuid, PRIMARY KEY (cookie_id, client_id, created_uuid) );
- In reality we have more columns than 4, this was written briefly.
As far as I understand Cassandra's best practices, I'm leaning into option #2, but please do enlighten me :-)