News Feeds

No results for 'undefined'
Powered by Algolia

Cassandra DB structure suggestion (two tables vs one)


Author: Yossi

Originally Sourced from: https://stackoverflow.com/questions/63901597/cassandra-db-structure-suggestion-two-tables-vs-one

I am new to Cassandra's DB, and I'm creating a database structure and I wonder whether my pick is optimal for my requirements.

I need to get information on unique users, and each unique user will have multiple page views.

My two options are de-normalizing my data into one big table, or create two different tables.

My most used queries would be:

  1. Searching for all of the pages with a certain cookie_id.
  2. Searching for all of the pages with a certain cookie_id and a client_id. If a cookie doesn't have a client, it would be marked client_id=0 (that would be most of the data).
  3. Find the first cookie_id with extra data (for example data_type_1 + data_type_2).

My two suggested schemas are these: Two tables - One for users and one for visited pages. This would allow me to save a new user on a different table, and keep every recurring request in another table.

CREATE TABLE user_tracker.pages (
    cookie_id uuid,
    created_uuid timeuuid,
    data_type_3 text,
    data_type_4 text,
    PRIMARY KEY (cookie_id, created_uuid)
);

CREATE TABLE user_tracker.users (
    cookie_id uuid,
    client_id id,
    data_type_1 text,
    data_type_2 text,
    created_uuid timeuuid,
    PRIMARY KEY (cookie_id, client_id, created_uuid)
);

This data is normalized as I don't enter the user's data for every request.

One table - For all of the data saved, and the first request as the key. First request would have data_type_1 and data_type_2. I could also save "data_type_1" and "data_type_2" as a hashmap, as they represent multiple columns and they will always be in a single data set (is it considered to be better?). The column "first" would be a secondary index.

CREATE TABLE user_tracker.users_pages (
    cookie_id uuid,
    client_id id,
    data_type_1 text,
    data_type_2 text,
    data_type_3 text,
    data_type_4 text,
    first boolean,
    created_uuid timeuuid,
    PRIMARY KEY (cookie_id, client_id, created_uuid)
);
  • In reality we have more columns than 4, this was written briefly.

As far as I understand Cassandra's best practices, I'm leaning into option #2, but please do enlighten me :-)