In our previous post, we looked at how data is partitioned in a Cassandra cluster using a partition key. In this blog we will look at how we can define a partition key, and what is the significance of clustering keys
1.0 Introduction to Primary Key, Partition Key and Clustering keys
Primary keys in Cassandra are exactly same as what they are in other databases – it identifies a unique row in the table.
BUT in Cassandra a primary key has addition role
- Specify the
partition keys
- Specify the
clustering keys
PRIMARY KEY
= PARTITION KEY + CLUSTERING KEYS
= Unique in the table
1.1 Partition Keys
Partition Keys
are the first column specified in the Primary Key
- They are used to partition the data in the Cassandra cluster
- They can be either single column or composed of multiple columns.
- Restrictions in Query
- All columns specified as partition keys are required in the select queries
- We cannot use any other operator other than
=
for the columns specified aspartition key
1.2 Clustering Keys
Clustering Keys
are the columsn in the primary key that comes after the `Partition keys’
- They are used to sort the data within a parition
- Any
EQ
(=
) or range operators like>
,<=
, etc can be used on the columns specified asClustering keys
- All conditions before a
non-EQ
(< , >
) condition must be anEQ
condition.
2.0 Some examples of Primary Keys, and restrictions in Select queries.
Suppose we have a movies table with following columns
column | datatype |
id | uuid |
year | int |
title | text |
rating | decimal |
director | text |
Let’s see some impact of creating different type of PRIMARY keys for this table
Example 1: Only single partition key
PRIMARY KEY ( id )
id
is the partition key- No clustering columns specified in the Primary Key.
- Results in 1 row partition in the cluster, as
id
is unique

Now let’s see some possible queries on the table defined with above primary key

Example 2: Partition key and clustering keys
PRIMARY KEY( director, year, rating, id)
OR
PRIMARY KEY( (director), year, rating, id)
- The first key by default becomes the
Partition key
, and remaining keys are theClustering keys
. director
is thePartition Key
year, rating, id
– are theClustering keys
.- Data within a partition is sorted by year, rating and id (in order)
id
column as been added as a clustering key, to add uniqueness to the PRIMARY KEY – as there can be more than 1 movie with samedirector
,year
and samerating
.

Now let’s see some possible queries on the table defined with above primary key
- Only
EQ
operator (=
) can be specified for partition key. - Both
EQ
(=
) ornon-EQ
(< , >
) operators can be applied on theClustering columns
. - Only
EQ
conditions must come before anynon-EQ
conditions.

Example 3: Composite Partition Key
PRIMARY KEY ( (year, director), rating, id)
- Composite partition key of
year
anddirector
- Clustering keys are
rating
andid

In above example, each year
and director
combination is one partition (in this example almost all partition have only 1 row atmost)
Now let’s see some possible queries on the table defined with above primary key

3.0 Change sort order using With Clustering Order
We have seen that all data within a partition is sorted in ASCENDING
order, as specified by the clustering columns.
What if we wanted the data to be sorted in specific order instead of ASCENDING order always?
Cassandra provides us a way to overrride the sorting order by specifying With Clustering Order
, when creating the table.
Lets see some examples for this primary key given below
PRIMARY KEY( (director), year, rating, id)
Example 1: Sort by Year Descending and Rating Ascending
We can do so using following create table statement
CREATE TABLE movies (
id uuid,
year int,
title text,
rating decimal,
director text,
PRIMARY KEY( (director), year, rating, id)
) WITH CLUSTERING ORDER BY ( year desc);
Since we want to change sort order for first clustering key (year
), we are not required to specify the following clustering keys if their sort order is not changing.
All three options below are same.
WITH CLUSTERING ORDER BY ( year desc)
OR
WITH CLUSTERING ORDER BY ( year desc, rating asc)
OR
WITH CLUSTERING ORDER BY ( year desc, rating asc, id asc)
Example 2: Sort by Year Ascending, but Rating Descending
We can do so using following create table statement
CREATE TABLE movies (
id uuid,
year int,
title text,
rating decimal,
director text,
PRIMARY KEY( (director), year, rating, id)
) WITH CLUSTERING ORDER BY ( year asc, rating desc);
Since we want to change sort order for second clustering key (rating
), we must specify sort order for all preceding keys (e.g. year
)
Below clustering order WILL NOT work
WITH CLUSTERING ORDER BY ( rating desc)
4.0 Override query restrictions using Allow Filtering
We have seen many examples where queries will not work because
- They did not specify the partition key
- OR Partition Key used a non-EQ condition
- OR Preceding clustering keys have used a non-EQ condition
or similar restrictions on queries because of the way we have specified the PRIMARY KEY
Cassandra provides us a way to override these restrictions, and query data using ALLOW FILTERING
Lets take the example of following PRIMARY KEY
PRIMARY KEY( (director), year, rating, id)
ALL Following queries which were not working earlier will now work if we specify ALLOW FILTERING
in the query.
select * from movies where year = 2014 ALLOW FILTERING;
select * from movies where director = 'Russo' and rating > 8.0 ALLOW FILTERING;
This feature should be used with lot of caution, as it causes Cassandra to scan all partitions.
Ideally this should not be used or if possible then only use on small data sets.