{"componentChunkName":"component---src-templates-article-single-page-js","path":"/post/cassandra-compaction-throughput-performance-explained","result":{"pageContext":{"obj_id":"bac4a4fc-4db8-5d99-be5e-ba971e77a129","node":{"content":"<p>This is the second post in my series on improving node density and lowering costs with Apache Cassandra. In the <a href=\"https://rustyrazorblade.com/post/2025/03-streaming/\">previous post</a>, I examined how streaming performance impacts node density and operational costs. In this post, I’ll focus on compaction throughput, and a recent optimization in Cassandra 5.0.4 that significantly improves it, <a href=\"https://issues.apache.org/jira/browse/CASSANDRA-15452\" target=\"_blank\">CASSANDRA-15452</a>.</p><p>This post assumes some familiarity with Apache Cassandra storage engine fundamentals. The documentation has a nice <a href=\"https://cassandra.apache.org/doc/stable/cassandra/architecture/storage_engine.html\" target=\"_blank\">section covering the storage engine</a> if you’d like to brush up before reading this post.</p><h2 id=\"the-compaction-bottleneck\">The Compaction Bottleneck</h2><p>Compaction in Cassandra is the process of merging multiple SSTables and writing out new ones, discarding tombstones, resolving overwrites, and generally organizing data for efficient reads. It’s an I/O intensive background operation that directly competes with foreground operations for system resources. In a later post I’ll look at how compaction strategies impact node density, but for now, I’ll just focus on throughput.</p><h2 id=\"why-compaction-throughput-matters-for-node-density\">Why Compaction Throughput Matters for Node Density</h2><p>As we continue to increase the amount of data we store per node, compaction performance becomes increasingly important. 
It affects:</p><ul><li>How quickly the system can reclaim disk space</li>\n<li>Whether the cluster can keep up with incoming writes</li>\n<li>Read latency, since fewer SSTables need to be consulted per read</li>\n<li>How fast nodes are able to join a new cluster</li>\n</ul><p>Simply put: as your data volume and write throughput increase, compaction throughput must as well. If it doesn’t, you’ll hit a performance wall that effectively caps your maximum practical node density.</p><p>Despite the significant improvements to compaction throughput over the years, there are some circumstances where compaction performance is inadequate. Let’s take a look at the reason why, then dive into what can be done about it.</p><p>When doing any performance evaluation, it’s important to understand how to measure where your time is spent. A lot of folks make incorrect assumptions, and then waste a lot of time trying to optimize something that doesn’t matter. I’ve written several posts about how useful profiling with the <a href=\"https://rustyrazorblade.com/post/2023/2023-11-07-async-profiler\">async-profiler</a> can be from an application perspective. For looking at the OS and hardware, the eBPF-based toolkit <a href=\"https://rustyrazorblade.com/post/2023-11-14-bcc-tools\">bcc-tools</a> can help you identify process bottlenecks. I’ve used these tools extensively over the years, and in this post I’ll show how they’ve helped identify two major performance bottlenecks in compaction. My <a href=\"https://github.com/rustyrazorblade/easy-cass-lab\" target=\"_blank\">easy-cass-lab</a> software includes all these tools, as well as integration with <a href=\"https://axonops.com/\" target=\"_blank\">AxonOps</a> for Cassandra dashboards and operational tooling.</p><h2 id=\"being-10x-smarter-with-our-disk-access\">Being 10x Smarter With Our Disk Access</h2><p>When investigating compaction behavior, I discovered a major inefficiency in how Cassandra was accessing disk. 
The problem was especially severe in cloud environments with disaggregated storage like AWS EBS, where IOPS (Input/Output Operations Per Second) are both limited and expensive when used improperly.</p><p>When Cassandra would read in data during compaction, it would read individual compressed chunks off disk, one small read at a time. Using bcc-tools, we can monitor every filesystem operation. Here I’m using <code>xfsslower</code> to record every read operation on the filesystem (original headers back in for clarity):</p><div class=\"highlight\"><pre class=\"language-shell\" data-lang=\"shell\">$ sudo /usr/share/bcc/tools/xfsslower 0 -p 26988 | awk '$4 == \"R\" { print $0 }'\nTracing XFS operations\nTIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME\n22:27:38 CompactionExec 26988  R 4096    0           0.01 nb-7-big-Statistics.db\n22:27:38 CompactionExec 26988  R 4096    4           0.00 nb-7-big-Statistics.db\n22:27:38 CompactionExec 26988  R 2062    8           0.00 nb-7-big-Statistics.db\n22:27:38 CompactionExec 26988  R 14907   0           0.01 nb-7-big-Data.db\n22:27:38 CompactionExec 26988  R 14924   14          0.01 nb-7-big-Data.db\n22:27:38 CompactionExec 26988  R 14896   29          0.01 nb-7-big-Data.db\n22:27:38 CompactionExec 26988  R 14844   43          0.01 nb-7-big-Data.db\n22:27:38 CompactionExec 26988  R 14923   58          0.01 nb-7-big-Data.db\n22:27:38 CompactionExec 26988  R 14931   72          0.01 nb-7-big-Data.db\n22:27:38 CompactionExec 26988  R 14905   87          0.01 nb-7-big-Data.db\n22:27:38 CompactionExec 26988  R 14891   101         0.01 nb-7-big-Data.db\n22:27:38 CompactionExec 26988  R 14919   116         0.01 nb-7-big-Data.db\n22:27:38 CompactionExec 26988  R 14965   130         0.01 nb-7-big-Data.db\n22:27:38 CompactionExec 26988  R 14918   145         0.01 nb-7-big-Data.db\n22:27:38 CompactionExec 26988  R 14930   160         0.01 nb-7-big-Data.db\n</pre></div><p>The above is showing we’re reading about 14KB at a 
time. That’s the size of the compressed chunk. This pattern is terrible for performance on cloud storage systems like EBS, where:</p><ol><li>Each read operation, no matter how small, counts against your provisioned IOPS</li>\n<li>Small reads waste IOPS quota while delivering minimal data</li>\n<li>You pay for IOPS allocation whether you use it efficiently or not</li>\n</ol><p>Looking at a wall clock performance profile, we can see compaction is spending a LOT of time waiting on disk, in the really wide column with the <code>pread</code> call at the top:</p><p><img src=\"https://rustyrazorblade.com/images/2025/wall-clock-profile-compaction.png\" alt=\"wall-clock-profile-compaction.png\" referrerpolicy=\"no-referrer\" /></p><p>Readahead is a disk optimization strategy where the operating system reads a larger block of data than was requested into memory. The objective is to reduce latency and improve performance for sequential read operations. Unfortunately, when you don’t need the data it’s reading, it can be the source of major performance problems. In my experience, readahead is one of the worst culprits in the world of Cassandra performance. It’s especially terrible for lightweight transactions and counters, where we perform a read before write.</p><p>My advice to Cassandra operators is to reduce readahead to 4KB to avoid unnecessary read amplification on the read path.</p><p>Readahead does have one place, however, where it can benefit performance. You <em>may</em> have already guessed that it’s compaction. Let’s take a step back and look at how the size of our reads impacts our throughput in a simple benchmark. 
Larger reads, initiated either from readahead or the user, should deliver improved throughput, especially when we’re dealing with a quota on our IOPS (EBS), our drives have higher latency (SAN), or both.</p><h2 id=\"benchmarking\">Benchmarking</h2><p>I ran benchmark tests with sequential <code>fio</code> workloads using different request sizes on a 3K IOPS GP3 EBS volume. Here’s the configuration used:</p><div class=\"highlight\"><pre class=\"language-text\" data-lang=\"text\">[global]\nrw=read\ndirectory=data\ndirect=1\ntime_based=1\nfile_service_type=normal\nstonewall\nsize=100M\nnumjobs=12\ngroup_reporting\n[bs4]\nstonewall\nruntime=60s\nblocksize=4k\n[bs8]\nstonewall\nruntime=60s\nblocksize=8k\n[bs16]\nstonewall\nruntime=60s\nblocksize=16k\n[bs32]\nstonewall\nruntime=60s\nblocksize=32k\n[bs64]\nstonewall\nruntime=60s\nblocksize=64k\n[bs128]\nstonewall\nblocksize=128k\nruntime=60s\n[bs256]\nstonewall\nruntime=60s\nblocksize=256k\n</pre></div><p>When reviewing the results, the benefits of using larger request sizes were evident:</p><table><thead><tr><th>Request Size</th>\n<th>IOPS</th>\n<th>Throughput</th>\n</tr></thead><tbody><tr><td>4K</td>\n<td>3049</td>\n<td>11.9 MB/s</td>\n</tr><tr><td>8K</td>\n<td>3012</td>\n<td>23 MB/s</td>\n</tr><tr><td>16K</td>\n<td>3013</td>\n<td>47 MB/s</td>\n</tr><tr><td>32K</td>\n<td>3013</td>\n<td>94 MB/s</td>\n</tr><tr><td>64K</td>\n<td>1938</td>\n<td>121 MB/s</td>\n</tr><tr><td>128K</td>\n<td>957</td>\n<td>120 MB/s</td>\n</tr><tr><td>256K</td>\n<td>478</td>\n<td>120 MB/s</td>\n</tr></tbody></table><p>The data shows that using 256KB reads instead of 16KB reads would deliver roughly 2.5x the throughput (120MB/s versus 47MB/s) while using only 1/6th of the provisioned IOPS. That’s a massive efficiency improvement. Rather than chewing through all our IOPS to deliver a paltry 47MB/s of throughput, we’re only using about 500 for 120MB/s. 
That means if we can see these gains in the database, we’ll be able to compact faster, put more data on each node, and lower our total cost.</p><h2 id=\"the-solution-internally-buffering-sequential-reads\">The Solution: Internally Buffering Sequential Reads</h2><p>In <a href=\"https://issues.apache.org/jira/browse/CASSANDRA-15452\" target=\"_blank\">CASSANDRA-15452</a>, I worked with my fellow Cassandra committer Jordan West to implement a solution: an efficient, internal read-ahead buffer for bulk reading operations. Here’s how it works:</p><ol><li>Instead of reading tiny chunks, we use a 256KB off-heap buffer</li>\n<li>Each read operation pulls in a full 256KB of data at once</li>\n<li>Compressed chunks are extracted from this buffer as needed</li>\n<li>The buffer is refilled only when necessary</li>\n</ol><p>This approach maximizes IOPS efficiency by using larger reads during compaction (as well as repair and range reads) that deliver more data per operation. For cloud environments, it’s a game-changer that directly aligns with storage provider recommendations. AWS EBS, for instance, <a href=\"https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-io-characteristics.html#ebs-io-iops\" target=\"_blank\">considers any I/O operation up to 256KB as a single operation</a>, so by using the largest possible size we should get optimal performance.</p><h2 id=\"real-world-impact-a-major-improvement-in-compaction-throughput\">Real-World Impact: A Major Improvement in Compaction Throughput</h2><p>When Jordan and I tested the implementation using <a href=\"https://github.com/rustyrazorblade/easy-cass-lab\" target=\"_blank\">easy-cass-lab</a> on EBS, the results were nothing short of spectacular. The <code>10.0.2.171</code> node is running our patched version, the other two nodes are running an unpatched release. 
The graphs clearly show a 2-3x improvement to throughput and a 3x reduction in IOPS.</p><p><img src=\"https://rustyrazorblade.com/images/2025/15452-bytes-read.png\" alt=\"15452-bytes-read.png\" referrerpolicy=\"no-referrer\" /></p><p><img src=\"https://rustyrazorblade.com/images/2025/15452-compaction.png\" alt=\"Compaction Throughput Comparison\" referrerpolicy=\"no-referrer\" /></p><p>You can see the results in the flamegraph as well. The calls to <code>pread</code> take up significantly less time.</p><p><img src=\"https://rustyrazorblade.com/images/2025/wall-clock-profile-compaction-after-15452.png\" alt=\"wall-clock-profile-compaction-after-15452.png\" referrerpolicy=\"no-referrer\" /></p><p>We can use <code>xfsslower</code> from <code>bcc-tools</code> again to watch the filesystem access:</p><div class=\"highlight\"><pre class=\"language-shell\" data-lang=\"shell\">$ sudo /usr/share/bcc/tools/xfsslower 0 -p $(cassandra-pid) | awk '$4 == \"R\" { print $0 }'\nTIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME\n14:40:29 CompactionExec 1782   R 262144  256         0.07 nb-4-big-Data.db\n14:40:29 CompactionExec 1782   R 262144  512         0.06 nb-4-big-Data.db\n14:40:29 CompactionExec 1782   R 262144  768         0.06 nb-4-big-Data.db\n14:40:29 CompactionExec 1782   R 262144  1024        0.07 nb-4-big-Data.db\n14:40:29 CompactionExec 1782   R 262144  1280        0.07 nb-4-big-Data.db\n14:40:29 CompactionExec 1782   R 262144  1536        0.07 nb-4-big-Data.db\n14:40:29 CompactionExec 1782   R 241123  1792        0.07 nb-4-big-Data.db\n</pre></div><p>This is a lot better: now we’re fetching 256KB at a time using way fewer requests.</p><p>The EBS test configuration used a GP3 volume with 3K IOPS and 256MB/s throughput. With the existing code, compaction was bottlenecked by IOPS, peaking at exactly 3K IOPS but achieving only about 51MB/s throughput. 
With our optimization, the same operation used only ~500 IOPS to achieve around 106MB/s, a more than 2x improvement in throughput with 1/3 of the IOPS.</p><p>In our most aggressive testing, <strong>we actually hit the EBS throughput limit rather than the IOPS limit</strong>. That’s a significant transformation in Cassandra’s resource utilization profile.</p><p>The patch also has the benefit of applying to anti-compaction, repair, and range reads. We can see a significant reduction in range reads, aka table scans:</p><p><img src=\"https://rustyrazorblade.com/images/2025/15452-range-reads.png\" alt=\"15452-range-reads.png\" referrerpolicy=\"no-referrer\" /></p><p>If you’re running Spark jobs using the Cassandra connector, you should see an improvement in performance, and your repair times should decrease.</p><h2 id=\"whats-next--can-we-do-more\">What’s next? Can we do more?</h2><p>Yes, absolutely! There are several more IO improvements that will help. I’ll cover them here very quickly, and if there’s interest I’ll write about them in detail in a future post.</p><h3 id=\"avoid-reading-the-statistics\">Avoid Reading the Statistics</h3><p>When compacting, we read data out of the Statistics.db file before reading the data itself. This is completely unnecessary, as it’s stats about the data we’re about to read. Skipping this can reduce IO even further. 
Looking at a compaction’s IO activity, I see about 30% of the filesystem access is reading from <code>Statistics.db</code>:</p><div class=\"highlight\"><pre class=\"language-text\" data-lang=\"text\">14:40:29 CompactionExec 1782   R 4096    0           0.00 nb-3-big-Statistics.db\n14:40:29 CompactionExec 1782   R 701     4           0.00 nb-3-big-Statistics.db\n14:40:29 CompactionExec 1782   R 4096    0           0.00 nb-4-big-Statistics.db\n14:40:29 CompactionExec 1782   R 4096    4           0.00 nb-4-big-Statistics.db\n14:40:29 CompactionExec 1782   R 1962    8           0.00 nb-4-big-Statistics.db\n14:40:29 CompactionExec 1782   R 262144  0           0.07 nb-4-big-Data.db\n14:40:29 CompactionExec 1782   R 2115    0           0.01 nb-3-big-Data.db\n14:40:29 CompactionExec 1782   R 262144  256         0.07 nb-4-big-Data.db\n14:40:29 CompactionExec 1782   R 262144  512         0.06 nb-4-big-Data.db\n14:40:29 CompactionExec 1782   R 262144  768         0.06 nb-4-big-Data.db\n14:40:29 CompactionExec 1782   R 262144  1024        0.07 nb-4-big-Data.db\n14:40:29 CompactionExec 1782   R 262144  1280        0.07 nb-4-big-Data.db\n14:40:29 CompactionExec 1782   R 262144  1536        0.07 nb-4-big-Data.db\n14:40:29 CompactionExec 1782   R 241123  1792        0.07 nb-4-big-Data.db\n</pre></div><p>This has already been fixed in <code>trunk</code> by Branimir Lambov in <a href=\"https://issues.apache.org/jira/browse/CASSANDRA-20092\" target=\"_blank\">CASSANDRA-20092</a> and is being backported to 5.0 by Jordan.</p><h3 id=\"direct-io-for-compaction\">Direct I/O for Compaction</h3><p>Let’s talk more about page cache. Since we go through the Linux page cache when doing reads, we want to make sure it’s working optimally. Page cache lets us avoid going to disk! Unfortunately we also use it when reading for compaction. This is a problem because we’re pulling data into the page cache that we plan on deleting. To make room for the new data, other data will be evicted. 
If we compact 10GB of data, we’re pushing out a lot of valuable data from the page cache, meaning it needs to be fetched back into memory later on. Using <a href=\"https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/5/html/global_file_system/s1-manage-direct-io\" target=\"_blank\">Direct I/O</a> we can bypass the page cache entirely, which will prevent data from being evicted. This can be a huge help in latency-sensitive systems or systems where IOPS are limited, like EBS.</p><p>I’ve filed <a href=\"https://issues.apache.org/jira/browse/CASSANDRA-19987\" target=\"_blank\">CASSANDRA-19987</a> to look at this.</p><h3 id=\"non-blocking-compression\">Non Blocking Compression</h3><p>Next, compression. When we’re writing to disk, we fill a buffer, sized by the <code>chunk_length_in_kb</code> table setting, compress it, and write it to disk. The compression here is a blocking call, which means we can spend a lot of time waiting on compression to finish, when we could be reading and merging the next chunk in parallel. This can show up as a performance bottleneck, so I’ve filed <a href=\"https://issues.apache.org/jira/browse/CASSANDRA-20085\" target=\"_blank\">CASSANDRA-20085</a> to look into it.</p><h3 id=\"better-memory-management\">Better Memory Management</h3><p>When a system is not bottlenecked on disk I/O, such as when using NVMe, the main issue we’ll run into is our heap allocation rate. I’ll go into details in a future post, but for now, it’s enough to know that the more memory we allocate, the worse our performance. Being smart about memory allocations can make a big difference in overall time spent, as allocations aren’t free. It also reduces both the frequency and duration of Garbage Collection. Big wins all around.</p><p>I recently profiled an instance where the row size was about 2KB (not out of the ordinary) and found that a single call was accounting for roughly 50% of memory allocated. 
Fixing this <em>one</em> thing has the potential to deliver a massive performance improvement, especially in workloads where we have either lots of fields, or large fields like serialized blobs.</p><p>Reaching again for async-profiler, this time we run it with <code>-e alloc</code> to track allocations and <code>--reverse</code> to reverse the stacks. I do this because the same underlying call comes from the read path and compaction, and I want to see the time in aggregate.</p><p><img src=\"https://rustyrazorblade.com/images/2025/allocation-profile-compaction.png\" alt=\"allocation-profile-compaction.png\" referrerpolicy=\"no-referrer\" /></p><p>Addressing this single allocation won’t just deliver faster compaction, but will reduce pressure on the heap, which in turn reduces GC overhead. As part of this series I’ll also be covering GC, as a lot’s changed since I wrote about it last.</p><p>I’ve filed <a href=\"https://issues.apache.org/jira/browse/CASSANDRA-20428\" target=\"_blank\">CASSANDRA-20428</a> and there’s already a fair bit of discussion about different approaches to solving the problem.</p><h2 id=\"conclusion\">Conclusion</h2><p>Maximizing compaction throughput is critical for achieving higher node density with Apache Cassandra. 
The improvements in <a href=\"https://issues.apache.org/jira/browse/CASSANDRA-15452\" target=\"_blank\">CASSANDRA-15452</a> have removed one of the primary bottlenecks that previously limited practical node size in a lot of clusters.</p><p>By upgrading to Cassandra 5.0.4 (or later) you can:</p><ol><li>Dramatically improve compaction throughput</li>\n<li>Reduce IOPS consumption significantly</li>\n<li>Improve overall system stability during write-heavy workloads</li>\n<li>Increase the maximum practical data density per node</li>\n<li>Significantly reduce your cloud storage costs</li>\n</ol><p>This improvement, combined with the streaming optimizations discussed in the <a href=\"https://rustyrazorblade.com/post/2025/03-streaming/\">previous post</a>, creates a multiplier effect on your ability to increase node density. Each optimization removes a bottleneck, allowing you to push your hardware further and achieve more with less.</p><p>In my next post, I’ll be discussing how and why compaction strategies affect node density. Picking the right strategy can have a significant impact on your cluster’s performance and cost efficiency. Make sure you sign up for my <a href=\"https://rustyrazorblade.com/mailing-list/\">mailing list</a> if you’re interested in getting notified when it’s released!</p>If you found this post helpful, please consider sharing to your network. I'm also available to help you be successful with your distributed systems! 
Please <a href=\"mailto:info@rustyrazorblade.com?subject=Consulting%20Services%20Inquiry\">reach out</a> if you're interested in working with me, and I'll be happy to schedule a free one-hour consultation.","id":"bac4a4fc-4db8-5d99-be5e-ba971e77a129","title":"Cassandra Compaction Throughput Performance Explained","origin_url":"https://rustyrazorblade.com/post/2025/04-compaction-throughput/","url":"https://rustyrazorblade.com/post/2025/04-compaction-throughput/","wallabag_created_at":"2025-04-24T12:03:02+00:00","published_at":"2025-04-16T00:00:00+00:00","published_by":"['']","reading_time":14,"domain_name":"rustyrazorblade.com","preview_picture":"https://rustyrazorblade.com/images/2025/wall-clock-profile-compaction.png","tags":["cassandra","performance"],"description":"This is the second post in my series on improving node density and lowering costs with Apache Cassandra. In the previous post, I examined how streaming performance impacts node density and operational..."},"relatedArticles":[{"content":"<div class=\"top-section\"><div class=\"container-fluid main-content-area\"><div id=\"scroll-status-bar\"><div id=\"scroll-status-percent\"></div><div class=\"blog-post-hero container-fluid\"><div class=\"blog-post-hero-bg\"><div class=\"blog-post-hero-bg-img\"></div><div class=\"container\"><img src=\"https://sematext.com/wp-content/uploads/2021/10/critical-cassandra-metrics-to-monitor.jpg\" id=\"the-featured-image\" width=\"1140\" height=\"626\" alt=\"image\" /></div></div><div class=\"container-fluid container-single-blog-post\"><div class=\"container\"><article class=\"single-article-blog-post\" id=\"post-53726\"><main><section><div id=\"the-content\"><p>Apache Cassandra is a distributed database known for its high availability, fault tolerance, and near-linear scaling. It was initially developed by Facebook and is now a widely used open-source system used by the largest tech companies in the world. 
There are numerous reasons behind its popularity, including no single point of failure, exceptional horizontal scaling, and a data layout that is a good fit for time-series data.</p><p>However, despite these perks, like any other system, Cassandra is prone to performance issues. This makes monitoring imperative. And it all starts with knowing what to measure. In this article, we will explain the <strong>key Cassandra performance metrics</strong> you should monitor to make sure everything is up and running at all times.</p><h2>What Is Cassandra and How Does It Work?</h2><p>Let’s keep it short – Apache Cassandra is a distributed NoSQL database designed to provide fault-tolerant and highly available architecture with performance in mind.</p><p>As a distributed system, Cassandra is built out of nodes. A <strong>node</strong> is a single instance of Apache Cassandra that can operate on its own. Multiple nodes can form a <strong>cluster</strong> – a distributed system holding common data and responding to query requests. Cassandra works in a master-less architecture where each node communicates in a <strong>peer-to-peer</strong> fashion using a protocol known as <strong>Gossip</strong>. The <strong>gossip</strong> protocol is designed so that each node is informed about the state of all other nodes and a single node performs <strong>gossip</strong> communication with up to three other nodes every second.</p><p>The <strong>cluster</strong> can be divided into <strong>data centers</strong> and <strong>racks</strong>, just like real-life data centers are divided. 
In Cassandra terminology, a <strong>data center</strong> is designed to hold multiple <strong>racks</strong> and a single <strong>rack</strong> holds a complete replica of the data.</p><p><img data-lazyloaded=\"1\" src=\"https://sematext.com/wp-content/uploads/2021/10/cassandra-metrics-3.png\" alt=\"image\" /></p><p><em>Cassandra Cluster Logical Overview</em></p><p>When it comes to the data, Cassandra stores it in tables that are organized the same way as in any other database – in rows and columns. A single table is called a column family. The tables themselves are grouped into keyspaces, where a keyspace usually holds logically similar data – for example, from a business perspective. The keyspace is also used for data replication, and the replication itself is configured on a keyspace level.</p><p>Getting back to the tables: each table defines a primary key that is built of the partition key and the clustering columns. Cassandra uses the partition key to index the data. All data that share a common partition key make a single data partition – a basic unit for data retrieval and storage. The clustering columns are optional.</p><p>Needless to say, Apache Cassandra is a complicated, distributed system and it’s not uncommon for users to encounter operational problems and difficulties. Everything breaks eventually, from the low-level bare metal components, up to the high-level software. It is not unusual for users to deal with network issues and CPU utilization problems, especially on very large clusters. Cassandra is written in Java and uses both off-heap and heap memory, which means that as the volume of data grows, you may hit issues with the garbage collector. Finally, because of the amount of data that you will process, you may need to deal with the hard disk space and performance of your I/O subsystem. 
All of these can be avoided by keeping an eye on the relevant metrics with the help of a good <a href=\"https://sematext.com/integrations/cassandra\">Cassandra monitoring tool</a>.</p><h2>How Is Cassandra Performance Measured?</h2><p>Most complex distributed systems provide a set of metrics that you should monitor and alert on to ensure that your system is healthy and working well. Apache Cassandra is no different. It provides a plethora of performance metrics which we can divide into three categories:</p><ul><li>Dedicated Apache Cassandra metrics that describe how the system and its parts perform.</li><li><a href=\"https://sematext.com/blog/jvm-metrics/\">Java Virtual Machine metrics</a> that tell you about the execution environment on which Apache Cassandra is running.</li><li><a href=\"https://sematext.com/server-monitoring/\">Operating system metrics</a> that describe the bare metal servers, virtual machines, or containers, depending on the environment that you are using.</li></ul><h3>Dedicated Cassandra Performance Metrics</h3><p>When monitoring Apache Cassandra clusters, the first place to look is the metrics that the distributed data store exposes via the JMX interface. There are many Cassandra performance metrics exposed via JMX and having visibility into most of them is a good idea. You never know what can be useful when troubleshooting.</p><h4>Nodes</h4><p>One of the most important Cassandra metrics is the number of nodes that are currently available and connected to form a cluster. 
The ability to store the data and respond to queries is directly related to the availability of nodes.</p><h4>Compaction Metrics</h4><p><a href=\"https://cassandra.apache.org/doc/latest/operating/compaction/index.html\">Compaction</a> is the operation of merging multiple smaller instances of <a href=\"https://cassandra.apache.org/doc/latest/architecture/storage_engine.html#sstables\">SSTable</a> into one bigger SSTable that contains all the data from the smaller tables. Because of that, it can be very expensive and resource-consuming. Having visibility into compaction performance is critical for long-term observability – the <a href=\"https://sematext.com/blog/cassandra-monitoring-tools/\">Cassandra monitoring tool</a> of your choice needs to provide the number of compactions and the number of compacted bytes.</p><p>During compaction, until the process ends, the total disk space used may be up to double what it was before the compaction started. Because of that, you should consider leaving about 50% of space free to account for compactions and, of course, set up appropriate alerts to inform you when the amount of free disk space is close to a level where compaction could fail.</p><h4>Read and Write Performance Metrics</h4><p>The next set of metrics is dedicated to clients and the read and write side of the operations. You should measure the number of reads happening in a given period, the request latency, and the number of timeouts and failures. Your Cassandra monitoring tool should provide the top-level view and allow for slicing and dicing through the data, showing you the aggregated view, per node view, per keyspace view, and per table view. The same goes for write operations.</p><p>You should see the number of write requests happening and write latency. Local writes and reads may also be important when troubleshooting.</p><h4>Table Metrics</h4><p>Table metrics are also essential. 
The ones you should pay close attention to are partition size, tombstone scans, and the number of SSTables per read.</p><h5>Partition Size</h5><p>Partition size is crucial for cluster performance. Cassandra uses it as a unit of data storage, replication, and retrieval, thus directly dictating the performance of your Cassandra tables. The ideal partition size varies but is usually below 100MB and not less than 10 – 20MB.</p><h5>Tombstones</h5><p>Cassandra produces <a href=\"https://cassandra.apache.org/doc/latest/operating/compaction.html#tombstones-and-garbage-collection-gc-grace\">tombstones</a> when you delete the data. They are markers of the deleted data. Data in Cassandra is immutable by design, and because of that, it can only be physically removed from the SSTable during compactions. Because of that, you should keep an eye on how they affect your disk space.</p><h5>SSTables Per Read</h5><p>Similar to tombstones, the number of SSTables per read is related to the immutability of the data in Cassandra. A single table can be built of multiple SSTables, which are written sequentially. A single read operation can result in reading multiple SSTables to retrieve the relevant data. The more SSTables Cassandra needs to read to return the data, the more resources are required to complete the read operation. This is why you should minimize the number whenever possible.</p><h4>Other Metrics</h4><p>As we mentioned earlier, other Apache Cassandra performance metrics can be helpful and you should consider monitoring them.</p><h5>Caches</h5><p>There are two types of caches in Cassandra – the key cache and the row cache. Cassandra uses the key cache to store the location of row keys in memory so that the rows can be accessed without the need to hit the disk. The row cache stores the rows themselves in memory. 
By using the caches, Cassandra reduces the need to read the data from the disk and trades the memory usage for performance.</p><p>You need to monitor the key cache requests and row cache requests, which tell how many requests to a given cache type were made, and the key cache hit ratio and the row cache hit ratio, which show the percentage of results retrieved from the cache instead of the disk.</p><h5>Threadpool</h5><p>Cassandra is designed to handle the high load, withstand backpressure, and perform asynchronous tasks. Monitoring various thread pools is crucial for understanding Cassandra’s performance and bottlenecks. Each thread pool exposes the number of active, pending, and blocked tasks. Accumulated, pending, and blocked tasks usually tell about performance issues and the need for more processing power or different data and query architecture.</p><h5>Bloom Filter</h5><p>In the read path, Cassandra merges the data stored on a disk inside the SSTables with the data stored in memory. To minimize the amount of checking for data existence in the SSTables on the disk Cassandra uses a data structure called bloom filter.</p><p>The bloom filter is a probabilistic data structure that can tell Cassandra that the data is definitely not in a given file or that the data may be present in a given file. The key metrics to monitor here are the amount of space used by bloom filters, the number of false positives, and the ratio. You can reduce the number of false positives by assigning more memory to the bloom filters.</p><h3>Java Virtual Machine Metrics</h3><p><a href=\"https://cassandra.apache.org/\">Apache Cassandra</a> is a JVM-based application that comes with all the usual JVM pros and cons. From the developer’s perspective, memory management is easier and requires less hassle – you just use an object and forget about it, letting the JVM do the cleaning up. But that means that something has to clean up all the unused objects in memory. 
This is where <a href=\"https://sematext.com/blog/java-garbage-collection/\">Java Garbage Collection</a> comes in, along with the metrics that come with it.</p><p>A proper <a href=\"https://sematext.com/integrations/cassandra-monitoring/\">Cassandra monitoring tool</a> should provide metrics that allow you to check and troubleshoot issues with the Java Virtual Machine, such as JVM memory utilization and garbage collection count and time. You can read more about them in our guide about <a href=\"https://sematext.com/blog/jvm-metrics/\">JVM metrics</a>.</p><h3>Operating System Metrics</h3><p>You can’t ignore Operating System metrics either. Information such as CPU utilization, memory usage, and disk usage is essential and can play a major role when it comes to Cassandra performance.</p><h4>CPU Utilization</h4><p>Your CPU is used for data processing and query handling. The more spare CPU cycles you have on a given node, the more data and queries it can process. The <strong>user</strong> part of the CPU usage will show you what your Cassandra process needs, while the <strong>wait</strong> part can point to a bottleneck in I/O or network. As with every Java application, CPU cycles are also needed for garbage collection, so keep that in mind when planning.</p><h4>Memory Usage</h4><p>Memory usage is crucial for every JVM-based application. The newest version of Cassandra leverages both off-heap and heap memory. This means that you not only need to set the heap size of your Cassandra nodes correctly but also need enough off-heap memory to keep your cluster performance at its best.</p><h4>Disk Usage</h4><p>Disk and I/O are crucial – Cassandra keeps its data on the disk, and each query may require a substantial number of I/O operations to return the results. You need to be sure that your hardware can handle your data retrieval needs. 
You also need to be sure that you have enough space to hold your data and handle the compaction process.</p><h2>Monitor Cassandra Performance with Sematext</h2><p><img data-lazyloaded=\"1\" data-placeholder-resp=\"1999x1017\" src=\"https://sematext.com/wp-content/uploads/2021/10/cassandra-metrics-1.png\" class=\"alignnone\" alt=\"monitoring cassandra performance metrics with sematext\" width=\"1999\" height=\"1017\" /></p><p><a href=\"https://sematext.com/cloud/\">Sematext Cloud</a> and its <a href=\"https://sematext.com/integrations/cassandra\">Apache Cassandra monitoring</a> integration provide all that you need to monitor your distributed database. Everything is within a single view available without distractions:</p><ul><li>The overview report gives you a perfect starting point for your metrics, painting a picture of the whole cluster.</li><li>A dedicated Cassandra report provides an in-depth view of all relevant metrics related to the distributed database.</li><li>The OS report provides necessary operating system metrics such as CPU and memory utilization and visibility into your network traffic.</li><li>Finally, the JVM metrics give the full view of the Java Virtual Machine, such as metrics related to garbage collection and per-heap space memory utilization.</li></ul><p>Using the dedicated split-view, you can correlate all the available metrics with other metrics, <a href=\"https://sematext.com/logsene/\">logs</a>, and <a href=\"https://sematext.com/experience/\">real user monitoring</a> data, making Sematext a perfect visibility tool.</p><p>Sematext allows you to set up alerts on any metric or log event and supports both threshold-based and anomaly-based alerts for full flexibility. You don’t have to watch your metrics over and over. 
Once you configure your alerts, you can sleep well, and Sematext will let you know if something is wrong.</p><p>If you want to see how Sematext stacks up against similar solutions, read our article about the best <a href=\"https://sematext.com/blog/cassandra-monitoring-tools/\">Cassandra monitoring tools</a> available today.</p><h2>Get Started with Cassandra Monitoring</h2><p>As a distributed database, Apache Cassandra can quickly become an operational challenge without visibility into what is happening from a global perspective as well as on a node level. You need to have full visibility from top to bottom, but that is not enough. You need to be sure that your monitoring system can notify you when an issue happens and also predict issues before your customers notice them.</p><p>One of the tools that will give you all of that is Sematext’s <a href=\"https://sematext.com/integrations/\">Apache Cassandra monitoring</a> integration. Start monitoring your Cassandra cluster by creating a Sematext Cloud account and then a Cassandra monitoring App. 
Don’t forget to create a Logs App as well to ship your Cassandra logs for a full observability experience.</p></div></section></main></article></div></div></div></div></div></div>","id":"070dd491-a966-5c75-86e3-0327d7804b22","title":"How Do You Monitor Cassandra Performance: Key Metrics to Measure","origin_url":"https://sematext.com/blog/cassandra-monitoring/","url":"https://sematext.com/blog/cassandra-monitoring/","wallabag_created_at":"2021-11-08T17:30:27+00:00","published_at":"2021-10-04T10:55:25+00:00","published_by":"['Rafal Kuć']","reading_time":11,"domain_name":"sematext.com","preview_picture":"https://sematext.com/wp-content/uploads/2021/10/critical-cassandra-metrics-to-monitor.jpg","tags":["monitoring","cassandra","performance"],"description":"Apache Cassandra is a distributed database known for its high availability, fault tolerance, and near-linear scaling. 
It was initially developed by Facebook, but it is a widely used open-source system..."},{"content":"<p>This is our third post in our series on performance tuning with Apache Cassandra.  In our first post, we discussed how we can use <a href=\"https://thelastpickle.com/blog/2018/01/16/cassandra-flame-graphs.html\">Flame Graphs</a> to visually diagnose performance problems.  In our second post, we discussed <a href=\"http://thelastpickle.com/blog/2018/04/11/gc-tuning.html\">JVM tuning</a>, and how the different JVM settings can have an effect on different workloads.</p><p>In this post, we’ll dig into a table level setting which is usually overlooked: compression.  Compression options can be specified when creating or altering a table, and compression defaults to enabled if not specified.  The default is great when working with write heavy workloads, but can become a problem on read heavy and mixed workloads.</p>\n<p>Before we get into optimizations, let’s take a step back to understand the basics of compression in Cassandra.  Once we’ve built a foundation of knowledge, we’ll see how to apply it to real world workloads.</p>\n<h2 id=\"how-it-works\">How it works</h2>\n<p>When we create a table in Cassandra, we can specify a variety of table options in addition to our fields.  In addition to options such as using <a href=\"https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html\">TWCS for our compaction strategy</a>, specifying <a href=\"https://thelastpickle.com/blog/2018/03/21/hinted-handoff-gc-grace-demystified.html\">gc grace seconds</a>, and caching options, we can also tell Cassandra how we want it to compress our data.  If the compression option is not specified, LZ4Compressor will be used, which is known for its excellent performance and compression rate. 
In addition to the algorithm, we can specify our <code class=\"language-plaintext highlighter-rouge\">chunk_length_in_kb</code>, which is the size of the uncompressed buffer we write our data to as an intermediate step before writing to disk.  Here’s an example of a table using LZ4Compressor with 64KB chunk length:</p>\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre>create table sensor_data ( \n    id text primary key, \n    data text) \nWITH compression = {'class': 'LZ4Compressor', \n                    'chunk_length_in_kb': 64};\n</pre></div></div>\n<p>We can examine how well compression is working at the table level by checking <code class=\"language-plaintext highlighter-rouge\">tablestats</code>:</p>\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre>$ bin/nodetool tablestats tlp_stress\nKeyspace : tlp_stress\n\tRead Count: 89766\n\tRead Latency: 0.18743983245326737 ms\n\tWrite Count: 8880859\n\tWrite Latency: 0.009023213069816781 ms\n\tPending Flushes: 0\n\t\tTable: sensor_data\n\t\tSSTable count: 5\n\t\tOld SSTable count: 0\n\t\tSpace used (live): 864131294\n\t\tSpace used (total): 864131294\n\t\tOff heap memory used (total): 2472433\n\t\tSSTable Compression Ratio: 0.8964684393508305\n\t\tCompression metadata off heap memory used: 140544\n</pre></div></div>\n<p>The <code class=\"language-plaintext highlighter-rouge\">SSTable Compression Ratio</code> line above tells us how effective compression is.  Compression ratio is calculated as follows:</p>\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre>compressionRatio = (double) compressed/uncompressed;\n</pre></div></div>\n<p>meaning the smaller the number, the better the compression.  
In the above example our compressed data is taking up almost 90% of the size of the original data, which isn’t particularly great.</p>\n<h2 id=\"how-data-is-written\">How data is written</h2>\n<p>I’ve found digging into the codebase, profiling and working with a debugger to be the most effective way to learn how software works.</p>\n<p>When data is written to or read from SSTables, we’re not dealing with convenient typed objects, we’re dealing with streams of bytes.  Our compressed data is written in the <code class=\"language-plaintext highlighter-rouge\">CompressedSequentialWriter</code> class, which extends <code class=\"language-plaintext highlighter-rouge\">BufferedDataOutputStreamPlus</code>.  This writer uses a temporary buffer. When the data is written out to disk the buffer is compressed and some metadata about it is recorded to a <code class=\"language-plaintext highlighter-rouge\">CompressionInfo</code> file.  If there is more data than available space in the buffer, the buffer is filled, flushed, and then starts fresh to be written to again (and perhaps flushed again).   
You can see this in <code class=\"language-plaintext highlighter-rouge\">org/apache/cassandra/io/util/BufferedDataOutputStreamPlus.java</code>:</p>\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre>@Override\npublic void write(byte[] b, int off, int len) throws IOException\n{\n    if (b == null)\n        throw new NullPointerException();\n    // avoid int overflow\n    if (off &lt; 0 || off &gt; b.length || len &lt; 0\n        || len &gt; b.length - off)\n        throw new IndexOutOfBoundsException();\n    if (len == 0)\n        return;\n    int copied = 0;\n    while (copied &lt; len)\n    {\n        if (buffer.hasRemaining())\n        {\n            int toCopy = Math.min(len - copied, buffer.remaining());\n            buffer.put(b, off + copied, toCopy);\n            copied += toCopy;\n        }\n        else\n        {\n            doFlush(len - copied);\n        }\n    }\n}\n</pre></div></div>\n<p>The size of this buffer is determined by <code class=\"language-plaintext highlighter-rouge\">chunk_length_in_kb</code>.</p>\n<h2 id=\"how-data-is-read\">How data is read</h2>\n<p>The read path in Cassandra is (more or less) the opposite of the write path.  We pull chunks out of SSTables, decompress them, and return them to the client.  The full path is a little more complex - there’s a <code class=\"language-plaintext highlighter-rouge\">ChunkCache</code> (managed by <a href=\"https://github.com/ben-manes/caffeine\">caffeine</a>) that we go through, but that’s beyond the scope of this post.</p>\n<p>During the read path, the entire chunk must be read and decompressed.  We’re not able to selectively read only the bytes we need.  The impact of this is that if we are using 4KB chunks, we can get away with only reading 4KB off disk.  If we use 256KB chunks, we have to read the entire 256KB.  
This might be fine for a handful of requests but when trying to maximize throughput we need to consider what happens when we have requests in the thousands per second.  If we have to read 256KB off disk for ten thousand requests a second, we’re going to need to read roughly 2.5GB per second off disk, and that can be an issue no matter what hardware we are using.</p>\n<h3 id=\"what-about-page-cache\">What about page cache?</h3>\n<p>Linux will automatically leverage any RAM that’s not being used by applications to keep recently accessed filesystem blocks in memory.  We can see how much page cache we’re using by using the <code class=\"language-plaintext highlighter-rouge\">free</code> tool:</p>\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre>$ free -mhw\n              total        used        free      shared     buffers       cache   available\nMem:            62G        823M         48G        1.7M        261M         13G         61G\nSwap:          8.0G          0B        8.0G\n</pre></div></div>\n<p>Page cache can be a massive benefit if you have a working data set that fits in memory. With smaller data sets this is incredibly useful, but Cassandra was built to solve big data problems. Typically that means having a lot more data than available RAM.  If our working data set on each node is 2 TB, and we only have 20-30 GB of free RAM, it’s very possible we’ll serve <em>almost none of our requests out of cache</em>.  Yikes.</p>\n<p>Ultimately, we need to ensure we use a chunk length that allows us to minimize our I/O.  Larger chunks can compress better, giving us a smaller disk footprint, but we end up needing more hardware, so the space savings become meaningless for certain workloads.  There’s no perfect setting that we can apply to every workload.  Frequently, the more reads you do, the smaller the chunk size should be.  
Even this doesn’t apply uniformly; larger requests will hit more chunks, and will benefit from a larger chunk size.</p>\n<h2 id=\"the-benchmarks\">The Benchmarks</h2>\n<p>Alright - enough with the details!  We’re going to run a simple benchmark to test how Cassandra performs with a mix of read and write requests with a simple key value data model.  We’ll be doing this using our stress tool, <a href=\"https://github.com/thelastpickle/tlp-stress\">tlp-stress</a> (commit 40cb2d28fde).  We will get into the details of this stress tool in a later post - for now all we need to cover is that it includes a key value workload out of the box we can leverage here.</p>\n<p>For this test I installed Apache Cassandra 3.11.3 on an AWS c5d.4xlarge instance running Ubuntu 16.04 following the instructions on cassandra.apache.org, and updated all the system packages using <code class=\"language-plaintext highlighter-rouge\">apt-get upgrade</code>.  I’m only using a single node here in order to isolate the compression settings and not introduce noise from the network overhead of running a full cluster.</p>\n<p>The ephemeral NVMe disk is formatted with XFS and mounted at <code class=\"language-plaintext highlighter-rouge\">/var/lib/cassandra</code>. 
I disabled readahead using <code class=\"language-plaintext highlighter-rouge\">blockdev  --setra 0 /dev/nvme1n1</code> so we can see the impact that compression has on our disk requests and not hide it with page cache.</p>\n<p>For each workload, I put the following command in a shell script, and ran tlp-stress from a separate c5d.4xlarge instance (passing the chunk size as the first parameter):</p>\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre>$ bin/tlp-stress run KeyValue -i 10B -p 10M --populate -t 4 \\\n  --replication \"{'class':'SimpleStrategy', 'replication_factor':1}\" \\\n  --field.keyvalue.value='book(100,200)' -r .5  \\\n  --compression \"{'chunk_length_in_kb': '$1', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}\" \\\n  --host 172.31.42.30\n</pre></div></div>\n<p>This runs a key value workload across 10 million partitions (<code class=\"language-plaintext highlighter-rouge\">-p 10M</code>), pre-populating the data (<code class=\"language-plaintext highlighter-rouge\">--populate</code>), with 50% reads (<code class=\"language-plaintext highlighter-rouge\">-r .5</code>), picking 100-200 words from one of the books included in the stress tool (<code class=\"language-plaintext highlighter-rouge\">--field.keyvalue.value='book(100,200)'</code>).  We can specify a compression strategy using <code class=\"language-plaintext highlighter-rouge\">--compression</code>.</p>\n<p>For the test I’ve used slightly modified Cassandra configuration files to reduce the effect of GC pauses by increasing the total heap (12GB) as well as the new gen (6GB).  I spent only a small amount of time on this as optimizing it perfectly isn’t necessary.  
I also set compaction throughput to <code class=\"language-plaintext highlighter-rouge\">160</code> (MB/s).</p>\n<p>For the test, I monitored the JVM’s allocation rate using the <a href=\"https://github.com/aragozin/jvm-tools\">Swiss Java Knife</a> (sjk-plus) and disk / network / cpu usage with dstat.</p>\n<h3 id=\"default-64kb-chunk-size\">Default 64KB Chunk Size</h3>\n<p>The first test used the default of 64KB chunk length.  I started the stress command and walked away to play with my dog for a bit.  When I came back, I was through about 35 million requests:</p>\n<p><img src=\"https://thelastpickle.com/files/2018-08-08-compression-performance/stress-64.png\" alt=\"stress 64kb\" /></p>\n<p>You can see in the above screenshot our 5 minute rate is about 22K writes / second and 22K reads / second.  Looking at the output of dstat at this time, we can see we’re doing between 500 and 600MB of reads per second:</p>\n<p><img src=\"https://thelastpickle.com/files/2018-08-08-compression-performance/dstat-64.png\" alt=\"DStat 64KB\" /></p>\n<p>Memory allocation fluctuated a bit, but it hovered around 1GB/s:</p>\n<p><img src=\"https://thelastpickle.com/files/2018-08-08-compression-performance/sjk-64.png\" alt=\"sjk 64kb\" /></p>\n<p>Not the most amazing results in the world.  Of the disk reads, some of that throughput can be attributed to compaction, which we’ll always have to contend with in the real world.  That’s capped at 160MB/s, leaving around 400MB/s to handle reads.  That’s a lot considering we’re only sending 25MB/s across the network.  That means we’re doing over 15x as much disk I/O as network I/O.  We are very much disk bound in this workload.</p>\n<h3 id=\"4kb-chunk-size\">4KB Chunk Size</h3>\n<p>Let’s see if the 4KB chunk size does any better.  Before the test I shut down Cassandra, cleared the data directory, and started things back up.  I ran the same stress test using the above shell script, passing 4 as the chunk size.  
I once again played fetch with my dog for a bit and came back after around the same time as the previous test.</p>\n<p>Looking at the stress output, it’s immediately obvious there’s a significant improvement:</p>\n<p><img src=\"https://thelastpickle.com/files/2018-08-08-compression-performance/stress-4.png\" alt=\"stress 4kb\" /></p>\n<p>In almost every single metric reported by the metrics library the test with 4KB outperforms the 64KB test.  Our throughput is better (62K ops / second vs 44K ops / second in the 1 minute rate), and our p99 for reads is better (13ms vs 24ms).</p>\n<p>If we’re doing less I/O on each request, how does that impact our total disk and network I/O?</p>\n<p><img src=\"https://thelastpickle.com/files/2018-08-08-compression-performance/dstat-4.png\" alt=\"dstat 4kb\" /></p>\n<p>As you can see above, there’s a massive improvement.  Disk I/O is significantly reduced from making smaller (but more) requests to disk, and our network I/O is significantly higher from responding to more requests.</p>\n<p><img src=\"https://thelastpickle.com/files/2018-08-08-compression-performance/sjk-4.png\" alt=\"sjk 4kb\" /></p>\n<p>It was initially a small surprise to see an increased heap allocation rate (because we’re reading WAY less data into memory), but this is simply the result of doing a lot more requests.  There are a lot of objects created in order to satisfy a request; far more than the number created to read the data off disk.  More requests result in higher allocation.  We’d want to ensure those objects don’t make it into the Old Gen as we go through <a href=\"https://thelastpickle.com/blog/2018/04/11/gc-tuning.html\">JVM tuning</a>.</p>\n<h3 id=\"off-heap-memory-usage\">Off Heap Memory Usage</h3>\n<p>The final thing to consider here is off heap memory usage.  Alongside each compressed SSTable is compression metadata.  The compression files have names like <code class=\"language-plaintext highlighter-rouge\">na-9-big-CompressionInfo.db</code>.  
The compression metadata is stored in memory, off the Cassandra heap.  The size of the offheap usage is directly proportional to the number of chunks used.  More chunks = more space used.  More chunks are used when a smaller chunk size is used, hence more offheap memory is used to store the metadata for each chunk.  It’s important to understand this trade off.  A table using 4KB chunks will use 16 times as much memory for compression metadata as one using 64KB chunks.</p>\n<p>In the example I used above the memory usage can be seen as follows:</p>\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre>Compression metadata off heap memory used: 140544 \n</pre></div></div>\n<h3 id=\"changing-existing-tables\">Changing Existing Tables</h3>\n<p>Now that you can see how a smaller chunk size can benefit read heavy and mixed workloads, it’s time to try it out.  If you have a table you’d like to change the compression setting on, you can do the following at the cqlsh shell:</p>\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre>cqlsh:tlp_stress&gt; alter table keyvalue with compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4};\n</pre></div></div>\n<p>New SSTables that are written after this change is applied will use this setting, but existing SSTables won’t be rewritten automatically.  Because of this, you shouldn’t expect an immediate performance difference after applying this setting.  If you want to rewrite every SSTable immediately, you’ll need to do the following:</p>\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre>nodetool upgradesstables -a tlp_stress keyvalue\n</pre></div></div>\n<h2 id=\"conclusion\">Conclusion</h2>\n<p>The above is a single test demonstrating how tuning compression settings can affect Cassandra performance in a significant way.  
Using out of the box settings for compression on read heavy or mixed workloads will almost certainly put unnecessary strain on your disk while hurting your read performance.  I highly recommend taking the time to understand your workload and analyze your system resources to understand where your bottleneck is, as there is no single correct setting to use for every workload.</p>\n<p>Keep in mind the tradeoff between memory and chunk size as well.  When working with a memory constrained environment it may seem tempting to use 4KB chunks everywhere, but it’s important to understand that it’ll use more memory.  In these cases, it’s a good idea to start with smaller tables that are read from the most.</p>","id":"b16bdb40-b8c3-54e5-8afa-60ee43153f3d","title":"Apache Cassandra Performance Tuning - Compression with Mixed Workloads","origin_url":"https://thelastpickle.com/blog/2018/08/08/compression_performance.html","url":"https://thelastpickle.com/blog/2018/08/08/compression_performance.html","wallabag_created_at":"2019-12-02T18:51:11+00:00","published_at":null,"published_by":"['']","reading_time":11,"domain_name":"thelastpickle.com","preview_picture":"https://thelastpickle.com/android-chrome-192x192.png","tags":["cassandra","troubleshooting and tuning","performance"],"description":"This is our third post in our series on performance tuning with Apache Cassandra.  In our first post, we discussed how we can use Flame Graphs to visually diagnose performance problems.  In our second..."},{"content":"<p>There are multiple dimensions along which <a data-mil=\"22348\" href=\"https://intellipaat.com/blog/tutorial/cassandra-tutorial/cassandra-overview/\" target=\"_blank\" rel=\"noopener\">Cassandra performance</a> can be tuned. Some of them are described below:<br /><strong>Write Operations:</strong><br />The commit log and data directories (SSTables) should be on different disks. 
The commit log uses sequential writes. However, if SSTables share the same drive with the commit log, I/O contention between the commit log &amp; SSTables may degrade commit log writes and SSTable reads.<br /> <br /><strong>Read Operations:</strong><br />A good rule of thumb is 4 concurrent_reads per processor core. You may increase the value for systems with fast I/O storage.<br /> <br /><strong>Cassandra Compaction Contention:</strong><br />Reduce the frequency of memtable flushes by increasing the memtable size or preventing premature flushing. Less frequent memtable flushes result in fewer SSTable files and less compaction. Less compaction reduces SSTable I/O contention, and therefore improves read operations. Bigger memtables absorb more overwrites for updates to the same keys, and therefore accommodate more read/write operations between flushes.<br /> <br /><strong>Memory Cache</strong>:<br />Do not increase the Cassandra cache size unless there is enough physical memory (RAM). Avoid memory swapping at all costs.<br /> <br /><strong>Row Cache:</strong><br />The row cache holds the entire content of a row in memory. It serves data from the cache instead of reading it from the disk. It works well if the column data is small, so the cache is big enough to hold most of the hotspot data. It works poorly if the column data is too large, so the cache is not big enough to hold most of the hotspot data. It’s also a poor fit for high write/read ratios. By default, it is off. If the hit ratio is below 30%, the row cache should be disabled.<br /> <br /><strong>Key Cache Tuning:</strong><br />The key cache holds the location of data in memory for each column family. It’s effective if there are hot data spots and the row cache cannot be used effectively because of the large column size. By default, Cassandra caches 200000 keys per column family. 
Use an absolute number for <strong>keys_cached</strong> instead of a percentage.<br /> <br /><strong>JVM:</strong><br />Minimum and maximum Java heap size should be half of the available physical memory. The young generation heap should be 1/4 of the Java heap. Do NOT increase the size without confirming there is enough available physical memory. Always reserve memory for the OS file cache.<br />A detailed understanding of <a data-mil=\"22348\" href=\"https://intellipaat.com/blog/apache-cassandra-a-brief-intro/\" target=\"_blank\" rel=\"noopener\">Apache Cassandra</a> is available in this blog post for your perusal!</p>","id":"2f1edaee-2329-5d58-a6d0-6dce016c28f4","title":"Tuning Cassandra Performance","origin_url":"https://intellipaat.com/blog/tutorial/cassandra-tutorial/tuning-cassandra-performance/","url":"https://intellipaat.com/blog/tutorial/cassandra-tutorial/tuning-cassandra-performance/","wallabag_created_at":"2019-12-02T18:50:11+00:00","published_at":"2019-09-05T08:00:00+00:00","published_by":"['']","reading_time":1,"domain_name":"intellipaat.com","preview_picture":"https://intellipaat.com/blog/wp-content/uploads/2020/09/Certification-in-Bigdata-Analytics-IITG.jpg","tags":["cassandra","troubleshooting and tuning","performance"],"description":"There are multiple dimensions where Cassandra performance can be tuned. Some of them are described below:Write Operations:Commit log and data dirs (sstables) should be on different disks. Commit log u..."},{"content":"<p>In this topic, I will cover the basics of general Apache Cassandra performance tuning: when to do performance tuning, how to avoid and identify problems, and methodologies to improve.</p><h4>When do you need to tune performance?</h4><h4>Optimizing:</h4><p>When things work but could be better, 
we want to get better performance.</p><h4><strong>Troubleshooting:</strong></h4><p>Fixing a problem that impacts performance. Something could actually be broken, or it could just be slow; in clusters, something broken can manifest as slow performance.</p><h4><strong>What are some examples of performance-related complaints an admin might receive regarding Cassandra?</strong></h4><p>Performance-related complaints:</p><ul><li>It’s slow.</li>\n<li>Certain queries are slow.</li>\n<li>Program X that uses the cluster is slow.</li>\n<li>A node went down.</li>\n</ul><h4>Latency, Throughput and the U.S.E Method:</h4><p>Bad methodology<br />How should you not approach performance-related problems?</p><ul><li>streetlight anti-method</li>\n<li>random change anti-method</li>\n<li>blame someone else anti-method</li>\n</ul><h4>In performance tuning, what are we trying to improve?</h4><p>Latency – how long a cluster, node, server or I/O subsystem takes to respond to a request.<br />Throughput – how many transactions of a given size (or range) a cluster, node or I/O subsystem can complete in a given timeframe; in other words, how many operations per second our cluster or node is processing.<br />But we can’t forget COST!</p><h4>How are latency and throughput related?</h4><p>Theoretically, they are independent of each other.<br />However, a change in latency can have a proportional effect on throughput.</p><h4>What causes a change in latency and throughput?</h4><p>Understanding performance tuning: utilization, saturation, errors, availability.</p><h4>What is utilization ? saturation ? errors ? 
availability ?</h4><p>Utilization – how heavily the resources are being stressed.<br />Saturation – the degree to which a resource has queued work it cannot service.<br />Errors – recoverable failure or exception events in the course of normal operation.<br />Availability – whether a given resource can service work or not.</p><h4>GOAL:</h4><p>What is the first step in achieving any performance <br /><strong>tuning goal?</strong><br />Setting a goal!<br />What are some examples of commonly heard Cassandra performance tuning goals?<br />Reads should be faster.<br />Writes to table x should be faster.<br />The cluster should be able to complete x transactions per second.</p><h4>What should a clearly defined performance goal take into account?</h4><p>Writing an SLA (service level agreement)<br />We need to know:<br />Type of operation or query: read or write workload; select, insert, update or delete<br />Latency: expressed as a percentile rank, e.g. 95th percentile read latency is 2 ms<br />Throughput: operations per second<br />Size: expressed in average bytes<br />Duration: expressed in minutes or hours<br />Scope: keyspace, table, query<br />Example of an SLA: “the cluster should be able to sustain 20000 2KB read operations per second from table X for two hours with a 95th percentile read latency of 3 ms.”</p><h4>After setting a goal, how can achievement of the goal be verified?</h4><p>timing hooks in your application<br />query tracing<br />jmeter test plan<br />customizable cassandra-stress</p><h4>Time in computer performance tuning:</h4><p>How long is a millisecond?<br />Why do we care about milliseconds?</p><h4>Common latency timings in cassandra:</h4><pre>  reads from main memory should take between 36 and 130 microseconds&#13;\n  reads from an SSD should take between 100 microseconds and 12 milliseconds&#13;\n  reads from a Serial Attached SCSI rotational drive should take between 8 milliseconds 
and 40 milliseconds&#13;\n  reads from a SATA rotational drive take more than 15 milliseconds&#13;\n</pre><p><strong>Example:</strong><br /><strong>Workload characterization:</strong><br />Classroom use case: a middle-sized financial firm uses Cassandra to manage distributed data, 42 million stock quotes, driven by a particular set of queries.</p><h4>What queries drove this data model?</h4><p>Retrieve information for a specific stock trade by trade ID.<br />Find all information about stock trades for a specific stock ticker and range of timestamps.<br />Find all information about stock trades that occurred on a specific date over a short period of time.</p><h4>How do you characterize the workload?</h4><p>What is the load being placed on your cluster?</p><h4>Who is causing the load?</h4><p>calling application or API<br />remote IP address</p><h4>Why is the load being called?</h4><p>code path or stack trace</p><h4>What are the load characteristics?</h4><p>throughput<br />direction (read/write)<br />include variance<br />keyspace and column family</p><h4>How do you characterize your workload?</h4><p>How is the load changing over time, and is there a daily pattern?<br />Is your workload read heavy or write heavy?<br />How big is your data?<br />How much data is on each node (bytes on node = data density)?<br />Does the active data fit in the buffer cache?</p><h4>Performance impact of Data Model: How does the data model affect performance?</h4><p>poorly shaped rows: too narrow, or too wide (partitions that are too large)<br />hotspots (particular areas with a lot of reads/writes)<br />poor primary or secondary indexes<br />too many tombstones (a lot of deletes)</p><h4>So data model considerations:</h4><p>understand how the primary key affects performance<br />take a look at query patterns and adjust how tables are modeled<br />see how replication factor and/or consistency level impact performance<br />a change in compaction strategy can have a positive (or negative) impact<br />parallelize 
reads/writes if necessary<br />look at how moving infrequently accessed data can improve performance<br />see how the per column family cache is having an impact<br />what is the relationship between the data model and Cassandra’s read path optimizations (key/row cache, bloom filters, index)?<br />nesting data allows a greater degree of flexibility in the column family structure and keeps all the data needed to satisfy a given query in the same partition; modeling so that the most active data sets stay in cache also helps, since frequently accessed data served from cache improves performance.</p><h4>Methodologies:</h4><p>Active performance tuning: we focus on a particular problem and verify that it is fixed. Suspect there’s a problem, isolate the problem using tools, and determine whether the problem is in Cassandra, the environment, or both.<br />Verify problems and test for reproducibility, fix problems using tuning strategies, then test, test and test again.<br />Passive performance tuning: regular system “sanity checks” against given thresholds, adjusted for growth; regularly monitor key health areas in Cassandra and the environment.<br />Identify and tune for future growth/scalability.<br />Apply tuning strategies as needed.<br /><strong>We use the USE Method as a tool for troubleshooting.</strong><br />This method gives us a methodology for looking at all components of the system, not only one. It is the strategy defined by Brendan Gregg <a href=\"http://www.brendangregg.com/USEmethod/use-linux.html\" rel=\"nofollow\">http://www.brendangregg.com/USEmethod/use-linux.html</a>.</p><p>It also performs a health check of the various system components to identify bottlenecks and errors,<br />separated by component, type and metric to narrow the scope and find the location of the problem.</p><p>What are the two things performance tuning tries to improve? Latency and throughput.<br />What are the two types of performance tuning methodologies? Active and passive.</p><p>What tool can be used to get a performance baseline? 
jmeter or cassandra-stress</p><h4><strong>Cassandra-stress:</strong></h4><p>Interpreting the output of cassandra-stress</p><p>Each line reports data for the interval between the last elapsed time and current elapsed time, which is set by the –progress-interval option (default 10 seconds).</p><pre>[hduser@base ~]$ cassandra-stress write -node 192.168.56.71 &#13;\nINFO  04:11:53 Did not find Netty's native epoll transport in the classpath, defaulting to NIO.&#13;\nINFO  04:11:56 Using data-center name 'datacenter1' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)&#13;\nINFO  04:11:56 New Cassandra host /192.168.56.71:9042 added&#13;\nINFO  04:11:56 New Cassandra host /192.168.56.72:9042 added&#13;\nINFO  04:11:56 New Cassandra host /192.168.56.73:9042 added&#13;\nINFO  04:11:56 New Cassandra host /192.168.56.74:9042 added&#13;\nConnected to cluster: Training_Cluster&#13;\nDatatacenter: datacenter1; Host: /192.168.56.71; Rack: rack1&#13;\nDatatacenter: datacenter1; Host: /192.168.56.72; Rack: rack1&#13;\nDatatacenter: datacenter1; Host: /192.168.56.73; Rack: rack1&#13;\nDatatacenter: datacenter1; Host: /192.168.56.74; Rack: rack1&#13;\nCreated keyspaces. 
Sleeping 1s for propagation.&#13;\nSleeping 2s...&#13;\nWarming up WRITE with 50000 iterations...&#13;\nFailed to connect over JMX; not collecting these stats&#13;\nWARNING: uncertainty mode (err&lt;) results in uneven workload between thread runs, so should be used for high level analysis only&#13;\nRunning with 4 threadCount&#13;\nRunning WRITE with 4 threads until stderr of mean &lt; 0.02&#13;\nFailed to connect over JMX; not collecting these stats&#13;\ntype,      total ops,    op/s,    pk/s,   row/s,    mean,     med,     .95,     .99,    .999,     max,   time,   stderr, errors,  gc: #,  max ms,  sum ms,  sdv ms,      mb&#13;\ntotal,           402,     403,     403,     403,    10,4,     5,9,    31,4,    86,2,   132,7,   132,7,    1,0,  0,00000,      0,      0,       0,       0,       0,       0&#13;\ntotal,           979,     482,     482,     482,     8,1,     6,3,    17,5,    45,9,    88,3,    88,3,    2,2,  0,06272,      0,      0,       0,       0,       0,       0&#13;\ntotal,          1520,     530,     530,     530,     7,5,     6,4,    14,8,    24,3,   103,9,   103,9,    3,2,  0,07029,      0,      0,       0,       0,       0,       0&#13;\ntotal,          1844,     321,     321,     321,    11,9,     6,4,    33,0,   234,5,   248,5,   248,5,    4,2,  0,06134,      0,      0,       0,       0,       0,       0&#13;\ntotal,          2229,     360,     360,     360,    11,3,     5,4,    43,6,   127,6,   145,4,   145,4,    5,3,  0,06577,      0,      0,       0,       0,       0,       0&#13;\ntotal,          2457,     199,     199,     199,    20,2,     5,9,    82,2,   125,4,   203,6,   203,6,    6,4,  0,11009,      0,      0,       0,       0,       0,       0&#13;\ntotal,          2904,     443,     443,     443,     8,9,     7,0,    23,0,    37,3,    56,1,    56,1,    7,4,  0,09396,      0,      0,       0,       0,       0,       0&#13;\ntotal,          3246,     340,     340,     340,    11,7,     7,4,    41,6,    87,0,   101,0,   101,0,    8,5,  
0,08625,      0,      0,       0,       0,       0,       0&#13;\ntotal,          3484,     235,     235,     235,    16,6,     7,2,    76,1,   151,8,   152,1,   152,1,    9,5,  0,09208,      0,      0,       0,       0,       0,       0&#13;\ntotal,          3679,     196,     196,     196,    19,5,     8,1,    86,0,   156,5,   174,8,   174,8,   10,5,  0,09960,      0,      0,       0,       0,       0,       0&#13;\ntotal,          4083,     369,     369,     369,    11,1,     7,3,    38,3,    90,8,   114,9,   114,9,   11,6,  0,09041,      0,      0,       0,       0,       0,       0&#13;\ntotal,          4411,     320,     320,     320,    11,4,     7,4,    39,7,    61,1,    93,6,    93,6,   12,6,  0,08422,      0,      0,       0,       0,       0,       0&#13;\ntotal,          4683,     227,     227,     227,    17,1,     6,0,    90,1,   153,3,   199,8,   199,8,   13,8,  0,08478,      0,      0,       0,       0,       0,       0&#13;\ntotal,          5131,     445,     445,     445,     8,9,     7,6,    19,3,    26,8,    50,8,    50,8,   14,8,  0,07997,      0,      0,       0,       0,       0,       0&#13;\ntotal,          5661,     521,     521,     521,     7,5,     5,4,    17,5,    63,1,    89,2,    89,2,   15,8,  0,07788,      0,      0,       0,       0,       0,       0&#13;\ntotal,          6179,     512,     512,     512,     7,7,     6,2,    16,7,    39,3,    59,1,    59,1,   16,8,  0,07578,      0,      0,       0,       0,       0,       0&#13;\ntotal,          6427,     245,     245,     245,    15,9,     6,1,    56,3,    94,0,   180,6,   180,6,   17,8,  0,07567,      0,      0,       0,       0,       0,       0&#13;\ntotal,          6831,     394,     394,     394,    10,2,     5,6,    39,0,    90,4,   111,7,   111,7,   18,9,  0,07129,      0,      0,       0,       0,       0,       0&#13;\ntotal,          7071,     235,     235,     235,    16,9,     7,4,    58,3,   149,7,   235,3,   235,3,   19,9,  0,07150,      0,      0,       0,       
0,       0,       0&#13;\ntotal,          7532,     455,     455,     455,     8,7,     6,1,    17,2,    90,5,   142,2,   142,2,   20,9,  0,06840,      0,      0,       0,       0,       0,       0&#13;\ntotal,          7890,     353,     353,     353,    10,9,     7,3,    35,4,    80,6,   149,5,   149,5,   21,9,  0,06532,      0,      0,       0,       0,       0,       0&#13;\ntotal,          8172,     288,     288,     288,    13,6,     7,1,    45,6,    89,1,    89,7,    89,7,   22,9,  0,06374,      0,      0,       0,       0,       0,       0&#13;\ntotal,          8355,     171,     171,     171,    22,7,     9,1,    82,5,   137,7,   151,5,   151,5,   24,0,  0,06656,      0,      0,       0,       0,       0,       0&#13;\ntotal,          8614,     235,     235,     235,    16,9,     8,0,    56,4,    98,7,   139,5,   139,5,   25,1,  0,06622,      0,      0,       0,       0,       0,       0&#13;\ntotal,          9027,     402,     402,     402,     9,6,     8,5,    20,7,    26,5,    30,8,    30,8,   26,1,  0,06346,      0,      0,       0,       0,       0,       0&#13;\ntotal,          9496,     463,     463,     463,     8,5,     7,6,    17,4,    23,2,    30,1,    30,1,   27,1,  0,06139,      0,      0,       0,       0,       0,       0&#13;\ntotal,          9912,     408,     408,     408,     9,6,     7,5,    23,4,    33,0,    38,0,    38,0,   28,1,  0,05903,      0,      0,       0,       0,       0,       0&#13;\ntotal,         10275,     359,     359,     359,    11,0,     8,8,    26,4,    33,7,    44,7,    44,7,   29,1,  0,05693,      0,      0,       0,       0,       0,       0&#13;\ntotal,         10528,     251,     251,     251,    15,6,     9,5,    45,4,   176,1,   295,9,   295,9,   30,1,  0,05602,      0,      0,       0,       0,       0,       0&#13;\ntotal,         10711,     181,     181,     181,    21,1,     7,8,    53,8,   340,8,   396,8,   396,8,   31,1,  0,05597,      0,      0,       0,       0,       0,       0&#13;\ntotal,         
10947,     233,     233,     233,    17,3,     9,3,    55,5,    94,7,   123,7,   123,7,   32,1,  0,05584,      0,      0,       0,       0,       0,       0&#13;\ntotal,         11130,     177,     177,     177,    21,0,    10,6,    86,7,   115,6,   151,9,   151,9,   33,2,  0,05706,     &#13;\n...&#13;\n&#13;\nResults:&#13;\nop rate                   : 388 [WRITE:388]&#13;\npartition rate            : 388 [WRITE:388]&#13;\nrow rate                  : 388 [WRITE:388]&#13;\nlatency mean              : 10,2 [WRITE:10,2]&#13;\nlatency median            : 7,1 [WRITE:7,1]&#13;\nlatency 95th percentile   : 25,9 [WRITE:25,9]&#13;\nlatency 99th percentile   : 71,2 [WRITE:71,2]&#13;\nlatency 99.9th percentile : 150,3 [WRITE:150,3]&#13;\nlatency max               : 396,8 [WRITE:396,8]&#13;\nTotal partitions          : 63058 [WRITE:63058]&#13;\nTotal errors              : 0 [WRITE:0]&#13;\ntotal gc count            : 0&#13;\ntotal gc mb               : 0&#13;\ntotal gc time (s)         : 0&#13;\navg gc time(ms)           : NaN&#13;\nstdev gc time(ms)         : 0&#13;\nTotal operation time      : 00:02:42&#13;\nSleeping for 15s&#13;\n&#13;\n&#13;\nData\t                            Description&#13;\n--------------------------------------------------------------------------------------------------&#13;\ntotal\t                           :Total number of operations since the start of the test.&#13;\ninterval_op_rate\t           :Number of operations performed per second during the interval &#13;\n(default 10 seconds).&#13;\ninterval_key_rate\t           :Number of keys/rows read or written per second during the interval &#13;\n(normally be the same as interval_op_rate unless doing range slices).&#13;\nlatency\tAverage latency            : for each operation during that interval.&#13;\n95th                               :95% of the time the latency was less than the number displayed in the column &#13;\n(Cassandra 1.2 or later).&#13;\n99th                               :99% of the 
time the latency was less than the number displayed in the column &#13;\n(Cassandra 1.2 or later).&#13;\nelapsed                            :Number of seconds elapsed since the beginning of the test.&#13;\n</pre><h4><strong>Cassandra tuning</strong>:</h4><p>Successful performance tuning is all about understanding:<br />what the metrics mean,<br />how the software we are tuning is architected,<br />and, in a distributed system, how the software on one node works together with the software on the other nodes.<br />That is where the next section comes from:<br />how are the different pieces of Cassandra put together?</p><p>We will talk about Cassandra and the data model,<br />how to gather metrics, and how to interpret them.</p><p>After that, we talk about environment tuning: the JVM and the operating system.<br />The section after that focuses on disk tuning and compaction tuning.</p><h4><strong>Examine cluster and node health and tuning:</strong></h4><p>Discuss table design and tuning: successful performance tuning is all about understanding.<br />What do the metrics mean? How is the software we are tuning architected? In a distributed system, how does the software on one node work together with the software on the other nodes?<br />We will look at some of the metrics that are exposed.<br />We talk about Cassandra and the data model, then about environment tuning (the JVM and the operating system), and finally about disk tuning and compaction tuning.</p><p>Let’s talk about cluster and node tuning. What activities in a Cassandra cluster happen between nodes?<br />What is happening on the network? 
When we dig into a performance problem, we look at all nodes, not only one.<br />The answers are: coordinator, gossip, replication, repair, read repair, bootstrapping, node removal, node decommissioning.<br />We have to dig into not just one node but the other nodes as well.</p><p>Within a single node, what does a Cassandra node do? It performs reads and writes, monitors, participates in the cluster, and maintains consistency.<br />Internally Cassandra has a lot of things to do. How does Cassandra organize all of that work?<br />The answer is SEDA (staged event-driven architecture): we have several thread pools and a messaging service, and we need to know how the queues work.</p><p><strong>What is a thread pool ?</strong><br />example, we have thread pool:workers:6</p><pre>                                                                                              worker thread&#13;\n  task          queue:max pending tasks:7                worker thread&#13;\n  task task  task task  task  task task task task =&gt;     worker thread &#13;\n  task                                task               worker thread&#13;\n  blocked tasks&#13;\n</pre><h4><strong>What are Cassandra’s thread pools?</strong></h4><p>read (readstage:32)</p><p>write (mutationstage, flushwriter, memtablepostflusher, countermutation, migrationstage)<br />monitor (memorymeter:1, tracing)<br />participate in cluster (requestresponsestage:#CPUs, pendingrangecalculator:1, gossipstage:1)<br />maintain consistency (commitlogarchiver:1, miscstage:1 for snapshotting/replicating data after node removal, antientropystage:1 for repairing consistency and building Merkle trees, internalresponsestage:#CPUs, hintedhandoff:1, readrepairstage:#CPUs)<br />Some pools have a single thread, for example the monitor pool memorymeter:1.<br />Which one has a configurable number of threads? 
It is read (readstage:32*).</p><p>A utility like nodetool tpstats gives us insight into how much work each of these pools is doing: how many tasks are active, pending, or blocked:<br />active – number of messages pulled off the queue and currently being processed by a thread.<br />pending – number of messages in the queue waiting for a thread.<br />completed – number of messages completed.<br />blocked – when a pool reaches its max thread count it begins queuing until the max queue size is reached; once that is reached, it blocks until there is room in the queue.<br />total blocked/all time blocked – total number of messages that have been blocked.</p><p><strong>Cassandra thread pools: multi-threaded stages:</strong><br />readstage (affected by main memory, disk) – performs local reads<br />mutationstage (affected by CPU, main memory, disk) – performs local inserts/updates, schema merges, commit log replay, hints in progress<br />requestresponsestage (affected by network, other nodes) – when a response to a request is received, this is the stage used to execute any callbacks that were created with the original request.<br />flushwriter (affected by CPU, disk) – sorts and writes memtables to disk<br />hintedhandoff (one thread per host being sent hints; affected by disk, network, other nodes) – sends missing mutations to other nodes<br />memorymeter (several separate threads) – measures memory usage and live ratio of a memtable.<br />readrepairstage (affected by network, other nodes) – performs read repair<br />countermutation (formerly replicateOnWriteStage) – performs counter writes on non-coordinator nodes and replicates after a local write<br />internalresponsestage – responds to non-client initiated messages, including bootstrapping and schema checking.</p><p><strong>Single-threaded stages:</strong><br />gossipstage (affected by network) – gossip communication<br />antientropystage (affected by network, other nodes) – builds Merkle trees and repairs consistency<br 
/>migrationstage (affected by network, other nodes) – makes schema changes<br />miscstage (affected by disk, network, other nodes) – snapshotting, replicating data after node removal<br />memtablepostflusher (affected by disk) – operations after flushing the memtable: discards commit log files whose data has all been persisted to SSTables, and flushes non-column family backed secondary indexes<br />tracing – for query tracing<br />commitlogarchiver (formerly commitlog_archiver) – backs up or restores a commit log.</p><p><strong>Message types:</strong></p><p>Handled by the readstage thread pool:<br />(read: read data from cache or disk, range_slice: read a range of data, paged_range: read part of a range of data)</p><p>Handled by the mutationstage thread pool: (mutation: write (insert or update) data,<br />counter_mutation: changes counter columns, read_repair: update out-of-sync data discovered during a read)</p><p>Handled by requestresponsestage: (request_response: respond to a coordinator, _trace:<br />used to trace a query (enable tracing or every (trace probability) queries),<br />binary: deprecated)</p><p><strong>Question: why are some message types “droppable”?</strong><br />If a message sits in one of the queues for too long, where “too long” is defined by the timeouts in cassandra.yaml, it is dropped.<br />Why would Cassandra do that? Why does Cassandra not do my work?<br />If the node has sufficient resources to do the job, it does not drop the message.<br />In practice, suppose I send a read to Cassandra and the timeout elapses. What does the coordinator do in this situation? 
It depends: the consistency level may still be satisfied by another node, in which case the query returns a result; if the consistency level is too high for that, the timeout is returned to the client.<br />For a read this is not a big deal; usually we can re-execute the query.</p><p>Mutations (insert/delete/update), however, can also sit in the queue for too long (2 seconds), and then nothing is done.<br />So what happens? The coordinator may experience the timeout; maybe it stores a hint, maybe the error goes back to the driver to be retried. We don’t know, but the important thing is that some recovery action can be taken at this point.<br />The node that dropped the write still has not applied it, so mechanisms such as read repair or the repair command are what bring that data back in sync.<br /><strong>Why are some message types “droppable”?</strong><br />Some messages can be dropped to prioritize activities in Cassandra when high resource contention occurs.<br />The number of dropped messages is shown in the nodetool tpstats output.<br />If you see dropped messages, you should investigate.<br />Question: what is the state of your cluster? We have nodetool or OpsCenter to gather information on our cluster.<br />nodetool tells us what is happening right now. Its counters are cumulative,<br />so we do not know what happened in the last second, but it is good for a general assessment<br />and also for comparison between nodes (nodetool -h node0 tpstats, nodetool -h node0 compactionstats).</p><p>nodetool -h node0 netstats (here we look at the read repair statistics). A blocking read repair happens when we check for discrepancies between nodes as part of the read and find one; it is fixed before responding, increasing cch (blocking). The other kind of read repair runs in the background: if we run the query at, say, consistency level ONE and a discrepancy is found after the response has been returned, 
then we will increase the cch (background).</p><p><strong>too wide</strong>: the partition is too large<br /><strong>too narrow</strong>: the partition is too small, so our query has to read rows from a whole bunch of partitions<br /><strong>hotspots:</strong> for example, a customer decides to partition by country code, and then 90% of all rows are in the US and the rest are spread over all other countries<br /><strong>poor primary and secondary indexes:</strong><br /><strong>too many tombstones:</strong> a workload that does a lot of deleting and then reads data around the same partition can have a huge impact on read performance.</p><p><strong>OpsCenter</strong></p><p>Changing logging levels with nodetool setlogginglevel:<br />nodetool getlogginglevels: used to get the current runtime logging levels</p><pre>root@ds220:~# nodetool getlogginglevels&#13;\nLogger Name                                                                 Log Level&#13;\nROOT                                                                        INFO&#13;\nDroppedAuditEventLogger                                                     INFO&#13;\nSLF4JAuditWriter                                                            INFO&#13;\ncom.cryptsoft                                                               OFF&#13;\ncom.thinkaurelius.thrift                                                    ERROR&#13;\norg.apache.lucene.index                                                     INFO&#13;\norg.apache.solr.core.CassandraSolrConfig                                    WARN&#13;\norg.apache.solr.core.RequestHandlers                                        WARN&#13;\norg.apache.solr.core.SolrCore                                               WARN&#13;\norg.apache.solr.handler.component                                           WARN&#13;\norg.apache.solr.search.SolrIndexSearcher                                    WARN&#13;\norg.apache.solr.update                                                      
WARN&#13;\n</pre><p><strong>nodetool setlogginglevel</strong>: used to set the logging level for a service. It can be used instead of modifying the logback.xml file. Possible levels:</p><pre>ALL&#13;\nTRACE&#13;\nDEBUG&#13;\nINFO&#13;\nWARN&#13;\nERROR&#13;\nOFF&#13;\n</pre><p>We can raise the logging level for a single namespace only, for example to DEBUG or even TRACE.</p><h4><strong>Data Model Tuning</strong>:</h4><p>One of the key components of Cassandra is the data model.<br />For performance tuning, we start with the workload placed on the database. That workload lands on Cassandra tables, which is where the data model comes in. Then we have software tuning (OS, JVM, Cassandra) and hardware.<br />In many cases, upgrading hardware is an option.</p><p>When we need to diagnose the data model, Cassandra provides many tools that show how our tables and queries are performing.<br />One is <strong>nodetool cfstats; another is nodetool cfhistograms</strong>.</p><p>Once we have identified a query we suspect, we have CQL tracing, but we also have <strong>nodetool settraceprobability</strong>.<br /><strong>nodetool cfstats example:</strong><br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfstats1.png\" rel=\"attachment wp-att-248\"><img data-attachment-id=\"248\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfstats1/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfstats1.png\" data-orig-size=\"452,423\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfstats1\" data-image-description=\"\" 
data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfstats1.png?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfstats1.png?w=452\" class=\"alignnone wp-image-248 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfstats1.png?w=529\" alt=\"cfstats1\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfstats1.png 452w, https://dmngaya.files.wordpress.com/2015/12/cfstats1.png?w=150 150w, https://dmngaya.files.wordpress.com/2015/12/cfstats1.png?w=300 300w\" /></a><br />This one below is the aggregate number of the keyspace not for this particular table:<br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfstats2.png\" rel=\"attachment wp-att-249\"><img data-attachment-id=\"249\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfstats2/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfstats2.png\" data-orig-size=\"332,76\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfstats2\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfstats2.png?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfstats2.png?w=332\" class=\"alignnone wp-image-249 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfstats2.png?w=529\" alt=\"cfstats2\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfstats2.png 332w, https://dmngaya.files.wordpress.com/2015/12/cfstats2.png?w=150 150w, https://dmngaya.files.wordpress.com/2015/12/cfstats2.png?w=300 300w\" /></a><br />Then below the aggregate number of 
the table stock only:<br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfstats3.png\" rel=\"attachment wp-att-250\"><img data-attachment-id=\"250\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfstats3/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfstats3.png\" data-orig-size=\"456,359\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfstats3\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfstats3.png?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfstats3.png?w=456\" class=\"alignnone wp-image-250 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfstats3.png?w=529\" alt=\"cfstats3\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfstats3.png 456w, https://dmngaya.files.wordpress.com/2015/12/cfstats3.png?w=150 150w, https://dmngaya.files.wordpress.com/2015/12/cfstats3.png?w=300 300w\" /></a><br />Here we have 13 SSTables on disk for this table.<br />How much data are we storing on disk? The space used metrics tell us. In this case the live bytes and the total bytes are the same, which means this table has never seen deletes or updates, only inserts. If I ran some deletes or updates, we would see a difference between live and total space used: the total would be bigger than the live.<br />Off heap memory used (1928102) shows us how much off-heap memory this table is using. 
We also have a couple of other metrics: bloom filter space used, bytes: 7088; bloom filter off heap memory used, bytes: 6984; index summary off heap memory used, bytes: 390; compression metadata off heap memory used, bytes: 1920728; and memtable data size, bytes: 36389573.<br /><strong> Conclusion:</strong> If I add up these numbers:<br /><strong>1928102 + 7088 + 6984 + 1920728 + 36389573 = 40252475 bytes (about 38 MB)</strong></p><p>I get a rough idea of how much RAM this table is consuming. Note that the off heap total (1928102) is exactly 6984 + 390 + 1920728, so it already includes the bloom filter, index summary and compression metadata off heap figures; the sum above counts some of them twice and is best read as an upper bound.</p><p>SSTable compression ratio: 0.173624: with compression enabled, as it is here, the compressed data is about 17% of its uncompressed size.<br />Number of keys: 1664: this counts partitions only, so with a simple table schema that has only a partition key and no clustering column, this number (1664) is a rough count of the rows on a given server.<br />All of these metrics can differ from server to server, so we have to look at each server.</p><p>Memtable cell count (126342), memtable data size, bytes (36389573) and memtable switch count (107). Memtable data size (36389573) is the data currently held in memory that will later be flushed to disk. Memtable switch count increases every time memtable data is flushed to disk. Memtable cell count indicates how many cells we have in memory. If we divide that by the column count from the schema definition (for example 5 columns), 126342 / 5 = 25268.4, we get the approximate number of rows, because Cassandra stores cells rather than rows.</p><p>Local read count, local read latency, local write count and local write latency show us the read and write metrics for this particular table. Local read count and local write count tell us whether the workload is read heavy or write heavy. 
For our example, the table is write heavy (4352651 writes versus 942517 reads).</p><p> Pending tasks: 0: this counts any pending tasks; we saw that earlier in tpstats.</p><p> Bloom filter false positives, bloom filter false ratio, bloom filter space used, bytes, and bloom filter off heap memory used, bytes: the main metric here is the false ratio (0.0000), which compares the number of bloom filter false positives (48566) with the number of reads (local read count: 942517). What did we pay to get 48566 bloom filter false positives across 942517 local reads? We paid bloom filter space used, bytes: 7088 plus bloom filter off heap memory used, bytes: 6984, a total of 7088 + 6984 = 14072 bytes, approximately 14 KB, to avoid unnecessary disk seeks. If the false positive rate is unacceptable, we can tune it by paying more memory to get fewer false positives and consequently fewer unnecessary disk seeks, so read performance goes up.</p><p>Index summary off heap memory used, bytes: 390: this is an in-memory structure that helps us jump to the correct place in the partition index. In this case we paid 390 bytes of memory. That is also tunable, but in Cassandra 2.1 we cannot adjust it automatically.</p><p>Compression metadata off heap memory used, bytes: 1920728: this is the amount of RAM consumed for compression. We can look at that as well.</p><p>Compacted partition minimum bytes, compacted partition maximum bytes, compacted partition mean bytes: very useful metrics that indicate how well our partitions are set up. What do I mean by that? If we look at compacted partition minimum bytes (35426 bytes = 35 KB), compacted partition mean bytes (122344216 bytes = 116.68 MB) and compacted partition maximum bytes (557074610 bytes = 531.27 MB), we get an idea of the distribution of partition sizes. 
Compared with the mean (122344216 bytes = 116.68 MB), there is a huge discrepancy between the minimum and the maximum. That is a good indication of a data hotspot: some partition is abnormally larger than the others.</p><p>Average live cells per slice, average tombstones per slice: these metrics are only useful in production.<br />Average live cells per slice (last five minutes): 2.0: reads have been touching 2.0 live cells on average and 0.0 tombstones, which is good. If the tombstone figure were, for example, 2.0 as well, every read over the last five minutes would be scanning about as many tombstones as live cells, so roughly 50% of the read work would be spent on tombstones. The tombstone figure will not always look like this; a high value is a good indication that we have exceptionally large deletes of data.<br />All of these metrics have tunables.</p><p><strong>Notes:</strong><br />The <strong>bloom_filter_fp_chance</strong> and <strong>read_repair_chance</strong> control two different things. Usually you would leave them set to their default values, which should work well for most typical use cases.<br /><strong> bloom_filter_fp_chance</strong>: controls the precision of the bloom filter data for SSTables stored on disk. The bloom filter is kept in memory and when you do a read, Cassandra will check the bloom filters to see which SSTables might have data for the key you are reading. A bloom filter can give false positives: when you actually read the SSTable, it turns out that the key does not exist in it and reading it wasted time. The better the precision used for the bloom filter, the fewer false positives it will give (but the more memory it will need).<br /><strong>From the documentation:</strong><br />0 Enables the unmodified, effectively the largest possible, Bloom filter<br />1.0 Disables the Bloom Filter<br />The recommended setting is 0.1. 
A higher value yields diminishing returns.<br />So a higher number gives a higher chance of a false positive (fp) when reading the bloom filter.<br /><strong>read_repair_chance</strong>: controls the probability that a read of a key will be checked against the other replicas for that key. This is useful if the nodes in your system have frequent downtime, resulting in data getting out of sync. If you do a lot of reads, read repair will slowly bring the data back into sync as you read, without having to run a full repair on the nodes. Higher settings will cause more background read repairs and consume more resources, but will sync the data more quickly as you do reads.<br />See the documentation at this link: <a href=\"http://docs.datastax.com/en/cql/3.1/cql/cql_reference/tabProp.html\" rel=\"nofollow\">http://docs.datastax.com/en/cql/3.1/cql/cql_reference/tabProp.html</a><br />You can also read this document:<br /><a href=\"http://www.datastax.com/dev/blog/common-mistakes-and-misconceptions\" rel=\"nofollow\">http://www.datastax.com/dev/blog/common-mistakes-and-misconceptions</a><br />Here is the description of that table:<br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/desc-table1.png\" rel=\"attachment wp-att-251\"><img data-attachment-id=\"251\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/desc-table1/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/desc-table1.png\" data-orig-size=\"474,408\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"desc-table1\" data-image-description=\"\" 
data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/desc-table1.png?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/desc-table1.png?w=474\" class=\"alignnone wp-image-251 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/desc-table1.png?w=529\" alt=\"desc-table1\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/desc-table1.png 474w, https://dmngaya.files.wordpress.com/2015/12/desc-table1.png?w=150 150w, https://dmngaya.files.wordpress.com/2015/12/desc-table1.png?w=300 300w\" /></a><br />Below we have tunable parameters for this table:<br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/desc-table2.png\" rel=\"attachment wp-att-252\"><img data-attachment-id=\"252\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/desc-table2/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/desc-table2.png\" data-orig-size=\"473,185\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"desc-table2\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/desc-table2.png?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/desc-table2.png?w=473\" class=\"alignnone wp-image-252 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/desc-table2.png?w=529\" alt=\"desc-table2\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/desc-table2.png 473w, https://dmngaya.files.wordpress.com/2015/12/desc-table2.png?w=150 150w, https://dmngaya.files.wordpress.com/2015/12/desc-table2.png?w=300 300w\" /></a><br />These parameters 
directly influence the metrics we saw earlier.<br />Example:<br /><strong> bloom_filter_fp_chance</strong>=0.010000: this impacts how many false positives we get and how much memory we give up to push that number lower or higher.<br /><strong>caching=’KEYS_ONLY’:</strong> in this case only keys are cached, but we also have the ability to cache rows, which increases the memory utilization of this table and can have a big impact.<br /><strong> dclocal_read_repair_chance</strong>=0.100000 and <strong>read_repair_chance</strong>=0.00000: these parameters influence the non-blocking (background) read repair work that shows up in tpstats.<br /><strong> gc_grace_seconds</strong>=864000: this controls how long we hold on to tombstones, so it affects the last line we saw earlier (average tombstones per slice (last five minutes)). It is tunable.<br /><strong> index_interval</strong>=128: in Cassandra 2.1 this becomes a min and a max interval; these affect the index summary off heap memory used, bytes metric seen earlier. That is tunable.<br /><strong> Populate_io_cache_on_flush</strong>=’false’: in Cassandra 2.0 it lets us choose whether to populate the I/O cache when data is flushed to disk.<br /><strong> MemTable_flush_period_in_ms</strong>=0: used if we want to flush to disk on a schedule, as most people do.<br /><strong> Compression</strong>= {‘sstable_compression’:’LZ4compressor’}: this affects the compression ratio we saw earlier. If we want, we can turn compression off. Why would we change that? Maybe we are getting a high compression ratio at the cost of CPU cycles; in that case we can change it.<br /><strong> Speculative_retry</strong>=’99.0PERCENTILE’: this does not show up in the tpstats output. 
When we perform a read, there are additional replicas that could be used to fulfill the request. Cassandra waits for the 99th percentile read latency of the table, expressed in milliseconds; if the replica it asked has not answered by then, for example on a quorum read, it asks another replica for the data.</p><h4><strong>NODETOOL CFHISTOGRAMS:</strong></h4><p>Let’s look at cfhistograms on node0. In Cassandra 2.0 the output is pretty long and gives us very fine-grained buckets. There is also a newer format that shows percentiles for SSTables per read, write latency, read latency, partition size and cell count.<br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram1.png\" rel=\"attachment wp-att-237\"><img data-attachment-id=\"237\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfhistogram1/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram1.png\" data-orig-size=\"353,456\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfhistogram1\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram1.png?w=232\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram1.png?w=353\" class=\"alignnone wp-image-237 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram1.png?w=529\" alt=\"cfhistogram1\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram1.png 353w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram1.png?w=116 116w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram1.png?w=232 232w\" /></a><br /><a 
href=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram2.png\" rel=\"attachment wp-att-238\"><img data-attachment-id=\"238\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfhistogram2/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram2.png\" data-orig-size=\"563,130\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfhistogram2\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram2.png?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram2.png?w=529\" class=\"alignnone wp-image-238 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram2.png?w=529\" alt=\"cfhistogram2\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram2.png?w=529 529w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram2.png?w=150 150w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram2.png?w=300 300w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram2.png 563w\" /></a><br />How do we read us?<br />sstables per read:<br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram3.png\" rel=\"attachment wp-att-239\"><img data-attachment-id=\"239\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfhistogram3/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram3.png\" data-orig-size=\"137,78\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfhistogram3\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram3.png?w=137\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram3.png?w=137\" class=\"alignnone wp-image-239 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram3.png?w=529\" alt=\"cfhistogram3\" /></a><br />31261 reads were satisfied by a single SSTable, but 240663 read operations needed a second SSTable, which means two disk seeks, so performance is still acceptable. Further down, 397454 read operations needed a third SSTable, which means three disk seeks. The further down we go, the slower the performance gets; that is a performance problem, and we need to dig in to understand how the table is performing.<br />Generally, if we have this problem, it is usually because compaction is getting behind. 
Compaction is a low priority task that consumes disk I/O, so it falls behind when there is heavy activity on the cluster.<br />Consequence: each read has to touch more SSTables to be satisfied.<br />We see here that 272481 read operations needed five SSTables; that is the problem.<br />Next is the write latency:<br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram4.png\" rel=\"attachment wp-att-240\"><img data-attachment-id=\"240\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfhistogram4/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram4.png\" data-orig-size=\"355,337\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfhistogram4\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram4.png?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram4.png?w=355\" class=\"alignnone wp-image-240 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram4.png?w=529\" alt=\"cfhistogram4\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram4.png 355w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram4.png?w=150 150w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram4.png?w=300 300w\" /></a><br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram5.png\" rel=\"attachment wp-att-241\"><img data-attachment-id=\"241\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfhistogram5/\" 
data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram5.png\" data-orig-size=\"419,365\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfhistogram5\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram5.png?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram5.png?w=419\" class=\"alignnone wp-image-241 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram5.png?w=529\" alt=\"cfhistogram5\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram5.png 419w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram5.png?w=150 150w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram5.png?w=300 300w\" /></a><br />We saw in cfstats local read latency: 0.000 ms and local write latency: 0.000 ms.<br />In the write latency histogram, us means microseconds. Here we see that 2415947 write operations completed in 50 us (microseconds); this is the bucket where most of my write operations completed. There is also a long tail: further down the write latency histogram, the maximum was 126934 us, with a count of 1.<br />For read latency, the 50 us (microseconds) bucket holds 12976 read operations. This shows that with Cassandra write latency is much lower than read latency.<br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram6.png\" rel=\"attachment wp-att-242\"><img data-attachment-id=\"242\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfhistogram6/\" 
data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram6.png\" data-orig-size=\"236,314\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfhistogram6\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram6.png?w=225\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram6.png?w=236\" class=\"alignnone wp-image-242 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram6.png?w=529\" alt=\"cfhistogram6\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram6.png 236w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram6.png?w=113 113w\" /></a><br />Another trend is the notion of two bumps:<br />Here is the first bump:<br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram7.png\" rel=\"attachment wp-att-243\"><img data-attachment-id=\"243\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfhistogram7/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram7.png\" data-orig-size=\"245,176\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfhistogram7\" 
data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram7.png?w=245\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram7.png?w=245\" class=\"alignnone wp-image-243 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram7.png?w=529\" alt=\"cfhistogram7\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram7.png 245w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram7.png?w=150 150w\" /></a><br />Second bump:<br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram8.png\" rel=\"attachment wp-att-244\"><img data-attachment-id=\"244\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfhistogram8/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram8.png\" data-orig-size=\"225,289\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfhistogram8\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram8.png?w=225\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram8.png?w=225\" class=\"alignnone wp-image-244 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram8.png?w=529\" alt=\"cfhistogram8\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram8.png 225w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram8.png?w=117 117w\" /></a><br />This is an indication of read from ram for the first bump and read from disk for the second bump.<br /><strong> Partition size</strong>:<br />So we have 
fine-grained output for partition size. This data generally corresponds to the following cfstats metrics: compacted partition minimum bytes, compacted partition maximum bytes and compacted partition mean bytes.<br />We can see here that we have one partition of 42 KB, and at the end my largest is 25109160 bytes (24 MB). The 0, 1, 2 values belong to the buckets that were created.<br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram9.png\" rel=\"attachment wp-att-245\"><img data-attachment-id=\"245\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfhistogram9/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram9.png\" data-orig-size=\"184,623\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfhistogram9\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram9.png?w=89\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram9.png?w=184\" class=\"alignnone wp-image-245 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram9.png?w=529\" alt=\"cfhistogram9\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram9.png 184w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram9.png?w=44 44w\" /></a></p><p><strong>Cell count per partition</strong>:<br />Sometimes people store a large volume of data in each cell. This definitely influences partition size, but we can have a lot of data and a small number of cells (0, 1, 2). 
Combining the information below with what we saw for partition size, we can see how the data is laid out in terms of the data model, that is, in relation to the table schema.</p><p><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram10.png\" rel=\"attachment wp-att-246\"><img data-attachment-id=\"246\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfhistogram10/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram10.png\" data-orig-size=\"197,698\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfhistogram10\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram10.png?w=85\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram10.png?w=197\" class=\"alignnone wp-image-246 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram10.png?w=529\" alt=\"cfhistogram10\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram10.png 197w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram10.png?w=42 42w\" /></a><br />So this line means 3 of my partitions have <strong>4866323</strong> cells.</p><p><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram11.png\" rel=\"attachment wp-att-247\"><img data-attachment-id=\"247\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfhistogram11/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram11.png\" data-orig-size=\"569,155\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfhistogram11\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram11.png?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram11.png?w=529\" class=\"alignnone wp-image-247 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram11.png?w=529\" alt=\"cfhistogram11\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram11.png?w=529 529w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram11.png?w=150 150w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram11.png?w=300 300w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram11.png 569w\" /></a></p><p><strong>Quiz:</strong><br />Which of the following information does CQL display for a query trace? Elapsed time<br />Which of the following components would you not find statistics for in a nodetool cfstats output? Row cache<br />nodetool cfhistograms shows per-keyspace statistics for read and write latency? False, only per-table.</p><h4>Environment Tuning:</h4><p>Sometimes we have to tune other components. For example:<br />What are the most common bottlenecks in Cassandra? 
These are performance chokepoints, the things that slow down the system.<br />The most common bottlenecks are:<br />Inadequate hardware.<br />Poorly configured JVM parameters: Cassandra runs on top of this virtual machine.<br />High CPU utilization: particularly for write workloads, where we can see Cassandra become CPU bound and the CPU peak.<br /><strong> Insufficient or incorrect memory cache tuning</strong>: this applies to reads. Much of this lies outside of Cassandra.<br />I would like to drive this point home.<br />Question: What is the best way to cope with inadequate node hardware in a Cassandra cluster?<br />A lot of people ask: we have this machine and we want performance, so what can we do?<br />Upgrade the hardware. That is the bad news.<br />If we don’t have adequate equipment for the job, it is not a good idea to do the job.<br />We could also add another node if the existing machines are not performing.</p><p>What are some of the technologies upon which a Cassandra node depends?<br />Java, JVM, JNA, JMX and a bunch of other stuff that starts with “j”</p><h4>JVM:</h4><p>Cassandra is a Java program, so we need to know how the JVM works in order to tune it.<br />When it comes to tuning, the JVM has a lot of options.<br />If we run this command:</p><pre>java -XX:+PrintFlagsFinal -version</pre><p>We will see all of the options.<br />JVM and Garbage Collection (GC): <strong>what is garbage collection?</strong> Java handles memory allocation and deallocation for the developer. Java can consume all of the resources on the machine: lots of RAM, CPU, I/O.<br /><strong> JVM generational Heap</strong>:<br />When we start the Java process, the JVM allocates a big chunk of RAM, possibly half of it, and manages that chunk itself. That chunk is the heap, and it is divided into several pieces: the new gen, the old gen and the perm gen. From Java 8 on, perm gen changed (it was replaced by Metaspace). What does it do? 
It stores all of the class definitions.<br />The important point is that Cassandra loads most of its code into the perm gen at startup, and that area does not change much in size afterwards.</p><p><a href=\"https://dmngaya.files.wordpress.com/2015/12/jvm1.png\" rel=\"attachment wp-att-253\"><img data-attachment-id=\"253\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/jvm1/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/jvm1.png\" data-orig-size=\"810,213\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"jvm1\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/jvm1.png?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/jvm1.png?w=529\" class=\"alignnone wp-image-253 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/jvm1.png?w=529\" alt=\"jvm1\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/jvm1.png?w=529 529w, https://dmngaya.files.wordpress.com/2015/12/jvm1.png?w=150 150w, https://dmngaya.files.wordpress.com/2015/12/jvm1.png?w=300 300w, https://dmngaya.files.wordpress.com/2015/12/jvm1.png?w=768 768w, https://dmngaya.files.wordpress.com/2015/12/jvm1.png 810w\" /></a><br />With Cassandra, there is not much we can do in the perm gen.<br />In the new gen, however, we have the Eden and survivor spaces. When we create a new object of any kind, that object is created in Eden. 
Say we create it inside a function: it is allocated in Eden, and as Eden fills up with object allocations, a garbage collection kicks in. That is the first garbage collector.<br />The collector for this piece (the new gen) is called ParNew (parallel new).<br />The important thing is that when Eden fills up, the JVM looks for garbage: it finds all objects that are no longer referenced and deallocates them. Other objects that are still referenced are moved to a survivor space in the new gen.<br />Objects that survive long enough there are promoted to the old gen.<br />There are many options controlling how long objects stay in Eden, when they move to a survivor space, and when they are promoted to the old gen.<br />The old gen, in turn, can fill up depending on these options.<br />Take a look at this.<br />My test cluster has 4 nodes, each running in a different virtual machine.</p><pre>nodetool status&#13;\n[hduser@base cassandra]$ nodetool status&#13;\nDatacenter: datacenter1&#13;\n=======================&#13;\nStatus=Up/Down&#13;\n|/ State=Normal/Leaving/Joining/Moving&#13;\n--  Address        Load       Tokens       Owns    Host ID                               Rack&#13;\nUN  192.168.56.72  40.21 MB   256          ?       5ddb3532-70de-47b3-a9ca-9a8c9a70b186  rack1&#13;\nUN  192.168.56.73  50.88 MB   256          ?       ea5286bb-5b69-4ccc-b22c-474981a1f789  rack1&#13;\nUN  192.168.56.74  48.63 MB   256          ?       158812a5-8adb-4bfb-9a56-3ec235e76547  rack1&#13;\nUN  192.168.56.71  48.52 MB   256          ?       
a42d792b-1620-4f41-8662-8e44c73c38d4  rack1&#13;\n</pre><p>Now we can do the command:</p><pre class=\"brush: css; title: ; notranslate\" title=\"\">cassandra-stress write -node 192.168.56.71</pre><p>Result:</p><pre>INFO 23:56:42 Using data-center name 'datacenter1' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter &#13;\nname with DCAwareRoundRobinPolicy constructor) INFO 23:56:42 New Cassandra host /192.168.56.71:9042 added INFO 23:56:42 New Cassandra &#13;\nhost /192.168.56.72:9042 added INFO 23:56:42 New Cassandra host /192.168.56.73:9042 added INFO 23:56:42 New Cassandra host /192.168.56.74:9042 &#13;\nadded Connected to cluster: Training_Cluster Datatacenter: datacenter1; Host: /192.168.56.71; Rack: rack1 Datatacenter: datacenter1; &#13;\nHost: /192.168.56.72; Rack: rack1 Datatacenter: datacenter1; Host: /192.168.56.73; Rack: rack1 Datatacenter: datacenter1; Host: /192.168.56.74; &#13;\nRack: rack1 Created keyspaces. Sleeping 1s for propagation. Sleeping 2s... Warming up WRITE with 50000 iterations... 
Failed to connect over JMX; &#13;\nnot collecting these stats WARNING: uncertainty mode (err&lt;) results in uneven workload between thread runs, so should be used for high level &#13;\nanalysis only Running with 4 threadCount Running WRITE with 4 threads until stderr of mean &lt; 0.02 Failed to connect over JMX; not collecting &#13;\nthese stats type, &#13;\ntotal ops, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, errors, gc: #, max ms, sum ms, sdv ms, mb &#13;\ntotal, 2086, 2086, 2086, 2086, 1,9, 1,5, 4,2, 7,0, 46,4, 58,0, 1,0, 0,00000, 0, 0, 0, 0, 0, 0 &#13;\ntotal, 4122, 2029, 2029, 2029, 2,0, 1,6, 4,8, 8,0, 14,0, 15,3, 2,0, 0,02617, 0, 0, 0, 0, 0, 0 &#13;\ntotal, 6171, 2029, 2029, 2029, 1,9, 1,5, 5,1, 7,6, 12,0, 13,6, 3,0, 0,02038, 0, 0, 0, 0, 0, 0 &#13;\ntotal, 8466, 2288, 2288, 2288, 1,7, 1,4, 4,2, 6,1, 11,9, 14,4, 4,0, 0,02715, 0, 0, 0, 0, 0, 0</pre><p>While the stress test runs, we can start the program called jvisualvm.<br />If we have the JDK installed, this is Java VisualVM:<br /><a href=\"https://dmngaya.files.wordpress.com/2016/01/java1.jpg\" rel=\"attachment wp-att-319\"><img data-attachment-id=\"319\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/java1/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2016/01/java1.jpg\" data-orig-size=\"1188,762\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"java1\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2016/01/java1.jpg?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2016/01/java1.jpg?w=529\" class=\"alignnone 
wp-image-319 size-full\" src=\"https://dmngaya.files.wordpress.com/2016/01/java1.jpg?w=529\" alt=\"java1\" srcset=\"https://dmngaya.files.wordpress.com/2016/01/java1.jpg?w=529 529w, https://dmngaya.files.wordpress.com/2016/01/java1.jpg?w=1058 1058w, https://dmngaya.files.wordpress.com/2016/01/java1.jpg?w=150 150w, https://dmngaya.files.wordpress.com/2016/01/java1.jpg?w=300 300w, https://dmngaya.files.wordpress.com/2016/01/java1.jpg?w=768 768w, https://dmngaya.files.wordpress.com/2016/01/java1.jpg?w=1024 1024w\" /></a></p><p>We can see the available plugins on the Tools windows and activate some plugins like VisualVM-Glassfish, visual GC:<br /><a href=\"https://dmngaya.files.wordpress.com/2016/01/java2.jpg\" rel=\"attachment wp-att-320\"><img data-attachment-id=\"320\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/java2/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2016/01/java2.jpg\" data-orig-size=\"1187,762\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"java2\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2016/01/java2.jpg?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2016/01/java2.jpg?w=529\" class=\"alignnone wp-image-320 size-full\" src=\"https://dmngaya.files.wordpress.com/2016/01/java2.jpg?w=529\" alt=\"java2\" srcset=\"https://dmngaya.files.wordpress.com/2016/01/java2.jpg?w=529 529w, https://dmngaya.files.wordpress.com/2016/01/java2.jpg?w=1058 1058w, https://dmngaya.files.wordpress.com/2016/01/java2.jpg?w=150 150w, 
https://dmngaya.files.wordpress.com/2016/01/java2.jpg?w=300 300w, https://dmngaya.files.wordpress.com/2016/01/java2.jpg?w=768 768w, https://dmngaya.files.wordpress.com/2016/01/java2.jpg?w=1024 1024w\" /></a><br />We can see the Eden space:<br /><a href=\"https://dmngaya.files.wordpress.com/2016/01/java3.png\" rel=\"attachment wp-att-321\"><img data-attachment-id=\"321\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/java3/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2016/01/java3.png\" data-orig-size=\"1602,830\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"java3\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2016/01/java3.png?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2016/01/java3.png?w=529\" class=\"alignnone wp-image-321 size-full\" src=\"https://dmngaya.files.wordpress.com/2016/01/java3.png?w=529\" alt=\"java3\" srcset=\"https://dmngaya.files.wordpress.com/2016/01/java3.png?w=529 529w, https://dmngaya.files.wordpress.com/2016/01/java3.png?w=1058 1058w, https://dmngaya.files.wordpress.com/2016/01/java3.png?w=150 150w, https://dmngaya.files.wordpress.com/2016/01/java3.png?w=300 300w, https://dmngaya.files.wordpress.com/2016/01/java3.png?w=768 768w, https://dmngaya.files.wordpress.com/2016/01/java3.png?w=1024 1024w\" /></a><br /><a href=\"https://dmngaya.files.wordpress.com/2016/01/java4.jpg\" rel=\"attachment wp-att-322\"><img data-attachment-id=\"322\" 
data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/java4/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2016/01/java4.jpg\" data-orig-size=\"1599,817\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"java4\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2016/01/java4.jpg?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2016/01/java4.jpg?w=529\" class=\"alignnone wp-image-322 size-full\" src=\"https://dmngaya.files.wordpress.com/2016/01/java4.jpg?w=529\" alt=\"java4\" srcset=\"https://dmngaya.files.wordpress.com/2016/01/java4.jpg?w=529 529w, https://dmngaya.files.wordpress.com/2016/01/java4.jpg?w=1058 1058w, https://dmngaya.files.wordpress.com/2016/01/java4.jpg?w=150 150w, https://dmngaya.files.wordpress.com/2016/01/java4.jpg?w=300 300w, https://dmngaya.files.wordpress.com/2016/01/java4.jpg?w=768 768w, https://dmngaya.files.wordpress.com/2016/01/java4.jpg?w=1024 1024w\" /></a><br />For a long time, Cassandra recommended the CMS collector. 
G1 has existed since Java 7, and in Java 8 it is very good. So, depending on the version of Cassandra you are running, the collector will be CMS or G1.<br />CMS and G1 both have an old generation and a permanent generation (replaced by Metaspace as of Java 8).<br /><a href=\"https://dmngaya.files.wordpress.com/2016/01/java5.jpg\" rel=\"attachment wp-att-323\"><img data-attachment-id=\"323\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/java5/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2016/01/java5.jpg\" data-orig-size=\"767,626\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"java5\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2016/01/java5.jpg?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2016/01/java5.jpg?w=529\" class=\"alignnone wp-image-323 size-full\" src=\"https://dmngaya.files.wordpress.com/2016/01/java5.jpg?w=529\" alt=\"java5\" srcset=\"https://dmngaya.files.wordpress.com/2016/01/java5.jpg?w=529 529w, https://dmngaya.files.wordpress.com/2016/01/java5.jpg?w=150 150w, https://dmngaya.files.wordpress.com/2016/01/java5.jpg?w=300 300w, https://dmngaya.files.wordpress.com/2016/01/java5.jpg 767w\" /></a><br />The difference is that in G1 the heap is divided into many contiguous chunks of memory (regions), like this:<br /><a href=\"https://dmngaya.files.wordpress.com/2016/01/java6.jpg\" rel=\"attachment wp-att-324\"><img data-attachment-id=\"324\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/java6/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2016/01/java6.jpg\" 
data-orig-size=\"692,201\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"java6\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2016/01/java6.jpg?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2016/01/java6.jpg?w=529\" class=\"alignnone wp-image-324 size-full\" src=\"https://dmngaya.files.wordpress.com/2016/01/java6.jpg?w=529\" alt=\"java6\" srcset=\"https://dmngaya.files.wordpress.com/2016/01/java6.jpg?w=529 529w, https://dmngaya.files.wordpress.com/2016/01/java6.jpg?w=150 150w, https://dmngaya.files.wordpress.com/2016/01/java6.jpg?w=300 300w, https://dmngaya.files.wordpress.com/2016/01/java6.jpg 692w\" /></a><br /><a href=\"https://dmngaya.files.wordpress.com/2016/01/java7.jpg\" rel=\"attachment wp-att-325\"><img data-attachment-id=\"325\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/java7/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2016/01/java7.jpg\" data-orig-size=\"575,310\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"java7\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2016/01/java7.jpg?w=300\" 
data-large-file=\"https://dmngaya.files.wordpress.com/2016/01/java7.jpg?w=529\" class=\"alignnone wp-image-325 size-full\" src=\"https://dmngaya.files.wordpress.com/2016/01/java7.jpg?w=529\" alt=\"java7\" srcset=\"https://dmngaya.files.wordpress.com/2016/01/java7.jpg?w=529 529w, https://dmngaya.files.wordpress.com/2016/01/java7.jpg?w=150 150w, https://dmngaya.files.wordpress.com/2016/01/java7.jpg?w=300 300w, https://dmngaya.files.wordpress.com/2016/01/java7.jpg 575w\" /></a><br />G1 works very well with very large heaps. We generally recommend an 8 GB heap for Cassandra.<br />When the old gen fills up to capacity, a major garbage collection runs, and its pause can last for seconds. What is a pause? When garbage collection runs in CMS or G1, some phases require a stop-the-world pause: the collector stops our program entirely and scans the Eden and survivor spaces for objects that are no longer needed, so they can be cleaned up.<br />How long does this pause last? It is a function of a couple of different things:<br />How many objects are still live?<br />The number of CPUs available to the JVM is also a big factor in how long the garbage collection pause lasts.<br />Additionally, CMS suffers from another problem called heap fragmentation. The only way CMS can defragment the heap is a full stop-the-world pause run by the serial collector, which is another garbage collector, but a single-threaded one. That is where the extremely long pauses come from.<br />For G1, the main option we have is the target pause time (-XX:MaxGCPauseMillis); the JVM default is 200 milliseconds.<br />We have a couple of tools available:</p><h4>1. java visual vm</h4><p><strong>2. 
OpsCenter</strong><br /><a href=\"https://dmngaya.files.wordpress.com/2016/01/java8.jpg\" rel=\"attachment wp-att-326\"><img data-attachment-id=\"326\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/java8/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2016/01/java8.jpg\" data-orig-size=\"598,276\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"java8\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2016/01/java8.jpg?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2016/01/java8.jpg?w=529\" class=\"alignnone wp-image-326 size-full\" src=\"https://dmngaya.files.wordpress.com/2016/01/java8.jpg?w=529\" alt=\"java8\" srcset=\"https://dmngaya.files.wordpress.com/2016/01/java8.jpg?w=529 529w, https://dmngaya.files.wordpress.com/2016/01/java8.jpg?w=150 150w, https://dmngaya.files.wordpress.com/2016/01/java8.jpg?w=300 300w, https://dmngaya.files.wordpress.com/2016/01/java8.jpg 598w\" /></a></p><h4><strong>3. 
Jconsole and jvisualvm</strong></h4><p><a href=\"https://dmngaya.files.wordpress.com/2016/01/java9.jpg\" rel=\"attachment wp-att-327\"><img data-attachment-id=\"327\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/java9/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2016/01/java9.jpg\" data-orig-size=\"862,376\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"java9\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2016/01/java9.jpg?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2016/01/java9.jpg?w=529\" class=\"alignnone wp-image-327 size-full\" src=\"https://dmngaya.files.wordpress.com/2016/01/java9.jpg?w=529\" alt=\"java9\" srcset=\"https://dmngaya.files.wordpress.com/2016/01/java9.jpg?w=529 529w, https://dmngaya.files.wordpress.com/2016/01/java9.jpg?w=150 150w, https://dmngaya.files.wordpress.com/2016/01/java9.jpg?w=300 300w, https://dmngaya.files.wordpress.com/2016/01/java9.jpg?w=768 768w, https://dmngaya.files.wordpress.com/2016/01/java9.jpg 862w\" /></a><br /><strong>4. 
And the last is jstat</strong></p><pre>jstat -gccause 1607 5000   # 1607 is the Cassandra process ID; 5000 is the sampling interval in ms</pre><h4><strong>Notes:</strong></h4><p>The most significant impact of the Java virtual machine on Cassandra performance is garbage collection.<br />The G1 collector is the preferred choice for garbage collection over CMS.<br />Metaspace does NOT exist in the new generation part of the JVM heap memory.</p><h4>JVM Tools and Tuning Strategies:</h4><p>If we have a server with 126 GB of RAM and Cassandra starts with an 8 GB heap, what is the rest of the memory doing?<br />Page cache.<br />What is page caching? It is how the operating system improves Cassandra’s read performance: it caches data that is accessed frequently and serves it faster than the disk can.</p><pre>[hduser@base ~]$ free -m&#13;\n             total       used       free     shared    buffers     cached&#13;\nMem:          5935       4271       1664         13        104       1189&#13;\n-/+ buffers/cache:       2977       2957 &#13;\nSwap:         2047          0       2047 &#13;\n</pre><p>Here we have 5935 MB of RAM, of which 4271 MB are used; 1189 MB of that is page cache.</p><h4><strong>How does Cassandra utilize page caching?</strong></h4><p>The page cache also makes writes more efficient, but the gain for writes is small.<br />How do you triage the root cause of out-of-memory (OOM) errors? They occur when we don’t have enough memory.<br />They can be Java errors in general, not only Cassandra errors.<br />The challenge is often buffering data during writes, while Cassandra tries to write that data to disk.</p><h4><strong>Quiz:</strong></h4><p>What is the benefit of using the page cache? Reads are more efficient, writes are more efficient, and repairs are more efficient.<br />Memory used by the page cache will not be available to other programs until explicitly freed? False: the kernel reclaims page cache memory automatically when programs need it.<br />Which of the following is most likely to cause an out-of-memory error? 
Client-side joins<br /><strong>CPU</strong><br /><strong> CPU intensive:</strong><br />Writes (INSERT, UPDATE, DELETE), encryption, compression and garbage collection are all CPU-intensive.<br />The more CPU you can give to garbage collection, the faster it runs.<br />If you want to enable compression or encryption, monitor CPU utilization with tools like dstat or OpsCenter.<br />What to do?<br />Add nodes.<br />Use nodes that have more and faster CPUs.<br />If the CPU is saturated, we have a couple of options:<br /><strong> 1. Turn off encryption, turn off compression</strong><br /><strong> 2. Add nodes</strong><br /><strong> 3. Alternatively, upgrade these nodes with more CPU</strong><br />Quiz:<br />Which of the following operations would significantly benefit from faster CPUs? Writes and garbage collection.<br />OpsCenter is a tool that can be used to monitor CPU usage? True.<br />What would be the best course of action to resolve issues with CPU saturation? Add more nodes.</p><h4><strong>Disk Tuning:</strong></h4><p>In this section we are going to talk about disk tuning and compaction.<br />Question: How do disk considerations affect performance?<br />When we operate a database like Cassandra where the active dataset is larger than the available RAM, reads have to go to disk. So, we have to take the disks into account.<br />SSD versus spinning disks: spinning (rotational) disks must move a mechanical read/write head to the portion of the disk being read or written, whereas SSDs have no moving parts.<br />Cassandra is architected around rotational drives, with sequential writes and sequential reads. However, SSDs are much faster. If you have a latency-sensitive application, SSDs are crucial.</p><p>Some of the settings in the cassandra.yaml file that affect disks are:<br />Configuring disks in the cassandra.yaml file:<br />1. <strong>disk_failure_policy:</strong> what should occur if a data disk fails? 
This setting is not about performance.<br />• stop (the default): when Cassandra detects certain kinds of corruption, it shuts down gossip and Thrift.<br />• best_effort: stop using the failed disk and respond using the remaining SSTables on other disks. Obsolete data can be returned if the consistency level is ONE.<br />• ignore: ignore fatal errors and let requests fail.<br />2. <strong>commit_failure_policy:</strong> what should occur if a commit log disk fails?<br />• stop: same as above.<br />• stop_commit: shut down the commit log, let writes collect, but continue to serve reads.<br />• ignore: ignore fatal errors and let batches fail.<br />3. <strong>concurrent_reads</strong>: how many threads we allocate to the read pool, typically set to 16 * number of drives.<br />4. <strong>trickle_fsync</strong>: good to enable on SSDs, but not necessarily on rotational drives. It fsyncs at intervals so that the OS does not flush dirty buffers in one big burst.</p><h4><strong>Tools to diagnose disk-related issues:</strong></h4><p>Using Linux sysstat tools to discover disk statistics:<br />System activity reporter (sar): can report on system buffer activity, system calls, block devices, overall paging, semaphore and memory allocation, and CPU utilization.<br />Flags identify the item to check:</p><pre>sar -d    # disk&#13;\nsar -r    # memory&#13;\nsar -S    # swap space used&#13;\nsar -b    # overall I/O activity&#13;\n# dstat: a Leatherman tool for Linux, a versatile replacement&#13;\n# for sysstat (vmstat, iostat, netstat, nfsstat, and ifstat),&#13;\n# with colors.&#13;\n# Flags identify the item to check:&#13;\ndstat -d  # disk&#13;\ndstat -m  # memory, etc.&#13;\n</pre><p>Flags can also be combined to get multiple stats:</p><pre><strong>&#13;\ndstat    -dmnrs</strong>&#13;\n</pre><p>Steps to install dstat on RHEL/CentOS 5.x/6.x and Fedora 16/17/18/19/20:<br />Installing the RPMForge repository in RHEL/CentOS<br />For RHEL/CentOS 6.x 64 bit</p><pre>sudo rpm -Uvh 
http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.3-1.el6.rf.x86_64.rpm&#13;\nsudo yum repolist&#13;\nsudo yum install dstat&#13;\n</pre><p>If we want to use a Linux script to monitor our statistics, we can use cron jobs like:</p><pre><strong>sudo dstat  -drsmn --output /var/log/dstat.txt 5 3 &gt;/dev/null&#13;\n</strong></pre><pre><strong>#!/bin/bash&#13;\n# Sample every 30 s, 360 times, then mail the CSV report&#13;\ndstat -lrvn --output /tmp/dstat.csv 30 360 &gt;/dev/null&#13;\nmutt -s \"dstat report\" -a /tmp/dstat.csv -- me@me.com &lt;/dev/null</strong></pre><ul><li>load average (-l), disk IOPS (-r), vmstat (-v), and network throughput (-n)</li>\n</ul><p>Output can be displayed on a web page for monitoring.<br />Output can be piped into graphical programs like Gnumeric, Gnuplot, and Excel for visual displays.</p><ul><li>Memory should be around 95% in use (most of it in the cache memory column).</li>\n<li>CPU should show less than 1-2% iowait and 2-15% system time.</li>\n<li>Network throughput should mirror whatever the application is doing.</li>\n</ul><h4><strong>Another tool is: nodetool cfhistograms to discover disk issues:</strong></h4><p>This tool can tell us something about how the disks, including SSDs, are performing. The problem could be the JVM or the size of the RAM; there are a lot of things it could be, and disk is one of them.<br />With cfhistograms, the read latency usually shows two groups of bumps.<br />A small one for reads served from RAM and a big one for reads served from disk. Sometimes the cause is something else, like a lot of compaction. Compaction can cause disk contention, and this contention makes disk read latency go up, because compaction itself reads from the disk.<br /><strong> So, how do we diagnose that? 
How do we fix it?</strong><br />We can throttle compaction down by reducing compaction_throughput_mb_per_sec.<br />Using nodetool proxyhistograms to discover disk issues: it shows the full request latency recorded by the coordinator.</p><p><strong>Using CQL TRACING</strong> : <br />To distinguish between a slow disk response and a slow query: a slow disk response will be evident in how long it takes to access each drive.<br />If your queries need to read SSTables for too many partitions to complete, you will see this in the trace.<br />These issues will have different patterns.<br />Here we can see a lot of information, such as the number of tombstones read. We can tell whether the latency of a query is coming from the disk, and the trace shows which machine is experiencing the longer latencies.<br />Where is tracing information stored?<br />In the system_traces keyspace: the events table gives us a lot of detail about a particular query.<br />What role does disk readahead play in performance? The OS reads ahead a couple of blocks, and how many blocks it reads ahead is tunable. The problem is that in Cassandra we don’t know exactly how much data we will want to read ahead.<br />We recommend a readahead value of 8 for SSDs.<br />The command to set it is:</p><pre>blockdev --setra 8 /dev/&lt;device&gt;&#13;\n</pre><h4><strong>QUIZ:</strong></h4><p>nodetool cfhistograms shows the full request latency recorded by the coordinator. False.<br />Which of the following statements is NOT true about the readahead setting?<br />Which of the following tools can be used to monitor disk statistics? iostat, dstat, sysstat, sar.</p><h4>Disk Tuning: </h4><h4>Compaction</h4><h4>How does compaction impact performance?</h4><p>Compaction is the most I/O-intensive operation in a Cassandra cluster. 
So, the choices we make around compaction strategies and how we throttle compaction have a big impact on disk utilization in our cluster.<br />Understand the connection between the number of SSTables and compaction as it affects performance.<br />See what options are available for compaction to improve performance.<br />Understand compaction strategies: there are 3 currently (<strong>SizeTiered Compaction Strategy STCS, DateTiered Compaction Strategy DTCS, Leveled Compaction Strategy LCS</strong>).<br />For write-intensive workloads, <strong>SizeTiered Compaction Strategy STCS</strong> can be the best.<br /><strong>Leveled Compaction Strategy</strong> is generally recommended for read-heavy workloads, and only if we are using SSDs.<br /><strong>DateTiered Compaction Strategy DTCS </strong> is for time series data models.<br />To adjust compaction, look at the impact of the min/max SSTable thresholds.<br />How do tombstones affect compaction?<br />Compaction evicts tombstones and removes deleted data while consolidating multiple SSTables into one. More tombstones mean more time spent compacting SSTables.<br />Once a column is marked with a tombstone, it continues to exist until compaction permanently deletes the cell. Note that if a node is down longer than gc_grace_seconds and is then brought back online, deleted data can be replicated again: a zombie!<br />To prevent such issues, repair must be run on every node on a regular basis.</p><h4>Best practices: nodes should be repaired every 7 days when gc_grace_seconds is 10 days (the default setting)</h4><h4><strong>How does data modelling affect tombstones?</strong></h4><p>If a data model requires a lot of deletes within a partition, then a lot of tombstones are created. Tombstones identify stale data awaiting deletion, and that data has to be read until it is removed by compaction.<br />More effective data modelling will alleviate this issue. 
Ensure that your data model is more likely to delete whole partitions, rather than columns from a partition.<br />The data model has a significant impact on performance. Careful data modelling will avoid the pitfalls of rampant tombstones that affect read performance.<br />Tombstones are normal writes and do not otherwise affect write performance.<br />If you know your workload does a lot of deletes, and you discover that those deletes are hurting read performance, you probably need to fix it, for example by changing the data model or the way the workload uses it.</p><h4>Using nodetool compactionstats to investigate issues:</h4><p>This tool can be used to view compaction statistics while compaction occurs. It reports how much still needs compacting and the total amount of data being compacted.<br />By using CQL tracing to investigate issues, we can also see how many nodes and partitions are accessed.<br />The number of tombstones will be shown.<br />Read access time can be observed to decrease after a compaction completes, and to increase while a compaction is in progress.<br />Why is a durable queue an anti-pattern that can cause compaction issues?<br />A lot of people like to use Cassandra as a durable queue, but because of the tombstone problem this is an anti-pattern. What generally happens is that the workload keeps reading and deleting in the same place, so read latency keeps growing. If you need a queue, use a queue: something like Kafka, which is a perfect durable queue. Cassandra is not a tool to use as a queue.</p><h4>How do disk choices affect compaction issues?</h4><p>Compaction is the most disk-I/O-intensive operation in Cassandra, so having good disks has a positive effect on it. Conversely, slow disks can have a very detrimental effect on it. 
When compaction runs very slowly, the number of SSTables touched per read increases, and read performance can suffer.<br />Use nodetool cfhistograms to look at the read performance.<br /><strong> QUIZ:</strong><br />Which of the following compaction strategies should be used for read-heavy workloads, assuming certain hardware conditions are met? Leveled Compaction<br />What tool, command, or setting can be used to investigate issues with tombstones? CQL tracing<br />Compaction can potentially utilize not only a significant amount of disk I/O, but also disk space. True</p><h4>Disk Tuning: Easy Wins and Conclusion:</h4><p>To end this paper, we need to revisit:<br />• Performance tuning methodology<br />• Easy performance tuning wins<br />• Cassandra or environment anti-patterns</p><h4>How does this all fit together?</h4><p>1. We need to understand performance and Cassandra at a high level: general performance tuning techniques, some of the terminology, and how Cassandra itself works.<br />2. Collect performance data on the following areas, so we know where to look for data and what that data means in terms of tuning or isolating a problem:<br />• workload and data model<br />• cluster and nodes<br />• operating system and hardware<br />• disk and compaction strategies.<br />3. Parse the information gathered and begin formulating a plan:<br />• Based on metrics collected, where are the bottlenecks?<br />• What tools are available to fix issues that come up?<br />4. 
Apply solutions to any/all areas required and test solutions:<br />• Using the tools and knowledge gained, apply solutions, test the solutions applied, and start the cycle again as needed.</p><h4>Question: What was that performance tuning methodology again?</h4><p>We have:<br /><strong>Active performance tuning – suspect there’s a problem?</strong><br />• Determine if the problem is in Cassandra, the environment, or both.<br />• Isolate the problem using the tools provided.<br />• Verify problems and test for reproducibility.<br />• Fix problems using the tuning strategies provided.<br />• Test, test, and test again.<br />• Verify that your “fixes” did not introduce additional problems.<br /><strong>Passive performance tuning – regular system “sanity checks”</strong><br />• Regularly monitor key health areas in Cassandra / environment using the tools provided.<br />• Identify and tune for future growth/scalability.<br />• Apply tuning strategies as needed.<br />• Periodically apply the USE Method for a system health check.</p><h4>Easy Cassandra performance tuning wins:</h4><p>• Increase flush writers, if blocked: flush writers flush memtables to SSTables. If nodetool tpstats shows the flush writers regularly getting blocked and we only have one flush writer, which is common on systems with only one disk, increasing it to 2 can resolve the problem.<br />• Decrease concurrent compactors: we see a lot of people set their concurrent compactors too high. We recommend starting at 2 and watching for CPU saturation. 
If the CPU is saturated, we can drop it to 1, which makes compaction single-threaded.<br />• Increase concurrent reads and writes appropriately: writes are mostly bound by CPU and reads are mostly bound by disk (the kind of disks and the number of disks available), so adjust concurrent reads and writes accordingly.<br />• Nudge Cassandra to leverage the OS cache for read-based workloads: here, nudging means adding more RAM, since the more we can read from RAM, the better the performance.<br />• Increase phi_convict_threshold for cloud deployments or those with poor network connectivity.<br />• Increase compaction_throughput if disk I/O is available and compactions are falling behind: the default is 16 MB/s, so if we have a lot of disk I/O available, just increase this and compactions will complete more quickly.<br />• Increase streaming_throughput to increase the pace of streaming: the default is 200 Mbps (megabits per second), which applies when we bring a node online or run a repair; if we want to bring the node online faster, this is the parameter to increase.<br />• In terms of tuning the data model: if we have a time series data model, we can learn a lot by reading this link: <a href=\"http://planetcassandra.org/blog/getting-started-with-time-series-data-modeling/\" rel=\"nofollow\">http://planetcassandra.org/blog/getting-started-with-time-series-data-modeling/</a><br />• Avoid creating more than 500 tables in Cassandra: even when empty, each table takes at least 1 MB of heap space.<br />• Keep wide rows under 100 MB or 100000 columns: remember that the in-memory compaction limit is 64 MB by default, which is why it is a bad idea to have wide rows above 100 MB.<br />• Leverage wide rows instead of collections for high granularity items: sometimes people have partitions that contain a lot of lists; in that case, it is recommended to use clustering columns instead.<br />• Avoid data modelling hotspots by choosing a partition key that ensures read/write workload is 
spread across the cluster: try to find the right partition key, so that partitions are neither too large nor too small.<br />• Avoid tombstone build-up by leveraging append-only techniques.<br />You can also read these documents about tombstones:<br /><a href=\"http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html\" rel=\"nofollow\">http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html</a><br /><a href=\"http://www.jsravn.com/2015/05/13/cassandra-tombstones-collections.html\" rel=\"nofollow\">http://www.jsravn.com/2015/05/13/cassandra-tombstones-collections.html</a><br />You can view this video about tombstones:<br /><a href=\"https://www.youtube.com/watch?v=olTsTxpBFqc&amp;feature=youtu.be&amp;t=270\" rel=\"nofollow\">https://www.youtube.com/watch?v=olTsTxpBFqc&amp;feature=youtu.be&amp;t=270</a><br />• Use DESC sort order to minimize the impact of tombstones when queries mostly fetch recent data.<br />You can also read this document:<br /><a href=\"http://www.sestevez.com/range-tombstones/\" rel=\"nofollow\">http://www.sestevez.com/range-tombstones/</a><br />• Use inverted indexes to help where data duplication or nesting is not appropriate.<br />• Use DataStax drivers to ensure coordinator workload is spread evenly across the cluster.<br />• Use the token-aware load balancing policy: it lets requests go directly to a node that owns the data, avoiding an extra hop through a separate coordinator.<br />• Use Prepared Statements (where appropriate): if you run the same query many times, you will see a performance gain.<br />• Put OpsCenter’s database on a dedicated cluster.<br />• Size the cluster for peak anticipated workload: for example, based on load testing or benchmarking.<br />• Use a 10G network between nodes to avoid network bottlenecks.<br />• On the JVM and memory sizing side, the most important resource is RAM, followed by CPU if we want garbage collection to run faster, so ensure there is adequate RAM to keep active data in memory.<br />• Understand how heap allocation 
affects performance.<br />• Look at how the key cache affects performance in memory.<br />• Understand bloom filters and their impact on memory: we can tune the bloom filter false positive chance per table to eliminate unnecessary disk seeks.<br />• Disable swap: swap can cause problems that are very difficult to reproduce, so we have to disable it with the command:</p><pre>sudo swapoff -a</pre><p> • Comment out all swap entries in /etc/fstab with:</p><pre>sudo sed -i 's/^\\(.*swap\\)/#\\1/'  /etc/fstab</pre><p> • Look at the impact of memtables on performance: by default, memtables are allocated on the heap; the more memtables we have, the more often they are flushed to disk and the more disk I/O we generate, so it is worth understanding how many we have and how often they flush.</p><h4>What are some Cassandra/environment anti-patterns?</h4><p>1. Network attached storage: putting Cassandra on a SAN layers it on top of another storage system, even though Cassandra already has a sequential pattern of reads and writes. The advice is: don’t use a SAN; avoiding one will also be cheaper.<br />Running Cassandra on a SAN adds network latency to all operations. Bottlenecks include:<br />• Router latency<br />• Network Interface Card (NIC)<br />• NIC in the NAS device<br />2. Shared network file systems.<br />3. Excessive heap space size: this can cause very long JVM pause times, because collecting a large heap takes time.<br />• Can impair the JVM’s ability to perform fluid garbage collection.<br />4. Load balancers: don’t put load balancers between applications and Cassandra, because load balancing is built into the drivers.<br />5. 
Queues and queue-like datasets: don’t use Cassandra like a queue.<br />• Deletes do not remove rows/columns immediately.<br />• Can cause overhead with RAM/disk because of tombstones.<br />• Can affect read performance if data not modelled well.</p>","id":"09f92fe1-9ee7-57a9-90d5-775990c3f999","title":"Cassandra Operations and Performance Tuning","origin_url":"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/","url":"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/","wallabag_created_at":"2019-12-02T18:42:19+00:00","published_at":"2015-10-10T07:44:25+00:00","published_by":"['']","reading_time":53,"domain_name":"dmngaya.wordpress.com","preview_picture":"https://i0.wp.com/dmngaya.wordpress.com/wp-content/uploads/2016/07/wallpaper-nature-hd-2.jpg?fit=1200%2C675&ssl=1","tags":["cassandra","troubleshooting and tuning","performance"],"description":"In this topic, i will  cover the basics of general Apache Cassandra performance tuning: when to do performance tuning, how to avoid and identify problems, and methodologies to improve.When do you need..."},{"content":"<p id=\"f66d\" class=\"hr hs cw bi ht b hu is hw it hy iu ia iv ic iw ie\">You can enable row caching. It can avoid significant READ load on your disks. First, in cassandra.yaml, define the amount of heap space you want to dedicate to row caching. Then you can activate it per table to set how much data to fit per partition key in the row cache.</p><p id=\"b882\" class=\"hr hs cw bi ht b hu hv hw hx hy hz ia ib ic id ie\">Of course, row cache is better suited with read intensive, size limited tables. 
Eg: <a href=\"http://james.apache.org/\" class=\"fa cn jm jn jo jp\" target=\"_blank\" rel=\"noopener nofollow\">on James</a>, this perfectly fits our denormalized metadata tables.</p><p id=\"f147\" class=\"hr hs cw bi ht b hu hv hw hx hy hz ia ib ic id ie\">War story: You need to restart the node when enabling the row cache through the <strong class=\"ht ix\">row_cache_size_in_mb </strong>setting in the cassandra.yaml configuration file. nodetool will not be enough.</p><p id=\"0203\" class=\"hr hs cw bi ht b hu hv hw hx hy hz ia ib ic id ie\">Once this (annoying) configuration parameter is enabled, you can use CQL per table to enable row cache:</p><div><p id=\"7beb\" class=\"hr hs cw bi ht b hu hv hw hx hy hz ia ib ic id ie\">Reads on SSTables, and thus on disk, cannot always be avoided. In this case, we want to limit the number of I/Os as much as possible.</p><p id=\"abff\" class=\"hr hs cw bi ht b hu hv hw hx hy hz ia ib ic id ie\">Our problem is that writes to a given key might be spread across several SSTables. For instance, this is the case for tables with many updates, deletes or with clustering keys.</p><p id=\"6058\" class=\"hr hs cw bi ht b hu hv hw hx hy hz ia ib ic id ie\">Fortunately, we can trade <a href=\"http://cassandra.apache.org/doc/latest/operating/compaction.html\" class=\"fa cn jm jn jo jp\" target=\"_blank\" rel=\"noopener nofollow\">compaction</a> time and I/Os against read efficiency. The idea is to switch the compaction algorithm from <strong class=\"ht ix\">Size Tiered Compaction Strategy </strong>to <strong class=\"ht ix\">Levelled Compaction Strategy</strong>. By its structure, the Levelled Compaction Strategy limits the number of SSTables a given key can belong to. Of course, read intensive tables with updates, deletes or clustering keys will benefit a lot from this change. 
Avoid it on immutable, non-clustered tables, as you will not get collocation benefits, but will pay the costs of more expensive compactions.</p><p id=\"48cc\" class=\"hr hs cw bi ht b hu hv hw hx hy hz ia ib ic id ie\"><strong class=\"ht ix\">Date Tiered Compaction Strategy</strong> stores data written within a certain period of time in the same SSTable. It’s very useful for time series tables that use TTLs, where entries expire.</p><p id=\"410d\" class=\"hr hs cw bi ht b hu hv hw hx hy hz ia ib ic id ie\">Updating the compaction strategy of your tables can be done without downtime, at the cost of running compactions. Warning: this might consume I/O and memory, and thus decrease performance while the compaction is running.</p><p id=\"a94b\" class=\"hr hs cw bi ht b hu hv hw hx hy hz ia ib ic id ie\">You need to modify the CQL table declaration to change the compaction strategy:</p><blockquote class=\"jw jx jy\"><p id=\"9221\" class=\"hr hs cw jz ht b hu hv hw hx hy hz ia ib ic id ie\">use apache_james ;<br />ALTER TABLE modseq <br />WITH compaction = { 'class' : 'LeveledCompactionStrategy' };</p></blockquote><p id=\"d791\" class=\"hr hs cw bi ht b hu hv hw hx hy hz ia ib ic id ie\">For the changes to take effect, you need to compact the SSTables. 
To force this, you need to use nodetool:</p><blockquote class=\"jw jx jy\"><p id=\"67dd\" class=\"hr hs cw jz ht b hu hv hw hx hy hz ia ib ic id ie\">nodetool compact keyspace table</p></blockquote><p id=\"2372\" class=\"hr hs cw bi ht b hu hv hw hx hy hz ia ib ic id ie\">Be careful not to omit the keyspace or table if you do not want to trigger a global compaction…</p><p id=\"bd37\" class=\"hr hs cw bi ht b hu hv hw hx hy hz ia ib ic id ie\">To follow compaction progress on large tables, you can use:</p><blockquote class=\"jw jx jy\"><p id=\"1afa\" class=\"hr hs cw jz ht b hu hv hw hx hy hz ia ib ic id ie\">nodetool compactionstats</p></blockquote><p id=\"cdcd\" class=\"hr hs cw bi ht b hu hv hw hx hy hz ia ib ic id ie\">Our rule of thumb for estimating compaction time, with our hardware (16GB, HDD), is approximately one hour per GB stored in the table.</p><figure class=\"gp gq gr gs gt gf hl iz ck ja jb jc jd je bv jf jg jh ji jj jk paragraph-image\"><h2 id=\"b883\" class=\"kb ig cw bi au av kc kd ke kf kg kh ki kj kk kl km\">Bloom filters</h2><p id=\"5b98\" class=\"hr hs cw bi ht b hu is hw it hy iu ia iv ic iw ie\">If you have a high false positive rate, then you might consider increasing the memory dedicated to bloom filters. Again, this parameter <a href=\"http://docs.datastax.com/en/archived/cassandra/1.2/cassandra/operations/ops_tuning_bloom_filters_c.html\" class=\"fa cn jm jn jo jp\" target=\"_blank\" rel=\"noopener nofollow\">can be set per table</a>. I will not detail it here as it was not a problem for us.</p><h2 id=\"399b\" class=\"kb ig cw bi au av kc kd ke kf kg kh ki kj kk kl km\">Compression</h2><p id=\"0364\" class=\"hr hs cw bi ht b hu is hw it hy iu ia iv ic iw ie\"><a href=\"http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/operations/ops_about_config_compress_c.html\" class=\"fa cn jm jn jo jp\" target=\"_blank\" rel=\"noopener nofollow\">Compression</a> leads to less data being read and less data being written. 
If you are I/O bound, this is a nice trade-off. It turns out the default behaviour is to compress SSTables by chunks using LZ4. Of course, this can be tuned at the table level.</p><h2 id=\"118d\" class=\"kb ig cw bi au av kc kd ke kf kg kh ki kj kk kl km\"><strong class=\"bu\">Commitlog</strong></h2><p id=\"c4b2\" class=\"hr hs cw bi ht b hu is hw it hy iu ia iv ic iw ie\">An obvious tip is to store your commitlog on a different disk than the SSTables. That way, I/Os are spread across disks.</p><p id=\"5881\" class=\"hr hs cw bi ht b hu is hw it hy iu ia iv ic iw ie\">In conclusion, Cassandra offers a data model that can be optimised in detail. It requires knowing well the way Cassandra works. It also demands knowing your data model well. But significant improvements can be gained.</p><p id=\"7472\" class=\"hr hs cw bi ht b hu hv hw hx hy hz ia ib ic id ie\">By applying these configuration changes at the table level, we achieved a 3x reduction in read latencies on some of our read intensive tables. We have a 62% row cache hit rate. And our Cassandra now seems to handle read load better. This tuning session has been both instructive and promising. We have even identified new room for improvement, for example on blob storage. Note that with a bad schema, good performance cannot be achieved.</p><figure class=\"gp gq gr gs gt gf hl iz ck ja jb jc jd je bv jf jg jh ji jj jk paragraph-image\"><figcaption class=\"bm ey hm hn ho do dm dn hp hq au ex\">You don’t want your RAM to end up like this!</figcaption></figure><p id=\"a6e4\" class=\"hr hs cw bi ht b hu hv hw hx hy hz ia ib ic id ie\">Also, please note that as Cassandra is a JVM application, single node performance is also impacted by your <strong class=\"ht ix\">Garbage Collection </strong>settings. We decided not to cover this aspect of Cassandra configuration, as we did not have enough free memory to switch to the G1 garbage collector, and we would have ended up describing minor settings. 
You can read this <a href=\"https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html\" class=\"fa cn jm jn jo jp\" target=\"_blank\" rel=\"noopener nofollow\">blog post</a>, which covers the topic pretty well.</p></figure></div>","id":"9a05fe53-d7a3-52f3-8936-ba3a25f01f25","title":"Tuning Cassandra performances","origin_url":"https://medium.com/linagora-engineering/tunning-cassandra-performances-7d8fa31627e3","url":"https://medium.com/linagora-engineering/tunning-cassandra-performances-7d8fa31627e3","wallabag_created_at":"2019-12-02T14:56:48+00:00","published_at":"2017-10-09T01:41:36+00:00","published_by":"['']","reading_time":3,"domain_name":"medium.com","preview_picture":"https://miro.medium.com/v2/resize:fit:1200/1*IHRDtx285TH0clzHAkQIzQ.jpeg","tags":["cassandra","troubleshooting and tuning","performance"],"description":"You can enable row caching. It can avoid significant READ load on your disks. First, in cassandra.yaml, define the amount of heap space you want to dedicate to row caching. Then you can activate it pe..."},{"content":"<pre class=\"language-bash\" data-lang=\"bash\">$ bin/tlp-stress\nUsage: tlp-stress [options] [command] [command options]\n  Options:\n    --help, -h\n      Shows this help.\n      Default: false\n  Commands:\n    run      Run a tlp-stress profile\n      Usage: run [options]\n        Options:\n          --cl\n            Consistency level for reads/writes (Defaults to LOCAL_ONE).\n            Default: LOCAL_ONE\n            Possible Values: [ANY, ONE, TWO, THREE, QUORUM, ALL, LOCAL_QUORUM, EACH_QUORUM, SERIAL, LOCAL_SERIAL, LOCAL_ONE]\n          --compaction\n            Compaction option to use.  Double quotes will auto convert to\n            single for convenience.  A shorthand is also available: stcs, lcs,\n            twcs.  
See the full documentation for all possibilities.\n            Default: &lt;empty string&gt;\n          --compression\n            Compression options\n            Default: &lt;empty string&gt;\n          --concurrency, -c\n            Concurrent queries allowed.  Increase for larger clusters.\n            Default: 100\n          --coordinatoronly, --co\n            Coordinator only made.  This will cause tlp-stress to round robin\n            between nodes without tokens.  Requires using -Djoin_ring=false in\n            cassandra-env.sh.  When using this option you must only provide a\n            coordinator to --host.\n            Default: false\n          --cql\n            Additional CQL to run after the schema is created.  Use for DDL\n            modifications such as creating indexes.\n            Default: []\n          --csv\n            Write metrics to this file in CSV format.\n            Default: &lt;empty string&gt;\n          --dc\n            The data center to which requests should be sent\n            Default: &lt;empty string&gt;\n          --drop\n            Drop the keyspace before starting.\n            Default: false\n          --duration, -d\n            Duration of the stress test.  Expressed in format 1d 3h 15m\n            Default: 0\n          --field.\n            Override a field's data generator.  Example usage:\n            --field.tablename.fieldname='book(100,200)'\n            Syntax: --field.key=value\n            Default: {}\n          -h, --help\n            Show this help\n          --host\n            Default: 127.0.0.1\n          --id\n            Identifier for this run, will be used in partition keys.  
Make\n            unique for when starting concurrent runners.\n            Default: 001\n          --iterations, -i, -n\n            Number of operations to run.\n            Default: 0\n          --keycache\n            Key cache setting\n            Default: ALL\n          --keyspace\n            Keyspace to use\n            Default: tlp_stress\n          --paging\n            Override the driver's default page size.\n          --partitiongenerator, --pg\n            Method of generating partition keys.  Supports random, normal\n            (gaussian), and sequence.\n            Default: random\n          --partitions, -p\n            Max value of integer component of first partition key.\n            Default: 1000000\n          --password, -P\n            Default: cassandra\n          --populate\n            Pre-population the DB with N rows before starting load test.\n            Default: 0\n          --port\n            Override the cql port. Defaults to 9042.\n            Default: 9042\n          --prometheusport\n            Override the default prometheus port.\n            Default: 9500\n          --rate\n            Rate limiter, accepts human numbers. 0 = disabled\n            Default: 0\n          --readrate, --reads, -r\n            Read Rate, 0-1.  Workloads may have their own defaults.  
Default\n            is dependent on workload.\n          --replication\n            Replication options\n            Default: {'class': 'SimpleStrategy', 'replication_factor':3 }\n          --rowcache\n            Row cache setting\n            Default: NONE\n          --ssl\n            Enable SSL\n            Default: false\n          --threads, -t\n            Threads to run\n            Default: 1\n          --ttl\n            Table level TTL, 0 to disable.\n            Default: 0\n          --username, -U\n            Default: cassandra\n          --workload., -w.\n            Override workload specific parameters.\n            Syntax: --workload.key=value\n            Default: {}\n    info      Get details of a specific workload.\n      Usage: info\n    list      List all workloads.\n      Usage: list\n    fields      null\n      Usage: fields</pre>","id":"7c01b35f-b29c-5158-9c81-3d8c5d4df5f4","title":"tlp-stress","origin_url":"http://thelastpickle.com/tlp-stress/","url":"http://thelastpickle.com/tlp-stress/","wallabag_created_at":"2019-08-16T16:08:58+00:00","published_at":null,"published_by":null,"reading_time":2,"domain_name":"thelastpickle.com","preview_picture":null,"tags":["stress","performance","cassandra","cassandra.stress"],"description":"$ bin/tlp-stress\nUsage: tlp-stress [options] [command] [command options]\n  Options:\n    --help, -h\n      Shows this help.\n      Default: false\n  Commands:\n    run      Run a tlp-stress profile\n      U..."},{"content":"<iframe id=\"video\" width=\"480\" height=\"270\" src=\"https://www.youtube.com/embed/swL7bCnolkU?feature=oembed\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\">[embedded content]</iframe>","id":"a12aecc1-b8d3-50a9-8725-d487a9d21f96","title":"10 Easy Ways to Tune Your Cassandra Cluster with John Haddad | DataStax Accelerate 
2019","origin_url":"http://www.youtube.com/oembed?format=xml&url=https://www.youtube.com/watch?v=swL7bCnolkU","url":"http://www.youtube.com/oembed?format=xml&url=https://www.youtube.com/watch?v=swL7bCnolkU","wallabag_created_at":"2019-08-16T15:56:52+00:00","published_at":null,"published_by":null,"reading_time":null,"domain_name":"www.youtube.com","preview_picture":"https://i.ytimg.com/vi/swL7bCnolkU/maxresdefault.jpg","tags":["cassandra","performance","video"],"description":"[embedded content]"},{"content":"<p><em>This post is part 2 of a 3-part series about monitoring Apache Cassandra. <a href=\"https://www.datadoghq.com/blog/how-to-monitor-cassandra-performance-metrics/\">Part 1</a> is about the key performance metrics available from Cassandra, and <a href=\"https://www.datadoghq.com/blog/monitoring-cassandra-with-datadog/\">Part 3</a> details how to monitor Cassandra with Datadog.</em></p><p>If you’ve already read <a href=\"https://www.datadoghq.com/blog/how-to-monitor-cassandra-performance-metrics/\">our guide</a> to key Cassandra metrics, you’ve seen that Cassandra provides a vast array of metrics on performance and resource utilization, which are available in a number of different ways. This post covers several different options for collecting Cassandra metrics, depending on your needs.</p><p>Like Solr, Tomcat, and other Java applications, Cassandra exposes metrics on availability and performance via JMX (Java Management Extensions). Since version 1.1, Cassandra’s metrics have been based on Coda Hale’s popular <a href=\"https://github.com/dropwizard/metrics\">Metrics library</a>, for which there are numerous integrations with graphing and monitoring tools. 
There are at least three ways to view and monitor Cassandra metrics, from lightweight but limited utilities to full-featured, hosted services:</p><ul><li><a href=\"#collecting-metrics-with-nodetool\">nodetool</a>, a command-line interface that ships with Cassandra</li><li><a href=\"#collecting-metrics-with-jconsole\">JConsole</a>, a GUI that ships with the Java Development Kit (JDK)</li><li><a href=\"#collecting-metrics-via-jmxmetrics-integrations\">JMX/Metrics integrations</a> with external graphing and monitoring tools and services</li></ul><p>Nodetool is a command-line utility for managing and monitoring a Cassandra cluster. It can be used to manually trigger compactions, to flush data in memory to disk, or to set parameters such as cache size and compaction thresholds. It also has several commands that return simple node and cluster metrics that can provide a quick snapshot of your cluster’s health. Nodetool ships with Cassandra and appears in Cassandra’s <code>bin</code> directory.</p><p>Running <code>bin/nodetool</code> status from the directory where you installed Cassandra outputs an overview of the cluster, including the current <strong>load</strong> on each node and whether the individual nodes are up or down:</p><pre>$ bin/nodetool status\nDatacenter: datacenter1\n=======================\nStatus=Up/Down\n|/ State=Normal/Leaving/Joining/Moving\n--  Address    Load       Owns    Host ID   Token                  Rack\nUN  127.0.0.1  14.76 MB   66.7%   9e524995  -9223372036854775808   rack1\nUN  127.0.0.1  14.03 MB   66.7%   12e12ead  -3074457345618258603   rack1\nUN  127.0.0.1  13.92 MB   66.7%   44387d08   3074457345618258602   rack1\n</pre><p><code>nodetool info</code> outputs slightly more detailed statistics for an individual node in the cluster, including uptime, <strong>load</strong>, <strong>key cache hit rate</strong>, and a total count of all <strong>exceptions</strong>. 
You can specify which node you’d like to inspect by using the <code>--host</code> argument with an IP address or hostname:</p><pre>$ bin/nodetool --host 127.0.0.1 info \nID                     : 9aa4fe41-c9a8-43bb-990a-4a6192b3b46d\nGossip active          : true\nThrift active          : false\nNative Transport active: true\nLoad                   : 14.76 MB\nGeneration No          : 1449113333\nUptime (seconds)       : 527\nHeap Memory (MB)       : 158.50 / 495.00\nOff Heap Memory (MB)   : 0.07\nData Center            : datacenter1\nRack                   : rack1\nExceptions             : 0\nKey Cache              : entries 26, size 2.08 KB, capacity 24 MB, 87 hits, 122 requests, 0.713 recent hit rate, 14400 save period in seconds\nRow Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds\nCounter Cache          : entries 0, size 0 bytes, capacity 12 MB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds\nToken                  : -9223372036854775808\n</pre><p><code>nodetool cfstats</code> provides statistics on each keyspace and column family (akin to databases and database tables, respectively), including <strong>read latency</strong>, <strong>write latency</strong>, and <strong>total disk space used</strong>. 
By default nodetool prints statistics on all keyspaces and column families, but you can limit the query to a single keyspace by appending the name of the keyspace to the command:</p><pre>$ bin/nodetool cfstats demo\nKeyspace: demo\n    Read Count: 4\n    Read Latency: 1.386 ms.\n    Write Count: 4\n    Write Latency: 0.71675 ms.\n    Pending Flushes: 0\n        Table: users\n        SSTable count: 3\n        Space used (live), bytes: 16178\n        Space used (total), bytes: 16261\n        ...\n        Local read count: 4\n        Local read latency: 1.153 ms\n        Local write count: 4\n        Local write latency: 0.224 ms\n        Pending flushes: 0\n        ...\n</pre><p><code>nodetool compactionstats</code> <a href=\"https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCompactionStats.html\">shows</a> the compactions in progress as well as a count of <strong>pending compaction tasks</strong>.</p><pre>$ bin/nodetool compactionstats\npending tasks: 5\n          compaction type        keyspace           table       completed           total      unit  progress\n               Compaction       Keyspace1       Standard1       282310680       302170540     bytes    93.43%\n               Compaction       Keyspace1       Standard1        58457931       307520780     bytes    19.01%\nActive compaction remaining time :   0h00m16s\n</pre><p><code>nodetool gcstats</code> returns statistics on garbage collections, including total <strong>number of collections</strong> and <strong>elapsed time</strong> (both the total and the max elapsed time). 
The counters are reset each time the command is issued, so the statistics correspond only to the interval between <code>gcstats</code> commands.</p><pre>$ bin/nodetool gcstats\nInterval (ms)  Max GC Elapsed (ms)  Total GC Elapsed (ms)  Stdev GC Elapsed (ms)  GC Reclaimed (MB)  Collections  Direct Memory Bytes\n     73540574                   64                    595                      7         3467143560           83             67661338\n</pre><p><code>nodetool tpstats</code> provides usage statistics on Cassandra’s thread pool, including <strong>pending tasks</strong> as well as current and historical <strong>blocked tasks</strong>.</p><pre>$ bin/nodetool tpstats\nPool Name                    Active   Pending      Completed   Blocked  All time blocked\nReadStage                         0         0          11801         0                 0\nMutationStage                     0         0         125405         0                 0\nCounterMutationStage              0         0              0         0                 0\nGossipStage                       0         0              0         0                 0\nRequestResponseStage              0         0              0         0                 0\nAntiEntropyStage                  0         0              0         0                 0\nMigrationStage                    0         0             10         0                 0\nMiscStage                         0         0              0         0                 0\nInternalResponseStage             0         0              0         0                 0\nReadRepairStage                   0         0              0         0                 0\n</pre><h2 id=\"collecting-metrics-with-jconsole\">Collecting metrics with JConsole</h2><p>JConsole is a simple Java GUI that ships with the Java Development Kit (JDK). It provides an interface for exploring the full range of metrics Cassandra provides via JMX. 
If the JDK was installed to a directory in your system path, you can start JConsole simply by running:</p><pre>jconsole\n</pre><p>Otherwise it can be found in <code>your_JDK_install_dir/bin</code></p><p>To pull up metrics in JConsole, you can select the relevant local process or monitor a remote process using the node’s IP address (Cassandra uses port 7199 for JMX by default):</p><div class=\"shortcode-wrapper shortcode-img expand\"><figure class=\"\"><a href=\"https://datadog-prod.imgix.net/img/blog/how-to-collect-cassandra-metrics/jconsole-3.png?fit=max\" class=\"pop\"><img class=\"lazyload\" srcset=\"https://datadog-prod.imgix.net/img/blog/how-to-collect-cassandra-metrics/jconsole-3.png?auto=format&amp;fit=max&amp;w=847\" alt=\"cassandra metrics\" src=\"https://www.datadoghq.com/blog/how-to-collect-cassandra-metrics/src\" /></a></figure></div><p>The MBeans tab brings up all the JMX paths available:</p><div class=\"shortcode-wrapper shortcode-img expand\"><figure class=\"\"><a href=\"https://datadog-prod.imgix.net/img/blog/how-to-collect-cassandra-metrics/jmx-metrics.png?fit=max\" class=\"pop\"><img class=\"lazyload\" srcset=\"https://datadog-prod.imgix.net/img/blog/how-to-collect-cassandra-metrics/jmx-metrics.png?auto=format&amp;fit=max&amp;w=847\" alt=\"cassandra metrics\" src=\"https://www.datadoghq.com/blog/how-to-collect-cassandra-metrics/src\" /></a></figure></div><p>Out of the box, <code>org.apache.cassandra.metrics</code> (based on the <a href=\"https://github.com/dropwizard/metrics\">Metrics</a> library) provides almost all of the metrics that you need to monitor a Cassandra cluster. (See the first footnote on the table below for exceptions.) Prior to Cassandra 2.2, many identical or similar metrics were also available via alternate JMX paths (<code>org.apache.cassandra.db</code>, <code>org.apache.cassandra.internal</code>, etc.), which, while still usable in some versions, reflect an older structure that has been deprecated. 
Below are modern JMX paths, which mirror the JConsole interface’s folder structure, for the key metrics described in this article:</p><table><thead></thead><tbody><tr class=\"odd\"><td>Throughput (writes|reads)</td><td><code>org.apache.cassandra.metrics:</code><br /><code>type=ClientRequest,scope=(Write|Read),name=Latency</code><br /><code>Attribute: OneMinuteRate</code></td></tr><tr class=\"even\"><td>Latency (writes|reads)*</td><td><code>org.apache.cassandra.metrics:</code><br /><code>type=ClientRequest,scope=(Write|Read),name=TotalLatency</code><br /><code>Attribute: Count</code><p><code>org.apache.cassandra.metrics:</code><br /><code>type=ClientRequest,scope=(Write|Read),name=Latency</code><br /><code>Attribute: Count</code></p></td></tr><tr class=\"odd\"><td>Key cache hit rate*</td><td><code>org.apache.cassandra.metrics:</code><br /><code>type=Cache,scope=KeyCache,name=Hits</code><br /><code>Attribute: Count</code><p><code>org.apache.cassandra.metrics:</code><br /><code>type=Cache,scope=KeyCache,name=Requests</code><br /><code>Attribute: Count</code></p></td></tr><tr class=\"even\"><td>Load</td><td><code>org.apache.cassandra.metrics:</code><br /><code>type=Storage,name=Load</code><br /><code>Attribute: Count</code></td></tr><tr class=\"odd\"><td>Total disk space used</td><td><code>org.apache.cassandra.metrics:</code><br /><code>type=ColumnFamily,keyspace=(KeyspaceName),scope=(ColumnFamilyName),name=TotalDiskSpaceUsed</code><br /><code>Attribute: Count</code></td></tr><tr class=\"even\"><td>Completed compaction tasks</td><td><code>org.apache.cassandra.metrics:</code><br /><code>type=Compaction,name=CompletedTasks</code><br /><code>Attribute: Value</code></td></tr><tr class=\"odd\"><td>Pending compaction tasks</td><td><code>org.apache.cassandra.metrics:</code><br /><code>type=Compaction,name=PendingTasks</code><br /><code>Attribute: Value</code></td></tr><tr class=\"even\"><td>ParNew garbage collections (count|time)</td><td><code>java.lang:</code><br 
/><code>type=GarbageCollector,name=ParNew</code><br /><code>Attribute: (CollectionCount|CollectionTime)</code></td></tr><tr class=\"odd\"><td>CMS garbage collections (count|time)</td><td><code>java.lang:</code><br /><code>type=GarbageCollector,name=ConcurrentMarkSweep</code><br /><code>Attribute: (CollectionCount|CollectionTime)</code></td></tr><tr class=\"even\"><td>Exceptions</td><td><code>org.apache.cassandra.metrics:</code><br /><code>type=Storage,name=Exceptions</code><br /><code>Attribute: Count</code></td></tr><tr class=\"odd\"><td>Timeout exceptions (writes|reads)</td><td><code>org.apache.cassandra.metrics:</code><br /><code>type=ClientRequest,scope=(Write|Read),name=Timeouts</code><br /><code>Attribute: Count</code></td></tr><tr class=\"even\"><td>Unavailable exceptions (writes|reads)</td><td><code>org.apache.cassandra.metrics:</code><br /><code>type=ClientRequest,scope=(Write|Read),name=Unavailables</code><br /><code>Attribute: Count</code></td></tr><tr class=\"odd\"><td>Pending tasks (per stage)**</td><td><code>org.apache.cassandra.metrics:</code><br /><code>type=ThreadPools,path=request,scope=(CounterMutationStage|MutationStage|ReadRepairStage|ReadStage|RequestResponseStage), name=PendingTasks</code><br /><code>Attribute: Value</code></td></tr><tr class=\"even\"><td>Currently blocked tasks**</td><td><code>org.apache.cassandra.metrics:</code><br /><code>type=ThreadPools,path=request,scope=(CounterMutationStage|MutationStage|ReadRepairStage|ReadStage|RequestResponseStage), name=CurrentlyBlockedTasks</code><br /><code>Attribute name: Count</code></td></tr></tbody></table><p>* The metrics needed to monitor recent latency and key cache hit rate are available in JConsole, but must be calculated from two separate metrics. For read latency, to give an example, the relevant metrics are ReadTotalLatency (cumulative read latency total, in microseconds) and the “Count” attribute of ReadLatency (the number of read events). 
For two readings at times 0 and 1, the recent read latency would be calculated from the deltas of those two metrics:</p><pre>(ReadTotalLatency1−ReadTotalLatency0)/(ReadLatency1−ReadLatency0)\n</pre><p>** There are five different request stages in Cassandra, plus roughly a dozen internal stages, each with its own thread pool metrics.</p><h2 id=\"collecting-metrics-via-jmx-metrics-integrations\">Collecting metrics via JMX/Metrics integrations</h2><p>Nodetool and JConsole are both lightweight and can provide metrics snapshots very quickly, but neither is well suited to the kinds of big-picture questions that arise in a production environment: What are the long-term trends for my metrics? Are there any large-scale patterns I should be aware of? Do changes in performance metrics tend to correlate with actions or events elsewhere in my environment?</p><p>To answer these kinds of questions, you need a more sophisticated monitoring system. The good news is, virtually every major monitoring service and tool supports Cassandra monitoring, whether via <a href=\"http://docs.datadoghq.com/integrations/java/\">JMX</a> plugins; via pluggable <a href=\"http://wiki.apache.org/cassandra/Metrics#Reporting\">Metrics reporter libraries</a>; or via <a href=\"https://github.com/jmxtrans/jmxtrans\">connectors</a> that write JMX metrics out to StatsD, Graphite, or other systems.</p><p>The configuration steps depend greatly on the particular monitoring tools you choose, but both JMX and Metrics expose Cassandra metrics using the taxonomy outlined in the table of JMX paths above.</p><h2>Conclusion</h2><p>In this post we have covered a few of the ways to access Cassandra metrics using simple, lightweight tools. 
For production-ready monitoring, you will likely want a more powerful monitoring system that ingests Cassandra metrics as well as key metrics from all the other technologies in your stack.</p><p>At Datadog, we have developed a Cassandra integration so that you can start collecting, graphing, and alerting on metrics from your cluster with a minimum of overhead. For more details, check out our guide to <a href=\"https://www.datadoghq.com/blog/monitoring-cassandra-with-datadog/\">monitoring Cassandra metrics with Datadog</a>, or get started right away with a <a href=\"#\" class=\"sign-up-trigger\">free trial</a>.</p><hr /><p><em>Source Markdown for this post is available <a href=\"https://github.com/DataDog/the-monitor/blob/master/cassandra/how_to_collect_cassandra_metrics.md\">on GitHub</a>. Questions, corrections, additions, etc.? Please <a href=\"https://github.com/DataDog/the-monitor/issues\">let us know</a>.</em></p>","id":"758bc758-7efe-5f1c-9dc6-a58eea511aab","title":"How to collect Cassandra metrics","origin_url":"https://www.datadoghq.com/blog/how-to-collect-cassandra-metrics/","url":"https://www.datadoghq.com/blog/how-to-collect-cassandra-metrics/","wallabag_created_at":"2017-01-10T15:31:58+00:00","published_at":"2015-12-03T00:00:00+00:00","published_by":"['']","reading_time":8,"domain_name":"www.datadoghq.com","preview_picture":"https://web-assets.dd-static.net/42588/1776356135-how-to-collect-cassandra-metrics-cassandra-blog-2.png","tags":["cassandra","metrics","performance"],"description":"This post is part 2 of a 3-part series about monitoring Apache Cassandra. Part 1 is about the key performance metrics available from Cassandra, and Part 3 details how to monitor Cassandra with Datadog..."},{"content":"<p>Amy's Cassandra 2.1 Tuning Guide (2015)</p><p>I really appreciate all the folks who have told me that this guide helped them\nin some way. 
I'm happy to hear that.</p><p>I've pushed this small update to change my name from Albert to Amy and haven't\nchanged anything else at this point. I'm also leaving the URL the same because\nsome folks have it bookmarked and I don't mind my old name being around so long\nas folks call me Amy from now on :)</p><ul><li>Jr. Systems Administrator level Linux CLI skills</li>\n<li>familiarity with Cassandra 2.1 and/or Datastax Enterprise 4.7</li>\n<li>basic statistics</li>\n</ul><p>This guide is not intended to be complete and focuses on techniques I've used\nto track down performance issues on production clusters.</p><p>This version of the guide has not had a lot of peer review, so there may be some\nmistakes as well as things I'm just outright wrong about.</p><p>If you find any errors, please (really!) submit an issue at\n<a href=\"https://github.com/tobert/tobert.github.io\">https://github.com/tobert/tobert.github.io</a>.</p><p>Observation is a critical skill to develop as a systems administrator. A wide\nvariety of tools are available for observing systems in different ways. Many of\nthem use the same few system metrics in different ways to provide you with a\nview into your system. Understanding low-level metrics (e.g. /proc/vmstat)\nallows you to better reason about higher-level displays such as OpsCenter\ngraphs.</p><p>It's important to remember that most metrics we consume are some kind of\naggregate; computers today are so fast that precise bookkeeping is too expensive\nto do all of the time. The critical implication of this is that we have to read\nbetween the lines; averages lie and the larger the sample is, the larger the\nlie.</p><p>Monitoring systems are tempting choices for gathering performance metrics, but\nthey usually end up having to trade off resolution for economy of storage and\nrarely have a resolution higher than 1 sample/minute. Low-resolution metrics are\ncertainly useful for capacity planning but useless for performance tuning. 
One\nexception is collectd + related tooling configured at a 10s resolution by\ndefault. This is much better than average but not good enough for all\nsituations, but still check it out.</p><h2>dstat</h2><p>dstat is by far my favorite tool for observing performance metrics on a Linux\nmachine. It provides the features of many tools all in one with high-resolution\n(1s). My go-to command to start it is:</p><p>dstat -lrvn 10</p><p><img src=\"https://tobert.github.io/pages/image_1.png\" alt=\"image alt text\" /></p><p>That runs dstat with the load average (-l), disk IOPS (-r), vmstats (-v), and\nnetwork throughput (-n). The 10 tells dstat to keep updating the current line\nevery second and roll to a new line every 10 seconds. It can do fancier things like\nreport per-disk metrics and per-network interface, not to mention all of the\nvarious plugins. Dig in. This is the most useful stats tool on the Linux command\nline. If you ask me for help, be prepared to get screenshots of a few minutes'\nactivity like the one above.</p><p>Reading the Matrix: I leave dstat running inside GNU Screen (or tmux if you\nprefer) pretty much all the time I'm connected to a cluster. Even when there\naren't problems. Maybe especially then. While running benchmarks or production\nload, I'll flip through my screens (ctrl-a n) and glance at the dstat output.\nOnce it has been running for a little while, the whole terminal should be full.\nWhat I'm looking for is vertical consistency (or lack thereof) and outliers.  On\na fully warmed-up system, memory should be around 95% in-use, with most of it in\nthe cache column.  CPUs should be in use with no more than 1-2% of iowait and\n2-15% system time. The network throughput should mirror whatever the application\nis doing, so if it's cassandra-stress, it should be steady.  If it's a Hadoop\njob writing in big batches, I'll expect big spikes. 
Think through the workload\nand learn what to expect <em>and</em> learn what is normal.</p><h3>reading vmstats</h3><p>I am a huge fan of dstat. Back in the Bad Old Days, I had to switch between 3-4\ntools to get an idea of how a system was doing. With dstat I can see all the\nkey metrics in one view and that is important. With a snapshot every 10\nseconds, a full-screen terminal can show me the last few minutes of data,\nmaking it easy to scan the columns visually for patterns and anomalies.\nThis is plain old pattern recognition most of the time; even without knowing\nwhat the stats mean, you should be able to correlate changes in system stats\nwith changes in the client workload. For practice, fire up dstat on an idle\nCassandra node, then fire up a simple cassandra-stress load. The changes in\nmetrics over the first couple minutes of load are instructive. How long does\nit take to level out? Does it level out or are metrics swinging wildly?\nOnce the patterns are identified, then it's time to understand what's happening\nunder the hood. Going from left to right of my usual dstat -vn 10:</p><h4>run/blocked/new tasks</h4><p>These show how many processes or threads were running/blocked/created\nduring the sample window. run=~n_cores is ideal for ROI/efficiency,\nbut makes a lot of admins nervous. run=2*n_cores isn't necessarily bad.\nrun&gt;cores is healthy with some head room. Any blocked processes are\nconsidered bad and you should immediately look at the iowait %.\n1-2% iowait isn't necessarily a problem, but it usually points at\nstorage as a bottleneck.</p><h4>memory</h4><p>Memory is perhaps the easiest. Used will usually be your heap +\noffheap + ~500MB. If it's significantly higher, find out where that memory\nwent! 
Buffers should be a couple hundred MB, rarely more than a gigabyte.\nCache should account for almost all the remaining memory if data &gt; RAM.\nFree memory should be in the 200-300MB range unless the working data size is\nsmaller than installed memory.</p><h4>swap</h4><p>dstat shows the swap columns along with memory. These should always be zeroes.\nAny swap activity whatsoever is a source of hiccups and must be eliminated\nbefore anything else.</p><h4>disk throughput</h4><p>This tells you how many bytes are going in and out of the storage every second.\nIt is fairly accurate and tends to be more useful than IOPS. There isn't a\nstandard range here. Try saturation testing the cluster with large objects to\nfind out what the high end is, then try to size the cluster/tuning to keep it\n25% or so below peak load, or more if there is a tight SLA. You will often see\nactivity on the disks after a load test has completed and that's most likely\ncompaction.</p><h4>interrupts (int) &amp; context switches (ctx)</h4><p>An interrupt occurs when a device needs the CPU to do something, such as pick\nup data from the network or disks. Context switches occur when the kernel has\nto switch out a task on the CPU. Most interrupts are tied to a ctx, which is\nwhy ctx is almost always &gt; interrupts. On a healthy system, ctx should be\n20-40% higher than interrupts. If it's a lot higher, take a look at system\ncall counts with strace. futex is almost always the top call and indicates\na high amount of lock contention.</p><h4>user/system/idle/wait/hiq/siq CPU time</h4><p>User time is CPU time used to run userland processes, i.e. JVM threads.\nSystem time is time spent in kernel code. Idle time is exactly as it sounds.\nhiq/siq are for time spent processing hardware and soft interrupts. These\nare usually zeroes but you'll occasionally see siq time under heavy load.\nA few % is fine. 
Any more indicates a serious problem with the kernel\nand/or hardware.</p><h4>network send/recv</h4><p>Shows the amount of data flowing over the network.\nI can saturate 10gig links with Cassandra 2.1 and large objects, so\nthis is more important to look at than it has been in the past. This shows\nhow much data is flowing over CQL and storage protocols.\nThe differential between network / disk io / CPU usage gives a good picture\nof how efficiently a system is running. I do this in my head….</p><h2>sjk-plus</h2><p><a href=\"https://github.com/aragozin/jvm-tools\">https://github.com/aragozin/jvm-tools</a>\nis the newest addition to the stable and has one tool in particular that is\nuseful for DSE: ttop a.k.a. \"thread top\", which is exactly what it sounds like.\nWhile top and htop are useful for looking at processes, they cannot tell which JVM\nthread is which and that's where ttop comes in. It's a single jar so it's easy to\npush to a machine and does not require any GUI so it works fine over ssh. So far,\nI've found the following command to be the most useful:</p><p><img src=\"https://tobert.github.io/pages/image_2.png\" alt=\"image alt text\" /></p><p>Which threads trend towards the top is workload-dependent. Most threads in DSE are\nnamed except for all the stuff in the shared pool, which is a mess of things\nunfortunately. On the upside, in my observations of clusters under load they are\nnot a common source of problems.</p><p>The real killer feature is the heap allocation rates, which are directly correlatable\nto GC work. Try setting -o ALLOC instead of CPU to see which threads are putting\npressure on GC.</p><h2>htop</h2><p>Most Linux users are familiar with the 'top' command but it's fairly limited in\nwhat it can display and doesn't look as nice. htop has a nice default display\nthat shows per-core load and all the threads. 
It can also be configured by hitting\nF2, which is occasionally handy when you want to sort by specific fields or\ndisplay something outside the defaults, e.g. soft page faults.</p><p>In the following screenshot, system time is red, nice time is blue, and user\ntime is green. This may not be consistent across terminals or themes. To see\nthe legend, hit the 'h' key.</p><p>@phact recommends enabling \"[x] Detailed CPU time\"\nunder Setup (F2) to make things like iowait and steal time visible.\nWithout the option, they appear to be unused capacity, which is misleading.</p><p><img src=\"https://tobert.github.io/pages/image_3.png\" alt=\"image alt text\" /></p><h2>powertop and i7z</h2><p>Powertop is not often useful, but is worth checking at least once if you're\nseeing odd fluctuations in performance on bare metal. What you're looking for is\nthe second tab, so start it up with 'sudo powertop' then hit tab once to get to\nthe \"Idle stats\" tab. This will show you the \"C-states\" of the processors and\nhow much time is being spent in them. In short, the higher the C-state number,\nthe higher the latency cost to come out of it. C1E-HSW is cheaper than C3-HSW\nand so on. A lot of articles about latency tuning recommend disabling C-states\naltogether, but I don't think this is responsible or necessary. You probably\nwant to disable the deeper C-states (&gt;3) for Cassandra nodes that will be\nbusy. The power management code in the kernel should handle the rest.</p><p>The next tab is \"Frequency stats\". This will show you if frequency scaling is\nenabled on the CPU. In general, frequency scaling should never be enabled on\nCassandra servers. Even if it \"works\" (and it does), it makes the TSC clock\n(the fastest one) unstable, causing more clock drift, which is bad for\nCassandra systems. 
Modern Xeon and AMD processors have specializations for\nmaking the TSC more stable, so it's worth measuring before you give up on it.</p><p>The next couple tabs aren't very useful for Cassandra work, but the last\none is in a non-intuitive way. You want all of the items in the \"Tunables\"\ntab to say \"Bad!\" Power management and high performance are almost always\nat odds. You can safely ignore devices that aren't in use by Cassandra,\ne.g. USB controllers and sound devices.</p><p><img src=\"https://tobert.github.io/pages/image_5.png\" alt=\"image alt text\" /></p><p>i7z is an alternative to powertop that was brought to my attention but\nI have not tried it. <a href=\"https://code.google.com/p/i7z/\">https://code.google.com/p/i7z/</a></p><h2>/proc</h2><p>Most of the stats displayed by the tools already discussed come from text files\nin /proc. dstat reads /proc/loadavg for the load average, /proc/stat for the\nVM stats, /proc/diskstats for disk IO, and so on. A few files are really handy\nfor quickly checking things that aren't exported by the usual tools.</p><p>For example, hugepages. In my experience, transparent hugepages aren't\n(usually) a problem for Cassandra like they are for many other databases.\nThat said, you need to be able to know if your JVM is using them or not,\nso the easy thing to do is:</p><p><img src=\"https://tobert.github.io/pages/image_6.png\" alt=\"image alt text\" /></p><p>If the AnonHugePages is slightly larger than your heap, you're all set with THP.\nIf you thought you disabled it, restart Cassandra again to get back to 4K pages.</p><p>Pro tip: you can make a top-like view of /proc files using the watch(1) command, e.g.</p><pre>watch grep Dirty /proc/meminfo\n</pre><p>/proc/interrupts is useful for figuring out which CPUs are handling IO. It is often\ntoo large to display on a screen, so a little awk or scripting may be in order to\nget it down to size. 
For quick checks, simply cat'ing the file will do.</p><h2>dmesg</h2><p>dumps the kernel's error buffer to stdout. Sometimes you find things really quickly,\nsometimes not. I almost always look at it on both healthy and problematic systems to\nsee if anything is going on. For example, if an application has been OOM killed,\nthere will be a detailed report in the dmesg log. The error buffer is a\nstatically-sized ring, so sometimes when things are really hairy the important\ninformation will get scrolled off, forcing you to grep around in /var/log.</p><h2>strace</h2><p>a.k.a. the life saver. Strace has helped me discover obscure failures more than\nany other tool in my toolbox. The simplest usage of strace involves printing\nout every system call made by a process. Idle Cassandra systems make a huge\nnumber of system calls, so filtering with the -e flag is highly recommended.\nIt can fork a process for tracing by prefixing a command (e.g. strace ls) or\nyou can attach to a running process with -p PID. The -f flag is required to\nget useful dumps from Cassandra - it makes strace attach to all tasks (threads)\nin a given process.</p><p>In order to trace a particular subsystem, e.g. networking, use the -e flag.\nThere are some preset groups of syscalls like \"network\", “file”, and\nothers. Check out the man page.</p><p><img src=\"https://tobert.github.io/pages/image_7.png\" alt=\"image alt text\" /></p><p>One of the more useful ways to use strace with Cassandra is to see how often\nthe futex() syscall is being used. This is interesting because the syscall\nis only called on contended locks. For that, use\n<code>strace -f -c -e futex -p $PID</code>. Let it run for a few seconds then hit Ctrl-C.</p><p><img src=\"https://tobert.github.io/pages/image_8.png\" alt=\"image alt text\" /></p><h2>smartctl</h2><p>Most hard drives made in the last decade support a protocol called\n<a href=\"https://en.wikipedia.org/wiki/S.M.A.R.T.\">SMART</a>. 
SATA and SAS SSDs support\nit as well and it's the first place I look when I suspect a problem with a\ndrive. It isn't always installed by default. The package is usually called\n\"smartmontools\".</p><p>If the system has anything but a simple JBOD controller (e.g. SATA AHCI,\nLSI SAS x008), you may need to specify the device type so smartctl can\nquery the HBA properly. The man page has instructions for doing so.</p><p>The most important section for troubleshooting is the attributes, specifically\nthe RAW_VALUE column. The other columns often have unreliable values,\nespecially the TYPE column, which always looks scarier than things really are.</p><p>Here is a screenshot from \"smartctl -A /dev/sdc\" on a Samsung 840 Pro SSD.\nThis is where you look to find out if flash cells are dying or you suspect\nother kinds of errors.</p><p><img src=\"https://tobert.github.io/pages/image_9.png\" alt=\"image alt text\" /></p><p>The full smartctl -a output for an HDD and SSD are available as a gist:\n<a href=\"https://gist.github.com/tobert/c3f8ca20ea3da623d143\">https://gist.github.com/tobert/c3f8ca20ea3da623d143</a></p><h2>iperf</h2><p>is a network load testing tool that is quick &amp; easy to use. I use it to\nfind the real capability of a network interface, which is often surprisingly\nlower than advertised. On bare metal, you should expect to be able to get\n&gt; 900mbit/s (remember to divide by 8 for bytes!) on 1gig links. 
For\n10gig links, you may need to use the --parallel option.</p><p>These screenshots show how simple it is to use iperf to find out how much\nbandwidth I can get between my Linux workstation and my Mac over wifi.</p><p><img src=\"https://tobert.github.io/pages/image_10.png\" alt=\"image alt text\" /></p><p><img src=\"https://tobert.github.io/pages/image_11.png\" alt=\"image alt text\" /></p><p>There are a number of switches available, but most of the time you don't need them.\nThe most important two to add are --time and --num to set the amount of time to\nrun the test or the number of bytes to transmit, respectively.</p><h2>pcstat</h2><p><a href=\"https://github.com/tobert/pcstat\">https://github.com/tobert/pcstat</a> is a tool I\nwrote out of frustration. One question we often have is whether or not a given\nfile is being cached by Linux. Linux itself doesn't export this information in\nan easy way, so pcstat gets it via the mincore(2) system call.  The easiest way\nto get it if you have Go installed is \"go get github.com/tobert/pcstat\". This\nwill place a pcstat binary in $GOPATH/bin that you can scp to any Linux server.</p><pre>pcstat -bname /var/lib/cassandra/data/*/*/*-Data.db\n</pre><p><img src=\"https://tobert.github.io/pages/image_12.png\" alt=\"image alt text\" /></p><h2>cl-netstat.pl</h2><p><a href=\"https://github.com/tobert/perl-ssh-tools\">https://github.com/tobert/perl-ssh-tools</a>\nAnother custom tool. This one logs into your whole cluster over ssh and displays\na cluster-wide view of network traffic, disk IOPS, and load averages. I use this\nall the time. It's a bit of a pain to install, but it's worth it. By default it\nupdates every 2 seconds.</p><p><img src=\"https://tobert.github.io/pages/image_13.png\" alt=\"image alt text\" /></p><p>We really need an entire guide like this one for cassandra-stress. 
For most\nperformance tuning, a very simple cassandra-stress configuration is sufficient\nfor identifying bottlenecks by pushing a saturation load at the cluster. It is\nimportant to keep in mind that sustained saturation load should never be used to\ndetermine production throughput; by definition it is unsustainable. It is, on\noccasion, useful to find the max saturation load, then dial it back by 10-20% as\na starting point for finding a cluster's maximum <strong>sustainable</strong> load.</p><p>There are two workloads in particular that expose most issues quickly, one using\nsmall objects and the other using large objects.</p><h2>small objects for finding TXN/s limits</h2><p>Small objects running at maximum transaction throughput help expose issues with\nnetwork packets-per-second limits, CPU throughput, and as always, GC.\nI've been unable to get enough throughput with small objects to stress 10gig\nnetworks or good SSDs; for those move on to large objects. This is the easiest\ntest to run and is by far the most common, since it requires almost no\nconfiguration of cassandra-stress.</p><p>I usually put my cassandra-stress commands in little shell scripts so it's\neasier to edit and I don't have to rely on command history or, horror of\nhorrors, typing. For example, here's what my small-stress.sh looks like:</p><pre>#!/bin/bash\nexport PATH=/opt/cassandra/bin:/opt/cassandra/tools/bin:$PATH\ncassandra-stress \\\n    write \\\n    n=1000000 \\\n    cl=LOCAL_QUORUM \\\n    -rate threads=500 \\\n    -schema \"replication(factor=3)\" \\\n    -node 192.168.10.12\n</pre><h2>large objects for finding MB/s limits</h2><p>Many of the times I'm asked to look at a cluster, IO is usually suspect. Small\nobjects tend to run up against CPU or GC bottlenecks before they have a chance\nto do enough IO to show whether I'm tuning in the right direction. 
Using large\nobjects helps here.</p><p>In this example, I'm writing partitions with 32 columns at 2K each for a total\nof 64K per partition. You will probably need to try some different values to get\nthings moving. Sometimes disabling durability on the CF or putting the CL on\ntmpfs is useful to reduce CPU/GC load and move more IO through flushing.</p><pre>cassandra-stress \\\n    write \\\n    n=1000000 \\\n    cl=LOCAL_QUORUM \\\n    -rate threads=500 \\\n    -col \"size=fixed(2048)\" \"n=fixed(32)\" \\\n    -schema \"replication(factor=3)\" \\\n    -node 192.168.10.12\n</pre><p>There are three major places to find settings that impact Cassandra's\nperformance: the java command-line (GC, etc.), the schema, and cassandra.yaml.\nProbably in that order. The inaccuracy of some comments in Cassandra configs is\nan old tradition, dating back to 2010 or 2011. The infamous \"100mb per core\"\natrocity dates back a ways, but we're not here to talk about history. What you\nneed to know is that a lot of the advice in the config commentary is misleading.\nWhenever it says “number of cores” or “number of disks” is a good time to be\nsuspicious. I'm not going to rewrite the whole yaml file here, but instead cover\nthe few settings that should always be checked when tuning.</p><h2>The Commitlog Bug in 2.1</h2><p>a.k.a. <a href=\"https://issues.apache.org/jira/browse/CASSANDRA-8729\">https://issues.apache.org/jira/browse/CASSANDRA-8729</a></p><p>^ UPGRADE TO &gt;= Cassandra 2.1.9 or DSE 4.7.3 and set:</p><pre>commitlog_segment_recycling: false\n</pre><h3>Prior to 2.1.9:</h3><p>The Jira linked above has most of the gritty details. TL;DR, the workaround is\nto set commitlog_segment_size_in_mb &gt; commitlog_total_space_in_mb which\ncauses Cassandra to drop segments after use rather than reuse them.</p><h2>memtables &amp; flushing</h2><p>One of the most common tweaks we have to make is bumping\nmemtable_flush_writers on systems with sufficient resources. 
The advice in the\nstock yaml isn't bad in this case but it isn't very nuanced. Generally you can\nstart at 4 and see what happens. If you're seeing dropped mutations under\nsaturation load, try going as high as 8 but probably not much higher since 8\nblocked flushwriters probably means your disks aren't up to the task and the\nload needs to be scaled back.</p><p>I think memtable_flush_writers and memtable_cleanup_threshold should always\nbe set together. The default formula is:  1 / (memtable_flush_writers + 1) so\nif you have a lot of flush writers, your cleanup threshold is going to be very\nlow and cause frequent flushing for no good reason. A safe starting value for a\ncluster with few tables is 0.15 (15% of memtable space). If you have lots of\nactive tables, a smaller value may be better, but watch out for compaction cost.</p><p>Note: memtables aren't compressed so don't expect compressed sstable sizes to\nline up.</p><h2>memtable_allocation_type: offheap_objects</h2><p>Offheap memtables can improve write-heavy workloads by reducing the amount of\ndata stored on the Java heap. 1-2GB of offheap memory should be sufficient for\nmost workloads. The memtable size should be left at 25% of the heap as well,\nsince it is still in use when offheap is in play.</p><p>An additional performance boost can be realized by installing and enabling\njemalloc. On common distros this is usually a yum/apt-get install away. Worst\ncase you can simply install the .so file in /usr/local/lib or similar.\nInstructions for configuring it are in cassandra-env.sh.</p><h2>concurrent_{reads,writes,counters}</h2><p>Like some other options, the recommendations in the comments for these are\nmisleading. Increasing these values allows Cassandra to issue more IO in\nparallel, which is currently the best way to push the Linux IO stack. You may\nalso want to take a look at /sys/block//queue/nr_requests. 
Start with 128\non huge machines and go as low as 32 for smaller machines.</p><h2>sstable compression</h2><p>The CPU tradeoff is almost always a net win compared to iowait or the wasted\nspace on the drives. Turning compression off is sometimes faster. The main\ntunable is chunk_length_kb in the compression properties of a table. The\ndefault chunk length is 128K which may be lowered either at CREATE TABLE time or\nwith ALTER TABLE.</p><p>In 2.1, uncompressed tables will use a reader with 64k buffer. If your\nreads are significantly smaller than 64k, using compression to allow Cassandra\nto lower that buffer size will likely be a significant win for you in terms of\nIO wait, latency, and overall read throughput, even if you don't necessarily care\nabout the savings in disk space. In any case, aligning the buffer chunk size to\nbe a multiple of the disk block size (ie: xfs 4k blocks) is optimal.</p><h2>streaming</h2><p>Make sure to always set streaming_socket_timeout_in_ms to\na non-zero value. 1 hour is a conservative choice that will prevent the worst\nbehavior.</p><p><a href=\"https://issues.apache.org/jira/browse/CASSANDRA-8611\">https://issues.apache.org/jira/browse/CASSANDRA-8611</a></p><h2>Java 8</h2><p>Every cluster I've touched for the last couple months has been running Java 8\nwith no problem. JDK8 has some nice improvements in performance across the\nboard, so some clusters will pick up some additional headroom just by upgrading.\nJava 7 is deprecated by Oracle and is on its way out. That said, if for some\nreason (e.g. inverse-conservative policies) you have to stick with JRE7, at\nleast try to get the last update release.</p><h2>OpenJDK</h2><p>There is some remembered pain around OpenJDK that as far as I can tell dates\nback to the initial releases of it in Fedora where it was JDK6 with no JIT and a\nbunch of GNU Classpath and Apache Harmony things bolted on. 
That didn't last too\nlong before Sun finished OSSing the important parts of Hotspot, making the\nOpenJDK we have today that is, from the server VM's perspective, identical to\nthe Oracle releases of Hotspot. The critical thing to watch out for today with\nOpenJDK is OS packages. Java 8 took a long time to adopt because the early\nreleases were buggy. Do yourself a favor and check the full version of any\nOpenJDK before trusting it. If it's out of date and you still want to use\nOpenJDK, check out the <a href=\"http://www.azulsystems.com/products/zulu\">Zulu</a> packages\nproduced by Azul.</p><h2>heap estimation &amp; GC selection</h2><p>If you haven't read the bit about offheap from above, please check that out. In\ngeneral, our default heap size of 8GB is a good starting point for\nCassandra-only workloads. When adding in Solr, you will almost always want to\nincrease the heap. I've seen it set anywhere from 8GB to 32GB in production\nsystems. I've tested up to 256GB with G1, and while it works great, it's a waste\nof precious page cache space. G1 works best at 26-32GB, so start in that range if\nyou have the spare RAM. When messing around with GC, you should always enable GC\nlogging. With newer JDK7 builds and any JDK8 build, GC log rotation is built-in\nso there's no reason to have it disabled, especially in production. Some folks\nprefer to leave it off in production by default because\n<a href=\"https://groups.google.com/forum/#!topic/mechanical-sympathy/m4cGegwc-sY\">GC log writes occur during STW</a>\nand are synchronous, which may introduce additional hiccups.</p><p>Now that you have GC logging enabled you have a choice: stick with CMS (the\ndevil we know) or switch to G1GC (the devil we barely know).  
You can get more\nthroughput from CMS, but it takes careful planning and testing to achieve it,\nwhereas with G1, you can get good performance without a lot of tweaking.</p><h2>useful settings for any (parallel) GC</h2><p>By default, Hotspot caps GC threads at 8, seemingly because of some legacy\nassumptions combined with unrelated side-effects on SPARC. In any case, if the\ntarget system has more than 8 cores, you may want to allow GC to use all of\nthem. This has been observed to reduce STW durations. I haven't seen any\nnegative side-effects. As the comments say, HT cores don't count. See\nalso: \"EC2 cores are not what you think they are\".</p><pre># The JVM maximum is 8 PGC threads and 1/4 of that for ConcGC.\n# Machines with &gt; 10 cores may need additional threads.\n# Increase to &lt;= full cores (do not count HT cores).\n#JVM_OPTS=\"$JVM_OPTS -XX:ParallelGCThreads=16\"\n#JVM_OPTS=\"$JVM_OPTS -XX:ConcGCThreads=16\"\n</pre><p>Reference processing isn't usually a big deal for Cassandra, but in some\nworkloads it does start to show up in the GC logs. Since we pretty much always\nwant all the parallel stuff offered by the JVM, go ahead and enable parallel\nreference processing to bring down your p99.9's.</p><pre># Do reference processing in parallel GC.\nJVM_OPTS=\"$JVM_OPTS -XX:+ParallelRefProcEnabled\"\n</pre><h2>CMS</h2><p>The 100mb/core commentary in cassandra-env.sh for setting HEAP_NEWSIZE is\n<strong>wrong</strong>. A useful starting point for CMS is 25% of the heap. For some\nworkloads it may go as high as 50% of the heap, but start at 20-25% and see what\nhappens. Before moving onto testing, add the following settings derived from\n<a href=\"https://issues.apache.org/jira/browse/CASSANDRA-8150\">CASSANDRA-8150</a>. There's\nsome good stuff in there, but be careful not to haphazardly combine all the\nsettings from the various comments. 
They don't always mix well.</p><p>TODO: add Pierre's notes about CMS</p><pre># http://blog.ragozin.info/2012/03/secret-hotspot-option-improving-gc.html\nJVM_OPTS=\"$JVM_OPTS -XX:+UnlockDiagnosticVMOptions\"\nJVM_OPTS=\"$JVM_OPTS -XX:ParGCCardsPerStrideChunk=4096\"\n# these will need to be adjusted to the workload; start here\nJVM_OPTS=\"$JVM_OPTS -XX:SurvivorRatio=2\"\nJVM_OPTS=\"$JVM_OPTS -XX:MaxTenuringThreshold=16\"\n# Branson thinks these are cool. TODO: describe what these do.\nJVM_OPTS=\"$JVM_OPTS -XX:+CMSScavengeBeforeRemark\"\nJVM_OPTS=\"$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=60000\"\nJVM_OPTS=\"$JVM_OPTS -XX:CMSWaitDuration=30000\"\n</pre><p>CMSScavengeBeforeRemark: triggers a Young GC (STW) before running the CMS Remark\n(STW) phase. The expected effect is to reduce the duration of the Remark phase.</p><p>CMSWaitDuration: once CMS detects it should start a new cycle, it will wait up\nto that duration (in millis) for a Young GC cycle to happen. The expected effect\nis to reduce the duration of the Initial-Mark (STW) CMS phase.</p><p>SurvivorRatio sizes the Eden and the survivors. SurvivorRatio=N means: divide\nthe young generation into N+2 segments, take N segments for Eden and 1 segment for\neach survivor.</p><p>MaxTenuringThreshold defines how many young GCs an object should survive before\nbeing promoted into the old generation. Too high a value increases the Young GC\npause time (because of extra copying). Too low a value increases pressure on\nCMS.</p><h2>G1GC</h2><p>Recommendation: pick a node, flip its cassandra-env.sh over to the config block\nbelow, then come back here and read while that node generates some GC logs for\nyou to look at.</p><p>G1GC is the newest garbage collection backend in Hotspot. It was introduced\nduring Java 6 and gradually improved through Java 7 and seems to be solid for\nproduction as of Java 8u45. 
I do not recommend using G1 on any JRE older than\nHotspot 8u40. There have been changes to the algorithm between u40 and u45 and I\nexpect more as adoption increases, so the latest release of Java 8 is\nrecommended. Hotspot 9 is expected to default to G1GC, so it's time to start\nlearning how it works.</p><p>The main benefit of the G1 collector is what they call ergonomics.\nTuning CMS is a black art that requires\na lot of iteration to dial in. G1 is usually good out of the box and can be\ndialed in with just a couple parameters to the JVM. According to the various\ndocs on the web, CMS shoots for 1% or less of application CPU time spent in GC\nwhile G1 allows for up to 10%. It's a good deal at half the price, and the observed\nCPU usage is usually much lower than 10%.</p><p>It is critical that you <strong>comment out the -Xmn line when switching to G1</strong>.\nPerhaps my favorite feature in G1 is that the eden size is calculated\nautomatically to try to meet the latency target. It also automatically sets the\ntenuring threshold. It does a decent job and may even be useful to try as a way\nto estimate the -Xmn for CMS (switch to G1, run a load, grep Eden\n/var/log/cassandra/gc.log.0.current).</p><p>There are two main settings to use when tuning G1: heap size and MaxGCPauseMillis.</p><p>G1 can scale to over 256GB of RAM and down to 1GB (6GB minimum is recommended).\nThe first thing to try in many situations is to bump the heap by a few\nGB and see what happens. Sometimes it helps, sometimes it doesn't. The results\nare usually obvious within a few minutes, so I'll often go from 8GB to 16GB to\n32GB (when available) to see if it helps. Adding more heap space allows G1 to\n\"waste\" more heap on uncollected garbage that may be mixed with tenured data. If\nit is under pressure to reclaim space for eden, you will see significant memory\ncopy time in the GC logs. 
That's bound by the memory bandwidth of the system and\nthere isn't much we can do about it, so increasing the heap to allow more slack\nis the easy path.</p><p>The other tunable is -XX:MaxGCPauseMillis=n.  The default in Hotspot 8 is 200ms.\nWhen testing G1 on\nlower-end hardware (mobile CPUs, EC2) it was observed that throughput suffered\ntoo much with the 200ms pause target. Increasing it to 500ms keeps the average\nSTW pause below the default timeouts in cassandra.yaml while allowing for better\nthroughput. The critical thing to keep in mind is that this is a <strong>target</strong> and\nnothing is guaranteed; STW on fast machines might hover around 120ms and never\neven approach the target. Slower machines may exceed the target occasionally,\nwhich is why your timeouts in cassandra.yaml should allow for some slack.</p><pre># Use the Hotspot garbage-first collector.\nJVM_OPTS=\"$JVM_OPTS -XX:+UseG1GC\"\n# Main G1GC tunable: lowering the pause target will lower throughput and vice versa.\n# 200ms is the JVM default and lowest viable setting\n# 1000ms increases throughput. Keep it smaller than the timeouts in cassandra.yaml.\nJVM_OPTS=\"$JVM_OPTS -XX:MaxGCPauseMillis=500\"\n# Have the JVM do less remembered set work during STW, instead\n# preferring concurrent GC. Reduces p99.9 latency.\nJVM_OPTS=\"$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5\"\n# Start GC earlier to avoid STW.\n# The default in Hotspot 8u40 is 40%.\nJVM_OPTS=\"$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25\"\n# For workloads that do large allocations, increasing the region\n# size may make things more efficient. Otherwise, let the JVM\n# set this automatically.\n#JVM_OPTS=\"$JVM_OPTS -XX:G1HeapRegionSize=32m\"\n</pre><h2>useful GC log highlights</h2><p>Set up the above config and kick off some load. Start tailing the GC log and\nwait for the eden size to stabilize. 
It often gets close within a few seconds of\nrunning load, but give it a minute or so to be sure, then start looking at the\ndetailed information. Each GC log section is rather large so I'm not going to\ndocument it here. There are three lines that provide most of what we need to\nknow.</p><pre>[Object Copy (ms): Min: 157.6, Avg: 161.5, Max: 162.2, Diff: 4.6, Sum: 1292.0]\n</pre><p>Object Copy time is embedded in a larger block of stats. With most of the\nsystems I've examined, this is where the vast majority of the STW time is spent,\nso the trick is to tune the JVM so that it does less copying of objects. As\nmentioned earlier, start with adding heap space and offheap memtables.\nCompaction is particularly problematic (as observed through jvisualvm or sjk-plus\nttop) and there doesn't seem to be much we can do about it. Throttling\ncompaction can even make it worse by forcing Cassandra to keep objects in memory\nlonger than necessary, causing promotion which leads to memory compaction which\nis bound by memory bandwidth of the system.</p><pre>[Eden: 4224.0M(4224.0M)-&gt;0.0B(4416.0M) Survivors: 576.0M-&gt;448.0M Heap: 6334.9M(8192.0M)-&gt;2063.4M(8192.0M)]\n</pre><p>This is where you can see how much of the heap is being used for eden space. It\nwill go to 0.0B every time this is printed, since with the default logging it\nonly prints it after a STW. The survivors number under G1 rarely goes over\n1GB and usually hovers in the 200-300MB range. If it goes over 1GB there might be\nsomething wrong in the DB worth investigating. The last part, \"Heap:\"\nshows the total amount of allocated heap space. This will vary the most. 
If it\nhovers at 90-100% of the total heap, you're probably using Solr and have a lot\nof data and will need a bigger heap.</p><pre>[Times: user=1.73 sys=0.00, real=0.20 secs]\n</pre><p>I don't typically use these final numbers for much tuning, but they're good to\nglance at every once in a while to get an idea how much of your CPU is being\nburned for GC. user= represents the amount of CPU time consumed on all cores and\nis usually a multiple of real=. If sys is significant relative to the other\nnumbers, it probably points at contention somewhere in the system (sometimes\ndebuggable with strace -c). Finally, the real= part is wall-clock time and will\ncorrelate with the observable pause.</p><h2>Always Pre-Touch</h2><pre># Make sure all memory is faulted and zeroed on startup.\n# This helps prevent soft faults in containers and makes\n# transparent hugepage allocation more effective.\nJVM_OPTS=\"$JVM_OPTS -XX:+AlwaysPreTouch\"\n</pre><h2>Disable Biased Locking</h2><p>Biased locking is an optimization introduced in Hotspot 1.5 that optimizes\nsingle-writer locks. It's a win in systems that have mostly uncontended\nlocking. Cassandra is a large system with many contended locks in hot paths\nmaking this optimization counter-productive. 
The difference between having this\nenabled/disabled is difficult to detect unless the system is running close to\nfull capacity.</p><pre># Biased locking does not benefit Cassandra.\nJVM_OPTS=\"$JVM_OPTS -XX:-UseBiasedLocking\"\n</pre><p><a href=\"https://blogs.oracle.com/dave/entry/biased_locking_in_hotspot\">https://blogs.oracle.com/dave/entry/biased_locking_in_hotspot</a></p><p><a href=\"http://www.azulsystems.com/blog/cliff/2010-01-09-biased-locking\">http://www.azulsystems.com/blog/cliff/2010-01-09-biased-locking</a></p><p><a href=\"http://mechanical-sympathy.blogspot.com/2011/11/biased-locking-osr-and-benchmarking-fun.html\">http://mechanical-sympathy.blogspot.com/2011/11/biased-locking-osr-and-benchmarking-fun.html</a></p><h2>Thread Local Allocation Buffers</h2><p>TLABs are enabled by default in Cassandra, but the option is mixed in with some\nCMS stuff, so it occasionally gets dropped by accident when switching to G1.\nThat makes it worth calling out as important. With the number of threads in play in a\nCassandra instance, it's worth also enabling TLAB resizing if only to recover\nthe TLAB from threads that rarely wake up or do significant allocation. Right\nnow this is just a theory, but being able to increase the size of TLAB is likely\na big win for Cassandra since a few threads (e.g. compaction) allocate large\namounts of memory, making any opportunity to avoid a GC lock a big win. That's\nthe theory and although a statistically significant difference between\n+/-ResizeTLAB could not be found in simple tests, this is a common and practical\noptimization that should be enabled.</p><pre># Enable thread-local allocation buffers and allow the JVM to automatically\n# resize them at runtime.\nJVM_OPTS=\"$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB\"\n</pre><h2>other JVM applications</h2><p>A good chunk of applications using Cassandra are built on the JVM. Quite often\neven our own tools (e.g. cassandra-stress) have simple GC and tuning settings\nthat limit performance. 
Copying Cassandra's settings is not the answer; many of\nthe things that are good for Cassandra are bad for smaller/simpler apps. That\nsaid, here are the settings I use for G1 with cassandra-stress and many of the\nother tools in the distribution, as well as other JVM apps. It's not universal,\nbut perhaps a better starting point than the defaults.</p><pre>java -server -ea \\\n  -Xmx8G -Xms1G \\\n  -XX:+UseG1GC \\\n  -XX:+AggressiveOpts -XX:+UseCompressedOops \\\n  -XX:+OptimizeStringConcat -XX:+UseFastAccessorMethods \\\n  $MAIN\n</pre><p>The most visible deferred cost of writing to Cassandra is compaction. I find it\nuseful to describe it in terms of compound interest: you get to write at very\nhigh throughput now (borrowing), but at some point you have to redo all that IO\n(principal) with a fair amount of waste (interest) to maintain acceptable reads\n(credit score).</p><h2>concurrent_compactors</h2><p>On SSDs I start at 4 and go up from there if the workload requires it. The\nbiggest problem with huge numbers of compactors is the amount of GC they\ngenerate, so make sure to watch your GC logs or p99 client latency to make sure\nthat additional compactors don't ruin your latency.</p><p>Whether or not to throttle compactions should follow the same reasoning. With\nCassandra 2.1, compaction properly takes advantage of the OS page cache, so\nread IO isn't as big of a deal as it was on 2.0. Limiting the amount of IO\nallowed for compaction also cuts the amount of GC generated by the compaction\ncode, so it's still a good idea even on very fast storage.</p><h2>STCS</h2><p>This is the original deal and is by far the most widely deployed and tested\ncompaction strategy. This should be the default choice when you\ndon't know what to do or have insufficient information. 
I haven't done much\ntuning of the various knobs available, so please let me know if there's anything\ninteresting I should be looking at.</p><h2>LCS</h2><p>Use LCS when you need to fill up disks past 50% or have\nreally tight read SLAs. Otherwise, stick with STCS.</p><p>LCS has significant issues with streaming / resulting compaction with node\ndensities &gt; 800GB especially. This is getting addressed in Cassandra 2.2, but is\na real issue when bootstrapping new nodes or data centers currently.</p><h2>DTCS</h2><p><a href=\"http://www.datastax.com/dev/blog/dtcs-notes-from-the-field\">http://www.datastax.com/dev/blog/dtcs-notes-from-the-field</a></p><p>Cassandra uses memory in 3 ways: Java heap, offheap memory, and OS page cache\n(a.k.a. buffer cache but only uncool people call it that anymore).  You want as\nmuch RAM as you can get, up to around 256GB. For a read-heavy system it might\neven make sense to go into the 512GB-2TB range (all the major server vendors\nhave 2TB servers now), but really you want to scale out rather than up whenever\npossible.</p><h2>The Page Cache</h2><p>Cassandra relies heavily on the operating system page cache for caching of data\non disk. Every read that is served from RAM is a read that never gets\nto a disk, which has a systemic effect of reducing load across the board. Every\npage of RAM that is available for caching helps, which is why I'll cheerfully\nrecommend 128GB of RAM even though Cassandra itself only consumes a fraction of\nit directly.</p><p><a href=\"http://queue.acm.org/detail.cfm?id=1814327\">http://queue.acm.org/detail.cfm?id=1814327</a></p><h2>Swap (Always Say Never)</h2><p>Prior to the 2000's, RAM was often the biggest line item on server quotes. I\nremember spending more than $80,000 for 8GB of RAM in a PA-RISC system in ~2001.\nWhen RAM is $10,000/GB and disk is $250/GB, swap makes sense. 
Even on\nthose systems any swap usage was catastrophic to performance, which is why the\nbusiness was willing to spend the big numbers on RAM. In today's age of &lt; $100\nfor an 8GB DIMM, using swap in any latency-sensitive system is silly.</p><p>Always disable swap. In addition, always set /proc/sys/vm/swappiness to 1 just\nin case it gets reenabled by accident. The default value is 60, which tells the\nkernel to go ahead and swap out applications to make room for page cache\n<em>headdesk</em>. With both settings in place, the system should never swap and that's\none less thing to think about when tracking down latency problems.</p><p>Recommendation:</p><pre>swapoff -a\nsed -i 's/^\\(.*swap\\)/#\\1/' /etc/fstab\necho \"vm.swappiness = 1\" &gt; /etc/sysctl.d/swappiness.conf\nsysctl -p /etc/sysctl.d/swappiness.conf\n</pre><h2>numactl &amp; -XX:+UseNUMA</h2><p>The quickest way to tell if a machine is NUMA is to run \"numactl --hardware\".</p><p>One of the big changes to systems in the last decade has been the move from the\nIntel Pentium front-side bus architecture to Non-Uniform Memory Architecture,\na.k.a. NUMA. Really, it's two changes in one: modern x86 CPUs have integrated\nthe memory controller onto the same die. This means that in a multi-socket\nsystem, there are two memory controllers. Rather than making one CPU have to ask\nthe other for all memory, each CPU gets a share of the memory (usually\nsymmetrical), and they only talk to each other when a process executing on one\nCPU needs memory located on the other CPU's memory bus. There are a bunch of\noptimizations in the hardware to make this as painless as possible, but as\nusual, there's still a cost in latency. When an application or thread only uses\nmemory local to the CPU, things go really fast and when that fails, things go\nslower.</p><p>By default, the bin/cassandra script will prepend the JVM command with\n<code>numactl --interleave</code>. 
This is a good default that will enable decent performance on\nmost systems. That said, there are more options for NUMA systems that may open\nup additional performance. One is to comment out the <code>numactl --interleave</code> in\nbin/cassandra and add -XX:+UseNUMA to cassandra-env.sh. This instructs the JVM\nto handle NUMA directly. The JVM will allocate memory across NUMA domains and,\naccording to the docs, will divide GC work across NUMA domains. It does not do any\nthread pinning though (the code exists but is a noop on Linux). I've tested\nUseNUMA with a 256GB heap and it does work, but it's not necessarily better than\n--interleave.</p><p>The fastest option is for multi-JVM setups on NUMA where you can use numactl\n--cpunodebind to lock a JVM to a particular NUMA node so all memory is local and\nthreads are not allowed to execute on remote cores. This is the highest\nperformance option, but does limit the process to one socket, so use it with\ncaution. There are also problems with availability in the face of a server\nfailure, so please be careful if you try this route.</p><p><a href=\"http://frankdenneman.nl/2015/02/27/memory-deep-dive-numa-data-locality/\">http://frankdenneman.nl/2015/02/27/memory-deep-dive-numa-data-locality/</a></p><h2>zone_reclaim_mode, destroyer of p99s</h2><p>This is usually disabled by default, but if by chance it is enabled, you will\nlikely observe random STW pauses caused by the kernel when zone reclaim fires.</p><p><a href=\"https://www.kernel.org/doc/Documentation/sysctl/vm.txt\">https://www.kernel.org/doc/Documentation/sysctl/vm.txt</a> (at the very bottom)</p><p><a href=\"http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html\">http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html</a></p><p><a
href=\"http://docs.datastax.com/en/cassandra/2.1/cassandra/troubleshooting/trblshootZoneReclaimMode.html\">http://docs.datastax.com/en/cassandra/2.1/cassandra/troubleshooting/trblshootZoneReclaimMode.html</a></p><p>The slowest part of a node is going to be either the disk or the network. Even\nin the age of SSDs, it's difficult to predict which is best. A good 10gig\nnetwork can get below 40µs latency, which can keep up with SATA and SAS. This is\nwhy we're starting to see SSD NAS become usable in the public clouds.</p><p>When a transaction is served entirely out of memory, the client txn latency is\nroughly:</p><pre>memory_txn_latency + network_latency + client_latency\n</pre><p>Cache misses are always worse:</p><pre>disk_latency + memory_txn_latency + network_latency + client_latency\n</pre><p>This is not unique to Cassandra; every durable database with data &gt; RAM has to\ndeal with disks as the wildcard in client latency. Cache miss latency is\ndominated by disk access time. No amount of magic can make that go away (though\nrapid read protection may hide it).</p><p><a href=\"http://tobert.github.io/post/2014-11-13-slides-disk-latency-and-other-random-numbers.html\">http://tobert.github.io/post/2014-11-13-slides-disk-latency-and-other-random-numbers.html</a></p><h2>Solid State Drives (a.k.a. flash drives)</h2><p>Leaving transport/HBA aside for the moment, SSD is absolutely the preferred\nsolution for every workload. There are a few exceptions where HDD makes sense\nfor economical reasons, but the arguments get shaky as soon as you start looking\nat the vast difference in latency and power consumption. 
SSDs have no moving\nparts as well, so while they can fail, it's a lot more predictable if you\nmonitor the retired cells over SMART.</p><p><a href=\"http://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead\">http://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead</a></p><p><a href=\"https://laur.ie/blog/2015/06/ssds-a-gift-and-a-curse/\">https://laur.ie/blog/2015/06/ssds-a-gift-and-a-curse/</a></p><p>Most flash on the market today is referred to as MLC, which usually refers to\n2-level cells. This means that each flash cell can hold 2 bits of data. SLC\n(Single Level Cell) is difficult to find these days and is obscenely expensive\nso forget about it. Some Samsung drives are based on TLC (triple) to get 3 bits\nper cell, while SanDisk has some drives at 4 bits per cell. The base\nrecommendation for production workloads is MLC. Some folks are testing TLC for\nhigh-volume/low-write (DTCS) workloads and having some success. YMMV.</p><p><a href=\"https://en.wikipedia.org/wiki/Multi-level_cell\">https://en.wikipedia.org/wiki/Multi-level_cell</a></p><h2>Hard Disk Drives (a.k.a. spinning rust)</h2><p>Hard drives get the moniker \"spinning rust\" because they are literally a few\ndiscs made out of iron spinning while a mechanical arm moves a sensor across\nthem to detect changes in the magnetic field. They have served us well for\ndecades and are by far the cheapest storage available. They still show up in new\nsystems for a few reasons:</p><ul><li>hardware vendors mark SSDs up by 100-2000%</li>\n<li>7200RPM SATA is around $0.05/GB while a Samsung 1TB TLC is $0.50/GB</li>\n<li>high capacity is still easier to come by</li>\n</ul><p>You already know that SSDs are better, so when do HDDs make sense? Since the\npreferred size of a Cassandra &lt;= 2.1 node is still around 4-5TB, the answer is\nusually never. There are exceptions, such as mostly-write workloads + DTCS where\nseeking isn't as big of a problem. 
Sometimes you already bought the\nmachine or are mitigating against drive failures elsewhere, so they have to be\nmade to work. The trick to getting HDDs to perform as well as possible is to\ntune things for linear IO wherever possible. Sometimes this means deeper queues\n(<code>nomerges=0, nr_requests=256</code>) and sometimes it means getting more RAM. When HDDs\nare the only option, get as much RAM as you can. Every cache hit in RAM means\nless IO on the drive. It's not a big deal on SSDs where random IO isn't\npenalized, but on an HDD that IO will probably cause a seek which is exactly the\nworst thing to do to an HDD.</p><h2>Transports: SATA, SAS, NVMe, PCIe, and virtualized</h2><p>Most of the drives in production today are either SAS or SATA. SAS\nHBAs are preferred over SATA even when using SATA drives. SATA drives work fine\non SAS controllers that can scale a little better. That said, in simple machines,\nusing the onboard AHCI SATA controller is fine. One important difference that's\nuseful to note is that SAS is rated at 1 undetectable error out of\n10^16 bits while SATA drives are typically in the 10^15 range. While Cassandra has\nmultiple levels of mitigation (sstable checksums and replication), this can be a\nuseful way to convince people to move to SAS. Since NL-SAS is basically a\nSATA drive with a SAS controller, they should have the same error correction as\nSAS, but be sure to check the data sheet for the drive.</p><p>NVMe is the new kid on the block. It is a standard similar to AHCI for SATA or\nUHCI for USB that specifies a hardware/driver and physical interface for\nPCI-Express (PCIe) flash devices that is optimized for parallelism, unlike\ntraditional block interfaces.</p><p>For the next year or two it is really important to verify the kernel NVMe driver\nis up-to-date. The early releases of NVMe for Linux were riddled with bugs and\nmissing features. 
Most kernels released after 2015-05 should be fine.</p><p><a href=\"https://communities.intel.com/community/itpeernetwork/blog/2015/06/09/nvm-express-linux-driver-support-decoded\">https://communities.intel.com/community/itpeernetwork/blog/2015/06/09/nvm-express-linux-driver-support-decoded</a></p><p>PCI-Express cards such as those sold by Intel, FusionIO (now SanDisk), and\nSamsung are still commonly deployed. These are by far the highest-performing\noption, with latencies measured in microseconds.</p><p>When running in a virtualized environment, a number of new variables have to be\nconsidered. The most common and easiest problem to address is the use of emulated\ndevices rather than virtual IO drivers. Most of the public cloud images are virtio-native\nthese days. Make sure to check. On some private clouds you may find images\nmisconfigured to use an emulated SCSI adapter from the 1990's or even worse, a\nBX440 IDE adapter. These use significantly more CPU and memory bandwidth than\nvirtio drivers that cooperate with the hypervisor to provide decent performance.\nThat said, the best option is IOMMU access to an underlying device, but that's\nfairly rare. It will be more common in the near future as cloud providers roll\nout NVMe.</p><p><a href=\"http://www.techrepublic.com/blog/data-center/how-sas-near-line-nl-sas-and-sata-disks-compare/\">http://www.techrepublic.com/blog/data-center/how-sas-near-line-nl-sas-and-sata-disks-compare/</a></p><p><a href=\"https://en.wikipedia.org/wiki/NVM_Express\">https://en.wikipedia.org/wiki/NVM_Express</a></p><p><a href=\"http://tobert.org/disk-latency-graphs/\">http://tobert.org/disk-latency-graphs/</a></p><h2>Amazon EBS, Google PD, Azure Volumes</h2><p>Amazon EBS \"standard\" = BAD, AVOID AT ALL COSTS (literally!)</p><p>Amazon EBS \"io1\" or \"gp2\" = (notbad), go for it!</p><p>The new SSD-backed \"general purpose\" EBS has a latency of 1ms or less most of the\ntime and pairs nicely with c4.4xlarge instances. 
Make sure the instance is EBS\noptimized. Make sure to take a look at the <a href=\"http://aws.amazon.com/ebs/details/\">EBS Product Details\npage</a> to see how IO bursting plays out.\nInterestingly, if the volume size exceeds 3.4TB, the volume is automatically\nbumped to 10,000 IOPS which is a great deal compared to io1 volumes and\nespecially i2.2xlarge clusters. io1 volumes are also useful but are more expensive so\nconsider them for the commit log, but most users will be best served by gp2.</p><h2>SAN/NAS</h2><p>Just say no. Friends don't let friends and all that. There are few exceptions to\nthis rule. One is the new wave of PCI-Express DAS flash sold by EMC, Netapp, and\nothers. These devices are quite expensive but are popping up all over the\nstorage industry. In some SAN/NAS shops we may be able to leverage partnerships\nwith the big storage vendors to bring in a managed flash device. The key to\nrecognizing appropriate DAS storage is when it can be deployed as one array per\nrack, or one DAS box per availability zone/rack.</p><h2>discovery</h2><p>When getting acquainted with a new machine, one of the first things to do is\ndiscover what kind of storage is installed. Here are some handy commands:</p><ul><li><code>blockdev --report</code></li>\n<li><code>fdisk -l</code></li>\n<li><code>ls -l /dev/disk/by-id</code></li>\n<li><code>lspci -v # pciutils</code></li>\n<li><code>sg_inq /dev/sda # sg3-utils</code></li>\n<li><code>ls /sys/block</code></li>\n</ul><h2>IO elevator, read-ahead, IO merge</h2><p>Folks spend a lot of time worrying about tuning SSDs, and that's great, but on\nmodern kernels these things usually only make a few % difference at best. That\nsaid, start with these settings as a default and tune from there.</p><p>When in doubt, always use the deadline IO scheduler. The default IO scheduler is\nCFQ, which stands for \"Completely Fair Queueing\". 
This is the only elevator that\nsupports IO prioritization via cgroups, so if Docker or some other reason for\ncgroups is in play, stick with CFQ. In some cases it makes sense to use the noop\nscheduler, such as in VMs and on hardware RAID controllers, but the difference\nbetween noop and deadline is small enough that I only ever use deadline.\nSome VM-optimized kernels are hard-coded to only have noop and that's fine.</p><pre>echo 1 &gt; /sys/block/sda/queue/nomerges # SSD only! 0 on HDD\necho 8 &gt; /sys/block/sda/queue/read_ahead_kb # up to 128, no higher\necho deadline &gt; /sys/block/sda/queue/scheduler\n</pre><p>I usually start with read_ahead_kb at 8 on SSDs and 64 on hard drives (to line\nup with Cassandra &lt;= 2.2's sstable block size). With mmap IO in &lt;= 2.2 and all\nconfigurations &gt;= 3.0, setting readahead to 0 is fine on many configurations but\nhas caused problems on older kernels, making 8 a safe choice that doesn't hurt\nlatency.</p><p>Beware: setting readahead very high (e.g. 512K) can look impressive from the\nsystem side by driving high IOPS on the storage while the client latency\ndegrades because the drives are busy doing wasted IO. Don't ask me how I know\nthis without buying me a drink first.</p><h2>TRIM &amp; fstrim</h2><p>Most SSDs and some virtual disks support some form of TRIM that allows the\noperating system to inform the device when a block is no longer referenced so\nthat it can be erased and returned to the free space pool. This helps the disk\ncontroller do efficient wear leveling and avoid latency spikes when the free\ncell pool gets low.</p><p>xfs and ext4 support the 'discard' mount option, but you should not use it. The\nslowest part of an SSD is erasing previously used cells. This can take an\neternity in computer time, tying up command slots and occasionally blocking\napplications. No good. 
There is an alternative though, and that's\n<a href=\"http://man7.org/linux/man-pages/man8/fstrim.8.html\">'fstrim'</a> or the 'wiper.sh'\nscript from the hdparm package. Run one of them around once a week, one node at a time\njust like repair so that the whole cluster doesn't hiccup at once.</p><h2>partitioning &amp; alignment</h2><p>Whenever possible, I prefer to use GPT disk labels instead of the classic MBR\npartition tables. GPT offers a number of advantages such as support for drives\nlarger than 2TB and better resiliency through writing at both the beginning and\nend of the drive. In addition, the GPT tools on Linux usually align partitions\nto a 1MB boundary automatically, which avoids any potential block misalignment.</p><p>Here is one of the scripts I use for setting up GPT, MDRAID, and xfs. It should\nbe fairly easy to adapt to other environments:\n<a href=\"https://gist.github.com/tobert/4a7ebeb8fe9446687fa8\">https://gist.github.com/tobert/4a7ebeb8fe9446687fa8</a></p><p>Partition misalignment happens on drives with 4KB sectors, which is the size on\nall SSDs and most hard drives manufactured in the last few years. Some of these\ndevices will emulate 512 byte blocks to support older operating systems, so you\ncan't rely on what Linux tells you. Since it's always safe to use 4K alignment\non 512 byte devices, it's best to always align on 4K boundaries or go with 1\nmegabyte to be safe. Intel's EFI specifications recommend aligning on 128MiB\nboundaries, but that's just silly.</p><p>When deploying consumer-grade flash drives, it may make sense to leave some\nspace fallow either by not partitioning part of the drive or by creating a\npartition that will not be used. 
This free space acts as an additional reservoir\nof spare flash cells for the wear leveling controller in the drive to use and\nmay extend the useful lifetime of the drive.</p><h2>fio</h2><p>fio is the tool of choice for benchmarking filesystems and drives.</p><p><a href=\"http://tobert.github.io/post/2014-04-28-getting-started-with-fio.html\">http://tobert.github.io/post/2014-04-28-getting-started-with-fio.html</a></p><p><a href=\"http://tobert.github.io/post/2014-04-17-fio-output-explained.html\">http://tobert.github.io/post/2014-04-17-fio-output-explained.html</a></p><h2>JBOD</h2><p>It's just a bunch of disks, how hard can it be? Take a look at\n<a href=\"https://issues.apache.org/jira/browse/CASSANDRA-7386\">CASSANDRA-7386</a>,\n<a href=\"https://issues.apache.org/jira/browse/CASSANDRA-8329\">CASSANDRA-8329</a>, and\n<a href=\"https://issues.apache.org/jira/browse/CASSANDRA-6696\">CASSANDRA-6696</a></p><p>JBOD is almost always the fastest option for storage aggregation when the\nsoftware supports it, and in Cassandra's case it does, but with a few caveats. I've\navoided JBOD configs for a while because of these caveats, but it looks like its\ntime may finally be coming with Cassandra 2.2 and 3.0.</p><p>If you're stuck on huge SATA drives, definitely give it a second thought. SATA\ndrives in particular can benefit from JBOD rather than RAID0, since drive\nfailures are likely to be more painful and allowing each drive to have a\nseparate command queue distributes seeks better rather than having all drives do\nall seeks as in RAID0.</p><p><a href=\"http://tobert.github.io/post/2014-06-17-jbod-vs-raid.html\">http://tobert.github.io/post/2014-06-17-jbod-vs-raid.html</a></p><h2>MDRAID</h2><p>I don't have data to support this, but by my estimation, the majority of\nCassandra clusters in production today are using Linux's MDRAID subsystem. It\nhas been around for a long time and is battle-tested. 
Given a\nchoice, I will take a simple JBOD SAS controller with MDRAID over hardware RAID\nevery time. It's more flexible and all the tools are fairly easy to use. For\nRAID0, there really isn't much hardware can do to accelerate it outside of\ncaching. Even when using RAID5 and RAID6, software RAID is preferable since\nparity calculation (RAID[56]) has been accelerated on Intel CPUs since Westmere.</p><p>RAID0 is common in combination with Cassandra because it provides the simplest\nmanagement combined with good performance and capacity. Drive failures aren't a\nhuge deal since Cassandra is replicating the data.</p><p>A typical RAID0 setup looks like:</p><pre>mdadm --create /dev/md0 --chunk=256 --metadata=1.2 --raid-devices=6 --level=0 /dev/sd[cdefgh]1\n</pre><p>Most of this is straightforward except for the chunk size. Most of the time the\nchunk should be 64-256K and should always be a multiple of 4096 bytes. The fio\nnumbers are best at 128-256K and that tends to be the size of \"erase blocks\" on\nSSDs, so that's what I usually go with.</p><p>Of course, there are other RAID types and sometimes they make sense. It\nsometimes makes sense to use RAID1 for a commit log or RAID10 in situations\nwhere maximum availability is required or servicing hard drives is\ndifficult/expensive, e.g. remote datacenters with expensive remote hands. RAID5\nand RAID6 don't show up much, but I've used them with Cassandra to good effect.\nYou get some redundancy for the price of lower write speed (reads in modern\nparity RAID are often better than you'd expect). The trick with parity RAID is\nmaking extra sure the filesystem is informed of the stripe width so it can\nallocate space in stripe-sized chunks. Always make sure to check\n<code>/sys/block/md0/md/stripe_cache_size</code> and set it to 16KB or\npossibly higher.</p><h2>HW RAID</h2><p>The biggest difficulty with HW RAID is that most of the CLI tools are really,\nreally awful. 
Another problem is that they present virtual block devices to the\nOS, making it difficult to probe drives directly or read their SMART data.\nThat said, they can be set up to offer decent performance. It is often helpful\nto check the firmware errata of a RAID card before chasing other parts of the\nsystem as HW manufacturers have to ship firmware to the factory far in advance\nof GA, which sometimes results in buggy firmware being shipped.</p><p>The other thing to keep an eye out for is write-through vs. write-back caches\non these cards. The NVRAM on HW RAID cards is their biggest advantage, often\ncoming in sizes of 512MB or even bigger these days. When a battery backup is\npresent on the card, write-back caching can provide incredible speedups. The\nbattery is necessary to keep the NVRAM online during a power failure so any\noutstanding IOs can be flushed to stable storage when the power comes back.\nThese batteries have to be serviced every few years, so some users will opt\nout of the cost and hassle. In that case, it can still be set to write-through\ncaching for some additional performance, but most of the time I'd opt for a JBOD\ncard + MDRAID with plenty of RAM to keep things simple.</p><h2>LVM / device-mapper</h2><p>TODO: evaluate <a href=\"https://www.kernel.org/doc/Documentation/device-mapper/cache.txt\">dm-cache</a></p><p>TODO: evaluate dm-delay for latency simulation</p><p>The Linux kernel includes a block IO virtualization layer called device-mapper.\nThe current LVM system is built on top of this and closely resembles the LVM\nfrom HP-UX. Most of the time, LVM is only in the way on Cassandra systems, but\nit does show up frequently since many enterprises use LVM for all of\ntheir disk management. The Redhat/CentOS/Fedora installers default to installing\nthe OS on an LV, so it's bound to show up.</p><p>The critical commands to know are <code>vgdisplay -v</code> and <code>vgscan</code>. 
vgdisplay will\nshow you all of the volume groups on a system and with -v it will also show you\nthe LVs and PVs. vgscan will scan all the drives in the system looking for LVM\nPV signatures.</p><p>LVM also includes mirroring and striping modules based on dm-raid. These can be\nused in place of MDRAID, but given the complexity in LVM,\nmy recommendation is to stick with MDRAID. This may change as time marches on\nand dm-cache becomes a little easier to use.</p><p>Since LVM is built on device-mapper, you can find LVs by running <code>ls /dev/mapper/</code>.\nThis is why you'll often see <code>/dev/mapper</code> in the device name in\nmount listings. The /dev/$VG/ paths are symlinks to the devmapper devices.\nAnother useful trick is to use the dmsetup command directly. This gives you\nlow-level access behind LVM's back to examine disk layouts. In particular,\n<code>dmsetup ls</code> is useful.</p><p>Cassandra relies on a standard filesystem for storage. The choice of\nfilesystem and how it's configured can have a large impact on performance.</p><p>One common performance option that I find amusing is the <code>noatime</code> option. It\nused to bring large gains in performance by avoiding the need to write to inodes\nevery time a file is accessed. Many years ago, the Linux kernel changed the\ndefault atime behavior from synchronous to what is called <code>relatime</code> which means\nthe kernel will batch atime updates in memory for a while and update inodes only\nperiodically. This removes most of the performance overhead of atime, making the\nnoatime tweak obsolete.</p><p>Another option I've seen abused a few times is the barrier/nobarrier flag. A\nfilesystem barrier is a transaction marker that filesystems use to tell\nunderlying devices which IOs need to be committed together to achieve\nconsistency. Barriers may be disabled on Cassandra systems to get better disk\nthroughput, but this should NOT be done without full understanding of what\nit means. 
Without barriers in place, filesystems may come back from a power\nfailure with missing or corrupt data, so please read the mount(8) man page first\nand proceed with caution.</p><h2>xfs (just do it)</h2><p>xfs is the preferred filesystem for Cassandra. It is one of the most mature\nfilesystems on Earth, having started in SGI Irix, now with well over a decade in\nthe Linux kernel. It offers great performance over time and can be tuned to\nsupport a variety of underlying storage configurations.</p><p>mkfs.xfs will try to detect drive and RAID settings automatically. It almost\nalways gets this wrong because the Linux kernel gets it wrong because most\ndrives lie to the operating system in order to support ancient operating systems\nthat hard-coded 512 byte blocks. All of that is to say, when creating new\nfilesystems, always explicitly set the block size and RAID parameters to be sure\nthey're correct.</p><p>For partitions or whole drives, setting just the block size should be\nsufficient. Nearly every drive sold in the last few years has a 4K block size.\nSetting a 4K block size on a 512 byte device doesn't hurt much, while setting a\n512 byte block size on a 4K device causes extra work for the drive in the form\nof read-modify-write for 512 byte block updates. TL;DR, always set -s size=4096.</p><pre>mkfs.xfs -s size=4096 /dev/sdb1\n</pre><p>And on a RAID device (adjust to the local configuration):</p><pre>mkfs.xfs -s size=4096 -d su=262144 -d sw=6 /dev/md0\n</pre><p>This is a potential SSD optimization but the data so far is inconclusive. It\ndoesn't seem to hurt anything though, so I'm mentioning it in hopes that someone\nelse will figure out if it's worth the effort. The idea is to set the stripe\nwidth to the erase block size of the underlying SSD, usually 128K (256 * 512)\nand then set the stripe unit (a.k.a. 
chunk size) to 4K (8 * 512) to match the\nblock size.</p><pre>mkfs.xfs -f -s size=4096 -d sunit=8 -d swidth=256 /dev/sdb1\n</pre><p>If you're setting the sunit/swidth, it's worth passing the same values through\nto mount via mount -o or /etc/fstab. The man page says these only need to be set\nwhen changing the geometry of a RAID device, but when they're not set the kernel\nreports the wrong values for them, so to be safe always set them in /etc/fstab.</p><h2>ext4 (if you must)</h2><p>The ext4 filesystem is evolved from the ext line of filesystems in the Linux\nkernel. It is almost as fast as ext2 was, and much faster than ext3. ext2 and\next3 filesystems can be upgraded in-place to ext4 and you should do so,\nespecially for ext3. While ext2 is a bit faster than ext4 due to the lack\nof a journal, it is not recommended since it will block reboots on fsck after\npower failures.</p><p>Choose ext4 when the local policy demands it and follow the same RAID alignment\nguidance as xfs.</p><h2>ZFS (if you love yourself and don't need commercial support)</h2><p>ZFS-on-Linux has been around for a few years now and is quite stable. I've\ndeployed Cassandra on ZFS and it's a beautiful fit. The big downside is that\nthere's no commercial support available. One feature that works particularly\nwell with Cassandra is ZFS's SLOG and L2ARC devices, which allow you to use an\nSSD for journaling and caching in front of slower drives. It also offers inline\ncompression which may be handy for getting better compression ratios than those\nbuilt into Cassandra.</p><p><a href=\"http://zfsonlinux.org/\">http://zfsonlinux.org/</a></p><h2>btrfs</h2><p>btrfs has a reputation for being unreliable and should not be deployed to\nproduction systems without extensive testing to see if it's safe for your\nworkload. I've run btrfs in production with Cassandra in the past and it worked\ngreat, particularly on EC2 ephemeral disks with LZO compression enabled. 
Your\nmileage will certainly vary, so only choose btrfs if you're willing to risk\nsome data and spend the time testing.</p><p>It often surprises me how little discussion there is around network design and\noptimization. Cassandra is completely reliant on the network, and while we do a\ngood job of not trusting it, a little extra work in setting things up can make\nthe experience much smoother and provide better performance and availability.</p><p>Kernel tuning for network throughput is in the Linux section of this doc.</p><h2>NIC selection (1g/10g/vNIC/etc.)</h2><p>Prior to Cassandra 2.1, my guidance around networking was \"use 1gig,\nwhatever\". With 2.1, however, it's quite a bit easier to push machines to the\nlimit of the network. Saturating 1g interfaces is fairly easy with large write\nworkloads. 10gig is now the recommendation for high-performance clusters.</p><p>When you're stuck dealing with virtual machines, avoid emulated NICs at\nall costs. These usually show up as a Realtek or Intel e1000 adapter in the\nguest operating system and the performance is abysmal. On KVM and\nVirtualBox it should be \"virtio-net\", for Xen, it's \"xen-net\" (IIRC), and on\nVMware it should be “vmxnet3”. While the virtual NICs are much better than\nemulated NICs, the best option is often referred to as vNICs. These are usually\n10gig cards that can negotiate with the hypervisor to create shards of the NIC\nthat can be mapped directly into the guest operating system's kernel as a\nhardware device. This bypasses a lot of memory copying and CPU time and allows\nfor nearly 100% bare metal performance inside a VM. In EC2 this is known as\nenhanced networking, which should always be enabled when available.</p><h2>packet coalescing &amp; EC2</h2><p>If you're hitting a performance limit in EC2 and don't have enhanced networking\nenabled, you're probably hitting the secret packets-per-second limit in EC2.\nThere are two ways to get around this. 
Either enable enhanced networking (which\nrequires VPC), or enable message coalescing in Cassandra (available in 2.1.5 or\nDSE 4.7.0).</p><p><a href=\"http://www.datastax.com/dev/blog/performance-doubling-with-message-coalescing\">http://www.datastax.com/dev/blog/performance-doubling-with-message-coalescing</a></p><p>Today's x86 CPUs are all multi-core and most of the Intel chips offer something\ncalled hyper-threading that makes the core count appear to double. More on that\nbelow. A system with more than one physical CPU installed is said to be\nmulti-socket, in that the motherboard has multiple CPU sockets with associated\nmemory banks (see NUMA above). Just to make sure the terminology is straight: a\nnode is a single motherboard with one or more CPU sockets with memory banks. A\nCPU may have many processing cores.</p><p>While Cassandra runs fine on many kinds of processors, from Raspberry Pis to\nMacbooks to high-end servers, all production loads should use a CPU that supports\nECC memory to avoid silent memory corruption.</p><p>When choosing CPUs, the #1 most important feature to select for is <strong>cache</strong>. An\nL1 cache hit is 0.5 nanoseconds. L2 is 7ns. Reading from RAM takes 100ns. In\nreality, RAM is even slower than that in a multi-core world, since cores often\nneed to synchronize to make sure they're not mutating the same area of memory.\nThe more cache there is on the CPU, the less often this happens.</p><p>There are usually one or two CPU models in the sweet spot of price/performance.\nThe way to find it is to look at the distribution of cache sizes first and find\nthe cache size just below that of the most expensive CPU. For example, if the\ntop-end Xeon has 32MB of cache and costs $2000 each, there's probably a\ndifferent CPU for around $600 that has 24MB of cache with a clock speed right\naround halfway between the slowest and fastest. 
This is where the best value is.\nYou're almost always better off buying more systems to spread the work out\nrather than paying the markup on the fastest CPU available.</p><h2>a word on hyperthreading</h2><p>A hyperthread is a virtual core or \"sibling\" core that allows a single core to\npretend as if it's 2 cores. They exist to get more work out of the silicon by\npipelining multiple tasks in parallel for the same backing silicon. Many\ndatabases advise that this be turned off, but overall it's a benefit for\nCassandra.</p><p>Recommendation: enable HT</p><p>EC2 note: the newer generations of EC2 instances seem to be claiming\nhyperthreading cores as real cores. If you take a look at /proc/cpuinfo on an\ni2.2xlarge, you will see 8 cores assigned to the system. If you look a little\ncloser, you can see the \"sibling id\" field indicates that half of those cores\nare indeed hyperthreading cores, so you in effect only have 4 cores worth of\nsilicon in those VMs.</p><h2>C-states &amp; frequency scaling</h2><p>See also: \"powertop\" under observation tools.</p><p>Over the last few years, the cost of power for datacenters has become a more and\nmore prominent consideration when buying hardware. To that end, even Xeon\nprocessors now have power management features built into them and a lot of the\ntime this stuff is enabled out of the box. The impact on Cassandra is that when\na processor goes into power saving mode, there is a latency cost for waking it\nback up. Sometimes it's a few microseconds, sometimes it's in the milliseconds.\nThis is fine for a lot of applications that don't use the CPU all the time or\nare not latency-sensitive, but it can significantly impact throughput and even the\nstability of the clock when misconfigured.</p><p>On RHEL6, CentOS6, and other older LTS distros, the default idle driver for\nIntel chips is called \"intel_idle\". This driver is very aggressive about\nputting the processor to sleep and regularly causes client-visible latency\nhiccups. 
To disable it, add \"intel_idle.max_cstate=0 processor.max_cstate=0\nidle=mwait\" to the kernel command line then reboot. This is not necessary on\nRHEL7 and similarly modern distros. The idle driver can be verified with\n<code>cat /sys/devices/system/cpu/cpuidle/current_driver</code>. There is also a nuclear option\navailable on Linux in the form of booting with <code>idle=poll</code>. This is <strong>absolutely\nnot</strong> recommended for general use; it makes the CPU run at 100% 24x7 which may\nshorten the lifetime of the hardware and wastes a lot of power. That said, it\ncan be handy to use it when you're in doubt and want to entirely eliminate CPU\nsleeping as a source of latency.</p><p>Another thing I've seen recently is machines that get configured with frequency\nscaling on by default. Frequency scaling is great on laptops where minimum power\nconsumption is more important than throughput, but it has particularly nasty\nside-effects for Cassandra. The first is that with the CPUs running at lower\nclock speeds, latency will be higher. Another is that performance will be\ninconsistent. Lastly, and most nastily, it seems to destabilize the tsc clock on\nthe system which may cause time drift.</p><pre># make sure the CPUs run at max frequency\nfor sysfs_cpu in /sys/devices/system/cpu/cpu[0-9]*\ndo\n     echo performance &gt; $sysfs_cpu/cpufreq/scaling_governor\ndone\n</pre><p><a href=\"http://jpbempel.blogspot.gr/2015/09/why-bios-settings-matter-and-not-size.html\">http://jpbempel.blogspot.gr/2015/09/why-bios-settings-matter-and-not-size.html</a></p><h2>clock sources</h2><p>The vast majority of Cassandra instances run on x86 CPUs, where\nthere are multiple clock sources available. The fastest and most common is\ncalled \"<a href=\"https://en.wikipedia.org/wiki/Time_Stamp_Counter\">tsc</a>\" which stands\nfor Time Stamp Counter. This is a register on x86 CPUs, so it is very fast. It\nhas a major downside in that it isn't guaranteed to be stable. 
This is part of\nthe reason why x86 machines have so much clock drift, making NTP a requirement\nfor every Cassandra node. There are alternatives available, such as HPET, ACPI,\nand the various paravirtual clocks (kvm, xen, hyperv). The problem with these\nclocks is that they sit on a bus and take an order of magnitude or more time to\nread results compared to tsc. For some applications, like Cassandra, that hammer\non the gettimeofday() syscall to get the system time, it can have a direct\nimpact on performance.</p><p>Some of the clouds are starting to move over to using paravirtual clocks,\npresumably to reduce the amount of clock drift in VMs. The xen paravirtual clock\nin particular has been observed by Netflix to cause performance problems, so\nit's a good idea to switch back to tsc, then double-check that NTP is working.</p><pre>echo tsc &gt; /sys/devices/system/clocksource/clocksource0/current_clocksource\n</pre><p>Source: <a href=\"http://www.brendangregg.com/blog/2015-03-03/performance-tuning-linux-instances-on-ec2.html\">http://www.brendangregg.com/blog/2015-03-03/performance-tuning-linux-instances-on-ec2.html</a></p><h2>sysctl.conf</h2><p>The primary interface for tuning the Linux kernel is the /proc virtual\nfilesystem. In recent years, the /sys filesystem has expanded on what /proc\ndoes.  Shell scripts using echo and procedural code are difficult to manage\nautomatically, as we all know from handling cassandra-env.sh. This led the\ndistros to create /etc/sysctl.conf, and in modern distros, /etc/sysctl.d.\nThe sysctl command reads the files in /etc and applies the settings to the\nkernel in a consistent, declarative fashion.</p><p>The following is a block of settings I use on almost every server I touch. Most\nof these are safe to apply live and should require little tweaking from site to\nsite. Note: I have NOT tested these extensively with multi-DC, but most of them\nshould be safe. 
Those items that may need extra testing for multi-DC have\ncomments indicating it.</p><p><a href=\"http://tobert.github.io/post/2014-06-24-linux-defaults.html\">http://tobert.github.io/post/2014-06-24-linux-defaults.html</a></p><pre># The first set of settings is intended to open up the network stack performance by\n# raising memory limits and adjusting features for high-bandwidth/low-latency\n# networks.\nnet.ipv4.tcp_rmem = 4096 87380 16777216\nnet.ipv4.tcp_wmem = 4096 65536 16777216\nnet.core.rmem_max = 16777216\nnet.core.wmem_max = 16777216\nnet.core.netdev_max_backlog = 2500\nnet.core.somaxconn = 65000\nnet.ipv4.tcp_ecn = 0\nnet.ipv4.tcp_window_scaling = 1\nnet.ipv4.ip_local_port_range = 10000 65535\n# this block is designed for and only known to work in a single physical DC\n# TODO: validate on multi-DC and provide alternatives\nnet.ipv4.tcp_syncookies = 0\nnet.ipv4.tcp_timestamps = 0\nnet.ipv4.tcp_sack = 0\nnet.ipv4.tcp_fack = 1\nnet.ipv4.tcp_dsack = 1\nnet.ipv4.tcp_orphan_retries = 1\n# significantly reduce the amount of data the kernel is allowed to store\n# in memory between fsyncs\n# dirty_background_bytes says how many bytes can be dirty before the kernel\n# starts flushing in the background. Set this as low as you can get away with.\n# It is basically equivalent to trickle_fsync=true but more efficient since the\n# kernel is doing it. Older kernels will need to use vm.dirty_background_ratio\n# instead.\nvm.dirty_background_bytes = 10485760\n# Same deal as dirty_background but the whole system will pause when this threshold\n# is hit, so be generous and set this to a much higher value, e.g. 
1GB.\n# Older kernels will need to use dirty_ratio instead.\nvm.dirty_bytes = 1073741824\n# disable zone reclaim for IO-heavy applications like Cassandra\nvm.zone_reclaim_mode = 0\n# there is no good reason to limit these on server systems, so set them\n# to 2^30 to avoid any issues\n# Very large values in max_map_count may cause instability in some kernels.\nfs.file-max = 1073741824\nvm.max_map_count = 1073741824\n# only swap if absolutely necessary\n# some kernels have trouble with 0 as a value, so stick with 1\nvm.swappiness = 1\n</pre><p>On vm.max_map_count:</p><p><a href=\"http://linux-kernel.2935.n7.nabble.com/Programs-die-when-max-map-count-is-too-large-td317670.html\">http://linux-kernel.2935.n7.nabble.com/Programs-die-when-max-map-count-is-too-large-td317670.html</a></p><h2>limits.conf (pam_limits)</h2><p>The DSE and DSC packages install an /etc/security/limits.d/ file by default that\nshould remove most of the problems around pam_limits(8). Single-user systems\nsuch as database servers have little use for these limitations, so I often turn\nthem off globally using the following in /etc/security/limits.conf. Some\nusers may already be customizing this file, in which case change all of the\nasterisks to cassandra or whatever the user DSE/Cassandra is running as.</p><pre>* - nofile     1000000\n* - memlock    unlimited\n* - fsize      unlimited\n* - data       unlimited\n* - rss        unlimited\n* - stack      unlimited\n* - cpu        unlimited\n* - nproc      unlimited\n* - as         unlimited\n* - locks      unlimited\n* - sigpending unlimited\n* - msgqueue   unlimited\n</pre><h2>chrt</h2><p>The Linux kernel's default policy for new processes is SCHED_OTHER. The\nSCHED_OTHER policy is designed to make interactive tasks such as X windows and\naudio/video playback work well. This means the scheduler assigns tasks very\nshort time slices on the CPU so that other tasks that may need immediate service\ncan get time. 
This is great for watching cat videos on YouTube, but not so great\nfor a database, where interactive response is on a scale of milliseconds rather\nthan microseconds. Furthermore, Cassandra's threads park themselves properly.\nSetting the scheduling policy to SCHED_BATCH seems more appropriate and can\nopen up a little more throughput. I don't have good numbers on this yet, but\nobservations of dstat on a few clusters have convinced me it's useful and\ndoesn't impact client latency.</p><pre>chrt --batch 0 $COMMAND\nchrt --batch 0 --all-tasks --pid $PID\n</pre><p>You can inject these into cassandra-env.sh or\n/etc/{default,sysconfig}/{dse,cassandra} by using the $$ variable that returns\nthe current shell's pid. Child processes inherit scheduling policies, so if you\nset the startup shell's policy, the JVM will inherit it. Just add this line to\none of those files:</p><pre>chrt --batch 0 --pid $$\n</pre><h2>taskset &amp; isolcpus &amp; irqbalance</h2><p>Sometimes a machine ends up spending more time processing IO requests\n(interrupts) than it does getting application (Cassandra) work done. This often\nmanifests as high system CPU time and a large amount of context switches and\ninterrupts observed in vmstat/dstat. There are two major approaches to handling\nthis. One is to reserve a core or two for interrupt processing. This is common\nin real-time use cases such as music production or high-frequency trading. The\nother is to evenly distribute interrupts over the cores in a system. One of the\ndifficult-to-observe benefits of a reserved CPU core is that the kernel's code\ncan stay hot in cache on that core. The reserved core may never surpass 10%\nutilization, but the latency benefits are sometimes worth it.</p><h2>taskset</h2><p>The easiest way to get started on a running system is with the taskset utility.\nTaskset can be used to tell the Linux scheduler which CPUs are available to a\nprocess or thread. 
The results of taskset are usually observable within a couple\nseconds. After moving load off one of the cores, it may be necessary to manually\nmove interrupts over to the core.</p><pre>taskset -apc 2-7 $CASS_PID\ntaskset -c 2-7 ./cassandra -f\ntaskset -pc 2-7 $$ # in /etc/{default,sysconfig}/{dse,cassandra}\n</pre><h2>isolcpus</h2><p>If you can reboot, you can reserve cores for the kernel by using the isolcpus=\nkernel command line option. In my tests, the kernel automatically schedules its\ntasks on the reserved CPU. You may need to do some additional IRQ management to\nget it right, but it's worth the effort when it works out. In order to enable it\nyou will need to edit the grub configuration in /etc/default/grub on Debian or\n/etc/sysconfig/grub on Redhat. For quick tests you can edit /boot/grub/grub.conf\n(grub2) or /boot/grub/menu.lst (grub1 &amp; pvgrub (EC2)) and add it to the end of\nthe kernel options. This will probably get reverted on the next kernel\nupgrade, so make sure to do it in the way prescribed by the Linux distribution.\nMy test machine's /proc/cmdline looks like this:</p><pre>BOOT_IMAGE=../vmlinuz-linux root=PARTUUID=91012260-6834-425a-b488-9dd9f537a294 rw isolcpus=0-1 initrd=../initramfs-linux.img\n</pre><p>Make sure irqbalance is disabled or configured to wire interrupts to the\nselected CPU. On NUMA machines it's important to make sure the reserved core is\non the socket that manages the PCIe devices. Core 0 is almost always the right\nchoice.</p><h2>irqbalance</h2><p>Some of the advice you'll find on the internet says to disable irqbalance for\nperformance. This was good advice for early versions of it that were prone to\nrebalancing storms, but it isn't bad these days. If in doubt, disable the\nirqbalance service and run \"irqbalance --oneshot\" once at boot time and forget\nabout it.</p><h2>dirty tricks with renice, ionice, and taskset</h2><p>This is one of my favorite dirty tricks, combined with renice/ionice. 
If you've\never noticed the \"-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42\" settings\nin Cassandra's JVM arguments, they're there so that compaction can be set to a\nlower priority than other threads in Cassandra. This is visible in the default\nhtop configuration in the nice field.</p><p><img src=\"https://tobert.github.io/pages/image_14.png\" alt=\"image alt text\" /></p><p>Linux has always used a 1:1 threading model and even uses the global pid space\nto assign ids to threads, so they're still visible if you know how to look for\nthem. htop displays thread ids by default and standard top can do it if you hit\nH. With ps, the -L flag makes them show up. We can take advantage of this to do\nsome prioritization and pinning outside of Cassandra so we don't have to wait\naround for features to make their way into DSE.</p><pre>for tid in $(ps -eLo tid,args,nice |awk '/java.*4$/{print $1}')\ndo\n  taskset -pc 5-6 $tid     # pin to cores 5 and 6\n  renice 20 $tid           # set the lowest nice priority\n  ionice -c 2 -n 7 -p $tid # set IO to the lowest best-effort priority\ndone\n</pre><p>After that, your htop will look more like this (load is done but compaction is still going):</p><p><img src=\"https://tobert.github.io/pages/image_15.png\" alt=\"image alt text\" /></p><p>Note: ionice requires the cfq IO scheduler, which is usually not the case in\nproperly tuned systems. It is one of the main reasons for sticking with CFQ\noutside of Docker with io limits.</p><p>I did not come up with all of this on my own. The list of folks I've learned from\neven in just the last two years is excessively long. 
Thank you, all.</p>","id":"8777d174-861f-5ec5-9d01-586f14aa4bfa","title":"Amy's Cassandra 2.1 tuning guide","origin_url":"https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html","url":"https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html","wallabag_created_at":"2016-12-22T13:56:48+00:00","published_at":"2015-09-24T00:00:00+00:00","published_by":"['']","reading_time":72,"domain_name":"tobert.github.io","preview_picture":"https://tobert.github.io/pages/image_1.png","tags":["cassandra","troubleshooting and tuning","performance"],"description":"Amy's Cassandra 2.1 Tuning Guide (2015)I really appreciate all the folks who have told me that this guide helped them\nin some way. I'm happy to hear that.I've pushed this small update to change my nam..."},{"content":"<p dir=\"auto\">Welcome to the Awesome Accord repository! This guide provides resources and examples for implementing ACID transactions in Apache Cassandra. Learn how to leverage distributed transactions for building reliable applications.</p><ul dir=\"auto\"><li><strong>Quick Start with Docker</strong>: Single-node deployment for immediate testing</li>\n<li><strong>Lab Environment</strong>: Multi-node cluster setup for development</li>\n<li><strong>Use Cases &amp; Examples</strong>: Production-ready implementations</li>\n<li><strong>Learning Resources</strong>: Documentation and best practices</li>\n</ul><p dir=\"auto\">Accord is in active development and is still a feature branch in the Apache Cassandra® repo. You will find bugs. When you do, we ask that you contribute a bug report.</p><p dir=\"auto\">You can use the <a href=\"https://github.com/pmcfadin/awesome-accord/discussions\">GitHub Discussions</a> bug report forum for this, or use the Planet Cassandra Discord channel for Accord listed below. 
A bug report should have the following:</p><ul dir=\"auto\"><li>The data model used</li>\n<li>Actions to reproduce the bug</li>\n<li>Full stack trace from system.log</li>\n</ul><p dir=\"auto\">If you have suggestions about syntax or improving the overall developer experience, we want to hear about that too! Add it as a suggestion or feature request using <a href=\"https://github.com/pmcfadin/awesome-accord/discussions\">GitHub Discussions</a> or let us know in the Planet Cassandra Discord.</p><p dir=\"auto\">Now, on to the fun!</p><div class=\"highlight highlight-source-shell notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"docker pull pmcfadin/cassandra-accord docker run -d --name cassandra-accord -p 9042:9042 pmcfadin/cassandra-accord\"><pre>docker pull pmcfadin/cassandra-accord\ndocker run -d --name cassandra-accord -p 9042:9042 pmcfadin/cassandra-accord</pre></div><div class=\"highlight highlight-source-shell notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"brew tap rustyrazorblade/rustyrazorblade brew install easy-cass-lab\"><pre>brew tap rustyrazorblade/rustyrazorblade\nbrew install easy-cass-lab</pre></div><ul dir=\"auto\"><li><strong>Banking Transactions</strong>: Account transfers with ACID guarantees</li>\n<li><strong>Inventory Management</strong>: Race-free inventory tracking</li>\n<li><strong>User Management</strong>: Multi-table atomic operations</li>\n</ul><ul dir=\"auto\"><li>Provide feedback and bug reports in the <a href=\"https://github.com/pmcfadin/awesome-accord/discussions\">repository forum</a></li>\n<li><a href=\"https://discord.gg/GrRCajJqmQ\" rel=\"nofollow\">Join our Discord Community</a> for discussions and support</li>\n<li>Review our <a href=\"https://github.com/pmcfadin/awesome-accord/blob/main/CONTRIBUTING.md\">Contributor Guide</a></li>\n<li>Submit issues and improvements through GitHub</li>\n</ul><div class=\"snippet-clipboard-content 
notranslate position-relative overflow-auto\" data-snippet-clipboard-copy-content=\"/ ├── docker/ # Docker configuration and setup ├── easy-cass-lab/ # Multi-node testing environment ├── examples/ # Implementation examples │ ├── banking/ # Financial transaction examples │ ├── inventory/ # Stock management examples │ └── user-mgmt/ # User operations examples └── docs/ # Guides and documentation\"><pre>/\n├── docker/              # Docker configuration and setup\n├── easy-cass-lab/      # Multi-node testing environment\n├── examples/           # Implementation examples\n│   ├── banking/       # Financial transaction examples\n│   ├── inventory/     # Stock management examples\n│   └── user-mgmt/     # User operations examples\n└── docs/              # Guides and documentation\n</pre></div><p dir=\"auto\">Our <a href=\"https://github.com/pmcfadin/awesome-accord/blob/main/docs/README.md\">documentation</a> includes:</p><ul dir=\"auto\"><li>Comprehensive setup instructions</li>\n<li>Transaction patterns and implementations</li>\n<li>Performance optimization guides</li>\n<li>Troubleshooting and best practices</li>\n</ul><ol dir=\"auto\"><li>Choose your deployment option:\n<ul dir=\"auto\"><li><a href=\"https://github.com/pmcfadin/awesome-accord/blob/main/docker/README.md\">Docker Guide</a></li>\n<li><a href=\"https://github.com/pmcfadin/awesome-accord/blob/main/easy-cass-lab/README.md\">Easy-Cass-Lab Guide</a></li>\n</ul></li>\n<li>Follow the <a href=\"https://github.com/pmcfadin/awesome-accord/blob/main/docs/quickstart.md\">Quick Start Guide</a></li>\n<li>Explore <a href=\"https://github.com/pmcfadin/awesome-accord/blob/main/examples\">example implementations</a></li>\n<li>Connect with our <a href=\"https://discord.gg/GrRCajJqmQ\" rel=\"nofollow\">Discord community</a></li>\n<li>Feedback! 
<a href=\"https://github.com/pmcfadin/awesome-accord/discussions\">Github Discussions</a></li>\n</ol><div class=\"highlight highlight-source-sql notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"BEGIN TRANSACTION LET fromBalance = (SELECT account_balance FROM ks.accounts WHERE account_holder='alice'); IF fromBalance.account_balance &gt;= 20 THEN UPDATE ks.accounts SET account_balance -= 20 WHERE account_holder='alice'; UPDATE ks.accounts SET account_balance += 20 WHERE account_holder='bob'; END IF COMMIT TRANSACTION;\"><pre>BEGIN TRANSACTION\n    LET fromBalance = (SELECT account_balance \n                      FROM ks.accounts \n                      WHERE account_holder='alice');\n    IF fromBalance.account_balance &gt;= 20 THEN\n        UPDATE ks.accounts \n        SET account_balance -= 20 \n        WHERE account_holder='alice';\n        UPDATE ks.accounts \n        SET account_balance += 20 \n        WHERE account_holder='bob';\n    END IF\nCOMMIT TRANSACTION;</pre></div><p dir=\"auto\">Apache License 2.0</p>","id":"b239385a-84ae-5556-b454-b52325f69e75","title":"GitHub - pmcfadin/awesome-accord: Repository of all kinds of things to help you get up and running with ACID transactions on Apache Cassandra®","origin_url":"https://github.com/pmcfadin/awesome-accord","url":"https://github.com/pmcfadin/awesome-accord","wallabag_created_at":"2025-01-16T16:28:31+00:00","published_at":null,"published_by":"['pmcfadin']","reading_time":1,"domain_name":"github.com","preview_picture":"https://opengraph.githubassets.com/3e477fb2dd2b1ded1c5b53477f4848297badc75ece00c5b49bad1476fdb76167/pmcfadin/awesome-accord","tags":["acid","open.source","cassandra","accord"],"description":"Welcome to the Awesome Accord repository! This guide provides resources and examples for implementing ACID transactions in Apache Cassandra. 
Learn how to leverage distributed transactions for building..."}],"tagSets":[{"tag":"cassandra","articles":[{"content":"<p dir=\"auto\">Welcome to the Awesome Accord repository! This guide provides resources and examples for implementing ACID transactions in Apache Cassandra. Learn how to leverage distributed transactions for building reliable applications.</p><ul dir=\"auto\"><li><strong>Quick Start with Docker</strong>: Single-node deployment for immediate testing</li>\n<li><strong>Lab Environment</strong>: Multi-node cluster setup for development</li>\n<li><strong>Use Cases &amp; Examples</strong>: Production-ready implementations</li>\n<li><strong>Learning Resources</strong>: Documentation and best practices</li>\n</ul><p dir=\"auto\">Accord is in active development and is still a feature branch in the Apache Cassandra® repo. You will find bugs. When you do, we ask that you contribute a bug report.</p><p dir=\"auto\">You can use the <a href=\"https://github.com/pmcfadin/awesome-accord/discussions\">GitHub Discussions</a> bug report forum for this, or use the Planet Cassandra Discord channel for Accord listed below. A bug report should have the following:</p><ul dir=\"auto\"><li>The data model used</li>\n<li>Actions to reproduce the bug</li>\n<li>Full stack trace from system.log</li>\n</ul><p dir=\"auto\">If you have suggestions about syntax or improving the overall developer experience, we want to hear about that too! 
Add it as a suggestion or feature request using <a href=\"https://github.com/pmcfadin/awesome-accord/discussions\">Github discussions</a> or let us know in the Planet Cassandra Discord.</p><p dir=\"auto\">Now, on to the fun!</p><div class=\"highlight highlight-source-shell notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"docker pull pmcfadin/cassandra-accord docker run -d --name cassandra-accord -p 9042:9042 pmcfadin/cassandra-accord\"><pre>docker pull pmcfadin/cassandra-accord\ndocker run -d --name cassandra-accord -p 9042:9042 pmcfadin/cassandra-accord</pre></div><div class=\"highlight highlight-source-shell notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"brew tap rustyrazorblade/rustyrazorblade brew install easy-cass-lab\"><pre>brew tap rustyrazorblade/rustyrazorblade\nbrew install easy-cass-lab</pre></div><ul dir=\"auto\"><li><strong>Banking Transactions</strong>: Account transfers with ACID guarantees</li>\n<li><strong>Inventory Management</strong>: Race-free inventory tracking</li>\n<li><strong>User Management</strong>: Multi-table atomic operations</li>\n</ul><ul dir=\"auto\"><li>Provide feedback and bug reports in the <a href=\"https://github.com/pmcfadin/awesome-accord/discussions\">repository forum</a></li>\n<li><a href=\"https://discord.gg/GrRCajJqmQ\" rel=\"nofollow\">Join our Discord Community</a> for discussions and support</li>\n<li>Review our <a href=\"https://github.com/pmcfadin/awesome-accord/blob/main/CONTRIBUTING.md\">Contributor Guide</a></li>\n<li>Submit issues and improvements through GitHub</li>\n</ul><div class=\"snippet-clipboard-content notranslate position-relative overflow-auto\" data-snippet-clipboard-copy-content=\"/ ├── docker/ # Docker configuration and setup ├── easy-cass-lab/ # Multi-node testing environment ├── examples/ # Implementation examples │ ├── banking/ # Financial transaction examples │ ├── inventory/ # Stock management 
examples │ └── user-mgmt/ # User operations examples └── docs/ # Guides and documentation\"><pre>/\n├── docker/              # Docker configuration and setup\n├── easy-cass-lab/      # Multi-node testing environment\n├── examples/           # Implementation examples\n│   ├── banking/       # Financial transaction examples\n│   ├── inventory/     # Stock management examples\n│   └── user-mgmt/     # User operations examples\n└── docs/              # Guides and documentation\n</pre></div><p dir=\"auto\">Our <a href=\"https://github.com/pmcfadin/awesome-accord/blob/main/docs/README.md\">documentation</a> includes:</p><ul dir=\"auto\"><li>Comprehensive setup instructions</li>\n<li>Transaction patterns and implementations</li>\n<li>Performance optimization guides</li>\n<li>Troubleshooting and best practices</li>\n</ul><ol dir=\"auto\"><li>Choose your deployment option:\n<ul dir=\"auto\"><li><a href=\"https://github.com/pmcfadin/awesome-accord/blob/main/docker/README.md\">Docker Guide</a></li>\n<li><a href=\"https://github.com/pmcfadin/awesome-accord/blob/main/easy-cass-lab/README.md\">Easy-Cass-Lab Guide</a></li>\n</ul></li>\n<li>Follow the <a href=\"https://github.com/pmcfadin/awesome-accord/blob/main/docs/quickstart.md\">Quick Start Guide</a></li>\n<li>Explore <a href=\"https://github.com/pmcfadin/awesome-accord/blob/main/examples\">example implementations</a></li>\n<li>Connect with our <a href=\"https://discord.gg/GrRCajJqmQ\" rel=\"nofollow\">Discord community</a></li>\n<li>Feedback! 
<a href=\"https://github.com/pmcfadin/awesome-accord/discussions\">Github Discussions</a></li>\n</ol><div class=\"highlight highlight-source-sql notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"BEGIN TRANSACTION LET fromBalance = (SELECT account_balance FROM ks.accounts WHERE account_holder='alice'); IF fromBalance.account_balance &gt;= 20 THEN UPDATE ks.accounts SET account_balance -= 20 WHERE account_holder='alice'; UPDATE ks.accounts SET account_balance += 20 WHERE account_holder='bob'; END IF COMMIT TRANSACTION;\"><pre>BEGIN TRANSACTION\n    LET fromBalance = (SELECT account_balance \n                      FROM ks.accounts \n                      WHERE account_holder='alice');\n    IF fromBalance.account_balance &gt;= 20 THEN\n        UPDATE ks.accounts \n        SET account_balance -= 20 \n        WHERE account_holder='alice';\n        UPDATE ks.accounts \n        SET account_balance += 20 \n        WHERE account_holder='bob';\n    END IF\nCOMMIT TRANSACTION;</pre></div><p dir=\"auto\">Apache License 2.0</p>","id":"b239385a-84ae-5556-b454-b52325f69e75","title":"GitHub - pmcfadin/awesome-accord: Repository of all kinds of things to help you get up and running with ACID transactions on Apache Cassandra®","origin_url":"https://github.com/pmcfadin/awesome-accord","url":"https://github.com/pmcfadin/awesome-accord","wallabag_created_at":"2025-01-16T16:28:31+00:00","published_at":null,"published_by":"['pmcfadin']","reading_time":1,"domain_name":"github.com","preview_picture":"https://opengraph.githubassets.com/3e477fb2dd2b1ded1c5b53477f4848297badc75ece00c5b49bad1476fdb76167/pmcfadin/awesome-accord","tags":["acid","open.source","cassandra","accord"],"description":"Welcome to the Awesome Accord repository! This guide provides resources and examples for implementing ACID transactions in Apache Cassandra. 
Learn how to leverage distributed transactions for building..."},{"content":"<p dir=\"auto\">Visual Flow is an ETL tool designed for effective data manipulation via convenient and user-friendly interface. The tool has the following capabilities:</p><ul dir=\"auto\"><li>Can integrate data from heterogeneous sources:\n<ul dir=\"auto\"><li>AWS S3</li>\n<li>Cassandra</li>\n<li>Click House</li>\n<li>DB2</li>\n<li>Dataframe (for reading)</li>\n<li>Elastic Search</li>\n<li>IBM COS</li>\n<li>Kafka</li>\n<li>Local File</li>\n<li>MS SQL</li>\n<li>Mongo</li>\n<li>MySQL/Maria</li>\n<li>Oracle</li>\n<li>PostgreSQL</li>\n<li>Redis</li>\n<li>Redshift</li>\n</ul></li>\n<li>Leverage direct connectivity to enterprise applications as sources and targets</li>\n<li>Perform data processing and transformation</li>\n<li>Run custom code</li>\n<li>Leverage metadata for analysis and maintenance</li>\n</ul><p dir=\"auto\">Visual Flow application is divided into the following repositories:</p><p dir=\"auto\"><a href=\"https://github.com/ibagroup-eu/Visual-Flow/blob/main/CONTRIBUTING.md\">Check the official guide</a>.</p><p dir=\"auto\">Visual flow is an open-source software licensed under the <a href=\"https://github.com/ibagroup-eu/Visual-Flow/blob/main/LICENSE\">Apache-2.0 license</a>.</p>","id":"0470ca53-9271-53a1-98cb-20f2a6ae5380","title":"GitHub - ibagroup-eu/Visual-Flow: Visual-Flow main repository","origin_url":"https://github.com/ibagroup-eu/Visual-Flow","url":"https://github.com/ibagroup-eu/Visual-Flow","wallabag_created_at":"2024-12-02T13:34:31+00:00","published_at":null,"published_by":"['ibagroup-eu']","reading_time":null,"domain_name":"github.com","preview_picture":"https://opengraph.githubassets.com/9187fdecad3a37939c1971bcdec19ffed4090307ee508b009f47c7bcd49a7f8d/ibagroup-eu/Visual-Flow","tags":["mongo","nocode","elasticsearch","open.source","cassandra","data.pipeline","elastic","aws.s3","etl","low.code","postgres"],"description":"Visual Flow is an ETL tool designed for effective 
data manipulation via convenient and user-friendly interface. The tool has the following capabilities:Can integrate data from heterogeneous sources:\nA..."},{"content":"<p dir=\"auto\"><a href=\"https://github.com/datastax/cql-proxy/actions/workflows/test.yml\"><img src=\"https://github.com/datastax/cql-proxy/actions/workflows/test.yml/badge.svg\" alt=\"GitHub Action\" class=\"c13\" referrerpolicy=\"no-referrer\" /></a> <a href=\"https://goreportcard.com/report/github.com/datastax/cql-proxy\" rel=\"nofollow\"><img src=\"https://camo.githubusercontent.com/e1c32ff51117d37ba38fd853bb54c63214d25a3a367d0de90a00a03124924acb/68747470733a2f2f676f7265706f7274636172642e636f6d2f62616467652f6769746875622e636f6d2f64617461737461782f63716c2d70726f7879\" alt=\"Go Report Card\" data-canonical-src=\"https://goreportcard.com/badge/github.com/datastax/cql-proxy\" class=\"c13\" referrerpolicy=\"no-referrer\" /></a></p><p dir=\"auto\"><a target=\"_blank\" rel=\"noopener noreferrer\" href=\"https://github.com/datastax/cql-proxy/blob/main/cql-proxy.png\"><img src=\"https://github.com/datastax/cql-proxy/raw/main/cql-proxy.png\" alt=\"cql-proxy\" class=\"c13\" referrerpolicy=\"no-referrer\" /></a></p><p dir=\"auto\"><code>cql-proxy</code> is designed to forward your application's CQL traffic to an appropriate database service. It listens on a local address and securely forwards that traffic.</p><p dir=\"auto\">The <code>cql-proxy</code> sidecar enables unsupported CQL drivers to work with <a href=\"https://astra.datastax.com/\" rel=\"nofollow\">DataStax Astra</a>. 
These drivers include both legacy DataStax <a href=\"https://docs.datastax.com/en/driver-matrix/doc/driver_matrix/common/driverMatrix.html\" rel=\"nofollow\">drivers</a> and community-maintained CQL drivers, such as the <a href=\"https://github.com/gocql/gocql\">gocql</a> driver and the <a href=\"https://github.com/scylladb/scylla-rust-driver\">rust-driver</a>.</p><p dir=\"auto\"><code>cql-proxy</code> also enables applications that are currently using <a href=\"https://cassandra.apache.org/\" rel=\"nofollow\">Apache Cassandra</a> or <a href=\"https://www.datastax.com/products/datastax-enterprise\" rel=\"nofollow\">DataStax Enterprise (DSE)</a> to use Astra without requiring any code changes. Your application just needs to be configured to use the proxy.</p><p dir=\"auto\">If you're building a new application using DataStax <a href=\"https://docs.datastax.com/en/driver-matrix/doc/driver_matrix/common/driverMatrix.html\" rel=\"nofollow\">drivers</a>, <code>cql-proxy</code> is not required, as the drivers can communicate directly with Astra. DataStax drivers have excellent support for Astra out of the box and are well documented in the <a href=\"https://docs.datastax.com/en/astra/docs/connecting-to-astra-databases-using-datastax-drivers.html\" rel=\"nofollow\">driver guide</a>.</p><p dir=\"auto\">Use the <code>-h</code> or <code>--help</code> flag to display a listing of all flags and their corresponding descriptions and environment variables (shown below as items starting with <code>$</code>):</p><div class=\"highlight highlight-source-shell notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"$ ./cql-proxy -h Usage: cql-proxy Flags: -h, --help Show context-sensitive help. -b, --astra-bundle=STRING Path to secure connect bundle for an Astra database. Requires '--username' and '--password'. Ignored if using the token or contact points option ($ASTRA_BUNDLE). 
-t, --astra-token=STRING Token used to authenticate to an Astra database. Requires '--astra-database-id'. Ignored if using the bundle path or contact points option ($ASTRA_TOKEN). -i, --astra-database-id=STRING Database ID of the Astra database. Requires '--astra-token' ($ASTRA_DATABASE_ID) --astra-api-url=&quot;https://api.astra.datastax.com&quot; URL for the Astra API ($ASTRA_API_URL) --astra-timeout=10s Timeout for contacting Astra when retrieving the bundle and metadata ($ASTRA_TIMEOUT) -c, --contact-points=CONTACT-POINTS,... Contact points for cluster. Ignored if using the bundle path or token option ($CONTACT_POINTS). -u, --username=STRING Username to use for authentication ($USERNAME) -p, --password=STRING Password to use for authentication ($PASSWORD) -r, --port=9042 Default port to use when connecting to cluster ($PORT) -n, --protocol-version=&quot;v4&quot; Initial protocol version to use when connecting to the backend cluster (default: v4, options: v3, v4, v5, DSEv1, DSEv2) ($PROTOCOL_VERSION) -m, --max-protocol-version=&quot;v4&quot; Max protocol version supported by the backend cluster (default: v4, options: v3, v4, v5, DSEv1, DSEv2) ($MAX_PROTOCOL_VERSION) -a, --bind=&quot;:9042&quot; Address to use to bind server ($BIND) -f, --config=CONFIG YAML configuration file ($CONFIG_FILE) --debug Show debug logging ($DEBUG) --health-check Enable liveness and readiness checks ($HEALTH_CHECK) --http-bind=&quot;:8000&quot; Address to use to bind HTTP server used for health checks ($HTTP_BIND) --heartbeat-interval=30s Interval between performing heartbeats to the cluster ($HEARTBEAT_INTERVAL) --idle-timeout=60s Duration between successful heartbeats before a connection to the cluster is considered unresponsive and closed ($IDLE_TIMEOUT) --readiness-timeout=30s Duration the proxy is unable to connect to the backend cluster before it is considered not ready ($READINESS_TIMEOUT) --idempotent-graph If true it will treat all graph queries as idempotent by default and 
retry them automatically. It may be dangerous to retry some graph queries -- use with caution ($IDEMPOTENT_GRAPH). --num-conns=1 Number of connection to create to each node of the backend cluster ($NUM_CONNS) --proxy-cert-file=STRING Path to a PEM encoded certificate file with its intermediate certificate chain. This is used to encrypt traffic for proxy clients ($PROXY_CERT_FILE) --proxy-key-file=STRING Path to a PEM encoded private key file. This is used to encrypt traffic for proxy clients ($PROXY_KEY_FILE) --rpc-address=STRING Address to advertise in the 'system.local' table for 'rpc_address'. It must be set if configuring peer proxies ($RPC_ADDRESS) --data-center=STRING Data center to use in system tables ($DATA_CENTER) --tokens=TOKENS,... Tokens to use in the system tables. It's not recommended ($TOKENS)\"><pre>$ ./cql-proxy -h\nUsage: cql-proxy\nFlags:\n  -h, --help                                              Show context-sensitive help.\n  -b, --astra-bundle=STRING                               Path to secure connect bundle for an Astra database. Requires '--username' and '--password'. Ignored if using the\n                                                          token or contact points option ($ASTRA_BUNDLE).\n  -t, --astra-token=STRING                                Token used to authenticate to an Astra database. Requires '--astra-database-id'. Ignored if using the bundle path\n                                                          or contact points option ($ASTRA_TOKEN).\n  -i, --astra-database-id=STRING                          Database ID of the Astra database. Requires '--astra-token' ($ASTRA_DATABASE_ID)\n      --astra-api-url=\"https://api.astra.datastax.com\"    URL for the Astra API ($ASTRA_API_URL)\n      --astra-timeout=10s                                 Timeout for contacting Astra when retrieving the bundle and metadata ($ASTRA_TIMEOUT)\n  -c, --contact-points=CONTACT-POINTS,...                 Contact points for cluster. 
Ignored if using the bundle path or token option ($CONTACT_POINTS).\n  -u, --username=STRING                                   Username to use for authentication ($USERNAME)\n  -p, --password=STRING                                   Password to use for authentication ($PASSWORD)\n  -r, --port=9042                                         Default port to use when connecting to cluster ($PORT)\n  -n, --protocol-version=\"v4\"                             Initial protocol version to use when connecting to the backend cluster (default: v4, options: v3, v4, v5, DSEv1,\n                                                          DSEv2) ($PROTOCOL_VERSION)\n  -m, --max-protocol-version=\"v4\"                         Max protocol version supported by the backend cluster (default: v4, options: v3, v4, v5, DSEv1, DSEv2)\n                                                          ($MAX_PROTOCOL_VERSION)\n  -a, --bind=\":9042\"                                      Address to use to bind server ($BIND)\n  -f, --config=CONFIG                                     YAML configuration file ($CONFIG_FILE)\n      --debug                                             Show debug logging ($DEBUG)\n      --health-check                                      Enable liveness and readiness checks ($HEALTH_CHECK)\n      --http-bind=\":8000\"                                 Address to use to bind HTTP server used for health checks ($HTTP_BIND)\n      --heartbeat-interval=30s                            Interval between performing heartbeats to the cluster ($HEARTBEAT_INTERVAL)\n      --idle-timeout=60s                                  Duration between successful heartbeats before a connection to the cluster is considered unresponsive and closed\n                                                          ($IDLE_TIMEOUT)\n      --readiness-timeout=30s                             Duration the proxy is unable to connect to the backend cluster before it is considered not ready\n                                 
                         ($READINESS_TIMEOUT)\n      --idempotent-graph                                  If true it will treat all graph queries as idempotent by default and retry them automatically. It may be\n                                                          dangerous to retry some graph queries -- use with caution ($IDEMPOTENT_GRAPH).\n      --num-conns=1                                       Number of connection to create to each node of the backend cluster ($NUM_CONNS)\n      --proxy-cert-file=STRING                            Path to a PEM encoded certificate file with its intermediate certificate chain. This is used to encrypt traffic\n                                                          for proxy clients ($PROXY_CERT_FILE)\n      --proxy-key-file=STRING                             Path to a PEM encoded private key file. This is used to encrypt traffic for proxy clients ($PROXY_KEY_FILE)\n      --rpc-address=STRING                                Address to advertise in the 'system.local' table for 'rpc_address'. It must be set if configuring peer proxies\n                                                          ($RPC_ADDRESS)\n      --data-center=STRING                                Data center to use in system tables ($DATA_CENTER)\n      --tokens=TOKENS,...                                 Tokens to use in the system tables. It's not recommended ($TOKENS)</pre></div><p dir=\"auto\">To pass configuration to <code>cql-proxy</code>, either command-line flags, environment variables, or a configuration file can be used. 
Using the <code>docker</code> method as an example, the following samples show how the token and database ID are defined with each method.</p><div class=\"highlight highlight-source-shell notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"docker run -p 9042:9042 --rm datastax/cql-proxy:v0.1.5 --astra-token &lt;astra-token&gt; --astra-database-id &lt;astra-database-id&gt;\"><pre>docker run -p 9042:9042 \\\n  --rm datastax/cql-proxy:v0.1.5 \\\n  --astra-token &lt;astra-token&gt; --astra-database-id &lt;astra-database-id&gt;</pre></div><div class=\"highlight highlight-source-shell notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"docker run -p 9042:9042 --rm -e ASTRA_TOKEN=&lt;astra-token&gt; -e ASTRA_DATABASE_ID=&lt;astra-database-id&gt; datastax/cql-proxy:v0.1.5\"><pre>docker run -p 9042:9042 --rm \\\n  -e ASTRA_TOKEN=&lt;astra-token&gt; -e ASTRA_DATABASE_ID=&lt;astra-database-id&gt; \\\n  datastax/cql-proxy:v0.1.5</pre></div><p dir=\"auto\">Proxy settings can also be passed using a configuration file with the <code>--config /path/to/proxy.yaml</code> flag. This can be mixed and matched with command-line flags and environment variables. 
Here are some example configuration files:</p><div class=\"highlight highlight-source-yaml notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"contact-points: - 127.0.0.1 username: cassandra password: cassandra port: 9042 bind: 127.0.0.1:9042 # ...\"><pre>contact-points:\n  - 127.0.0.1\nusername: cassandra\npassword: cassandra\nport: 9042\nbind: 127.0.0.1:9042\n# ...</pre></div><p dir=\"auto\">or with an Astra token:</p><div class=\"highlight highlight-source-yaml notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"astra-token: &lt;astra-token&gt; astra-database-id: &lt;astra-database-id&gt; bind: 127.0.0.1:9042 # ...\"><pre>astra-token: &lt;astra-token&gt;\nastra-database-id: &lt;astra-database-id&gt;\nbind: 127.0.0.1:9042\n# ...</pre></div><p dir=\"auto\">All configuration keys match their command-line flag counterparts, e.g. <code>--astra-bundle</code> is <code>astra-bundle:</code>, <code>--contact-points</code> is <code>contact-points:</code>, etc.</p><p dir=\"auto\">Multi-region failover with a DC-aware load balancing policy is the most useful case for a multiple proxy setup.</p><p dir=\"auto\">When configuring <code>peers:</code>, it is required to set <code>--rpc-address</code> (or <code>rpc-address:</code> in the yaml) for each proxy, and it must match its corresponding <code>peers:</code> entry. Also, <code>peers:</code> is only available in the configuration file and cannot be set using a command-line flag.</p><p dir=\"auto\">Here's an example of configuring multi-region failover with two proxies. A proxy is started for each region of the cluster, connecting to it using that region's bundle. 
They all share a common configuration file that contains the full list of proxies.</p><p dir=\"auto\"><em>Note:</em> Only bundles are supported for multi-region setups.</p><div class=\"highlight highlight-source-shell notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"cql-proxy --astra-bundle astra-region1-bundle.zip --username token --password &lt;astra-token&gt; --bind 127.0.0.1:9042 --rpc-address 127.0.0.1 --data-center dc-1 --config proxy.yaml\"><pre>cql-proxy --astra-bundle astra-region1-bundle.zip --username token --password &lt;astra-token&gt; \\\n  --bind 127.0.0.1:9042 --rpc-address 127.0.0.1 --data-center dc-1 --config proxy.yaml</pre></div><div class=\"highlight highlight-source-shell notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"cql-proxy --astra-bundle astra-region2-bundle.zip --username token --password &lt;astra-token&gt; --bind 127.0.0.2:9042 --rpc-address 127.0.0.2 --data-center dc-2 --config proxy.yaml\"><pre>cql-proxy --astra-bundle astra-region2-bundle.zip --username token --password &lt;astra-token&gt; \\\n  --bind 127.0.0.2:9042 --rpc-address 127.0.0.2 --data-center dc-2 --config proxy.yaml</pre></div><p dir=\"auto\">The peers settings are configured using a yaml file. It's a good idea to explicitly provide the <code>--data-center</code> flag; otherwise, these values are pulled from the backend cluster, and you would need to look them up in the <code>system.local</code> and <code>system.peers</code> tables to properly set up the peers <code>data-center:</code> values. 
Here's an example <code>proxy.yaml</code>:</p><div class=\"highlight highlight-source-yaml notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"peers: - rpc-address: 127.0.0.1 data-center: dc-1 - rpc-address: 127.0.0.2 data-center: dc-2\"><pre>peers:\n  - rpc-address: 127.0.0.1\n    data-center: dc-1\n  - rpc-address: 127.0.0.2\n    data-center: dc-2</pre></div><p dir=\"auto\"><em>Note:</em> It's okay for the <code>peers:</code> to contain entries for the current proxy itself because they'll just be omitted.</p><p dir=\"auto\">There are three methods for using <code>cql-proxy</code>:</p><ul dir=\"auto\"><li>Locally build and run <code>cql-proxy</code></li>\n<li>Run a docker image that has <code>cql-proxy</code> installed</li>\n<li>Use a Kubernetes container to run <code>cql-proxy</code></li>\n</ul><ol dir=\"auto\"><li>\n<p dir=\"auto\">Build <code>cql-proxy</code>.</p>\n<div class=\"highlight highlight-source-shell notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"go build\"><pre>go build</pre></div>\n</li>\n<li>\n<p dir=\"auto\">Run with your desired database.</p>\n<ul dir=\"auto\"><li>\n<p dir=\"auto\"><a href=\"https://astra.datastax.com/\" rel=\"nofollow\">DataStax Astra</a> cluster:</p>\n<div class=\"highlight highlight-source-shell notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"./cql-proxy --astra-token &lt;astra-token&gt; --astra-database-id &lt;astra-database-id&gt;\"><pre>./cql-proxy --astra-token &lt;astra-token&gt; --astra-database-id &lt;astra-database-id&gt;</pre></div>\n<p dir=\"auto\">The <code>&lt;astra-token&gt;</code> can be generated using these <a href=\"https://docs.datastax.com/en/astra/docs/manage-application-tokens.html\" rel=\"nofollow\">instructions</a>. 
The proxy also supports using the <a href=\"https://docs.datastax.com/en/astra/docs/obtaining-database-credentials.html#_getting_your_secure_connect_bundle\" rel=\"nofollow\">Astra Secure Connect Bundle</a> along with a client ID and secret generated using these <a href=\"https://docs.datastax.com/en/astra/docs/manage-application-tokens.html\" rel=\"nofollow\">instructions</a>:</p>\n<div class=\"snippet-clipboard-content notranslate position-relative overflow-auto\" data-snippet-clipboard-copy-content=\"./cql-proxy --astra-bundle &lt;your-secure-connect-zip&gt; --username &lt;astra-client-id&gt; --password &lt;astra-client-secret&gt;\"><pre>./cql-proxy --astra-bundle &lt;your-secure-connect-zip&gt; \\\n--username &lt;astra-client-id&gt; --password &lt;astra-client-secret&gt;\n</pre></div>\n</li>\n<li>\n<p dir=\"auto\"><a href=\"https://cassandra.apache.org/\" rel=\"nofollow\">Apache Cassandra</a> cluster:</p>\n<div class=\"highlight highlight-source-shell notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"./cql-proxy --contact-points &lt;cluster node IPs or DNS names&gt; [--username &lt;username&gt;] [--password &lt;password&gt;]\"><pre>./cql-proxy --contact-points &lt;cluster node IPs or DNS names&gt; [--username &lt;username&gt;] [--password &lt;password&gt;]</pre></div>\n</li>\n</ul></li>\n</ol><ol dir=\"auto\"><li>\n<p dir=\"auto\">Run with your desired database.</p>\n<ul dir=\"auto\"><li>\n<p dir=\"auto\"><a href=\"https://astra.datastax.com/\" rel=\"nofollow\">DataStax Astra</a> cluster:</p>\n<div class=\"highlight highlight-source-shell notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"docker run -p 9042:9042 datastax/cql-proxy:v0.1.5 --astra-token &lt;astra-token&gt; --astra-database-id &lt;astra-database-id&gt;\"><pre>docker run -p 9042:9042 \\\n  datastax/cql-proxy:v0.1.5 \\\n  --astra-token &lt;astra-token&gt; --astra-database-id 
&lt;astra-database-id&gt;</pre></div>\n<p dir=\"auto\">The <code>&lt;astra-token&gt;</code> can be generated using these <a href=\"https://docs.datastax.com/en/astra/docs/manage-application-tokens.html\" rel=\"nofollow\">instructions</a>. The proxy also supports using the <a href=\"https://docs.datastax.com/en/astra/docs/obtaining-database-credentials.html#_getting_your_secure_connect_bundle\" rel=\"nofollow\">Astra Secure Connect Bundle</a>, but it requires mounting the bundle to a volume in the container:</p>\n<div class=\"highlight highlight-source-shell notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"docker run -v &lt;your-secure-connect-bundle.zip&gt;:/tmp/scb.zip -p 9042:9042 --rm datastax/cql-proxy:v0.1.5 --astra-bundle /tmp/scb.zip --username &lt;astra-client-id&gt; --password &lt;astra-client-secret&gt;\"><pre>docker run -v &lt;your-secure-connect-bundle.zip&gt;:/tmp/scb.zip -p 9042:9042 \\\n--rm datastax/cql-proxy:v0.1.5 \\\n--astra-bundle /tmp/scb.zip --username &lt;astra-client-id&gt; --password &lt;astra-client-secret&gt;</pre></div>\n</li>\n<li>\n<p dir=\"auto\"><a href=\"https://cassandra.apache.org/\" rel=\"nofollow\">Apache Cassandra</a> cluster:</p>\n<div class=\"highlight highlight-source-shell notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"docker run -p 9042:9042 datastax/cql-proxy:v0.1.5 --contact-points &lt;cluster node IPs or DNS names&gt; [--username &lt;username&gt;] [--password &lt;password&gt;]\"><pre>docker run -p 9042:9042 \\\n  datastax/cql-proxy:v0.1.5 \\\n  --contact-points &lt;cluster node IPs or DNS names&gt; [--username &lt;username&gt;] [--password &lt;password&gt;]</pre></div>\n</li>\n</ul></li>\n</ol><p dir=\"auto\">If you wish to have the docker image removed after you are done with it, add <code>--rm</code> before the image name <code>datastax/cql-proxy:v0.1.5</code>.</p><p dir=\"auto\">Using Kubernetes with 
<code>cql-proxy</code> requires a number of steps:</p><ol dir=\"auto\"><li>\n<p dir=\"auto\">Generate a token following the Astra <a href=\"https://docs.datastax.com/en/astra/docs/manage-application-tokens.html#_create_application_token\" rel=\"nofollow\">instructions</a>. This step will display your Client ID, Client Secret, and Token; make sure you download the information for the next steps. Store the secure bundle in <code>/tmp/scb.zip</code> to match the example below.</p>\n</li>\n<li>\n<p dir=\"auto\">Create <code>cql-proxy.yaml</code>. You'll need to add three sets of information: arguments, volume mounts, and volumes. A full example can be found <a href=\"https://github.com/datastax/cql-proxy/blob/main/k8s/cql-proxy.yml\">here</a>.</p>\n</li>\n</ol><ul dir=\"auto\"><li>\n<p dir=\"auto\">Argument: Set the local bundle location, username, and password in the container arguments, using the client ID and client secret obtained in the previous step.</p>\n<div class=\"snippet-clipboard-content notranslate position-relative overflow-auto\" data-snippet-clipboard-copy-content=\"command: [&quot;./cql-proxy&quot;] args: [&quot;--astra-bundle=/tmp/scb.zip&quot;,&quot;--username=Client ID&quot;,&quot;--password=Client Secret&quot;]\"><pre>command: [\"./cql-proxy\"]\nargs: [\"--astra-bundle=/tmp/scb.zip\",\"--username=Client ID\",\"--password=Client Secret\"]\n</pre></div>\n</li>\n<li>\n<p dir=\"auto\">Volume mounts: Modify <code>/tmp/</code> as a volume mount as required.</p>\n<div class=\"snippet-clipboard-content notranslate position-relative overflow-auto\" data-snippet-clipboard-copy-content=\"volumeMounts: - name: my-cm-vol mountPath: /tmp/\"><pre>volumeMounts:\n  - name: my-cm-vol\n    mountPath: /tmp/\n</pre></div>\n</li>\n<li>\n<p dir=\"auto\">Volume: Modify the <code>configMap</code> filename as required. In this example, it is named <code>cql-proxy-configmap</code>. 
Use the same name for the <code>volumes</code> that you used for the <code>volumeMounts</code>.</p>\n<div class=\"snippet-clipboard-content notranslate position-relative overflow-auto\" data-snippet-clipboard-copy-content=\"volumes: - name: my-cm-vol configMap: name: cql-proxy-configmap\"><pre>volumes:\n  - name: my-cm-vol\n    configMap:\n      name: cql-proxy-configmap        \n</pre></div>\n</li>\n</ul><ol start=\"3\" dir=\"auto\"><li>\n<p dir=\"auto\">Create a configmap. Use the same secure bundle that was specified in the <code>cql-proxy.yaml</code>.</p>\n<div class=\"highlight highlight-source-shell notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"kubectl create configmap cql-proxy-configmap --from-file /tmp/scb.zip\"><pre>kubectl create configmap cql-proxy-configmap --from-file /tmp/scb.zip </pre></div>\n</li>\n<li>\n<p dir=\"auto\">Check the configmap that was created.</p>\n<div class=\"highlight highlight-source-shell notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"kubectl describe configmap cql-proxy-configmap Name: cql-proxy-configmap Namespace: default Labels: &lt;none&gt; Annotations: &lt;none&gt; Data ==== BinaryData ==== scb.zip: 12311 bytes\"><pre>kubectl describe configmap cql-proxy-configmap\n  Name:         cql-proxy-configmap\n  Namespace:    default\n  Labels:       &lt;none&gt;\n  Annotations:  &lt;none&gt;\n  Data\n  ====\n  BinaryData\n  ====\n  scb.zip: 12311 bytes</pre></div>\n</li>\n<li>\n<p dir=\"auto\">Create a Kubernetes deployment with the YAML file you created:</p>\n<div class=\"highlight highlight-source-shell notranslate position-relative overflow-auto\" dir=\"auto\" data-snippet-clipboard-copy-content=\"kubectl create -f cql-proxy.yaml\"><pre>kubectl create -f cql-proxy.yaml</pre></div>\n</li>\n<li>\n<p dir=\"auto\">Check the logs:</p>\n<div class=\"highlight highlight-source-shell notranslate position-relative overflow-auto\" 
dir=\"auto\" data-snippet-clipboard-copy-content=\"kubectl logs &lt;deployment-name&gt;\"><pre>kubectl logs &lt;deployment-name&gt;</pre></div>\n</li>\n</ol><p dir=\"auto\">Drivers that use token-aware load balancing may print a warning or may not work when using cql-proxy. Because cql-proxy abstracts the backend cluster as a single endpoint, this doesn't always work well with token-aware drivers that expect there to be at least \"replication factor\" number of nodes in the cluster. Many drivers print a warning (which can be ignored) and fall back to something like round-robin, but other drivers might fail with an error. For drivers that fail with an error, disable token-aware routing or configure the round-robin load balancing policy.</p>","id":"228f2fac-87de-5dab-b7ff-0e0c6d40dcbf","title":"GitHub - datastax/cql-proxy: A client-side CQL proxy/sidecar.","origin_url":"https://github.com/datastax/cql-proxy","url":"https://github.com/datastax/cql-proxy","wallabag_created_at":"2024-11-01T17:26:01+00:00","published_at":null,"published_by":"['datastax']","reading_time":8,"domain_name":"github.com","preview_picture":"https://opengraph.githubassets.com/c2528e3426d98910ed27819e048b4c1081fab2ed2c7adbea6e6a3b1872deb30a/datastax/cql-proxy","tags":["migration","proxy","cassandra","cql"],"description":" cql-proxy is designed to forward your application's CQL traffic to an appropriate database service. It listens on a local address and securely forwards that traffic.The cql-proxy sidecar enables unsu..."},{"content":"<header>\n</header><p>Zero Downtime Migration (ZDM) Proxy is an open-source component developed in Go and based on client-server architecture. 
It enables you to migrate from one Apache Cassandra® cluster to another without downtime or code changes in the application client.</p><p>For details on ZDM Proxy, see <a href=\"https://github.com/datastax/zdm-proxy\" target=\"_blank\" rel=\"noopener noreferrer\">zdm-proxy GitHub</a>.</p><p>When using ZDM Proxy, the client connects to the proxy rather than to the source cluster. The proxy connects both to the source cluster and the target cluster. It sends read requests to the source cluster only, while write requests are forwarded to both clusters.</p><p>For details on how ZDM Proxy works, see <a href=\"https://docs.datastax.com/en/data-migration/introduction.html\" target=\"_blank\" rel=\"noopener noreferrer\">Introduction to Zero Downtime Migration</a>.</p><ul><li>Apache Cassandra instance to migrate to the Aiven platform (migration source)</li>\n<li>Aiven for Apache Cassandra service to migrate your external instance to (migration target)</li>\n<li><a href=\"https://aiven.io/docs/tools/cli\">Aiven CLI client installed</a></li>\n<li><code>cqlsh</code> <a href=\"https://cassandra.apache.org/doc/latest/cassandra/getting_started/installing.html\" target=\"_blank\" rel=\"noopener noreferrer\">installed</a></li>\n</ul><p><a href=\"https://aiven.io/docs/products/cassandra/howto/connect-cqlsh-cli\">Connect to your Aiven for Apache Cassandra service</a> using <code>cqlsh</code>, for example.</p><div class=\"language-bash codeBlockContainer_Ckt0 theme-code-block codeBlockContent_biex c3\"><pre class=\"codeBlockLines_e6Vv\">cqlsh --ssl -u avnadmin -p YOUR_SECRET_PASSWORD cassandra-target-cluster-name.a.avns.net 12345<br /></pre></div><p>You can expect to receive output similar to the following:</p><div class=\"language-bash codeBlockContainer_Ckt0 theme-code-block codeBlockContent_biex c3\"><pre class=\"codeBlockLines_e6Vv\">Connected to a1b2c3d4-1a2b-3c4d-5e6f-a1b2c3d4e5f6 at cassandra-target-cluster-name.a.avns.net:12345<br />[cqlsh 6.1.0 | Cassandra 4.0.11 | CQL spec 
3.4.5 | Native protocol v5]<br /></pre></div><p>In your target service, create the same keyspaces and tables you have in your source Apache Cassandra cluster.</p><div class=\"language-bash codeBlockContainer_Ckt0 theme-code-block codeBlockContent_biex c3\"><pre class=\"codeBlockLines_e6Vv\">create keyspace KEYSPACE_NAME with replication ={'class':'SimpleStrategy', 'replication_factor':3};<br />create table KEYSPACE_NAME.TABLE_NAME (n_id int, value int, primary key (n_id));<br /></pre></div><p>Download the ZDM Proxy's binary from <a href=\"https://github.com/datastax/zdm-proxy/releases\" target=\"_blank\" rel=\"noopener noreferrer\">ZDM Proxy releases</a>.</p><div class=\"language-bash codeBlockContainer_Ckt0 theme-code-block codeBlockContent_biex c3\"><pre class=\"codeBlockLines_e6Vv\">wget https://github.com/datastax/zdm-proxy/releases/download/v2.1.0/zdm-proxy-linux-amd64-v2.1.0.tgz<br />tar xf zdm-proxy-linux-amd64-v2.1.0.tgz<br /></pre></div><p>Check if the binary has been downloaded successfully using <code>ls</code> in the relevant directory. 
You can expect to receive output similar to the following:</p><div class=\"language-bash codeBlockContainer_Ckt0 theme-code-block codeBlockContent_biex c3\"><pre class=\"codeBlockLines_e6Vv\">LICENSE  zdm-proxy-linux-amd64-v2.1.0.tgz  zdm-proxy-v2.1.0<br /></pre></div><ol><li>\n<p>Specify connection information by setting <code>ZDM_TARGET_*</code> and <code>ZDM_ORIGIN_*</code> environment variables using the <code>export</code> command.</p>\n<div class=\"theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary\"><p>note</p><div class=\"admonitionContent_BuS1\"><p><code>ORIGIN</code> refers to the source service.</p></div></div>\n</li>\n<li>\n<p>Run the binary.</p>\n</li>\n</ol><div class=\"language-bash codeBlockContainer_Ckt0 theme-code-block codeBlockContent_biex c3\"><pre class=\"codeBlockLines_e6Vv\">export ZDM_ORIGIN_CONTACT_POINTS=localhost<br />export ZDM_ORIGIN_USERNAME=cassandra<br />export ZDM_ORIGIN_PASSWORD=cassandra<br />export ZDM_ORIGIN_PORT=1234<br />export ZDM_TARGET_CONTACT_POINTS=cassandra-target-cluster-name.a.avns.net<br />export ZDM_TARGET_USERNAME=avnadmin<br />export ZDM_TARGET_PASSWORD=YOUR_SECRET_PASSWORD<br />export ZDM_TARGET_PORT=12345<br />export ZDM_TARGET_TLS_SERVER_CA_PATH=\"/tmp/ca.pem\"<br />export ZDM_TARGET_ENABLE_HOST_ASSIGNMENT=false<br /># ZDM_ORIGIN_ENABLE_HOST_ASSIGNMENT=false  # (may be needed, see note)<br />./zdm-proxy-v2.1.0<br /></pre></div><div class=\"theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary\"><p>ENABLE_HOST_ASSIGNMENT</p><div class=\"admonitionContent_BuS1\"><p>Make sure you set the ZDM_TARGET_ENABLE_HOST_ASSIGNMENT variable. Otherwise, ZDM Proxy tries to connect to one of the internal addresses of the cluster nodes, which are unavailable from outside. If this also happens with your source cluster, set <code>ZDM_ORIGIN_ENABLE_HOST_ASSIGNMENT=false</code>.</p></div></div><p>To connect to ZDM Proxy, use, for example, <code>cqlsh</code>. 
Provide connection details and, if your source or target requires authentication, specify the target username and password.</p><p>For more details on using the credentials, see <a href=\"https://docs.datastax.com/en/data-migration/introduction.html\" target=\"_blank\" rel=\"noopener noreferrer\">Client application credentials</a>.</p><p>The port that ZDM Proxy uses is 14002, which can be overridden.</p><ol><li>\n<p>Connect using ZDM Proxy.</p>\n<div class=\"language-bash codeBlockContainer_Ckt0 theme-code-block codeBlockContent_biex c3\"><pre class=\"codeBlockLines_e6Vv\">cqlsh -u avnadmin -p YOUR_SECRET_PASSWORD localhost 14002<br /></pre></div>\n<p>You can expect to receive output similar to the following:</p>\n<div class=\"language-bash codeBlockContainer_Ckt0 theme-code-block codeBlockContent_biex c3\"><pre class=\"codeBlockLines_e6Vv\">Connected to CLUSTER_NAME at localhost:14002<br />[cqlsh 6.1.0 | Cassandra 4.1.3 | CQL spec 3.4.6 | Native protocol v4]<br /></pre></div>\n</li>\n<li>\n<p>Check data in the table.</p>\n<div class=\"language-bash codeBlockContainer_Ckt0 theme-code-block codeBlockContent_biex c3\"><pre class=\"codeBlockLines_e6Vv\">select * from KEYSPACE_NAME.TABLE_NAME;<br /></pre></div>\n<p>You can expect to receive output similar to the following:</p>\n<div class=\"language-bash codeBlockContainer_Ckt0 theme-code-block codeBlockContent_biex c3\"><pre class=\"codeBlockLines_e6Vv\">n_id | value<br />------+-------<br />1|42<br />2|44<br />3|46<br />(3 rows)<br /></pre></div>\n</li>\n<li>\n<p>Insert more data into the table to test how ZDM Proxy handles write requests.</p>\n<div class=\"language-bash codeBlockContainer_Ckt0 theme-code-block codeBlockContent_biex c3\"><pre class=\"codeBlockLines_e6Vv\">insert into KEYSPACE_NAME.TABLE_NAME (n_id, value) values (4, 48);<br />insert into KEYSPACE_NAME.TABLE_NAME (n_id, value) values (5, 50);<br /></pre></div>\n</li>\n<li>\n<p>Check the data in the table again.</p>\n<div class=\"language-bash 
codeBlockContainer_Ckt0 theme-code-block codeBlockContent_biex c3\"><pre class=\"codeBlockLines_e6Vv\">select * from KEYSPACE_NAME.TABLE_NAME;<br /></pre></div>\n<p>You can expect to receive output similar to the following:</p>\n<div class=\"language-bash codeBlockContainer_Ckt0 theme-code-block codeBlockContent_biex c3\"><pre class=\"codeBlockLines_e6Vv\">n_id | value<br />------+-------<br />5|50<br />1|42<br />2|44<br />4|48<br />3|46<br />(5 rows)<br /></pre></div>\n</li>\n</ol><ol><li>\n<p>Connect to the source:</p>\n<div class=\"language-bash codeBlockContainer_Ckt0 theme-code-block codeBlockContent_biex c3\"><pre class=\"codeBlockLines_e6Vv\">cqlsh localhost 1234<br /></pre></div>\n<p>You can expect to receive output similar to the following:</p>\n<div class=\"language-bash codeBlockContainer_Ckt0 theme-code-block codeBlockContent_biex c3\"><pre class=\"codeBlockLines_e6Vv\">Connected to SOURCE_CLUSTER_NAME at localhost:1234<br />[cqlsh 6.1.0 | Cassandra 4.1.3 | CQL spec 3.4.6 | Native protocol v5]<br /></pre></div>\n</li>\n<li>\n<p>Check data in the table:</p>\n<div class=\"language-bash codeBlockContainer_Ckt0 theme-code-block codeBlockContent_biex c3\"><pre class=\"codeBlockLines_e6Vv\">select * from KEYSPACE_NAME.TABLE_NAME;<br /></pre></div>\n<p>You can expect to receive output similar to the following:</p>\n<div class=\"language-bash codeBlockContainer_Ckt0 theme-code-block codeBlockContent_biex c3\"><pre class=\"codeBlockLines_e6Vv\">n_id | value<br />------+-------<br />5|50<br />1|42<br />2|44<br />4|48<br />3|46<br />(5 rows)<br /></pre></div>\n<p>ZDM Proxy has forwarded both the write request and the read request to the source cluster. 
As a result, all the values are there: both newly-added ones (<code>50</code> and <code>48</code>) and previously added ones (<code>42</code>, <code>44</code>, and <code>46</code>).</p>\n</li>\n</ol><ol><li>\n<p>Connect to the target service.</p>\n<div class=\"language-bash codeBlockContainer_Ckt0 theme-code-block codeBlockContent_biex c3\"><pre class=\"codeBlockLines_e6Vv\">cqlsh --ssl -u avnadmin -p YOUR_SECRET_PASSWORD cassandra-target-cluster-name.a.avns.net 12345<br /></pre></div>\n<p>You can expect to receive output similar to the following:</p>\n<div class=\"language-bash codeBlockContainer_Ckt0 theme-code-block codeBlockContent_biex c3\"><pre class=\"codeBlockLines_e6Vv\">Connected to a1b2c3d4-1a2b-3c4d-5e6f-a1b2c3d4e5f6 at cassandra-target-cluster-name.a.avns.net:12345<br />[cqlsh 6.1.0 | Cassandra 4.0.11 | CQL spec 3.4.5 | Native protocol v5]<br /></pre></div>\n</li>\n<li>\n<p>Check data in the table.</p>\n<div class=\"language-bash codeBlockContainer_Ckt0 theme-code-block codeBlockContent_biex c3\"><pre class=\"codeBlockLines_e6Vv\">select * from KEYSPACE_NAME.TABLE_NAME;<br /></pre></div>\n<p>You can expect to receive output similar to the following:</p>\n<div class=\"language-bash codeBlockContainer_Ckt0 theme-code-block codeBlockContent_biex c3\"><pre class=\"codeBlockLines_e6Vv\">n_id | value<br />------+-------<br />5|50<br />4|48<br />(2 rows)<br /></pre></div>\n<p><code>50</code> and <code>48</code> are there in the target table since ZDM Proxy has forwarded the write request to the target service. 
<code>42</code>, <code>44</code>, and <code>46</code> are not there since ZDM Proxy has not sent the read request to the target service.</p>\n</li>\n</ol><ul><li><a href=\"https://github.com/datastax/zdm-proxy\" target=\"_blank\" rel=\"noopener noreferrer\">zdm-proxy GitHub</a></li>\n<li><a href=\"https://docs.datastax.com/en/data-migration/introduction.html\" target=\"_blank\" rel=\"noopener noreferrer\">Introduction to Zero Downtime Migration</a></li>\n<li><a href=\"https://github.com/datastax/zdm-proxy/releases\" target=\"_blank\" rel=\"noopener noreferrer\">ZDM Proxy releases</a></li>\n<li><a href=\"https://docs.datastax.com/en/data-migration/connect-clients-to-target.html\" target=\"_blank\" rel=\"noopener noreferrer\">Client application credentials</a></li>\n</ul>","id":"c4220cd6-6736-5c34-870e-18564cff721d","title":"Migrate to Aiven for Apache Cassandra® with no downtime | Aiven docs","origin_url":"https://aiven.io/docs/products/cassandra/howto/zdm-proxy","url":"https://aiven.io/docs/products/cassandra/howto/zdm-proxy","wallabag_created_at":"2024-11-01T17:25:08+00:00","published_at":null,"published_by":null,"reading_time":4,"domain_name":"aiven.io","preview_picture":"https://aiven.io/docs/images/site-preview.png","tags":["migration","proxy","cassandra","aiven"],"description":"\nZero Downtime Migration (ZDM) Proxy is an open-source component developed in Go and based on client-server architecture. 
It enables you to migrate from one Apache Cassandra® cluster to another withou..."}]},{"tag":"performance","articles":[{"content":"<div class=\"top-section\"><div class=\"container-fluid main-content-area\"><div id=\"scroll-status-bar\"><div id=\"scroll-status-percent\"></div><div class=\"blog-post-hero container-fluid\"><div class=\"blog-post-hero-bg\"><div class=\"blog-post-hero-bg-img\"></div><div class=\"container\"><img src=\"https://sematext.com/wp-content/uploads/2021/10/critical-cassandra-metrics-to-monitor.jpg\" id=\"the-featured-image\" width=\"1140\" height=\"626\" alt=\"image\" /></div></div><div class=\"container-fluid container-single-blog-post\"><div class=\"container\"><article class=\"single-article-blog-post\" id=\"post-53726\"><main><section><div id=\"the-content\"><p>Apache Cassandra is a distributed database known for its high availability, fault tolerance, and near-linear scaling. It was initially developed by Facebook, but it is a widely used open-source system used by the largest tech companies in the world. There are numerous reasons behind its popularity, including no single point of failure, exceptional horizontal scaling with a data layout designed as a perfect fit for time-series data.</p><p>However, despite these perks, like any other system, Cassandra is prone to performance issues. This makes monitoring imperative. And it all starts with knowing what to measure. In this article, we will explain the <strong>key Cassandra performance metrics</strong> you should monitor to make sure everything is up and running at all times.</p><h2>What Is Cassandra and How Does It Work?</h2><p>Let’s keep it short – Apache Cassandra is a distributed NoSQL database designed to provide fault-tolerant and highly available architecture with performance in mind.</p><p>As a distributed system Cassandra is built out of nodes. A <strong>node</strong> is a single instance of Apache Cassandra that can operate on its own. 
Multiple nodes can form a <strong>cluster</strong> – a distributed system holding common data and responding to query requests. Cassandra works in a master-less architecture where each node communicates in a <strong>peer to peer </strong>fashion using a protocol known as <strong>Gossip</strong>. The <strong>gossip</strong> protocol is designed so that each node is informed about the state of all other nodes and a single node performs <strong>gossip</strong> communication with up to three other nodes every second.</p><p>The <strong>cluster</strong> can be divided into <strong>data centers</strong> and <strong>racks</strong>, just like the real-life data centers are divided. In Cassandra terminology, a <strong>data center</strong> is designed to hold multiple <strong>racks</strong> and a single <strong>rack </strong>holds a complete replica of the data.</p><p><img data-lazyloaded=\"1\" src=\"https://sematext.com/wp-content/uploads/2021/10/cassandra-metrics-3.png\" alt=\"image\" /></p><noscript><img src=\"https://sematext.com/wp-content/uploads/2021/10/cassandra-metrics-3.png\" alt=\"image\" /></noscript><p><em>Cassandra Cluster Logical Overview</em></p><p>When it comes to the data, Cassandra stores it in tables that are organized the same way as in any other database – in rows and columns. A single table is called a column family. The tables themselves are grouped into keyspaces, where a keyspace usually holds logically similar data – for example, from a business perspective. The keyspace is also used for data replication, and the replication itself is configured on a keyspace level.</p><p>Getting back to the tables. Each table defines a primary key that is built of the partition key and the clustering columns. Cassandra uses the partition key to index the data. All data that share a common partition key make a single data partition – a basic unit for data retrieval and storage. 
The clustering columns are optional.</p><p>Needless to say, Apache Cassandra is a complicated, distributed system and it’s not uncommon for users to encounter operational problems and difficulties. Everything breaks eventually, from the low-level bare metal components, up to the high-level software. It is not unusual for users to deal with network issues and CPU utilization problems, especially on very large clusters. Cassandra is written in Java and uses both off-heap and heap memory, which means that as the volume of data grows, you may hit issues with the garbage collector. Finally, because of the amount of data that you will process, you may need to deal with the hard disk space and performance of your I/O subsystem. All of these can be avoided by keeping an eye on the relevant metrics with the help of a good <a href=\"https://sematext.com/integrations/cassandra\">Cassandra monitoring tool</a>.</p><h2>How Is Cassandra Performance Measured?</h2><p>Most complex distributed systems provide a set of metrics that you should take care of, monitor, and alert on to ensure that your system is healthy and working well. Apache Cassandra is no different. It provides a plethora of performance metrics which we can divide into three categories:</p><ul><li>Dedicated Apache Cassandra metrics that describe how the system and its parts perform.</li><li><a href=\"https://sematext.com/blog/jvm-metrics/\">Java Virtual Machine metrics</a> that tell you about the execution environment on which Apache Cassandra is running.</li><li><a href=\"https://sematext.com/server-monitoring/\">Operating system metrics</a> describing the metrics related to the bare metal servers, virtual machines, or containers, depending on the environment that you are using.</li></ul><h3>Dedicated Cassandra Performance Metrics</h3><p>When monitoring Apache Cassandra clusters, start with the metrics that the distributed data store exposes via the JMX interface. 
There are many Cassandra performance metrics exposed in the JMX and having visibility into most of them is a good idea. You never know what can be useful when troubleshooting.</p><h4>Nodes</h4><p>One of the most important Cassandra metrics is the number of nodes that are currently available and connected to form a cluster. The ability to store the data and respond to queries is directly related to the availability of nodes.</p><h4>Compaction Metrics</h4><p><a href=\"https://cassandra.apache.org/doc/latest/operating/compaction/index.html\">Compaction</a> is the operation of merging multiple smaller instances of <a href=\"https://cassandra.apache.org/doc/latest/architecture/storage_engine.html#sstables\">SSTable</a> into one bigger SSTable that contains all the data from the smaller tables. Because of that, it can be very expensive and resource-consuming. Having visibility into compaction performance is critical for long-term observability – the <a href=\"https://sematext.com/blog/cassandra-monitoring-tools/\">Cassandra monitoring tool</a> of your choice needs to provide the number of compactions and the number of compacted bytes.</p><p>During compaction, until the process ends, the total disk space used may be double that before the compaction. Because of that, you should consider leaving about 50% of space free to account for compactions and, of course, set up appropriate alerts to inform you when the amount of free disk space is close to a level where compaction could fail.</p><h4>Read and Write Performance Metrics</h4><p>The next set of metrics is dedicated to clients and the read and write side of the operations. You should measure the number of reads happening in a given period, the request latency, and the number of timeouts and failures. Your Cassandra monitoring tool should provide the top-level view and allow for slicing and dicing through the data showing you the aggregated view, per node view, per keyspace view, and per table view. 
The same goes for write operations.</p><p>You should see the number of write requests happening and write latency. Local writes and reads may also be important when troubleshooting.</p><h4>Table Metrics</h4><p>Table metrics are also essential. The ones you should pay close attention to are partition size, tombstone scans, and the number of SSTables per read.</p><h5>Partition Size</h5><p>Partition size is crucial for cluster performance. Cassandra uses it as a unit of data storage, replication, and retrieval, thus directly dictating the performance of your Cassandra tables. The ideal partition size varies but is usually below 100MB and not less than 10 – 20MB.</p><h5>Tombstones</h5><p>Cassandra produces <a href=\"https://cassandra.apache.org/doc/latest/operating/compaction.html#tombstones-and-garbage-collection-gc-grace\">tombstones</a> when you delete the data. They are markers of the deleted data. Data in Cassandra is immutable by design, and because of that, it can only be physically removed from the SSTable during compactions. Because of that, you should keep an eye on how they affect your disk space.</p><h5>SSTables Per Read</h5><p>Similar to tombstones, the number of SSTables per read is related to the immutability of the data in Cassandra. A single table can be built of multiple SSTables, which are written sequentially. A single read operation can result in reading multiple SSTables to retrieve the relevant data. The more SSTables Cassandra needs to read to return the data, the more resources are required to complete the read operation. This is why you should minimize the number whenever possible.</p><h4>Other Metrics</h4><p>As we mentioned earlier, other Apache Cassandra performance metrics can be helpful and you should consider monitoring them.</p><h5>Caches</h5><p>There are two types of caches in Cassandra – the key cache and the row cache. 
Cassandra uses the key cache to store the location of row keys in memory so that a row can be located without first consulting the on-disk partition index. The row cache stores the rows themselves in memory. By using the caches, Cassandra reduces the need to read data from the disk, trading memory usage for performance.</p><p>You need to monitor the key cache requests and row cache requests, which tell you how many requests were made to a given cache type, and the key cache hit ratio and the row cache hit ratio, which show the percentage of results retrieved from the cache instead of the disk.</p><h5>Threadpool</h5><p>Cassandra is designed to handle high load, withstand backpressure, and perform asynchronous tasks. Monitoring various thread pools is crucial for understanding Cassandra’s performance and bottlenecks. Each thread pool exposes the number of active, pending, and blocked tasks. Accumulating pending and blocked tasks usually point to performance issues and the need for more processing power or a different data and query architecture.</p><h5>Bloom Filter</h5><p>In the read path, Cassandra merges the data stored on disk inside the SSTables with the data stored in memory. To minimize the amount of checking for data existence in the SSTables on disk, Cassandra uses a data structure called a bloom filter.</p><p>The bloom filter is a probabilistic data structure that can tell Cassandra that the data is definitely not in a given file or that the data may be present in a given file. The key metrics to monitor here are the amount of space used by bloom filters, the number of false positives, and the false positive ratio. You can reduce the number of false positives by assigning more memory to the bloom filters.</p><h3>Java Virtual Machine Metrics</h3><p><a href=\"https://cassandra.apache.org/\">Apache Cassandra</a> is a JVM-based application that comes with all the usual JVM pros and cons. 
From the developer’s perspective, memory management is easier and requires less hassle – you just use an object and forget about it, letting the JVM do the cleaning up. But that means that something has to clean up all the unused objects in memory. This is where <a href=\"https://sematext.com/blog/java-garbage-collection/\">Java Garbage Collection</a> comes in, along with the metrics that come with it.</p><p>A proper <a href=\"https://sematext.com/integrations/cassandra-monitoring/\">Cassandra monitoring tool</a> should provide metrics that allow you to check and troubleshoot issues with the Java Virtual Machine, such as JVM memory utilization and garbage collection count and time. You can read more about them in our guide about <a href=\"https://sematext.com/blog/jvm-metrics/\">JVM metrics</a>.</p><h3>Operating System Metrics</h3><p>You can’t ignore Operating System metrics either. Information such as CPU utilization, memory usage, and disk usage is essential and can play a major role when it comes to Cassandra performance.</p><h4>CPU Utilization</h4><p>Your CPU is used for data processing and query handling. The more spare CPU cycles you have on a given node, the more data and queries it can process. The <strong>user</strong> part of the CPU usage will show you what your Cassandra process needs, while the <strong>wait</strong> part can point to a bottleneck in I/O or network. As with every Java application, CPU cycles are also needed for garbage collection, so keep that in mind when planning.</p><h4>Memory Usage</h4><p>Memory usage is crucial for every JVM-based application. The newest version of Cassandra leverages both off-heap and heap memory. 
This means that you not only need to set the heap size of your Cassandra nodes correctly but also have enough off-heap memory for keeping your cluster performance at its best.</p><h4>Disk Usage</h4><p>Disk and I/O are crucial – Cassandra keeps its data on the disk, and each query may require a substantial number of I/O operations to return the results. You need to be sure that your hardware can handle your data retrieval needs. You also need to be sure that you have enough space to hold your data and handle the compaction process.</p><h2>Monitor Cassandra Performance with Sematext</h2><p><img data-lazyloaded=\"1\" data-placeholder-resp=\"1999x1017\" src=\"https://sematext.com/wp-content/uploads/2021/10/cassandra-metrics-1.png\" class=\"alignnone\" alt=\"monitoring cassandra performance metrics with sematext\" width=\"1999\" height=\"1017\" /></p><noscript><img class=\"alignnone\" src=\"https://sematext.com/wp-content/uploads/2021/10/cassandra-metrics-1.png\" alt=\"monitoring cassandra performance metrics with sematext\" width=\"1999\" height=\"1017\" /></noscript><p><a href=\"https://sematext.com/cloud/\">Sematext Cloud</a> and its <a href=\"https://sematext.com/integrations/cassandra\">Apache Cassandra monitoring</a> integration provide all that you need to monitor your distributed database. 
Everything is available within a single view, without distractions:</p><ul><li>The overview report gives you a perfect starting point for your metrics, painting a picture of the whole cluster.</li><li>The dedicated Cassandra report provides an in-depth view of all relevant metrics related to the distributed database.</li><li>The OS report provides necessary operating system metrics such as CPU and memory utilization and visibility into your network traffic.</li><li>Finally, the JVM metrics give the full view of the Java Virtual Machine, such as metrics related to garbage collection and per-heap space memory utilization.</li></ul><p>Using the dedicated split-view, you can correlate all the available metrics with other metrics, <a href=\"https://sematext.com/logsene/\">logs</a>, and <a href=\"https://sematext.com/experience/\">real user monitoring</a> data, making Sematext a perfect visibility tool.</p><p>Sematext allows you to set up alerts on any metric or log event and supports both threshold-based and anomaly-based alerts for full flexibility. You don’t have to watch your metrics over and over. Once you configure your alerts, you can sleep well, and Sematext will let you know if something is wrong.</p><p>If you want to see how Sematext stacks up against similar solutions, read our article about the best <a href=\"https://sematext.com/blog/cassandra-monitoring-tools/\">Cassandra monitoring tools</a> available today.</p><h2>Get Started with Cassandra Monitoring</h2><p>As a distributed database, Apache Cassandra can quickly become an operational challenge without visibility into what is happening from a global perspective as well as on a node level. You need to have full visibility from top to bottom, but that is not enough. 
You need to be sure that your monitoring system can notify you when an issue happens and also predict issues before your customers notice them.</p><p>One of the tools that will give you all of that is Sematext’s <a href=\"https://sematext.com/integrations/\">Apache Cassandra monitoring</a> integration. Start monitoring your Cassandra cluster by creating a Sematext Cloud account and then a Cassandra monitoring App. Don’t forget to create a Logs App as well to ship your Cassandra logs for a full observability experience.</p></div></section></main></article></div></div></div></div></div></div>","id":"070dd491-a966-5c75-86e3-0327d7804b22","title":"How Do You Monitor Cassandra Performance: Key Metrics to Measure","origin_url":"https://sematext.com/blog/cassandra-monitoring/","url":"https://sematext.com/blog/cassandra-monitoring/","wallabag_created_at":"2021-11-08T17:30:27+00:00","published_at":"2021-10-04T10:55:25+00:00","published_by":"['Rafal Kuć']","reading_time":11,"domain_name":"sematext.com","preview_picture":"https://sematext.com/wp-content/uploads/2021/10/critical-cassandra-metrics-to-monitor.jpg","tags":["monitoring","cassandra","performance"],"description":"Apache Cassandra is a distributed database known for its high availability, fault tolerance, and near-linear scaling. It was initially developed by Facebook, but it is a widely used open-source system..."},{"content":"<p>This is our third post in our series on performance tuning with Apache Cassandra.  In our first post, we discussed how we can use <a href=\"https://thelastpickle.com/blog/2018/01/16/cassandra-flame-graphs.html\">Flame Graphs</a> to visually diagnose performance problems.  In our second post, we discussed <a href=\"http://thelastpickle.com/blog/2018/04/11/gc-tuning.html\">JVM tuning</a>, and how the different JVM settings can have an effect on different workloads.</p><p>In this post, we’ll dig into a table level setting which is usually overlooked: compression.  Compression options can be specified when creating or altering a table, and compression defaults to enabled if not specified.  The default is great when working with write heavy workloads, but can become a problem on read heavy and mixed workloads.</p>\n<p>Before we get into optimizations, let’s take a step back to understand the basics of compression in Cassandra.  
Once we’ve built a foundation of knowledge, we’ll see how to apply it to real world workloads.</p>\n<h2 id=\"how-it-works\">How it works</h2>\n<p>When we create a table in Cassandra, we can specify a variety of table options in addition to our fields.  In addition to options such as using <a href=\"https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html\">TWCS for our compaction strategy</a>, specifying <a href=\"https://thelastpickle.com/blog/2018/03/21/hinted-handoff-gc-grace-demystified.html\">gc grace seconds</a>, and caching options, we can also tell Cassandra how we want it to compress our data.  If the compression option is not specified, LZ4Compressor will be used, which is known for it’s excellent performance and compression rate. In addition to the algorithm, we can specify our <code class=\"language-plaintext highlighter-rouge\">chunk_length_in_kb</code>, which is the size of the uncompressed buffer we write our data to as an intermediate step before writing to disk.  Here’s an example of a table using LZ4Compressor with 64KB chunk length:</p>\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre>create table sensor_data ( \n    id text primary key, \n    data text) \nWITH compression = {'sstable_compression': 'LZ4Compressor', \n                    'chunk_length_kb': 64};\n</pre></div></div>\n<p>We can examine how well compression is working at the table level by checking <code class=\"language-plaintext highlighter-rouge\">tablestats</code>:</p>\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre>$ bin/nodetool tablestats tlp_stress\nKeyspace : tlp_stress\n\tRead Count: 89766\n\tRead Latency: 0.18743983245326737 ms\n\tWrite Count: 8880859\n\tWrite Latency: 0.009023213069816781 ms\n\tPending Flushes: 0\n\t\tTable: sensor_data\n\t\tSSTable count: 5\n\t\tOld SSTable count: 0\n\t\tSpace used (live): 864131294\n\t\tSpace used (total): 864131294\n\t\tOff heap memory used (total): 
2472433\n\t\tSSTable Compression Ratio: 0.8964684393508305\n\t\tCompression metadata off heap memory used: 140544\n</pre></div></div>\n<p>The <code class=\"language-plaintext highlighter-rouge\">SSTable Compression Ratio</code> line above tells us how effective compression is.  Compression ratio is calculated by the following:</p>\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre>compressionRatio = (double) compressed/uncompressed;\n</pre></div></div>\n<p>meaning the smaller the number, the better the compression.  In the above example our compressed data is taking up almost 90% of the original data, which isn’t particularly great.</p>\n<h2 id=\"how-data-is-written\">How data is written</h2>\n<p>I’ve found digging into the codebase, profiling and working with a debugger to be the most effective way to learn how software works.</p>\n<p>When data is written to / read from SSTables, we’re not dealing with convenient typed objects, we’re dealing with streams of bytes.  Our compressed data is written in the <code class=\"language-plaintext highlighter-rouge\">CompressedSequentialWriter</code> class, which extends <code class=\"language-plaintext highlighter-rouge\">BufferedDataOutputStreamPlus</code>.  This writer uses a temporary buffer. When the data is written out to disk the buffer is compressed and some meta data about it is recorded to a <code class=\"language-plaintext highlighter-rouge\">CompressionInfo</code> file.  If there is more data than available space in the buffer, the buffer is written to, flushed, and the buffer starts fresh to be written to again (and perhaps flushed again).   
You can see this in <code class=\"language-plaintext highlighter-rouge\">org/apache/cassandra/io/util/BufferedDataOutputStreamPlus.java</code>:</p>\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre>@Override\npublic void write(byte[] b, int off, int len) throws IOException\n{\n    if (b == null)\n        throw new NullPointerException();\n    // avoid int overflow\n    if (off &lt; 0 || off &gt; b.length || len &lt; 0\n        || len &gt; b.length - off)\n        throw new IndexOutOfBoundsException();\n    if (len == 0)\n        return;\n    int copied = 0;\n    while (copied &lt; len)\n    {\n        if (buffer.hasRemaining())\n        {\n            int toCopy = Math.min(len - copied, buffer.remaining());\n            buffer.put(b, off + copied, toCopy);\n            copied += toCopy;\n        }\n        else\n        {\n            doFlush(len - copied);\n        }\n    }\n}\n</pre></div></div>\n<p>The size of this buffer is determined by <code class=\"language-plaintext highlighter-rouge\">chunk_length_in_kb</code>.</p>\n<h2 id=\"how-data-is-read\">How data is read</h2>\n<p>The read path in Cassandra is (more or less) the opposite of the write path.  We pull chunks out of SSTables, decompress them, and return them to the client.  The full path is a little more complex - there’s a <code class=\"language-plaintext highlighter-rouge\">ChunkCache</code> (managed by <a href=\"https://github.com/ben-manes/caffeine\">caffeine</a>) that we go through, but that’s beyond the scope of this post.</p>\n<p>During the read path, the entire chunk must be read and decompressed.  We’re not able to selectively read only the bytes we need.  The impact of this is that if we are using 4KB chunks, we can get away with only reading 4KB off disk.  If we use 256KB chunks, we have to read the entire 256KB.  
This might be fine for a handful of requests but when trying to maximize throughput we need to consider what happens when we have requests in the thousands per second.  If we have to read 256KB off disk for ten thousand requests a second, we’re going to need to read roughly 2.5GB per second off disk, and that can be an issue no matter what hardware we are using.</p>\n<h3 id=\"what-about-page-cache\">What about page cache?</h3>\n<p>Linux will automatically leverage any RAM that’s not being used by applications to keep recently accessed filesystem blocks in memory.  We can see how much page cache we’re using with the <code class=\"language-plaintext highlighter-rouge\">free</code> tool:</p>\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre>$ free -mhw\n              total        used        free      shared     buffers       cache   available\nMem:            62G        823M         48G        1.7M        261M         13G         61G\nSwap:          8.0G          0B        8.0G\n</pre></div></div>\n<p>Page cache can be a massive benefit if you have a working data set that fits in memory. With smaller data sets this is incredibly useful, but Cassandra was built to solve big data problems. Typically that means having a lot more data than available RAM.  If our working data set on each node is 2 TB, and we only have 20-30 GB of free RAM, it’s very possible we’ll serve <em>almost none of our requests out of cache</em>.  Yikes.</p>\n<p>Ultimately, we need to ensure we use a chunk length that allows us to minimize our I/O.  Larger chunks can compress better, giving us a smaller disk footprint, but we end up needing more hardware, so the space savings become meaningless for certain workloads.  There’s no perfect setting that we can apply to every workload.  Frequently, the more reads you do, the smaller the chunk size should be.  
Even this doesn’t apply uniformly; larger requests will hit more chunks, and will benefit from a larger chunk size.</p>\n<h2 id=\"the-benchmarks\">The Benchmarks</h2>\n<p>Alright - enough with the details!  We’re going to run a simple benchmark to test how Cassandra performs with a mix of read and write requests with a simple key value data model.  We’ll be doing this using our stress tool, <a href=\"https://github.com/thelastpickle/tlp-stress\">tlp-stress</a> (commit 40cb2d28fde).  We will get into the details of this stress tool in a later post - for now all we need to cover is that it includes a key value workload out of the box we can leverage here.</p>\n<p>For this test I installed Apache Cassandra 3.11.3 on an AWS c5d.4xlarge instance running Ubuntu 16.04 following the instructions on cassandra.apache.org, and updated all the system packages using <code class=\"language-plaintext highlighter-rouge\">apt-get upgrade</code>.  I’m only using a single node here in order to isolate the compression settings and not introduce noise from the network overhead of running a full cluster.</p>\n<p>The ephemeral NVMe disk is formatted with XFS and mounted at <code class=\"language-plaintext highlighter-rouge\">/var/lib/cassandra</code>. 
I set readahead to zero using <code class=\"language-plaintext highlighter-rouge\">blockdev  --setra 0 /dev/nvme1n1</code> so we can see the impact that compression has on our disk requests and not hide it with page cache.</p>\n<p>For each workload, I put the following command in a shell script, and ran tlp-stress from a separate c5d.4xlarge instance (passing the chunk size as the first parameter):</p>\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre>$ bin/tlp-stress run KeyValue -i 10B -p 10M --populate -t 4 \\\n  --replication \"{'class':'SimpleStrategy', 'replication_factor':1}\" \\\n  --field.keyvalue.value='book(100,200)' -r .5  \\\n  --compression \"{'chunk_length_in_kb': '$1', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}\" \\\n  --host 172.31.42.30\n</pre></div></div>\n<p>This runs a key value workload across 10 million partitions (<code class=\"language-plaintext highlighter-rouge\">-p 10M</code>), pre-populating the data (<code class=\"language-plaintext highlighter-rouge\">--populate</code>), with 50% reads (<code class=\"language-plaintext highlighter-rouge\">-r .5</code>), picking 100-200 words from one of the books included in the stress tool (<code class=\"language-plaintext highlighter-rouge\">--field.keyvalue.value='book(100,200)'</code>).  We can specify a compression strategy using <code class=\"language-plaintext highlighter-rouge\">--compression</code>.</p>\n<p>For the test I’ve used slightly modified Cassandra configuration files to reduce the effect of GC pauses by increasing the total heap (12GB) as well as the new gen (6GB).  I spent a small amount of time on this as optimizing it perfectly isn’t necessary.  
I also set compaction throughput to <code class=\"language-plaintext highlighter-rouge\">160</code> (MB/s).</p>\n<p>For the test, I monitored the JVM’s allocation rate using the <a href=\"https://github.com/aragozin/jvm-tools\">Swiss Java Knife</a> (sjk-plus) and disk / network / cpu usage with dstat.</p>\n<h3 id=\"default-64kb-chunk-size\">Default 64KB Chunk Size</h3>\n<p>The first test used the default of 64KB chunk length.  I started the stress command and walked away to play with my dog for a bit.  When I came back, I was through about 35 million requests:</p>\n<p><img src=\"https://thelastpickle.com/files/2018-08-08-compression-performance/stress-64.png\" alt=\"stress 64kb\" /></p>\n<p>You can see in the above screenshot our 5 minute rate is about 22K writes / second and 22K reads / second.  Looking at the output of dstat at this time, we can see we’re doing between 500 and 600MB / second of disk reads:</p>\n<p><img src=\"https://thelastpickle.com/files/2018-08-08-compression-performance/dstat-64.png\" alt=\"DStat 64KB\" /></p>\n<p>Memory allocation fluctuated a bit, but it hovered around 1GB/s:</p>\n<p><img src=\"https://thelastpickle.com/files/2018-08-08-compression-performance/sjk-64.png\" alt=\"sjk 64kb\" /></p>\n<p>Not the most amazing results in the world.  Of the disk reads, some of that throughput can be attributed to compaction, which we’ll always have to contend with in the real world.  That’s capped at 160MB/s, leaving around 400MB/s to handle reads.  That’s a lot considering we’re only sending 25MB/s across the network.  That means we’re doing over 15x as much disk I/O as network I/O.  We are very much disk bound in this workload.</p>\n<h3 id=\"4kb-chunk-size\">4KB Chunk Size</h3>\n<p>Let’s see if the 4KB chunk size does any better.  Before the test I shut down Cassandra, cleared the data directory, and started things back up.  I ran the same stress test using the shell script above, passing 4 as the chunk size.  
I once again played fetch with my dog for a bit and came back after around the same time as the previous test.</p>\n<p>Looking at the stress output, it’s immediately obvious there’s a significant improvement:</p>\n<p><img src=\"https://thelastpickle.com/files/2018-08-08-compression-performance/stress-4.png\" alt=\"stress\" /></p>\n<p>In almost every single metric reported by the metrics library the test with 4KB outperforms the 64KB test.  Our throughput is better (62K ops / second vs 44K ops / second in the 1 minute rate), and our p99 for reads is better (13ms vs 24ms).</p>\n<p>If we’re doing less I/O on each request, how does that impact our total disk and network I/O?</p>\n<p><img src=\"https://thelastpickle.com/files/2018-08-08-compression-performance/dstat-4.png\" alt=\"dstat 4kb\" /></p>\n<p>As you can see above, there’s a massive improvement.  Disk I/O is significantly reduced from making smaller (but more) requests to disk, and our network I/O is significantly higher from responding to more requests.</p>\n<p><img src=\"https://thelastpickle.com/files/2018-08-08-compression-performance/sjk-4.png\" alt=\"sjk 4kb\" /></p>\n<p>It was initially a small surprise to see an increased heap allocation rate (because we’re reading WAY less data into memory), but this is simply the result of doing a lot more requests.  There are a lot of objects created in order to satisfy a request; far more than the number created to read the data off disk.  More requests result in higher allocation.  We’d want to ensure those objects don’t make it into the Old Gen as we go through <a href=\"https://thelastpickle.com/blog/2018/04/11/gc-tuning.html\">JVM tuning</a>.</p>\n<h3 id=\"off-heap-memory-usage\">Off Heap Memory Usage</h3>\n<p>The final thing to consider here is off heap memory usage.  Alongside each compressed SSTable is compression metadata.  The compression files have names like <code class=\"language-plaintext highlighter-rouge\">na-9-big-CompressionInfo.db</code>.  
The compression metadata is stored in memory, off the Cassandra heap.  The size of the offheap usage is directly proportional to the number of chunks used.  More chunks = more space used.  More chunks are used when a smaller chunk size is used, hence more offheap memory is used to store the metadata for each chunk.  It’s important to understand this trade off.  A table using 4KB chunks will use 16 times as much memory for compression metadata as one using 64KB chunks.</p>\n<p>In the example I used above the memory usage can be seen as follows:</p>\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre>Compression metadata off heap memory used: 140544 \n</pre></div></div>\n<h3 id=\"changing-existing-tables\">Changing Existing Tables</h3>\n<p>Now that you can see how a smaller chunk size can benefit read heavy and mixed workloads, it’s time to try it out.  If you have a table you’d like to change the compression setting on, you can do the following in cqlsh:</p>\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre>cqlsh:tlp_stress&gt; alter table keyvalue with compression = {'sstable_compression': 'LZ4Compressor', 'chunk_length_kb': 4};\n</pre></div></div>\n<p>New SSTables that are written after this change is applied will use this setting, but existing SSTables won’t be rewritten automatically.  Because of this, you shouldn’t expect an immediate performance difference after applying this setting.  If you want to rewrite every SSTable immediately, you’ll need to do the following:</p>\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre>nodetool upgradesstables -a tlp_stress keyvalue\n</pre></div></div>\n<h2 id=\"conclusion\">Conclusion</h2>\n<p>The above is a single test demonstrating how tuning compression settings can affect Cassandra performance in a significant way.  
Using out of the box settings for compression on read heavy or mixed workloads will almost certainly put unnecessary strain on your disk while hurting your read performance.  I highly recommend taking the time to understand your workload and analyze your system resources to understand where your bottleneck is, as there is no absolute correct setting to use for every workload.</p>\n<p>Keep in mind the tradeoff between memory and chunk size as well.  When working with a memory constrained environment it may seem tempting to use 4KB chunks everywhere, but it’s important to understand that it’ll use more memory.  In these cases, it’s a good idea to start with smaller tables that are read from the most.</p>","id":"b16bdb40-b8c3-54e5-8afa-60ee43153f3d","title":"Apache Cassandra Performance Tuning - Compression with Mixed Workloads","origin_url":"https://thelastpickle.com/blog/2018/08/08/compression_performance.html","url":"https://thelastpickle.com/blog/2018/08/08/compression_performance.html","wallabag_created_at":"2019-12-02T18:51:11+00:00","published_at":null,"published_by":"['']","reading_time":11,"domain_name":"thelastpickle.com","preview_picture":"https://thelastpickle.com/android-chrome-192x192.png","tags":["cassandra","troubleshooting and tuning","performance"],"description":"This is our third post in our series on performance tuning with Apache Cassandra.  In our first post, we discussed how we can use Flame Graphs to visually diagnose performance problems.  In our second..."},{"content":"<p>There are multiple dimensions where <a data-mil=\"22348\" href=\"https://intellipaat.com/blog/tutorial/cassandra-tutorial/cassandra-overview/\" target=\"_blank\" rel=\"noopener\">Cassandra performance</a> can be tuned. Some of them are described below:<br /><strong>Write Operations:</strong><br />Commit log and data dirs (sstables) should be on different disks. 
The commit log is written sequentially; however, if SSTables share the same drive as the commit log, I/O contention between the commit log &amp; SSTables may degrade commit log writes and SSTable reads.<br /> <br /><strong>Read Operations:</strong><br />A good rule of thumb is 4 concurrent_reads per processor core. You may increase the value for systems with fast I/O storage.<br /> <br /><strong>Cassandra Compaction Contention:</strong><br />Reduce the frequency of memtable flushes by increasing the memtable size or preventing premature flushing. Less frequent memtable flushes result in fewer SSTable files and less compaction. Less compaction reduces SSTable I/O contention, and therefore improves read operations. Bigger memtables absorb more overwrites for updates to the same keys, and therefore accommodate more read/write operations between flushes.<br /> <br /><strong>Memory Cache</strong>:<br />Do not increase the Cassandra cache size unless there is enough physical memory (RAM). Avoid memory swapping at any cost.<br /> <br /><strong>Row Cache:</strong><br />The row cache holds the entire content of a row in memory. It serves data from the cache instead of reading it from disk. It is good if the column data is small, so the cache is big enough to hold most of the hotspot data. It is bad if the column data is too large, so the cache is not big enough to hold most of the hotspot data. It’s also bad for high write/read ratios. By default, it is off. If the hit ratio is below 30%, the row cache should be disabled.<br /> <br /><strong>Key Cache Tuning:</strong><br />The key cache holds the location of data in memory for each column family. It is effective if there are hot spots in the data and the row cache cannot be used effectively because of the large column size. By default, Cassandra caches 200000 keys per column family. 
Use an absolute number for <strong>keys_cached</strong> instead of a percentage.<br /> <br /><strong>JVM:</strong><br />The minimum and maximum Java heap size should be half of the available physical memory. The young generation heap should be 1/4 of the Java heap. Do NOT increase the size without confirming there is enough available physical memory. Always reserve memory for the OS file cache.<br />A detailed understanding of <a data-mil=\"22348\" href=\"https://intellipaat.com/blog/apache-cassandra-a-brief-intro/\" target=\"_blank\" rel=\"noopener\">Apache Cassandra</a> is available in this blog post for your perusal!</p>","id":"2f1edaee-2329-5d58-a6d0-6dce016c28f4","title":"Tuning Cassandra Performance","origin_url":"https://intellipaat.com/blog/tutorial/cassandra-tutorial/tuning-cassandra-performance/","url":"https://intellipaat.com/blog/tutorial/cassandra-tutorial/tuning-cassandra-performance/","wallabag_created_at":"2019-12-02T18:50:11+00:00","published_at":"2019-09-05T08:00:00+00:00","published_by":"['']","reading_time":1,"domain_name":"intellipaat.com","preview_picture":"https://intellipaat.com/blog/wp-content/uploads/2020/09/Certification-in-Bigdata-Analytics-IITG.jpg","tags":["cassandra","troubleshooting and tuning","performance"],"description":"There are multiple dimensions where Cassandra performance can be tuned. Some of them are described below:Write Operations:Commit log and data dirs (sstables) should be on different disks. Commit log u..."},{"content":"<p>In this topic, I will cover the basics of general Apache Cassandra performance tuning: when to do performance tuning, how to avoid and identify problems, and methodologies for improving performance.</p><h4>When do you need to tune performance ?</h4><h4>Optimizing:</h4><p>when things work but could be better. 
we want to get better performance.</p><h4><strong>Troubleshooting:</strong></h4><p>fixing a problem that impacts performance. Something could actually be broken, or could just be slow; in clusters, something broken can manifest as slow performance.</p><h4><strong>What are some examples of performance related complaints an admin might receive regarding Cassandra ?</strong></h4><p>Performance-related complaints:</p><ul><li>it’s slow.</li>\n<li>certain queries are slow.</li>\n<li>program X that uses the cluster is slow.</li>\n<li>A node went down.</li>\n</ul><h4>Latency, Throughput and the U.S.E Method:</h4><p>Bad methodology<br />how not to approach performance-related problems?</p><ul><li>streetlight anti-method</li>\n<li>random change anti-method</li>\n<li>blame someone else anti-method</li>\n</ul><h4>In performance tuning, what are we trying to improve ?</h4><p>latency – how long a cluster, node, server or I/O subsystem takes to respond to a request<br />throughput – how many transactions of a given size (or range) a cluster, node or I/O subsystem can complete in a given timeframe,<br />i.e. how many operations per second our cluster nodes are processing<br />But we can’t forget COST !!</p><h4>How are latency and throughput related ?</h4><p>theoretically, they are independent of each other.<br />however, a change in latency can have a proportional effect on throughput</p><h4>What causes a change in latency and throughput ?</h4><p>Understanding performance tuning: utilization, saturation, errors, availability</p><h4>What is utilization ? saturation ? errors ? 
availability ?</h4><p>Utilization – how heavily the resources are being stressed<br />Saturation – the degree to which a resource has queued work it cannot service<br />Errors – recoverable failure or exception events in the course of normal operation<br />Availability – whether a given resource can service work or not</p><h4>GOAL:</h4><p>What is the first step in achieving any performance <strong>tuning goal ?</strong><br />setting a goal!<br />what are some examples of commonly heard cassandra performance tuning goals?<br />reads should be faster<br />writes to table x should be faster<br />the cluster should be able to complete x transactions per second</p><h4>what should a clearly defined performance goal take into account ?</h4><p>Writing an SLA (service level agreement)<br />we need to know:<br />what is the type of operation or query ?<br />read or write workload<br />select, insert, update or delete<br />we need to understand latency: expressed as a percentile rank, e.g. 95th percentile read latency is 2 ms<br />throughput: operations per second<br />size: expressed in average bytes<br />we have to think about duration: expressed in minutes or hours<br />scope: keyspace, table, query<br />Example of an SLA: “the cluster should be able to sustain 20000 2KB read operations per second from table X for two hours with a 95th percentile read latency of 3 ms.”</p><h4>After setting a goal, how can achievement of the goal be verified?</h4><p>timing hooks in your application<br />query tracing<br />jmeter test plan<br />customizable cassandra-stress</p><h4>Time in computer performance tuning:</h4><p>how long is a millisecond ?<br />why do we care about milliseconds ?</p><h4>Common latency timings in cassandra:</h4><pre>  reads from main memory should take between 36 and 130 nanoseconds&#13;\n  reads from an SSD should take between 100 microseconds and 12 milliseconds&#13;\n  reads from a Serial Attached SCSI rotational drive should take between 8 milliseconds 
and 40 milliseconds&#13;\n  reads from a SATA rotational drive take more than 15 milliseconds&#13;\n</pre><p><strong>Example:</strong><br /><strong> Workload characterization:</strong><br />classroom use case and cassandra story: a middle-sized financial firm uses cassandra to manage distributed data, 42 million stock quotes, with a data model driven by a particular set of queries</p><h4>what queries drove this data model ?</h4><p>retrieve information for a specific stock trade by trade ID<br />find all information about stock trades for a specific stock ticker and range of timestamps<br />find all information about stock trades that occurred on a specific date over a short period of time</p><h4>How do you characterize the workload ?</h4><p>what is the load being placed on your cluster ?</p><h4>Who is causing the load ?</h4><p>calling application or API<br />remote IP address</p><h4>Why is the load being called ?</h4><p>code path or stack trace</p><h4>What are the load characteristics ?</h4><p>throughput<br />direction (read/write)<br />include variance<br />keyspace and column family<br />How do you characterize your workload ?<br />how is the load changing over time and is there a daily pattern ?<br />is your workload read heavy or write heavy?<br />how big is your data ?<br />how much data is on each node (bytes on node = data density)?<br />does active data fit in buffer cache ?</p><h4>Performance impact of Data Model: How does the data model affect performance ?</h4><p>poorly shaped rows (too narrow or too wide, i.e. partitions that are too large)<br />hotspots (particular areas with a lot of reads/writes)<br />poor primary or secondary indexes<br />too many tombstones (a lot of deletes)</p><h4>So data model considerations:</h4><p>understand how the primary key affects performance<br />take a look at query patterns and adjust how tables are modeled<br />see how replication factor and/or consistency level impact performance<br />a change in compaction strategy can have a positive (or negative) impact<br />parallelize 
reads/writes if necessary<br />look at how moving infrequently accessed data can improve performance<br />see how the per column family cache is having an impact<br />what is the relationship between the data model and cassandra’s read path optimizations (key/row cache, bloom filters, index) ?<br />nesting data allows for a greater degree of flexibility in the column family structure and keeps all data needed to satisfy a given query in the same partition. It can also make it easier to find a model that keeps the most active data sets in cache; frequently accessed data that is in cache can improve performance.</p><h4>Methodologies:</h4><p>Active performance tuning: we focus on a particular problem and verify that it is fixed – suspect there’s a problem, isolate the problem using tools, determine if the problem is in cassandra, the environment or both.<br />verify problems and test for reproducibility, fix problems using tuning strategies, test, test and test again<br />Passive performance tuning: regular system “sanity checks”: looking at given thresholds, adjusting for growth, and regularly monitoring key health areas in cassandra and the environment.<br />identify and tune for future growth/scalability.<br />apply tuning strategies as needed.<br /><strong>we have to use the USE Method as a tool for troubleshooting</strong><br />this method gives us a methodology for looking at all components of the system, not only one. It is the strategy defined by Brendan Gregg <a href=\"http://www.brendangregg.com/USEmethod/use-linux.html\" rel=\"nofollow\">http://www.brendangregg.com/USEmethod/use-linux.html</a>.</p><p>It also performs a health check of various system components to identify bottlenecks and errors,<br />separated by component, type and metric to narrow the scope and find the location of the problem.</p><p>what are two things performance tuning tries to improve ? latency and throughput.<br />what are two types of performance tuning methodologies ? active and passive</p><p>what tool can be used to get a performance baseline ? 
jmeter or cassandra-stress</p><h4><strong>Cassandra-stress:</strong></h4><p>Interpreting the output of cassandra-stress</p><p>Each line reports data for the interval between the last elapsed time and current elapsed time, which is set by the –progress-interval option (default 10 seconds).</p><pre>[hduser@base ~]$ cassandra-stress write -node 192.168.56.71 &#13;\nINFO  04:11:53 Did not find Netty's native epoll transport in the classpath, defaulting to NIO.&#13;\nINFO  04:11:56 Using data-center name 'datacenter1' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)&#13;\nINFO  04:11:56 New Cassandra host /192.168.56.71:9042 added&#13;\nINFO  04:11:56 New Cassandra host /192.168.56.72:9042 added&#13;\nINFO  04:11:56 New Cassandra host /192.168.56.73:9042 added&#13;\nINFO  04:11:56 New Cassandra host /192.168.56.74:9042 added&#13;\nConnected to cluster: Training_Cluster&#13;\nDatatacenter: datacenter1; Host: /192.168.56.71; Rack: rack1&#13;\nDatatacenter: datacenter1; Host: /192.168.56.72; Rack: rack1&#13;\nDatatacenter: datacenter1; Host: /192.168.56.73; Rack: rack1&#13;\nDatatacenter: datacenter1; Host: /192.168.56.74; Rack: rack1&#13;\nCreated keyspaces. 
Sleeping 1s for propagation.&#13;\nSleeping 2s...&#13;\nWarming up WRITE with 50000 iterations...&#13;\nFailed to connect over JMX; not collecting these stats&#13;\nWARNING: uncertainty mode (err&lt;) results in uneven workload between thread runs, so should be used for high level analysis only&#13;\nRunning with 4 threadCount&#13;\nRunning WRITE with 4 threads until stderr of mean &lt; 0.02&#13;\nFailed to connect over JMX; not collecting these stats&#13;\ntype,      total ops,    op/s,    pk/s,   row/s,    mean,     med,     .95,     .99,    .999,     max,   time,   stderr, errors,  gc: #,  max ms,  sum ms,  sdv ms,      mb&#13;\ntotal,           402,     403,     403,     403,    10,4,     5,9,    31,4,    86,2,   132,7,   132,7,    1,0,  0,00000,      0,      0,       0,       0,       0,       0&#13;\ntotal,           979,     482,     482,     482,     8,1,     6,3,    17,5,    45,9,    88,3,    88,3,    2,2,  0,06272,      0,      0,       0,       0,       0,       0&#13;\ntotal,          1520,     530,     530,     530,     7,5,     6,4,    14,8,    24,3,   103,9,   103,9,    3,2,  0,07029,      0,      0,       0,       0,       0,       0&#13;\ntotal,          1844,     321,     321,     321,    11,9,     6,4,    33,0,   234,5,   248,5,   248,5,    4,2,  0,06134,      0,      0,       0,       0,       0,       0&#13;\ntotal,          2229,     360,     360,     360,    11,3,     5,4,    43,6,   127,6,   145,4,   145,4,    5,3,  0,06577,      0,      0,       0,       0,       0,       0&#13;\ntotal,          2457,     199,     199,     199,    20,2,     5,9,    82,2,   125,4,   203,6,   203,6,    6,4,  0,11009,      0,      0,       0,       0,       0,       0&#13;\ntotal,          2904,     443,     443,     443,     8,9,     7,0,    23,0,    37,3,    56,1,    56,1,    7,4,  0,09396,      0,      0,       0,       0,       0,       0&#13;\ntotal,          3246,     340,     340,     340,    11,7,     7,4,    41,6,    87,0,   101,0,   101,0,    8,5,  
0,08625,      0,      0,       0,       0,       0,       0&#13;\ntotal,          3484,     235,     235,     235,    16,6,     7,2,    76,1,   151,8,   152,1,   152,1,    9,5,  0,09208,      0,      0,       0,       0,       0,       0&#13;\ntotal,          3679,     196,     196,     196,    19,5,     8,1,    86,0,   156,5,   174,8,   174,8,   10,5,  0,09960,      0,      0,       0,       0,       0,       0&#13;\ntotal,          4083,     369,     369,     369,    11,1,     7,3,    38,3,    90,8,   114,9,   114,9,   11,6,  0,09041,      0,      0,       0,       0,       0,       0&#13;\ntotal,          4411,     320,     320,     320,    11,4,     7,4,    39,7,    61,1,    93,6,    93,6,   12,6,  0,08422,      0,      0,       0,       0,       0,       0&#13;\ntotal,          4683,     227,     227,     227,    17,1,     6,0,    90,1,   153,3,   199,8,   199,8,   13,8,  0,08478,      0,      0,       0,       0,       0,       0&#13;\ntotal,          5131,     445,     445,     445,     8,9,     7,6,    19,3,    26,8,    50,8,    50,8,   14,8,  0,07997,      0,      0,       0,       0,       0,       0&#13;\ntotal,          5661,     521,     521,     521,     7,5,     5,4,    17,5,    63,1,    89,2,    89,2,   15,8,  0,07788,      0,      0,       0,       0,       0,       0&#13;\ntotal,          6179,     512,     512,     512,     7,7,     6,2,    16,7,    39,3,    59,1,    59,1,   16,8,  0,07578,      0,      0,       0,       0,       0,       0&#13;\ntotal,          6427,     245,     245,     245,    15,9,     6,1,    56,3,    94,0,   180,6,   180,6,   17,8,  0,07567,      0,      0,       0,       0,       0,       0&#13;\ntotal,          6831,     394,     394,     394,    10,2,     5,6,    39,0,    90,4,   111,7,   111,7,   18,9,  0,07129,      0,      0,       0,       0,       0,       0&#13;\ntotal,          7071,     235,     235,     235,    16,9,     7,4,    58,3,   149,7,   235,3,   235,3,   19,9,  0,07150,      0,      0,       0,       
0,       0,       0&#13;\ntotal,          7532,     455,     455,     455,     8,7,     6,1,    17,2,    90,5,   142,2,   142,2,   20,9,  0,06840,      0,      0,       0,       0,       0,       0&#13;\ntotal,          7890,     353,     353,     353,    10,9,     7,3,    35,4,    80,6,   149,5,   149,5,   21,9,  0,06532,      0,      0,       0,       0,       0,       0&#13;\ntotal,          8172,     288,     288,     288,    13,6,     7,1,    45,6,    89,1,    89,7,    89,7,   22,9,  0,06374,      0,      0,       0,       0,       0,       0&#13;\ntotal,          8355,     171,     171,     171,    22,7,     9,1,    82,5,   137,7,   151,5,   151,5,   24,0,  0,06656,      0,      0,       0,       0,       0,       0&#13;\ntotal,          8614,     235,     235,     235,    16,9,     8,0,    56,4,    98,7,   139,5,   139,5,   25,1,  0,06622,      0,      0,       0,       0,       0,       0&#13;\ntotal,          9027,     402,     402,     402,     9,6,     8,5,    20,7,    26,5,    30,8,    30,8,   26,1,  0,06346,      0,      0,       0,       0,       0,       0&#13;\ntotal,          9496,     463,     463,     463,     8,5,     7,6,    17,4,    23,2,    30,1,    30,1,   27,1,  0,06139,      0,      0,       0,       0,       0,       0&#13;\ntotal,          9912,     408,     408,     408,     9,6,     7,5,    23,4,    33,0,    38,0,    38,0,   28,1,  0,05903,      0,      0,       0,       0,       0,       0&#13;\ntotal,         10275,     359,     359,     359,    11,0,     8,8,    26,4,    33,7,    44,7,    44,7,   29,1,  0,05693,      0,      0,       0,       0,       0,       0&#13;\ntotal,         10528,     251,     251,     251,    15,6,     9,5,    45,4,   176,1,   295,9,   295,9,   30,1,  0,05602,      0,      0,       0,       0,       0,       0&#13;\ntotal,         10711,     181,     181,     181,    21,1,     7,8,    53,8,   340,8,   396,8,   396,8,   31,1,  0,05597,      0,      0,       0,       0,       0,       0&#13;\ntotal,         
10947,     233,     233,     233,    17,3,     9,3,    55,5,    94,7,   123,7,   123,7,   32,1,  0,05584,      0,      0,       0,       0,       0,       0&#13;\ntotal,         11130,     177,     177,     177,    21,0,    10,6,    86,7,   115,6,   151,9,   151,9,   33,2,  0,05706,     &#13;\n...&#13;\n&#13;\nResults:&#13;\nop rate                   : 388 [WRITE:388]&#13;\npartition rate            : 388 [WRITE:388]&#13;\nrow rate                  : 388 [WRITE:388]&#13;\nlatency mean              : 10,2 [WRITE:10,2]&#13;\nlatency median            : 7,1 [WRITE:7,1]&#13;\nlatency 95th percentile   : 25,9 [WRITE:25,9]&#13;\nlatency 99th percentile   : 71,2 [WRITE:71,2]&#13;\nlatency 99.9th percentile : 150,3 [WRITE:150,3]&#13;\nlatency max               : 396,8 [WRITE:396,8]&#13;\nTotal partitions          : 63058 [WRITE:63058]&#13;\nTotal errors              : 0 [WRITE:0]&#13;\ntotal gc count            : 0&#13;\ntotal gc mb               : 0&#13;\ntotal gc time (s)         : 0&#13;\navg gc time(ms)           : NaN&#13;\nstdev gc time(ms)         : 0&#13;\nTotal operation time      : 00:02:42&#13;\nSleeping for 15s&#13;\n&#13;\n&#13;\nData\t                            Description&#13;\n--------------------------------------------------------------------------------------------------&#13;\ntotal\t                           :Total number of operations since the start of the test.&#13;\ninterval_op_rate\t           :Number of operations performed per second during the interval &#13;\n(default 10 seconds).&#13;\ninterval_key_rate\t           :Number of keys/rows read or written per second during the interval &#13;\n(normally be the same as interval_op_rate unless doing range slices).&#13;\nlatency\tAverage latency            : for each operation during that interval.&#13;\n95th                               :95% of the time the latency was less than the number displayed in the column &#13;\n(Cassandra 1.2 or later).&#13;\n99th                               :99% of the 
time the latency was less than the number displayed in the column &#13;\n(Cassandra 1.2 or later).&#13;\nelapsed\t                           :Number of seconds elapsed since the beginning of the test.&#13;\n</pre><h4><strong>Cassandra tuning</strong>:</h4><p>successful performance tuning is all about understanding:<br />understanding what the metrics mean,<br />how the software we are tuning is architected,<br />and, in a distributed system, how the software on one node<br />works together with the software on the other nodes.<br />that is what the next section is about:<br />how the different pieces of cassandra fit together.</p><p>we will talk about cassandra and the data model,<br />how we can get metrics and what they tell us about performance.</p><p>In a following section, we talk about environment tuning: JVM and operating system<br />Another section will focus on disk tuning and compaction tuning</p><h4><strong>Examine cluster and node health and tuning:</strong></h4><p>Discuss table design and tuning: successful performance tuning is all about understanding.<br />what do the metrics mean ? how is the software we are<br />tuning architected ? in a distributed system, how does the software on one node work together with the software on the other nodes ?<br />we will expose some metrics<br />we will talk about cassandra and the data model, then about environment tuning such as the JVM and the operating system, and we will also discuss disk tuning and compaction tuning</p><p>let’s talk about cluster and node tuning: what activities in a cassandra cluster<br />happen between nodes ?<br />what is happening on the network ? 
when we dig into a performance problem, we look at all nodes, not only one:<br />answers are: coordinator, gossip, replication, repair, read repair, bootstrapping,<br />node removal, node decommissioning.<br />we have to dig into not only one node but the other nodes as well</p><p>Dealing with one node, what does a cassandra node do ? perform reads, perform writes,<br />monitor, participate in the cluster, maintain consistency.<br />internally cassandra has a lot of things to do, and there is an architecture behind it.<br />how is it organized ? how does cassandra organize all of that work ?<br />the answer is SEDA (staged event driven architecture), so we have<br />several thread pools and a messaging service, and we need to know how the queues work.</p><p><strong>What is a thread pool ?</strong><br />example, we have thread pool:workers:6</p><pre>                                                                                              worker thread&#13;\n  task          queue:max pending tasks:7                worker thread&#13;\n  task task  task task  task  task task task task =&gt;     worker thread &#13;\n  task                                task               worker thread&#13;\n  blocked tasks&#13;\n</pre><h4><strong>what are cassandra’s thread pools ?</strong></h4><p>read (readstage:32)</p><p>write (mutationstage, flushwriter, memtablepostflusher, countermutation,<br />migrationstage)<br />monitor (memorymeter:1, tracing)<br />participate in cluster (requestresponsestage:#CPUs, pendingrangecalculator:1, gossipstage:1)<br />maintain consistency (commitlogarchiver:1, miscstage:1 snapshotting/replicating data after node removal,<br />antientropystage:1 repair consistency – merkle tree build, internalresponsestage:#CPUs, hintedhandoff:1, readrepairstage:#CPUs)<br />for example, the monitor pool memorymeter has a single thread (memorymeter:1)<br />which one has the configurable number of threads ? 
it’s read (readstage:32*)</p><p>a utility like nodetool tpstats gives us insight into how much work each of these pools is doing: how many tasks are active, pending and blocked:<br />active – number of messages pulled off the queue, currently being processed by a thread.<br />pending – number of messages in queue waiting for a thread.<br />completed – number of messages completed.<br />blocked – when a pool reaches its max thread count it will begin queuing<br />until the max queue size is reached. when this is reached it will block until there is room in the queue.<br />total blocked/all time blocked – total number of messages that have been blocked.</p><p><strong>cassandra thread pools: multi-threaded stages:</strong><br />readstage (affected by main memory,disk) – perform local reads<br />mutationstage (affected by CPU,main memory,disk) – perform local insert/update,<br />schema merge, commit log replay, hints in progress<br />requestresponsestage (affected by network,other nodes) – when a response to a request is received this is the stage used to execute any callbacks that were created with the original request.<br />flushwriter (affected by CPU,disk) – sort and write memtables to disk<br />hintedhandoff (one thread per host being sent hints,affected by disk,network,other nodes) –<br />sends missing mutations to other nodes<br />memorymeter (several separate threads) – measure memory usage and live ratio of a memtable.<br />readrepairstage (affected by network,other nodes) – perform read repair<br />countermutation (formerly replicateOnWriteStage) – performs counter writes on non-coordinator nodes and replicates after a local write<br />internalresponsestage – responds to non-client initiated messages, including bootstrapping and schema checking.</p><p><strong>single-threaded stages:</strong><br />gossipstage (affected by network) – gossip communication<br />antientropystage (affected by network,other nodes) – build merkle tree and repair consistency<br 
/>migrationstage (affected by network,other nodes) – make schema changes<br />miscstage (affected by disk,network,other nodes) – snapshotting, replicating data after node removal<br />memtablepostflusher (affected by disk) – operations after flushing the memtable:<br />discard commit log files that have all data in them persisted to sstables,<br />flush non-column family backed secondary indexes<br />tracing – for query tracing<br />commitlogarchiver (formerly commitlog_archiver) – back up or restore a commit log.</p><p><strong>Message types:</strong></p><p>handled by the readstage thread pool:<br />(read: read data from cache or disk, range_slice: read a range of data, paged_range:<br />read part of a range of data)</p><p>handled by the mutationstage thread pool: (mutation: write (insert or update) data,<br />counter_mutation: changes counter columns, read_repair: update out-of-sync data discovered during a read)</p><p>handled by requestresponsestage: (request_response: respond to a coordinator, _trace:<br />used to trace a query (enable tracing or every (trace probability) queries))<br />binary: deprecated</p><p><strong>Question: why are some message types “droppable” ?</strong><br />if a message sits in one of the queues for too long (how long is decided by the timeouts in cassandra.yaml), it will be dropped.<br />why would cassandra do that ? why doesn’t cassandra do my work ?<br />if the node has sufficient resources to do the job, it does not drop it.<br />in practice, say I give a read to cassandra and the timeout elapses. what does the coordinator do in this situation ? 
it depends. maybe the consistency level can still be satisfied by another node and the query succeeds; otherwise the coordinator returns a timeout to the client, for example if the consistency level is too high.<br />For a read this is not a big deal; usually we can re-execute the query.</p><p>however, a mutation (insert/delete/update) can also sit in the queue for too long (2 seconds), and then nothing is done.<br />so what happens ? the coordinator experiences the<br />timeout; maybe it stores a hint, maybe the error goes back to the driver to be retried. we<br />don’t know, but the important thing is that some recovery action can be taken at this point.<br />still, the node that was given the write has not performed it; something like read repair or the repair command can restore the missing data later.<br /><strong>why are some message types “droppable”</strong><br />some messages can be dropped to prioritize activities in cassandra when high resource contention occurs.<br />the number of these dropped messages is in the nodetool tpstats output<br />if you see dropped messages, you should investigate.<br />Question: What is the state of your cluster ? we have nodetool or OpsCenter to gather information on our cluster.<br />nodetool gives us information about the current state; its counters are cumulative,<br />so we cannot tell what happened in the last second. it is good for a general assessment<br />and also for comparison between nodes (nodetool -h node0 tpstats, nodetool -h node0 compactionstats)</p><p>nodetool -h node0 netstats (we look at the read repair statistics): read repair<br />is when a read looks for discrepancies between nodes, and if we find a discrepancy, it is fixed and we increase the cch (blocking). another kind of read repair happens in the background: if we run the query, probably at consistency level 1, and the response is not held back, if we get a discrepancy, 
then we will increase the cch (background).</p><p><strong>too wide</strong>: the partition is too large<br /><strong>too narrow</strong>: partitions are too small and our query has to take rows from a whole bunch of partitions<br /><strong>hotspots:</strong> a customer decides to partition by country code, then 90% of all rows are in the US and the rest are spread over all other countries<br /><strong>poor primary and secondary indexes:</strong><br /><strong>too many tombstones:</strong> a workload does a lot of deleting and then reads data around the same partition; this can have a huge impact on read performance.</p><p><strong>OpsCenter</strong></p><p>Changing logging levels with nodetool setlogginglevel:<br />nodetool getlogginglevels: used to get the current runtime logging levels</p><pre>root@ds220:~# nodetool getlogginglevels&#13;\nLogger Name                                                                 Log Level&#13;\nROOT                                                                        INFO&#13;\nDroppedAuditEventLogger                                                     INFO&#13;\nSLF4JAuditWriter                                                            INFO&#13;\ncom.cryptsoft                                                               OFF&#13;\ncom.thinkaurelius.thrift                                                    ERROR&#13;\norg.apache.lucene.index                                                     INFO&#13;\norg.apache.solr.core.CassandraSolrConfig                                    WARN&#13;\norg.apache.solr.core.RequestHandlers                                        WARN&#13;\norg.apache.solr.core.SolrCore                                               WARN&#13;\norg.apache.solr.handler.component                                           WARN&#13;\norg.apache.solr.search.SolrIndexSearcher                                    WARN&#13;\norg.apache.solr.update                                                      
WARN&#13;\n</pre><p><strong>nodetool setlogginglevel</strong>: used to set the logging level for a service; it can be used instead of modifying the logback.xml file. Possible levels:</p><pre>ALL&#13;\nTRACE&#13;\nDEBUG&#13;\nINFO&#13;\nWARN&#13;\nERROR&#13;\nOFF&#13;\n</pre><p>we can raise the logging level for a single namespace only, for example to DEBUG or even TRACE</p><h4><strong>Data Model Tuning</strong>:</h4><p>One of the key components of cassandra is the data model.<br />for performance tuning, we start with the workload placed on the database. that workload lands on cassandra tables, and that is where the data model comes in. then we have software tuning (OS, JVM, Cassandra) and hardware.<br />in many cases, updating hardware is an option.</p><p>When we need to diagnose the data model, cassandra provides many tools to show how our table is performing and how the query is performing.<br />One of them is <strong>nodetool cfstats</strong>; additionally there is <strong>nodetool cfhistograms</strong></p><p>once we have identified a query we suspect, we have CQL tracing but we also have <strong>nodetool settraceprobability</strong><br /><strong>nodetool cfstats example:</strong><br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfstats1.png\" rel=\"attachment wp-att-248\"><img data-attachment-id=\"248\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfstats1/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfstats1.png\" data-orig-size=\"452,423\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfstats1\" data-image-description=\"\" 
data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfstats1.png?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfstats1.png?w=452\" class=\"alignnone wp-image-248 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfstats1.png?w=529\" alt=\"cfstats1\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfstats1.png 452w, https://dmngaya.files.wordpress.com/2015/12/cfstats1.png?w=150 150w, https://dmngaya.files.wordpress.com/2015/12/cfstats1.png?w=300 300w\" /></a><br />This one below is the aggregate number of the keyspace not for this particular table:<br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfstats2.png\" rel=\"attachment wp-att-249\"><img data-attachment-id=\"249\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfstats2/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfstats2.png\" data-orig-size=\"332,76\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfstats2\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfstats2.png?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfstats2.png?w=332\" class=\"alignnone wp-image-249 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfstats2.png?w=529\" alt=\"cfstats2\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfstats2.png 332w, https://dmngaya.files.wordpress.com/2015/12/cfstats2.png?w=150 150w, https://dmngaya.files.wordpress.com/2015/12/cfstats2.png?w=300 300w\" /></a><br />Then below the aggregate number of 
the table stock only:<br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfstats3.png\" rel=\"attachment wp-att-250\"><img data-attachment-id=\"250\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfstats3/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfstats3.png\" data-orig-size=\"456,359\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfstats3\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfstats3.png?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfstats3.png?w=456\" class=\"alignnone wp-image-250 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfstats3.png?w=529\" alt=\"cfstats3\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfstats3.png 456w, https://dmngaya.files.wordpress.com/2015/12/cfstats3.png?w=150 150w, https://dmngaya.files.wordpress.com/2015/12/cfstats3.png?w=300 300w\" /></a><br />Here we have 13 SSTables on disk for this table.<br />The space used metrics show how much data we are storing on disk. In this case the live bytes and the total bytes are the same, which means this particular table has never had deletes or updates, only inserts. If we do some deletes or updates, we will see a difference between live and total space used: the total will be bigger than the live.<br />Off heap memory used (1928102) shows how much off-heap memory this table is using. 
We also have a few other metrics: Bloom filter space used, bytes: 7088; Bloom filter off heap memory used, bytes: 6984; Index summary off heap memory used, bytes: 390; Compression metadata off heap memory used, bytes: 1920728; and Memtable data size, bytes: 36389573.<br /><strong> Conclusion:</strong> If I add up these numbers:<br /><strong>1928102 + 7088 + 6984 + 1920728 + 36389573 = 40252475 bytes (about 38.4 MB)</strong></p><p>I get a rough upper bound on how much RAM this table is consuming. Note that the off heap total already appears to include its components (6984 + 390 + 1920728 = 1928102), so this sum counts them twice; a tighter estimate is 1928102 + 36389573 = 38317675 bytes (about 36.5 MB).</p><p>SSTable compression ratio: 0.173624: compression is enabled here, and the compressed data takes about 17% of its uncompressed size.<br />Number of keys: 1664: this counts partitions only, so for a simple table schema with only a partition key and no clustering columns, this number (1664) is a rough count of the rows on a given server.<br />All of these metrics can differ from server to server, so we have to look at each server individually.</p><p>Memtable cell count (126342), Memtable data size, bytes (36389573) and Memtable switch count (107). Memtable data size (36389573) is the data currently held in memory that will be flushed to disk. Memtable switch count increases every time memtable data is flushed to disk. Memtable cell count indicates how many cells we have in memory. If we divide that by the column count from the schema definition (for example 5 columns), 126342 / 5 = 25268.4, we get the approximate number of rows, because Cassandra stores cells rather than rows.</p><p>Local read count, local read latency, local write count and local write latency show the read and write metrics for this particular table. Local read count and local write count indicate whether the table is read heavy or write heavy. 
In our example, the table is write heavy (4352651 writes versus 942517 reads).</p><p> Pending tasks: 0: this counts any pending tasks; we saw that earlier in tpstats.</p><p> Bloom filter false positives, Bloom filter false ratio, Bloom filter space used, bytes, Bloom filter off heap memory used, bytes: the main metric here is the false ratio (0.0000), which relates the number of Bloom filter false positives (48566) to the number of reads (local read count: 942517). Dividing the cumulative counters gives 48566 / 942517, roughly 0.05, so the reported 0.0000 presumably reflects only recent reads. What did we pay to keep false positives at 48566 out of 942517 local reads? We paid Bloom filter space used, bytes: 7088 plus Bloom filter off heap memory used, bytes: 6984, for a total of 7088 + 6984 = 14072 bytes, approximately 14 KB of RAM, to avoid unnecessary disk seeks. If that false positive rate is unacceptable, we can tune it by paying more memory to get fewer false positives and consequently fewer unnecessary disk seeks, so read performance will go up.</p><p>Index summary off heap memory used, bytes: 390: the index summary is an in-memory structure that helps us jump to the correct place in the partition index. In this case we paid 390 bytes of memory. It is also tunable, but in Cassandra 2.1 we cannot adjust it automatically.</p><p>Compression metadata off heap memory used, bytes: 1920728: this is the amount of RAM consumed by compression metadata. It is worth looking at as well.</p><p>Compacted partition minimum bytes, compacted partition maximum bytes, compacted partition mean bytes: very useful metrics for understanding how our partitions are laid out. What do I mean by that? If we look at compacted partition minimum bytes (35426 bytes = 35 KB), compacted partition mean bytes (122344216 bytes = 116.68 MB) and compacted partition maximum bytes (557074610 bytes = 531.27 MB), we get an idea of the distribution of partition sizes. 
Comparing the mean (122344216 bytes = 116.68 MB) with the min and the max shows a huge discrepancy, which is a good indication of a data hotspot: some partition is abnormally larger than the others.</p><p>Average live cells per slice, average tombstones per slice: these metrics are only useful in production.<br />Average live cells per slice (last five minutes): 2.0: reads have been returning 2.0 live cells on average and 0.0 tombstones, which is good. If the tombstone figure were also 2.0, for example, every read over the last five minutes would be scanning about 2 tombstones for every 2 live cells, so roughly 50% of the read work would be spent on tombstones. Unfortunately, the tombstone count will not always look like this. A high tombstone figure is a good indication that we have exceptionally large deletes of data.<br />Most of these metrics have corresponding tunables.</p><p><strong>Notes:</strong><br />The <strong>bloom_filter_fp_chance</strong> and <strong>read_repair_chance</strong> control two different things. Usually you would leave them set to their default values, which should work well for most typical use cases.<br /><strong> bloom_filter_fp_chance</strong>: controls the precision of the bloom filter data for SSTables stored on disk. The bloom filter is kept in memory, and when you do a read, Cassandra checks the bloom filters to see which SSTables might have data for the key you are reading. A bloom filter can give false positives: when you actually read the SSTable, it turns out that the key does not exist in it, and the read was wasted time. The better the precision used for the bloom filter, the fewer false positives it will give (but the more memory it will need).<br /><strong>From the documentation:</strong><br />0 Enables the unmodified, effectively the largest possible, Bloom filter<br />1.0 Disables the Bloom Filter<br />The recommended setting is 0.1. 
A higher value yields diminishing returns.<br />So a higher number gives a higher chance of a false positive (fp) when reading the bloom filter.<br />read_repair_chance: controls the probability that a read of a key will be checked against the other replicas for that key. This is useful if your system has frequent downtime of the nodes resulting in data getting out of sync. If you do a lot of reads, then the read repair will slowly bring the data back into sync as you do reads without having to run a full repair on the nodes. Higher settings will cause more background read repairs and consume more resources, but would sync the data more quickly as you do reads.<br />See documentation on this link: <a href=\"http://docs.datastax.com/en/cql/3.1/cql/cql_reference/tabProp.html\" rel=\"nofollow\">http://docs.datastax.com/en/cql/3.1/cql/cql_reference/tabProp.html</a><br />You can read also this document:<br /><a href=\"http://www.datastax.com/dev/blog/common-mistakes-and-misconceptions\" rel=\"nofollow\">http://www.datastax.com/dev/blog/common-mistakes-and-misconceptions</a><br />Here is the description of that table:<br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/desc-table1.png\" rel=\"attachment wp-att-251\"><img data-attachment-id=\"251\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/desc-table1/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/desc-table1.png\" data-orig-size=\"474,408\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"desc-table1\" data-image-description=\"\" 
data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/desc-table1.png?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/desc-table1.png?w=474\" class=\"alignnone wp-image-251 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/desc-table1.png?w=529\" alt=\"desc-table1\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/desc-table1.png 474w, https://dmngaya.files.wordpress.com/2015/12/desc-table1.png?w=150 150w, https://dmngaya.files.wordpress.com/2015/12/desc-table1.png?w=300 300w\" /></a><br />Below we have tunable parameters for this table:<br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/desc-table2.png\" rel=\"attachment wp-att-252\"><img data-attachment-id=\"252\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/desc-table2/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/desc-table2.png\" data-orig-size=\"473,185\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"desc-table2\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/desc-table2.png?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/desc-table2.png?w=473\" class=\"alignnone wp-image-252 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/desc-table2.png?w=529\" alt=\"desc-table2\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/desc-table2.png 473w, https://dmngaya.files.wordpress.com/2015/12/desc-table2.png?w=150 150w, https://dmngaya.files.wordpress.com/2015/12/desc-table2.png?w=300 300w\" /></a><br />These parameters 
directly influence the metrics we saw earlier.<br />Example:<br /><strong> bloom_filter_fp_chance</strong>=0.010000: this determines how many false positives we get and how much memory we give up to push that rate lower or higher.<br /><strong>caching=’KEYS_ONLY’:</strong> in this case only keys are cached, but we also have the ability to enable the row cache, which increases the memory utilization of this table and also has a big impact.<br /><strong> dclocal_read_repair_chance</strong>=0.100000 and <strong>read_repair_chance</strong>=0.00000: these parameters influence the non-blocking read repair activity shown in tpstats.<br /><strong> gc_grace_seconds</strong>=864000: this controls how long tombstones are kept, so it affects the last line we saw earlier (average tombstones per slice (last five minutes)); it is tunable.<br /><strong> index_interval</strong>=128: in Cassandra 2.1 this becomes a min and a max interval; these affect the index summary off heap memory used, bytes metric we saw earlier. It is tunable.<br /><strong> Populate_io_cache_on_flush</strong>=’false’: in Cassandra 2.0 this lets us choose whether to populate the I/O cache when data is flushed to disk.<br /><strong> MemTable_flush_period_in_ms</strong>=0: set this if we want to flush memtables to disk on a schedule; a value of 0, which most people keep, means no scheduled flush.<br /><strong> Compression</strong>= {‘sstable_compression’:’LZ4compressor’}: this affects the compression ratio we saw earlier. If we want, we can turn compression off. Why would we change it? A better compression ratio is probably paid for with CPU cycles, so we may want a different trade-off.<br /><strong> Speculative_retry</strong>=’99.0PERCENTILE’: this does not show up in the tpstats output. 
When we perform a read, additional replicas can be used to fulfill the request. Cassandra waits for the 99th percentile read latency (in milliseconds) before sending the request to another replica; for a quorum read, for example, the other replicas are asked for the data once that wait has elapsed.</p><h4><strong>NODETOOL CFHISTOGRAMS:</strong></h4><p>Let’s look at cfhistograms on node0: in Cassandra 2.0 the output is pretty long and gives us very fine-grained buckets. There is also a newer percentile format covering SSTables per read, write latency, read latency, partition size and cell count.<br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram1.png\" rel=\"attachment wp-att-237\"><img data-attachment-id=\"237\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfhistogram1/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram1.png\" data-orig-size=\"353,456\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfhistogram1\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram1.png?w=232\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram1.png?w=353\" class=\"alignnone wp-image-237 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram1.png?w=529\" alt=\"cfhistogram1\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram1.png 353w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram1.png?w=116 116w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram1.png?w=232 232w\" /></a><br /><a 
href=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram2.png\" rel=\"attachment wp-att-238\"><img data-attachment-id=\"238\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfhistogram2/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram2.png\" data-orig-size=\"563,130\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfhistogram2\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram2.png?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram2.png?w=529\" class=\"alignnone wp-image-238 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram2.png?w=529\" alt=\"cfhistogram2\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram2.png?w=529 529w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram2.png?w=150 150w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram2.png?w=300 300w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram2.png 563w\" /></a><br />How do we read us?<br />sstables per read:<br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram3.png\" rel=\"attachment wp-att-239\"><img data-attachment-id=\"239\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfhistogram3/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram3.png\" data-orig-size=\"137,78\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfhistogram3\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram3.png?w=137\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram3.png?w=137\" class=\"alignnone wp-image-239 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram3.png?w=529\" alt=\"cfhistogram3\" /></a><br />31261 reads were satisfied by only one SSTable, but 240663 read operations needed a second SSTable, which means two disk seeks, so performance is already affected. Further down, 397454 read operations needed three SSTables, which means three disk seeks. The further down we go, the slower the reads get; this is a performance problem, and we need to dig in to understand what is happening.<br />Generally, when we have this problem, it is usually a result of compaction falling behind. 
Compaction is a low-priority task that consumes disk I/O, so it gets pushed back when there is heavy activity on the cluster.<br />Consequence: reads have to touch more SSTables to be satisfied.<br />Here we see 272481 read operations that needed five SSTables, and that is the problem.<br />Next is the write latency:<br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram4.png\" rel=\"attachment wp-att-240\"><img data-attachment-id=\"240\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfhistogram4/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram4.png\" data-orig-size=\"355,337\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfhistogram4\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram4.png?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram4.png?w=355\" class=\"alignnone wp-image-240 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram4.png?w=529\" alt=\"cfhistogram4\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram4.png 355w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram4.png?w=150 150w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram4.png?w=300 300w\" /></a><br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram5.png\" rel=\"attachment wp-att-241\"><img data-attachment-id=\"241\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfhistogram5/\" 
data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram5.png\" data-orig-size=\"419,365\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfhistogram5\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram5.png?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram5.png?w=419\" class=\"alignnone wp-image-241 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram5.png?w=529\" alt=\"cfhistogram5\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram5.png 419w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram5.png?w=150 150w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram5.png?w=300 300w\" /></a><br />In cfstats we saw local read latency: 0.000 ms and local write latency: 0.000 ms.<br />In the write latency histogram, us means microseconds. Here 2415947 write operations completed in the 50 us bucket, so the bulk of my writes completed in about 50 us. There is a long tail, though: further down the write latency histogram, the slowest bucket was 126934 us, with 1 operation.<br />For read latency, the 50 us bucket holds only 12976 read operations. This illustrates that with Cassandra, write latency is much lower than read latency.<br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram6.png\" rel=\"attachment wp-att-242\"><img data-attachment-id=\"242\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfhistogram6/\" 
data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram6.png\" data-orig-size=\"236,314\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfhistogram6\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram6.png?w=225\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram6.png?w=236\" class=\"alignnone wp-image-242 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram6.png?w=529\" alt=\"cfhistogram6\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram6.png 236w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram6.png?w=113 113w\" /></a><br />Another trend is the notion of two bumps:<br />Here is the first bump:<br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram7.png\" rel=\"attachment wp-att-243\"><img data-attachment-id=\"243\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfhistogram7/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram7.png\" data-orig-size=\"245,176\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfhistogram7\" 
data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram7.png?w=245\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram7.png?w=245\" class=\"alignnone wp-image-243 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram7.png?w=529\" alt=\"cfhistogram7\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram7.png 245w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram7.png?w=150 150w\" /></a><br />Second bump:<br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram8.png\" rel=\"attachment wp-att-244\"><img data-attachment-id=\"244\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfhistogram8/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram8.png\" data-orig-size=\"225,289\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfhistogram8\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram8.png?w=225\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram8.png?w=225\" class=\"alignnone wp-image-244 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram8.png?w=529\" alt=\"cfhistogram8\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram8.png 225w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram8.png?w=117 117w\" /></a><br />This is an indication of read from ram for the first bump and read from disk for the second bump.<br /><strong> Partition size</strong>:<br />So we have 
a fine-grained output for partition size. This data generally corresponds to the following cfstats metrics: compacted partition minimum bytes, compacted partition maximum bytes and compacted partition mean bytes.<br />We can see here that one partition is 42 KB, and at the end my largest is 25109160 bytes (24 MB). The 0, 1, 2 values belong to the histogram buckets.<br /><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram9.png\" rel=\"attachment wp-att-245\"><img data-attachment-id=\"245\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfhistogram9/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram9.png\" data-orig-size=\"184,623\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfhistogram9\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram9.png?w=89\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram9.png?w=184\" class=\"alignnone wp-image-245 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram9.png?w=529\" alt=\"cfhistogram9\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram9.png 184w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram9.png?w=44 44w\" /></a></p><p><strong>Cell count per partition</strong>:<br />Sometimes people store a large volume of data in each cell. This definitely influences partition size: we can have a lot of data with only a small number of cells (0, 1, 2). 
Combining information below with which we saw on the partition size, we can see how the data is lay out on the term of data model. That is looking at the table schema</p><p><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram10.png\" rel=\"attachment wp-att-246\"><img data-attachment-id=\"246\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfhistogram10/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram10.png\" data-orig-size=\"197,698\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfhistogram10\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram10.png?w=85\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram10.png?w=197\" class=\"alignnone wp-image-246 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram10.png?w=529\" alt=\"cfhistogram10\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram10.png 197w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram10.png?w=42 42w\" /></a><br />So this line means 3 of my partition have <strong>4866323</strong> cells</p><p><a href=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram11.png\" rel=\"attachment wp-att-247\"><img data-attachment-id=\"247\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/cfhistogram11/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram11.png\" data-orig-size=\"569,155\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cfhistogram11\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram11.png?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram11.png?w=529\" class=\"alignnone wp-image-247 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram11.png?w=529\" alt=\"cfhistogram11\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/cfhistogram11.png?w=529 529w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram11.png?w=150 150w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram11.png?w=300 300w, https://dmngaya.files.wordpress.com/2015/12/cfhistogram11.png 569w\" /></a></p><p><strong>Quiz:</strong><br />Which of the following information does CQL display for a query trace? Elapsed time<br />Which of the following components would you not find statistics for in a nodetool cfstats output? Row cache<br />nodetool cfhistograms shows per-keyspace statistics for read and write latency? False, only per table.</p><h4>Environment Tuning:</h4><p>Sometimes we have to tune other components. For example:<br />What are the most common bottlenecks in Cassandra? 
Performance checkpoint: things that slow down the performance of the system.<br />The most common bottlenecks are:<br />Inadequate hardware.<br />Poorly configured JVM parameters: Cassandra runs on top of this virtual machine.<br />High CPU utilization: particularly for write workloads, where we can see Cassandra become CPU bound, with CPU peaks.<br /><strong> Insufficient or incorrect memory cache tuning</strong>: This refers to reads. This point is largely outside of Cassandra.<br />I like to drive this point home.<br />Question: What is the best way to cope with inadequate node hardware in a Cassandra cluster?<br />A lot of people ask: we have a machine and we want performance, so what can we do?<br />Upgrade the hardware. This is the bad news.<br />If we don’t have adequate equipment for the job, it is not a good idea to do the job.<br />We could also add another node if the existing machine is not performing.</p><p>What are some of the technologies upon which a Cassandra node depends?<br />Java, JVM, JNA, JMX and a bunch of other stuff that starts with “j”</p><h4>JVM:</h4><p>Cassandra is a Java program, so we need to know how the JVM works in order to tune it.<br />When it comes to tuning, the JVM has a lot of options.<br />If we run this command:</p><pre>java -XX:+PrintFlagsFinal</pre><p>We will see all of the options.<br />JVM and Garbage Collection (GC): <strong>what is Garbage collection?</strong> Java frees the developer from deallocating memory by hand: the garbage collector finds objects that are no longer referenced and reclaims them. Java can consume a lot of the resources on the machine: RAM, CPU, IO.<br /><strong> JVM generational Heap</strong>:<br />When we start the Java process, the JVM allocates a big chunk of RAM (it can be half of the RAM), and that is the chunk it will manage. That chunk is the heap, and it is divided into several pieces: the new gen, the old gen and the perm gen. Since Java 8, perm gen has changed (it was replaced by Metaspace). What does it do? 
It stores all class definitions.<br />The important thing is that Cassandra loads most of its code into perm gen at boot, and perm gen cannot change in size.</p><p><a href=\"https://dmngaya.files.wordpress.com/2015/12/jvm1.png\" rel=\"attachment wp-att-253\"><img data-attachment-id=\"253\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/jvm1/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2015/12/jvm1.png\" data-orig-size=\"810,213\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"jvm1\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2015/12/jvm1.png?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2015/12/jvm1.png?w=529\" class=\"alignnone wp-image-253 size-full\" src=\"https://dmngaya.files.wordpress.com/2015/12/jvm1.png?w=529\" alt=\"jvm1\" srcset=\"https://dmngaya.files.wordpress.com/2015/12/jvm1.png?w=529 529w, https://dmngaya.files.wordpress.com/2015/12/jvm1.png?w=150 150w, https://dmngaya.files.wordpress.com/2015/12/jvm1.png?w=300 300w, https://dmngaya.files.wordpress.com/2015/12/jvm1.png?w=768 768w, https://dmngaya.files.wordpress.com/2015/12/jvm1.png 810w\" /></a><br />With Cassandra, there is not much we can do in perm gen.<br />But in the new gen, we have the Eden and survivor spaces. So, when we create a new object, of any kind, that object will be created in Eden. 
Let's say we create it inside a function: it will be created in Eden, and as the Eden space fills up with object allocations, garbage collection kicks in; that is the first garbage collector.<br />The collector for this piece (new gen) is called ParNew (parallel new).<br />The important thing is that when Eden fills up, we need to look for garbage: we find all objects that are no longer referenced and deallocate them. However, there will be other objects that are still referenced; they are moved to the survivor area in the new gen.<br />And after the survivor space, these objects can be promoted to the old gen.<br />There are a lot of options to control how many times objects can stay in Eden, be moved to a survivor space, or be promoted to the old gen.<br />This area (old gen) can fill up, depending on those options.<br />Take a look at this.<br />In my test cluster, I have 4 nodes running in different virtual machines.</p><pre>nodetool status&#13;\n[hduser@base cassandra]$ nodetool status&#13;\nDatacenter: datacenter1&#13;\n=======================&#13;\nStatus=Up/Down&#13;\n|/ State=Normal/Leaving/Joining/Moving&#13;\n--  Address        Load       Tokens       Owns    Host ID                               Rack&#13;\nUN  192.168.56.72  40.21 MB   256          ?       5ddb3532-70de-47b3-a9ca-9a8c9a70b186  rack1&#13;\nUN  192.168.56.73  50.88 MB   256          ?       ea5286bb-5b69-4ccc-b22c-474981a1f789  rack1&#13;\nUN  192.168.56.74  48.63 MB   256          ?       158812a5-8adb-4bfb-9a56-3ec235e76547  rack1&#13;\nUN  192.168.56.71  48.52 MB   256          ?       
a42d792b-1620-4f41-8662-8e44c73c38d4  rack1&#13;\n</pre><p>Now we can do the command:</p><pre class=\"brush: css; title: ; notranslate\" title=\"\">cassandra-stress write -node 192.168.56.71</pre><p>Result:</p><pre>INFO 23:56:42 Using data-center name 'datacenter1' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter &#13;\nname with DCAwareRoundRobinPolicy constructor) INFO 23:56:42 New Cassandra host /192.168.56.71:9042 added INFO 23:56:42 New Cassandra &#13;\nhost /192.168.56.72:9042 added INFO 23:56:42 New Cassandra host /192.168.56.73:9042 added INFO 23:56:42 New Cassandra host /192.168.56.74:9042 &#13;\nadded Connected to cluster: Training_Cluster Datatacenter: datacenter1; Host: /192.168.56.71; Rack: rack1 Datatacenter: datacenter1; &#13;\nHost: /192.168.56.72; Rack: rack1 Datatacenter: datacenter1; Host: /192.168.56.73; Rack: rack1 Datatacenter: datacenter1; Host: /192.168.56.74; &#13;\nRack: rack1 Created keyspaces. Sleeping 1s for propagation. Sleeping 2s... Warming up WRITE with 50000 iterations... 
Failed to connect over JMX; &#13;\nnot collecting these stats WARNING: uncertainty mode (err&lt;) results in uneven workload between thread runs, so should be used for high level &#13;\nanalysis only Running with 4 threadCount Running WRITE with 4 threads until stderr of mean &lt; 0.02 Failed to connect over JMX; not collecting &#13;\nthese stats type, &#13;\ntotal ops, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, errors, gc: #, max ms, sum ms, sdv ms, mb &#13;\ntotal, 2086, 2086, 2086, 2086, 1,9, 1,5, 4,2, 7,0, 46,4, 58,0, 1,0, 0,00000, 0, 0, 0, 0, 0, 0 &#13;\ntotal, 4122, 2029, 2029, 2029, 2,0, 1,6, 4,8, 8,0, 14,0, 15,3, 2,0, 0,02617, 0, 0, 0, 0, 0, 0 &#13;\ntotal, 6171, 2029, 2029, 2029, 1,9, 1,5, 5,1, 7,6, 12,0, 13,6, 3,0, 0,02038, 0, 0, 0, 0, 0, 0 &#13;\ntotal, 8466, 2288, 2288, 2288, 1,7, 1,4, 4,2, 6,1, 11,9, 14,4, 4,0, 0,02715, 0, 0, 0, 0, 0, 0</pre><p>We can run the program called jvisualvm<br />If we have the JDK installed this java visual vm<br /><a href=\"https://dmngaya.files.wordpress.com/2016/01/java1.jpg\" rel=\"attachment wp-att-319\"><img data-attachment-id=\"319\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/java1/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2016/01/java1.jpg\" data-orig-size=\"1188,762\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"java1\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2016/01/java1.jpg?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2016/01/java1.jpg?w=529\" class=\"alignnone 
wp-image-319 size-full\" src=\"https://dmngaya.files.wordpress.com/2016/01/java1.jpg?w=529\" alt=\"java1\" srcset=\"https://dmngaya.files.wordpress.com/2016/01/java1.jpg?w=529 529w, https://dmngaya.files.wordpress.com/2016/01/java1.jpg?w=1058 1058w, https://dmngaya.files.wordpress.com/2016/01/java1.jpg?w=150 150w, https://dmngaya.files.wordpress.com/2016/01/java1.jpg?w=300 300w, https://dmngaya.files.wordpress.com/2016/01/java1.jpg?w=768 768w, https://dmngaya.files.wordpress.com/2016/01/java1.jpg?w=1024 1024w\" /></a></p><p>We can see the available plugins on the Tools windows and activate some plugins like VisualVM-Glassfish, visual GC:<br /><a href=\"https://dmngaya.files.wordpress.com/2016/01/java2.jpg\" rel=\"attachment wp-att-320\"><img data-attachment-id=\"320\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/java2/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2016/01/java2.jpg\" data-orig-size=\"1187,762\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"java2\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2016/01/java2.jpg?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2016/01/java2.jpg?w=529\" class=\"alignnone wp-image-320 size-full\" src=\"https://dmngaya.files.wordpress.com/2016/01/java2.jpg?w=529\" alt=\"java2\" srcset=\"https://dmngaya.files.wordpress.com/2016/01/java2.jpg?w=529 529w, https://dmngaya.files.wordpress.com/2016/01/java2.jpg?w=1058 1058w, https://dmngaya.files.wordpress.com/2016/01/java2.jpg?w=150 150w, 
https://dmngaya.files.wordpress.com/2016/01/java2.jpg?w=300 300w, https://dmngaya.files.wordpress.com/2016/01/java2.jpg?w=768 768w, https://dmngaya.files.wordpress.com/2016/01/java2.jpg?w=1024 1024w\" /></a><br />We can see the Eden space:<br /><a href=\"https://dmngaya.files.wordpress.com/2016/01/java3.png\" rel=\"attachment wp-att-321\"><img data-attachment-id=\"321\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/java3/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2016/01/java3.png\" data-orig-size=\"1602,830\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"java3\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2016/01/java3.png?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2016/01/java3.png?w=529\" class=\"alignnone wp-image-321 size-full\" src=\"https://dmngaya.files.wordpress.com/2016/01/java3.png?w=529\" alt=\"java3\" srcset=\"https://dmngaya.files.wordpress.com/2016/01/java3.png?w=529 529w, https://dmngaya.files.wordpress.com/2016/01/java3.png?w=1058 1058w, https://dmngaya.files.wordpress.com/2016/01/java3.png?w=150 150w, https://dmngaya.files.wordpress.com/2016/01/java3.png?w=300 300w, https://dmngaya.files.wordpress.com/2016/01/java3.png?w=768 768w, https://dmngaya.files.wordpress.com/2016/01/java3.png?w=1024 1024w\" /></a><br /><a href=\"https://dmngaya.files.wordpress.com/2016/01/java4.jpg\" rel=\"attachment wp-att-322\"><img data-attachment-id=\"322\" 
data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/java4/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2016/01/java4.jpg\" data-orig-size=\"1599,817\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"java4\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2016/01/java4.jpg?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2016/01/java4.jpg?w=529\" class=\"alignnone wp-image-322 size-full\" src=\"https://dmngaya.files.wordpress.com/2016/01/java4.jpg?w=529\" alt=\"java4\" srcset=\"https://dmngaya.files.wordpress.com/2016/01/java4.jpg?w=529 529w, https://dmngaya.files.wordpress.com/2016/01/java4.jpg?w=1058 1058w, https://dmngaya.files.wordpress.com/2016/01/java4.jpg?w=150 150w, https://dmngaya.files.wordpress.com/2016/01/java4.jpg?w=300 300w, https://dmngaya.files.wordpress.com/2016/01/java4.jpg?w=768 768w, https://dmngaya.files.wordpress.com/2016/01/java4.jpg?w=1024 1024w\" /></a><br />For a long time, Cassandra recommended the CMS collector. 
G1 exists in Java 7, and in Java 8 it is very good. So, depending on the version of Cassandra you are running, it will be CMS or G1.<br />CMS and G1 both have an old generation and a permanent generation.<br /><a href=\"https://dmngaya.files.wordpress.com/2016/01/java5.jpg\" rel=\"attachment wp-att-323\"><img data-attachment-id=\"323\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/java5/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2016/01/java5.jpg\" data-orig-size=\"767,626\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"java5\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2016/01/java5.jpg?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2016/01/java5.jpg?w=529\" class=\"alignnone wp-image-323 size-full\" src=\"https://dmngaya.files.wordpress.com/2016/01/java5.jpg?w=529\" alt=\"java5\" srcset=\"https://dmngaya.files.wordpress.com/2016/01/java5.jpg?w=529 529w, https://dmngaya.files.wordpress.com/2016/01/java5.jpg?w=150 150w, https://dmngaya.files.wordpress.com/2016/01/java5.jpg?w=300 300w, https://dmngaya.files.wordpress.com/2016/01/java5.jpg 767w\" /></a><br />The difference is that in G1 the heap is split into several contiguous chunks of memory, like this:<br /><a href=\"https://dmngaya.files.wordpress.com/2016/01/java6.jpg\" rel=\"attachment wp-att-324\"><img data-attachment-id=\"324\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/java6/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2016/01/java6.jpg\" 
data-orig-size=\"692,201\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"java6\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2016/01/java6.jpg?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2016/01/java6.jpg?w=529\" class=\"alignnone wp-image-324 size-full\" src=\"https://dmngaya.files.wordpress.com/2016/01/java6.jpg?w=529\" alt=\"java6\" srcset=\"https://dmngaya.files.wordpress.com/2016/01/java6.jpg?w=529 529w, https://dmngaya.files.wordpress.com/2016/01/java6.jpg?w=150 150w, https://dmngaya.files.wordpress.com/2016/01/java6.jpg?w=300 300w, https://dmngaya.files.wordpress.com/2016/01/java6.jpg 692w\" /></a><br /><a href=\"https://dmngaya.files.wordpress.com/2016/01/java7.jpg\" rel=\"attachment wp-att-325\"><img data-attachment-id=\"325\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/java7/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2016/01/java7.jpg\" data-orig-size=\"575,310\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"java7\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2016/01/java7.jpg?w=300\" 
data-large-file=\"https://dmngaya.files.wordpress.com/2016/01/java7.jpg?w=529\" class=\"alignnone wp-image-325 size-full\" src=\"https://dmngaya.files.wordpress.com/2016/01/java7.jpg?w=529\" alt=\"java7\" srcset=\"https://dmngaya.files.wordpress.com/2016/01/java7.jpg?w=529 529w, https://dmngaya.files.wordpress.com/2016/01/java7.jpg?w=150 150w, https://dmngaya.files.wordpress.com/2016/01/java7.jpg?w=300 300w, https://dmngaya.files.wordpress.com/2016/01/java7.jpg 575w\" /></a><br />G1 works very well with very large heaps. We generally recommend an 8 GB heap for Cassandra.<br />When the old gen fills up to capacity, we do a major garbage collection, and this pause can last for seconds. That brings us to pauses. What is a pause? When we do garbage collection in CMS or G1, some phases have to do a stop-the-world pause: they stop our program from running at all and scan the Eden and survivor spaces for unneeded objects so we can clean them up.<br />How long does this pause last? It is a function of a couple of different things:<br />How many objects are still live?<br />The number of CPUs available to the JVM is also a big factor in how long the garbage collection pause lasts.<br />Additionally, CMS suffers from another thing called heap fragmentation. The only way CMS can defragment the heap is a full stop-the-world pause by the serial collector, which is another garbage collector, but a single-threaded one. That is where the extremely long pauses come from.<br />For G1, the main option we have is the target pause time (-XX:MaxGCPauseMillis), which defaults to 200 milliseconds.<br />We have a couple of tools available:</p><h4>1. Java VisualVM</h4><p><strong>2. 
OpsCenter</strong><br /><a href=\"https://dmngaya.files.wordpress.com/2016/01/java8.jpg\" rel=\"attachment wp-att-326\"><img data-attachment-id=\"326\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/java8/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2016/01/java8.jpg\" data-orig-size=\"598,276\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"java8\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2016/01/java8.jpg?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2016/01/java8.jpg?w=529\" class=\"alignnone wp-image-326 size-full\" src=\"https://dmngaya.files.wordpress.com/2016/01/java8.jpg?w=529\" alt=\"java8\" srcset=\"https://dmngaya.files.wordpress.com/2016/01/java8.jpg?w=529 529w, https://dmngaya.files.wordpress.com/2016/01/java8.jpg?w=150 150w, https://dmngaya.files.wordpress.com/2016/01/java8.jpg?w=300 300w, https://dmngaya.files.wordpress.com/2016/01/java8.jpg 598w\" /></a></p><h4><strong>3. 
Jconsole and jvisualvm</strong></h4><p><a href=\"https://dmngaya.files.wordpress.com/2016/01/java9.jpg\" rel=\"attachment wp-att-327\"><img data-attachment-id=\"327\" data-permalink=\"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/java9/\" data-orig-file=\"https://dmngaya.files.wordpress.com/2016/01/java9.jpg\" data-orig-size=\"862,376\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"java9\" data-image-description=\"\" data-medium-file=\"https://dmngaya.files.wordpress.com/2016/01/java9.jpg?w=300\" data-large-file=\"https://dmngaya.files.wordpress.com/2016/01/java9.jpg?w=529\" class=\"alignnone wp-image-327 size-full\" src=\"https://dmngaya.files.wordpress.com/2016/01/java9.jpg?w=529\" alt=\"java9\" srcset=\"https://dmngaya.files.wordpress.com/2016/01/java9.jpg?w=529 529w, https://dmngaya.files.wordpress.com/2016/01/java9.jpg?w=150 150w, https://dmngaya.files.wordpress.com/2016/01/java9.jpg?w=300 300w, https://dmngaya.files.wordpress.com/2016/01/java9.jpg?w=768 768w, https://dmngaya.files.wordpress.com/2016/01/java9.jpg 862w\" /></a><br /><strong>4. 
And the last is jstat</strong></p><pre>jstat -gccause 1607 5000  # 1607 is the process id for Cassandra</pre><h4><strong>Notes:</strong></h4><p>The most significant impact of the Java virtual machine on Cassandra performance is garbage collection.<br />The G1 collector is the preferred choice for garbage collection over CMS.<br />Metaspace does NOT exist in the new generation part of the JVM heap memory.</p><h4>JVM Tools and Tuning Strategies:</h4><p>If we have a server with 126 GB of RAM and Cassandra starts with an 8 GB heap, what is it doing with the rest of the memory?<br />Page cache.<br />What is page caching? It is useful for improving Cassandra read performance: the operating system caches data that is accessed frequently and serves it faster than from disk.</p><pre>[hduser@base ~]$ free -m&#13;\n             total       used       free     shared    buffers     cached&#13;\nMem:          5935       4271       1664         13        104       1189&#13;\n-/+ buffers/cache:       2977       2957 &#13;\nSwap:         2047          0       2047 &#13;\n</pre><p>Here we have 5935 MB of RAM, of which 4271 MB are used; 1189 MB of that is page cache.</p><h4><strong>How does Cassandra utilize page caching?</strong></h4><p>It also makes writes more efficient, but it cannot improve writes by much.<br />How do you triage the root cause of out of memory (OOM) errors? They occur when we don’t have enough memory.<br />They can be Java errors in general, not only Cassandra errors.<br />The challenge can be buffering data for writes while Cassandra tries to write the data to disk.</p><h4><strong>Quiz:</strong></h4><p>What is the benefit of using the page cache? Reads are more efficient, Writes are more efficient, and Repairs are more efficient<br />Memory used by the page cache will not be available to other programs until explicitly freed. ? true<br />Which of the following is most likely to cause an out-of-memory error? 
Client-side joins<br /><strong>CPU</strong><br /><strong> CPU intensive:</strong><br />Writes (INSERT, UPDATE, DELETE), encryption, compression, and garbage collection are CPU-intensive.<br />The more CPU you can give to garbage collection, the faster it will run.<br />If you want to enable compression or encryption, you have to monitor CPU utilization with tools like dstat or OpsCenter.<br />What to do?<br />Add nodes<br />Use nodes that have more and faster CPUs<br />However, if the CPU is saturated, we have a couple of options:<br /><strong> 1. Turn off encryption, turn off compression</strong><br /><strong> 2. Add nodes</strong><br /><strong> 3. Alternatively, upgrade these nodes with more CPU</strong><br />Quiz:<br />Which of the following operations would significantly benefit from faster CPUs? Writes and Garbage collection<br />OpsCenter is a tool that can be used to monitor CPU usage ? true<br />What would be the best course of action to resolve issues with CPU saturation? Add more nodes.</p><h4><strong>Disk Tuning:</strong></h4><p>In this section we are going to talk about disk tuning and compaction.<br />Question: How do disk considerations affect performance?<br />When we operate a database like Cassandra, where the active dataset can be larger than the available RAM, requests have to go to disk. So, we have to take disk considerations into account.<br />HDD vs SSD: spinning, or rotational, disks must move a mechanical read/write head to the portion of the disk that is being written to or read.<br />Cassandra is architected around rotational drives, with sequential writes and sequential reads. However, SSDs are much faster. If you have a latency-sensitive application, SSDs are crucial.</p><p>Some of the settings in the cassandra.yaml file that affect disk are:<br />Configuring disks in the cassandra.yaml file:<br />1. <strong>disk_failure_policy:</strong> what should occur if a data disk fails? 
This is not a performance setting.<br />• stop (the default): when Cassandra detects some kinds of corruption, it shuts down gossip and Thrift.<br />• best_effort: stop using the failed disk and respond using the remaining SSTables on the other disks; obsolete data can be returned if the consistency level is ONE<br />• ignore: ignore fatal errors and let requests fail<br />2. <strong>commit_failure_policy:</strong> what should occur if a commit log disk fails?<br />• stop: same as above<br />• stop_commit: shut down the commit log, let writes collect but continue to serve reads<br />• ignore: ignore fatal errors and let batches fail<br />3. <strong>concurrent_reads</strong>: how many threads we can allocate to the read pool; typically set to 16 * number of drives<br />4. <strong>trickle_fsync</strong>: good to enable on SSDs but very bad for rotational drives. On SSDs, it avoids sudden big flushes of dirty buffers.</p><h4><strong>Tools to diagnose disk-related issues:</strong></h4><p>Using Linux sysstat tools to discover disk statistics:<br />System activity reporter (sar): can get information about system buffer activity, system calls, block devices, overall paging, semaphore and memory allocation, and CPU utilization.<br />Flags identify the item to check:</p><pre>sar  -d for disk &#13;\nsar  -r for memory &#13;\nsar  -S for swap space used &#13;\nsar  -b for overall I/O activities &#13;\ndstat  -a Leatherman tool for Linux -versatile replacement &#13;\nfor sysstat (vmstat, iostat, netstat, nfsstat, and ifstat) &#13;\nwith colors&#13;\nFlags identify the item to check:                      &#13;\ndstat      -d             for disk &#13;\ndstat      -m             for memory, etc&#13;\n</pre><p>Flags can also be combined to get multiple stats:</p><pre><strong>&#13;\ndstat    -dmnrs</strong>&#13;\n</pre><p>Steps to install dstat on RHEL/CentOS 5.x/6.x and Fedora 16/17/18/19/20:<br />Installing the RPMForge repository in RHEL/CentOS<br />For RHEL/CentOS 6.x 64 Bit</p><pre>sudo rpm -Uvh 
http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.3-1.el6.rf.x86_64.rpm&#13;\nsudo yum repolist&#13;\nsudo yum install dstat&#13;\n</pre><p>If we want to use a Linux script to monitor our statistics, we can use cron jobs like:</p><pre><strong>sudo dstat  -drsmn --output /var/log/dstat.txt 5 3 &gt;/dev/null&#13;\n</strong></pre><pre><strong>#!/bin/bash&#13;\ndstat -lrvn 10  --output /tmp/dstat.csv -CDN 30 360 mutt -a /tmp/dstat.csv  -s \"dstat report\"  me@me.com &gt;/dev/null</strong></pre><ul><li>load average (-l), disk IOPS (-r), vmstat (-v), and network throughput (-n)</li>\n</ul><p>Output can be displayed on a web page for monitoring.<br />Output can be piped into graphical programs, like Gnumeric, Gnuplot, and Excel, for visual displays.</p><ul><li>Memory should be around 95% in use (most of it in the cache memory column).</li>\n<li>CPU should be at &lt;1-2% iowait and 2-15% system time.</li>\n<li>Network throughput should mirror whatever the application is doing.</li>\n</ul><h4><strong>Another tool is: nodetool cfhistograms to discover disk issues:</strong></h4><p>For disk I/O performance, this tool can tell us something. The problem could be the JVM or the size of the RAM; there are a lot of things it could be, and disk is one of them.<br />With cfhistograms, the latencies form groups of bumps. It could be one of three things, but usually we see two bumps.<br />One comes from reads served from RAM and a bigger one from reads served from disk. Sometimes the cause is something else, like a lot of compaction. Compaction can cause disk contention, and this contention can make disk read latency go up, because compaction also reads from the disk.<br /><strong> So, how do we deal with that? 
How to fix it?</strong><br />We can throttle down compaction by reducing compaction_throughput_mb_per_sec.<br />Using nodetool proxyhistograms to discover disk issues: it shows the full request latency recorded by the coordinator.</p><p><strong>Using CQL TRACING</strong> : <br />To distinguish between slow disk response and a slow query: slow disk response will be evident in how long it takes to access each drive.<br />If your queries need to look for SSTables on too many partitions to complete, you will see this in the trace.<br />These issues will have different patterns.<br />Here we can see a lot of information, such as the tombstones that were scanned. We can tell whether the latency of our query is coming from the disk, and the source shows us which machine is experiencing the longer latencies.<br />Where is tracing information stored?<br />In the system_traces keyspace; its events table gives us a lot of details about a particular query.<br />What role does disk readahead play in performance? We read ahead a couple of blocks, and that is tunable: how many blocks should we read ahead? The problem is that in Cassandra we don’t know exactly how much data we want to read ahead.<br />We recommend a readahead value of 8 for SSDs.<br />The command to do that is:</p><pre>blockdev --setra 8 /dev/&lt;device&gt;&#13;\n</pre><h4><strong>QUIZ:</strong></h4><p>nodetool cfhistograms shows the full request latency recorded by the coordinator. False<br />Which of the following statements is NOT true about the readahead setting?<br />Which of the following tools can be used to monitor disk statistics? iostat, dstat, sysstat, sar.</p><h4>Disk Tuning: </h4><h4>Compaction</h4><h4>How does compaction impact performance?</h4><p>Compaction is the most I/O-intensive operation in a Cassandra cluster. 
So some of the choices we make around compaction strategies and throttling have a big impact on disk utilization in our cluster.<br />Adjusting compaction means looking at the impact of the min/max SSTable thresholds.<br />Understand compaction strategies: there are 3 currently (<strong>SizeTiered Compaction Strategy STCS, DateTiered Compaction Strategy DTCS, Leveled Compaction Strategy LCS</strong>)<br />For write-intensive workloads, <strong>SizeTiered Compaction Strategy STCS</strong> can be the best.<br /><strong>Leveled Compaction Strategy</strong> is generally recommended for read-heavy workloads, and only if we are using SSDs.<br /><strong>DateTiered Compaction Strategy DTCS </strong> is for the time series data model.<br />Understand the connection between the number of SSTables and compaction as it affects performance.<br />See what options are available for compaction to improve performance.<br />How do tombstones affect compaction?<br />Compaction evicts tombstones and removes deleted data while consolidating multiple SSTables into one. More tombstones mean more time spent compacting SSTables.<br />Once a column is marked with a tombstone, it will continue to exist until compaction permanently deletes the cell. Note that if a node is down longer than gc_grace_seconds and is then brought back online, deleted data can be replicated again (zombie data).<br />To prevent such issues, repair must be run on every node on a regular basis.</p><h4>Best practices: nodes should be repaired every 7 days when gc_grace_seconds is 10 days (the default setting)</h4><h4><strong>How does data modelling affect tombstones?</strong></h4><p>If a data model requires a lot of deletes within a partition, a lot of tombstones are created. Tombstones mark stale data awaiting deletion, and that data still has to be read until compaction removes it.<br />More effective data modelling will alleviate this issue. 
Ensure that your data model is more likely to delete whole partitions rather than columns from a partition.<br />The data model has a significant impact on performance. Careful data modelling will avoid the pitfalls of rampant tombstones that affect read performance.<br />Tombstones are written like normal writes and do not otherwise affect write performance.<br />If you know your workload does a lot of deletes and you find that they hurt read performance, you probably need to fix it, for example by changing the data model or the way the workload uses it.</p><h4>Using nodetool compactionstats to investigate issues:</h4><p>This tool reports compaction statistics while compaction is running: how much still needs compacting and the total amount of data being compacted.<br />By using CQL tracing as well, we can see how many nodes and partitions are accessed.<br />The number of tombstones will be shown.<br />Read access time can be observed to decrease after a compaction completes, and to increase while a compaction is in progress.<br />Why is a durable queue an anti-pattern that can cause compaction issues?<br />A lot of people like to use Cassandra as a durable queue, but because of the tombstone problem this is an anti-pattern. What generally happens is that reads and deletes keep hitting the same place, so reads must scan more and more tombstones and read latency grows. If you need a queue, use a queue: something like Kafka is a far better fit for a durable queue. Cassandra is not a tool to use as a queue.</p><h4>How do disk choices affect compaction issues?</h4><p>Compaction is the most disk I/O-intensive operation in Cassandra, so having good disks has a positive effect on it. Conversely, slow disks can have a very detrimental effect on it. 
When compaction runs very slowly, the number of SSTables each read has to touch increases, and read performance can suffer.<br />Use nodetool cfhistograms to look at the read performance.<br /><strong> QUIZ:</strong><br />Which of the following compaction strategies should be used for read-heavy workloads, assuming certain hardware conditions are met? Leveled Compaction<br />What tool, command, or setting can be used to investigate issues with tombstones? CQL tracing<br />Compaction can potentially utilize not only a significant amount of disk I/O, but also disk space as well. True</p><h4>Disk Tuning: Easy Wins and Conclusion:</h4><p>To end this paper, we need to:<br />• Revisit the performance tuning methodology<br />• Outline easy performance tuning wins<br />• Outline Cassandra or environment anti-patterns</p><h4>How does this all fit together?</h4><p>1. We need to understand performance and Cassandra at a high level: general performance tuning techniques, some of the terminology, and how Cassandra itself works.<br />2. Collect performance data on the following areas, knowing where to look for the data and what it means in terms of tuning or isolating a problem:<br />• workload and data model<br />• cluster and nodes<br />• operating system and hardware<br />• disk and compaction strategies.<br />3. Parse the information gathered and begin formulating a plan:<br />• Based on metrics collected, where are the bottlenecks?<br />• What tools are available to fix issues that come up?<br />4. 
Apply solutions to any/all areas required and test them:<br />• Using the tools and knowledge gained, apply solutions, test the solutions applied, and start the cycle again as needed.</p><h4>Question: What was that performance tuning methodology again?</h4><p>We have:<br /><strong>Active performance tuning: suspect there’s a problem?</strong><br />• Determine if the problem is in Cassandra, the environment, or both.<br />• Isolate the problem using the tools provided.<br />• Verify problems and test for reproducibility.<br />• Fix problems using the tuning strategies provided.<br />• Test, test, and test again.<br />• Verify that your “fixes” did not introduce additional problems.<br /><strong>Passive performance tuning: regular system “sanity checks”</strong><br />• Regularly monitor key health areas in Cassandra and the environment using the tools provided.<br />• Identify and tune for future growth/scalability.<br />• Apply tuning strategies as needed.<br />• Periodically apply the USE Method for a system health check.</p><h4>Easy Cassandra performance tuning wins:</h4><p>• Increase flush writers if they are blocked: flush writers flush memtables to SSTables. If nodetool tpstats shows the flush writers regularly getting blocked and we only have one flush writer, which is common on systems with a single disk, increasing it to 2 can resolve the problem.<br />• Decrease concurrent compactors: we see a lot of people set their concurrent compactors too high. We recommend 2, while watching for CPU saturation. 
And if the CPU is saturated, we can drop it to 1, which makes compaction single-threaded, as it is by default.<br />• Increase concurrent reads and writes appropriately: writes are mostly bound by CPU, while reads are mostly bound by disk (the kind of disks and the number of disks available), so adjust concurrent reads and writes accordingly.<br />• Nudge Cassandra to leverage the OS cache for read-based workloads: here nudging means adding more RAM, because the more reads are served from RAM, the better the performance.<br />• Increase phi_convict_threshold for cloud deployments or those with bad network connectivity.<br />• Increase compaction_throughput if disk I/O is available and compactions are falling behind: the default is 16 MB/s, so if we have a lot of disk I/O available, just increase it and compactions will complete more quickly.<br />• Increase streaming_throughput to increase the pace of streaming: the default is 200 Mbps (megabits per second). It applies when we bring a node online and during repair, so if we want to bring a node online faster, this is the parameter to increase.<br />• In terms of tuning the data model: if we have a time series data model, we can learn a lot by reading: <a href=\"http://planetcassandra.org/blog/getting-started-with-time-series-data-modeling/\" rel=\"nofollow\">http://planetcassandra.org/blog/getting-started-with-time-series-data-modeling/</a><br />• Avoid creating more than 500 tables in Cassandra: even when empty, each table takes at least 1 MB of heap space.<br />• Keep wide rows under 100 MB or 100000 columns: recall that the in-memory compaction limit is 64 MB by default, which is why it is a bad idea to have wide rows above 100 MB.<br />• Leverage wide rows instead of collections for high granularity items: sometimes people have partitions that contain a lot of lists; for that, it is recommended to use clustering columns instead.<br />• Avoid data modelling hotspots by choosing a partition key that ensures read/write workload is 
spread across the cluster: try to find the right partition key, one that makes partitions neither too large nor too small.<br />• Avoid tombstone build-up by leveraging append-only techniques.<br />You can also read these documents about tombstones:<br /><a href=\"http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html\" rel=\"nofollow\">http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html</a><br /><a href=\"http://www.jsravn.com/2015/05/13/cassandra-tombstones-collections.html\" rel=\"nofollow\">http://www.jsravn.com/2015/05/13/cassandra-tombstones-collections.html</a><br />You can view this video about tombstones:<br /><a href=\"https://www.youtube.com/watch?v=olTsTxpBFqc&amp;feature=youtu.be&amp;t=270\" rel=\"nofollow\">https://www.youtube.com/watch?v=olTsTxpBFqc&amp;feature=youtu.be&amp;t=270</a><br />• Use a DESC sort to minimize the impact of tombstones when queries mostly fetch the most recent data.<br />You can also read this document:<br /><a href=\"http://www.sestevez.com/range-tombstones/\" rel=\"nofollow\">http://www.sestevez.com/range-tombstones/</a><br />• Use inverted indexes to help where data duplication or nesting is not appropriate.<br />• Use DataStax drivers to ensure coordinator workload is spread evenly across the cluster.<br />• Use the token-aware load balancing policy: it routes each request directly to a node that holds the data, so that node acts as coordinator and an extra network hop is avoided.<br />• Use prepared statements (where appropriate): if you run the same query many times, you will see a performance gain.<br />• Put OpsCenter’s database on a dedicated cluster.<br />• Size the cluster for the peak anticipated workload, for example based on load testing or benchmarking.<br />• Use a 10G network between nodes to avoid network bottlenecks.<br />• On the JVM and memory side, RAM comes first, then CPU if we want garbage collection to run faster, so ensure there is adequate RAM to keep active data in memory.<br />• Understand how heap allocation 
affects performance.<br />• Look at how the key cache affects performance in memory.<br />• Understand bloom filters and their impact on memory: we can tune the bloom filter false positive chance per table (bloom_filter_fp_chance) to eliminate unnecessary disk seeks.<br />• Disable swap: swap can cause problems that are very difficult to reproduce, so we have to disable it with the command:</p><pre>sudo swapoff -a</pre><p> • Comment out all swap entries in /etc/fstab with:</p><pre>sed -i 's/^\\(.*swap\\)/#\\1/'  /etc/fstab</pre><p> • Look at the impact of memtables on performance: by default, memtables hold their records on the heap. The more memtables we have, the more flushing to disk and the more disk I/O, so understand how many there are and how often they are flushing.</p><h4>What are some Cassandra/environment anti-patterns?</h4><p>1. Network attached storage. Putting Cassandra on a SAN means layering it on top of another storage system, even though Cassandra already has a sequential pattern of reads and writes. The advice is not to use a SAN, and avoiding it will also be cheaper.<br />Running Cassandra on a SAN adds network latency to all operations. Bottlenecks include:<br />• Router latency<br />• Network Interface Card (NIC)<br />• NIC in the NAS device<br />2. Shared network file systems.<br />3. Excessive heap space size: this can make JVM pause times very high, because collecting a large heap takes time.<br />• Can impair the JVM’s ability to perform fluid garbage collection.<br />4. Load balancers: don’t put load balancers between applications and Cassandra, because load balancing is already built into the Cassandra drivers.<br />5. 
Queues and queue-like datasets: don’t use Cassandra like a queue.<br />• Deletes do not remove rows/columns immediately.<br />• Can cause overhead with RAM/disk because of tombstones.<br />• Can affect read performance if data not modelled well.</p>","id":"09f92fe1-9ee7-57a9-90d5-775990c3f999","title":"Cassandra Operations and Performance Tuning","origin_url":"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/","url":"https://dmngaya.wordpress.com/2015/10/10/cassandra-performance-and-tuning/","wallabag_created_at":"2019-12-02T18:42:19+00:00","published_at":"2015-10-10T07:44:25+00:00","published_by":"['']","reading_time":53,"domain_name":"dmngaya.wordpress.com","preview_picture":"https://i0.wp.com/dmngaya.wordpress.com/wp-content/uploads/2016/07/wallpaper-nature-hd-2.jpg?fit=1200%2C675&ssl=1","tags":["cassandra","troubleshooting and tuning","performance"],"description":"In this topic, i will  cover the basics of general Apache Cassandra performance tuning: when to do performance tuning, how to avoid and identify problems, and methodologies to improve.When do you need..."}]},{"tag":"spark","articles":[{"content":"<div><div><div class=\"speechify-ignore ab cp\"><div class=\"speechify-ignore bh l\"><div class=\"hv hw hx hy hz ab\"><div><div class=\"ab ia\"><div><div class=\"bm\" aria-hidden=\"false\"><a rel=\"noopener follow\" href=\"https://medium.com/@rako?source=post_page---byline--ed55f6e67d17--------------------------------\"><div class=\"l ib ic by id ie\"><div class=\"l fj\"><img alt=\"Arunkumar\" class=\"l fd by dd de cx\" src=\"https://miro.medium.com/v2/resize:fill:88:88/1*Pgfv2m0dFhDO2zF2IUAGMQ.jpeg\" width=\"44\" height=\"44\" data-testid=\"authorPhoto\" referrerpolicy=\"no-referrer\" /></div></div></a></div></div></div></div><div class=\"bn bh l\"><div class=\"ab\"><div><div class=\"ih ab q\"><div class=\"ab q ii\"><div class=\"ab q\"><div><div class=\"bm\" aria-hidden=\"false\"><p class=\"bf b ij ik bk\"><a class=\"af ag ah ai aj 
ak al am an ao ap aq ar il\" data-testid=\"authorName\" rel=\"noopener follow\" href=\"https://medium.com/@rako?source=post_page---byline--ed55f6e67d17--------------------------------\">Arunkumar</a></p></div></div></div></div></div></div></div><div class=\"l is\"><div class=\"ab cn it iu iv\"><div class=\"ab ae\">3 min read<div class=\"iw ix l\" aria-hidden=\"true\">·</div>May 13, 2018</div></div></div></div></div><div class=\"ab cp iy iz ja jb jc jd je jf jg jh ji jj jk jl jm jn\"><div class=\"h k w fg fh q\"><div class=\"kd l\"><div class=\"ab q ke kf\"></div></div><div><div class=\"bm\" aria-hidden=\"false\"></div></div><div class=\"ab q jo jp jq jr js jt ju jv jw jx jy jz ka kb kc\"><div class=\"h k\"><div><div class=\"bm\" 
aria-hidden=\"false\"></div></div><div class=\"fd li cn\"><div class=\"l ae\"><div class=\"ab cb\"><div class=\"lj lk ll lm ln lo ci bh\"><div class=\"ab\"><div class=\"bm bh\" aria-hidden=\"false\"><div><div class=\"bm\" aria-hidden=\"false\"></div></div></div></div></div></div></div><div class=\"bm\" aria-hidden=\"false\" aria-describedby=\"postFooterSocialMenu\" aria-labelledby=\"postFooterSocialMenu\"><div><div class=\"bm\" aria-hidden=\"false\"></div></div></div></div></div></div></div></div><p id=\"d328\" class=\"pw-post-body-paragraph mh mi gu mj b mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd ne gn bk\"><em class=\"nf\">Why: We had a lot of very useful data in our warehouse and wanted to take advantage of that data in some of our production services to enhance the user’s experience. So we chose to serve it from Cassandra for all its pros, which I am not going to get into in this blog.</em></p><p id=\"4acb\" class=\"pw-post-body-paragraph mh mi gu mj b mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd ne gn bk\">As a first stage, we wrote a spark-cassandra exporter. It’s pretty simple and only a couple of lines:</p><figure class=\"ng nh ni nj nk nl\"><div class=\"nm nn l fj\"></div></figure><p id=\"e794\" class=\"pw-post-body-paragraph mh mi gu mj b mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd ne gn bk\">This worked and took around 30 mins to write ~150 million rows. 
But once our services went live, we saw read latencies climb during the bulk insertion.</p><figure class=\"ng nh ni nj nk nl nq nr paragraph-image\"><div role=\"button\" tabindex=\"0\" class=\"nt nu fj nv bh nw\"><div class=\"nq nr ns\"></div></div><figcaption class=\"ny ff nz nq nr oa ob bf b bg z du\">Latencies during Cassandra row writes</figcaption></figure><p id=\"9612\" class=\"pw-post-body-paragraph mh mi gu mj b mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd ne gn bk\">The spark-cassandra-connector that we are using here has a few configs that can be used to tune the writes, documented <a class=\"af oc\" href=\"https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md#write-tuning-parameters\" rel=\"noopener ugc nofollow\" target=\"_blank\">here</a>. We tried a bunch of tuning along the lines of reducing concurrent writes and reducing throughput_mb_per_sec. They helped a bit, but there was still a clear increase in read latency.</p><p id=\"dba6\" class=\"pw-post-body-paragraph mh mi gu mj b mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd ne gn bk\">Cassandra has sstableloader, and we thought of testing it for this case. So we changed the code to use it and saw barely any notable increase in read latency during this task (only a slight increase in the 99th percentile, caused by I/O waits).</p><figure class=\"ng nh ni nj nk nl nq nr paragraph-image\"><div role=\"button\" tabindex=\"0\" class=\"nt nu fj nv bh nw\"><div class=\"nq nr od\"></div></div><figcaption class=\"ny ff nz nq nr oa ob bf b bg z du\">Latencies during Cassandra SSTable loads</figcaption></figure><p id=\"896b\" class=\"pw-post-body-paragraph mh mi gu mj b mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd ne gn bk\">Also, if you look at the network graph, the traffic is only on “network in”, as we are now generating SSTables in Spark and then pushing those tables directly to Cassandra. 
The last spike in the network graph below is from the SSTable method, and the rest are from batched writes.</p><figure class=\"ng nh ni nj nk nl nq nr paragraph-image\"><div role=\"button\" tabindex=\"0\" class=\"nt nu fj nv bh nw\"><div class=\"nq nr oe\"></div></div><figcaption class=\"ny ff nz nq nr oa ob bf b bg z du\">Network Traffic (Row writes vs SSTable load)</figcaption></figure><p id=\"4d14\" class=\"pw-post-body-paragraph mh mi gu mj b mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd ne gn bk\">Now let’s get into how to do that in code:</p><ul class=\"\"><li id=\"f241\" class=\"mh mi gu mj b mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd ne of og oh bk\">Using CQLSSTableWriter, build the SSTables per partition.</li></ul><figure class=\"ng nh ni nj nk nl\"><div class=\"nm nn l fj\"></div></figure><ul class=\"\"><li id=\"49df\" class=\"mh mi gu mj b mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd ne of og oh bk\">We need to define the create and insert statements, but it’s easy to build them from the Spark dataframe.</li></ul><figure class=\"ng nh ni nj nk nl\"><div class=\"nm nn l fj\"></div></figure><ul class=\"\"><li id=\"f0b8\" class=\"mh mi gu mj b mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd ne of og oh bk\">Next, the script that streams the SSTables to Cassandra. We pick a random Cassandra server and stream the SSTable to it. The host is chosen at random for better load balancing of network traffic.</li></ul><figure class=\"ng nh ni nj nk nl\"><div class=\"nm nn l fj\"></div></figure><ul class=\"\"><li id=\"2a2e\" class=\"mh mi gu mj b mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd ne of og oh bk\">And finally the code that runs it all:</li></ul><figure class=\"ng nh ni nj nk nl\"><div class=\"nm nn l fj\"></div></figure><ul class=\"\"><li id=\"1fd4\" class=\"mh mi gu mj b mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd ne of og oh bk\">As for the no. 
of partitions, Cassandra’s suggestion is for SSTables to be several tens of megabytes large to minimize the cost of compacting, so we use a maximum of 256 MB per SSTable. “sizeInMB” can be calculated from HDFS.</li><li id=\"438e\" class=\"mh mi gu mj b mk oi mm mn mo oj mq mr ms ok mu mv mw ol my mz na om nc nd ne of og oh bk\">Let’s say the size is 60 GB: we will have about 240 SSTables of 256 MB each.</li><li id=\"3dcd\" class=\"mh mi gu mj b mk oi mm mn mo oj mq mr ms ok mu mv mw ol my mz na om nc nd ne of og oh bk\">Set the config “mapreduce.output.bulkoutputformat.streamthrottlembits” to throttle traffic to Cassandra.</li></ul><p id=\"f414\" class=\"pw-post-body-paragraph mh mi gu mj b mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd ne gn bk\"><strong class=\"mj gv\">Fyi,</strong></p><ul class=\"\"><li id=\"2ba0\" class=\"mh mi gu mj b mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd ne of og oh bk\">SSTables have to be at least several tens of megabytes in size to minimize the cost of compacting the partitions on the server side.</li><li id=\"fb6f\" class=\"mh mi gu mj b mk oi mm mn mo oj mq mr ms ok mu mv mw ol my mz na om nc nd ne of og oh bk\">This method increases I/O wait, since it writes directly to disk and not to memory as regular Cassandra writes do. 
Depending on the size of data and throughput, you need a SSD with high IOPS.</li></ul><p id=\"33fb\" class=\"pw-post-body-paragraph mh mi gu mj b mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd ne gn bk\">We’ve been using this method in production for over 6 months now, writing around ~ 300 million rows in &lt; 30 mins without any issue to the read latencies.</p><p id=\"1564\" class=\"pw-post-body-paragraph mh mi gu mj b mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd ne gn bk\">Full example code can be found here, <a class=\"af oc\" href=\"https://github.com/therako/sparkles/blob/master/src/main/scala/util/cassandra/SSTableExporter.scala\" rel=\"noopener ugc nofollow\" target=\"_blank\">https://github.com/therako/sparkles/blob/master/src/main/scala/util/cassandra/SSTableExporter.scala</a></p></div></div></div></div>","id":"c019c512-0802-5940-8d36-d83e1db71e43","title":"Spark and Cassandra’s SSTable loader","origin_url":"https://medium.com/@rako/spark-and-cassandras-sstable-loader-ed55f6e67d17","url":"https://medium.com/@rako/spark-and-cassandras-sstable-loader-ed55f6e67d17","wallabag_created_at":"2024-11-01T17:13:45+00:00","published_at":"2018-06-08T01:43:43+00:00","published_by":"['Arunkumar']","reading_time":2,"domain_name":"medium.com","preview_picture":"https://miro.medium.com/v2/resize:fit:1200/1*bXczQ0OE6A9iB2X1yGOONA.png","tags":["sstable","cassandra","spark"],"description":"Arunkumar·Follow3 min read·May 13, 2018--Why: We had a lot of very useful data in our Warehouse and wanted to take advantage of those data in some of our production service to enhance the user’s exper..."},{"content":"<div id=\"js-flash-container\" class=\"flash-container\" data-turbo-replace=\"\"><div class=\"flash flash-full {{ className }}\"><p>{{ message }}</p></div>\n</div><div class=\"application-main\" data-commit-hovercards-enabled=\"\" data-discussion-hovercards-enabled=\"\" data-issue-and-pr-hovercards-enabled=\"\"><main 
id=\"js-repo-pjax-container\"><div id=\"repository-container-header\" class=\"pt-3 hide-full-screen c8\" data-turbo-replace=\"\"><div class=\"d-flex flex-nowrap flex-justify-end mb-3 px-3 px-lg-5 c6\"><p> / <strong itemprop=\"name\" class=\"mr-2 flex-self-stretch\"><a data-pjax=\"#repo-content-pjax-container\" data-turbo-frame=\"repo-content-turbo-frame\" href=\"https://github.com/apache/cassandra-analytics\">cassandra-analytics</a></strong> Public</p><div id=\"repository-details-container\" class=\"flex-shrink-0 c5\" data-turbo-replace=\"\"><ul class=\"pagehead-actions flex-shrink-0 d-none d-md-inline c4\"><li><a href=\"https://github.com/login?return_to=%2Fapache%2Fcassandra-analytics\" rel=\"nofollow\" id=\"repository-details-watch-button\" data-hydro-click=\"{&quot;event_type&quot;:&quot;authentication.click&quot;,&quot;payload&quot;:{&quot;location_in_page&quot;:&quot;notification subscription menu watch&quot;,&quot;repository_id&quot;:null,&quot;auth_type&quot;:&quot;LOG_IN&quot;,&quot;originating_url&quot;:&quot;https://github.com/apache/cassandra-analytics&quot;,&quot;user_id&quot;:null}}\" data-hydro-click-hmac=\"86e9bba45846e653c3bd413878df7136763f76e4a0a264ccfac44d08958e0637\" aria-label=\"You must be signed in to change notification settings\" data-view-component=\"true\" class=\"btn-sm btn\">Notifications</a> You must be signed in to change notification settings</li>\n<li><a id=\"fork-button\" href=\"https://github.com/login?return_to=%2Fapache%2Fcassandra-analytics\" rel=\"nofollow\" data-hydro-click=\"{&quot;event_type&quot;:&quot;authentication.click&quot;,&quot;payload&quot;:{&quot;location_in_page&quot;:&quot;repo details fork button&quot;,&quot;repository_id&quot;:637949608,&quot;auth_type&quot;:&quot;LOG_IN&quot;,&quot;originating_url&quot;:&quot;https://github.com/apache/cassandra-analytics&quot;,&quot;user_id&quot;:null}}\" data-hydro-click-hmac=\"0689079dea04daa808a7ad68e829b43f11cb1acd0df3e79f5e7fadcef9de09f1\" data-view-component=\"true\" 
class=\"btn-sm btn\">Fork 11</a></li>\n<li>\n<p><a href=\"https://github.com/login?return_to=%2Fapache%2Fcassandra-analytics\" rel=\"nofollow\" data-hydro-click=\"{&quot;event_type&quot;:&quot;authentication.click&quot;,&quot;payload&quot;:{&quot;location_in_page&quot;:&quot;star button&quot;,&quot;repository_id&quot;:637949608,&quot;auth_type&quot;:&quot;LOG_IN&quot;,&quot;originating_url&quot;:&quot;https://github.com/apache/cassandra-analytics&quot;,&quot;user_id&quot;:null}}\" data-hydro-click-hmac=\"7218263f55129275fdc9b6781844abcb43733f816c5b9be2e7f311bfd520eb73\" aria-label=\"You must be signed in to star a repository\" data-view-component=\"true\" class=\"tooltipped tooltipped-sw btn-sm btn\"> Star 15</a></p>\n</li>\n</ul></div></div><div class=\"d-block d-md-none mb-2 px-3 px-md-4 px-lg-5\" id=\"responsive-meta-container\" data-turbo-replace=\"\"><p class=\"f4 mb-3\">Apache cassandra</p><p><a title=\"https://cassandra.apache.org/\" role=\"link\" target=\"_blank\" class=\"text-bold\" rel=\"noopener noreferrer\" href=\"https://cassandra.apache.org/\">cassandra.apache.org/</a></p><h3 class=\"sr-only\">License</h3><p><a href=\"https://github.com/apache/cassandra-analytics/blob/trunk/LICENSE.txt\" class=\"Link--muted\" data-analytics-event=\"{&quot;category&quot;:&quot;Repository Overview&quot;,&quot;action&quot;:&quot;click&quot;,&quot;label&quot;:&quot;location:sidebar;file:license&quot;}\"> Apache-2.0 license</a></p><p><a class=\"Link--secondary no-underline mr-3\" href=\"https://github.com/apache/cassandra-analytics/stargazers\"> 15 stars</a> <a class=\"Link--secondary no-underline mr-3\" href=\"https://github.com/apache/cassandra-analytics/forks\"> 11 forks</a> <a class=\"Link--secondary no-underline mr-3 d-inline-block\" href=\"https://github.com/apache/cassandra-analytics/branches\"> Branches</a> <a class=\"Link--secondary no-underline d-inline-block\" href=\"https://github.com/apache/cassandra-analytics/tags\"> Tags</a> <a class=\"Link--secondary 
no-underline d-inline-block\" href=\"https://github.com/apache/cassandra-analytics/activity\"> Activity</a></p><div class=\"d-flex flex-wrap gap-2\"><p><a href=\"https://github.com/login?return_to=%2Fapache%2Fcassandra-analytics\" rel=\"nofollow\" data-hydro-click=\"{&quot;event_type&quot;:&quot;authentication.click&quot;,&quot;payload&quot;:{&quot;location_in_page&quot;:&quot;star button&quot;,&quot;repository_id&quot;:637949608,&quot;auth_type&quot;:&quot;LOG_IN&quot;,&quot;originating_url&quot;:&quot;https://github.com/apache/cassandra-analytics&quot;,&quot;user_id&quot;:null}}\" data-hydro-click-hmac=\"7218263f55129275fdc9b6781844abcb43733f816c5b9be2e7f311bfd520eb73\" aria-label=\"You must be signed in to star a repository\" data-view-component=\"true\" class=\"tooltipped tooltipped-sw btn-sm btn btn-block\"> Star</a></p><p><a href=\"https://github.com/login?return_to=%2Fapache%2Fcassandra-analytics\" rel=\"nofollow\" id=\"files-overview-watch-button\" data-hydro-click=\"{&quot;event_type&quot;:&quot;authentication.click&quot;,&quot;payload&quot;:{&quot;location_in_page&quot;:&quot;notification subscription menu watch&quot;,&quot;repository_id&quot;:null,&quot;auth_type&quot;:&quot;LOG_IN&quot;,&quot;originating_url&quot;:&quot;https://github.com/apache/cassandra-analytics&quot;,&quot;user_id&quot;:null}}\" data-hydro-click-hmac=\"86e9bba45846e653c3bd413878df7136763f76e4a0a264ccfac44d08958e0637\" aria-label=\"You must be signed in to change notification settings\" data-view-component=\"true\" class=\"btn-sm btn btn-block\">Notifications</a> You must be signed in to change notification settings</p></div></div></div>\n</main></div><p> You can’t perform that action at this time.</p><details class=\"details-reset details-overlay details-overlay-dark lh-default color-fg-default hx_rsm\" open=\"open\">\n\n</details>","id":"1268426a-7484-51b6-b1bf-0376998f5285","title":"GitHub - apache/cassandra-analytics: Apache 
cassandra","origin_url":"https://github.com/apache/cassandra-analytics","url":"https://github.com/apache/cassandra-analytics","wallabag_created_at":"2024-09-04T12:36:32+00:00","published_at":null,"published_by":"['apache']","reading_time":null,"domain_name":"github.com","preview_picture":"https://opengraph.githubassets.com/74cd461eac02c9222d5f3057f5e6023dab10987c0f9b29f0f7b59a8f6ed1cf6d/apache/cassandra-analytics","tags":["analytics","cassandra","spark"],"description":"{{ message }}\n / cassandra-analytics PublicNotifications You must be signed in to change notification settings\nFork 11\n\n Star 15\n\nApache cassandracassandra.apache.org/License Apache-2.0 license 15 sta..."},{"content":"<div><div><h2 id=\"253c\" class=\"pw-subtitle-paragraph hq gs gt be b hr hs ht hu hv hw hx hy hz ia ib ic id ie if cp dt\">Author: <a class=\"af ig\" href=\"https://fr.linkedin.com/in/clunven\" rel=\"noopener ugc nofollow\" target=\"_blank\">Cédrick Lunven</a></h2><div><div class=\"speechify-ignore ab co\"><div class=\"speechify-ignore bg l\"><div class=\"ih ii ij ik il ab\"><div><div class=\"ab im\"><a rel=\"noopener follow\" href=\"https://medium.com/@datastax?source=post_page-----6f0fc0c87e42--------------------------------\"><div><div class=\"bl\" aria-hidden=\"false\"><div class=\"l in io bx ip iq\"><div class=\"l fi\"><img alt=\"DataStax\" class=\"l fc bx dc dd cw\" src=\"https://miro.medium.com/v2/resize:fill:88:88/1*armpUA9HCxIkYBeqlKqWvA.png\" width=\"44\" height=\"44\" data-testid=\"authorPhoto\" referrerpolicy=\"no-referrer\" /></div></div></div></div></a><a href=\"https://medium.com/building-the-open-data-stack?source=post_page-----6f0fc0c87e42--------------------------------\" rel=\"noopener follow\"><div class=\"it ab fi\"><div><div class=\"bl\" aria-hidden=\"false\"><div class=\"l iu iv bx ip iw\"><div class=\"l fi\"><img alt=\"Building Real-World, Real-Time AI\" class=\"l fc bx bq ix cw\" 
src=\"https://miro.medium.com/v2/resize:fill:48:48/1*-Zpv5B7sIixLS1_VVKJbpA.png\" width=\"24\" height=\"24\" data-testid=\"publicationPhoto\" referrerpolicy=\"no-referrer\" /></div></div></div></div></div></a></div></div><div class=\"bm bg l\"><div class=\"ab\"><div><div class=\"iy ab q\"><div class=\"ab q iz\"><div class=\"ab q\"><div><div class=\"bl\" aria-hidden=\"false\"><p class=\"be b ja jb bj\"><a class=\"af ag ah ai aj ak al am an ao ap aq ar jc\" data-testid=\"authorName\" rel=\"noopener follow\" href=\"https://medium.com/@datastax?source=post_page-----6f0fc0c87e42--------------------------------\">DataStax</a></p></div></div></div>·<p class=\"be b ja jb dt\"><a class=\"jf jg ah ai aj ak al am an ao ap aq ar ew jh ji\" rel=\"noopener follow\" href=\"https://medium.com/m/signin?actionUrl=https%3A%2F%2Fmedium.com%2F_%2Fsubscribe%2Fuser%2Ffc9a2aaa8a2a&amp;operation=register&amp;redirect=https%3A%2F%2Fmedium.com%2Fbuilding-the-open-data-stack%2Fbuild-an-event-driven-architecture-with-apache-kafka-apache-spark-and-apache-cassandra-6f0fc0c87e42&amp;user=DataStax&amp;userId=fc9a2aaa8a2a&amp;source=post_page-fc9a2aaa8a2a----6f0fc0c87e42---------------------post_header-----------\">Follow</a></p></div></div></div></div><div class=\"l jj\"><div class=\"ab cm jk jl jm\"><div class=\"jn jo ab\"><div class=\"be b bf z dt ab jp\">Published in<div><div class=\"l\" aria-hidden=\"false\"><a class=\"af ag ah ai aj ak al am an ao ap aq ar jc ab q\" data-testid=\"publicationName\" href=\"https://medium.com/building-the-open-data-stack?source=post_page-----6f0fc0c87e42--------------------------------\" rel=\"noopener follow\"><p class=\"be b bf z jr js jt ju jv jw jx jy bj\">Building Real-World, Real-Time AI</p></a></div></div></div><div class=\"h k\">·</div></div><div class=\"ab ae\">9 min read<div class=\"jz ka l\" aria-hidden=\"true\">·</div>May 27, 2022</div></div></div></div></div><div class=\"ab co kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq\"><div class=\"h k w ff 
fg q\"><div class=\"lg l\"><div class=\"ab q lh li\"><div class=\"pw-multi-vote-icon fi jq lj lk ll\"><a class=\"af ag ah ai aj ak al am an ao ap aq ar as at\" data-testid=\"headerClapButton\" rel=\"noopener follow\" href=\"https://medium.com/m/signin?actionUrl=https%3A%2F%2Fmedium.com%2F_%2Fvote%2Fbuilding-the-open-data-stack%2F6f0fc0c87e42&amp;operation=register&amp;redirect=https%3A%2F%2Fmedium.com%2Fbuilding-the-open-data-stack%2Fbuild-an-event-driven-architecture-with-apache-kafka-apache-spark-and-apache-cassandra-6f0fc0c87e42&amp;user=DataStax&amp;userId=fc9a2aaa8a2a&amp;source=-----6f0fc0c87e42---------------------clap_footer-----------\"><div><div class=\"bl\" aria-hidden=\"false\"><div class=\"lm ao ln lo lp lq am lr ls lt ll\"></div></div></div></a></div><div class=\"pw-multi-vote-count l lu lv lw lx ly lz ma\"><p class=\"be b du z dt\">--</p></div></div></div><div><div class=\"bl\" aria-hidden=\"false\"></div></div><div class=\"ab q kr ks kt ku kv kw kx ky kz la lb lc ld le lf\"><div class=\"h k\"><div><div class=\"bl\" aria-hidden=\"false\"></div></div><div class=\"fc mj cm\"><div class=\"l ae\"><div class=\"ab ca\"><div class=\"mk ml mm mn mo mp ch bg\"><div class=\"ab\"><div class=\"bl bg\" aria-hidden=\"false\"><div><div class=\"bl\" aria-hidden=\"false\"></div></div></div></div></div></div></div><div class=\"bl\" aria-hidden=\"false\" aria-describedby=\"postFooterSocialMenu\" aria-labelledby=\"postFooterSocialMenu\"><div><div class=\"bl\" aria-hidden=\"false\"></div></div></div></div></div></div></div></div><figure class=\"nl nm nn no np nq ni nj paragraph-image\"><div role=\"button\" tabindex=\"0\" class=\"nr ns fi nt bg nu\"><div class=\"ni nj nk\"></div></div></figure><p id=\"a1b7\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\"><em class=\"os\">Knowing how to construct event-driven architectures is a crucial skill for developers as enterprises are relying on real-time data to 
drive business growth. In this post, we show you how to build a full event-driven toolkit with highly-scalable technologies like Apache Kafka™, Apache Spark™, and Apache Cassandra®.</em></p><p id=\"eb0a\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">Event-driven architectures (EDAs) are software patterns that enable organizations to detect “events”, significant changes in a state or an update, and respond to them in real-time or near real-time. In contrast to the traditional “request/response” architecture, applications built with EDAs provide faster response times, a more seamless user experience, and better scalability without blocked thread waiting.</p><p id=\"072e\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">DataStax recently collaborated with Rahul Singh, CEO of <a class=\"af ig\" href=\"https://anant.us/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Anant Corporation</a> and creator of <a class=\"af ig\" href=\"https://cassandra.link/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Cassandra.Link</a>, a knowledge base of all things Cassandra, to produce a 3-part series on building an Event Driven Toolkit for <a class=\"af ig\" href=\"https://cassandra.apache.org/_/index.html\" rel=\"noopener ugc nofollow\" target=\"_blank\">Apache Cassandra®</a>, <a class=\"af ig\" href=\"https://spark.apache.org/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Apache Spark™</a>, and <a class=\"af ig\" href=\"https://kafka.apache.org/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Apache Kafka™.</a></p><p id=\"50a9\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">In <a class=\"af ig\" href=\"https://dtsx.io/3Lwlsbp\" rel=\"noopener ugc nofollow\" target=\"_blank\">Part 1</a>, we discussed how REST relates to event-driven systems and <a class=\"af ig\" 
href=\"https://github.com/Anant/cassandra.api\" rel=\"noopener ugc nofollow\" target=\"_blank\">created a REST API</a> for Cassandra. In <a class=\"af ig\" href=\"https://www.youtube.com/watch?v=j2B_1_yv3CM\" rel=\"noopener ugc nofollow\" target=\"_blank\">Part 2</a>, we explored different ways to event source information for a particular event with Kafka and connected it to Cassandra with <a class=\"af ig\" href=\"https://kafka.apache.org/documentation/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Kafka Connect</a> and <a class=\"af ig\" href=\"https://kafka.apache.org/documentation/streams/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Kafka Streams</a>. In this post, we connect Kafka to Cassandra with <a class=\"af ig\" href=\"https://spark.apache.org/streaming/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Spark Streaming</a> and process the data once it comes into Cassandra.</p><p id=\"3cd1\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">Kafka and Cassandra are a dynamic-duo in microservice architectures. Kafka fits naturally as a distributed queue for event-driven landscapes and acts as a buffer layer to transport messages to the database and surrounding technologies.</p><p id=\"f007\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">Cassandra scales linearly by just adding more nodes, making it an excellent persistent data storage choice for microservices applications. 
When combined with Spark Streaming, an equally scalable, high-throughput, and fault-tolerant stream-processing system, it creates a robust event-driven toolkit.</p><p id=\"ff4f\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">This content was created before the integration of <a class=\"af ig\" href=\"https://pulsar.apache.org/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Apache Pulsar™</a>, another queuing system, into <a class=\"af ig\" href=\"https://dtsx.io/3kmNz11\" rel=\"noopener ugc nofollow\" target=\"_blank\">DataStax Astra DB</a>, a Cassandra-as-a-service platform. Everything that you’ll do in this post can be implemented in the same way with Pulsar.</p><p id=\"3ec3\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">By the end of this post, you’ll have a basic understanding of Spark Streaming in the context of Cassandra and Kafka, and you’ll know how to run streaming jobs and Spark batches. Ultimately, you’ll build a full event-driven data toolkit.</p><h1 id=\"95d5\" class=\"ot ou gt be ov ow ox ht oy oz pa hw pb pc pd pe pf pg ph pi pj pk pl pm pn po bj\">About Cassandra API for Leaves platform</h1><figure class=\"nl nm nn no np nq ni nj paragraph-image\"><div role=\"button\" tabindex=\"0\" class=\"nr ns fi nt bg nu\"><div class=\"ni nj pp\"></div></div><figcaption class=\"pq fe pr ni nj ps pt be b bf z dt\">Figure 1. Leaves Platform.</figcaption></figure><p id=\"ee2c\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">In the <a class=\"af ig\" href=\"https://dtsx.io/3Lwlsbp\" rel=\"noopener ugc nofollow\" target=\"_blank\">previous workshop</a>, we created a Cassandra API GitPod for Anant Corporation’s Leaves platform, which is used to generate their Cassandra.link website. 
Leaves is a knowledge-curation platform with an admin screen, a MySQL and PHP advanced view, and a mirror of the data in Cassandra and Solr.</p><p id=\"57ad\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">Anant Corporation built the Cassandra API to make this platform scalable and serverless. The front end of this application uses JAMstack and Netlify, a web hosting and automation platform that accelerates development by hosting interfaces that can both talk to APIs and generate a full website using <a class=\"af ig\" href=\"https://www.gatsbyjs.com/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Gatsby</a>. All 1500+ pages on Cassandra.link are generated, but they run off an API.</p><p id=\"93e7\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">The plan for Anant Corporation is to stop managing the API themselves and migrate it to <a class=\"af ig\" href=\"https://dtsx.io/3kmNz11\" rel=\"noopener ugc nofollow\" target=\"_blank\">Astra DB</a>, where they can take advantage of a wide range of APIs, such as a GraphQL API or API Ingress. Anant also wants to make the platform event driven. Once data arrives as events and has been updated, they will process the data and analyze the Cassandra.link website to find correlations using machine learning.</p><p id=\"6bfb\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">There are currently two APIs: Cassandra.API and the Leaves API, which is kept at parity between its Python and Node versions. Anant Corporation was one of the first users of <a class=\"af ig\" href=\"https://dtsx.io/3kmNz11\" rel=\"noopener ugc nofollow\" target=\"_blank\">Astra DB</a> and, back then, they needed to create another API separate from the Cassandra API to run some custom scraping. 
Watch this <a class=\"af ig\" href=\"https://dtsx.io/3Kqqm8z\" rel=\"noopener ugc nofollow\" target=\"_blank\">YouTube video</a> for a detailed breakdown of the two APIs.</p><p id=\"a495\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">If you didn’t catch the previous workshop, you can create the Cassandra API from scratch with this <a class=\"af ig\" href=\"https://github.com/Anant/cassandra.api\" rel=\"noopener ugc nofollow\" target=\"_blank\">step-by-step guide</a> or initialize the ready-made GitPod on <a class=\"af ig\" href=\"https://github.com/Anant/cassandra.realtime\" rel=\"noopener ugc nofollow\" target=\"_blank\">GitHub</a>. The first part of this series focused on using Kafka as a broker, a registry, and a REST proxy. Now, we’ll cover the stream process that transfers data from Kafka into Cassandra, and a batch process that reads from Cassandra, processes it, and puts it back into Cassandra using Spark Streaming.</p><p id=\"8af6\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">Before we do, let’s understand more about how REST, microservices, and Event Driven Architectures relate to each other.</p><h1 id=\"1b9e\" class=\"ot ou gt be ov ow ox ht oy oz pa hw pb pc pd pe pf pg ph pi pj pk pl pm pn po bj\">REST vs. Microservices vs. Event Driven Architecture</h1><p id=\"8b03\" class=\"pw-post-body-paragraph nw nx gt ny b hr pu oa ob hu pv od oe of pw oh oi oj px ol om on py op oq or gm bj\">Microservices are loosely coupled services with a database for each service. This can be implemented synchronously using REST or asynchronously using AMQP protocol, Kafka, or Pulsar. 
Each microservice runs as an independent application, and these autonomous services connect through REST APIs to form larger applications.</p><p id=\"4c4b\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">Generally, each microservice requires its own database to avoid any resource sharing and coupling between the services. This isn’t necessary with Cassandra, because Cassandra can scale to hundreds of thousands of services and servers. A keyspace or a table on Cassandra works as its own kind of microservice that scales independently.</p><p id=\"3772\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">On <a class=\"af ig\" href=\"https://dtsx.io/3kmNz11\" rel=\"noopener ugc nofollow\" target=\"_blank\">Astra DB</a> especially, you won’t have to worry about using a different database, keyspace, or table per microservice because it can scale infinitely. When you want to add new functionality, simply create another table instead of a whole database and have one cluster that contains several microservices powered by different keyspaces.</p><p id=\"4a0b\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">The <a class=\"af ig\" href=\"https://dtsx.io/3EXxqZ8\" rel=\"noopener ugc nofollow\" target=\"_blank\">DataStax Enterprise</a> ecosystem also offers additional features, such as bringing data into Cassandra and retrieving it as JSON, or adding data using <a class=\"af ig\" href=\"https://dtsx.io/3MVUfiX\" rel=\"noopener ugc nofollow\" target=\"_blank\">DSE Graph 6.8</a> and retrieving data using Cassandra Query Language (CQL).</p><figure class=\"nl nm nn no np nq ni nj paragraph-image\"><div class=\"ni nj pz\"></div><figcaption class=\"pq fe pr ni nj ps pt be b bf z dt\">Figure 2. 
How each keyspace on Cassandra works as a microservice.</figcaption></figure><p id=\"8d72\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">In short, Cassandra is awesome for microservices because you can split your microservices by data center, keyspace, table, or query. When you execute a query against Cassandra, the load will be distributed among the nodes for you so there’s no coupling among different microservices. DataStax software engineer <a class=\"af ig\" href=\"https://www.linkedin.com/in/jeffreyscarpenter/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Jeff Carpenter</a> explains this in more detail <a class=\"af ig\" href=\"https://dtsx.io/3LvNdAO\" rel=\"noopener ugc nofollow\" target=\"_blank\">in this book</a>.</p><p id=\"6ad5\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">Event sourcing and Command and Query Responsibility Segregation (CQRS) are software patterns that people implement with event-driven applications and microservices. CQRS is a way to scale systems: when an event or an update arrives, the processor saves that data to two places–one for the event itself, and another where the data will be queried.</p><p id=\"74aa\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">For example, if you were using DataStax without the built-in search provided by DSE Search, and you needed to materialize that data in both Cassandra and Elasticsearch, your event processor would take the event and save it in both places.</p><p id=\"6591\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">With CQRS, updating and retrieving data are seen as two different types of requests. CQRS uses commands to update data, and queries to read data. 
Figure 3 illustrates the architecture of an event-driven CQRS application. The commands write updates to Kafka in the corresponding topics while Kafka Streams creates projections via aggregations or joins of the data in the topics on the query side. Using event sourcing, we can send one event, process it, and save that data into several different places.</p><figure class=\"nl nm nn no np nq ni nj paragraph-image\"><div role=\"button\" tabindex=\"0\" class=\"nr ns fi nt bg nu\"><div class=\"ni nj qa\"></div></div><figcaption class=\"pq fe pr ni nj ps pt be b bf z dt\">Figure 3. Architecture of a CQRS application. Source: <a class=\"af ig\" href=\"https://developer.ibm.com/articles/an-introduction-to-command-query-responsibility-segregation/\" rel=\"noopener ugc nofollow\" target=\"_blank\">IBM</a>.</figcaption></figure><p id=\"a13b\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">Before we get into the hands-on exercises, let’s learn more about the technologies that you’ll be working with.</p><h1 id=\"44d1\" class=\"ot ou gt be ov ow ox ht oy oz pa hw pb pc pd pe pf pg ph pi pj pk pl pm pn po bj\">What is Apache Spark?</h1><figure class=\"nl nm nn no np nq ni nj paragraph-image\"><div class=\"ni nj qb\"></div><figcaption class=\"pq fe pr ni nj ps pt be b bf z dt\">Figure 4. Apache Spark ecosystem.</figcaption></figure><p id=\"c421\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">Apache Spark is a unified analytics engine built on top of Apache Spark Core. It includes a collection of technologies, such as:</p><ul class=\"\"><li id=\"1fac\" class=\"nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or qc qd qe bj\"><strong class=\"ny gu\">Spark SQL:</strong> a Hive-compatible language for structured data processing on Spark. 
Coupled with Spark Streaming, Spark SQL lets you perform joins and create batches from different events and queues.</li><li id=\"7bb9\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or qc qd qe bj\"><strong class=\"ny gu\">Spark Streaming:</strong> scalable, high-throughput, fault-tolerant processing of live data streams. Spark Streaming transforms real-time data from various sources like Kafka, Flume, and Amazon Kinesis, using complex algorithms and delivers the processed data to file systems, databases, and live dashboards. <br />There are two kinds of streaming–basic Spark Streaming and Structured Streaming. Structured Streaming gives you a schema so you can run Spark SQL transformations on the data coming in from the event before you send it through. You can also express your live data stream as a static table.</li><li id=\"f7d6\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or qc qd qe bj\"><strong class=\"ny gu\">Machine Learning Library (MLlib):</strong> a built-in library to make practical machine learning scalable and easy. There are MLlib extensions that you can run on top of Spark.</li><li id=\"5126\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or qc qd qe bj\"><strong class=\"ny gu\">GraphX:</strong> an API for graphs and graph-parallel computation on top of Spark.</li></ul><p id=\"2a4c\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">As a unified analytics engine, Spark can talk to any datastore that you’re looking to connect your data with. 
In other words, you can import data to Spark, export data to different systems, and present this data in a similar dataset format.</p><p id=\"b7b5\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">When properly configured, a single Spark job can run on one computer or on hundreds of computers. Due to this scalability, Spark fits in nicely as Cassandra and Kafka’s best friend in scalable data processing. On <a class=\"af ig\" href=\"https://dtsx.io/3EXxqZ8\" rel=\"noopener ugc nofollow\" target=\"_blank\">DataStax Enterprise</a>, you can have both Kafka and Spark on the same node and scale both of them at the same time.</p><p id=\"9179\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">Figure 5 illustrates a Spark cluster architecture, which consists of a driver program that holds the SparkContext, a cluster manager that allocates resources, and worker nodes that each run an executor with its own cache and tasks. This <a class=\"af ig\" href=\"https://dtsx.io/3LJwSJf\" rel=\"noopener ugc nofollow\" target=\"_blank\">YouTube video</a> explains the architecture in detail.</p><figure class=\"nl nm nn no np nq ni nj paragraph-image\"><div class=\"ni nj qk\"></div><figcaption class=\"pq fe pr ni nj ps pt be b bf z dt\">Figure 5. Spark cluster architecture.</figcaption></figure><p id=\"e227\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">Spark is well known for large-scale processing, machine learning, and analytics. 
Internet powerhouses like Uber, Netflix, eBay, and Conviva use it for the following:</p><ul class=\"\"><li id=\"e2c8\" class=\"nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or qc qd qe bj\">Streaming Extract, Transform, Load (ETL)</li><li id=\"277e\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or qc qd qe bj\">Data enrichment</li><li id=\"ef7e\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or qc qd qe bj\">Trigger event detection</li><li id=\"fbb2\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or qc qd qe bj\">Complex session analysis</li><li id=\"4be6\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or qc qd qe bj\">Machine learning</li><li id=\"963e\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or qc qd qe bj\">Fog computing</li></ul><h1 id=\"8f4b\" class=\"ot ou gt be ov ow ox ht oy oz pa hw pb pc pd pe pf pg ph pi pj pk pl pm pn po bj\">What is Astra DB?</h1><p id=\"1d91\" class=\"pw-post-body-paragraph nw nx gt ny b hr pu oa ob hu pv od oe of pw oh oi oj px ol om on py op oq or gm bj\"><a class=\"af ig\" href=\"https://dtsx.io/3kmNz11\" rel=\"noopener ugc nofollow\" target=\"_blank\">Astra DB</a> is a data platform as a service in the cloud built on the infinitely scalable Apache Cassandra with a selection of tools, such as APIs, to help you build applications on top of Cassandra. Astra eliminates operations and reduces deployment time from months to minutes as everything, from provisioning to backups, is fully automated. 
What’s more, you can instantly create a Cassandra database on Astra DB that stays free up to 5 GB–no credit card required.</p><p id=\"e6ae\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\"><a class=\"af ig\" href=\"https://dtsx.io/3kmNz11\" rel=\"noopener ugc nofollow\" target=\"_blank\">Astra DB</a> secures your data with the most advanced security available for Cassandra. Through auto-configured developer tools that you can deploy with a few clicks, Astra DB greatly simplifies app development.</p><p id=\"083a\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">Astra can run on any cloud and spin up instances in any region you like. On top of the database, there are tools like the REST and GraphQL APIs, CQL Console, DataStax Studio, and Data Loader. If you want to work with Kubernetes, our K8ssandra initiative on <a class=\"af ig\" href=\"https://dtsx.io/3kmNz11\" rel=\"noopener ugc nofollow\" target=\"_blank\">Astra DB</a> is all you need to connect Cassandra with Kubernetes. With the service broker, you can tell K8ssandra to spin up your Cassandra instance directly in Astra DB.</p><h1 id=\"8dc6\" class=\"ot ou gt be ov ow ox ht oy oz pa hw pb pc pd pe pf pg ph pi pj pk pl pm pn po bj\">Hands-on exercise overview</h1><p id=\"8793\" class=\"pw-post-body-paragraph nw nx gt ny b hr pu oa ob hu pv od oe of pw oh oi oj px ol om on py op oq or gm bj\">Now that you’re familiar with the technologies, let’s get started on the hands-on workshop. You’ll first read data from Kafka in a structured stream, select information, and materialize a new dataset in Cassandra. 
Then, you’ll run a batch job to take all the data from Cassandra, crunch it, and save it back into another table.</p><p id=\"1097\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">Follow along with this <a class=\"af ig\" href=\"https://dtsx.io/37Qra9z\" rel=\"noopener ugc nofollow\" target=\"_blank\">YouTube tutorial</a>, and get the code from this <a class=\"af ig\" href=\"https://github.com/Anant/cassandra.realtime#getting-started-with-cassandra-spark-and-kafka\" rel=\"noopener ugc nofollow\" target=\"_blank\">GitHub repository</a>. You won’t need to download or run any code on your computer; everything you need to run a Spark job is serverless. Click on the links below to get started!</p><ol class=\"\"><li id=\"be47\" class=\"nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or ql qd qe bj\"><a class=\"af ig\" href=\"https://dtsx.io/3F6ZpG4\" rel=\"noopener ugc nofollow\" target=\"_blank\">Create a Cassandra database on Astra DB</a></li><li id=\"0d2a\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or ql qd qe bj\"><a class=\"af ig\" href=\"https://github.com/Anant/cassandra.realtime#1-reminders-on-episode-1-setup-cassandra-api\" rel=\"noopener ugc nofollow\" target=\"_blank\">Open Cassandra.API in GitPod</a></li><li id=\"73b6\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or ql qd qe bj\"><a class=\"af ig\" href=\"https://github.com/Anant/cassandra.realtime#2-start-and-setup-apache-kafka\" rel=\"noopener ugc nofollow\" target=\"_blank\">Start and set up Apache Kafka</a></li><li id=\"c07c\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or ql qd qe bj\"><a class=\"af ig\" href=\"https://github.com/Anant/cassandra.realtime#3-consume-from-kafka-write-to-cassandra\" rel=\"noopener ugc nofollow\" target=\"_blank\">Consume data from Kafka and write to Cassandra</a></li><li 
id=\"6c37\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or ql qd qe bj\"><a class=\"af ig\" href=\"https://github.com/Anant/cassandra.realtime#4-run-apache-spark-jobs-against-datastax-astra\" rel=\"noopener ugc nofollow\" target=\"_blank\">Run Apache Spark jobs against Astra DB</a></li></ol><h1 id=\"0d10\" class=\"ot ou gt be ov ow ox ht oy oz pa hw pb pc pd pe pf pg ph pi pj pk pl pm pn po bj\">Conclusion</h1><p id=\"a476\" class=\"pw-post-body-paragraph nw nx gt ny b hr pu oa ob hu pv od oe of pw oh oi oj px ol om on py op oq or gm bj\">In this post we delved into the basics of the scalable and fault-tolerant event-driven architecture–Kafka, Cassandra, and Spark–and how you can process heavy real-time data and analyze them in databases and live dashboards.</p><p id=\"feb1\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\">Once you’ve mastered the basics, you can try posting your data on different analytics platforms. For more Cassandra and Kafka workshops, check out <a class=\"af ig\" href=\"https://dtsx.io/3F0mRor\" rel=\"noopener ugc nofollow\" target=\"_blank\">DataStax Developers on YouTube</a>. If you have any questions about Cassandra, post them on our <a class=\"af ig\" href=\"https://dtsx.io/3vo6gaJ\" rel=\"noopener ugc nofollow\" target=\"_blank\">DataStax Community</a> — the Cassandra stack overflow.</p><p id=\"df3f\" class=\"pw-post-body-paragraph nw nx gt ny b hr nz oa ob hu oc od oe of og oh oi oj ok ol om on oo op oq or gm bj\"><em class=\"os\">Follow the </em><a class=\"af ig\" href=\"https://dtsx.io/3OLe5Pp\" rel=\"noopener ugc nofollow\" target=\"_blank\"><em class=\"os\">DataStax Tech Blog</em></a><em class=\"os\"> on Medium for more developer stories. 
Follow </em><a class=\"af ig\" href=\"https://dtsx.io/3OLpfn9\" rel=\"noopener ugc nofollow\" target=\"_blank\"><em class=\"os\">DataStax Developers on Twitter</em></a><em class=\"os\"> for the latest news about our developer community.</em></p><h1 id=\"098d\" class=\"ot ou gt be ov ow ox ht oy oz pa hw pb pc pd pe pf pg ph pi pj pk pl pm pn po bj\">Resources</h1><ol class=\"\"><li id=\"dd71\" class=\"nw nx gt ny b hr pu oa ob hu pv od oe of pw oh oi oj px ol om on py op oq or ql qd qe bj\"><a class=\"af ig\" href=\"https://dtsx.io/3kmNz11\" rel=\"noopener ugc nofollow\" target=\"_blank\">Astra DB: the multi-cloud database-as-a-service</a></li><li id=\"011b\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or ql qd qe bj\"><a class=\"af ig\" href=\"https://dtsx.io/3F6ZpG4\" rel=\"noopener ugc nofollow\" target=\"_blank\">Create your Cassandra database on Astra DB</a></li><li id=\"c7df\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or ql qd qe bj\"><a class=\"af ig\" href=\"https://cassandra.apache.org/_/index.html\" rel=\"noopener ugc nofollow\" target=\"_blank\">Apache Cassandra®</a></li><li id=\"0b5a\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or ql qd qe bj\"><a class=\"af ig\" href=\"https://spark.apache.org/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Apache Spark™</a></li><li id=\"81dc\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or ql qd qe bj\"><a class=\"af ig\" href=\"https://kafka.apache.org/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Apache Kafka™.</a></li><li id=\"2c11\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or ql qd qe bj\"><a class=\"af ig\" href=\"https://pulsar.apache.org/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Apache Pulsar™</a></li><li id=\"34a2\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or 
ql qd qe bj\"><a class=\"af ig\" href=\"https://anant.us/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Anant Corporation</a></li><li id=\"3cbb\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or ql qd qe bj\"><a class=\"af ig\" href=\"https://kafka.apache.org/documentation/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Kafka Connect</a></li><li id=\"af43\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or ql qd qe bj\"><a class=\"af ig\" href=\"https://kafka.apache.org/documentation/streams/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Kafka Streams</a></li><li id=\"ad72\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or ql qd qe bj\"><a class=\"af ig\" href=\"https://spark.apache.org/streaming/\" rel=\"noopener ugc nofollow\" target=\"_blank\">Spark Streaming</a></li><li id=\"aa3b\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or ql qd qe bj\"><a class=\"af ig\" href=\"https://dtsx.io/3EXxqZ8\" rel=\"noopener ugc nofollow\" target=\"_blank\">DataStax Enterprise</a></li><li id=\"4a7b\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or ql qd qe bj\"><a class=\"af ig\" href=\"https://dtsx.io/3MVUfiX\" rel=\"noopener ugc nofollow\" target=\"_blank\">DataStax Graph Documentation</a></li><li id=\"fe7d\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or ql qd qe bj\"><a class=\"af ig\" href=\"https://dtsx.io/3Lwlsbp\" rel=\"noopener ugc nofollow\" target=\"_blank\">YouTube Workshop Part 1: Build a REST API with Apache Cassandra</a></li><li id=\"e4de\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or ql qd qe bj\"><a class=\"af ig\" href=\"https://www.youtube.com/watch?v=j2B_1_yv3CM\" rel=\"noopener ugc nofollow\" target=\"_blank\">YouTube Workshop Part 2: Cassandra.API CRUD UI</a></li><li id=\"1db5\" class=\"nw 
nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or ql qd qe bj\"><a class=\"af ig\" href=\"https://dtsx.io/37Qra9z\" rel=\"noopener ugc nofollow\" target=\"_blank\">YouTube Workshop Part 3: Running a Spark Job on Apache Cassandra</a></li><li id=\"2a08\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or ql qd qe bj\"><a class=\"af ig\" href=\"https://github.com/Anant/cassandra.api\" rel=\"noopener ugc nofollow\" target=\"_blank\">GitHub: Cassandra API</a></li><li id=\"39b0\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or ql qd qe bj\"><a class=\"af ig\" href=\"https://github.com/Anant/cassandra.realtime\" rel=\"noopener ugc nofollow\" target=\"_blank\">GitHub: Cassandra in Real-Time</a></li><li id=\"7833\" class=\"nw nx gt ny b hr qf oa ob hu qg od oe of qh oh oi oj qi ol om on qj op oq or ql qd qe bj\"><a class=\"af ig\" href=\"https://dtsx.io/3LvNdAO\" rel=\"noopener ugc nofollow\" target=\"_blank\">Definitive Guide for Apache Cassandra 4.0</a></li></ol></div></div></div></div></div>","id":"522aa199-ac8c-5bc5-80d2-7f365ffa6f9a","title":"Build an Event-Driven Architecture with Apache Kafka, Apache Spark, and Apache Cassandra","origin_url":"https://medium.com/building-the-open-data-stack/build-an-event-driven-architecture-with-apache-kafka-apache-spark-and-apache-cassandra-6f0fc0c87e42","url":"https://medium.com/building-the-open-data-stack/build-an-event-driven-architecture-with-apache-kafka-apache-spark-and-apache-cassandra-6f0fc0c87e42","wallabag_created_at":"2024-08-03T13:51:08+00:00","published_at":"2022-05-27T12:03:40+00:00","published_by":"['DataStax']","reading_time":10,"domain_name":"medium.com","preview_picture":"https://miro.medium.com/v2/da:true/resize:fit:1200/0*auI9qoRPiVH04PBH","tags":["cassandra","event.driven","spark","kafka"],"description":"Author: Cédrick LunvenDataStax·FollowPublished inBuilding Real-World, Real-Time AI·9 min read·May 27, 2022--Knowing 
how to construct event-driven architectures is a crucial skill for developers as ent..."},{"content":"<div class=\"application-main\" data-commit-hovercards-enabled=\"\" data-discussion-hovercards-enabled=\"\" data-issue-and-pr-hovercards-enabled=\"\"><main id=\"js-repo-pjax-container\"><div id=\"repository-container-header\" class=\"pt-3 hide-full-screen c6\" data-turbo-replace=\"\"><div class=\"d-flex flex-wrap flex-justify-end mb-3 px-3 px-md-4 px-lg-5 c4\"><p><strong itemprop=\"name\" class=\"mr-2 flex-self-stretch\"><a data-pjax=\"#repo-content-pjax-container\" data-turbo-frame=\"repo-content-turbo-frame\" href=\"https://github.com/andreia-negreira/Data_streaming_project\">Data_streaming_project</a></strong></p></div><div class=\"d-block d-md-none mb-2 px-3 px-md-4 px-lg-5\" id=\"responsive-meta-container\" data-turbo-replace=\"\"><p class=\"f4 mb-3\">Data streaming project with robust end-to-end pipeline, combining tools such as Airflow, Kafka, Spark, Cassandra and containerized solution to easy deployment.</p></div></div>\n</main></div>","id":"1929ac10-cd1e-50bf-98ed-7526acc11fe4","title":"GitHub - andreia-negreira/Data_streaming_project: Data streaming project with robust end-to-end pipeline, combining tools such as Airflow, Kafka, Spark, Cassandra and containerized solution to 
easy deployment.","origin_url":"https://github.com/andreia-negreira/Data_streaming_project","url":"https://github.com/andreia-negreira/Data_streaming_project","wallabag_created_at":"2023-12-02T16:46:47+00:00","published_at":null,"published_by":"['andreia-negreira']","reading_time":null,"domain_name":"github.com","preview_picture":"https://opengraph.githubassets.com/ffade618f679782ffd42041fbd835f696d595d847bc87cdcc11ce3674094c81f/andreia-negreira/Data_streaming_project","tags":["python","cassandra","spark","airflow","kafka","postgres","docker"],"description":"{{ message }}\n / Data_streaming_project PublicNotifications\nFork 0\n\n Star 0 \n\nData streaming project with robust end-to-end pipeline, combining tools such as Airflow, Kafka, Spark, Cassandra and conta..."}]}]}},"staticQueryHashes":[]}