Approach to resolve Cassandra Coordinator node timeouts on writes - python

I have a simple one-node Cassandra cluster with a basic keyspace configuration that has replication_factor=1.
In this keyspace, we have about 230 tables. Each table has roughly 40 columns. The writes we do to these tables come at roughly the rate of 30k writes in five minutes, just once a day. I have about 6 Python worker scripts that make these writes, each to one table at a time, and they all keep writing until all 230 tables have been written to for the day. The scripts use the Python cassandra-driver with a simple session to make these writes. As for the data being written, a lot of the values are nulls.
Effectively, if I am right, this can be thought of as 6 concurrent connections making 30k+ writes within five minutes, once per day.
I understand how Cassandra writes and deletes work and am familiar with coordinator nodes, etc. I am observing a traceback that occurs intermittently, as shown below:
"cassandra/cluster.py", line 2030, in cassandra.cluster.Session.execute (cassandra/cluster.c:38536)
app_nstablebuilder.1.69j772led82k#swarm-worker-gg37 | File "cassandra/cluster.py", line 3844, in cassandra.cluster.ResponseFuture.result (cassandra/cluster.c:80834)
app_nstablebuilder.1.69j772led82k#swarm-worker-gg37 | cassandra.WriteTimeout: Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'consistency': 'ONE', 'required_responses': 1, 'received_responses': 0}
My question has to do with how to approach solving this problem. I am unable to verify whether the problem originates in my workers' scripts or in the Cassandra cluster itself. Should I be slowing down my workers' writes? Should I run some sort of diagnostic to improve Cassandra's performance?
All the solutions I have read so far deal with multi-node clusters, and I couldn't find one for a single-node cluster.
I feel like our cluster is unhealthy and that my efforts should be targeted at fixing that. If so, I'm unsure of where to begin. Could anyone point me in the right direction?
If there's any further information I could provide to help, do let me know.

Inserting nulls will create tombstones. Excluding the null columns from the query will not create tombstones. You can read a little bit on that matter here. I'm not sure that inserting nulls is what causes this particular timeout, but avoiding inserting nulls (which would otherwise create tombstones) is definitely an improvement to take into account.
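Purely as an illustration (the keyspace, table, and column names below are made up), one way the worker scripts could skip null columns when building an INSERT with the Python cassandra-driver:

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")  # placeholder keyspace name

def insert_skipping_nulls(table, row):
    # Keep only the columns that actually have a value; the omitted
    # columns are simply not written, so no tombstones are created.
    non_null = {col: val for col, val in row.items() if val is not None}
    columns = ", ".join(non_null)
    placeholders = ", ".join(["%s"] * len(non_null))
    query = "INSERT INTO {} ({}) VALUES ({})".format(table, columns, placeholders)
    session.execute(query, list(non_null.values()))

# Example: `note` is None, so that column is simply left out of the INSERT.
insert_skipping_nulls("sensor_data", {"id": 1, "reading": 42.0, "note": None})

The trade-off of building statements dynamically is losing prepared-statement reuse; preparing one statement per distinct column combination is a common compromise.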

Related

Why do writes through Cassandra Python Driver add records with a delay?

I'm writing 2.5 million records into Cassandra using a Python program. The program finishes quickly, but when I query the data, the records only show up after a long time. The number of records increases gradually, and it seems like the database is performing the writes to the tables in a queue-like fashion. The writes continue until all the records are present. Why are the writes reflected late?
It is customary to provide a minimal code example plus steps to replicate the issue, but you haven't provided much information.
My guess is that you've issued a lot of asynchronous writes which means that those queries get queued up because that's how asynchronous programming works. Until they eventually reach the cluster and get processed, you won't be able to immediately see the results.
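For illustration, a minimal sketch (with made-up table and column names) of queuing the asynchronous writes and then blocking on the returned futures before checking the data:

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")  # placeholder keyspace
insert = session.prepare("INSERT INTO events (id, payload) VALUES (?, ?)")

records = [(1, "a"), (2, "b")]  # stand-in for the 2.5 million rows

# Queue up the writes without waiting on each one individually.
futures = [session.execute_async(insert, rec) for rec in records]

# Block until every queued write has been acknowledged by the cluster;
# only after this does it make sense to query and verify the data.
for future in futures:
    future.result()

In practice, holding millions of futures in memory is its own problem, so you would normally bound the number of in-flight requests rather than collect them all.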
In addition, you haven't provided information on how you're verifying the data, so I'm going to make another guess and say you're doing a SELECT COUNT(*), which requires a full table scan in Cassandra. Given that you've issued millions of writes, chances are the nodes are overloaded and take a while to respond.
For what it's worth, if you are doing a COUNT() you might be interested in this post where I've explained why it's bad to do it in Cassandra -- https://community.datastax.com/questions/6897/. Cheers!

Why does PostgreSQL say FATAL: sorry, too many clients already when I am nowhere close to the maximum connections?

I am working with an installation of PostgreSQL 11.2 that periodically complains in its system logs
FATAL: sorry, too many clients already
despite being nowhere close to its configured limit of connections. This query:
SELECT current_setting('max_connections') AS max,
COUNT(*) AS total
FROM pg_stat_activity
tells me that the database is configured for a maximum of 100 connections. I have never seen more than about 45 connections to the database with this query, not even moments before a running program receives a database error saying there are too many clients, backed by the above message in the Postgres logs.
Absolutely everything I can find about this issue on the Internet suggests that the error means you have exceeded the max_connections setting, but the database itself tells me that I have not.
For what it's worth, pyspark is the only database client that triggers this error, and only when it's writing into tables from dataframes. The regular Python code using psycopg2 (which is the main client) never triggers it (not even when writing into tables in the same manner from Pandas dataframes), and admin tools like pgAdmin also never trigger it. If I didn't see the error in the database logs directly, I would think that Spark is lying to me about the error. Most of the time, if I use a query like this:
SELECT pg_terminate_backend(pid) FROM pg_stat_activity
WHERE pid <> pg_backend_pid() AND application_name LIKE 'pgAdmin%';
then the problem goes away for several days. But like I said, I've never seen even 50% of the supposed max of 100 connections in use, according to the database itself. How do I figure out what is causing this error?
This is caused by how Spark reads/writes data using JDBC. Spark tries to open several concurrent connections to the database in order to read/write multiple partitions of data in parallel.
I couldn't find it in the docs, but I think that by default the number of connections is equal to the number of partitions in the dataframe you want to write into the database table. This explains the intermittency you've noticed.
However, you can control this number by setting the numPartitions option:
The maximum number of partitions that can be used for parallelism in table reading and writing. This also determines the maximum number of concurrent JDBC connections. If the number of partitions to write exceeds this limit, we decrease it to this limit by calling coalesce(numPartitions) before writing.
Example:
spark.read.format("jdbc") \
    .option("numPartitions", "20") \
    # ...
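Since the failure shows up when writing dataframes, the write-side equivalent looks roughly like this (a sketch only; df is assumed to be the Spark DataFrame being saved, and the URL, table, and credentials are placeholders):

# Sketch only: URL, table, and credentials are placeholders.
(df.write
    .format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/mydb")
    .option("dbtable", "public.target_table")
    .option("numPartitions", "8")   # cap concurrent JDBC connections at 8
    .option("user", "db_user")
    .option("password", "db_password")
    .mode("append")
    .save())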
Three possibilities:
The connections are very short-lived, and they were already gone by the time you looked.
You have a lower connection limit on that database.
You have a lower connection limit on the database user.
But options 2 and 3 would result in a different error message, so it must be the short-lived connections.
Whatever it is, the answer to your problem would be a well-configured connection pool.
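On the plain-Python side (Spark's JDBC writer manages its own connections), a pool might look roughly like this with psycopg2; the connection details below are placeholders:

from psycopg2.pool import ThreadedConnectionPool

# Keep the total well under max_connections, leaving headroom for Spark,
# pgAdmin, and anything else that connects. Connection details are placeholders.
pool = ThreadedConnectionPool(
    minconn=1,
    maxconn=10,
    host="db-host",
    port=5432,
    dbname="mydb",
    user="db_user",
    password="db_password",
)

conn = pool.getconn()
try:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
finally:
    pool.putconn(conn)  # return the connection instead of closing it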

How to store BIG DATA as global variables in Dash Python?

I have a problem with my Dash application deployed on a remote office server. Two users running the app experience interference with each other due to a table import followed by table pricing (the pricing code is around 10,000 lines and produces 8 tables). While looking on the internet, I saw that the usual way to solve this problem is to store the data in a hidden html.Div after converting the dataframes to JSON. However, this solution is not workable because I have to store 9 tables totaling 200,000 rows and 500 columns. So I looked into the cache solution. That option does not produce errors, but it increases the execution time of the program considerably. Going from a table of 20,000 vehicles to 200,000 increases the compute time by a factor of almost 1,000, and it is painful every time I change the graph settings.
I use the filesystem cache, following example 4 of this: https://dash.plotly.com/sharing-data-between-callbacks. By timing the steps, I noticed that accessing the cache is not the problem (about 1 second); converting the JSON tables back to dataframes is (almost 60 seconds per callback). 60 seconds is also roughly what the pricing itself takes, so calling the cache in a callback costs about the same as pricing in a callback.
1/ Is there a way to cache a dataframe directly (not JSON), whether through the cache, a technique like the hidden html.Div, a cookie system, or any other method?
2/ With Redis or Memcached, do the callbacks have to return JSON?
3/ If so, how do we set it up, following example 4 from the previous link? I get the error "redis.exceptions.ConnectionError: Error 10061 connecting to localhost:6379. No connection could be made because the target machine actively refused it."
4/ Do you also know whether shutting the application down automatically deletes the cache, without waiting for the default_timeout?
I think your issue can be solved using dash_extensions, specifically its server-side callback cache; it might be worth a shot to implement.
https://community.plotly.com/t/show-and-tell-server-side-caching/42854
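Purely as a sketch of the pattern described in that post (the dash_extensions API has changed between versions, and the component ids and pricing step below are made up):

import pandas as pd
from dash import dcc, html
from dash_extensions.enrich import (DashProxy, Input, Output,
                                    ServersideOutput, ServersideOutputTransform)

app = DashProxy(transforms=[ServersideOutputTransform()])

app.layout = html.Div([
    dcc.Dropdown(id="params", options=[{"label": x, "value": x} for x in ["A", "B"]], value="A"),
    dcc.Loading(dcc.Store(id="priced-tables")),
    dcc.Graph(id="graph"),
])

@app.callback(ServersideOutput("priced-tables", "data"), Input("params", "value"))
def price_tables(value):
    # The heavy pricing step runs here; the resulting DataFrame is kept
    # server-side (filesystem or Redis backend), so it is never serialized
    # to JSON for the browser.
    return pd.DataFrame({"x": range(10), "y": range(10)})  # placeholder for the real pricing

@app.callback(Output("graph", "figure"), Input("priced-tables", "data"))
def update_graph(df):
    # df arrives as a real DataFrame, not JSON, because it never left the server.
    return {"data": [{"x": df["x"], "y": df["y"], "type": "scatter"}]}

if __name__ == "__main__":
    app.run_server(debug=True)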

Redshift + SQLAlchemy long query hangs

I'm doing something along the lines of:
conn_string = "postgresql+pg8000://%s:%s@%s:%d/%s" % (db_user, db_pass, host, port, schema)
conn = sqlalchemy.engine.create_engine(conn_string, execution_options={'autocommit': True}, encoding='utf-8', isolation_level="AUTOCOMMIT")
rows = conn.execute(sql_query)
To run queries on a Redshift cluster. Lately, I've been doing maintenance tasks such as running vacuum reindex on large tables that get truncated and reloaded every day.
The problem is that the command above takes around 7 minutes for a particular table (the table is huge: 60 million rows across 15 columns), and when I run it using the method above it just never finishes and hangs. I can see in the AWS cluster dashboard that parts of the vacuum command run for about 5 minutes and then it just stops. No Python errors, no errors on the cluster, nothing.
My guess is that the connection is lost during the command. So, how do I prove my theory? Has anybody else had this issue? What do I change in the connection string to keep the connection alive longer?
EDIT:
I changed my connection to this after the comments here:
conn = sqlalchemy.engine.create_engine(
    conn_string,
    execution_options={'autocommit': True},
    encoding='utf-8',
    connect_args={"keepalives": 1, "keepalives_idle": 60,
                  "keepalives_interval": 60},
    isolation_level="AUTOCOMMIT")
And it worked for a while. However, the same behaviour has started again with even larger tables, where the vacuum reindex actually takes around 45 minutes (at least that is my estimate; the command never finishes running from Python).
How can I make this work regardless of the query runtime?
It's most likely not a connection-drop issue. To confirm this, try pushing a few million rows into a dummy table (something that takes more than 5 minutes) and see if the statement fails. Once a query has been submitted to Redshift, it keeps executing in the background regardless of whether your connection shuts down.
Now, coming to the problem itself: my guess is that you are running out of memory or disk space. Can you elaborate and list your Redshift setup (how many dc1/ds2 nodes)? Also, try running some admin queries and see how much space you have left on disk. When the cluster is loaded to the brim, a disk-full error is thrown, but in your case the connection might be dropped well before that error reaches your Python shell.
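As an illustration, a commonly used admin query for overall disk usage, run here through the SQLAlchemy 1.x-style engine (conn) from the question above; treat it as a sketch:

# Rough sketch: check how full the cluster disks are before/while the vacuum runs.
disk_usage_sql = """
    SELECT SUM(used)::float / SUM(capacity) * 100 AS pct_disk_used
    FROM stv_partitions
"""
pct_used = conn.execute(disk_usage_sql).scalar()
print("Disk used: %.1f%%" % pct_used)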

Cassandra asynchronous execution in multiple processes blocking synchronous requests

I have an application that reads a series of XML files containing logs of vehicle passages on a road. The application then processes each record, transforms a few of the fields to match the database columns, and inserts it into a Cassandra database (a single node running on a remote server; it's on an internal network, so connectivity isn't really an issue). After inserting the data, the process for each file goes on to read that data back and produce information for summary tables, which leaves the information ready for a drill-down analysis done in an unrelated part of the application.
I'm using multiprocessing to process many XML files in parallel, and the trouble I'm having is with communicating with the Cassandra server. Schematically, the process for each file goes as follows:
1. Read a record from the XML file
2. Process the record's data
3. Insert the processed data into the database (using .execute_async(query))
4. Repeat steps 1 to 3 until the XML file is exhausted
5. Wait for the responses of all the insert queries I made
6. Read data from the database
7. Process the read data
8. Insert the processed data into summary tables
Now, this runs smoothly in multiple parallel processes until one process reaches step 6: its request (made using .execute(query), meaning I wait for the response) always hits a timeout. The error I receive is:
Process ProcessoImportacaoPNCT-1:
Traceback (most recent call last):
File "C:\Users\Lucas\Miniconda\lib\multiprocessing\process.py", line 258, in _bootstrap
self.run()
File "C:\Users\Lucas\PycharmProjects\novo_importador\app\core\ImportacaoArquivosPNCT.py", line 231, in run
core.CalculoIndicadoresPNCT.processa_equipamento(sessao_cassandra, equipamento, data, sentido, faixa)
File "C:\Users\Lucas\PycharmProjects\novo_importador\app\core\CalculoIndicadoresPNCT.py", line 336, in processa_equipamento
desvio_medias(sessao_cassandra, equipamento, data_referencia, sentido, faixa)
File "C:\Users\Lucas\PycharmProjects\novo_importador\app\core\CalculoIndicadoresPNCT.py", line 206, in desvio_medias
veiculos = sessao_cassandra.execute(sql_pronto)
File "C:\Users\Lucas\Miniconda\lib\site-packages\cassandra\cluster.py", line 1594, in execute
result = future.result(timeout)
File "C:\Users\Lucas\Miniconda\lib\site-packages\cassandra\cluster.py", line 3296, in result
raise self._final_exception
ReadTimeout: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
I have changed the timeout on the server to absurd amounts of time (500000000 ms, for instance), and I have also tried setting the timeout limit on the client with .execute(query, timeout=3000), but still no success.
Now, when more processes hit the same problem and the intense writing from steps 1-3 across multiple processes stops, the last processes to reach step 6 complete the procedure successfully, which makes me think the problem is that Cassandra is giving priority to the tens of thousands of insert requests I'm issuing per second and either ignoring my read request or pushing it way back in the queue.
One way to solve this, in my opinion, would be to somehow ask Cassandra to give priority to my read request so that I can keep processing, even if that means slowing down the other processes.
Now, as a side note, you might think my process modelling is not optimal, and I'd love to hear opinions on that, but for the reality of this application this is, in our view, the best way to proceed. We have actually thought extensively about optimising the process, and (if the Cassandra server can handle it) this is optimal for our situation.
So, TL;DR: Is there a way of giving priority to a query when executing tens of thousands of asynchronous queries? If not, is there a way of executing tens of thousands of insert and read queries per second such that the requests don't time out? Additionally, what would you suggest I do to solve the problem? Running fewer processes in parallel is an obvious solution, but one I'm trying to avoid. I would love to hear everyone's thoughts.
Keeping the data in memory while inserting, so that I don't need to read it back for the summaries, is not an option because the XML files are huge and memory is a concern.
I don't know of a way to give priority to read queries. I believe internally Cassandra has separate thread pools for read and write operations, so those are running in parallel. Without seeing the schema and queries you're doing, it's hard to say if you are doing a very expensive read operation or if the system is just so swamped with writes that it can't keep up with the reads.
You might want to try monitoring what's going on in Cassandra as your application is running. There are several tools you can use to monitor what's going on. For example, if you ssh to your Cassandra node and run:
watch -n 1 nodetool tpstats
This will show you the thread pool stats (updated once per second). You'll be able to see if the queues are filling up or operations are getting blocked. If any of the "Dropped" counters increase, that's a sign you don't have enough capacity for what you're trying to do. If that's the case, then add capacity by adding more nodes, or change your schema and approach so that the node has less work to do.
Other useful things to monitor (on linux use watch -n 1 to monitor continuously):
nodetool compactionstats
nodetool netstats
nodetool cfstats <keyspace.table name>
nodetool cfhistograms <keyspace> <table name>
It is also good to monitor the node with Linux commands like top and iostat to check CPU and disk utilization.
My impression from what you say is that your single node doesn't have enough capacity to do all the work you're giving it, so either you need to process less data per unit of time, or add more Cassandra nodes to spread out the workload.
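If you go the route of processing less data per unit of time, one common client-side pattern (a sketch only, with placeholder host, keyspace, and table names) is to bound the number of in-flight execute_async requests with a semaphore so the writers can't swamp the single node:

from threading import Semaphore
from cassandra.cluster import Cluster

MAX_IN_FLIGHT = 64                        # tune to what the single node can absorb
in_flight = Semaphore(MAX_IN_FLIGHT)

cluster = Cluster(["cassandra-host"])     # placeholder host
session = cluster.connect("my_keyspace")  # placeholder keyspace
insert = session.prepare("INSERT INTO passages (id, data) VALUES (?, ?)")  # placeholder table

def write_async(params):
    in_flight.acquire()                   # block if too many writes are already pending
    future = session.execute_async(insert, params)
    future.add_callbacks(
        callback=lambda _: in_flight.release(),
        errback=lambda exc: in_flight.release(),  # log exc in real code
    )
    return future

Lowering the bound trades write throughput for headroom that the synchronous reads in step 6 can use.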
I'm currently facing my own timeout error due to partitions having too many rows, so I may have to add cardinality to my partition key to make the contents of each partition smaller.
