Redis server responding intermittently to Python client

Using Redis with our Python WSGI application, we are noticing that at certain intervals, Redis stops responding to requests. After some time, however, we are able to fetch the values stored in it again.
When this happens, we check the status of the Redis service, and it is still online.
If it is of any help, we are using the redis Python package, with StrictRedis as the connection class and a default ConnectionPool. Any ideas on this will be greatly appreciated. If there is more information that would help better diagnose the problem, let me know and I'll update ASAP.
Thanks very much!

More data about your Redis setup and the size of the data set would be useful. That said, I would venture to guess your Redis server is configured to persist data to disk (the default). If so, you could be seeing your Redis node getting a bit overwhelmed when it forks off a copy of itself to save the data set to disk.
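One quick way to check for this from Python, assuming redis-py and a reachable instance (hostname illustrative), is to look at the persistence section of INFO while the stalls are happening:

    import redis

    # Sketch: inspect persistence state during a stall (redis-py assumed).
    r = redis.StrictRedis(host='localhost', port=6379)

    info = r.info('persistence')
    print(info['rdb_bgsave_in_progress'])  # 1 while a forked child is saving
    print(info['rdb_last_bgsave_status'])  # 'ok' or 'err'
    print(r.config_get('save'))            # the snapshot schedule in effect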
If this is the case and you do need to persist to disk, then I would recommend standing up a second instance configured as a slave of the first one, with persistence to disk enabled there. The master you would then configure not to persist to disk. In this configuration you should see the writable node stay fully responsive at all times. You could even go so far as to set up a non-persisting slave for read-only access.
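As a sketch of that topology (hostnames illustrative, redis-py assumed), the same reconfiguration can be driven from Python instead of redis.conf:

    import redis

    master = redis.StrictRedis(host='redis-master', port=6379)
    slave = redis.StrictRedis(host='redis-slave', port=6379)

    master.config_set('save', '')             # master: no snapshots, so no forking
    slave.slaveof('redis-master', 6379)       # slave: replicate from the master
    slave.config_set('save', '900 1 300 10')  # slave: keep persisting to disk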
But without more details on your configuration, resources, and usage, it is merely an educated guess.

Related

What exactly happens on the computer when multiple requests come to the webserver serving a Django or Pyramid application?

I am having a hard time trying to figure out the big picture of how multiple requests are handled by the uwsgi server with a Django or Pyramid application.
My understanding at the moment is this:
When multiple HTTP requests are sent to the uwsgi server concurrently, the server creates separate processes or threads (copies of itself) for every request (or assigns the request to one of them), and every process/thread loads the web application's code (say Django or Pyramid) into the computer's memory, executes it, and returns the response. In between, every copy of the code can access the session, cache, or database. There is usually a separate database server, and it can also handle concurrent requests to the database.
So here some questions I am fighting with.
Is my above understanding correct or not?
Do the copies of the code interact with each other somehow, or are they wholly separate from each other?
What about the session or cache? Are they shared between them or are they local to each copy?
How are they created: by the webserver or by the copies of the Python code?
How are responses returned to the requesters: by each process concurrently, or are they put into some kind of queue and sent synchronously?
I have googled these questions and have found very interesting answers on StackOverflow, but I still can't get the whole picture and the whole process remains a mystery to me. It would be fantastic if someone could explain the whole picture in terms of Django or Pyramid with uwsgi or whatever webserver.
Sorry for asking kind of dumb questions, but they really torment me every night and I am looking forward to your help :)
There's no magic in Pyramid or Django that gets you past process boundaries. The answers depend entirely on the particular server you've selected and the settings you've selected. For example, uwsgi has the ability to run multiple threads and multiple processes. If uwsgi spins up multiple processes then they will each have their own copies of data, which are not shared unless you take the time to create some IPC (this is why you should keep state in a third party like a database instead of in in-memory objects, which are not shared across processes). Each process initializes a WSGI object (let's call it app) which the server calls via body_iter = app(environ, start_response). This app object is shared across all of the threads in the process and is invoked concurrently, thus it needs to be threadsafe (usually the structures the app uses are either threadlocal or readonly to deal with this, for example a connection pool to the database).
In general the answers to your questions are that things happen concurrently, and objects may or may not be shared based on your server model but in general you should take anything that you want to be shared and store it somewhere that can handle concurrency properly (a database).
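To make the shared-object model concrete, here is a minimal sketch of the WSGI callable mentioned above; nothing here is framework-specific, it is just the contract the server speaks:

    import threading

    # One `app` object exists per worker process; every thread in that
    # process calls it concurrently, so it must be thread-safe.
    local = threading.local()  # per-thread storage, e.g. a DB connection

    def app(environ, start_response):
        # `environ` is per-request; `local` is per-thread; module-level
        # globals are shared by all threads in this process.
        if not hasattr(local, 'hits'):
            local.hits = 0
        local.hits += 1  # safe without locks: each thread has its own copy
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [('hits in this thread: %d\n' % local.hits).encode()]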
The power and weakness of webservers is that they are in principle stateless. This enables them to be massively parallel. So indeed, for each page request a different thread may be spawned. Whether or not this actually happens depends on the configuration settings of your webserver. There's also a cost to spawning many threads, so if possible threads are reused from a thread pool.
Almost all serious webservers have a page cache. So if the same page is requested multiple times, it can be retrieved from the cache. In addition, browsers do their own caching. A webserver has to be clever about what to cache and what not. Static pages aren't a big problem, although they may be replaced, in which case it is quite confusing to still get the old page served from the cache.
One way to defeat the cache is by adding (dummy) parameters to the page request.
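As a small illustration (URL and parameter name are made up), a timestamp parameter makes each request look like a distinct URL to intermediate caches:

    import time
    import urllib.request

    # Cache-busting sketch: the throwaway parameter defeats URL-keyed caches.
    url = 'http://example.com/page?nocache=%d' % int(time.time())
    body = urllib.request.urlopen(url).read()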
The statelessness of the web was initially welcomed as a necessity for achieving scalability, where webpages of busy sites could even be served concurrently from different servers at nearby or remote locations.
However the trend is to have stateful apps. State can be maintained on the browser, easing the burden on the server. If it's maintained on the server it requires the server to know 'who's talking'. One way to do this is saving and recognizing cookies (small identifiable bits of data) on the client.
For databases the story is a bit different. As soon as anything gets stored that relates to a particular user, the application is in principle stateful. While there's no conceptual difference between retaining state on disk and in RAM, traditionally statefulness was left to the database, which in turn used thread pools and load balancing to do its job efficiently.
With the advent of very large internet shops like Amazon and Google, mandatory disk access to achieve statefulness created a performance problem. The answer was in-memory databases. While they may be accessed traditionally using e.g. SQL, they offer much more flexibility in the way data is stored conceptually.
A type of database that enjoys growing popularity is the persistent object store. With this kind of database, while the distinction can still be made formally, the boundary between webserver and database is blurred. Both have their data in RAM (but can swap to disk if needed), and both work with objects rather than flat records as in SQL tables. These objects can be interconnected in complex ways.
In short, there's an explosion of innovative storage / thread pooling / caching / persistence / redundancy / synchronisation technology, driving what has become popularly known as 'the cloud'.

Alternative to Amazon SQS for Python

I have a collection of EC2 instances with a process installed on them that makes use of the same SQS queue, call it my_queue. This queue is extremely active, writing more than 250 messages a minute and deleting those 250 messages right afterwards. The problem I've encountered at this point is that it is starting to be slow, and thus my system is not working properly: some processes hang because SQS closes the connection, and writes to remote machines hang as well.
The big advantages I get from using SQS are that 1) it's very easy to use, with no need to install or configure anything locally, and 2) it's a reliable tool, since I only need a key and key_secret in order to start pushing and pulling messages.
My questions are:
What alternatives exist to SQS? I know of Redis and RabbitMQ, but both need local deployment and configuration, and that might lead to unreliable behaviour if, for example, the box that is running the queue suddenly crashes and the other boxes are not able to write messages to it.
If I choose something like Redis, deployed on my own box, is it worth it over SQS, or should I just stay with SQS and look for another solution?
Thanks
You may have solved this by now since the question is old; I had the same issues a while back and the costs ($$) of polling SQS in my application were sizable. I migrated it to Redis successfully without much effort. I eventually migrated the Redis host to ElastiCache's Redis and have been very pleased with the solution. I am able to snapshot it and scale it as needed. Hope that helps.
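A rough sketch of what such a migration can look like (redis-py assumed, host illustrative, queue name taken from the question): a Redis list replaces the SQS queue, and BRPOP blocks instead of polling, which is where the SQS request costs were going:

    import redis

    r = redis.StrictRedis(host='localhost', port=6379)

    def push_message(body):
        r.lpush('my_queue', body)

    def pop_message(timeout=5):
        # BRPOP blocks until a message arrives or the timeout expires,
        # so there is no per-poll request cost as with SQS.
        item = r.brpop('my_queue', timeout=timeout)
        return item[1] if item else None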

master slave postgresql with logging and monitoring for a django application

I have a Django application running.
The database backend that I use for it is PostgreSQL.
Everything is working fine for me.
Now I want to create a master slave replication for my database, such that:
Whatever change happens on master, is replicated on slave.
If the master shuts down, the slave takes charge, and an error notification is sent.
Backup is created automatically of the database.
Logging is taken care of.
Monitoring is taken care of.
I went through https://docs.djangoproject.com/en/dev/topics/db/multi-db/ the entire article.
But I don't have much idea how to implement all 5 steps above. As you will have gathered, I don't have much experience, so please suggest pointers on how to proceed. Thanks.
Have I missed anything that should be kept in mind for database purposes?
It sounds like you want a dual-node HA setup for PostgreSQL, using synchronous streaming replication and failover.
Check out http://repmgr.org/ for one tool that'll help with this, particularly when coupled with a PgBouncer front-end. You may also want to read about "heartbeat", "high availability", "fencing" and "STONITH".
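On the Django side, the multi-db document you linked covers the read/write split; a minimal sketch with illustrative hostnames (note this only routes queries, it does not by itself provide failover; repmgr/PgBouncer sit below it):

    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.postgresql_psycopg2',
            'NAME': 'mydb',
            'HOST': 'pg-master.example.com',
        },
        'replica': {
            'ENGINE': 'django.db.backends.postgresql_psycopg2',
            'NAME': 'mydb',
            'HOST': 'pg-slave.example.com',
        },
    }

    class ReadReplicaRouter(object):
        """Send reads to the slave, writes to the master."""
        def db_for_read(self, model, **hints):
            return 'replica'

        def db_for_write(self, model, **hints):
            return 'default'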
You need to cope with the master continuing to run but failing, not just it shutting down. Consider what happens if the master runs out of disk space; all write queries will return errors, but it won't shut down or crash.
This is really an issue of database administration / server management.

Python: file-based thread-safe queue

I am creating an application (app A) in Python that listens on a port, receives NetFlow records, encapsulates them and securely sends them to another application (app B). App A also checks whether the record was successfully sent. If not, it has to be saved. App A waits a few seconds and then tries to send it again, etc. This is the important part: if the sending was unsuccessful, records must be stored, but meanwhile many more records can arrive and they need to be stored too. The ideal way to do that is a queue. However, I need this queue to live in a file (on disk). I found for example this code http://code.activestate.com/recipes/576642/ but it "On open, loads full file into memory", and that's exactly what I want to avoid. I must assume that this file of records can grow to a couple of GB.
So my question is: what would you recommend to store these records in? It needs to handle a lot of data; on the other hand, it would be great if it weren't too slow, because during normal activity only one record is saved at a time and it is read and removed immediately. So the basic state is an empty queue. And it should be thread-safe.
Should I use a database (dbm, sqlite3, ...), or something like pickle or shelve, or something else?
I am a little confused by all this... thank you.
You can use Redis as a database for this. It is very, very fast, does queuing amazingly well, and it can save its state to disk in a few different ways, depending on the fault-tolerance level you want. Being an external process, you might not need to give it a very strict saving policy, since if your program crashes, everything is still saved externally.
See here: http://redis.io/documentation, and if you want more detailed info on how to do this in Redis, I'd be glad to elaborate.
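A rough sketch of the pattern under those assumptions (redis-py, illustrative key name, and send as a placeholder for app A's delivery function; with appendonly yes in redis.conf the queue also survives a Redis restart):

    import time
    import redis

    r = redis.StrictRedis(host='localhost', port=6379)
    QUEUE = 'netflow_records'

    def store(record):
        # List pushes are atomic, so this is safe across threads.
        r.rpush(QUEUE, record)

    def retry_loop(send, interval=5):
        while True:
            record = r.lpop(QUEUE)
            if record is None:
                time.sleep(interval)    # empty queue is the normal state
                continue
            try:
                send(record)            # attempt delivery to app B
            except IOError:
                r.lpush(QUEUE, record)  # put it back at the head, retry later
                time.sleep(interval)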

A question about sites downtime updates

I have a server where I run a Django application, but I have a little problem:
when I commit and push new changes to the server with Mercurial, there is a brief moment (something like a microsecond) where the home page is unreachable.
I have Apache on the server.
How can I solve this?
You could run multiple instances of the Django app (either on the same machine with different ports or on different machines) and use Apache to reverse proxy requests to each instance. It can fail over to instance B whilst instance A is rebooting. See mod_proxy.
If the downtime is as short as you say, though, it is unlikely to be an issue worth worrying about.
Also note that there are likely to be better (and easier) proxies than Apache. Nginx is popular, as is HAProxy.
If you have significant traffic and even downtime measured in microseconds matters, it's probably best to push new changes to your web servers one at a time, removing each machine from the load balancer rotation for the moment you're doing the upgrade there.
When using apachectl graceful, you minimize the time the website is unavailable when 'restarting' Apache. All children are 'kindly' requested to restart and get their new configuration when they're not doing anything.
The USR1 or graceful signal causes the parent process to advise the children to exit after their current request (or to exit immediately if they're not serving anything). The parent re-reads its configuration files and re-opens its log files. As each child dies off the parent replaces it with a child from the new generation of the configuration, which begins serving new requests immediately.
At a heavy-traffic website, you will notice some performance loss, as some children will temporarily not accept new connections. It's my experience, however, that TCP recovers perfectly from this.
Considering that some websites take several minutes or hours to update, that is completely acceptable. If it is a really big issue, you could use a proxy, running multiple instances and updating them one at a time, or update at an off-peak moment.
If you're at the point of complaining about a 1/1,000,000th of a second outage, then I suggest the following approach:
Front end load balancers pointing to multiple backend servers.
Remove one backend server from the loadbalancer to ensure no traffic will go to it.
Wait until all traffic that the server was processing has been sent.
Shutdown the webserver on that instance.
Update the django instance on that machine.
Add that instance back to the load balancers.
Repeat for every other server.
This will ensure that the 1/1,000,000th of a second gap is removed.
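A sketch of those steps in Python; lb and deploy are hypothetical placeholders for your load balancer's API and your deployment tooling (Fabric, Ansible, ...):

    import time

    SERVERS = ['web1', 'web2', 'web3']  # illustrative hostnames

    def rolling_deploy(lb, deploy, drain_seconds=30):
        for host in SERVERS:
            lb.remove(host)            # stop sending new traffic to this host
            time.sleep(drain_seconds)  # let in-flight requests finish
            deploy(host)               # stop the webserver, update the Django app
            lb.add(host)               # back into the rotation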
I think it's normal, since Django may need to restart its server after your update.
