I'm looking for a way to check if a redis instance (on a local machine with default port) is running or not. If not, I want to start it from my python code.
If you start a redis client you can first try to ping -- if you get a redis.exceptions.ConnectionError then the service is probably not running. Below is an example of such a function. There are other ways to get a similar or more robust result -- this one is just an easy approach. Also note that this doesn't tell you whether a particular key is set up or anything else about the redis setup. It only tells you whether there is a live redis server on localhost.
import redis

def redisLocalhostLive():
    redtest = redis.StrictRedis()  # non-default ports could go here
    try:
        return redtest.ping()
    except redis.exceptions.ConnectionError:
        return False
gah, Pyrce beat me to a similar answer. posting anyway:
import redis

server = redis.Redis()
try:
    server.ping()
except redis.exceptions.ConnectionError:
    # your redis start command here
    pass
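To fill in the "start command" placeholder, here is a minimal Python 3 sketch. It swaps the redis ping for a plain TCP connection test, so it only shows that something is listening on the port, not that it is Redis. It also assumes that a redis-server binary is on the PATH and can be started with its defaults. The helper names port_is_open and ensure_redis_running are made up for this example.

```python
import socket
import subprocess
import time


def port_is_open(host="127.0.0.1", port=6379, timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ensure_redis_running(host="127.0.0.1", port=6379, wait_seconds=5.0):
    """Start redis-server if nothing is listening yet; True once the port is open."""
    if port_is_open(host, port):
        return True
    # Assumption: `redis-server` is on PATH; Popen raises FileNotFoundError otherwise.
    subprocess.Popen(["redis-server", "--port", str(port)])
    deadline = time.monotonic() + wait_seconds
    while time.monotonic() < deadline:
        if port_is_open(host, port):
            return True
        time.sleep(0.1)
    return False
```

Calling ensure_redis_running() before creating the redis client would cover the "check, and start if needed" case from the question.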
One approach is to use psutil, which is a Python module that provides a cross-platform way to retrieve info on running processes.
>>> import psutil
>>> processes = psutil.process_iter()  # Get all running processes
>>> if any(process.name() == 'redis-server' for process in processes):
...     print("redis is running")
...
redis is running
I want to provide shared state for a Flask app which runs with multiple workers, i.e. multiple processes.
To quote this answer from a similar question on this topic:
You can't use global variables to hold this sort of data. [...] Use a data source outside of Flask to hold global data. A database, memcached, or redis are all appropriate separate storage areas, depending on your needs.
(Source: Are global variables thread safe in flask? How do I share data between requests?)
My question is on that last part regarding suggestions on how to provide the data "outside" of Flask. Currently, my web app is really small and I'd like to avoid requirements or dependencies on other programs. What options do I have if I don't want to run Redis or anything else in the background but provide everything with the Python code of the web app?
If your webserver's worker type is compatible with the multiprocessing module, you can use multiprocessing.managers.BaseManager to provide a shared state for Python objects. A simple wrapper could look like this:
from multiprocessing import Lock
from multiprocessing.managers import AcquirerProxy, BaseManager, DictProxy

def get_shared_state(host, port, key):
    shared_dict = {}
    shared_lock = Lock()
    manager = BaseManager((host, port), key)
    manager.register("get_dict", lambda: shared_dict, DictProxy)
    manager.register("get_lock", lambda: shared_lock, AcquirerProxy)
    try:
        manager.get_server()
        manager.start()
    except OSError:  # Address already in use
        manager.connect()
    return manager.get_dict(), manager.get_lock()
You can assign your data to the shared_dict to make it accessible across processes:
import numpy

HOST = "127.0.0.1"
PORT = 35791
KEY = b"secret"
shared_dict, shared_lock = get_shared_state(HOST, PORT, KEY)
shared_dict["number"] = 0
shared_dict["text"] = "Hello World"
shared_dict["array"] = numpy.array([1, 2, 3])
However, you should be aware of the following circumstances:
Use shared_lock to protect against race conditions when overwriting values in shared_dict. (See Flask example below.)
There is no data persistence. If you restart the app, or if the main (the first) BaseManager process dies, the shared state is gone.
With this simple implementation of BaseManager, you cannot directly edit nested values in shared_dict. For example, shared_dict["array"][1] = 0 has no effect. You will have to edit a copy and then reassign it to the dictionary key.
Flask example:
The following Flask app uses a global variable to store a counter number:
from flask import Flask

app = Flask(__name__)
number = 0

@app.route("/")
def counter():
    global number
    number += 1
    return str(number)
This works when using only 1 worker (gunicorn -w 1 server:app). When using multiple workers (gunicorn -w 4 server:app), it becomes apparent that number is not shared state but is individual to each worker process.
Instead, with shared_dict, the app looks like this:
from flask import Flask

app = Flask(__name__)

HOST = "127.0.0.1"
PORT = 35791
KEY = b"secret"
shared_dict, shared_lock = get_shared_state(HOST, PORT, KEY)
shared_dict["number"] = 0

@app.route("/")
def counter():
    with shared_lock:
        shared_dict["number"] += 1
    return str(shared_dict["number"])
This works with any number of workers, like gunicorn -w 4 server:app.
your example is a bit magic for me! I'd suggest reusing the magic already in the multiprocessing codebase in the form of a Namespace. I've attempted to make the following code compatible with spawn servers (i.e. MS Windows) but I only have access to Linux machines, so can't test there
start by pulling in dependencies and defining our custom Manager and registering a method to get out a Namespace singleton:
from multiprocessing.managers import BaseManager, Namespace, NamespaceProxy

class SharedState(BaseManager):
    _shared_state = Namespace(number=0)

    @classmethod
    def _get_shared_state(cls):
        return cls._shared_state

SharedState.register('state', SharedState._get_shared_state, NamespaceProxy)
this might need to be more complicated if creating the initial state is expensive and hence should only be done when it's needed. note that the OPs version of initialising state during process startup will cause everything to reset if gunicorn starts a new worker process later, e.g. after killing one due to a timeout
next I define a function to get access to this shared state, similar to how the OP does it:
def shared_state(address, authkey):
    manager = SharedState(address, authkey)
    try:
        manager.get_server()  # raises if another server started
        manager.start()
    except OSError:
        manager.connect()
    return manager.state()
though I'm not sure if I'd recommend doing things like this. when gunicorn starts it spawns lots of processes that all race to run this code and it wouldn't surprise me if this could go wrong sometimes. also if it happens to kill off the server process (because of e.g. a timeout) every other process will start to fail
that said, if we wanted to use this we would do something like:
ss = shared_state('server.sock', b'noauth')
ss.number += 1
this uses Unix domain sockets (passing a string rather than a tuple as an address) to lock this down a bit more.
also note this has the same race conditions as the OP's code: incrementing a number will cause the value to be transferred to the worker's process, which is then incremented, and sent back to the server. I'm not sure what the _lock is supposed to be protecting, but I don't think it'll do much
Redis is very easy to use in Python. However, I now have a problem using a Redis transaction. First I have to get a key from Redis, then I have to check whether the value bound to this key is legal. I want those operations to be atomic. Here is my code.
pipe = redis_conn.pipeline()
pipe.multi()
var = pipe.get('key_want_to_be_read')
if is_legal(var):
    do_something
else:
    do_another_thing
pipe.execute()
However, when I run this code, the Python name var is not bound to the value stored in Redis but to a Pipeline<ConnectionPool<Connection<host=localhost,port=6379,db=0>>>. So is there any way to get a key and bind it to a Python name inside a Redis transaction?
You can register a lua script like this:
import redis
r = redis.StrictRedis(host='localhost', port=6379, db=0)
redis_script = r.register_script("""
local valueToTest = redis.call('GET','{key}')
--Test key in lua
""".format(key=key_to_be_read))
And then call it with redis_script()
From the Redis site this is atomic:
Redis uses the same Lua interpreter to run all the commands. Also
Redis guarantees that a script is executed in an atomic way: no other
script or Redis command will be executed while a script is being
executed.
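As a variation on the script above, here is a small sketch that passes the key through KEYS and the comparison value through ARGV instead of formatting the key into the Lua source, so one registered script can be reused for any key. The equality check stands in for the asker's is_legal test and is only an assumption about what "legal" means. The names CHECK_VALUE_LUA and make_value_checker are made up for this example, and client is expected to be a redis-py client such as redis.StrictRedis().

```python
# Lua source: the key arrives as KEYS[1] and the expected value as ARGV[1].
CHECK_VALUE_LUA = """
local value = redis.call('GET', KEYS[1])
if value == ARGV[1] then
    return 1
end
return 0
"""


def make_value_checker(client):
    """Register the script once and return a callable that runs it on the server."""
    script = client.register_script(CHECK_VALUE_LUA)

    def value_matches(key, expected):
        # redis-py Script objects are called with keys=[...] and args=[...].
        return script(keys=[key], args=[expected]) == 1

    return value_matches
```

The GET and the comparison both happen inside the script, so no other command can run between them. Any follow-up write that must be part of the same atomic step would also have to go into the Lua code.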
I'm trying to create a discovery script, which will use multithreading to ping multiple IP addresses at once.
import ipaddress
import sh
from threading import Thread
from Queue import Queue

user_input = raw_input("")
network = ipaddress.ip_network(unicode(user_input))

def pingit(x):
    for i in x.hosts():
        try:
            sh.ping(i, "-c 1")
            print i, "is active"
        except sh.ErrorReturnCode_1:
            print "no response from", i

queue_to_work = Queue(maxsize=0)
number_of_workers = 30
for i in range(number_of_workers):
    workers = Thread(target=pingit(network),args=(queue_to_work,))
    workers.getDaemon(True)
    workers.start()
When I run this script, I get the ping responses, but it is not fast. I believe the multithreading is not kicking in.
Could someone please tell me where I'm going wrong?
Many thanks.
You are doing it completely wrong.
import ipaddress
import sh
from threading import Thread

user_input = raw_input("")
network = ipaddress.ip_network(unicode(user_input))

def pingit(x):
    for i in x.hosts():
        try:
            sh.ping(i, "-c 1")
            print i, "is active"
        except sh.ErrorReturnCode_1:
            print "no response from", i

workers = Thread(target=pingit,args=(network,))
workers.start()
This is how you start a thread. Writing pingit(network) actually runs the function and passes its result into Thread, whereas you want to pass the function itself. You should pass the function pingit and the argument network separately. Note that this creates a thread that effectively runs pingit(network).
Now, if you want to use multiple threads, you can do so in a loop. But you also have to create separate sets of data to feed into the threads. Consider you have a list of hosts, e.g. ['A', 'B', 'C', 'D'], and you want to ping them from two threads. You have to create two threads, that call pingit(['A', 'B']) and pingit(['C', 'D']).
As a side note, don't use ip_network to find the ip addresses, use ip_address. You can ping an ip address, but not a network. Of course if you want to ping all ip addresses in the network, ip_network is fine.
You may want to somehow split the user input into multiple ip addresses and separate the list into sublists for your threads. This is pretty easy. Now you can write a for loop to create threads and feed each sublist into the arguments of a thread. This creates threads that actually run with different parameters.
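The splitting step could look roughly like this in Python 3. It is only a sketch: split_into_chunks and run_in_threads are made-up helper names, and worker stands for any function that takes a list of hosts, for example a pingit variant that loops over a plain list instead of calling x.hosts().

```python
import threading


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous, non-empty lists."""
    items = list(items)
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_in_threads(worker, items, n_threads):
    """Start one thread per chunk, each calling worker(chunk), and wait for all."""
    threads = [
        # Pass the function and its argument separately; do not call worker here.
        threading.Thread(target=worker, args=(chunk,))
        for chunk in split_into_chunks(items, n_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

With the four hosts from the example above and two threads, the chunks are ['A', 'B'] and ['C', 'D'], so each thread pings its own half of the list.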
I would like to share my thoughts on this.
Since I guess this is something you would like to run in the background, I would suggest you use a Queue instead of a Thread.
This will offer you multiple advantages:
You can add multiple functionalities into the queue
If something happens, the queue will just continue, and catch the error for you. You can even add some logging to it in case something goes wrong.
The queue runs as a daemon, with every item in the queue as its own process
Systems like RabbitMQ or Redis are built for this specific kind of task.
It is relatively easy to set up
I have created a simple script for you that you might be able to use:
import subprocess
from celery import Celery

app = Celery()

@app.task
def check_host(ip, port=80, timeout=1):
    output = subprocess.Popen(["ping", "-c", "1", ip], stdout=subprocess.PIPE).communicate()[0]
    if "1 packets received" in output.decode("utf-8"):
        return "{ip} connected successfully".format_map(vars())
    else:
        return "{ip} was unable to connect".format_map(vars())

def pingit(ip="8.8.8.8"):
    check_host.delay(ip)
What this does is the following.
You first import Celery, this will make you able to connect to Celery that runs in the background.
You create an app, which is an instance of the Celery class
You use this app to create a task. Inside this task you put all the actions you want to perform asynchronously.
You call the delay() method on the task
Now the task will run in the background, and all other tasks will be put in the queue to run asynchronously for you.
So you can just put everything in a loop, and the Queue will handle it for you.
The information about Celery: http://docs.celeryproject.org/en/latest/getting-started/first-steps-with-celery.html
And a great tutorial to get everything setup I found on YouTube: https://www.youtube.com/watch?v=68QWZU_gCDA
I hope this can help you a bit further
I have a site that runs with the following configuration:
Django + mod-wsgi + apache
While handling one of the user's requests, I send another HTTP request to another service, using Python's httplib library.
But sometimes this service takes too long to answer, and the timeout for httplib doesn't work. So I create a thread, send the request to the service in this thread, and join it after 20 sec (20 sec is the request timeout). This is how it works:
class HttpGetTimeOut(threading.Thread):
    def __init__(self,**kwargs):
        self.config = kwargs
        self.resp_data = None
        self.exception = None
        super(HttpGetTimeOut,self).__init__()

    def run(self):
        h = httplib.HTTPSConnection(self.config['server'])
        h.connect()
        sended_data = self.config['sended_data']
        h.putrequest("POST", self.config['path'])
        h.putheader("Content-Length", str(len(sended_data)))
        h.putheader("Content-Type", 'text/xml; charset="utf-8"')
        if 'base_auth' in self.config:
            base64string = base64.encodestring('%s:%s' % self.config['base_auth'])[:-1]
            h.putheader("Authorization", "Basic %s" % base64string)
        h.endheaders()
        try:
            h.send(sended_data)
            self.resp_data = h.getresponse()
        except httplib.HTTPException,e:
            self.exception = e
        except Exception,e:
            self.exception = e
something like this...
And use it by this function:
getting = HttpGetTimeOut(**req_config)
getting.start()
getting.join(COOPERATION_TIMEOUT)
if getting.isAlive(): #maybe need some block
    getting._Thread__stop()
    raise ValueError('Timeout')
else:
    if getting.resp_data:
        r = getting.resp_data
    else:
        if getting.exception:
            raise ValueError('REquest Exception')
        else:
            raise ValueError('Undefined exception')
And all works fine, but sometime I start catching this exception:
error: can't start new thread
at the line of starting new thread:
getting.start()
and the next and the final line of traceback is
File "/usr/lib/python2.5/threading.py", line 440, in start
_start_new_thread(self.__bootstrap, ())
And the question is: what is happening?
Thanks to all, and sorry for my poor English. :)
The "can't start new thread" error is almost certainly due to the fact that you already have too many threads running within your Python process, and due to a resource limit of some kind the request to create a new thread is refused.
You should probably look at the number of threads you're creating; the maximum number you will be able to create will be determined by your environment, but it should be on the order of hundreds at least.
It would probably be a good idea to re-think your architecture here; seeing as this is running asynchronously anyhow, perhaps you could use a pool of threads to fetch resources from another site instead of always starting up a thread for every request.
Another improvement to consider is your use of Thread.join and Thread.stop; this would probably be better accomplished by providing a timeout value to the constructor of HTTPSConnection.
You are starting more threads than can be handled by your system. There is a limit to the number of threads that can be active for one process.
Your application is starting threads faster than the threads are running to completion. If you need to start many threads, you need to do it in a more controlled manner; I would suggest using a thread pool.
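A thread pool along these lines can be sketched with the standard library's concurrent.futures, which exists in Python 3 but not in the Python 2.5 shown in the traceback. The names fetch_all, fetch and requests are placeholders for this example; fetch would be whatever function performs one HTTP call.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(fetch, requests, max_workers=8):
    """Run fetch(request) for every request using at most max_workers threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order; an exception raised inside fetch()
        # is re-raised here when its result is read.
        return list(pool.map(fetch, requests))
```

However many requests arrive, the pool never creates more than max_workers threads. Extra work waits in the executor's internal queue instead of hitting the thread limit.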
I ran into a similar situation, but my process needed a lot of threads running to take care of a lot of connections.
I counted the number of threads with the command:
ps -fLu user | wc -l
It displayed 4098.
I switched to the user and looked at the system limits:
sudo -u myuser -s /bin/bash
ulimit -u
Got 4096 as response.
So, I edited /etc/security/limits.d/30-myuser.conf and added the lines:
myuser hard nproc 16384
myuser soft nproc 16384
Restarted the service and now it's running with 7017 threads.
P.S. I have a 32-core server and I'm handling 18k simultaneous connections with this configuration.
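The same limit can also be read from inside the Python process, which may help when logging how close a service is to the ceiling. This is a Unix-only sketch; thread_headroom is a made-up name. Note that threading.active_count() only counts Python threads in the current process, while the nproc limit applies to everything the user runs.

```python
import resource
import threading


def thread_headroom():
    """Return (active_threads, soft_limit, hard_limit) for the current process."""
    # RLIMIT_NPROC is the per-user process/thread limit that `ulimit -u` reports.
    # A value equal to resource.RLIM_INFINITY means "unlimited".
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return threading.active_count(), soft, hard
```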
I think the best way in your case is to set socket timeout instead of spawning thread:
h = httplib.HTTPSConnection(self.config['server'],
timeout=self.config['timeout'])
Also you can set global default timeout with socket.setdefaulttimeout() function.
Update: See the answers to the question "Is there any way to kill a Thread in Python?" (several are quite informative) to understand why. Thread.__stop() doesn't terminate the thread; it only sets an internal flag so that the thread is considered already stopped.
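For reference, here is the timeout approach as a self-contained Python 3 sketch, where httplib is called http.client. The function name post_with_timeout and its parameters are made up for this example. The timeout applies to connecting and to each blocking socket operation, not to the request as a whole.

```python
import http.client
import socket


def post_with_timeout(host, path, body, timeout=20.0, port=None, use_tls=True):
    """POST body to host/path; return (status, data) or raise socket.timeout."""
    conn_class = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = conn_class(host, port, timeout=timeout)
    try:
        conn.request("POST", path, body=body,
                     headers={"Content-Type": 'text/xml; charset="utf-8"'})
        response = conn.getresponse()  # raises socket.timeout if the server stays silent
        return response.status, response.read()
    finally:
        conn.close()
```

Since the call either returns or raises within a bounded time, no helper thread is needed and nothing has to be stopped from outside.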
I completely rewrote the code from httplib to pycurl.
c = pycurl.Curl()
c.setopt(pycurl.FOLLOWLOCATION, 1)
c.setopt(pycurl.MAXREDIRS, 5)
c.setopt(pycurl.CONNECTTIMEOUT, CONNECTION_TIMEOUT)
c.setopt(pycurl.TIMEOUT, COOPERATION_TIMEOUT)
c.setopt(pycurl.NOSIGNAL, 1)
c.setopt(pycurl.POST, 1)
c.setopt(pycurl.SSL_VERIFYHOST, 0)
c.setopt(pycurl.SSL_VERIFYPEER, 0)
c.setopt(pycurl.URL, "https://"+server+path)
c.setopt(pycurl.POSTFIELDS,sended_data)
b = StringIO.StringIO()
c.setopt(pycurl.WRITEFUNCTION, b.write)
c.perform()
something like that.
And I'm testing it now. Thanks to all of you for the help.
If you are trying to set a timeout, why don't you use urllib2?
I'm running a Python script on my machine just to copy and convert some files from one format to another, and I want to maximize the number of running threads so that it finishes as quickly as possible.
Note: This is not a good workaround from an architecture perspective unless you are using it for a quick script on a specific machine.
In my case, I checked the maximum number of threads my machine could run before I got the error. It was 150.
I added this code before starting a new thread. It checks whether the maximum number of running threads has been reached; if so, the app waits until some of the running threads finish before starting new ones.
while threading.active_count() > 150:
    time.sleep(5)
mythread.start()
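The same cap can be enforced without a polling loop by using a semaphore. This is a Python 3 sketch of that alternative, not the code from this answer. The name run_bounded is made up, and worker stands for whatever function each thread should run.

```python
import threading


def run_bounded(worker, items, max_threads=150):
    """Run worker(item) in its own thread, with at most max_threads workers at once."""
    slots = threading.BoundedSemaphore(max_threads)
    threads = []

    def guarded(item):
        try:
            worker(item)
        finally:
            slots.release()  # free the slot even if worker() raises

    for item in items:
        slots.acquire()  # blocks, without sleeping in a loop, until a slot is free
        thread = threading.Thread(target=guarded, args=(item,))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
```

A new thread starts as soon as a running one finishes, rather than up to five seconds later. A thread that has just released its slot may still be exiting, so the number of live threads can briefly exceed the cap by a small margin.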
If you are using a ThreadPoolExecutor, the problem may be that your max_workers is higher than the threads allowed by your OS.
It seems that the executor keeps the information of the last executed threads in the process table, even if the threads are already done. This means that when your application has been running for a long time, eventually it will register in the process table as many threads as ThreadPoolExecutor.max_workers
As far as I can tell it's not a python problem. Your system somehow cannot create another thread (I had the same problem and couldn't start htop on another cli via ssh).
The answer of Fernando Ulisses dos Santos is really good. I just want to add, that there are other tools limiting the number of processes and memory usage "from the outside". It's pretty common for virtual servers. Starting point is the interface of your vendor or you might have luck finding some information in files like
/proc/user_beancounters
Is there any way to make multiple calls from an xmlrpc client to different xmlrpc servers at the same time?
My server code looks like this (I'll have this code running on two machines, server1 & server2):
import threading
import time
from SimpleXMLRPCServer import SimpleXMLRPCServer

class TestMethods(object):
    def printHello(self):
        while(1):
            time.sleep(10)
            print "HELLO FROM SERVER"
            return True

class ServerThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.server = SimpleXMLRPCServer(("x.x.x.x", 8000))
        self.server.register_instance(TestMethods())

    def run(self):
        self.server.serve_forever()

server = ServerThread()
server.start()
My Client code looks like this:
import xmlrpclib
client1 = xmlrpclib.ServerProxy("http://x.x.x.x:8080") # registering with server 1
client2 = xmlrpclib.ServerProxy("http://x.x.x.x:8080") # registering with server 2
ret1 = client1.printHello()
ret2 = client2.printHello()
Now, on the 10th second I'll get a response from server1 and on the 20th second I'll get a response from server2 which is unfortunately not what I want.
I'm trying to make calls to the two machines at the same time so that I get the responses back from both at the same time.
Please help me out. Thanks in advance.
There are a few different ways to do this.
python multiprocessing
This is the built-in Python module for running things in parallel. The docs are fairly clear. The easiest and most extensible way to use it is with a 'Pool' of workers, which can hold as many workers as you want.
from multiprocessing import Pool
import xmlrpclib

def hello_client(url):
    client = xmlrpclib.ServerProxy(url)
    return client.printHello()

p = Pool(processes=10)  # maximum number of requests at once.
servers_list = ['http://x.x.x.x:8080', 'http://x.x.x.x:8080']
# you can add as many other servers into that list as you want!
results = p.map(hello_client, servers_list)
print results
twisted python
twisted python is an amazingly clever system for writing all kinds of multithreaded / parallel / multiprocess stuff. The documentation is a bit confusing.
Tornado
Another non-blocking python framework. Also very cool. Here's an answer about XMLRPC, python, and tornado.
gevent
A 'magic' way of allowing blocking tasks in python to happen in the background. Very very cool. And here's a question about how to use XMLRPC in python with gevent.