I have a python program that needs to generate several guids and hand them back with some other data to a client over the network. It may be hit with a lot of requests in a short time period and I would like the latency to be as low as reasonably possible.
Ideally, rather than generating new guids on the fly as the client waits for a response, I would rather be bulk-generating a list of guids in the background that is continually replenished so that I always have pre-generated ones ready to hand out.
I am using the uuid module in Python on Linux. I understand that this uses the uuidd daemon to get UUIDs. Does uuidd already take care of pre-generating UUIDs so that it always has some ready? From the documentation it appears that it does not.
Is there some setting in Python or in uuidd to get it to do this automatically? Is there a more elegant approach than manually creating a background thread in my program that maintains a list of UUIDs?
Are you certain that the uuid module would in fact be too slow to handle the requests you expect in a timely manner? I would be very surprised if UUID generation accounted for a bottleneck in your application.
I would first build the application to simply use the uuid module and then if you find that this module is in fact slowing things down you should investigate a way to keep a pre-generated list of UUIDs around.
I have tested the performance of the uuid module for generating uuids:
>>> import timeit
>>> timer=timeit.Timer('uuid.uuid1()','import uuid')
>>> timer.repeat(3, 10000)
[0.84600019454956055, 0.8469998836517334, 0.84400010108947754]
How many do you need? Is 10000 per second not enough?
Suppose you have a thread that keeps topping up a pool of UUIDs.
Here is a very simple version:
import uuid, threading, time

class UUID_Pool(threading.Thread):
    pool_size = 10000

    def __init__(self):
        super(UUID_Pool, self).__init__()
        self.daemon = True
        self.uuid_pool = set(uuid.uuid1() for x in range(self.pool_size))

    def run(self):
        while True:
            while len(self.uuid_pool) < self.pool_size:
                self.uuid_pool.add(uuid.uuid1())
            time.sleep(0.01)   # top up the pool 100 times/sec

uuid_pool = UUID_Pool()
uuid_pool.start()
get_uuid = uuid_pool.uuid_pool.pop   # make a local binding
new_uuid = get_uuid()   # ~60x faster than uuid.uuid1() on my computer
                        # (named new_uuid so the uuid module isn't shadowed)
You'd also need to handle the case where a burst empties the pool by consuming UUIDs faster than the thread can generate them.
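A minimal sketch of one way to handle that, falling back to generating on the spot when the pool is drained (safe_get_uuid is my name, not part of the code above):

def safe_get_uuid():
    try:
        return get_uuid()      # pop from the pre-generated pool
    except KeyError:           # set.pop() raises KeyError when the set is empty
        return uuid.uuid1()    # a burst drained the pool; generate inline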
I have a fairly large python package that interacts synchronously with a third party API server and carries out various operations with the server. Additionally, I am now also starting to collect some of the data for future analysis by pickling the JSON responses. After profiling several serialisation/database methods, using pickle was the fastest in my case. My basic pseudo-code is:
while True:
    do_existing_api_stuff()...

    # additional data pickling
    data = {'info': []}  # there are multiple keys in real version!
    if pickle_file_exists:
        data = unpickle_file()
    data['info'].append(new_data)
    pickle_data(data)
    if len(data['info']) >= 100:  # file size limited for read/write speed
        create_new_pickle_file()

    # intensive section...
    # move files from "wip" (Work In Progress) dir to "complete"
    if number_of_pickle_files >= 100:
        compress_pickle_files()  # with lzma
        move_compressed_files_to_another_dir()
My main issue is that the compressing and moving of the files takes several seconds and is therefore slowing down my main loop. What is the easiest way to call these functions in a non-blocking way, without any major modifications to my existing code? I don't need any return value from the functions, but they will raise an error if anything fails.
Another "nice to have" would be for the pickle.dump() calls to also be non-blocking. Again, I am not interested in the return value beyond "did it raise an error?".
I am aware that unpickle/append/re-pickle on every loop is not particularly efficient, but it avoids data loss when the API drops out due to connection issues, server errors, etc.
I have zero knowledge of threading, multiprocessing, asyncio, etc., and after much searching I am currently more confused than I was 2 days ago!
FYI, all of the file related functions are in a separate module/class, so that could be made asynchronous if necessary.
EDIT:
There may be multiple calls to the above functions, so I guess some sort of queuing will be required?
The easiest solution is probably the threading standard-library module. This will allow you to spawn a thread to do the compression while your main loop continues.
There is almost certainly quite a bit of 'dead time' in your existing loop waiting for the API to respond, and conversely quite a bit of time spent on compression when you could usefully be making another API call. For this reason I'd suggest separating these two aspects. There are lots of good tutorials on threading, so I'll just describe a pattern you could aim for; a sketch follows the steps below.
Keep the API call and the pickling in the main loop but add a step which passes the file path to each pickle to a queue after it is written
Write a function which takes the queue as its input and works through the file paths, performing the compression
Before starting the main loop, start a thread with the new function as its target
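A minimal sketch of that pattern, assuming the same lzma compression you mentioned (the function and directory names are placeholders, not taken from your code):

import lzma
import queue
import shutil
import threading

compress_queue = queue.Queue()

def compress_worker(q):
    # Background thread: take finished pickle paths off the queue,
    # compress each with lzma, and move the result to the "complete" dir
    # (assumes a 'complete' directory exists).
    while True:
        path = q.get()
        with open(path, 'rb') as src, lzma.open(path + '.xz', 'wb') as dst:
            shutil.copyfileobj(src, dst)
        shutil.move(path + '.xz', 'complete/')
        q.task_done()

threading.Thread(target=compress_worker, args=(compress_queue,), daemon=True).start()

# In the main loop, instead of compressing inline, just do:
# compress_queue.put(path_to_finished_pickle)

One caveat: an exception raised inside the worker thread will not propagate to your main loop, so if you need "did it raise an error?", catch exceptions in the worker and report them (e.g. log them or push them onto a second queue). Also, a daemon thread is killed when the main program exits, so call compress_queue.join() first if you need pending work to finish.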
I've built some websites using Flask before, including one which used websockets, but this time I'm not sure how to begin.
I currently have an endless loop in Python which gets sensor data from a ZeroMQ socket. It roughly looks like this:
import zeromq

socket = zeromq.create_socket()

while True:
    data_dict = socket.receive_json()
    print data_dict  # {'temperature': 34.6, 'speed': 12.8, etc.}
I now want to create a dashboard showing the incoming sensor data in real time in some nice charts. Since it's in Python and I'm familiar with Flask and websockets I would like to use that.
The websites I built before were basic request/reply based ones though. How on earth would I create a Flask website from a continuous loop?
The web page will only be interested in the latest value within some reasonable interval from the user's point of view, say 3 seconds, so you can retrieve values in the background using a separate thread.
This is an example of how to use the threading module to update a latest value in the background:
import threading
import random
import time

_last_value = None

def get_last_value():
    return _last_value

def retrieve_value():
    global _last_value
    while True:
        _last_value = random.randint(1, 100)
        time.sleep(3)

threading.Thread(target=retrieve_value, daemon=True).start()

for i in range(20):
    print(i, get_last_value())
    time.sleep(1)
In your case, it would be something like:
import threading
import zeromq

_socket = zeromq.create_socket()
_last_data_dict = {}

def get_latest_data():
    return _last_data_dict

def retrieve_value():
    global _last_data_dict
    while True:
        _last_data_dict = _socket.receive_json()

threading.Thread(target=retrieve_value, daemon=True).start()
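To serve that from Flask without websockets, the simplest approach is an endpoint returning the latest dict as JSON, which the dashboard page polls from JavaScript. A minimal sketch (the route name is my choice):

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/data')
def data():
    # The dashboard page polls this endpoint (e.g. once a second)
    # and redraws its charts from the returned JSON.
    return jsonify(get_latest_data())

if __name__ == '__main__':
    # use_reloader=False so the background thread isn't started twice
    app.run(use_reloader=False)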
Basically, what you need is some form of storage two processes can access at the same time.
If you don't want to leave the comfort of a single python executable, you should look into threading:
https://docs.python.org/2/library/thread.html
Otherwise, you could write two different Python scripts (one for sensor readout, one for Flask), have one write to a file and the other read from it (or use a pipe on Linux; I have no idea what Windows offers), and run both processes at the same time, letting your OS handle the "threading".
The second approach has the advantage that your OS takes care of performance, but you lose a lot of freedom in locking and reading the file. There may be some weird behavior if your server reads at the instant your sensor script writes, but I did similar things without problems, and I dimly recall that an OS should keep the file in a consistent state whenever it's read or written.
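If you go the two-script route, one way to sidestep the half-written-file worry is to write to a temporary file and rename it into place; on POSIX the rename is atomic, so the reader never sees a partial file. A sketch (assuming Python 3; the file names are mine):

import json
import os
import tempfile

def write_latest(data, path="latest.json"):
    # Write to a temp file in the target directory, then rename into place.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(data, f)
    os.replace(tmp, path)   # atomic on POSIX

# The Flask process can then simply read latest.json on each request.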
I am downloading pictures from the internet, and as it turns out, I need to download lots of them. I am using a version of the following code fragment (in practice, looping through the links I intend to download and retrieving each picture):
import urllib
urllib.urlretrieve(link, filename)
I am downloading roughly 1000 pictures every 15 minutes, which is awfully slow based on the number of pictures I need to download.
For efficiency, I set a timeout of 5 seconds (still, many downloads take much longer):
import socket
socket.setdefaulttimeout(5)
Besides running a job on a computer cluster to parallelize downloads, is there a way to make the picture download faster / more efficient?
My code above was very naive, as I did not take advantage of multi-threading. It obviously takes time for URL requests to be answered, but there is no reason why the computer cannot make further requests while waiting for the server to respond.
With the following adjustments, you can improve efficiency by 10x; there are further ways to improve efficiency, with packages such as scrapy.
To add multi-threading, do something like the following, using the thread pool from the multiprocessing package:
1) Encapsulate the URL retrieval in a function:
import urllib.request

def geturl(link, i):
    try:
        urllib.request.urlretrieve(link, str(i) + ".jpg")
    except Exception:
        pass  # skip links that fail to download
2) Then create a collection with all URLs, as well as the names you want for the downloaded pictures:
urls = [url1,url2,url3,urln]
names = [i for i in range(0,len(urls))]
3) Import the Pool class from the multiprocessing package and create an object of that class (obviously you would put all imports at the top of your code in a real program):
from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(100)
Then use the pool.starmap() method and pass it the function and the function's arguments:
results = pool.starmap(geturl, zip(urls, names))
Note: pool.starmap() is only available in Python 3 (it was added in 3.3).
When a program enters I/O wait, the execution is paused so that the kernel can perform the low-level operations associated with the I/O request (this is called a context switch) and is not resumed until the I/O operation is completed.
Context switching is quite a heavy operation. It requires us to save the state of our program (losing any sort of caching we had at the CPU level) and give up the use of the CPU. Later, when we are allowed to run again, we must spend time reinitializing our program on the motherboard and getting ready to resume (of course, all this happens behind the scenes).
With concurrency, on the other hand, we typically have a thing called an “event loop” running that manages what gets to run in our program, and when. In essence, an event loop is simply a list of functions that need to be run. The function at the top of the list gets run, then the next, etc.
The following shows a simple example of an event loop:
from Queue import Queue
from functools import partial

eventloop = None

class EventLoop(Queue):
    def start(self):
        while True:
            function = self.get()
            function()

def do_hello():
    global eventloop
    print "Hello"
    eventloop.put(do_world)

def do_world():
    global eventloop
    print "world"
    eventloop.put(do_hello)

if __name__ == "__main__":
    eventloop = EventLoop()
    eventloop.put(do_hello)
    eventloop.start()
If the above seems like something you may use, and you'd also like to see how gevent, tornado, and AsyncIO, can help with your issue, then head out to your (University) library, check out High Performance Python by Micha Gorelick and Ian Ozsvald, and read pp. 181-202.
Note: above code and text are from the book mentioned.
I need something like a temporary in-memory key-value store. I know there are solutions like Redis. But I wonder if using a python dictionary could work? And potentially be even faster?
So think a Tornado (or similar) server running and holding a python dictionary in memory and just return the appropriate value based on the HTTP request.
Why do I need this?
As part of a service, key-values are being stored, but they have this property: the more recent they are, the more likely they are to be accessed. So I want to keep, say, the last 100 key-values in memory (as well as writing them to disk) for faster retrieval.
If the server dies the dictionary can be restored again from disk.
Has anyone done something like this? Am I totally missing something here?
PS: I think it's not possible with a WSGI server, right? Because as far as I know you can't keep something in memory between individual requests.
I'd definitely work with memcached. Once it has been set up, you can easily decorate your functions/methods as in my example:
#!/usr/bin/env python
import time
import memcache
import hashlib

def memoize(f):
    def newfn(*args, **kwargs):
        mc = memcache.Client(['127.0.0.1:11211'], debug=0)
        # generate an md5 key out of the args and the function
        m = hashlib.md5()
        margs = [x.__repr__() for x in args]
        mkwargs = [x.__repr__() for x in kwargs.values()]
        map(m.update, margs + mkwargs)
        m.update(f.__name__)
        m.update(f.__class__.__name__)
        key = m.hexdigest()
        value = mc.get(key)
        if value:
            return value
        else:
            # cache miss: compute, store for 60 seconds, return
            value = f(*args, **kwargs)
            mc.set(key, value, 60)
            return value
    return newfn

@memoize
def expensive_function(x):
    time.sleep(5)
    return x

if __name__ == '__main__':
    print expensive_function('abc')
    print expensive_function('abc')
Don't worry about network latency; that kind of optimization would be a waste of your time.
An in process Python dictionary is way faster than a memcached server. According to a non-rigorous benchmark that I performed some days ago, a single get takes around 2us using an in process python dictionary and around 50us using a memcached server listening on localhost. In my benchmark, I was using libmemcached as C client and python-libmemcached as python wrapper over this C-client.
If you are bundling the dictionary into the same server as is running your actual service, then yes, that would work fine.
If you're creating separate things, well, this is basically what memcached is for. Don't reinvent the wheel.
I am experimenting with something similar, and the cachecore library is a great way to test a few caching systems.
https://pypi.python.org/pypi/cachecore
In particular, their SimpleCache implementation relies on a vanilla Python dict, and in my preliminary tests it's extremely fast: 10x faster than calling memcached locally (assuming I'm already in the Python application that needs caching, probably the Tornado service in your case).
It's possible, and it is much faster than Redis/memcached because there is no network latency. You can use cPickle to dump the dictionary every once in a while. It's tricky, though, if your program spawns subprocesses: updating the values in one process doesn't affect the other.
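A sketch of the dump/restore part (using the Python 3 pickle module, which picks the C implementation automatically; cPickle is the Python 2 name, and the file name is mine):

import pickle

def save_cache(cache, path="cache.pkl"):
    with open(path, "wb") as f:
        pickle.dump(cache, f)

def load_cache(path="cache.pkl"):
    # Restore the dict after a restart; start empty if no dump exists yet.
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return {}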
You could just cache the latest data in a dict; nothing prohibits it, and it works in a one-server environment.
When new data is added, also store it to some Redis (or memcachedb) instance.
When the server restarts, just load the newest N records into the dictionary.
It all depends on data volume. I believe keeping complex structures in a Python dictionary takes more memory, though access will be fast, yes.
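A minimal sketch of the "newest N records in memory" idea (class and parameter names are mine; the fallback callable stands in for Redis or disk):

from collections import OrderedDict

class BoundedCache:
    # Keep only the most recently used N entries in memory;
    # misses fall back to a slower store (disk, Redis, ...).
    def __init__(self, capacity=100, fallback=None):
        self.capacity = capacity
        self.fallback = fallback        # e.g. a function key -> value
        self._data = OrderedDict()

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)     # mark as recently used
            return self._data[key]
        if self.fallback is not None:
            value = self.fallback(key)
            self.set(key, value)
            return value
        raise KeyError(key)

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the oldest entry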
I am using Python 2.6 and the multiprocessing module for multi-threading. Now I would like to have a synchronized dict (where the only atomic operation I really need is the += operator on a value).
Should I wrap the dict with a multiprocessing.sharedctypes.synchronized() call? Or is another way the way to go?
Intro
There seem to be a lot of armchair suggestions and no working examples. None of the answers listed here even suggests using multiprocessing, which is quite disappointing and disturbing. As Python lovers we should support our built-in libraries, and while parallel processing and synchronization are never trivial, I believe they can be made trivial with proper design. This is becoming extremely important in modern multi-core architectures and cannot be stressed enough! That said, I am far from satisfied with the multiprocessing library: it is still in its infancy, with quite a few pitfalls and bugs, and it is geared towards functional programming (which I detest). Currently I still prefer the Pyro module (which is way ahead of its time) over multiprocessing, due to multiprocessing's severe limitation of being unable to share newly created objects while the server is running. The "register" class method of the manager objects will only actually register an object BEFORE the manager (or its server) is started. Enough chatter, more code:
Server.py
from multiprocessing.managers import SyncManager

class MyManager(SyncManager):
    pass

syncdict = {}

def get_dict():
    return syncdict

if __name__ == "__main__":
    MyManager.register("syncdict", get_dict)
    manager = MyManager(("127.0.0.1", 5000), authkey="password")
    manager.start()
    raw_input("Press any key to kill server".center(50, "-"))
    manager.shutdown()
In the above code example, Server.py makes use of multiprocessing's SyncManager, which can supply synchronized shared objects. This code will not work when run in the interpreter, because the multiprocessing library is quite touchy about how to find the "callable" for each registered object. Running Server.py will start a customized SyncManager that shares the syncdict dictionary among multiple processes; clients can connect to it from the same machine or, if it is run on an IP address other than loopback, from other machines. In this case the server runs on loopback (127.0.0.1) on port 5000. The authkey parameter authenticates the connections used to manipulate syncdict. When any key is pressed, the manager is shut down.
Client.py
from multiprocessing.managers import SyncManager
import sys, time

class MyManager(SyncManager):
    pass

MyManager.register("syncdict")

if __name__ == "__main__":
    manager = MyManager(("127.0.0.1", 5000), authkey="password")
    manager.connect()
    syncdict = manager.syncdict()
    print "dict = %s" % (dir(syncdict))
    key = raw_input("Enter key to update: ")
    inc = float(raw_input("Enter increment: "))
    sleep = float(raw_input("Enter sleep time (sec): "))
    try:
        # if the key doesn't exist, create it
        if not syncdict.has_key(key):
            syncdict.update([(key, 0)])
        # increment the key's value every `sleep` seconds,
        # then print syncdict
        while True:
            syncdict.update([(key, syncdict.get(key) + inc)])
            time.sleep(sleep)
            print "%s" % (syncdict)
    except KeyboardInterrupt:
        print "Killed client"
The client must also create a customized SyncManager, registering "syncdict", this time without passing in a callable to retrieve the shared dict. It then uses the customized SyncManager to connect to the loopback IP address (127.0.0.1) on port 5000 with an authkey, establishing an authenticated connection to the manager started in Server.py. It retrieves the shared dict syncdict by calling the registered callable on the manager. It then prompts the user for the following:
The key in syncdict to operate on
The amount to increment the value accessed by the key every cycle
The amount of time to sleep per cycle in seconds
The client then checks to see if the key exists. If it doesn't it creates the key on the syncdict. The client then enters an "endless" loop where it updates the key's value by the increment, sleeps the amount specified, and prints the syncdict only to repeat this process until a KeyboardInterrupt occurs (Ctrl+C).
Annoying problems
The Manager's register methods MUST be called before the manager is started; otherwise you will get exceptions, even though a dir call on the Manager will reveal that it does indeed have the method that was registered.
All manipulations of the dict must be done with methods and not dict assignments (syncdict["blast"] = 2 will fail miserably because of the way multiprocessing shares custom objects)
Using SyncManager's dict method would alleviate annoying problem #2 except that annoying problem #1 prevents the proxy returned by SyncManager.dict() being registered and shared. (SyncManager.dict() can only be called AFTER the manager is started, and register will only work BEFORE the manager is started so SyncManager.dict() is only useful when doing functional programming and passing the proxy to Processes as an argument like the doc examples do)
The server AND the client both have to register even though intuitively it would seem like the client would just be able to figure it out after connecting to the manager (Please add this to your wish-list multiprocessing developers)
Closing
I hope you enjoyed this quite thorough and slightly time-consuming answer as much as I did. I was having a great deal of trouble getting straight in my mind why I was struggling so much with the multiprocessing module where Pyro makes it a breeze, and thanks to this answer I have hit the nail on the head. I hope this is useful to the Python community as to how to improve the multiprocessing module, as I do believe it has a great deal of promise but in its infancy falls short of what is possible. Despite the annoying problems described, I think this is still quite a viable alternative and is pretty simple. You could also use SyncManager.dict() and pass it to Processes as an argument the way the docs show; that would probably be an even simpler solution, depending on your requirements, but it feels unnatural to me.
I would dedicate a separate process to maintaining the "shared dict": just use e.g. xmlrpclib to make that tiny amount of code available to the other processes, exposing via xmlrpclib e.g. a function taking key and increment to perform the increment, and one taking just the key and returning the value, with semantic details (is there a default value for missing keys, etc.) depending on your app's needs.
Then you can use any approach you like to implement the shared-dict dedicated process: all the way from a single-threaded server with a simple dict in memory, to a simple sqlite DB, etc, etc. I suggest you start with code "as simple as you can get away with" (depending on whether you need a persistent shared dict, or persistence is not necessary to you), then measure and optimize as and if needed.
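A minimal sketch of that dedicated process, using the Python 2 standard library (function names like increment and get_value are mine):

# dict_server.py - the one process that owns the dict
from SimpleXMLRPCServer import SimpleXMLRPCServer

shared = {}

def increment(key, amount):
    shared[key] = shared.get(key, 0) + amount
    return shared[key]

def get_value(key):
    return shared.get(key, 0)

server = SimpleXMLRPCServer(("127.0.0.1", 8000))
server.register_function(increment)
server.register_function(get_value)
server.serve_forever()

# From any other process:
#   import xmlrpclib
#   proxy = xmlrpclib.ServerProxy("http://127.0.0.1:8000")
#   proxy.increment("counter", 1)

Since SimpleXMLRPCServer handles one request at a time by default, each increment runs to completion before the next begins, which gives you the atomic += for free.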
Regarding an appropriate solution to the concurrent-write issue: I did some very quick research and found that this article suggests a lock/semaphore solution (http://effbot.org/zone/thread-synchronization.htm).
While the example isn't specific to a dictionary, I'm pretty sure you could code a class-based wrapper object to help you work with dictionaries based on this idea; see the sketch after the snippet below.
If I had a requirement to implement something like this in a thread-safe manner, I'd probably use the Python semaphore solution (assuming my earlier merge technique wouldn't work). I believe that semaphores generally slow down thread throughput due to their blocking nature.
From the site:
A semaphore is a more advanced lock mechanism. A semaphore has an internal counter rather than a lock flag, and it only blocks if more than a given number of threads have attempted to hold the semaphore. Depending on how the semaphore is initialized, this allows multiple threads to access the same code section simultaneously.
import threading

semaphore = threading.BoundedSemaphore()

semaphore.acquire()  # decrements the counter
# ... access the shared resource: work with the dictionary, add an item, etc.
semaphore.release()  # increments the counter
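A rough sketch of the class-based wrapper idea mentioned above (my own naming; note this synchronizes threads within one process, not separate processes):

import threading

class LockedDict:
    # Serializes all access to a plain dict behind a single lock.
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def incr(self, key, amount=1):
        with self._lock:
            self._data[key] = self._data.get(key, 0) + amount
            return self._data[key]

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)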
Is there a reason that the dictionary needs to be shared in the first place? Could you have each thread maintain their own instance of a dictionary and either merge at the end of the thread processing or periodically use a call-back to merge copies of the individual thread dictionaries together?
I don't know exactly what you are doing, so keep in mind that my written plan may not work verbatim. What I'm suggesting is more of a high-level design idea.