I am used to coding in C/C++, and when I see the following array operation, I feel like some CPU is being wasted:
version = '1.2.3.4.5-RC4' # the end can vary a lot
api = '.'.join( version.split('.')[0:3] ) # extract '1.2.3'
Therefore I wonder:
Will this line be executed (interpreted) by creating a temporary array (memory allocation) and then concatenating the first three cells (another memory allocation)?
Or is the Python interpreter smart enough?
(I am also curious about optimizations made in this context by Pythran, Parakeet, Numba, Cython, and other Python interpreters/compilers...)
Is there a trick to write a replacement line that is more CPU-efficient and still understandable/elegant?
(You can provide specific Python2 and/or Python3 tricks and tips)
I have no idea of the actual CPU cost here, but isn't that, in a way, why we use high-level languages?
Another solution would be using regular expressions; using a compiled pattern should allow background optimisations:
import re

version = '1.2.3.4.5-RC4'
pat = re.compile(r'^(\d+\.\d+\.\d+)')
res = pat.match(version)
if res:
    print res.group(1)
Edit: As suggested by @jonrsharpe, I also ran the timeit benchmark. Here are my results:
def extract_vers(version_str):
    res = pat.match(version_str)
    if res:
        return res.group(1)
    else:
        return False
>>> timeit.timeit("api1(s)", setup="from __main__ import extract_vers,api1,api2; s='1.2.3.4.5-RC4'")
1.9013631343841553
>>> timeit.timeit("api2(s)", setup="from __main__ import extract_vers,api1,api2; s='1.2.3.4.5-RC4'")
1.3482811450958252
>>> timeit.timeit("extract_vers(s)", setup="from __main__ import extract_vers,api1,api2; s='1.2.3.4.5-RC4'")
1.174590826034546
Edit: In any case, some libraries already exist in Python to do the job, such as distutils.version.
You should have a look at that answer.
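For example, a minimal sketch using distutils.version (note that distutils.version is deprecated in newer Pythons in favour of the third-party packaging library): LooseVersion parses the string into components, and version objects compare numerically rather than lexicographically.
from distutils.version import LooseVersion

v = LooseVersion('1.2.3.4.5-RC4')
print(v.version)                                       # the parsed components of the string
print(LooseVersion('1.2.3') < LooseVersion('1.2.10'))  # True: numeric, not lexicographic comparison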
To answer your first question: no, this will not be optimised out by the interpreter. Python will create a list from the string, then create a second list for the slice, then put the list items back together into a new string.
To cover the second, you can optimise this slightly by limiting the split with the optional maxsplit argument:
>>> v = '1.2.3.4.5-RC4'
>>> v.split(".", 3)
['1', '2', '3', '4.5-RC4']
Once the third '.' is found, Python stops searching through the string. You can also neaten it slightly by dropping the default 0 from the slice:
api = '.'.join(version.split('.', 3)[:3])
Note, however, that any difference in performance is negligible:
>>> import timeit
>>> def test1(version):
...     return '.'.join(version.split('.')[0:3])
>>> def test2(version):
...     return '.'.join(version.split('.', 3)[:3])
>>> timeit.timeit("test1(s)", setup="from __main__ import test1, test2; s = '1.2.3.4.5-RC4'")
1.0458565345561743
>>> timeit.timeit("test2(s)", setup="from __main__ import test1, test2; s = '1.2.3.4.5-RC4'")
1.0842980287537776
The benefit of maxsplit becomes clearer with longer strings containing more irrelevant '.'s:
>>> timeit.timeit("s.split('.')", setup="s='1.'*100")
3.460900054011617
>>> timeit.timeit("s.split('.', 3)", setup="s='1.'*100")
0.5287887450379003
I am used to coding in C/C++, and when I see the following array operation, I feel like some CPU is being wasted:
A feeling of wasted CPU is absolutely normal for C/C++ programmers facing Python code. Your code:
version = '1.2.3.4.5-RC4' # the end can vary a lot
api = '.'.join(version.split('.')[0:3]) # extract '1.2.3'
is absolutely fine in Python; there is no further simplification possible. Only if you have to do it thousands of times should you consider using a library function or writing your own.
Related
I have a large list of various types of objects I would like Z3 to synthesize in my Python project. Since the constraints associated with each object to be synthesized are independent, this process can be completely parallelized. That is, instead of synthesizing one value at a time, if I have a machine with 4 cores, I can synthesize 4 values at the same time. To do this, we must use Python's multiprocessing package instead of threading (due to the GIL and the fact that the workload should be CPU-bound).
For simplicity, say I have a simple str synthesizer that synthesizes a new str that is lexicographically less than a given input value, something like this:
def lt_constraint(value):
    solver = Solver()
    # do a number of processing steps on 'value', which is an input string
    # ... define offset, char and _chars in code here
    var = String("var")
    template = Concat(Re(StringVal(value[:offset])), char, Star(_chars))
    solver.add(InRe(var, template))
    if solver.check() == sat:
        value = solver.model()[var]
        return convert_to_str(value)
Now if I have a number of values, I want to run the function above in parallel:
from pathos.multiprocessing import ProcessingPool as Pool

with Pool(processes=4) as pool:
    value_list = ['This', 'is', 'an', 'example']
    synthesized_strs = pool.map(lt_constraint, value_list)
I use pathos hoping that it will handle the pickling issue, but I still received this error:
TypeError: cannot pickle 're.Match' object
which I believe is because Z3 uses methods in re and they need to be pickled when pickling lt_constraint(), but dill cannot pickle those.
Is there any other way to parallelize Z3 for my case (other than implementing pickling myself for re or what not)?
Thanks!
Stack Overflow works best when you include your whole code, so people can experiment with it. Having said that, I had good luck with the following:
from z3 import *
import concurrent.futures
def getVal(value):
    solver = Solver()
    var = Int('var')
    solver.add(var > value)
    if solver.check() == sat:
        return solver.model()[var].as_long()
    else:
        return 'CANT SOLVE'

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(getVal, i) for i in [1, 2, 3]]
    results = [f.result() for f in futures]
    print(results)
This prints:
$ python3.9 a.py
[2, 3, 4]
Without actually having the details of how you constructed your lt_constraint, it's hard to tell whether this will work for your case. But the concurrent.futures library seems to work well with z3, at least as far as simple constraints are concerned. Give this a try and see if it handles your case as well. If not, please post the full code as a minimal reproducible example; see https://stackoverflow.com/help/minimal-reproducible-example
I'm slicing a fairly big pandas Series (~5M elements) using .loc, and I stumbled upon some weird behavior while checking times in an attempt to optimize my code.
The weird part is that the first slicing attempt, like series_object.loc[some_indexes], takes about 100x longer than the following ones.
timeit does not reflect this behaviour, but when checking the individual laps using time, we can see that the first lap takes much longer than the following ones.
Is .loc using some sort of caching? If so, why does garbage collection not influence this?
Is timeit doing the caching even with the garbage collector disabled, and not behaving as it's supposed to?
Which timing should I trust as what my app will take when running in a live production environment?
I tried this on windows and linux machines using different versions of python (3.6, 3.7 and 2.7) and the behavior is always the same.
Thanks in advance for your help. This has been banging around in my head for a week already, and I miss not doubting %timeit :)
To reproduce:
Save the following code to a Python file, e.g. test_loc_times.py:
import pandas as pd
import numpy as np
import timeit
import time, gc

def get_data():
    ids = np.arange(size_bigseries)
    big_series = pd.Series(index=ids, data=np.random.rand(len(ids)), name='{} elements series'.format(len(ids)))
    small_slice = np.arange(size_slice)
    return big_series, small_slice

# Method to test: a simple pandas slicing with .loc
def basic_loc_indexing(pd_series, slice_ids):
    return pd_series.loc[slice_ids].dropna()

# method to time it
def timing_it(func, n, *args):
    gcold = gc.isenabled()
    gc.disable()
    times = []
    for i in range(n):
        s = time.time()
        func(*args)
        times.append((time.time()-s)*1000)
    if gcold:
        gc.enable()
    return times

if __name__ == '__main__':
    import sys
    n_tries = int(sys.argv[1]) if len(sys.argv)>1 and sys.argv[1] is not None else 1000
    size_bigseries = int(sys.argv[2]) if len(sys.argv)>2 and sys.argv[2] is not None else 5000000 #5M
    size_slice = int(sys.argv[3]) if len(sys.argv)>3 and sys.argv[3] is not None else 100 #100

    #1: timeit()
    big_series, small_slice = get_data()
    time_with_timeit = timeit.timeit('basic_loc_indexing(big_series, small_slice)',"gc.disable(); from __main__ import basic_loc_indexing, big_series, small_slice",number=n_tries)
    print("using timeit: {:.6f}ms".format(time_with_timeit/n_tries*1000))
    del big_series, small_slice

    #2: time()
    big_series, small_slice = get_data()
    time_with_time = timing_it(basic_loc_indexing, n_tries, big_series, small_slice)
    print("using time: {:.6f}ms".format(np.mean(time_with_time)))
    print('head detail: {}\n'.format(time_with_time[:5]))
To try it out:
Run
python test_loc_times.py 1000 5000000 100
This will run timeit and time 1000 laps on slicing 100 elements from a 5M pandas.Series.
You can try it yourself with other values; the first run always takes longer.
stdout:
>>> using timeit: 0.789754ms
>>> using time: 0.829869ms
>>> head detail: [145.02716064453125, 0.7691383361816406, 0.7028579711914062, 0.5738735198974609, 0.6380081176757812]
Weird, right?
Edit:
I found this answer which might be related. What do you think?
This code is likely not idempotent (has side effects that impact its execution).
timeit will run the code once first to measure the time and deduce the number of loops and runs it should use. If your code is not idempotent (has side effects, like caching) then that first run (not recorded) will be longer and the subsequent (faster) runs will be measured and reported.
You can take a look at the arguments you can pass to timeit (see the doc) to specify the number of loops and forgo that initial run.
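For instance, a rough sketch (reusing the names from your script) that makes the one-off cost visible is to ask timeit for individual single-call laps:
import timeit

# number=1 makes every measurement a single call, so the slow first call
# shows up on its own instead of being averaged away over many loops.
laps = timeit.repeat("basic_loc_indexing(big_series, small_slice)",
                     setup="from __main__ import basic_loc_indexing, big_series, small_slice",
                     repeat=5, number=1)
print(laps)  # the first entry is expected to be much larger than the rest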
Also note that (taken from the doc linked above):
The times reported by %timeit will be slightly higher than those reported by the timeit.py script when variables are accessed. This is due to the fact that %timeit executes the statement in the namespace of the shell, compared with timeit.py, which uses a single setup statement to import function or create variables. Generally, the bias does not matter as long as results from timeit.py are not mixed with those from %timeit.
Edit: Missed the fact that you were passing the number of runs to timeit. In that case, only the latter part of my answer applies, but the numbers you are seeing seem to point to another issue...
I see that from x import * is discouraged all over the place: it corrupts the namespace, etc.
So I'm inclined to use from . import x, and when I need to use the functions, I'll call x.func() instead of just using func().
The speed difference is probably very small, but I still want to know how much it might impact performance, so that I can keep the good habit without needing to worry about other things.
It has practically no impact:
>>> import timeit
>>> timeit.timeit('math.pow(1, 1)', 'import math')
0.20310196322982677
>>> timeit.timeit('pow(1, 1)', 'from math import pow')
0.19039931574786806
Note I picked a function that would have very little run time so that any difference would be magnified.
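One caveat with the snippet above: math.pow and the builtin pow are different functions, so part of the measured gap is not attribute lookup at all. A sketch that compares the same function through both access paths might look like this:
import timeit

# Same underlying function, accessed two ways; the only difference is the
# per-call attribute lookup on the module object.
print(timeit.timeit('m.sqrt(2)', setup='import math as m'))
print(timeit.timeit('sqrt(2)', setup='from math import sqrt'))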
I have been using pickle and was very happy with it; then I saw this article: Don't Pickle Your Data
Reading further it seems like:
Pickle is slow
Pickle is unsafe
Pickle isn’t human readable
Pickle isn’t language-agnostic
I’ve switched to saving my data as JSON, but I wanted to know about best practice:
Given all these issues, when would you ever use pickle? What specific situations call for using it?
Pickle is unsafe because it constructs arbitrary Python objects by invoking arbitrary functions. However, this also gives it the power to serialize almost any Python object, without any boilerplate or even white-/black-listing (in the common case). That's very desirable for some use cases:
Quick & easy serialization, for example for pausing and resuming a long-running but simple script. None of the concerns matter here, you just want to dump the program's state as-is and load it later.
Sending arbitrary Python data to other processes or computers, as in multiprocessing. The security concerns may apply (but mostly don't), the generality is absolutely necessary, and humans won't have to read it.
In other cases, none of the drawbacks is quite enough to justify the work of mapping your stuff to JSON or another restrictive data model. Maybe you don't expect to need human readability/safety/cross-language compatibility or maybe you can do without. Remember, You Ain't Gonna Need It. Using JSON would be the right thing™ but right doesn't always equal good.
You'll notice that I completely ignored the "slow" downside. That's because it's partially misleading: Pickle is indeed slower for data that fits the JSON model (strings, numbers, arrays, maps) perfectly, but if your data's like that you should use JSON for other reasons anyway. If your data isn't like that (very likely), you also need to take into account the custom code you'll need to turn your objects into JSON data, and the custom code you'll need to turn JSON data back into your objects. It adds both engineering effort and run-time overhead, which must be quantified on a case-by-case basis.
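As a minimal sketch of the "pause and resume a simple script" case (the Checkpoint class and file name are made up for illustration), pickle dumps an arbitrary object graph as-is and loads it back with no mapping code:
import pickle

class Checkpoint(object):
    def __init__(self, step, results):
        self.step = step
        self.results = results   # tuples, sets, nested objects... all fine

state = Checkpoint(step=42, results={('a', 1), ('b', 2)})

# dump the program's state as-is...
with open('state.pkl', 'wb') as f:
    pickle.dump(state, f)

# ...and load it back later, getting the same object graph
with open('state.pkl', 'rb') as f:
    restored = pickle.load(f)

print(restored.step, restored.results)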
Pickle has the advantage of convenience -- it can serialize arbitrary object graphs with no extra work, and works on a pretty broad range of Python types. With that said, it would be unusual for me to use Pickle in new code. JSON is just a lot cleaner to work with.
I usually use neither pickle nor JSON, but MessagePack: it is both safe and fast, and produces compact serialized data.
An additional advantage is the possibility of exchanging data with software written in other languages (which of course is also true in the case of JSON).
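For illustration, a minimal round-trip looks like this (assuming the msgpack package is installed; like JSON it only handles basic types, and on older msgpack versions raw=False is needed to get str back instead of bytes):
import msgpack

payload = {'name': 'sensor-1', 'values': [1, 2, 3], 'ok': True}

packed = msgpack.packb(payload)                # compact bytes
restored = msgpack.unpackb(packed, raw=False)  # decode string keys/values back to str

print(len(packed), restored)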
I have tried several methods and found that using cPickle with the protocol argument of the dumps method set to the highest protocol, i.e. cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL), is the fastest dump method.
import msgpack
import json
import pickle
import timeit
import cPickle
import numpy as np
num_tests = 10
obj = np.random.normal(0.5, 1, [240, 320, 3])
command = 'pickle.dumps(obj)'
setup = 'from __main__ import pickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("pickle: %f seconds" % result)
command = 'cPickle.dumps(obj)'
setup = 'from __main__ import cPickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("cPickle: %f seconds" % result)
command = 'cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL)'
setup = 'from __main__ import cPickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("cPickle highest: %f seconds" % result)
command = 'json.dumps(obj.tolist())'
setup = 'from __main__ import json, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("json: %f seconds" % result)
command = 'msgpack.packb(obj.tolist())'
setup = 'from __main__ import msgpack, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("msgpack: %f seconds" % result)
Output:
pickle : 0.847938 seconds
cPickle : 0.810384 seconds
cPickle highest: 0.004283 seconds
json : 1.769215 seconds
msgpack : 0.270886 seconds
So, I prefer cPickle with the highest dumping protocol in situations that require real-time performance, such as video streaming from a camera to a server.
You can find some answers on JSON vs. pickle security; in short, JSON can only serialize unicode, int, float, NoneType, bool, list and dict. You can't use it if you want to serialize more advanced objects such as class instances. Note that for those kinds of objects there is no hope of being language-agnostic.
Also, using cPickle instead of pickle partially resolves the speed problem.
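A tiny illustration of that limitation (the Point class is made up for the example): json refuses a class instance outright, while pickle handles it with no extra code.
import json
import pickle

class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(1, 2)

try:
    json.dumps(p)
except TypeError as e:
    print('json failed: %s' % e)   # class instances are not JSON serializable

data = pickle.dumps(p)             # works out of the box
print(pickle.loads(data).x)        # 1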
I have a python console application that contains 300+ regular expressions. The set of regular expressions is fixed for each release. When users run the app, the entire set of regular expressions will be applied anywhere from once (a very short job) to thousands of times (a long job).
I would like to speed up the shorter jobs by compiling the regular expressions up front, pickle the compiled regular expressions to a file, and then load that file when the application is run.
The python re module is efficient and the regex compilation overhead is quite acceptable for long jobs. For short jobs, however, it is a large proportion of the overall run-time. Some users will want to run many small jobs to fit into their existing workflows. Compiling the regular expressions takes about 80ms. A short job might take 20ms-100ms excluding regular expression compilation. So for short jobs, the overhead can be 100% or more. This is with Python27 under both Windows and Linux.
The regular expressions must be applied with the DOTALL flag, so they need to be compiled prior to use. A large compilation cache clearly doesn't help in this instance. As some have pointed out, the default method of serialising a compiled regular expression doesn't actually do much.
The re and sre modules compile the patterns into a little custom language with its own opcodes and some auxiliary data structures (e.g., for charsets used in an expression). The pickle function in re.py takes the easy way out. It is:
def _pickle(p):
    return _compile, (p.pattern, p.flags)

copy_reg.pickle(_pattern_type, _pickle, _compile)
I think that a good solution to the problem would be an update to the definition of _pickle in re.py that actually pickled the compiled pattern object. Unfortunately, this goes beyond my python skills. I bet, however, that someone here knows how to do it.
I realise that I am not the first person to ask this question - but perhaps you can be the first person to give an accurate and useful response to it!
Your advice would be greatly appreciated.
OK, this isn't pretty, but it might be what you want. I looked at the sre_compile.py module from Python 2.6, and ripped out a bit of it, chopped it in half, and used the two pieces to pickle and unpickle compiled regexes:
import re, sre_compile, sre_parse, _sre
import cPickle as pickle

# the first half of sre_compile.compile
def raw_compile(p, flags=0):
    # internal: convert pattern list to internal format
    if sre_compile.isstring(p):
        pattern = p
        p = sre_parse.parse(p, flags)
    else:
        pattern = None
    code = sre_compile._code(p, flags)
    return p, code

# the second half of sre_compile.compile
def build_compiled(pattern, p, flags, code):
    # print code
    # XXX: <fl> get rid of this limitation!
    if p.pattern.groups > 100:
        raise AssertionError(
            "sorry, but this version only supports 100 named groups"
            )
    # map in either direction
    groupindex = p.pattern.groupdict
    indexgroup = [None] * p.pattern.groups
    for k, i in groupindex.items():
        indexgroup[i] = k
    return _sre.compile(
        pattern, flags | p.pattern.flags, code,
        p.pattern.groups-1,
        groupindex, indexgroup
        )

def pickle_regexes(regexes):
    picklable = []
    for r in regexes:
        p, code = raw_compile(r, re.DOTALL)
        picklable.append((r, p, code))
    return pickle.dumps(picklable)

def unpickle_regexes(pkl):
    regexes = []
    for r, p, code in pickle.loads(pkl):
        regexes.append(build_compiled(r, p, re.DOTALL, code))
    return regexes

regexes = [
    r"^$",
    r"a*b+c*d+e*f+",
    ]

pkl = pickle_regexes(regexes)
print pkl
print unpickle_regexes(pkl)
I don't really know if this works, or if it speeds things up. I know it prints a list of regexes when I try it. It might be very specific to version 2.6; I don't know that either.
As others have mentioned, you can simply pickle the compiled regex. They will pickle and unpickle just fine, and be usable. However, it doesn't look like the pickle actually contains the result of compilation. I suspect you will incur the compilation overhead again when you use the result of the unpickling.
>>> p.dumps(re.compile("a*b+c*"))
"cre\n_compile\np1\n(S'a*b+c*'\np2\nI0\ntRp3\n."
>>> p.dumps(re.compile("a*b+c*x+y*"))
"cre\n_compile\np1\n(S'a*b+c*x+y*'\np2\nI0\ntRp3\n."
In these two tests, you can see the only difference between the two pickles is in the string. Apparently compiled regexes don't pickle the compiled bits, just the string needed to compile it again.
But I'm wondering about your application overall: compiling a regex is a fast operation, so how short are your jobs that compiling the regexes is significant? One possibility is that you are compiling all 300 regexes and then only using one for a short job. In that case, don't compile them all up front. The re module is very good at using cached copies of compiled regexes, so you generally don't have to compile them yourself; just use the string form. The re module will look up the string in a dictionary of compiled regexes, so grabbing the compiled form yourself only saves you a dictionary lookup. I may be totally off-base, sorry if so.
Just compile as you go - the re module will cache the compiled regexes even if you don't. Bump re._MAXCACHE up to 400 or 500, and the short jobs will only compile the regexes they need, while the long jobs benefit from a big fat cache of compiled expressions - everybody's happy!
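A rough sketch of that approach (note that _MAXCACHE is a private, undocumented detail of the re module, so its name and default value can change between Python versions):
import re

re._MAXCACHE = 500   # enlarge the module's internal cache of compiled patterns

# Passing pattern strings to the module-level functions lets re cache the
# compiled form; repeated calls with the same pattern skip recompilation.
for line in ['abc123', 'no digits here']:
    if re.search(r'\d+', line):
        print(line)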
Some observations and musings:
You don't need to compile to get the effect of the re.DOTALL flag (or any other flag) -- all you need to do is insert (?s) at the start of the pattern string ... re.DOTALL -> re.S -> the s in (?s). Do a Ctrl-F search for sux (sic) in the re syntax docs (a quick demonstration follows after these notes).
80ms seems a very short time, even when multiplied by "many" (how many??) short jobs.
Does each job require a new Python process to be started? If so, isn't 80ms small compared with process startup and shutdown overhead? Otherwise, please explain why it is not possible, when a user wants to run "many" small jobs, to do the re.compiles once per batch of jobs.
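As promised in the first note above, a quick check that the inline (?s) flag behaves like compiling with re.DOTALL:
import re

text = 'first line\nsecond line'

with_flag = re.findall('first.*second', text, re.DOTALL)
inline = re.findall('(?s)first.*second', text)

print(with_flag == inline)   # True: '.' matches the newline in both cases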
In a similar case (where every input needs to be run through ALL of the regexes), I had to split the Python script into a master-slave setup using *nix sockets; the first time the script is called, the master (which does all the time-expensive regex compilations) starts up, and the slave for that and all subsequent invocations exchanges data with the master. The master stays idle for a maximum of N seconds.
In my case, this master/slave setup was found to be faster on all occasions than the straightforward way (many invocations against relatively little data every time; also, it had to be a script because it is called from an external application without any Python bindings). I don't know whether this would apply to your situation.
I had the same problem, and instead of patching Python's re module I opted to create a long-running regex "service". Basic code appended below. Please note: it is not designed to handle multiple clients in parallel, i.e. the server only becomes available again once a client has closed the connection.
server
from multiprocessing.connection import Client
from multiprocessing.connection import Listener
import re

class RegexService(object):
    patternsByRegex = None

    def __init__(self):
        self.patternsByRegex = {}

    def processMessage(self, message):
        regex = message.get('regex')
        result = {"error": None}
        if regex == None:
            result["error"] = "no regex in message - something is wrong with your client"
            return result
        text = message.get('text')
        pattern = self.patternsByRegex.get(regex)
        if pattern == None:
            print "compiling previously unseen regex: %s" %(regex)
            pattern = re.compile(regex, re.IGNORECASE)
            self.patternsByRegex[regex] = pattern
        if text == None:
            result["error"] = "no match"
            return result
        match = pattern.match(text)
        result["matchgroups"] = None
        if match == None:
            return result
        result["matchgroups"] = match.groups()
        return result

workAddress = ('localhost', 6000)
resultAddress = ('localhost', 6001)
listener = Listener(workAddress, authkey='secret password')
service = RegexService()
patterns = {}
while True:
    connection = listener.accept()
    resultClient = Client(resultAddress, authkey='secret password')
    while True:
        try:
            message = connection.recv()
            resultClient.send(service.processMessage(message))
        except EOFError:
            resultClient.close()
            connection.close()
            break
listener.close()
testclient
from multiprocessing.connection import Client
from multiprocessing.connection import Listener

workAddress = ('localhost', 6000)
resultAddress = ('localhost', 6001)
regexClient = Client(workAddress, authkey='secret password')
resultListener = Listener(resultAddress, authkey='secret password')
resultConnection = None

def getResult():
    global resultConnection
    if resultConnection == None:
        resultConnection = resultListener.accept()
    return resultConnection.recv()

regexClient.send({
    "regex": r'.*'
})
print str(getResult())

regexClient.send({
    "regex": r'.*',
    "text": "blub"
})
print str(getResult())

regexClient.send({
    "regex": r'(.*)',
    "text": "blub"
})
print str(getResult())

resultConnection.close()
regexClient.close()
output of test client run 2 times
$ python ./regexTest.py
{'error': 'no match'}
{'matchgroups': (), 'error': None}
{'matchgroups': ('blub',), 'error': None}
$ python ./regexTest.py
{'error': 'no match'}
{'matchgroups': (), 'error': None}
{'matchgroups': ('blub',), 'error': None}
output of service process during both test runs
$ python ./regexService.py
compiling previously unseen regex: .*
compiling previously unseen regex: (.*)
As long as you create them on program start, the pyc file will cache them. You don't need to resort to pickling.