According to this answer, the call len(s) has a complexity of O(1).
Then why is it that calling it on a downloaded 27 kB file is so much slower than on a 1 kB file?
27 kB:
>>> timeit.timeit('x = len(r.text)', 'from requests import get; r = get("https://cdn.discordapp.com/attachments/280190011918254081/293010649754370048/Journal.170203183244.01.log")', number = 20)
5.78126864130499
1 kB:
>>> timeit.timeit('x = len(r.text)', 'from requests import get; r = get("https://cdn.discordapp.com/attachments/280190011918254081/293016636288663562/Journal.170109120508.01.log")', number = 20)
0.00036539355403419904
The problem is that this example ran on my dev machine, a normal work PC. The machine the code will actually run on is a Raspberry Pi, which is orders of magnitude slower.
Try assigning r.text to a local variable during your setup phase. It's a lazy property, not a plain attribute, and you're timing the work of constructing the value, which decodes from the internally cached bytes to str, not just the len call.
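For example, a sketch of the adjusted benchmark (reusing the 27 kB URL from the question; the decode now happens once in the setup, so the loop times only the len call):
>>> timeit.timeit('x = len(t)', 'from requests import get; r = get("https://cdn.discordapp.com/attachments/280190011918254081/293010649754370048/Journal.170203183244.01.log"); t = r.text', number = 20)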
Hat tip to Martijn Pieters for the precise references!
Related
I'm looking at using Go to write a small program that's mostly handling text. I'm pretty sure, based on what I've heard about Go and Python, that Go will be substantially faster. I don't actually have a specific need for insane speeds, but I'd like to get to know Go.
The "Go is going to be faster" idea was supported by a trivial test:
# test.py
print("Hello world")
$ time python test.py
Hello world
real 0m0.029s
user 0m0.019s
sys 0m0.010s
// test.go
package main

import "fmt"

func main() {
    fmt.Println("hello world")
}
$ time ./test
hello world
real 0m0.001s
user 0m0.001s
sys 0m0.000s
Looks good in terms of raw startup speed (which is entirely expected). Highly non-scientific justification:
$ strace python test.py 2>&1 | wc -l
1223
$ strace ./test 2>&1 | wc -l
174
However, my next contrived test was how fast Go is when faffing with strings, and I was expecting to be similarly blown away by Go's raw speed. So, this was surprising:
# test2.py
s = ""
for i in range(1000000):
    s += "a"
$ time python test2.py
real 0m0.179s
user 0m0.145s
sys 0m0.013s
// test2.go
package main

func main() {
    s := ""
    for i := 0; i < 1000000; i++ {
        s += "a"
    }
}
$ time ./test2
real 0m56.840s
user 1m50.836s
sys 0m17.653s
So Go is hundreds of times slower than Python.
Now, I know this is probably due to Schlemiel the Painter's algorithm, which explains why the Go implementation is quadratic in i (making i 10 times bigger leads to a 100-times slowdown).
However, the Python implementation seems much faster: 10 times more loops only slows it down by a factor of two. The same effect persists if you concatenate str(i), so I doubt there's some kind of magical JIT optimization to s = 1000000 * 'a' going on. And it's not much slower if I print(s) at the end, so the variable isn't being optimised out.
Naivety of the concatenation methods aside (there are surely more idiomatic ways in each language), is there something here that I have misunderstood, or is it simply easier in Go than in Python to run into cases where you have to deal with C/C++-style algorithmic issues when handling strings (in which case a straight Go port might not be as uh-may-zing as I might hope without having to, y'know, think about things and do my homework)?
Or have I run into a case where Python happens to work well, but falls apart under more complex use?
Versions used: Python 3.8.2, Go 1.14.2
TL;DR summary: basically you're testing the two implementations' allocators / garbage collectors and heavily weighting the scale on the Python side (by chance, as it were, but this is something the Python folks optimized at some point).
To expand my comments into a real answer:
Both Go and Python have counted strings, i.e., strings are implemented as a two-element header thingy containing a length (byte count or, for Python 3 strings, Unicode character count) and a data pointer.
Both Go and Python are garbage-collected (GCed) languages. That is, in both languages, you can allocate memory without having to worry about freeing it yourself: the system takes care of that automatically.
But the underlying implementations differ quite a bit in one particularly important way: the version of Python you are using has a reference-counting GC. The Go system you are using does not.
With a reference count, the inner bits of the Python string handler can do this. I'll express it as Go (or at least pseudo-Go) although the actual Python implementation is in C and I have not made all the details line up properly:
// add (append) new string t to existing string s
func add_to_string(s, t string_header) string_header {
    need := s.len + t.len
    if s.refcount == 1 { // can modify string in-place
        data := s.data
        if cap(data) >= need {
            copy_into(data + s.len, t.data, t.len)
            return s
        }
    }
    // s is shared or s.cap < need
    new_s := make_new_string(roundup(need))
    // important: new_s has extra space for the next call to add_to_string
    copy_into(new_s.data, s.data, s.len)
    copy_into(new_s.data + s.len, t.data, t.len)
    s.refcount--
    if s.refcount == 0 {
        gc_release_string(s)
    }
    return new_s
}
By over-allocating—rounding up the need value so that cap(new_s) is large—we get about log2(n) calls to the allocator, where n is the number of times you do s += "a". With n being 1000000 (one million), that's about 20 times that we actually have to invoke the make_new_string function and release (for gc purposes because the collector uses refcounts as a first pass) the old string s.
[Edit: your source archaeology led to commit 2c9c7a5f33d, which suggests less than doubling but still a multiplicative increase. To other readers, see comment.]
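You can glimpse this behaviour from Python itself by counting how often the string's identity changes during the loop. This is a CPython-specific sketch: id() stability across += is an implementation detail, not a language guarantee, and a realloc that moves the block also changes the id, so this is only a rough probe:

s = ""
last_id = None
moves = 0
for i in range(1000000):
    s += "a"
    if id(s) != last_id:  # the string object was (re)allocated elsewhere
        moves += 1
        last_id = id(s)
print(moves)  # a small number on CPython, nowhere near one million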
The current Go implementation allocates strings without a separate capacity header field (see reflect.StringHeader and note the big caveat that says "don't depend on this, it might be different in future implementations"). Between the lack of a refcount—we can't tell in the runtime routine that adds two strings, that the target has only one reference—and the inability to observe the equivalent of cap(s) (or cap(s.data)), the Go runtime has to create a new string every time. That's one million memory allocations.
To show that the Python code really does use the refcount, take your original Python:
s = ""
for i in range(1000000):
s += "a"
and add a second variable t like this:
s = ""
t = s
for i in range(1000000):
s += "a"
t = s
The difference in execution time is impressive:
$ time python test2.py
0.68 real 0.65 user 0.03 sys
$ time python test3.py
34.60 real 34.08 user 0.51 sys
The modified Python program still beats Go (1.13.5) on this same system:
$ time ./test2
67.32 real 103.27 user 13.60 sys
and I have not poked any further into the details, but I suspect the Go GC is running more aggressively than the Python one. The Go GC is very different internally, requiring write barriers and occasional "stop the world" behavior (of all goroutines that are not doing the GC work). The refcounting nature of the Python GC allows it to never stop: even with a refcount of 2, the refcount on t drops to 1 and then the next assignment to t drops it to zero, releasing the memory block for re-use in the next trip through the main loop. So it's probably picking up the same memory block over and over again.
(If my memory is correct, Python's "over-allocate strings and check the refcount to allow expand-in-place" trick was not in all versions of Python. It may have first been added around Python 2.4 or so. This memory is extremely vague and a quick Google search did not turn up any evidence one way or the other. [Edit: Python 2.7.4, apparently.])
Well. You should never, ever use string concatenation in this way :-)
In Go, try strings.Builder:
package main

import (
    "strings"
)

func main() {
    var b1 strings.Builder
    for i := 0; i < 1000000; i++ {
        b1.WriteString("a")
    }
}
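(Call b1.String() to get the result.) For comparison, the idiomatic Python counterpart also avoids repeated concatenation; a sketch:

parts = []
for i in range(1000000):
    parts.append("a")
s = "".join(parts)  # one final allocation instead of a million
# or, for this degenerate case, simply:
s = "a" * 1000000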
I am used to code in C/C++ and when I see the following array operation, I feel some CPU wasting:
version = '1.2.3.4.5-RC4' # the end can vary a lot
api = '.'.join( version.split('.')[0:3] ) # extract '1.2.3'
Therefore I wonder:
Will this line be executed (interpreted) as the creation of a temporary array (memory allocation), then a concatenation of the first three cells (again, memory allocation)?
Or is the python interpreter smart enough?
(I am also curious about optimizations made in this context by Pythran, Parakeet, Numba, Cython, and other python interpreters/compilers...)
Is there a trick to write a replacement line more CPU efficient and still understandable/elegant?
(You can provide specific Python2 and/or Python3 tricks and tips)
I have no idea of the CPU usage for this purpose, but isn't that why we use high-level languages in some way?
Another solution would be using regular expressions; a compiled pattern should allow background optimisations:
import re
version = '1.2.3.4.5-RC4'
pat = re.compile(r'^(\d+\.\d+\.\d+)')
res = pat.match(version)
if res:
    print res.group(1)
Edit: As suggested by @jonrsharpe, I also ran the timeit benchmark. Here are my results:
def extract_vers(str):
    res = pat.match(str)
    if res:
        return res.group(1)
    else:
        return False
>>> timeit.timeit("api1(s)", setup="from __main__ import extract_vers,api1,api2; s='1.2.3.4.5-RC4'")
1.9013631343841553
>>> timeit.timeit("api2(s)", setup="from __main__ import extract_vers,api1,api2; s='1.2.3.4.5-RC4'")
1.3482811450958252
>>> timeit.timeit("extract_vers(s)", setup="from __main__ import extract_vers,api1,api2; s='1.2.3.4.5-RC4'")
1.174590826034546
Edit: Anyway, some libraries exist in Python to do the job, such as distutils.version.
You should have a look at that answer.
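A sketch of what that could look like (LooseVersion tokenizes the string, so the first three numeric components can be picked off; untested against exotic version formats):

from distutils.version import LooseVersion
version = LooseVersion('1.2.3.4.5-RC4')
api = '.'.join(str(part) for part in version.version[:3])  # '1.2.3'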
To answer your first question: no, this will not be optimised out by the interpreter. Python will create a list from the string, then create a second list for the slice, then put the list items back together into a new string.
To cover the second, you can optimise this slightly by limiting the split with the optional maxsplit argument:
>>> v = '1.2.3.4.5-RC4'
>>> v.split(".", 3)
['1', '2', '3', '4.5-RC4']
Once the third '.' is found, Python stops searching through the string. You can also neaten things slightly by removing the default 0 argument from the slice:
api = '.'.join(version.split('.', 3)[:3])
Note, however, that any difference in performance is negligible:
>>> import timeit
>>> def test1(version):
...     return '.'.join(version.split('.')[0:3])
...
>>> def test2(version):
...     return '.'.join(version.split('.', 3)[:3])
...
>>> timeit.timeit("test1(s)", setup="from __main__ import test1, test2; s = '1.2.3.4.5-RC4'")
1.0458565345561743
>>> timeit.timeit("test2(s)", setup="from __main__ import test1, test2; s = '1.2.3.4.5-RC4'")
1.0842980287537776
The benefit of maxsplit becomes clearer with longer strings containing more irrelevant '.'s:
>>> timeit.timeit("s.split('.')", setup="s='1.'*100")
3.460900054011617
>>> timeit.timeit("s.split('.', 3)", setup="s='1.'*100")
0.5287887450379003
I am used to code in C/C++ and when I see the following array operation, I feel some CPU wasting:
A feeling of CPU waste is absolutely normal for C/C++ programmers facing Python code. Your code:
version = '1.2.3.4.5-RC4' # the end can vary a lot
api = '.'.join(version.split('.')[0:3]) # extract '1.2.3'
is absolutely fine in Python; there is no simplification possible. Only if you have to do it thousands of times should you consider using a library function, or writing your own.
On Linux systems, root privileges can be granted more selectively than by setting the setuid bit, using file capabilities. See capabilities(7) for details. These are attributes of files and can be read using the getcap program. How can these attributes be retrieved in Python?
Even though running the getcap program via e.g. subprocess is possible for answering such a question, it is not desirable when retrieving very many capabilities.
It should be possible to devise a solution using ctypes. Are there alternatives to this approach or even libraries facilitating this task?
Python 3.3 comes with os.getxattr. If not, yeah... one way would be using ctypes, at least to get the raw stuff, or maybe use pyxattr.
For pyxattr:
>>> import xattr
>>> xattr.listxattr("/bin/ping")
(u'security.capability',)
>>> xattr.getxattr("/bin/ping", "security.capability")
'\x00\x00\x00\x02\x00 \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
For Python 3.3's version, it's essentially the same, just importing os, instead of xattr. ctypes is a bit more involved, though.
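The stdlib variant would look like this (a sketch; os.listxattr/os.getxattr are Linux-only and raise OSError if the attribute is absent):

import os
print(os.listxattr("/bin/ping"))                        # e.g. ['security.capability']
print(os.getxattr("/bin/ping", "security.capability"))  # raw bytes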
Now, we're getting the raw result, meaning that those two are most useful only for retrieving textual attributes. But... we can use the same approach as getcap, through libcap itself:
import ctypes
libcap = ctypes.cdll.LoadLibrary("libcap.so")
cap_t = libcap.cap_get_file('/bin/ping')
libcap.cap_to_text.restype = ctypes.c_char_p
libcap.cap_to_text(cap_t, None)
which gives me:
'= cap_net_raw+p'
probably more useful for you.
PS: note that cap_to_text returns a malloced string. It's your job to deallocate it using cap_free.
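A sketch of that cleanup, reusing libcap and cap_t from above (restype is switched to c_void_p so the raw pointer survives and can still be freed):

libcap.cap_to_text.restype = ctypes.c_void_p  # keep the raw pointer instead of auto-converting
libcap.cap_free.argtypes = [ctypes.c_void_p]  # avoid pointer truncation on 64-bit
ptr = libcap.cap_to_text(cap_t, None)
print(ctypes.cast(ptr, ctypes.c_char_p).value)
libcap.cap_free(ptr)    # free the malloced text...
libcap.cap_free(cap_t)  # ...and the capability state itself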
Hint about the "binary gibberish":
>>> import struct
>>> caps = '\x00\x00\x00\x02\x00 \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> struct.unpack("<IIIII", caps)
(33554432, 8192, 0, 0, 0)
In that 8192, the only active bit is the 13th. If you go to linux/capability.h, you'll see that CAP_NET_RAW is defined at 13.
Now, if you want to write a module with all those constants, you can decode the info. But I'd say it's much more laborious than just using ctypes + libcap.
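As a tiny example of that decoding (CAP_NET_RAW is bit 13, per linux/capability.h; 8192 is the second word unpacked above):

CAP_NET_RAW = 13
permitted = 8192
print(bool(permitted & (1 << CAP_NET_RAW)))  # True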
I tried the code from Ricardo Cárdenes's answer, but it did not work properly for me, because some details of the ctypes invocation were incorrect. This caused a truncated path string to be passed to getxattr(...) inside libcap, which thus returned the wrong capabilities list for the wrong item (the / directory, or another first path character, not the actual path).
It is very important to remember and account for the difference between str and bytes in Python 3.X. This code works properly on Python 3.5/3.6:
#!/usr/bin/env python3
import ctypes
import os
import sys

# load shared library
libcap = ctypes.cdll.LoadLibrary('libcap.so')

class libcap_auto_c_char_p(ctypes.c_char_p):
    def __del__(self):
        libcap.cap_free(self)

# cap_t cap_get_file(const char *path_p)
libcap.cap_get_file.argtypes = [ctypes.c_char_p]
libcap.cap_get_file.restype = ctypes.c_void_p

# char* cap_to_text(cap_t caps, ssize_t *length_p)
libcap.cap_to_text.argtypes = [ctypes.c_void_p, ctypes.c_void_p]
libcap.cap_to_text.restype = libcap_auto_c_char_p

def cap_get_file(path):
    cap_t = libcap.cap_get_file(path.encode('utf-8'))
    if cap_t is None:
        return ''
    else:
        return libcap.cap_to_text(cap_t, None).value.decode('utf-8')

print(cap_get_file('/usr/bin/traceroute6.iputils'))
print(cap_get_file('/usr/bin/systemd-detect-virt'))
print(cap_get_file('/usr/bin/mtr'))
print(cap_get_file('/usr/bin/tar'))
print(cap_get_file('/usr/bin/bogus'))
The output will look like this (anything nonexistent, or with no capabilities set, just returns ''):
= cap_net_raw+ep
= cap_dac_override,cap_sys_ptrace+ep
= cap_net_raw+ep
I have been using pickle and was very happy; then I saw this article: Don't Pickle Your Data
Reading further it seems like:
Pickle is slow
Pickle is unsafe
Pickle isn’t human readable
Pickle isn’t language-agnostic
I’ve switched to saving my data as JSON, but I wanted to know about best practice:
Given all these issues, when would you ever use pickle? What specific situations call for using it?
Pickle is unsafe because it constructs arbitrary Python objects by invoking arbitrary functions. However, this also gives it the power to serialize almost any Python object, without any boilerplate or even white-/black-listing (in the common case). That's very desirable for some use cases:
Quick & easy serialization, for example for pausing and resuming a long-running but simple script. None of the concerns matter here, you just want to dump the program's state as-is and load it later (see the sketch after this list).
Sending arbitrary Python data to other processes or computers, as in multiprocessing. The security concerns may apply (but mostly don't), the generality is absolutely necessary, and humans won't have to read it.
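A minimal checkpointing sketch of that first case (hypothetical state dict and file name):

import pickle

state = {'progress': 12345, 'seen_ids': {42, 43}}
with open('checkpoint.pkl', 'wb') as f:
    pickle.dump(state, f)   # dump the program's state as-is

with open('checkpoint.pkl', 'rb') as f:
    state = pickle.load(f)  # ...and load it later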
In other cases, none of the drawbacks is quite enough to justify the work of mapping your stuff to JSON or another restrictive data model. Maybe you don't expect to need human readability/safety/cross-language compatibility or maybe you can do without. Remember, You Ain't Gonna Need It. Using JSON would be the right thing™ but right doesn't always equal good.
You'll notice that I completely ignored the "slow" downside. That's because it's partially misleading: Pickle is indeed slower for data that fits the JSON model (strings, numbers, arrays, maps) perfectly, but if your data's like that you should use JSON for other reasons anyway. If your data isn't like that (very likely), you also need to take into account the custom code you'll need to turn your objects into JSON data, and the custom code you'll need to turn JSON data back into your objects. It adds both engineering effort and run-time overhead, which must be quantified on a case-by-case basis.
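To make that trade-off concrete, a sketch with a hypothetical Point class:

import json
import pickle

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(1, 2)
restored = pickle.loads(pickle.dumps(p))  # works with no extra code

# JSON needs hand-written mapping in both directions:
blob = json.dumps({'x': p.x, 'y': p.y})
restored = Point(**json.loads(blob))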
Pickle has the advantage of convenience -- it can serialize arbitrary object graphs with no extra work, and works on a pretty broad range of Python types. With that said, it would be unusual for me to use Pickle in new code. JSON is just a lot cleaner to work with.
I usually use neither pickle nor JSON, but MessagePack: it is both safe and fast, and produces serialized data of small size.
An additional advantage is the possibility of exchanging data with software written in other languages (which of course is also true of JSON).
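Basic usage is as simple as json's; a sketch (requires the msgpack package):

import msgpack

packed = msgpack.packb({'name': 'example', 'values': [1, 2, 3]})
data = msgpack.unpackb(packed)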
I have tried several methods and found that using cPickle with the protocol argument of the dumps method set to the highest protocol, i.e. cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL), is the fastest dump method.
import msgpack
import json
import pickle
import timeit
import cPickle
import numpy as np
num_tests = 10
obj = np.random.normal(0.5, 1, [240, 320, 3])
command = 'pickle.dumps(obj)'
setup = 'from __main__ import pickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("pickle: %f seconds" % result)
command = 'cPickle.dumps(obj)'
setup = 'from __main__ import cPickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("cPickle: %f seconds" % result)
command = 'cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL)'
setup = 'from __main__ import cPickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("cPickle highest: %f seconds" % result)
command = 'json.dumps(obj.tolist())'
setup = 'from __main__ import json, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("json: %f seconds" % result)
command = 'msgpack.packb(obj.tolist())'
setup = 'from __main__ import msgpack, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("msgpack: %f seconds" % result)
Output:
pickle: 0.847938 seconds
cPickle: 0.810384 seconds
cPickle highest: 0.004283 seconds
json: 1.769215 seconds
msgpack: 0.270886 seconds
So, I prefer cPickle with the highest dumping protocol in situations that require real-time performance, such as video streaming from a camera to a server.
You can find some answers on JSON vs. pickle security. JSON can only serialize unicode, int, float, NoneType, bool, list and dict. You can't use it if you want to serialize more advanced objects such as class instances. Note that for those kinds of objects, there is no hope of being language-agnostic.
Also, using cPickle instead of pickle partially resolves the speed problem.
I'm getting this exception when trying to open shelve-persisted files over a certain size, which is actually pretty small (< 1 MB), but I'm not sure where the exact threshold lies. Now, I know pickle is sort of the bastard child of Python and shelve isn't thought of as a particularly robust solution, but it happens to solve my problem wonderfully (in theory), and I haven't been able to find a reason for this exception.
Traceback (most recent call last):
File "test_shelve.py", line 27, in <module>
print len(f.keys())
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shelve.py", line 101, in keys
return self.dict.keys()
SystemError: Negative size passed to PyString_FromStringAndSize
I can reproduce it consistently, but I haven't found much on Google. Here's a script that will reproduce it.
import shelve
import random
import string
import pprint

f = shelve.open('test')
# f = {}

def rand_list(list_size=20, str_size=40):
    return [''.join([random.choice(string.ascii_uppercase + string.digits) for j in range(str_size)]) for i in range(list_size)]

def recursive_dict(depth=3):
    if depth == 0:
        return rand_list()
    else:
        d = {}
        for k in rand_list():
            d[k] = recursive_dict(depth-1)
        return d

for k, v in recursive_dict(2).iteritems():
    f[k] = v

f.close()
f = shelve.open('test')
print len(f.keys())
Regarding the error itself:
The idea circulating on the web is that the data size exceeded the largest integer possible on that machine (the largest 32-bit signed integer is 2 147 483 647), interpreted as a negative size by Python.
Your code is running with 2.7.3, so this may be a bug that has since been fixed.
The code "works" if I change the depth from 2 to 1, or if I run under python 3 (after fixing the print statements and using items() instead of iteritems()). However, the list of keys is clearly not the set of keys found while iterating over the return value of recursive_dict().
The following restriction from the shelve documentation may apply (emphases mine):
The choice of which database package will be used (such as dbm, gdbm or bsddb) depends on which interface is available. Therefore it is not safe to open the database directly using dbm. The database is also (unfortunately) subject to the limitations of dbm, if it is used — this means that (the pickled representation of) the objects stored in the database should be fairly small, and in rare cases key collisions may cause the database to refuse updates.
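To see which backend shelve actually picked for a given file, you can ask the dbm module (Python 3 spelling; the on-disk file may carry a backend-specific suffix):

import dbm
print(dbm.whichdb('test'))  # e.g. 'dbm.ndbm', one of the limited backends the docs warn about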