I have been playing around with f-strings and comparing their speed in different scenarios, and I ran into a scenario where f-strings are slower.
Edit: in the benchmarks below, x = 0.
In[1]: %timeit f"{x:0128x}"
363 ns ± 1.69 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In[2]: %timeit '%0128x' % x
224 ns ± 1.37 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In[3]: %timeit f"{x:0128X}"
533 ns ± 22 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In[4]: %timeit "%0128X" % x
222 ns ± 0.408 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Why are f-strings slower in this scenario, and why is 'X' so much slower than 'x' for f-strings?
String interpolation with %x (and other numeric conversions) can't be overloaded, so the interpreter can perform it quickly.
f-strings go through the same machinery as the format() built-in function, which needs to look up a __format__ method on the object. That lookup and call make it slower.
For instance, this class can override %s and format(), but can't override %x:
class myint(int):
    def __format__(self, spec):
        return "example"
    def __int__(self):
        return "example"
    def __str__(self):
        return "example"
    def __repr__(self):
        return "example"

>>> '%x' % myint()        # %x ignores all of these overrides
'0'
>>> '%s' % myint()        # %s goes through __str__
'example'
>>> format(myint(), 'x')  # format() (and f-strings) go through __format__
'example'
Capitalizing the string ('X' instead of 'x'), in the CPython implementation, first builds the lowercase string and then loops over it to change the case, which is why the uppercase conversion is slower still.
Overriding __str__, even to return a constant string, will also make %s slower than %x, since it involves a method call.
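For reference, all three spellings produce identical output, so the difference is purely in how they dispatch. A minimal sketch (timings will vary by machine and Python version):

x = 0
assert f"{x:0128x}" == format(x, "0128x") == "%0128x" % x
# All three build the same 128-character string, but the f-string and format()
# dispatch through type(x).__format__, while the %-operator formats the int directly.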
Related
Suppose I have a list of short lowercase [a-z] strings (max length 8):
L = ['cat', 'cod', 'dog', 'cab', ...]
How to efficiently determine if a string s is in this list?
I know I can do if s in L:, but I could pre-sort L and do a binary search.
I could even build my own tree, letter by letter. So, setting s='cat', T[ord(s[0]) - ord('a')] would give the subtree leading to 'cat' and 'cab', etc. But eek, messy!
I could also make my own hashfunc, as L is static.
def hash_(item):
    w = [127**i * (ord(j) - ord('0')) for i, j in enumerate(item)]
    return sum(w) % 123456
... and just fiddle the numbers until I don't get duplicates. Again, ugly.
Is there anything out-of-the-box I can use, or must I roll my own?
There are of course going to be solutions everywhere along the complexity/optimisation curve, so my apologies in advance if this question is too open ended.
I'm hunting for something that gives decent performance gain in exchange for a low LoC cost.
The built-in Python set is almost certainly the most efficient device you can use here. (Sure, you could roll your own cute things such as a DAG of your "vocabulary", but that is going to be much, much slower.)
So, convert your list into a set (preferably built once if multiple tests are to be made) and test for membership:
s in set(L)
Or, for multiple tests:
set_values = set(L)
# ...
if s in set_values:
    # ...
Here is a simple example to illustrate the performance:
from string import ascii_lowercase
import random
n = 1_000_000
L = [''.join(random.choices(ascii_lowercase, k=6)) for _ in range(n)]
Time to build the set:
%timeit set(L)
# 99.9 ms ± 49.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Time to query against the set:
set_values = set(L)
# non-existent string
%timeit 'foo' in set_values
# 45.1 ns ± 0.0418 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# existing value
s = L[-1]
a = %timeit -o s in set_values
# 45 ns ± 0.0286 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Contrast that to testing directly against the list:
b = %timeit -o s in L
# 16.5 ms ± 24.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
b.average / a.average
# 359141.74
When's the last time you made a 350,000x speedup ;-) ?
Is there a trick to quickly check whether 10,000 Windows directories exist in Python? I currently store them in a list, and I wonder how to do the check quickly.
You can iterate through the list and call os.path.exists() (for files and directories), os.path.isfile() (for files only), or os.path.isdir() (for directories only) to find out whether or not these directories exist:
import os

dir_list = [...]

for dir_entry in dir_list:
    if not os.path.isdir(dir_entry):
        # do something if the dir does not exist
        ...
    else:
        # do something if the dir exists
        ...
If you want to just check the existence of the path without checking whether or not it actually is a directory, then there are additional options that may be faster, see Johnny's answer for details.
If simply iterating through the list is not fast enough, you can use a ThreadPoolExecutor to iterate over the list in parallel threads (assigning chunks of that list, e.g. 1000 directories, to each worker), but I doubt that would speed things up much, and handling the return values (if needed) would be more complicated.
from concurrent.futures import ThreadPoolExecutor

WORKER_COUNT = 10
CHUNK_SIZE = 1000

def process_dir_list(dir_list):
    # implementation according to the snippet above
    (...)

future_list = []
with ThreadPoolExecutor(max_workers=WORKER_COUNT) as executor:
    for dir_index in range(0, len(dir_list), CHUNK_SIZE):
        future_list.append(executor.submit(process_dir_list, dir_list[dir_index:dir_index + CHUNK_SIZE]))

# wait for all futures to finish
for current_future in future_list:
    # wait for the current future to finish
    result = current_future.result()
    # do something with the result, if desired
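A simpler variant of the same idea (just a sketch; check_dirs and the worker count are illustrative, not part of the code above) lets executor.map distribute the work and returns a path-to-bool mapping:

from concurrent.futures import ThreadPoolExecutor
import os

def check_dirs(dir_list, workers=10):
    # run os.path.isdir over all paths in parallel threads
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return dict(zip(dir_list, executor.map(os.path.isdir, dir_list)))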
You can also use os.access, which only checks whether the path is accessible (and can be combined with multiprocessing if needed):
import os

os.access("/file/path/foo.txt", os.F_OK)

# os.F_OK - check the path exists
# os.R_OK - check the path is readable
# os.W_OK - check the path is writable
# os.X_OK - check the path is executable
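Applied to the original problem, a minimal sketch (assuming dir_list holds the 10,000 paths) would be:

missing = [d for d in dir_list if not os.access(d, os.F_OK)]
# paths left in missing do not exist (or are not visible to the current user)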
Here is a simple benchmark of the options:
In [1]: import os, pathlib
In [2]: p = "/home/lpc/gitlab/config/test"
In [3]: %timeit pathlib.Path(p).exists()
5.85 µs ± 32.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [4]: %timeit os.path.exists(p)
1.03 µs ± 4.69 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit os.access(p, os.F_OK)
526 ns ± 2.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [6]: def check(p):
   ...:     try:
   ...:         f = open(p)
   ...:         f.close()
   ...:         return True
   ...:     except OSError:
   ...:         return False
In [7]: %timeit check(p)
1.52 µs ± 4.41 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [8]: %timeit os.path.isdir(p)
1.05 µs ± 4.87 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
I'm comparing two versions of the Fibonacci routine in Python 3:
import functools

@functools.lru_cache()
def fibonacci_rec(target: int) -> int:
    if target < 2:
        return target
    res = fibonacci_rec(target - 1) + fibonacci_rec(target - 2)
    return res
def fibonacci_it(target: int) -> int:
    if target < 2:
        return target
    n_1 = 2
    n_2 = 1
    for n in range(3, target):
        new = n_2 + n_1
        n_2 = n_1
        n_1 = new
    return n_1
The first version is recursive, with memoization (thanks to lru_cache). The second is simply iterative.
I then benchmarked the two versions and I'm slightly surprised by the results:
In [5]: %timeit fibonacci_rec(1000)
82.7 ns ± 2.94 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [6]: %timeit fibonacci_it(1000)
67.5 µs ± 2.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
The iterative version is waaaaay slower than the recursive one. Of course the first run of the recursive version will take lots of time (to cache all the results), and the recursive version takes more memory space (to store all the calls). But I wasn't expecting such difference on the runtime. Don't I get some overhead by calling a function, compared to just iterating over numbers and swapping variables?
As you can see, timeit invokes the function many times, to get a reliable measurement. The LRU cache of the recursive version is not being cleared between invocations, so after the first run, fibonacci_rec(1000) is just returned from the cache immediately without doing any computation.
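You can see this with the cache statistics that lru_cache exposes (a small sketch, using a smaller argument):

fibonacci_rec(30)
fibonacci_rec(30)
print(fibonacci_rec.cache_info())
# misses stop at 31 (one per distinct argument), while hits grow by one on
# every repeated top-level call - that call does no arithmetic at all.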
As explained by @Thomas, the cache isn't cleared between invocations of fibonacci_rec (so the result of fibonacci_rec(1000) will be cached and re-used). Here is a better benchmark:
def wrapper_rec(target: int) -> int:
    res = fibonacci_rec(target)
    fibonacci_rec.cache_clear()
    return res

def wrapper_it(target: int) -> int:
    res = fibonacci_it(target)
    # Just to make sure the comparison will be consistent
    fibonacci_rec.cache_clear()
    return res
And the results:
In [9]: %timeit wrapper_rec(1000)
445 µs ± 12.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [10]: %timeit wrapper_it(1000)
67.5 µs ± 2.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
I have a set I want to update if there is a match in another set; otherwise, I want to append an error-message string to a list. I referenced 'if/else in a list comprehension' to write my code.
Here is what I wrote:
logstocrunch_set=dirlogs_set.difference(dblogs_set)
pattern = re.compile(r"\d*F[IR]P",re.IGNORECASE) #to find register values
logstocrunch_finset = set()
errorlist = []
logstocrunch_finset.update([x for x if pattern.search(x) else errorlist.append(f'{x} is not proper name') for x in logstocrunch_set])
However, when I run this, I get an invalid syntax error with the arrow pointing at my if statement.
So why is this happening?
The syntax of a list comprehension with a condition is:
[<value> for <variable> in <iterable> if <condition>]
if <condition> goes after the iterable, not before it.
Also, you can't have an else clause there. It's not a conditional expression that returns different values, it's just used to filter the values in the iterator, so else makes no sense.
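For example, the trailing if only filters items:

evens = [n for n in range(10) if n % 2 == 0]
# [0, 2, 4, 6, 8]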
You seem to be confusing it with a conditional expression in the <value> part, which allows you to specify different values to be returned in the resulting list depending on a condition. That's just an ordinary conditional expression, not specific to list comprehensions.
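That value-position form looks like this:

labels = ["even" if n % 2 == 0 else "odd" for n in range(5)]
# ['even', 'odd', 'even', 'odd', 'even']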
You shouldn't use a list comprehension if you want to update multiple targets. Use an ordinary loop.
logstocrunch_finset = set()
errorlist = []

for x in logstocrunch_set:
    if pattern.search(x):
        logstocrunch_finset.add(x)
    else:
        errorlist.append(f'{x} is not proper name')
A list comprehension is a way of creating a single list. A basic conditional one must be in the format:
[ expression for item in iterable if condition ]
You can't (easily) update two objects with one comprehension. Also, there's not a lot of point declaring logstocrunch_finset and errorlist and then populating them. Instead, how about something like:
pattern = re.compile(r"\d*F[IR]P", re.IGNORECASE)
logstocrunch_finset = {x for x in logstocrunch_set if pattern.search(x)}
errorlist = [f'{x} is not proper name' for x in logstocrunch_set.difference(logstocrunch_finset)]
UPDATE BELOW - Performance comparison with for loop
As #Barmar suggested, I benchmarked our two solutions. There's not a lot in it. The two comprehensions seem to handle a larger input set better. Changing the ratio of valid to invalid data didn't seem to make much difference.
import re
range_limit = 10
logstocrunch_set = set(
    [f'{i}FRP' for i in range(range_limit)] +
    [f'longer_{i}frp_lower' for i in range(range_limit)] +
    ['not valid', 'something else']
)
pattern = re.compile(r"\d*F[IR]P",re.IGNORECASE)
%%timeit -n 100000 -r 20
logstocrunch_finset = set()
errorlist = []
for x in logstocrunch_set:
    if pattern.search(x):
        logstocrunch_finset.add(x)
    else:
        errorlist.append(f'{x} is not proper name')
range_limit = 10 | 9.53 µs ± 34.2 ns per loop (mean ± std. dev. of 20 runs, 100000 loops each)
range_limit = 50 | 45.5 µs ± 699 ns per loop (mean ± std. dev. of 20 runs, 100000 loops each)
range_limit = 100 | 89.4 µs ± 1.2 µs per loop (mean ± std. dev. of 10 runs, 100000 loops each)
%%timeit -n 100000 -r 20
logstocrunch_finset = {x for x in logstocrunch_set if pattern.search(x)}
errorlist = [f'{x} is not proper name' for x in logstocrunch_set.difference(logstocrunch_finset)]
range_limit = 10 | 9.58 µs ± 14.1 ns per loop (mean ± std. dev. of 20 runs, 100000 loops each)
range_limit = 50 | 42.2 µs ± 24.7 ns per loop (mean ± std. dev. of 20 runs, 100000 loops each)
range_limit = 100 | 82.2 µs ± 491 ns per loop (mean ± std. dev. of 10 runs, 100000 loops each)
a = ['123b4', '234v5', 'lobf56']
b = [obj1, obj2, obj3]  # each obj has an attribute 'serial' whose value matches a serial number in list a
where obj1.serial is '234v5', obj2.serial is 'lobf56', and obj3.serial is '123b4'.
tmplist = list()
for each in a:
    for obj in b:
        if each == obj.serial:
            tmplist.append(obj)
print(tmplist)
output: [obj3, obj1, obj2]
I am currently able to achieve the sorting in the above manner, but is there a better way to do it?
Does a list comprehension help?
[obj for each in a for obj in b if each == obj.serial]
If you compare the time between both, your approach takes:
1.6 µs ± 25.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
The list comprehension takes:
1.37 µs ± 18.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Therefore, if by "a better way to do it" you mean efficiency, this definitely counts.