Why is Boto dynamodb2 significantly slower than dynamodb? - python

I have been using boto.dynamodb for a couple of years but recently found that I need to put lists and maps into my DynamoDB table so I decided to switch over to boto.dynamodb2.
However, in attempting to do so I found that the performance of dynamodb2 was significantly worse. I am curious whether this should be expected and, if so, what the benefit is of moving to dynamodb2 (assuming I don't need any of DynamoDB's newer features).
In the test below, I used some timing code adapted from Python time measure function.
@timing
def test1():
    logs = aws_conn.query(old_table, hash_key=88, attributes_to_get=attribs)
    return list(logs)

@timing
def test2():
    logs = log_table.query_2(key__eq=88, attributes_to_get=attribs)
    return list(logs)
# >>> logs1 = test1() # dynamodb.layer2.query for 90705 objects
# test1 function took 60063.000 ms
# >>> logs2 = test2() # dynamodb.table.query_2 for 90706 objects
# test2 function took 164971.000 ms
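The @timing decorator itself is not shown in the question; a minimal sketch of one that would produce output in the format above (assuming it simply wraps the call with time.time() and prints the elapsed milliseconds) is:

import time
from functools import wraps

def timing(func):
    # Hypothetical decorator, reconstructed to match the "<name> function took N ms" lines above.
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        elapsed_ms = (time.time() - start) * 1000.0
        print('%s function took %0.3f ms' % (func.__name__, elapsed_ms))
        return result
    return wrapper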

Related

python timeit: function with decorator

I am trying to figure out whether to compile regex expressions up front, or alternatively, for startup speed, only compile them once when they are first required.
I can use timeit to test:
re.compile(r'#(?:(?:[\da-f]{3}){1,2}|(?:[\da-f]{4}){1,2})|rgb\(\d{1,3},\d{1,3},\d{1,3}\)')
I don't know how to figure out how long the parser takes (or how to use timeit to time it) to see the following code (not execute it, just "see" it).
from functools import cache
import re

@cache
def get_color_pattern():
    return re.compile(r'#(?:(?:[\da-f]{3}){1,2}|(?:[\da-f]{4}){1,2})|rgb\(\d{1,3},\d{1,3},\d{1,3}\)')
Python code is interpreted "top to bottom". Therefore, if you declare variables on either side of your function that are initialised with a timestamp, the difference between those values is the time taken to parse the code between them. So, for example,
from functools import cache
import time
import re

start = time.perf_counter()

@cache
def get_color_pattern():
    return re.compile(r'#(?:(?:[\da-f]{3}){1,2}|(?:[\da-f]{4}){1,2})|rgb\(\d{1,3},\d{1,3},\d{1,3}\)')

print(time.perf_counter() - start)

start = time.perf_counter()
re.compile(r'#(?:(?:[\da-f]{3}){1,2}|(?:[\da-f]{4}){1,2})|rgb\(\d{1,3},\d{1,3},\d{1,3}\)')
print(time.perf_counter() - start)
Output:
5.995000265102135e-06
0.00024428899996564724
Thus we see the difference between the time taken to parse the function definition and the time taken to actually compile the expression.
If you want to check how long it takes with the cache generated, just run the function once before timing to generate the cache.
from functools import cache
import re
import timeit

@cache
def get_color_pattern():
    return re.compile(r'#(?:(?:[\da-f]{3}){1,2}|(?:[\da-f]{4}){1,2})|rgb\(\d{1,3},\d{1,3},\d{1,3}\)')

# Run once to generate the cache
get_color_pattern()

number = 1000
t = timeit.timeit(get_color_pattern, number=number)
print(f"Total {t} seconds for {number} iterations, {t / number} at average")
Out: Total 4.5900000000001495e-05 seconds for 1000 iterations, 4.5900000000001496e-08 at average
Running with @cache commented out, it seems to be roughly 10x slower:
Total 0.0004936999999999997 seconds for 1000 iterations, 4.936999999999997e-07 at average
As pointed out by Kelly Bundy, re.compile already includes a built-in caching feature.
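A quick way to see that internal cache: in CPython, re keeps already-compiled patterns in a module-level cache, so repeated re.compile calls with the same pattern and flags are cheap and, as a CPython implementation detail, even return the same object.

import re

pattern_text = r'#(?:(?:[\da-f]{3}){1,2}|(?:[\da-f]{4}){1,2})|rgb\(\d{1,3},\d{1,3},\d{1,3}\)'

# The second compile hits re's internal cache instead of recompiling;
# in CPython both calls return the very same Pattern object.
p1 = re.compile(pattern_text)
p2 = re.compile(pattern_text)
print(p1 is p2)  # True on CPython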

Why does Numba skew the timings of a JIT-compiled function?

I'm trying to benchmark a Python function that does list operations using Numba against the CPython interpreter. To compare end-to-end time I used the Linux time utility.
time python3.10 list.py
As I understand it, the first invocation will be expensive due to JIT compilation, but that does not explain why the maximum recorded time is longer than the total time taken to run the entire script.
# list.py
import numpy as np
from time import time, perf_counter
from numba import njit

@njit
def listOperations():
    list = []
    for i in range(1000):
        list.append(i)
    list.sort(reverse=True)
    list.remove(420)
    list.reverse()

if __name__ == "__main__":
    repetitions = 1000
    timings = np.zeros(repetitions)
    for rep in range(repetitions):
        start = time()  # Similar results with perf_counter too.
        listOperations()
        timings[rep] = time() - start
    # Convert to milliseconds
    timings *= 10e3
    print("Mean {}ms, Median {}ms, Std. Dev {}ms, Min {}ms, Max {}ms".format(
        float('%.4f' % np.mean(timings)),
        float('%.4f' % np.median(timings)),
        float('%.4f' % np.std(timings)),
        float('%.4f' % np.min(timings)),
        float('%.4f' % np.max(timings))))
For Numba it shows a maximum of ~66.3s, while the time utility reports ~8s. The complete results are below.
'''
Numba --->
Mean 66.8154ms, Median 0.391ms, Std. Dev 2097.7752ms, Min 0.3219ms, Max 66371.1143ms
real 0m7.982s
user 0m8.248s
sys 0m0.100s
CPython3.10 --->
Mean 1.6395ms, Median 1.6284ms, Std. Dev 0.0708ms, Min 1.5759ms, Max 2.3198ms
real 0m1.115s
user 0m1.468s
sys 0m0.080s
'''
The main issue is that the compilation time is included in the timings. Numba compiles functions lazily, on the first call. To prevent this, you must either specify the signature up front or execute the first function call outside the timed section (which is generally good practice in benchmarks anyway).
You can use @njit('()') instead of @njit. With this fix, the Numba code is about twice as fast on my machine.
Note that your function does not return anything nor read any parameters, so the JIT could optimize the whole function to a no-op. To avoid this bias, you should add a parameter, use it, and return the list. This apparently does not happen on my machine, but other versions of Numba may do it.
Note also that lists are generally not where Numba shines. Lists are generally slow (both with and without Numba). It is better to use an array when the size is known.
By the way, list is a built-in type. Shadowing it can cause sneaky bugs in code that uses it (which is frequent), so it is not a good idea; I advise you to use another name.
Furthermore, note that in your results the standard deviation is very large, the median time is good, and the maximum time is huge, indicating that the timings are not stable and that the instability comes from a single slow call. Such results generally indicate either that the benchmark is flawed or that the function itself has unstable behaviour (typically due to a bug or an initialization done only once).
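A minimal sketch of the warm-up approach, reusing the function from the question (renamed to avoid shadowing list, as suggested above); the compile-triggering call happens once before the timing loop, so the per-call timings no longer include compilation:

import numpy as np
from time import time
from numba import njit

@njit
def list_operations():
    result = []
    for i in range(1000):
        result.append(i)
    result.sort(reverse=True)
    result.remove(420)
    result.reverse()

list_operations()  # Warm-up call: triggers JIT compilation outside the timed region.

repetitions = 1000
timings = np.zeros(repetitions)
for rep in range(repetitions):
    start = time()
    list_operations()
    timings[rep] = time() - start

print("Median {:.4f} ms, Max {:.4f} ms".format(np.median(timings) * 1e3, np.max(timings) * 1e3))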

How to identify where my Python script is running slowly

I am returning to a working Python script with the intent of optimizing its runtime. For the most part, I have been using timeit and tqdm to track how long individual functions take to run, but is there a way to run a single function and track the performance of all the commands in the script to get a single output?
For example:
def funct_a(a):
    print(a)

def funct_b(b):
    complex_function(b)

def funct_c(c):
    return c - 5

funct_a(5)
funct_b("Oregon")
funct_c(873)
Ideally I would like to see some output of a performance check that reads like this:
funct_a runtime: .000000001 ms
funct_b runtime: 59 ms
funct_c runtime: .00000002 ms
Any ideas would be greatly appreciated.
Use a profiler.
I like to use the profiler already included in Python, cProfile.
You can then visualise the data using snakeviz.
This is a rough example of how to use it:
import cProfile
import pstats

with cProfile.Profile() as pr:
    {CODE OR FUNCTION HERE}

stats = pstats.Stats(pr)
stats.sort_stats(pstats.SortKey.TIME)

# Now you have two options: either print the data or save it as a file
stats.print_stats()  # Print the stats
stats.dump_stats("File/path.prof")  # Saves the data in a file, which can be used to view it visually
Now to visualise it:
Install snakeviz
Go to your filepath
Open cmd/terminal and type snakeviz filename.prof
For further clarification, watch this video:
https://www.youtube.com/watch?v=m_a0fN48Alw&t=188s&ab_channel=mCoding
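Applied to the functions from the question, a minimal sketch might look like this (funct_b is stubbed out with a sleep here because complex_function isn't shown):

import cProfile
import pstats
import time

def funct_a(a):
    print(a)

def funct_b(b):
    time.sleep(0.05)  # Stand-in for the complex_function call from the question.

def funct_c(c):
    return c - 5

with cProfile.Profile() as pr:
    funct_a(5)
    funct_b("Oregon")
    funct_c(873)

stats = pstats.Stats(pr)
stats.sort_stats(pstats.SortKey.TIME)
stats.print_stats()  # Per-function call counts and times, similar to the desired output.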
import time
start = time.time()
#code goes here
end = time.time()
print('Time for code to run: ', end - start)
Use the timeit module:
import timeit

def funct_a(a):
    return a

def funct_b(b):
    return [b]*20

def funct_c(c):
    return c-5
>>> print(timeit.timeit('funct_a(5)', globals=globals()))
0.09223939990624785
>>> print(timeit.timeit('funct_b("Oregon")', globals=globals()))
0.260303599992767
>>> print(timeit.timeit('funct_c(873)', globals=globals()))
0.14657660003285855

Pandas: Why is Series indexing using .loc taking 100x longer on the first run when timing it?

I'm slicing a fairly big pandas Series (~5M elements) using .loc, and I stumbled upon some weird behavior when checking times in an attempt to optimize my code.
It's weird that the first slicing attempt like series_object.loc[some_indexes] takes 100x longer than the following ones.
When I try timeit it does not reflect this behaviour, but when checking the individual laps using time, we can see that the first lap takes much longer than the following ones.
Is .loc using some sort of caching? If so, how is garbage collection not influencing this?
Is timeit doing the caching even with the garbage collector disabled, and not behaving as it's supposed to?
Which time should I trust my app will take when running in a live production environment?
I tried this on Windows and Linux machines using different versions of Python (3.6, 3.7 and 2.7) and the behavior is always the same.
Thanks in advance for your help. This thing has been banging my head for a week already and I miss not doubting %timeit :)
To reproduce:
Save the following code to a Python file, e.g. test_loc_times.py
import pandas as pd
import numpy as np
import timeit
import time, gc

def get_data():
    ids = np.arange(size_bigseries)
    big_series = pd.Series(index=ids, data=np.random.rand(len(ids)), name='{} elements series'.format(len(ids)))
    small_slice = np.arange(size_slice)
    return big_series, small_slice

# Method to test: a simple pandas slicing with .loc
def basic_loc_indexing(pd_series, slice_ids):
    return pd_series.loc[slice_ids].dropna()

# Method to time it
def timing_it(func, n, *args):
    gcold = gc.isenabled()
    gc.disable()
    times = []
    for i in range(n):
        s = time.time()
        func(*args)
        times.append((time.time()-s)*1000)
    if gcold:
        gc.enable()
    return times

if __name__ == '__main__':
    import sys
    n_tries = int(sys.argv[1]) if len(sys.argv) > 1 and sys.argv[1] is not None else 1000
    size_bigseries = int(sys.argv[2]) if len(sys.argv) > 2 and sys.argv[2] is not None else 5000000  # 5M
    size_slice = int(sys.argv[3]) if len(sys.argv) > 3 and sys.argv[3] is not None else 100  # 100

    # 1: timeit()
    big_series, small_slice = get_data()
    time_with_timeit = timeit.timeit('basic_loc_indexing(big_series, small_slice)', "gc.disable(); from __main__ import basic_loc_indexing, big_series, small_slice", number=n_tries)
    print("using timeit: {:.6f}ms".format(time_with_timeit/n_tries*1000))
    del big_series, small_slice

    # 2: time()
    big_series, small_slice = get_data()
    time_with_time = timing_it(basic_loc_indexing, n_tries, big_series, small_slice)
    print("using time: {:.6f}ms".format(np.mean(time_with_time)))
    print('head detail: {}\n'.format(time_with_time[:5]))
To try it out, run:
python test_loc_times.py 1000 5000000 100
This will run timeit and time for 1000 laps each, slicing 100 elements from a 5M-element pandas.Series.
You can try it yourself with other values; the first run always takes longer.
stdout:
>>> using timeit: 0.789754ms
>>> using time: 0.829869ms
>>> head detail: [145.02716064453125, 0.7691383361816406, 0.7028579711914062, 0.5738735198974609, 0.6380081176757812]
Weird right?
Edit:
I found this answer, which might be related. What do you think?
This code is likely not idempotent (it has side effects that impact its execution).
timeit will run the code once first to measure the time and deduce the number of loops and runs it should use. If your code is not idempotent (has side effects, like caching), then that first run (not recorded) will be longer, and the subsequent (faster) runs will be the ones measured and reported.
You can take a look at the arguments you can pass to timeit (see the doc) to specify the number of loops and forgo that initial run.
Also note that (taken from the doc linked above):
The times reported by %timeit will be slightly higher than those reported by the timeit.py script when variables are accessed. This is due to the fact that %timeit executes the statement in the namespace of the shell, compared with timeit.py, which uses a single setup statement to import function or create variables. Generally, the bias does not matter as long as results from timeit.py are not mixed with those from %timeit.
Edit: I missed the fact that you were passing the number of runs to timeit. In that case, only the latter part of my answer applies, but the numbers you are seeing seem to point to another issue...
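For example, a sketch of how you might make per-batch times visible yourself (using timeit.Timer.repeat with a fixed number, so any one-off first-call cost shows up in the first batch instead of being averaged away; the statement and setup below follow the test_loc_times.py script above):

import timeit

# Assumes big_series, small_slice and basic_loc_indexing already exist in __main__,
# as in the script from the question.
timer = timeit.Timer(
    'basic_loc_indexing(big_series, small_slice)',
    setup='gc.disable(); from __main__ import basic_loc_indexing, big_series, small_slice')
batch_totals = timer.repeat(repeat=5, number=100)
print(batch_totals)  # The first batch absorbs any one-off (first-call) cost.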

Timeit module - Passing objects to setup?

I am trying to use the timeit module to time the speed of an algorithm that analyzes data.
The problem is that I have to run some setup code in order to run this algorithm. Specifically, I have to load some documents from a database and turn them into a matrix representation.
The timeit module does not seem to allow me to pass in the matrix object, and instead forces me to set it all up again in the setup parameter. Unfortunately this means that the running time of my algorithm is fuzzed by the running time of the pre-processing.
Is there some way to pass objects that were already created to timeit in the setup parameter? Otherwise, how can I deal with situations where the setup code takes a nontrivial amount of time and I don't want it to fuzz the code block I am actually trying to test?
Am I approaching this the wrong way?
The running time of your algorithm is not fuzzed by the running time of pre-processing. This can be shown as follows: suppose I declare a list in the __main__ module and run timeit to find the index of some item in that list. I need to pass the list to timeit too; the list passing is a sort of pre-processing. The time returned by timeit shows 0.26 sec (see the code below). Now, if timeit had also counted the pre-processing time (importing the list from __main__), then the result would have been almost 1.1 sec, because importing the list from __main__ takes 0.84 sec for 1000000 iterations (see the code below). What timeit does is import the list from __main__ only once and then measure the time required by the algorithm for the given number of iterations.
>>> import timeit
>>> lst = range(10)
>>> timeit.timeit('lst.index(9)', 'from __main__ import lst', number = 1000000)
0.2645089626312256
>>> timeit.timeit('from __main__ import lst', number = 1000000)
0.8406829833984375
The time it takes to run the setup code doesn't affect the timeit module's timing calculations.
You should be able to pass your matrix into the setup parameter using an import, e.g.
"from __main__ import mymatrix"
