Related
I'd like to know why python gives me two different times when I re-order the two nested for loops.
The difference is that significant that causes inaccurate results.
This one almost gives me the result I expect to see:
for i in range(20000):
for j in possibleChars:
entered_pwd = passStr + j + possibleChars[0] * leftPassLen
st = time.perf_counter_ns()
verify_password(stored_pwd, entered_pwd)
endTime = time.perf_counter_ns() - st
tmr[j] += endTime
But this code generate inaccurate results from my view:
for i in possibleChars:
for j in range(20000):
entered_pwd = passStr + i + possibleChars[0] * leftPassLen
st = time.perf_counter_ns()
verify_password(stored_pwd, entered_pwd)
endTime = time.perf_counter_ns() - st
tmr[i] += endTime
This is the function I'm attempting to run timing attack on it:
def verify_password(stored_pwd, entered_pwd):
if len(stored_pwd) != len(entered_pwd):
return False
for i in range(len(stored_pwd)):
if stored_pwd[i] != entered_pwd[i]:
return False
return True
I also observed a problem with character 'U' (capital case), so to have successful runs I had to delete it from my possibleChars list.
The problem is when I measure the time for 'U', it is always near double as other chars.
Let me know if you have any question.
Summing up the timings may not be a good idea here:
One interruption due to e.g., scheduling will have a huge effect on the total and may completely invalidate your measurements.
Iterating like in the first loop is probably more likely to spread noise more evenly across the measurements (this is just an educated guess though).
However, it would be better to use the median or minimum time instead of the sum.
This way, you eliminate all noisy measurements.
That being said, I don't expect the timing difference to be huge and python being a high-level language will generate more noisy measurements compared to more low-level languages (because of garbage collection and so on).
But it still works :)
I've implemented an example relying on the minimum time (instead of the sum).
On my local machine, it works reliably except for the last character, where the timing difference is way smaller:
import time
import string
# We want to leak this
stored_pwd = "S3cret"
def verify_password(entered_pwd):
if len(stored_pwd) != len(entered_pwd):
return False
for i in range(len(stored_pwd)):
if stored_pwd[i] != entered_pwd[i]:
return False
return True
possibleChars = string.printable
MEASUREMENTS = 2000 # works even with numbers as small as 20 (for me)
def find_char(prefix, len_pwd):
tmr = {i: 9999999999 for i in possibleChars}
for i in possibleChars:
for j in range(MEASUREMENTS):
entered_pwd = prefix + i + i * (len_pwd - len(prefix) - 1)
st = time.perf_counter_ns()
verify_password(entered_pwd)
endTime = time.perf_counter_ns() - st
tmr[i] = min(endTime, tmr[i])
return max(tmr.items(), key = lambda x: x[1])[0]
def find_length(max_length = 100):
tmr = [99999999 for i in range(max_length + 1)]
for i in range(max_length + 1):
for j in range(MEASUREMENTS):
st = time.perf_counter_ns()
verify_password("X" * i)
endTime = time.perf_counter_ns() - st
tmr[i] = min(endTime, tmr[i])
return max(enumerate(tmr), key = lambda x: x[1])[0]
length = find_length()
print(f"password length: {length}")
recovered_password = ""
for i in range(length):
recovered_password += find_char(recovered_password, length)
print(f"{recovered_password}{'?' * (length - len(recovered_password))}")
print(f"Password: {recovered_password}")
I have to speed up my current code to do around 10^6 operations in a feasible time. Before I used multiprocessing in my the actual document I tried to do it in a mock case. Following is my attempt:
def chunkIt(seq, num):
avg = len(seq) / float(num)
out = []
last = 0.0
while last < len(seq):
out.append(seq[int(last):int(last + avg)])
last += avg
return out
def do_something(List):
# in real case this function takes about 0.5 seconds to finish for each
iteration
turn = []
for e in List:
turn.append((e[0]**2, e[1]**2,e[2]**2))
return turn
t1 = time.time()
List = []
#in the real case these 20's can go as high as 150
for i in range(1,20-2):
for k in range(i+1,20-1):
for j in range(k+1,20):
List.append((i,k,j))
t3 = time.time()
test = []
List = chunkIt(List,3)
if __name__ == '__main__':
with concurrent.futures.ProcessPoolExecutor() as executor:
results = executor.map(do_something,List)
for result in results:
test.append(result)
test= np.array(test)
t2 = time.time()
T = t2-t1
T2 = t3-t1
However, when I increase the size of my "List" my computer tires to use all of my RAM and CPU and freezes. I even cut my "List" into 3 pieces so it will only use 3 of my cores. However, nothing changed. Also, when I tried to use it on a smaller data set I noticed the code ran much slower than when it ran on a single core.
I am still very new to multiprocessing in Python, am I doing something wrong. I would appreciate it if you could help me.
To reduce memory usage, I suggest you use instead the multiprocessing module and specifically the imap method method (or imap_unordered method). Unlike the map method of either multiprocessing.Pool or concurrent.futures.ProcessPoolExecutor, the iterable argument is processed lazily. What this means is that if you use a generator function or generator expression for the iterable argument, you do not need to create the complete list of arguments in memory; as a processor in the pool become free and ready to execute more tasks, the generator will be called upon to generate the next argument for the imap call.
By default a chunksize value of 1 is used, which can be inefficient for a large iterable size. When using map and the default value of None for the chunksize argument, the pool will look at the length of the iterable first converting it to a list if necessary and then compute what it deems to be an efficient chunksize based on that length and the size of the pool. When using imap or imap_unordered, converting the iterable to a list would defeat the whole purpose of using that method. But if you know what that size would be (more or less) if the iterable were converted to a list, then there is no reason not to apply the same chunksize calculation the map method would have, and that is what is done below.
The following benchmarks perform the same processing first as a single process and then using multiprocessing using imap where each invocation of do_something on my desktop takes approximately .5 seconds. do_something now has been modified to just process a single i, k, j tuple as there is no longer any need to break up anything into smaller lists:
from multiprocessing import Pool, cpu_count
import time
def half_second():
HALF_SECOND_ITERATIONS = 10_000_000
sum = 0
for _ in range(HALF_SECOND_ITERATIONS):
sum += 1
return sum
def do_something(tpl):
# in real case this function takes about 0.5 seconds to finish for each iteration
half_second() # on my desktop
return tpl[0]**2, tpl[1]**2, tpl[2]**2
"""
def generate_tpls():
for i in range(1, 20-2):
for k in range(i+1, 20-1):
for j in range(k+1, 20):
yield i, k, j
"""
# Use smaller number of tuples so we finish in a reasonable amount of time:
def generate_tpls():
# 64 tuples:
for i in range(1, 5):
for k in range(1, 5):
for j in range(1, 5):
yield i, k, j
def benchmark1():
""" single processing """
t = time.time()
for tpl in generate_tpls():
result = do_something(tpl)
print('benchmark1 time:', time.time() - t)
def compute_chunksize(iterable_size, pool_size):
""" This is more-or-less the function used by the Pool.map method """
chunksize, remainder = divmod(iterable_size, 4 * pool_size)
if remainder:
chunksize += 1
return chunksize
def benchmark2():
""" multiprocssing """
t = time.time()
pool_size = cpu_count() # 8 logical cores (4 physical cores)
N_TUPLES = 64 # number of tuples that will be generated
pool = Pool(pool_size)
chunksize = compute_chunksize(N_TUPLES, pool_size)
for result in pool.imap(do_something, generate_tpls(), chunksize=chunksize):
pass
print('benchmark2 time:', time.time() - t)
if __name__ == '__main__':
benchmark1()
benchmark2()
Prints:
benchmark1 time: 32.261038303375244
benchmark2 time: 8.174998044967651
The nested For loops creating the array before the main definition appears to be the problem. Moving that part to underneath the main definition clears up any memory problems.
def chunkIt(seq, num):
avg = len(seq) / float(num)
out = []
last = 0.0
while last < len(seq):
out.append(seq[int(last):int(last + avg)])
last += avg
return out
def do_something(List):
# in real case this function takes about 0.5 seconds to finish for each
iteration
turn = []
for e in List:
turn.append((e[0]**2, e[1]**2,e[2]**2))
return turn
if __name__ == '__main__':
t1 = time.time()
List = []
#in the real case these 20's can go as high as 150
for i in range(1,20-2):
for k in range(i+1,20-1):
for j in range(k+1,20):
List.append((i,k,j))
t3 = time.time()
test = []
List = chunkIt(List,3)
with concurrent.futures.ProcessPoolExecutor() as executor:
results = executor.map(do_something,List)
for result in results:
test.append(result)
test= np.array(test)
t2 = time.time()
T = t2-t1
T2 = t3-t1
I am trying to get the longest list of a set of five ordered position, 1 to 5 each, satisfying the condition that any two members of the list cannot share more than one identical position (index). I.e., 11111 and 12222 is permitted (only the 1 at index 0 is shared), but 11111 and 11222 is not permitted (same value at index 0 and 1).
I have tried a brute-force attack, starting with the complete list of permutations, 3125 members, and walking through the list element by element, rejecting the ones that do not match the criteria, in several steps:
step one: testing elements 2 to 3125 against element 1, getting a new shorter list L'
step one: testing elements 3 to N' against element 2', getting a shorter list yet L'',
and so on.
I get a 17 members solution, perfectly valid. The problem is that:
I know there are, at least, two 25-member valid solution found by a matter of good luck,
The solution by this brute-force method depends strongly on the initial order of the 3125 members list, so I have been able to find from 12- to 21-member solutions, shuffling the L0 list, but I have never hit the 25-member solutions.
Could anyone please put light on the problem? Thank you.
This is my approach so far
import csv, random
maxv = 0
soln=0
for p in range(0,1): #Intended to run multiple times
z = -1
while True:
z = z + 1
file1 = 'Step' + "%02d" % (z+0) + '.csv'
file2 = 'Step' + "%02d" % (z+1) + '.csv'
nextdata=[]
with open(file1, 'r') as csv_file:
data = list(csv.reader(csv_file))
#if file1 == 'Step00.csv': # related to p loop
# random.shuffle(data)
i = 0
while i <= z:
nextdata.append(data[i])
i = i + 1
for j in range(z, len(data)):
sum=0
for k in range(0,5):
if (data[z][k] == data[j][k]):
sum = sum + 1
if sum < 2:
nextdata.append(data[j])
ofile = open(file2, 'wb')
writer = csv.writer(ofile)
writer.writerows(nextdata)
ofile.close()
if (len(nextdata) < z + 1 + 1):
if (z+1)>= maxv:
maxv = z+1
print maxv
ofile = open("Solution"+"%02d" % soln + '.csv', 'wb')
writer = csv.writer(ofile)
writer.writerows(nextdata)
ofile.close()
soln = soln + 1
break
Here is a Picat model for the problem (as I understand it): http://hakank.org/picat/longest_subset_of_five_positions.pi It use constraint modelling and SAT solver.
Edit: Here is a MiniZinc model: http://hakank.org/minizinc/longest_subset_of_five_positions.mzn
The model (predicate go/0) check lengths of 2 to 100. All lengths between 2 and 25 has at least one solution (probably at lot more). So 25 is the longest sub sequence. Here is one 25 length solution:
{1,1,1,3,4}
{1,2,5,1,5}
{1,3,4,4,1}
{1,4,2,2,2}
{1,5,3,5,3}
{2,1,3,2,1}
{2,2,4,5,4}
{2,3,2,1,3}
{2,4,1,4,5}
{2,5,5,3,2}
{3,1,2,5,5}
{3,2,3,4,2}
{3,3,5,2,4}
{3,4,4,3,3}
{3,5,1,1,1}
{4,1,4,1,2}
{4,2,1,2,3}
{4,3,3,3,5}
{4,4,5,5,1}
{4,5,2,4,4}
{5,1,5,4,3}
{5,2,2,3,1}
{5,3,1,5,2}
{5,4,3,1,4}
{5,5,4,2,5}
There is a lot of different 25 lengths solutions (the predicate go2/0 checks that).
Here is the complete model (edited from the file above):
import sat.
main => go.
%
% Test all lengths from 2..100.
% 25 is the longest.
%
go ?=>
nolog,
foreach(M in 2..100)
println(check=M),
if once(check(M,_X)) then
println(M=ok)
else
println(M=not_ok)
end,
nl
end,
nl.
go => true.
%
% Check if there is a solution with M numbers
%
check(M, X) =>
N = 5,
X = new_array(M,N),
X :: 1..5,
foreach(I in 1..M, J in I+1..M)
% at most 1 same number in the same position
sum([X[I,K] #= X[J,K] : K in 1..N]) #<= 1,
% symmetry breaking: sort the sub sequence
lex_lt(X[I],X[J])
end,
solve([ff,split],X),
foreach(Row in X)
println(Row)
end,
nl.
I was trying python multiprocessing module to reduce time for my filtering code. At the beginning I have done some experiment. Results are not promising.
I've defined a function to run a loop within a certain range. Then I've run this function with and without threading and measured the time. Here is my code:
import time
from multiprocessing.pool import ThreadPool
def do_loop(i,j):
l = []
for i in range(i,j):
l.append(i)
return l
#loop veriable
x = 7
#without thredding
start_time = time.time()
c = do_loop(0,10**x)
print("--- %s seconds ---" % (time.time() - start_time))
#with thredding
def thread_work(n):
#dividing loop size
a = 0
b = int(n/2)
c = int(n/2)
#multiprocessing
pool = ThreadPool(processes=10)
async_result1 = pool.apply_async(do_loop, (a,b))
async_result2 = pool.apply_async(do_loop, (b,c))
async_result3 = pool.apply_async(do_loop, (c,n))
#get the result from all processes]
result = async_result1.get() + async_result2.get() + async_result3.get()
return result
start_time = time.time()
ll = thread_work(10**x)
print("--- %s seconds ---" % (time.time() - start_time))
For x=7 the result is:
--- 1.0931916236877441 seconds ---
--- 1.4213247299194336 seconds ---
Without threading it takes less time. And here is another problem. For X=8, most of the time I get MemoryError for threading. Once I got this result:
--- 17.04124426841736 seconds ---
--- 32.871358156204224 seconds ---
The solution is important as I need to optimize a filtering task which takes 6 hours.
Depending on your task, multiprocessing may or may not take longer.
If you want to take advantages of your CPU cores and speed up your filtering process then you should use multiprocessing.Pool
offers a convenient means of parallelizing the execution of a function
across multiple input values, distributing the input data across
processes (data parallelism).
I've been creating an example of data filtering and then I've been measuring the timing of a simple approach and the timing of a multiprocess approach. (starting from your code)
# take only the sentences that ends in "we are what we dream", the second word is "are"
import time
from multiprocessing.pool import Pool
LEN_FILTER_SENTENCE = len('we are what we dream')
num_process = 10
def do_loop(sentences):
l = []
for sentence in sentences:
if sentence[-LEN_FILTER_SENTENCE:].lower() =='we are what we doing' and sentence.split()[1] == 'are':
l.append(sentence)
return l
#with thredding
def thread_work(sentences):
#multiprocessing
pool = Pool(processes=num_process)
pool_food = (sentences[i: i + num_process] for i in range(0, len(sentences), num_process))
result = pool.map(do_loop, pool_food)
return result
def test(data_size=5, sentence_size=100):
to_be_filtered = ['we are what we doing'*sentence_size] * 10 ** data_size + ['we are what we dream'*sentence_size] * 10 ** data_size
start_time = time.time()
c = do_loop(to_be_filtered)
simple_time = (time.time() - start_time)
start_time = time.time()
ll = [e for l in thread_work(to_be_filtered) for e in l]
multiprocessing_time = (time.time() - start_time)
assert c == ll
return simple_time, multiprocessing_time
data_size represents the length of your data and sentence_size is a factor of multiplication for each data element, you can see that sentence_size is directly proportional with the number of CPU operations requested for each item from your data.
data_size = [1, 2, 3, 4, 5, 6]
results = {i: {'simple_time': [], 'multiprocessing_time': []} for i in data_size}
sentence_size = list(range(1, 500, 100))
for size in data_size:
for s_size in sentence_size:
simple_time, multiprocessing_time = test(size, s_size)
results[size]['simple_time'].append(simple_time)
results[size]['multiprocessing_time'].append(multiprocessing_time)
import pandas as pd
df_small_data = pd.DataFrame({'simple_data_size_1': results[1]['simple_time'],
'simple_data_size_2': results[2]['simple_time'],
'simple_data_size_3': results[3]['simple_time'],
'multiprocessing_data_size_1': results[1]['multiprocessing_time'],
'multiprocessing_data_size_2': results[2]['multiprocessing_time'],
'multiprocessing_data_size_3': results[3]['multiprocessing_time'],
'sentence_size': sentence_size})
df_big_data = pd.DataFrame({'simple_data_size_4': results[4]['simple_time'],
'simple_data_size_5': results[5]['simple_time'],
'simple_data_size_6': results[6]['simple_time'],
'multiprocessing_data_size_4': results[4]['multiprocessing_time'],
'multiprocessing_data_size_5': results[5]['multiprocessing_time'],
'multiprocessing_data_size_6': results[6]['multiprocessing_time'],
'sentence_size': sentence_size})
Ploting the timing for small data:
ax = df_small_data.set_index('sentence_size').plot(figsize=(20, 10), title = 'Simple vs multiprocessing approach for small data')
ax.set_ylabel('Time in seconds')
Ploting the timing for big data(relative big data):
As you can see, the multiprocessing power is revealing when you have big data that requires relatively significant CPU power for each data element.
The task here is so small that the parallelization overhead hugely dominates over the benefits. This is a common FAQ.
It's better you use multiprocessing.Process(), as python has Global Interpreter Lock(GIL). So even if you make threads to increase the speed of your tasks, it won't increase, it'll go one by one. You can refer python doc for GIL and threading.
Aroosh Rana may have the best answer, but when testing using that approach a couple things to watch out for. The way you grow your array in the loop may be very inefficient, instead consider allocating its full size up front. Also, look closely at the way you divided up the work, you have two loops that process half the array and one that goes from n/2 to n/2. Also as mentioned elsewhere the word done is rather trivial and wouldn't benefit from parallel processing.
I've tried to improve upon your previous test.
import time
from multiprocessing.pool import ThreadPool
import math
def do_loop(array, i,j):
for k in range(i,j):
array[k] = math.cos(1/(1+k))
return array
#loop veriable
x = 7
array_size = 2*10**x
#without thredding
start_time = time.time()
array = [0]*array_size
c = do_loop(array, 0,array_size)
print("--- %s seconds ---" % (time.time() - start_time))
#with thredding
def thread_work(n):
#dividing loop size
array = [0]*n
a = 0
b = int(n/3)
c = int(2*n/3)
#multiprocessing
pool = ThreadPool(processes=4)
async_result1 = pool.apply_async(do_loop, (array, a,b))
async_result2 = pool.apply_async(do_loop, (array, b,c))
async_result3 = pool.apply_async(do_loop, (array, c,n))
#get the result from all processes]
result1 = async_result1.get()
result2 = async_result2.get()
result3 = async_result3.get()
start_time = time.time()
result = result1+result2+result3
print("--- %s seconds ---" % (time.time() - start_time))
return result
start_time = time.time()
ll = thread_work(array_size)
print("--- %s seconds ---" % (time.time() - start_time))
Also keep in mind that with an approach like this you wouldn't have to combine the results at the end as each thread would be processing on the same array.
Why do you use multiprocessing for threads?
Best is to create multiple instances of threads. Give each of them their tasks. Finally, start all of them. And wait until they finish. In the meantime, collect results to some list.
From my experience (for one particular task), I have found that even creating a whole graph of threads at the beginning gives smaller overhead than directly before starting tasks in next node in graph. I mean it for 10, 100, 1000, 10000 threads. Just make sure that threads are sleeping during idle time i.e. time.sleep(0.5) to avoid wasting cycles of CPU.
With threads, you can use lists, dictionaries, and queues, which are thread-safe.
I would like to query the value of an exponentially weighted moving average at particular points. An inefficient way to do this is as follows. l is the list of times of events and queries has the times at which I want the value of this average.
a=0.01
l = [3,7,10,20,200]
y = [0]*1000
for item in l:
y[int(item)]=1
s = [0]*1000
for i in xrange(1,1000):
s[i] = a*y[i-1]+(1-a)*s[i-1]
queries = [23,68,103]
for q in queries:
print s[q]
Outputs:
0.0355271185019
0.0226018371526
0.0158992102478
In practice l will be very large and the range of values in l will also be huge. How can you find the values at the times in queries more efficiently, and especially without computing the potentially huge lists y and s explicitly. I need it to be in pure python so I can use pypy.
Is it possible to solve the problem in time proportional to len(l)
and not max(l) (assuming len(queries) < len(l))?
Here is my code for doing this:
def ewma(l, queries, a=0.01):
def decay(t0, x, t1, a):
from math import pow
return pow((1-a), (t1-t0))*x
assert l == sorted(l)
assert queries == sorted(queries)
samples = []
try:
t0, x0 = (0.0, 0.0)
it = iter(queries)
q = it.next()-1.0
for t1 in l:
# new value is decayed previous value, plus a
x1 = decay(t0, x0, t1, a) + a
# take care of all queries between t0 and t1
while q < t1:
samples.append(decay(t0, x0, q, a))
q = it.next()-1.0
# take care of all queries equal to t1
while q == t1:
samples.append(x1)
q = it.next()-1.0
# update t0, x0
t0, x0 = t1, x1
# take care of any remaining queries
while True:
samples.append(decay(t0, x0, q, a))
q = it.next()-1.0
except StopIteration:
return samples
I've also uploaded a fuller version of this code with unit tests and some comments to pastebin: http://pastebin.com/shhaz710
EDIT: Note that this does the same thing as what Chris Pak suggests in his answer, which he must have posted as I was typing this. I haven't gone through the details of his code, but I think mine is a bit more general. This code supports non-integer values in l and queries. It also works for any kind of iterables, not just lists since I don't do any indexing.
I think you could do it in ln(l) time, if l is sorted. The basic idea is that the non recursive form of EMA is a*s_i + (1-a)^1 * s_(i-1) + (1-a)^2 * s_(i-2) ....
This means for query k, you find the greatest number in l less than k, and for a estimation limit, use the following, where v is the index in l, l[v] is the value
(1-a)^(k-v) *l[v] + ....
Then, you spend lg(len(l)) time in search + a constant multiple for the depth of your estimation. I'll provide a code sample in a little bit (after work) if you want it, just wanted to get my idea out there while I was thinking about it
here's the code -
v is the dictionary of values at a given time; replace with 1 if it's just a 1 every time...
import math
from bisect import bisect_right
a = .01
limit = 1000
l = [1,5,14,29...]
def find_nearest_lt(l, time):
i = bisect_right(a, x)
if i:
return i-1
raise ValueError
def find_ema(l, time):
i = find_nearest_lt(l, time)
if l[i] == time:
result = a * v[l[i]
i -= 1
else:
result = 0
while (time-l[i]) < limit:
result += math.pow(1-a, time-l[i]) * v[l[i]]
i -= 1
return result
if I'm thinking correctly, the find nearest is l(n), then the while loop is <= 1000 iterations, guaranteed, so it's technically a constant (though a kind of large one). find_nearest was stolen from the page on bisect - http://docs.python.org/2/library/bisect.html
It appears that y is a binary value -- either 0 or 1 -- depending on the values of l. Why not use y = set(int(item) for item in l)? That's the most efficient way to store and look up a list of numbers.
Your code will cause an error the first time through this loop:
s = [0]*1000
for i in xrange(1000):
s[i] = a*y[i-1]+(1-a)*s[i-1]
because i-1 is -1 when i=0 (first pass of loop) and both y[-1] and s[-1] are the last element of the list, not the previous. Maybe you want xrange(1,1000)?
How about this code:
a=0.01
l = [3.0,7.0,10.0,20.0,200.0]
y = set(int(item) for item in l)
queries = [23,68,103]
ewma = []
x = 1 if (0 in y) else 0
for i in xrange(1, queries[-1]):
x = (1-a)*x
if i in y:
x += a
if i == queries[0]:
ewma.append(x)
queries.pop(0)
When it's done, ewma should have the moving averages for each query point.
Edited to include SchighSchagh's improvements.