find the index of values that satisfy A + B = C + D - python

I'm working on the problem below, using Python 2.7. I've posted my code and am wondering whether there are any smarter ideas to make it run faster. I suspect there might be an approach that sorts the list first and leverages the sorted order, but I haven't been able to figure one out so far. My code has O(n^2) time complexity.
Problem:
Given an array of integers, find the indexes of values that satisfy A + B = C + D, where A, B, C, and D are values in the array. Find all combinations of such quadruples.
Code:
from collections import defaultdict

sumIndex = defaultdict(list)

def buildIndex(numbers):
    for i in range(len(numbers)):
        for j in range(i + 1, len(numbers)):
            sumIndex[numbers[i] + numbers[j]].append((i, j))

def checkResult():
    for k, v in sumIndex.items():
        if len(v) > 1:
            for i in v:
                print k, i

if __name__ == "__main__":
    buildIndex([1, 2, 3, 4])
    checkResult()
Output (each line shows a sum value followed by a pair of indexes whose elements produce that sum):
5 (0,3)
5 (1,2)

Consider the case where all the elements of the array are equal. Then we know the answer beforehand, but merely printing the result takes O(n^2) time, since there are n*(n-1)/2 such pairs. So it is safe to say that no approach has better complexity than O(n^2) for this problem.
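A quick way to see the size of that bound (my own illustration, using the pair-count formula n*(n-1)/2):

# With n equal elements every pair has the same sum, so the output
# must list all n*(n-1)/2 index pairs; printing alone is already O(n^2).
for n in (10, 100, 1000):
    print("%d elements -> %d pairs" % (n, n * (n - 1) // 2))
# 10 elements -> 45 pairs
# 100 elements -> 4950 pairs
# 1000 elements -> 499500 pairs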

Yes, it can be done with complexity below O(n^2) in favorable cases. The algorithm is:
1. Create a companion array, say indexArr[], storing the index of each element of the original array, say origArr[].
2. Sort origArr[] in ascending order using an algorithm with O(n log n) complexity, and apply the same reordering to indexArr[] so the original indexes stay aligned with the sorted values.
3. Now find the pairs in the sorted array by running two nested loops over the possible combinations; suppose you pick sum = origArr[i] + origArr[j].
4. Keep searching only while sum <= origArr[n], where n is the position of the last (and therefore maximum) element. Once sum > origArr[n], break out of the inner loop, since larger j only increase the sum; and if this already happens at j = i + 1, break out of the outer loop too, as no further combinations are possible.
PS: the worst-case scenario is still O(n^2). (A code sketch of these steps follows below.)
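A literal Python sketch of these steps (my transcription, not code from the answer; note that, exactly as described, it only keeps pairs whose sum does not exceed the maximum element):

def find_pairs(origArr):
    # Steps 1-2: sort the values while carrying the original indexes along
    # (indexArr plays the role described above).
    indexArr = sorted(range(len(origArr)), key=lambda k: origArr[k])
    vals = sorted(origArr)
    maxVal = vals[-1]                    # origArr[n], the maximum element
    pairs = []
    for i in range(len(vals) - 1):
        # vals[i] + vals[i+1] is the smallest sum available from row i;
        # if even it exceeds the maximum, every later row does too.
        if vals[i] + vals[i + 1] > maxVal:
            break
        for j in range(i + 1, len(vals)):
            s = vals[i] + vals[j]
            if s > maxVal:               # larger j only make the sum larger
                break
            pairs.append((s, indexArr[i], indexArr[j]))
    return pairs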

A faster, more Pythonic approach uses itertools.combinations, which generates the index pairs in C rather than in nested Python loops:
from collections import defaultdict
from itertools import combinations

def get_combos(l):
    d = defaultdict(list)
    for indices in combinations(range(len(l)), 2):
        d[l[indices[0]] + l[indices[1]]].append(indices)
    return {k: v for k, v in d.items() if len(v) > 1}
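For the question's example input, this returns the same pairs keyed by their sum (hypothetical interactive session):

>>> get_combos([1, 2, 3, 4])
{5: [(0, 3), (1, 2)]}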
Timing results (seconds; lower is better):

                                           | OP       | this
len(l)=4,    min(repeat=100, number=10000) | 0.09334  | 0.08050
len(l)=50,   min(repeat=10, number=100)    | 0.08689  | 0.08996
len(l)=500,  min(repeat=10, number=10)     | 0.64974  | 0.59553
len(l)=1000, min(repeat=3, number=3)       | 1.01559  | 0.83494
len(l)=5000, min(repeat=3, number=1)       | 10.26168 | 8.92959
Timing code:

from collections import defaultdict
from itertools import combinations
from random import randint
from timeit import repeat

def lin_get_combos(l):
    sumIndex = defaultdict(list)
    for i in range(len(l)):
        for j in range(i + 1, len(l)):
            sumIndex[l[i] + l[j]].append((i, j))
    return {k: v for k, v in sumIndex.items() if len(v) > 1}

def craig_get_combos(l):
    d = defaultdict(list)
    for indices in combinations(range(len(l)), 2):
        d[l[indices[0]] + l[indices[1]]].append(indices)
    return {k: v for k, v in d.items() if len(v) > 1}

# (list size, repeat, number) settings matching the rows of the table above
for size, r, n in [(4, 100, 10000), (50, 10, 100), (500, 10, 10),
                   (1000, 3, 3), (5000, 3, 1)]:
    l = [randint(0, 1000) for _ in range(size)]
    t1 = min(repeat(stmt='lin_get_combos(l)',
                    setup='from __main__ import lin_get_combos, l',
                    repeat=r, number=n))
    t2 = min(repeat(stmt='craig_get_combos(l)',
                    setup='from __main__ import craig_get_combos, l',
                    repeat=r, number=n))
    print '%0.5f, %0.5f' % (t1, t2)

Related

save time with lambda function + map with 2D array

I was convinced that using a lambda function would save computation time, but it's not that clear. Look at this example:
import numpy as np
import timeit

def f_with_lambda():
    a = np.array(range(5))
    b = np.array(range(5))
    A, B = np.meshgrid(a, b)
    rst = list(map(lambda x, y: x + y, A, B))
    return np.array(rst)

def f_with_for():
    a = range(5)
    b = np.array(range(5))
    rst = [b + x for x in a]
    return np.array(rst)

lambda_rst = f_with_lambda()
for_rst = f_with_for()

if __name__ == '__main__':
    print(timeit.timeit("f_with_lambda()", setup="from __main__ import f_with_lambda", number=10000))
    print(timeit.timeit("f_with_for()", setup="from __main__ import f_with_for", number=10000))
The result is simple:
- lambda version: 0.3514268280014221 s
- for loop: 0.10633227700236603 s
How do I write my lambda version to be competitive? I noticed that the list() call used to pull results out of the map object costs a lot of time. Is there another way to proceed? The meshgrid call is certainly not the best either...
Every tip is welcome!
Considering the remark about the list:
import numpy as np
import timeit

def f_with_lambda():
    A, B = np.meshgrid(range(150), range(150))
    return np.array(map(lambda x, y: x + y, A, B))

def f_with_for():
    return np.array([np.array(range(150)) + x for x in range(150)])

if __name__ == '__main__':
    print(timeit.timeit("f_with_lambda()", setup="from __main__ import f_with_lambda", number=10000))
    print(timeit.timeit("f_with_for()", setup="from __main__ import f_with_for", number=10000))
This changes a lot of things. This time (lambda vs for):
for 5:
0.30227499100146815 vs 0.2510572589999356 (quite similar)
for 150:
0.6687559890015109 vs 20.31807473200024 :) great job! thank you!
Memory allocation takes time (it may require a call into the OS, which can be delayed).
In the lambda version, you allocate a, b, the meshgrid outputs, rst (both the list and the array version), plus the returned array.
In the for version, you allocate only b and rst, plus the returned array; a is a plain range, which costs almost nothing to create.
This is why your lambda version is slower.
Also, don't use list() on the result of np-array operations just to cast it back to an np-array.
Just removing the list() makes it faster (from 0.9 s to 0.4 s):
def f_with_lambda():
    a = np.array(range(SIZE))
    b = np.array(range(SIZE))
    A, B = np.meshgrid(a, b)
    rst = map(lambda x, y: x + y, A, B)
    return np.array(rst)
See https://stackoverflow.com/a/46470401/9453926 for speed comparison.
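For reference (my addition, not from the thread): the idiomatic NumPy version drops map and lambda entirely and lets broadcasting build the same addition table in C, which is typically far faster than either variant above:

import numpy as np

def f_with_broadcast(size=150):
    # A (size, 1) column plus a (size,) row broadcasts to the full
    # (size, size) table with no Python-level loop at all.
    return np.arange(size)[:, None] + np.arange(size)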
I compacted the code:
import numpy as np
import timeit

def f_with_lambda():
    A, B = np.meshgrid(range(150), range(150))
    return np.array(list(map(lambda x, y: x + y, A, B)))

def f_with_for():
    return np.array([np.array(range(150)) + x for x in range(150)])

if __name__ == '__main__':
    print(timeit.timeit("f_with_lambda()", setup="from __main__ import f_with_lambda", number=10000))
    print(timeit.timeit("f_with_for()", setup="from __main__ import f_with_for", number=10000))
This time, for a 5x5 grid, the result is (lambda vs for):
0.38113487999726203 vs 0.24913009200099623
and with 150 it's better:
2.680842614001449 vs 20.176408246999927
But I found no way to fold the meshgrid into the lambda itself, and the list conversion before the np.array call is unfortunate as well.
I took the time to integrate the last remark from politinsa:
import numpy as np
import timeit

def f_with_lambda():
    A, B = np.meshgrid(range(150), range(150))
    return np.array(list(map(lambda x, y: x + y, A, B)))

def f_with_for():
    return np.array([np.array(range(150)) + x for x in range(150)])

def f_with_lambda_nolist():
    A, B = np.meshgrid(range(150), range(150))
    return np.array(map(lambda x, y: x + y, A, B))

if __name__ == '__main__':
    print(timeit.timeit("f_with_lambda()", setup="from __main__ import f_with_lambda", number=10000))
    print(timeit.timeit("f_with_for()", setup="from __main__ import f_with_for", number=10000))
    print(timeit.timeit("f_with_lambda_nolist()", setup="from __main__ import f_with_lambda_nolist", number=10000))
The results are:
2.4421722999977646 s
18.75847979998798 s
0.6800016999914078 s -> removing the list conversion has (as explained) a real impact on memory allocation
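One caveat worth flagging (my note; it assumes these timings were run under Python 3): np.array(map(...)) does not iterate the map object at all. It wraps it in a 0-dimensional object array, so f_with_lambda_nolist never actually computes the sums, which accounts for most of its apparent speed:

>>> import numpy as np
>>> np.array(map(lambda x, y: x + y, [1, 2], [3, 4]))
array(<map object at 0x...>, dtype=object)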

Get the size of a conditional subset of a list

Assume that you have a list with an arbitrary number of items and you wish to get the number of items that match a specific condition. I thought of two sensible ways to do this, but I am not sure which one is best (more Pythonic), or whether there is a better option (without sacrificing too much readability).
import numpy.random as nprnd
import timeit

my = nprnd.randint(1000, size=1000000)

def with_len(my_list):
    much = len([t for t in my_list if t >= 500])

def with_sum(my_list):
    many = sum(1 for t in my_list if t >= 500)

t1 = timeit.Timer('with_len(my)', 'from __main__ import with_len, my')
t2 = timeit.Timer('with_sum(my)', 'from __main__ import with_sum, my')
print("with len:", t1.timeit(1000)/1000)
print("with sum:", t2.timeit(1000)/1000)
Performance is almost identical between these two cases. However, which of these is more pythonic? Or is there a better alternative?
For those who are curious, I tested the proposed solutions (from comments and answers) and these are the results:
import numpy as np
import timeit
import functools

my = np.random.randint(1000, size=100000)

def with_len(my_list):
    return len([t for t in my_list if t >= 500])

def with_sum(my_list):
    return sum(1 for t in my_list if t >= 500)

def with_sum_alt(my_list):
    return sum(t >= 500 for t in my_list)

def with_lambda(my_list):
    return functools.reduce(lambda a, b: a + (1 if b >= 500 else 0), my_list, 0)

def with_np(my_list):
    return len(np.where(my_list >= 500)[0])

t1 = timeit.Timer('with_len(my)', 'from __main__ import with_len, my')
t2 = timeit.Timer('with_sum(my)', 'from __main__ import with_sum, my')
t3 = timeit.Timer('with_sum_alt(my)', 'from __main__ import with_sum_alt, my')
t4 = timeit.Timer('with_lambda(my)', 'from __main__ import with_lambda, my')
t5 = timeit.Timer('with_np(my)', 'from __main__ import with_np, my')
print("with len:", t1.timeit(1000)/1000)
print("with sum:", t2.timeit(1000)/1000)
print("with sum_alt:", t3.timeit(1000)/1000)
print("with lambda:", t4.timeit(1000)/1000)
print("with np:", t5.timeit(1000)/1000)
Python 2.7
('with len:', 0.02201753337348283)
('with sum:', 0.022727363518455238)
('with sum_alt:', 0.2370256687439941) # <-- very slow!
('with lambda:', 0.026367264818657078)
('with np:', 0.0005811764306089913) # <-- very fast!
Python 3.6
with len: 0.017649643657480736
with sum: 0.0182978007766851
with sum_alt: 0.19659815740239048
with lambda: 0.02691670741400111
with np: 0.000534095418615152
The second one, with_sum, is more Pythonic in the sense that it uses much less memory: it doesn't build an intermediate list, because the generator expression is fed to sum() one item at a time.
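To make the memory point concrete, a quick illustrative check (my addition):

import sys

data = list(range(1000000))
as_list = [1 for t in data if t >= 500]   # materializes ~a million items
as_gen = (1 for t in data if t >= 500)    # lazy; nothing materialized yet
print(sys.getsizeof(as_list))             # several megabytes
print(sys.getsizeof(as_gen))              # ~100 bytes, whatever the input size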
I'm with @Chris_Rands. But as far as performance is concerned, there is a faster way using numpy:
import numpy as np

def with_np(my_list):
    return len(np.where(my_list >= 500)[0])
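A small follow-up (my suggestion, not part of the original answer): NumPy can also count the boolean mask directly, which avoids the intermediate index array that np.where builds:

import numpy as np

def with_count(my_list):
    # count_nonzero counts the True entries of the mask in one pass
    return np.count_nonzero(my_list >= 500)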

Python: Why is list comprehension slower than for loop

Essentially these are the same functions, except the list comprehension uses sum() instead of x = 0; x += ..., since the latter is not supported in a comprehension. Why is the list comprehension compiled into something 40% slower?
# list comprehension
def movingAverage(samples, n=3):
    return [float(sum(samples[i-j] for j in range(n)))/n for i in range(n-1, len(samples))]

# regular
def moving_average(samples, n=3):
    l = []
    for i in range(n-1, len(samples)):
        x = 0
        for j in range(n):
            x += samples[i-j]
        l.append(float(x)/n)
    return l
For timing the sample inputs I used variations on [i*random.random() for i in range(x)]
You are using a generator expression in your list comprehension:
sum(samples[i-j] for j in range(n))
Generator expressions require a new frame to be created each time you run one, just like a function call. This is relatively expensive.
You don't need to use a generator expression at all; you only need to slice the samples list:
sum(samples[i - n + 1:i + 1])
You can specify a second argument, a start value for the sum() function; set it to 0.0 to get a float result:
sum(samples[i - n + 1:i + 1], 0.0)
Together these changes make all the difference:
>>> from timeit import timeit
>>> import random
>>> testdata = [i*random.random() for i in range(1000)]
>>> def slow_moving_average(samples, n=3):
...     return [float(sum(samples[i-j] for j in range(n)))/n for i in range(n-1, len(samples))]
...
>>> def fast_moving_average(samples, n=3):
...     return [sum(samples[i - n + 1:i + 1], 0.0) / n for i in range(n-1, len(samples))]
...
>>> def verbose_moving_average(samples, n=3):
...     l = []
...     for i in range(n-1, len(samples)):
...         x = 0.0
...         for j in range(n):
...             x += samples[i-j]
...         l.append(x / n)
...     return l
...
>>> timeit('f(s)', 'from __main__ import verbose_moving_average as f, testdata as s', number=1000)
0.9375386269966839
>>> timeit('f(s)', 'from __main__ import slow_moving_average as f, testdata as s', number=1000)
1.9631599469939829
>>> timeit('f(s)', 'from __main__ import fast_moving_average as f, testdata as s', number=1000)
0.5647804250038462
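Going a step further (my sketch, beyond the scope of the original answer): a running window sum removes the inner summation entirely, turning the O(n * len(samples)) loop into a single O(len(samples)) pass:

def running_moving_average(samples, n=3):
    # Keep the current window's sum up to date instead of re-adding
    # n items for every output element.
    out = []
    window_sum = sum(samples[:n - 1], 0.0)
    for i in range(n - 1, len(samples)):
        window_sum += samples[i]          # window now covers i-n+1 .. i
        out.append(window_sum / n)
        window_sum -= samples[i - n + 1]  # slide the window forward
    return out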

find the best way for factorial in python?

I am researching the speed of computing factorials, but so far I am using only these two approaches:
import timeit

def fact(N):
    B = N
    while N > 1:
        B = B * (N-1)
        N = N-1
    return B

def fact1(N):
    B = 1
    for i in range(1, N+1):
        B = B * i
    return B

print timeit.timeit('fact(5)', setup="from __main__ import fact"), fact(5)
print timeit.timeit('fact1(5)', setup="from __main__ import fact1"), fact1(5)
Here is the output:
0.540276050568 120
0.654400110245 120
From the above code I observed that the while version takes less time than the for version.
My question is: is this the best way to compute a factorial in Python?
If you're looking for the best, why not use the one provided in the math module?
>>> import math
>>> math.factorial
<built-in function factorial>
>>> math.factorial(10)
3628800
And a comparison of timings on my machine:
>>> print timeit.timeit('fact(5)', setup="from __main__ import fact"), fact(5)
0.840167045593 120
>>> print timeit.timeit('fact1(5)', setup="from __main__ import fact1"), fact1(5)
1.04350399971 120
>>> print timeit.timeit('factorial(5)', setup="from math import factorial")
0.149857997894
We see that the builtin is significantly better than either of the pure python variants you proposed.
TL;DR: microbenchmarks aren't very useful.
For CPython, try this:
>>> from math import factorial
>>> print timeit.timeit('fact(5)', setup="from __main__ import fact"), fact(5)
1.38128209114 120
>>> print timeit.timeit('fact1(5)', setup="from __main__ import fact1"), fact1(5)
1.46199703217 120
>>> print timeit.timeit('factorial(5)', setup="from math import factorial"), factorial(5)
0.397044181824 120
But under PyPy, the while version is faster than the one from math:
>>>> print timeit.timeit('fact(5)', setup="from __main__ import fact"), fact(5)
0.170556783676 120
>>>> print timeit.timeit('fact1(5)', setup="from __main__ import fact1"), fact1(5)
0.319650173187 120
>>>> print timeit.timeit('factorial(5)', setup="from math import factorial"), factorial(5)
0.210616111755 120
So it depends on the implementation. Now try bigger numbers:
>>>> print timeit.timeit('fact(50)', setup="from __main__ import fact"), fact(50)
7.71517109871 30414093201713378043612608166064768844377641568960512000000000000
>>>> print timeit.timeit('fact1(50)', setup="from __main__ import fact1"), fact1(50)
6.58060312271 30414093201713378043612608166064768844377641568960512000000000000
>>>> print timeit.timeit('factorial(50)', setup="from math import factorial"), factorial(50)
6.53072690964 30414093201713378043612608166064768844377641568960512000000000000
Now the while version is in last place, and the for version is about the same as the one from the math module.
Otherwise, if you're looking for a Python implementation (this is my favourite):
from operator import mul

def factorial(n):
    return reduce(mul, range(1, n + 1), 1)
Usage:
>>> factorial(0)
1
>>> factorial(1)
1
>>> factorial(2)
2
>>> factorial(3)
6
>>> factorial(4)
24
>>> factorial(5)
120
>>> factorial(10)
3628800
Performance (on my desktop):
$ python -m timeit -c -s "fact = lambda n: reduce(lambda a, x: a * x, range(1, (n + 1)), 1)" "fact(10)"
1000000 loops, best of 3: 1.98 usec per loop
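One portability note (my addition): the snippets above assume Python 2, where reduce is a builtin; under Python 3 it must be imported from functools:

from functools import reduce
from operator import mul

def factorial(n):
    return reduce(mul, range(1, n + 1), 1)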
I have tried with reduce(lambda x, y: x*y, range(1, 5)):
>>> timeit("import math; math.factorial(4)")
1.0205099133840179
>>> timeit("reduce(lambda x, y: x*y, range(1, 5))")
1.4047879075160665
>>> timeit("from operator import mul; reduce(mul, range(1, 5))")
2.530837320051319

Is there a faster way to get input in python?

In coding competitions we encounter inputs like:
2 3
4 5
So we do the following:
m, n = [int(x) for x in raw_input().split(' ')]
Is there a faster way of doing the same thing?
For all practical purposes, that's about as fast as you can get. On some machines, you may see a speedup on the order of a couple percent if you go with map instead of a list comprehension, but that's not guaranteed.
Here are some quick timings on my machine:
from itertools import imap
#map
>>> timeit.timeit('x,y = map(int,line.split(" "))','from __main__ import line')
4.7857139110565186
>>> timeit.timeit('x,y = map(int,line.split())','from __main__ import line')
4.5680718421936035
#list comprehension
>>> timeit.timeit('x,y = [int(x) for x in line.split(" ")]','from __main__ import line')
4.3816750049591064
>>> timeit.timeit('x,y = [int(x) for x in line.split()]','from __main__ import line')
4.3246541023254395
#itertools.imap
>>> timeit.timeit('x,y = imap(int,line.split(" "))','from __main__ import line,imap')
4.431504011154175
>>> timeit.timeit('x,y = imap(int,line.split())','from __main__ import line,imap')
4.3257410526275635
#generator expression
>>> timeit.timeit('x,y = (int(x) for x in line.split(" "))','from __main__ import line')
4.897794961929321
>>> timeit.timeit('x,y = (int(x) for x in line.split())','from __main__ import line')
4.732620000839233
Surprisingly, split() seems to perform better than split(" "); with no argument, split() takes an optimized path that splits on any run of whitespace.
If you're guaranteed ASCII input consisting of single-digit numbers between 0 and 9, you can do a good bit better using ord:
>>>timeit.timeit('x,y = [ord(x)-48 for x in line.split(" ")]','from __main__ import line')
1.377655029296875
>>> timeit.timeit('x,y = [ord(x)-48 for x in line.split()]','from __main__ import line')
1.3243558406829834
But that imposes a severe restriction on your inputs.
One other idea you could try (I have no idea what the performance implications would be): read your lines from sys.stdin:
import sys

for line in sys.stdin:
    x, y = [ord(x) - 48 for x in line.split()]
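Another common trick along the same lines (my addition; not benchmarked in this thread) is to read all of stdin in one call and split once, avoiding per-line overhead entirely:

import sys

def read_all_ints():
    # One bulk read of the whole input, then a single split over every token
    return list(map(int, sys.stdin.read().split()))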
Use map(), it's faster than list comprehensions when used with built-in functions:
m, n = map(int, raw_input().split())
