Optimizing append() in for loops - python

I have a for loop which deals with more than 9 million combinations (generated with the itertools library). The code below must run faster; it's taking too long to loop over all the combinations. I'd appreciate any suggestions.
import itertools
import numpy as np
import pandas as pd
import xlwings as xw

wb = xw.books('FX VEGA BT.xlsm')
sht = wb.sheets['Sheet1']
#retrieving data from excel
df = pd.DataFrame(sht.range('PY_PNL').value, columns=['10','20','25','40','50','60','70','75','80','90'])
#df has shape of 3115 rows × 10 columns
def sharpe(x):
    s = round(np.average(x)/np.std(x)*np.sqrt(252), 2)
    return s
shrps = []
outlist = []
mult = (-1,-2.5,0,1,2.5)
perm = itertools.product(mult,repeat = 10)
for p in perm:
    c = df*p
    c = c.sum(axis='columns')
    outlist.append(p)
    shrps.append(sharpe(c))

You can use a list comprehension and it'll be a bit faster:
shrps = [sharpe((df*p).sum(axis='columns')) for p in perm]
If you really need a copy of perm named outlist, note that perm is an iterator and is consumed by the comprehension above, so copying it afterwards won't help; materialise it into a list instead:
outlist = list(itertools.product(mult, repeat=10))
To speed up the process further, you could also change something inside the sharpe() function itself.
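For a much bigger speedup, the per-permutation work can be vectorised with NumPy: the weighted row sum is a matrix-vector product, and many permutations can be handled at once as a matrix-matrix product. A minimal sketch, processing the permutations in chunks to keep memory bounded (the chunk size and batching are my assumptions, not part of the original code):

import itertools
import numpy as np

vals = df.to_numpy()                      # shape (3115, 10)
perm = itertools.product(mult, repeat=10)
chunk = 10_000
shrps, outlist = [], []
while True:
    batch = list(itertools.islice(perm, chunk))
    if not batch:
        break
    P = np.array(batch)                   # shape (chunk, 10)
    sums = vals @ P.T                     # shape (3115, chunk): one column per permutation
    shrps.extend(np.round(sums.mean(axis=0) / sums.std(axis=0) * np.sqrt(252), 2))
    outlist.extend(batch)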

Python: Ignore repetition in a loop

Let's say I have a function like this, to merge names in two lists:
def merge_name(list1="John", list2="Doe"):
    def merge(name1=list1, name2=list2):
        merge = name1 + "-" + name2
        data = {"Merged": merge}
        return data
    d = pd.DataFrame()
    for i, j in [(i, j) for i in list1 for j in list2]:
        if i == j:
            d = d
        else:
            x = merge(name1=i, name2=j)
            ans = pd.DataFrame({"Merged": [x["Merged"]]})
            d = pd.concat([d, ans])
    return d
What I am interested in are unique combinations, i.e, "John-Doe" and "Doe-John" are the same for my purposes. So if I run something like this:
names1=["John","Doe","Richard"]
names2=["John","Doe","Richard","Joana"]
df=merge_name(list1=names1,list2=names2)
I will get:
John-Doe
John-Richard
John-Joana
Doe-John
Doe-Richard
Doe-Joana
Richard-John
Richard-Doe
Richard-Joana
The pairs "Doe-John", "Richard-John", and "Richard-Doe" are all repetitions of earlier ones. Essentially, every time it moves to the next i, it creates n-1 repeated pairs, with n being the position in names1. Is there a way to avoid this, like dropping the top name in list2 every time j reaches the last element in the list?
Thanks in advance.
I have tried to update list2 while inside the loop, but obviously that does not work.
The code below can be useful:
import pandas as pd

def merge_name(list1="John", list2="Doe"):
    merged = []
    for i in list1:
        for j in list2:
            if (i != j) and (f"{j} - {i}" not in merged):
                merged.append(f"{i} - {j}")
    df = pd.DataFrame(set(merged))
    return df
names1 = ["John", "Doe", "Richard"]
names2 = ["John", "Doe", "Richard", "Joana"]
df = merge_name(list1=names1, list2=names2)
print(df)
Below is my solution with some explanations:
def combineName(listName):
    res = []
    for i in range(len(listName)):
        for j in range(i+1, len(listName)):
            res.append(listName[i] + "-" + listName[j])
    return res
names1=["John","Doe","Richard"]
names2=["John","Doe","Richard","Joana"]
listName = list(set(names1 + names2))
print(listName)
print(combineName(listName))
First, you should create a simple list without repetitions. This way you only get unique elements in your list. To do this, I used a set.
I take care to transform my set into a list because later I go through the structure in a given order, which is not guaranteed for a set.
Secondly, the function creates all the combinations. There are two loops, and you notice that the second loop has a special range.
Indeed, you do not want repetitions such as "John-Doe" and "Doe-John".
Each combination is created exactly once!
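The same idea is available ready-made in the standard library: itertools.combinations generates each unordered pair exactly once. A minimal sketch (the order-preserving dedupe via dict.fromkeys is my choice, not part of the original answers):

import itertools

names1 = ["John", "Doe", "Richard"]
names2 = ["John", "Doe", "Richard", "Joana"]

unique_names = list(dict.fromkeys(names1 + names2))  # dedupe, keep order
merged = ["-".join(pair) for pair in itertools.combinations(unique_names, 2)]
print(merged)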

Generate numbers that meet criteria in Python

The below code takes 2 seconds to finish.
The code looks clean but is very inefficient.
I am trying to pre-generate the ways you can build up to a total of max_units in increments of 2.
I'd then filter the created table to where secondary_categories meet certain criteria:
'A' is >10% of the total and 'B'<=50% of the total.
Do you see a better way to get the combinations in increments of 2 that meet criteria like the above?
import itertools
import pandas as pd

primary_types = ['I', 'II']
secondary_categories = ['A', 'B']
unitcategories = len(primary_types)*len(secondary_categories)  # up to 8
min_units = 108; max_units = 110  # between 20 and 400
max_of_one_type = max_units
args = [[i for i in range(2, max_of_one_type, 2)] for x in range(unitcategories)]
lista = list(itertools.product(*args))
filt = [True if max_units >= l >= min_units else False for l in list(map(sum, lista))]
lista = list(itertools.compress(lista, filt))
df = pd.DataFrame(lista, columns=pd.MultiIndex.from_product([primary_types, secondary_categories], names=['', '']))
df['Total'] = df.sum(axis=1)
df
Extending the following makes it take significantly longer or run out of memory: primary_types, secondary_categories, min_units, max_units.
Thank you
OK, so I'm posting this just FYI, but I don't think it's an ideal solution. I believe there exists a far more elegant solution, and I bet it involves numpy. However, this should at least be faster than the OP's version:
import itertools
import pandas as pd
primary_types = ["I", "II"]
secondary_categories = ["A", "B"]
unitcategories = len(primary_types) * len(secondary_categories) # up to 8
min_units = 54
max_units = 55 # between 10 and 200
max_of_one_type = max_units
args = [range(1, max_of_one_type) for x in range(unitcategories)]
lista = [x for x in itertools.product(*args) if max_units >= sum(x) >= min_units]
df = pd.DataFrame(
    lista,
    columns=pd.MultiIndex.from_product(
        [primary_types, secondary_categories], names=["", ""]
    ),
)
df["Total"] = df.sum(axis=1)
df = df * 2 # multiply by 2 to get the result you want
I divided everything by 2 at the start and multiplied the result by 2 at the end.
I removed all unnecessary uses of list().
I removed itertools.compress and filt, and instead put an if directly in the list comprehension (where lista is declared and assigned).
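If filtering the full Cartesian product is still too slow or memory-hungry, the sum constraint can be pushed into the generation itself, so out-of-range branches are pruned before they expand. A minimal sketch of that idea (the function name and pruning bounds are my own, not from the original answers); it plugs into the same halve-then-double trick:

def bounded_tuples(n_parts, lo, hi, part_max):
    # yield tuples of n_parts ints in [1, part_max] with lo <= sum <= hi,
    # pruning any branch whose partial sum can no longer land in range
    def rec(k, partial, prefix):
        if k == 0:
            yield tuple(prefix)
            return
        for v in range(1, part_max + 1):
            s = partial + v
            if s + (k - 1) > hi:             # even all-1s from here overshoots
                break
            if s + (k - 1) * part_max < lo:  # even all-max from here undershoots
                continue
            prefix.append(v)
            yield from rec(k - 1, s, prefix)
            prefix.pop()
    yield from rec(n_parts, 0, [])

lista = [tuple(2 * v for v in t) for t in bounded_tuples(4, 54, 55, 54)]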

Create a list by doing operations which depend on multiple indices - python

I am looking for a faster and more pythonic way to create a list whose elements depend on multiple indices from another list. An example of the code:
import numpy as np
xrandomsorted = np.sort(np.random.randn(1000000)) #input which needs to be used
Npts = int(len(xrandomsorted)/3)
#Part to be optimised begins here
final_list = np.zeros(Npts)
for i in range(Npts):
    xval = 12 - 3*xrandomsorted[i] + 7*xrandomsorted[2*i] - xrandomsorted[3*i]
    final_list[i] = xval
I have found this solution to be marginally faster (though I still think there may be better solutions!):
list1 = xrandomsorted[0:Npts]
list2 = xrandomsorted[::2][0:Npts]
list3 = xrandomsorted[::3][0:Npts]
final_list = []
for value1, value2, value3 in zip(list1, list2, list3):
    xval = 12 - 3*value1 + 7*value2 - value3
    final_list.append(xval)
Is there any other way to make the code faster without using numba/cython?
You can use NumPy slicing for a vectorised solution:
n = Npts
A = xrandomsorted
res = 12 - 3*A[:n] + 7*A[:n*2:2] - A[:n*3:3]
The syntax is akin to Python list slicing syntax, i.e. arr[start : stop : step].
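A quick way to confirm the vectorised result matches the original loop's output (assuming final_list from the question is still in scope):

print(np.allclose(res, final_list))  # expect True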
Have you tried operator.itemgetter?
import operator

for i in range(Npts):
    a, b, c = operator.itemgetter(i, 2*i, 3*i)(xrandomsorted)
    xval = 12 - 3*a + 7*b - c
    final_list[i] = xval
It is a powerful tool, although I don't know how fast it is here.

Problems with the zip function: lists that seem not iterable

I'm having some trouble trying to use four lists with the zip function.
In particular, I'm getting the following error at line 36:
TypeError: zip argument #3 must support iteration
I've already read that this happens with non-iterable objects, but I'm using it on lists! And if I use zip on only the first two lists it works perfectly: I have problems only with the last two.
Does anyone have ideas on how to solve this? Many thanks!
import numpy
#setting initial values
R = 330
C = 0.1
f_T = 1/(2*numpy.pi*R*C)
w_T = 2*numpy.pi*f_T
n = 10
T = 1
w = (2*numpy.pi)/T
t = numpy.linspace(-2, 2, 100)
#making the lists c_k, w_k, a_k, phi_k
c_karray = []
w_karray = []
A_karray = []
phi_karray = []
#populating the lists
for k in range(1, n, 2):
    c_k = 2/(k*numpy.pi)
    w_k = k*w
    A_k = 1/(numpy.sqrt(1+(w_k)**2))
    phi_k = numpy.arctan(-w_k)
    c_karray.append(c_k)
    w_karray.append(w_k)
    A_karray.append(A_k)
    phi_karray.append(phi_k)
#making the function w(t)
w = []
#doing the sum for each t and populate w(t)
for i in t:
    w_i = ([(A_k*c_k*numpy.sin(w_k*i+phi_k)) for c_k, w_k, A_k, phi_k in zip(c_karray, w_karray, A_k, phi_k)])
    w.append(sum(w_i))
You probably mistyped the last two arguments to zip. They should be A_karray and phi_karray, because phi_k and A_k are single values.
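With the corrected names, the loop body becomes (a minimal fix of the original line):

for i in t:
    w_i = [A_k*c_k*numpy.sin(w_k*i + phi_k)
           for c_k, w_k, A_k, phi_k in zip(c_karray, w_karray, A_karray, phi_karray)]
    w.append(sum(w_i))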
My result for w is:
[-0.11741034896740517,
-0.099189027720991918,
-0.073206290274556718,
...
-0.089754003567358978,
-0.10828235682188027,
-0.1174103489674052]
HTH,
Germán.
I believe you want zip(c_karray, w_karray, A_karray, phi_karray). Additionally, you should build this once, not on each iteration of the for loop.
Furthermore, you are not really making use of numpy. Try this instead of your loops.
d = numpy.arange(1, n, 2)
c_karray = 2/(d*numpy.pi)
w_karray = d*w
A_karray = 1/(numpy.sqrt(1+(w_karray)**2))
phi_karray = numpy.arctan(-w_karray)
w = (A_karray*c_karray*numpy.sin(w_karray*t[:,None]+phi_karray)).sum(axis=-1)

Fast way to remove a few items from a list/queue

This is a follow up to a similar question which asked the best way to write
for item in somelist:
    if determine(item):
        code_to_remove_item
and it seems the consensus was on something like
somelist[:] = [x for x in somelist if not determine(x)]
However, I think if you are only removing a few items, most of the items are being copied into the same object, and perhaps that is slow. In an answer to another related question, someone suggests:
for item in reversed(somelist):
    if determine(item):
        somelist.remove(item)
However, here list.remove will search for the item, which is O(N) in the length of the list. Maybe we are limited in that the list is represented as an array rather than a linked list, so removing items will need to move everything after them. However, it is suggested here that collections.deque is implemented as a doubly linked list. It should then be possible to remove in O(1) while iterating. How would we actually accomplish this?
Update:
I did some time testing as well, with the following code:
import timeit
setup = """
import random
random.seed(1)
b = [(random.random(),random.random()) for i in xrange(1000)]
c = []
def tokeep(x):
    return (x[1]>.45) and (x[1]<.5)
"""
listcomp = """
c[:] = [x for x in b if tokeep(x)]
"""
filt = """
c = filter(tokeep, b)
"""
print "list comp = ", timeit.timeit(listcomp,setup, number = 10000)
print "filtering = ", timeit.timeit(filt,setup, number = 10000)
and got:
list comp = 4.01255393028
filtering = 3.59962391853
The list comprehension is the asymptotically optimal solution:
somelist = [x for x in somelist if not determine(x)]
It only makes one pass over the list, so runs in O(n) time. Since you need to call determine() on each object, any algorithm will require at least O(n) operations. The list comprehension does have to do some copying, but it's only copying references to the objects not copying the objects themselves.
Removing items from a list in Python is O(n), so anything with a remove, pop, or del inside the loop will be O(n**2).
Also, in CPython list comprehensions are faster than for loops.
If you need to remove items in O(1) you can use a hash-based container (a dict or set in Python).
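A minimal sketch of that idea, assuming the items are hashable and that ordering and duplicates don't matter (both assumptions are mine; items_to_remove is a hypothetical collection):

someset = set(somelist)
for item in items_to_remove:  # hypothetical collection of items to drop
    someset.discard(item)     # O(1) average per removal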
Since list.remove is equivalent to del list[list.index(x)], you could do:
for idx, item in enumerate(somelist):
    if determine(item):
        del somelist[idx]
But: you should not modify the list while iterating over it. It will bite you, sooner or later. Use filter or list comprehension first, and optimise later.
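If you do want in-place deletion, iterating over indices in reverse sidesteps the shifting problem, since deletions never affect positions you haven't visited yet. A minimal sketch:

for idx in range(len(somelist) - 1, -1, -1):
    if determine(somelist[idx]):
        del somelist[idx]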
A deque is optimized for head and tail removal, not for arbitrary removal in the middle. The removal itself is fast, but you still have to traverse the list to the removal point. If you're iterating through the entire length, then the only difference between filtering a deque and filtering a list (using filter or a comprehension) is the overhead of copying, which at worst is a constant multiple; it's still an O(n) operation. Also, note that the objects in the list aren't being copied -- just the references to them. So it's not that much overhead.
It's possible that you could avoid copying like so, but I have no particular reason to believe this is faster than a straightforward list comprehension -- it's probably not:
write_i = 0
for read_i in range(len(L)):
    L[write_i] = L[read_i]
    if L[read_i] not in ['a', 'c']:
        write_i += 1
del L[write_i:]
I took a stab at this. My solution is slower, but requires less memory overhead (i.e. doesn't create a new array). It might even be faster in some circumstances!
This code has been edited since its first posting
I had problems with timeit, I might be doing this wrong.
import timeit
setup = """
import random
random.seed(1)
global b
setup_b = [(random.random(), random.random()) for i in xrange(1000)]
c = []
def tokeep(x):
    return (x[1]>.45) and (x[1]<.5)
# define and call to turn into psyco bytecode (if using psyco)
b = setup_b[:]
def listcomp():
    c[:] = [x for x in b if tokeep(x)]
listcomp()
b = setup_b[:]
def filt():
    c = filter(tokeep, b)
filt()
b = setup_b[:]
def forfilt():
    marked = (i for i, x in enumerate(b) if tokeep(x))
    shift = 0
    for n in marked:
        del b[n - shift]
        shift += 1
forfilt()
b = setup_b[:]
def forfiltCheating():
    marked = (i for i, x in enumerate(b) if (x[1] > .45) and (x[1] < .5))
    shift = 0
    for n in marked:
        del b[n - shift]
        shift += 1
forfiltCheating()
"""
listcomp = """
b = setup_b[:]
listcomp()
"""
filt = """
b = setup_b[:]
filt()
"""
forfilt = """
b = setup_b[:]
forfilt()
"""
forfiltCheating = '''
b = setup_b[:]
forfiltCheating()
'''
psycosetup = '''
import psyco
psyco.full()
'''
print "list comp = ", timeit.timeit(listcomp, setup, number = 10000)
print "filtering = ", timeit.timeit(filt, setup, number = 10000)
print 'forfilter = ', timeit.timeit(forfilt, setup, number = 10000)
print 'forfiltCheating = ', timeit.timeit(forfiltCheating, setup, number = 10000)
print '\nnow with psyco \n'
print "list comp = ", timeit.timeit(listcomp, psycosetup + setup, number = 10000)
print "filtering = ", timeit.timeit(filt, psycosetup + setup, number = 10000)
print 'forfilter = ', timeit.timeit(forfilt, psycosetup + setup, number = 10000)
print 'forfiltCheating = ', timeit.timeit(forfiltCheating, psycosetup + setup, number = 10000)
And here are the results
list comp = 6.56407690048
filtering = 5.64738512039
forfilter = 7.31555104256
forfiltCheating = 4.8994679451
now with psyco
list comp = 8.0485959053
filtering = 7.79016900063
forfilter = 9.00477004051
forfiltCheating = 4.90830993652
I must be doing something wrong with psyco, because it is actually running slower.
Elements are not copied by a list comprehension.
This took me a while to figure out. See the example code below to experiment yourself with different approaches.
code
You can specify how long a list element takes to copy and how long it takes to evaluate. The time to copy is irrelevant for list comprehension, as it turned out.
import time
import timeit
import numpy as np

def ObjectFactory(time_eval, time_copy):
    """
    Creates a class

    Parameters
    ----------
    time_eval : float
        time to evaluate (True or False, i.e. keep in list or not) an object
    time_copy : float
        time to (shallow-) copy an object. Used by list comprehension.

    Returns
    -------
    New class with defined copy-evaluate performance
    """
    class Object:
        def __init__(self, id_, keep):
            self.id_ = id_
            self._keep = keep

        def __repr__(self):
            return f"Object({self.id_}, {self.keep})"

        @property
        def keep(self):
            time.sleep(time_eval)
            return self._keep

        def __copy__(self):  # list comprehension does not copy the object
            time.sleep(time_copy)
            return self.__class__(self.id_, self._keep)

    return Object

def remove_items_from_list_list_comprehension(lst):
    return [el for el in lst if el.keep]

def remove_items_from_list_new_list(lst):
    new_list = []
    for el in lst:
        if el.keep:
            new_list += [el]
    return new_list

def remove_items_from_list_new_list_by_ind(lst):
    new_list_inds = []
    for ee in range(len(lst)):
        if lst[ee].keep:
            new_list_inds += [ee]
    return [lst[ee] for ee in new_list_inds]

def remove_items_from_list_del_elements(lst):
    """WARNING: Modifies lst"""
    new_list_inds = []
    for ee in range(len(lst)):
        if lst[ee].keep:
            new_list_inds += [ee]
    for ind in new_list_inds[::-1]:
        if not lst[ind].keep:
            del lst[ind]

if __name__ == "__main__":
    ClassSlowCopy = ObjectFactory(time_eval=0, time_copy=0.1)
    ClassSlowEval = ObjectFactory(time_eval=1e-8, time_copy=0)
    keep_ratio = .8
    n_runs_timeit = int(1e2)
    n_elements_list = int(1e2)

    lsts_to_tests = dict(
        list_slow_copy_remove_many = [ClassSlowCopy(ii, np.random.rand() > keep_ratio) for ii in range(n_elements_list)],
        list_slow_copy_keep_many = [ClassSlowCopy(ii, np.random.rand() > keep_ratio) for ii in range(n_elements_list)],
        list_slow_eval_remove_many = [ClassSlowEval(ii, np.random.rand() > keep_ratio) for ii in range(n_elements_list)],
        list_slow_eval_keep_many = [ClassSlowEval(ii, np.random.rand() > keep_ratio) for ii in range(n_elements_list)],
    )

    for lbl, lst in lsts_to_tests.items():
        print()
        for fct in [
            remove_items_from_list_list_comprehension,
            remove_items_from_list_new_list,
            remove_items_from_list_new_list_by_ind,
            remove_items_from_list_del_elements,
        ]:
            lst_loc = lst.copy()
            t = timeit.timeit(lambda: fct(lst_loc), number=n_runs_timeit)
            print(f"{fct.__name__}, {lbl}: {t=}")
output
remove_items_from_list_list_comprehension, list_slow_copy_remove_many: t=0.0064229519994114526
remove_items_from_list_new_list, list_slow_copy_remove_many: t=0.006507338999654166
remove_items_from_list_new_list_by_ind, list_slow_copy_remove_many: t=0.006562008995388169
remove_items_from_list_del_elements, list_slow_copy_remove_many: t=0.0076057760015828535
remove_items_from_list_list_comprehension, list_slow_copy_keep_many: t=0.006243691001145635
remove_items_from_list_new_list, list_slow_copy_keep_many: t=0.007145451003452763
remove_items_from_list_new_list_by_ind, list_slow_copy_keep_many: t=0.007032064997474663
remove_items_from_list_del_elements, list_slow_copy_keep_many: t=0.007690364996960852
remove_items_from_list_list_comprehension, list_slow_eval_remove_many: t=1.2495998149970546
remove_items_from_list_new_list, list_slow_eval_remove_many: t=1.1657221479981672
remove_items_from_list_new_list_by_ind, list_slow_eval_remove_many: t=1.2621939050004585
remove_items_from_list_del_elements, list_slow_eval_remove_many: t=1.4632593330024974
remove_items_from_list_list_comprehension, list_slow_eval_keep_many: t=1.1344162709938246
remove_items_from_list_new_list, list_slow_eval_keep_many: t=1.1323430630000075
remove_items_from_list_new_list_by_ind, list_slow_eval_keep_many: t=1.1354237199993804
remove_items_from_list_del_elements, list_slow_eval_keep_many: t=1.3084568729973398
import collections

list1 = collections.deque(list1)
for i in list2:
    try:
        list1.remove(i)
    except ValueError:
        pass
Instead of checking whether the element is there, this uses try/except. I guess this is faster.
