How can i iterate over a large list quickly?

How can i iterate over a large list quickly? - python

I am trying to iterate over a large list. I want a method that can iterate this list quickly. But it takes much time to iterate. Is there any method to iterate quickly or python is not built to do this.
My code snippet is :-
for i in THREE_INDEX:
if check_balanced(rc, pc):
print('balanced')
else:
rc, pc = equation_suffix(rc, pc, i)
Here THREE_INDEX has a length of 117649. It takes much time to iterate over this list, is there any method to iterate it quicker. But it takes around 4-5 minutes to iterate
equation_suffix functions:
def equation_suffix(rn, pn, suffix_list):
len_rn = len(rn)
react_suffix = suffix_list[: len_rn]
prod_suffix = suffix_list[len_rn:]
for re in enumerate(rn):
rn[re[0]] = add_suffix(re[1], react_suffix[re[0]])
for pe in enumerate(pn):
pn[pe[0]] = add_suffix(pe[1], prod_suffix[pe[0]])
return rn, pn
check_balanced function:
def check_balanced(rl, pl):
total_reactant = []
total_product = []
reactant_name = []
product_name = []
for reactant in rl:
total_reactant.append(separate_num(separate_brackets(reactant)))
for product in pl:
total_product.append(separate_num(separate_brackets(product)))
for react in total_reactant:
for key in react:
val = react.get(key)
val_dict = {key: val}
reactant_name.append(val_dict)
for prod in total_product:
for key in prod:
val = prod.get(key)
val_dict = {key: val}
product_name.append(val_dict)
reactant_name = flatten_dict(reactant_name)
product_name = flatten_dict(product_name)
for elem in enumerate(reactant_name):
val_r = reactant_name.get(elem[1])
val_p = product_name.get(elem[1])
if val_r == val_p:
if elem[0] == len(reactant_name) - 1:
return True
else:
return False

I believe the reason why "iterating" the list take a long time is due to the methods you are calling inside the for loop. I took out the methods just to test the speed of the iteration, it appears that iterating through a list of size 117649 is very fast. Here is my test script:
import time
start_time = time.time()
new_list = [(1, 2, 3) for i in range(117649)]
end_time = time.time()
print(f"Creating the list took: {end_time - start_time}s")
start_time = time.time()
for i in new_list:
pass
end_time = time.time()
print(f"Iterating the list took: {end_time - start_time}s")
Output is:
Creating the list took: 0.005337953567504883s
Iterating the list took: 0.0035648345947265625s
Edit: time() returns second.

General for loops aren't an issue, but using them to build (or rebuild) lists is usually slower than using list comprehensions (or in some cases, map/filter, though those are advanced tools that are often a pessimization).
Your functions could be made significantly simpler this way, and they'd get faster to boot. Example rewrites:
def equation_suffix(rn, pn, suffix_list):
prod_suffix = suffix_list[len(rn):]
# Change `rn =` to `rn[:] = ` if you must modify the caller's list as in your
# original code, not just return the modified list (which would be fine in your original code)
rn = [add_suffix(r, suffix) for r, suffix in zip(rn, suffix_list)] # No need to slice suffix_list; zip'll stop when rn exhausted
pn = [add_suffix(p, suffix) for p, suffix in zip(pn, prod_suffix)]
return rn, pn
def check_balanced(rl, pl):
# These can be generator expressions, since they're iterated once and thrown away anyway
total_reactant = (separate_num(separate_brackets(reactant)) for reactant in rl)
total_product = (separate_num(separate_brackets(product)) for product in pl)
reactant_name = []
product_name = []
# Use .items() to avoid repeated lookups, and concat simple listcomps to reduce calls to append
for react in total_reactant:
reactant_name += [{key: val} for key, val in react.items()]
for prod in total_product:
product_name += [{key: val} for key, val in prod.items()]
# These calls are suspicious, and may indicate optimizations to be had on prior lines
reactant_name = flatten_dict(reactant_name)
product_name = flatten_dict(product_name)
for i, (elem, val_r) in enumerate(reactant_name.items()):
if val_r == product_name.get(elem):
if i == len(reactant_name) - 1:
return True
else:
# I'm a little suspicious of returning False the first time a single
# key's value doesn't match. Either it's wrong, or it indicates an
# opportunity to write short-circuiting code that doesn't have
# to fully construct reactant_name and product_name when much of the time
# there will be an early mismatch
return False
I'll also note that using enumerate without unpacking the result is going to get worse performance, and more cryptic code; in this case (and many others), enumerate isn't needed, as listcomps and genexprs can accomplish the same result without knowing the index, but when it is needed, always unpack, e.g. for i, elem in enumerate(...): then using i and elem separately will always run faster than for packed in enumerate(...): and using packed[0] and packed[1] (and if you have more useful names than i and elem, it'll be much more readable to boot).

Related

How can you speed up repeated API calls?

For a detailed understanding, I have attached a link of file:
Actual file
I have data in the list, that has similar syntax like:
i = [a.b>c.d , e.f.g>h.i.j ]
e.g. Each element of the list has multiple sub-elements separated by "." and ">".
for i in range(0, len(l)):
reac={}
reag={}
t = l[i].split(">")
REAC = t[0]
Reac = REAC.split(".")
for o in range(len(Reac)):
reaco = "https://ai.chemistryinthecloud.com/smilies/" + Reac[o]
respo = requests.get(reaco)
reac[o] ={"Smile":Reac[o],"Details" :respo.json()}
if (len(t) != 1):
REAG = t[1]
Reag = REAG.split(".")
for k in range(len(Reag)):
reagk = "https://ai.chemistryinthecloud.com/smilies/" + Reag[k]
repo = requests.get(reagk)
reag[k] = {"Smile": Reag[k], "Details" :repo.json()}
res = {"Reactants": list(reac.values()), "Reagents": list(reag.values())}
boo.append(res)
else:
res = {"Reactants": list(reac.values()), "Reagents": "No reagents"}
boo.append(res)
We have separated all the elements and for each element, we are calling 3rd party API. That consumes too much time.
Is there any way to reduce this time and optimize for the loop?
It takes around 1 minute to respond. We want to optimize to 5-10 seconds.

Just do this instead. I changed the format of your output because it made things unnecessisarily complicated, and removed a lot of non-pythonic things you were doing. The second time you run should be instantaneous, and the first time should be much faster if you have repeated values across equations or sides of the reaction.
# pip install cachetools diskcache
from cachetools import cached
from diskcache import Cache
BASE_URL = "https://ai.chemistryinthecloud.com/smilies/"
CACHEDIR = "api_cache"
#cached(cache=Cache(CACHEDIR))
def get_similies(value):
return requests.get(BASE_URL + value).json()
def parse_eq(eq):
reac, _, reag = (set(v.split(".")) for v in eq.partition(">"))
return {
"reactants": {i: get_similies(i) for i in reac},
"reagents": {i: get_similies(i) for i in reag}
}
result = [parse_eq(eq) for eq in eqs]

iterate over several collections in parallel

I am trying to create a list of objects (from a class defined earlier) through a loop. The structure looks something like:
ticker_symbols = ["AZN", "AAPL", "YHOO"]
stock_list = []
for i in ticker_symbols:
stock = Share(i)
pe = stock.get_price_earnings_ratio()
ps = stock.get_price_sales()
stock_object = Company(pe, ps)
stock_list.append(stock_object)
I would however want to add one more attribute to the Company-objects (stock_object) through the loop. The attribute would be a value from another list, like (arbitrary numbers) [5, 10, 20] where the first attribute would go to the first object, the second to the second object etc.Is it possible to do something like:
for i, j in ticker_symbols, list2:
#dostuff
? Could not get this sort of nested loop to work on my own. Thankful for any help.

I believe that all you have to do is change the for loop.
Instead of "for i in ticker_symbols:" you should loop like
"for i in range(len(ticker_symbols))" and then use the index i to do whatever you want with the second list.
ticker_symbols = ["AZN", "AAPL", "YHOO"]
stock_list = []
for i in range(len(ticker_symbols):
stock = Share(ticker_symbols[i])
pe = stock.get_price_earnings_ratio()
ps = stock.get_price_sales()
# And then you can write
px = whatever list2[i]
stock_object = Company(pe, ps, px)
stock_list.append(stock_object)
Some people say that using index to iterate is not good practice, but I don't think so specially if the code works.

Try:
for i, j in zip(ticker_symbols, list2):
Or
for (k, i) in enumerate(ticker_symbols):
j = list2[k]
Equivalently:
for index in range(len(ticker_symbols)):
i = ticker_symbols[index]
j = list2[index]

Python: Concatenate similiar objects in List

I have a list containing strings as ['Country-Points'].
For example:
lst = ['Albania-10', 'Albania-5', 'Andorra-0', 'Andorra-4', 'Andorra-8', ...other countries...]
I want to calculate the average for each country without creating a new list. So the output would be (in the case above):
lst = ['Albania-7.5', 'Andorra-4.25', ...other countries...]
Would realy appreciate if anyone can help me with this.
EDIT:
this is what I've got so far. So, "data" is actually a dictionary, where the keys are countries and the values are list of other countries points' to this country (the one as Key). Again, I'm new at Python so I don't realy know all the built-in functions.
for key in self.data:
lst = []
index = 0
score = 0
cnt = 0
s = str(self.data[key][0]).split("-")[0]
for i in range(len(self.data[key])):
if s in self.data[key][i]:
a = str(self.data[key][i]).split("-")
score += int(float(a[1]))
cnt+=1
index+=1
if i+1 != len(self.data[key]) and not s in self.data[key][i+1]:
lst.append(s + "-" + str(float(score/cnt)))
s = str(self.data[key][index]).split("-")[0]
score = 0
self.data[key] = lst

itertools.groupby with a suitable key function can help:
import itertools
def get_country_name(item):
return item.split('-', 1)[0]
def get_country_value(item):
return float(item.split('-', 1)[1])
def country_avg_grouper(lst) :
for ctry, group in itertools.groupby(lst, key=get_country_name):
values = list(get_country_value(c) for c in group)
avg = sum(values)/len(values)
yield '{country}-{avg}'.format(country=ctry, avg=avg)
lst[:] = country_avg_grouper(lst)
The key here is that I wrote a function to do the change out of place and then I can easily make the substitution happen in place by using slice assignment.

I would probabkly do this with an intermediate dictionary.
def country(s):
return s.split('-')[0]
def value(s):
return float(s.split('-')[1])
def country_average(lst):
country_map = {}|
for point in lst:
c = country(pair)
v = value(pair)
old = country_map.get(c, (0, 0))
country_map[c] = (old[0]+v, old[1]+1)
return ['%s-%f' % (country, sum/count)
for (country, (sum, count)) in country_map.items()]
It tries hard to only traverse the original list only once, at the expense of quite a few tuple allocations.

Python nested loop comparing two lists and updating a dictionary

This code works as expected but takes up a lot of memory and takes vastly longer to run than any other part of my code.
def function(input1, input2):
mapping = []
for item in input1:
risks = {"A":0, "B":0, "C":0, "D":0, "E":0}
temp = []
for row in input2:
if item in row[0]:
for key in risks.keys():
if row[1] == key:
risks[key] += 1
temp.append(item)
for key in risks.keys():
temp.append(risks[key])
mapping.append(temp)
return mapping
I'm hoping to find a more efficient way to do this and with far less memory. input1 is a list of unique strings and input2 is a list of tuples that are not unique. There has got to be a better way to do this.
Thanks for your help.

First some test data:
import random
input1 = list(range(1000))
input2 = [
([random.randint(0, 1000) for _ in range(100)], random.choice("ABCDE"))
for _ in range(10000)
]
Then my new function:
def newfunction(input1, input2):
input1_map = {i: dict.fromkeys("ABCDE", 0) for i in input1}
for row in input2:
for i1 in set(row[0]):
try:
input1_map[i1][row[1]] += 1
except KeyError:
pass
return [[i] + list(input1_map[i].values()) for i in input1]
This is of depth 2, not 3, so ends up being a lot faster for large inputs.
If row[0] does not contain duplicates, change set(row[0]) to just row[0].
If the try should never fail, remove it.
Choose better variable names. I don't know what things represent so my names here are pretty bad.
The last statement could be micro-optimised away if speed is too much of a concern, but I wouldn't expect it would matter much.
The list(input1_map[i].values()) is non-deterministic. Think about that.
For reference, here's the old version:
def function(input1, input2):
mapping = []
for item in input1:
risks = {"A":0, "B":0, "C":0, "D":0, "E":0}
temp = []
for row in input2:
if item in row[0]:
for key in risks.keys():
if row[1] == key:
risks[key] += 1
temp.append(item)
for key in risks.keys():
temp.append(risks[key])
mapping.append(temp)
return mapping
And it passes my test:
function(input1, input2) == newfunction(input1, input2)
#>>> True

Fast way to remove a few items from a list/queue

This is a follow up to a similar question which asked the best way to write
for item in somelist:
if determine(item):
code_to_remove_item
and it seems the consensus was on something like
somelist[:] = [x for x in somelist if not determine(x)]
However, I think if you are only removing a few items, most of the items are being copied into the same object, and perhaps that is slow. In an answer to another related question, someone suggests:
for item in reversed(somelist):
if determine(item):
somelist.remove(item)
However, here the list.remove will search for the item, which is O(N) in the length of the list. May be we are limited in that the list is represented as an array, rather than a linked list, so removing items will need to move everything after it. However, it is suggested here that collections.dequeue is represented as a doubly linked list. It should then be possible to remove in O(1) while iterating. How would we actually accomplish this?
Update:
I did some time testing as well, with the following code:
import timeit
setup = """
import random
random.seed(1)
b = [(random.random(),random.random()) for i in xrange(1000)]
c = []
def tokeep(x):
return (x[1]>.45) and (x[1]<.5)
"""
listcomp = """
c[:] = [x for x in b if tokeep(x)]
"""
filt = """
c = filter(tokeep, b)
"""
print "list comp = ", timeit.timeit(listcomp,setup, number = 10000)
print "filtering = ", timeit.timeit(filt,setup, number = 10000)
and got:
list comp = 4.01255393028
filtering = 3.59962391853

The list comprehension is the asymptotically optimal solution:
somelist = [x for x in somelist if not determine(x)]
It only makes one pass over the list, so runs in O(n) time. Since you need to call determine() on each object, any algorithm will require at least O(n) operations. The list comprehension does have to do some copying, but it's only copying references to the objects not copying the objects themselves.
Removing items from a list in Python is O(n), so anything with a remove, pop, or del inside the loop will be O(n**2).
Also, in CPython list comprehensions are faster than for loops.

If you need to remove item in O(1) you can use HashMaps

Since list.remove is equivalent to del list[list.index(x)], you could do:
for idx, item in enumerate(somelist):
if determine(item):
del somelist[idx]
But: you should not modify the list while iterating over it. It will bite you, sooner or later. Use filter or list comprehension first, and optimise later.

A deque is optimized for head and tail removal, not for arbitrary removal in the middle. The removal itself is fast, but you still have to traverse the list to the removal point. If you're iterating through the entire length, then the only difference between filtering a deque and filtering a list (using filter or a comprehension) is the overhead of copying, which at worst is a constant multiple; it's still a O(n) operation. Also, note that the objects in the list aren't being copied -- just the references to them. So it's not that much overhead.
It's possible that you could avoid copying like so, but I have no particular reason to believe this is faster than a straightforward list comprehension -- it's probably not:
write_i = 0
for read_i in range(len(L)):
L[write_i] = L[read_i]
if L[read_i] not in ['a', 'c']:
write_i += 1
del L[write_i:]

I took a stab at this. My solution is slower, but requires less memory overhead (i.e. doesn't create a new array). It might even be faster in some circumstances!
This code has been edited since its first posting
I had problems with timeit, I might be doing this wrong.
import timeit
setup = """
import random
random.seed(1)
global b
setup_b = [(random.random(), random.random()) for i in xrange(1000)]
c = []
def tokeep(x):
return (x[1]>.45) and (x[1]<.5)
# define and call to turn into psyco bytecode (if using psyco)
b = setup_b[:]
def listcomp():
c[:] = [x for x in b if tokeep(x)]
listcomp()
b = setup_b[:]
def filt():
c = filter(tokeep, b)
filt()
b = setup_b[:]
def forfilt():
marked = (i for i, x in enumerate(b) if tokeep(x))
shift = 0
for n in marked:
del b[n - shift]
shift += 1
forfilt()
b = setup_b[:]
def forfiltCheating():
marked = (i for i, x in enumerate(b) if (x[1] > .45) and (x[1] < .5))
shift = 0
for n in marked:
del b[n - shift]
shift += 1
forfiltCheating()
"""
listcomp = """
b = setup_b[:]
listcomp()
"""
filt = """
b = setup_b[:]
filt()
"""
forfilt = """
b = setup_b[:]
forfilt()
"""
forfiltCheating = '''
b = setup_b[:]
forfiltCheating()
'''
psycosetup = '''
import psyco
psyco.full()
'''
print "list comp = ", timeit.timeit(listcomp, setup, number = 10000)
print "filtering = ", timeit.timeit(filt, setup, number = 10000)
print 'forfilter = ', timeit.timeit(forfilt, setup, number = 10000)
print 'forfiltCheating = ', timeit.timeit(forfiltCheating, setup, number = 10000)
print '\nnow with psyco \n'
print "list comp = ", timeit.timeit(listcomp, psycosetup + setup, number = 10000)
print "filtering = ", timeit.timeit(filt, psycosetup + setup, number = 10000)
print 'forfilter = ', timeit.timeit(forfilt, psycosetup + setup, number = 10000)
print 'forfiltCheating = ', timeit.timeit(forfiltCheating, psycosetup + setup, number = 10000)
And here are the results
list comp = 6.56407690048
filtering = 5.64738512039
forfilter = 7.31555104256
forfiltCheating = 4.8994679451
now with psyco
list comp = 8.0485959053
filtering = 7.79016900063
forfilter = 9.00477004051
forfiltCheating = 4.90830993652
I must be doing something wrong with psyco, because it is actually running slower.

elements are not copied by list comprehension
this took me a while to figure out. See the example code below, to experiment yourself with different approaches
code
You can specify how long a list element takes to copy and how long it takes to evaluate. The time to copy is irrelevant for list comprehension, as it turned out.
import time
import timeit
import numpy as np
def ObjectFactory(time_eval, time_copy):
"""
Creates a class
Parameters
----------
time_eval : float
time to evaluate (True or False, i.e. keep in list or not) an object
time_copy : float
time to (shallow-) copy an object. Used by list comprehension.
Returns
-------
New class with defined copy-evaluate performance
"""
class Object:
def __init__(self, id_, keep):
self.id_ = id_
self._keep = keep
def __repr__(self):
return f"Object({self.id_}, {self.keep})"
#property
def keep(self):
time.sleep(time_eval)
return self._keep
def __copy__(self): # list comprehension does not copy the object
time.sleep(time_copy)
return self.__class__(self.id_, self._keep)
return Object
def remove_items_from_list_list_comprehension(lst):
return [el for el in lst if el.keep]
def remove_items_from_list_new_list(lst):
new_list = []
for el in lst:
if el.keep:
new_list += [el]
return new_list
def remove_items_from_list_new_list_by_ind(lst):
new_list_inds = []
for ee in range(len(lst)):
if lst[ee].keep:
new_list_inds += [ee]
return [lst[ee] for ee in new_list_inds]
def remove_items_from_list_del_elements(lst):
"""WARNING: Modifies lst"""
new_list_inds = []
for ee in range(len(lst)):
if lst[ee].keep:
new_list_inds += [ee]
for ind in new_list_inds[::-1]:
if not lst[ind].keep:
del lst[ind]
if __name__ == "__main__":
ClassSlowCopy = ObjectFactory(time_eval=0, time_copy=0.1)
ClassSlowEval = ObjectFactory(time_eval=1e-8, time_copy=0)
keep_ratio = .8
n_runs_timeit = int(1e2)
n_elements_list = int(1e2)
lsts_to_tests = dict(
list_slow_copy_remove_many = [ClassSlowCopy(ii, np.random.rand() > keep_ratio) for ii in range(n_elements_list)],
list_slow_copy_keep_many = [ClassSlowCopy(ii, np.random.rand() > keep_ratio) for ii in range(n_elements_list)],
list_slow_eval_remove_many = [ClassSlowEval(ii, np.random.rand() > keep_ratio) for ii in range(n_elements_list)],
list_slow_eval_keep_many = [ClassSlowEval(ii, np.random.rand() > keep_ratio) for ii in range(n_elements_list)],
)
for lbl, lst in lsts_to_tests.items():
print()
for fct in [
remove_items_from_list_list_comprehension,
remove_items_from_list_new_list,
remove_items_from_list_new_list_by_ind,
remove_items_from_list_del_elements,
]:
lst_loc = lst.copy()
t = timeit.timeit(lambda: fct(lst_loc), number=n_runs_timeit)
print(f"{fct.__name__}, {lbl}: {t=}")
output
remove_items_from_list_list_comprehension, list_slow_copy_remove_many: t=0.0064229519994114526
remove_items_from_list_new_list, list_slow_copy_remove_many: t=0.006507338999654166
remove_items_from_list_new_list_by_ind, list_slow_copy_remove_many: t=0.006562008995388169
remove_items_from_list_del_elements, list_slow_copy_remove_many: t=0.0076057760015828535
remove_items_from_list_list_comprehension, list_slow_copy_keep_many: t=0.006243691001145635
remove_items_from_list_new_list, list_slow_copy_keep_many: t=0.007145451003452763
remove_items_from_list_new_list_by_ind, list_slow_copy_keep_many: t=0.007032064997474663
remove_items_from_list_del_elements, list_slow_copy_keep_many: t=0.007690364996960852
remove_items_from_list_list_comprehension, list_slow_eval_remove_many: t=1.2495998149970546
remove_items_from_list_new_list, list_slow_eval_remove_many: t=1.1657221479981672
remove_items_from_list_new_list_by_ind, list_slow_eval_remove_many: t=1.2621939050004585
remove_items_from_list_del_elements, list_slow_eval_remove_many: t=1.4632593330024974
remove_items_from_list_list_comprehension, list_slow_eval_keep_many: t=1.1344162709938246
remove_items_from_list_new_list, list_slow_eval_keep_many: t=1.1323430630000075
remove_items_from_list_new_list_by_ind, list_slow_eval_keep_many: t=1.1354237199993804
remove_items_from_list_del_elements, list_slow_eval_keep_many: t=1.3084568729973398

import collections
list1=collections.deque(list1)
for i in list2:
try:
list1.remove(i)
except:
pass
INSTEAD OF CHECKING IF ELEMENT IS THERE. USING TRY EXCEPT.
I GUESS THIS FASTER

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How can i iterate over a large list quickly? - python

Related

How can you speed up repeated API calls?

iterate over several collections in parallel

Python: Concatenate similiar objects in List

Python nested loop comparing two lists and updating a dictionary

Fast way to remove a few items from a list/queue

Categories

Resources