I have several functions that create a list from a range. I'm timing them with a function I wrote, but it isn't timing the list functions I pass in. My list functions currently return the created list, and when I use time_limit() I get an error telling me the result cannot be called.
from time import process_time

# one of my list functions
def for_list(x):
    x = range(x)
    list_1 = []
    for i in x:
        i = str(i)
        list_1 += i  # note: += with a string extends the list with its characters
    return list_1
# timing function
def time_limit(tx):
    start_time = process_time()
    tx()
    end_time = process_time()
    time = (end_time - start_time)
    print(f'{tx.__name__}, {time:.15f}')

SIZE = 10000
time_limit(for_list(SIZE))
Am I supposed to return something differently, or is my time_limit() incorrect?
The problem is in how you call time_limit(): time_limit(for_list(SIZE)) runs for_list immediately and passes its return value (a list) in as tx.
The tx() line then tries to call that list, which fails because a list is not callable.
Pass the function and its argument separately instead, so the call happens inside the timed section:
from time import process_time

# one of my list functions
def for_list(x):
    x = range(x)
    list_1 = []
    for i in x:
        i = str(i)
        list_1 += i
    return list_1

# timing function
def time_limit(tx, *args):
    start_time = process_time()
    tx(*args)  # the function is called here, inside the timed section
    end_time = process_time()
    time = (end_time - start_time)
    print(f'{tx.__name__}, {time:.15f}')

SIZE = 10000
time_limit(for_list, SIZE)
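If you would rather keep the original one-argument time_limit (which calls tx() with no arguments), a lambda works as well. This is just an equivalent sketch; note that tx.__name__ would then print '<lambda>' instead of the function's name:

time_limit(lambda: for_list(SIZE))  # the lambda defers the call until tx() runs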
I'm trying to write some code in Python, and I know the size of the list beforehand.
Suppose the list is l and its size is n.
Should I first initialise a list of length n and then use a for loop to set l[i] = i for i in range(n)?
OR
Is it better to start with an empty list and append items to it in a for loop?
I know that both approaches yield the same result.
I just mean to ask whether one is faster or more efficient than the other.
Appending is faster here: Python lists over-allocate as they grow, so append is amortised O(1). You can initialise a fixed-size list with code like

x = [None]*n

but this in itself takes time. I wrote a quick program to compare the speeds:
import time

def main1():
    x = [None]*10000
    for i in range(10000):
        x[i] = i
    print(x)

def main2():
    y = []
    for i in range(10000):
        y.append(i)
    print(y)
start_time = time.time()
main1()
print("1 --- %s seconds ---" % (time.time() - start_time))
#1 --- 0.03682112693786621 seconds ---
start_time = time.time()
main2()
print("2 --- %s seconds ---" % (time.time() - start_time))
#2 --- 0.01464700698852539 seconds ---
As you can see, the first way is actually significantly slower on my computer!
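Note that both versions spend much of their time in print, which can swamp the difference being measured. Here is a minimal sketch using timeit with the print calls removed, so only the loop cost is measured (the helper names are made up, and numbers will vary by machine):

import timeit

def prealloc(n=10000):
    x = [None] * n
    for i in range(n):
        x[i] = i
    return x

def append(n=10000):
    y = []
    for i in range(n):
        y.append(i)
    return y

# timeit.timeit returns the total time for `number` runs
print(timeit.timeit(prealloc, number=1000))
print(timeit.timeit(append, number=1000))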
If you know the size of your list, your first approach is correct.
You can also use the numpy library to make a list of any size you want.

import numpy as np

n = 10  # size of your list
X = np.zeros(n)
# use your for loop here
for i in range(n):
    X[i] = i + 1
result = X.tolist()
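For this particular fill pattern, numpy can also build the whole array without a Python loop; this is just an optional shortcut, not part of the answer above:

import numpy as np

result = (np.arange(10) + 1).tolist()  # [1, 2, ..., 10], same as the loop above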
I got the opposite result compared to another answer.
import time

count = 10000000

def main1():
    x = [None]*count
    for i in range(count):
        x[i] = i

def main2():
    y = []
    for i in range(count):
        y.append(i)
start_time = time.time()
main1()
end_time = time.time()
print("1:", end_time-start_time) # 0.488
start_time = time.time()
main2()
end_time = time.time()
print("2:", end_time-start_time) # 0.728
I am trying to iterate over a large list, and I want a method that can do it quickly, but the iteration takes a long time. Is there any way to iterate faster, or is Python simply not built for this?
My code snippet is:
for i in THREE_INDEX:
    if check_balanced(rc, pc):
        print('balanced')
    else:
        rc, pc = equation_suffix(rc, pc, i)
Here THREE_INDEX has a length of 117649, and the loop takes around 4-5 minutes to finish. Is there any way to make it quicker?
The equation_suffix function:
def equation_suffix(rn, pn, suffix_list):
    len_rn = len(rn)
    react_suffix = suffix_list[: len_rn]
    prod_suffix = suffix_list[len_rn:]
    for re in enumerate(rn):
        rn[re[0]] = add_suffix(re[1], react_suffix[re[0]])
    for pe in enumerate(pn):
        pn[pe[0]] = add_suffix(pe[1], prod_suffix[pe[0]])
    return rn, pn
check_balanced function:
def check_balanced(rl, pl):
    total_reactant = []
    total_product = []
    reactant_name = []
    product_name = []
    for reactant in rl:
        total_reactant.append(separate_num(separate_brackets(reactant)))
    for product in pl:
        total_product.append(separate_num(separate_brackets(product)))
    for react in total_reactant:
        for key in react:
            val = react.get(key)
            val_dict = {key: val}
            reactant_name.append(val_dict)
    for prod in total_product:
        for key in prod:
            val = prod.get(key)
            val_dict = {key: val}
            product_name.append(val_dict)
    reactant_name = flatten_dict(reactant_name)
    product_name = flatten_dict(product_name)
    for elem in enumerate(reactant_name):
        val_r = reactant_name.get(elem[1])
        val_p = product_name.get(elem[1])
        if val_r == val_p:
            if elem[0] == len(reactant_name) - 1:
                return True
        else:
            return False
I believe the reason "iterating" the list takes a long time is the methods you are calling inside the for loop. I took the methods out just to test the speed of the iteration, and it turns out that iterating through a list of size 117649 is very fast. Here is my test script:
import time
start_time = time.time()
new_list = [(1, 2, 3) for i in range(117649)]
end_time = time.time()
print(f"Creating the list took: {end_time - start_time}s")
start_time = time.time()
for i in new_list:
    pass
end_time = time.time()
print(f"Iterating the list took: {end_time - start_time}s")
Output is:
Creating the list took: 0.005337953567504883s
Iterating the list took: 0.0035648345947265625s
Edit: time.time() returns seconds.
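As an aside: time.perf_counter() is a higher-resolution clock than time.time() and is generally the better choice for short measurements like these:

import time

new_list = [(1, 2, 3) for i in range(117649)]
start = time.perf_counter()
for i in new_list:
    pass
print(f"Iterating the list took: {time.perf_counter() - start}s")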
General for loops aren't an issue, but using them to build (or rebuild) lists is usually slower than using list comprehensions (or in some cases, map/filter, though those are advanced tools that are often a pessimization).
Your functions could be made significantly simpler this way, and they'd get faster to boot. Example rewrites:
def equation_suffix(rn, pn, suffix_list):
    prod_suffix = suffix_list[len(rn):]
    # Change `rn =` to `rn[:] =` if you must modify the caller's list as in your
    # original code, not just return the modified list (which would be fine in your original code)
    rn = [add_suffix(r, suffix) for r, suffix in zip(rn, suffix_list)]  # No need to slice suffix_list; zip stops when rn is exhausted
    pn = [add_suffix(p, suffix) for p, suffix in zip(pn, prod_suffix)]
    return rn, pn
def check_balanced(rl, pl):
    # These can be generator expressions, since they're iterated once and thrown away anyway
    total_reactant = (separate_num(separate_brackets(reactant)) for reactant in rl)
    total_product = (separate_num(separate_brackets(product)) for product in pl)
    reactant_name = []
    product_name = []
    # Use .items() to avoid repeated lookups, and concat simple listcomps to reduce calls to append
    for react in total_reactant:
        reactant_name += [{key: val} for key, val in react.items()]
    for prod in total_product:
        product_name += [{key: val} for key, val in prod.items()]
    # These calls are suspicious, and may indicate optimizations to be had on prior lines
    reactant_name = flatten_dict(reactant_name)
    product_name = flatten_dict(product_name)
    for i, (elem, val_r) in enumerate(reactant_name.items()):
        if val_r == product_name.get(elem):
            if i == len(reactant_name) - 1:
                return True
        else:
            # I'm a little suspicious of returning False the first time a single
            # key's value doesn't match. Either it's wrong, or it indicates an
            # opportunity to write short-circuiting code that doesn't have
            # to fully construct reactant_name and product_name when much of the time
            # there will be an early mismatch
            return False
I'll also note that using enumerate without unpacking the result gives worse performance and more cryptic code. In this case (and many others), enumerate isn't needed at all, since listcomps and genexprs can accomplish the same result without knowing the index. When it is needed, always unpack: for i, elem in enumerate(...): and then using i and elem separately will always run faster than for packed in enumerate(...): with packed[0] and packed[1] (and if you have more useful names than i and elem, it'll be much more readable to boot). A small example follows.
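A small illustration of the unpacking point (the list here is made up):

items = ['a', 'b', 'c']

# Cryptic and slower: index into the (index, value) tuple on every access
for packed in enumerate(items):
    print(packed[0], packed[1])

# Idiomatic: unpack once in the for statement
for i, elem in enumerate(items):
    print(i, elem)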
Good morning. I have developed a simple merge-sort algorithm, and I want to compare its performance parallelized and not parallelized.
First, I generate a list of numbers to sort and check how long merge-sort takes to sort the list.
Next, I want to pass the list of numbers into sc.parallelize(), converting the list to an RDD, then pass the merge-sort function into mapPartitions() and call collect().
import random
import time
from pyspark import SparkContext

def execute_merge_sort(generated_list):
    start_time = time.time()
    sorted_list = merge_sort(generated_list)
    elapsed = time.time() - start_time
    print('Simple merge sort: %f sec' % elapsed)
    return sorted_list

def generate_list(length):
    N = length
    generated_list = [random.random() for num in range(N)]
    return generated_list

def merging(left_side, right_side):
    result = []
    i = j = 0
    while i < len(left_side) and j < len(right_side):
        if left_side[i] <= right_side[j]:
            result.append(left_side[i])
            i += 1
        else:
            result.append(right_side[j])
            j += 1
    if i == len(left_side):
        result.extend(right_side[j:])
    else:
        result.extend(left_side[i:])
    return result

def merge_sort(generated_list):
    if len(generated_list) <= 1:
        return generated_list
    middle_value = len(generated_list) // 2
    sorted_list = merging(merge_sort(generated_list[:middle_value]), merge_sort(generated_list[middle_value:]))
    return sorted_list

def is_sorted(num_array):
    for i in range(1, len(num_array)):
        if num_array[i] < num_array[i - 1]:
            return False
    return True
generate_list = generate_list(500000)
sorted_list = execute_merge_sort(generate_list)
sc = SparkContext()
rdd = sc.parallelize(generate_list).mapPartitions(execute_merge_sort).collect()
When I'm executing this sc.parallelize(generate_list).mapPartitions(execute_merge_sort).collect() I'm getting the following error:
File "<ipython-input-15-1b7974b4fa56>", line 7, in execute_merge_sort
File "<ipython-input-15-1b7974b4fa56>", line 36, in merge_sort
TypeError: object of type 'itertools.chain' has no len()
Any help would be appreciated. Thanks in advance.
I figured out how to solve the problem of the TypeError: 'float' object is not iterable.
This can be solved by flattening the data with flatMap(lambda x: x) and then calling glom() to wrap each partition's elements back into a list, making them consumable by the function execute_merge_sort.
Executing the following line returns a list containing sorted lists:
sc.parallelize(random_list_of_lists).flatMap(lambda x: x).glom().mapPartitions(execute_merge_sort_rdd).collect()
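For reference, mapPartitions hands the function an iterator (an itertools.chain), which is why the first attempt failed with object of type 'itertools.chain' has no len(). The helper execute_merge_sort_rdd isn't shown above, so this is only a guess at its shape: after glom(), each partition holds a single element (the list of that partition's values), and the helper must return an iterable, using the merge_sort defined in the question:

def execute_merge_sort_rdd(partition):
    # `partition` is an iterator over the glommed lists in this partition
    for lst in partition:
        yield merge_sort(lst)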
I am having a problem with measuring the time of a function.
My function is a "linear search":
def linear_search(obj, item):
    for i in range(0, len(obj)):
        if obj[i] == item:
            return i
    return -1
And I made another function that measures the time 100 times and adds all the results to a list:
def measureTime(a):
    nl = []
    import random
    import time
    for x in range(0, 100):  # calculating time
        start = time.time()
        a
        end = time.time()
        times = end - start
        nl.append(times)
    return nl
When I'm using measureTime(linear_search(list,random.choice(range(0,50)))), the function always returns [0.0].
What can cause this problem? Thanks.
You are actually passing the result of linear_search into measureTime. You need to pass in the function and its arguments instead, so they can be executed inside measureTime, as in @martijnn2008's answer.
Better yet, you can use the timeit module to do the job for you:
from functools import partial
import timeit

def measureTime(n, f, *args):
    # total runtime for n executions (divide by n for the average);
    # use a loop with number=1 if you want each of the n individual runtimes
    return timeit.timeit(partial(f, *args), number=n)
# running within the module
measureTime(100, linear_search, list, random.choice(range(0,50)))
# if running interactively outside the module, use the call below; let's say your module is named mymodule
mymodule.measureTime(100, mymodule.linear_search, mymodule.list, mymodule.random.choice(range(0,50)))
Take a look at the following example; I don't know exactly what you are trying to achieve, so I guessed ;)
import random
import time

def measureTime(method, n, *args):
    start = time.time()
    for _ in range(n):
        method(*args)
    end = time.time()
    return (end - start) / n

def linear_search(lst, item):
    for i, o in enumerate(lst):
        if o == item:
            return i
    return -1

lst = [random.randint(0, 10**6) for _ in range(10**6)]
repetitions = 100
for _ in range(10):
    item = random.randint(0, 10**6)
    print('average runtime =', measureTime(linear_search, repetitions, lst, item) * 1000, 'ms')
Running on Python 3.3.
I am attempting to create an efficient algorithm to pull all of the similar elements between two lists. The problem is twofold: first, I cannot seem to find any such algorithm online, and second, there should be a more efficient way than what I have.
By 'similar elements', I mean two elements that are equal in value (be it string, int, whatever).
Currently, I am taking a greedy approach:
1. sorting the lists that are being compared,
2. comparing each element in the shorter list against elements in the larger list,
3. saving the last index visited, since largeList and smallList are both sorted,
4. continuing from that saved index (largeIndex).
Currently, the run-time seems to average O(n log n). This can be seen by running the test cases listed after this block of code.
Right now, my code looks like this:
def compare(small, large, largeStart, largeEnd):
    for i in range(largeStart, largeEnd):
        if small == large[i]:
            return [1, i]
        if small < large[i]:
            if i != 0:
                return [0, i-1]
            else:
                return [0, i]
    return [0, largeStart]

def determineLongerList(aList, bList):
    if len(aList) > len(bList):
        return (aList, bList)
    elif len(aList) < len(bList):
        return (bList, aList)
    else:
        return (aList, bList)

def compareElementsInLists(aList, bList):
    import time
    startTime = time.time()
    holder = determineLongerList(aList, bList)
    sameItems = []
    iterations = 0
    ##########################################
    smallList = sorted(holder[1])
    smallLength = len(smallList)
    smallIndex = 0
    largeList = sorted(holder[0])
    largeLength = len(largeList)
    largeIndex = 0
    while (smallIndex < smallLength):
        boolean = compare(smallList[smallIndex], largeList, largeIndex, largeLength)
        if boolean[0] == 1:
            # `compare` returns 1 as True
            sameItems.append(smallList[smallIndex])
            oldIndex = largeIndex
            largeIndex = boolean[1]
        else:
            # else no match and possible new index
            oldIndex = largeIndex
            largeIndex = boolean[1]
        smallIndex += 1
        iterations = largeIndex - oldIndex + iterations + 1
    print('RAN {it} OUT OF {mathz} POSSIBLE'.format(it=iterations, mathz=smallLength*largeLength))
    print('RATIO:\t\t' + str(iterations/(smallLength*largeLength)) + '\n')
    return sameItems
Here are some test cases:
def testLargest():
    import time
    from random import randint
    print('\n\n******************************************\n')
    start_time = time.time()
    lis = []
    for i in range(0, 1000000):
        ran = randint(0, 1000000)
        lis.append(ran)
    lis2 = []
    for i in range(0, 1000000):
        ran = randint(0, 1000000)
        lis2.append(ran)
    timeTaken = time.time() - start_time
    print('CREATING LISTS TOOK:\t\t' + str(timeTaken))
    print('\n******************************************')
    start_time = time.time()
    c = compareElementsInLists(lis, lis2)
    timeTaken = time.time() - start_time
    print('COMPARING LISTS TOOK:\t\t' + str(timeTaken))
    print('NUMBER OF SAME ITEMS:\t\t' + str(len(c)))
    print('\n******************************************')
#testLargest()
'''
One rendition of testLargest:
******************************************
CREATING LISTS TOOK: 21.009342908859253
******************************************
RAN 999998 OUT OF 1000000000000 POSSIBLE
RATIO: 9.99998e-07
COMPARING LISTS TOOK: 13.99990701675415
NUMBER OF SAME ITEMS: 632328
******************************************
'''
def testLarge():
    import time
    from random import randint
    print('\n\n******************************************\n')
    start_time = time.time()
    lis = []
    for i in range(0, 1000000):
        ran = randint(0, 100)
        lis.append(ran)
    lis2 = []
    for i in range(0, 1000000):
        ran = randint(0, 100)
        lis2.append(ran)
    timeTaken = time.time() - start_time
    print('CREATING LISTS TOOK:\t\t' + str(timeTaken))
    print('\n******************************************')
    start_time = time.time()
    c = compareElementsInLists(lis, lis2)
    timeTaken = time.time() - start_time
    print('COMPARING LISTS TOOK:\t\t' + str(timeTaken))
    print('NUMBER OF SAME ITEMS:\t\t' + str(len(c)))
    print('\n******************************************')

testLarge()
If you are just searching for all elements which are in both lists, you should use data types meant to handle such tasks. In this case, sets or bags would be appropriate. These are internally represented by hashing mechanisms which are even more efficient than searching in sorted lists.
(collections.Counter represents a suitable bag.)
If you do not care about duplicate elements, sets are fine:
a = set(listA)
print(a.intersection(listB))
This will print all elements which are in both listA and listB (without duplicated output for duplicated input elements).
import collections

a = collections.Counter(listA)
b = collections.Counter(listB)
print(a & b)
This will print each element that appears in both lists, along with the smaller of its two counts.
I didn't do any measuring, but I'm pretty sure these solutions are far faster than the hand-rolled approach.
To convert a counter into a list of all represented elements again, you can use list(c.elements()).
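A self-contained example tying the two variants together (the input lists are made up):

import collections

listA = [1, 2, 2, 3, 5, 5, 5]
listB = [2, 2, 5, 5, 7]

# Set intersection: which values appear in both (duplicates collapsed)
print(set(listA) & set(listB))  # {2, 5}

# Counter intersection: keeps the minimum multiplicity of each common value
common = collections.Counter(listA) & collections.Counter(listB)
print(common)                   # Counter({2: 2, 5: 2})
print(list(common.elements()))  # [2, 2, 5, 5]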
Using IPython's %%timeit magic here: compareElementsInLists doesn't compare favourably with just a standard set() intersection.
Setup:
import random
alist = [random.randint(0, 100000) for _ in range(1000)]
blist = [random.randint(0, 100000) for _ in range(1000)]
Compare Elements:
%%timeit -n 1000
compareElementsInLists(alist, blist)
1000 loops, best of 3: 1.9 ms per loop
Vs Set Intersection
%%timeit -n 1000
set(alist) & set(blist)
1000 loops, best of 3: 104 µs per loop
Just to make sure we get the same results:
>>> compareElementsInLists(alist, blist)
[8282, 29521, 43042, 47193, 48582, 74173, 96216, 98791]
>>> set(alist) & set(blist)
{8282, 29521, 43042, 47193, 48582, 74173, 96216, 98791}