Python - large list efficiency [duplicate]

I have a function, unique(a), that takes a list of numbers a and returns a list with only one of each value, maintaining the original order. I also have a function, big_list(n), that generates a list of length n.
The reason I reverse the list is so that values are removed from the back of the original list, which keeps the modified list cleaner and easier to read when comparing it to the original.
The function works when the list I'm creating is relatively small, but at larger lengths, such as 1,000,000, the execution takes forever.
If anyone can help me make my function a lot faster, that would be great!
FYI: I need to use a set somewhere in the function for the assignment I am working on, and I still need to remove list items from the back.
Thanks in advance!
from random import randrange

def big_list(n):
    # Create a list of n 'random' values in the range [-n/2, n/2]
    return [randrange(-n//2, n//2) for i in range(n)]

def unique(a):
    a = a[::-1]
    b = set(a)
    for i in b:
        while a.count(i) != 1:
            a.remove(i)
            a.count(i)
    a = a[::-1]
    return a

Your algorithm is doing a lot of extra work moving elements around. Consider:
def unique(a):
    b = set()
    r = []
    for x in a:
        if x not in b:
            r.append(x)
            b.add(x)
    return r
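A quick check that it preserves order (my own example input):
print(unique([3, 1, 3, 2, 1]))  # -> [3, 1, 2]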

Every time you call a.count(i), it loops over the entire list to count the occurrences. That is an O(n) operation, and a.remove(i) is another O(n) scan. Factor in the outer for i in b: loop and the overall algorithmic complexity is O(n²).
It doesn't help that there's a second, unnecessary a.count(i) inside the while loop. That call does nothing but chew up time.
This entire problem can be done in O(n) time. Your best bet would be to avoid list.count() altogether and figure out how you can loop over the list and count elements yourself. If you're clever you can do everything in a single pass, no nested loops (or implicit nested loops) required.

You can find a thorough benchmark of "unique" functions at this address. My personal favorite is
def unique(seq):
    # Order preserving
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]
because it's the fastest and it preserves order while using sets smartly: set.add() returns None, so not seen.add(x) is always true and serves only to record x as seen. I think it's f7, given in the comments.

double list in return statement. need explanation in python

So I was trying to complete this kata on Codewars and I ran across an interesting solution. The kata states:
"Given an array of integers, find the one that appears an odd number of times.
There will always be only one integer that appears an odd number of times."
and one of the solutions for it was:
def find_it(seq):
    return [x for x in seq if seq.count(x) % 2][0]
My question is: why is there a [0] at the end of the statement? I tried playing around with it and putting [1] instead; when testing, it passed some tests but not others, with no obvious pattern.
Any explanation will be greatly appreciated.
The first brackets are a list comprehension, the second is indexing the resulting list. It's equivalent to:
def find_it(seq):
    thelist = [x for x in seq if seq.count(x) % 2]
    return thelist[0]
The code is actually pretty inefficient, because it builds the whole list just to get the first value that passed the test. It could be implemented much more efficiently with next + a generator expression (like a listcomp, but lazy, with the values produced exactly once, and only on demand):
def find_it(seq):
    return next(x for x in seq if seq.count(x) % 2)
This behaves the same, with two differences: when no value passes the test, the exception raised is IndexError in the original code and StopIteration in the new code; and the new code is more efficient, stopping the search the instant a value passes the test.
Really, you should just give up on the .count method and count all the elements in a single pass, which is truly O(n). Count-based solutions can't be, because count itself is O(n) and must be called a number of times roughly proportional to the input size; even if you dedupe first, in the worst case all elements appear twice and you have to call count n/2 times:
from collections import Counter

def find_it(it):
    # Counter(it) counts all items of any iterable, not just sequences,
    # in a single pass, and since 3.6 it's insertion-order preserving,
    # so you can just iterate the items of the result and find the first
    # hit cheaply
    return next(x for x, cnt in Counter(it).items() if cnt % 2)
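A quick sanity check with a made-up input:
print(find_it([1, 1, 2, 2, 3]))  # -> 3, the only value with an odd count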
That list comprehension yields a sequence of values that occur an odd number of times. The first value of that sequence will occur an odd number of times. Therefore, getting the first value of that sequence (via [0]) gets you a value that occurs an odd number of times.
Happy coding!
The code [x for x in seq if seq.count(x) % 2] returns a list containing the one value that appears an odd number of times in the input list.
So, to make the output a number rather than a list, the author indexes position 0, which returns the single value held in that one-element list.
There is already a nice answer here by ShadowRanger, so I won't duplicate it; this is partially just another phrasing of the same.
The expression [some_content][0] is not a double list. It is a way to get an element out of a list by indexing. The second pair of brackets is the syntax for choosing an element of a list by its index, i.e. its position number, which in Python begins with zero and not, as sometimes intuitively expected, with one. So [0] addresses the first element in the list to its left.
['this', 'is', 'a', 'list'][0]  # <-- [0] is the index of 'this' in the list
print(['this', 'is', 'a', 'list'][0])
will print
this
to stdout.
The intention of the function you are showing in your question is to return a single value and not a list.
So, to get the single value out of the list built by the list comprehension, the index [0] is used. Wrapping a result in a list and immediately indexing simply gives the result back:
[result][0] == result
The same function could be also written using a loop as follows:
def find_it(seq):
    for x in seq:
        if seq.count(x) % 2 != 0:
            return x
but a list comprehension usually makes Python code faster than an equivalent loop, which is why it sometimes makes sense to use one and then unpack the found value(s) out of the resulting list. In most cases it wins, but not in this special case, where it slows things down, as ShadowRanger already mentioned: the comprehension scans the entire sequence, while the loop returns at the first hit.
It seems that your tested sequences do not always contain exactly one value occurring an odd number of times. That would explain why the index [1] sometimes works where it shouldn't, since the kata states the tested seq will contain one and only one such value.
What you see in the function in your question is a failed attempt to make it more efficient by using a list comprehension instead of a loop. An actual improvement can be achieved, but by using a generator expression and another way of counting, as shown in the answer by ShadowRanger:
from collections import Counter

def find_it(it):
    return next(x for x, cnt in Counter(it).items() if cnt % 2)

python: find ascending/decreasing elements in a list without using a loop

Suppose I have a list a. I need to find two lists within a, one in increasing order and one in decreasing order.
a = [4, 2, 6, 5, 2, 6, 9, 7, 10, 1, 2, 1]
The output should be two lists:
b = [4, 6, 9, 10]  # in ascending order
and
c = [4, 2, 1]  # in decreasing order; c[-1] is the first '1' in list a, c[1] is the first '2' in list a.
Is there a way to do it without using a loop (I have solved it using a loop)? As I have a large dataset, a loop would be slow, so I am looking for a faster way if possible. Thanks a lot.
You can use the built-in sorted() function. (The .sort() method sorts the list in place and returns None, so b = a.sort() would leave b as None.) With no arguments it sorts in ascending order; for descending, pass reverse=True.
b = sorted(a)                 # ascending
c = sorted(a, reverse=True)   # descending
I hope this is what you are looking for.
Could you clarify your problem?
Do you want to find the longest ascending/decreasing sublist? In that case your problem is related to dynamic programming, and I think you'll need more than one loop...
If you don't need your sublists to be maximal, maybe you can set a limit on the lengths of your lists b and c to finish faster.
If you have other information about your list, for instance its max and its min, you can stop your calculation when you reach the max (only if you want your lists to be strictly decreasing/ascending).
I hope this is useful for you :)
To clarify my question, the following is how I get these two lists:
b = []
c = []
for i in range(len(a)):
    if i == 0:
        b.append(a[i])
    elif a[i] > b[-1]:
        b.append(a[i])
for i in range(len(a)):
    if i == 0:
        c.append(a[i])
    elif a[i] < c[-1]:
        c.append(a[i])
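For what it's worth, a sketch that moves the per-element work out of an explicit Python-level loop with itertools.accumulate (the iteration still happens internally, so this is at best a constant-factor speedup; the function names here are my own):
from itertools import accumulate

def increasing_front(a):
    # running maximum of everything up to each position
    maxes = list(accumulate(a, max))
    # keep an element when it exceeds the max of everything before it
    return [a[0]] + [x for x, m in zip(a[1:], maxes) if x > m]

def decreasing_front(a):
    # running minimum of everything up to each position
    mins = list(accumulate(a, min))
    return [a[0]] + [x for x, m in zip(a[1:], mins) if x < m]

a = [4, 2, 6, 5, 2, 6, 9, 7, 10, 1, 2, 1]
print(increasing_front(a))  # [4, 6, 9, 10]
print(decreasing_front(a))  # [4, 2, 1]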

Python list comprehension with function returning a list

I am trying to call a function for a range of values. That function returns a list. The goal is to combine all the returned lists into a list.
Here is a test function that returns a list:
def f(i):
    return [chr(ord('a') + i), chr(ord('b') + i), chr(ord('c') + i)]
Here is a list comprehension that does what I need that I came up with after some experimentation and a lot of StackOverflow reading:
y = [a for x in (f(i) for i in range(5)) for a in x]
However, I do not understand why and how it works when a simple loop that solves this problem looks like this:
y = []
for x in (f(i) for i in range(5)):
    for a in x:
        y.append(a)
Can someone explain?
Thanks!
This may be a better illustration, following Bendik Knapstad's answer:
[
    a                                  # element added to the list
    for x in (f(i) for i in range(5))  # outer loop
    for a in x                         # inner loop assigning the element to be added
]
Answering this:
However, I do not understand why and how it works (list comprehensions) when a simple loop that solves this problem looks like this (for loops)
Yes, they both work, but there are some differences.
First, a list comprehension produces the list as its output, so you can assign it to a variable directly. In a for loop you must have the list created (empty or not) before you can append to it or perform any updating/deleting/re-indexing operation.
Second, simplicity. For loops suit complex tasks where you need to apply a wide variety of functions, while list comprehensions are preferable when it comes to dealing with lists and performing rather 'basic' operations (of course you can start nesting them and turn them into something more complex).
Third and finally, speed: list comprehensions tend to perform faster than for loops for simple tasks.
More in-depth information on list comprehensions and for loops can be found in Python's official tutorial: https://docs.python.org/3/tutorial/datastructures.html
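If you want to verify the speed claim yourself, a minimal timeit sketch (numbers vary by machine):
from timeit import timeit

setup = "data = list(range(1000))"
loop = "r = []\nfor x in data:\n    r.append(x * 2)"
comp = "r = [x * 2 for x in data]"
print(timeit(loop, setup, number=10_000))  # for loop with append
print(timeit(comp, setup, number=10_000))  # list comprehension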
Nested list comprehensions are hard to read.
But if you look at the two expressions you'll see that they contain the same logic.
In the list comprehension, the leading a is the part you want to keep in the list. It's the equivalent of y.append(a) in the for loop.
The for x in (f(i) for i in range(5)) is the same as in your for loop.
The same goes for the next part, for a in x.
So for x in (f(i) for i in range(5)) binds each list returned by f(i) to x in turn.
So if we already had such a list x, we could write:
y = [a for a in x]
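As an aside, the standard library can do this flattening for you; a sketch with itertools.chain.from_iterable that builds the same y:
from itertools import chain

def f(i):
    return [chr(ord('a') + i), chr(ord('b') + i), chr(ord('c') + i)]

# lazily concatenates the lists produced by f(0)..f(4)
y = list(chain.from_iterable(f(i) for i in range(5)))
print(y)  # ['a', 'b', 'c', 'b', 'c', 'd', ...]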

Efficient way to add items of list to another list with a map

I have two lists of floats, L1 and L2, of lengths a and b respectively. I also have a list F, of length a, whose values are integers in the range [-1, b-1]. I want to update L2 in the following way:
for i in filter(lambda x: F[x] + 1, range(len(F))):
    L2[F[i]] += L1[i]
Basically, F is a function of L1's index. For each index i of L1: if F[i] == -1, we do nothing; otherwise, we take L1's i-th item and add it to L2's F[i]-th item.
I am doing this in a program where the lengths a and b will grow exponentially as I make my results more accurate. (Also, F is roughly 50% -1's.) I realize this already takes linear time, but I was wondering if there is some way to improve the constant factor, possibly through a list/sum comprehension? Or, if I only need the contents of L2 after multiple updates, is there a practical way to store the updates and apply them all at once, faster?
What about the case where I have two lists of lists LL1, LL2, each containing c lists of lengths a, and b respectively, with just one list/map F? If I want LL1[i] to update LL2[i] for all i in [0,c-1], is there a smart way to do this, or is there nothing better than doing each i one by one?
Clarification: converting to numpy structures is completely acceptable, I just lack prior-knowledge about how to utilize numpy efficiently.
Your code is fairly efficient as is. The only improvement I can see is to avoid the lambda function, which adds call overhead on every iteration. Instead, use the enumerate function to generate indices and values of F, and filter the value of F with a simple if statement:
for i, j in enumerate(F):
    if j != -1:
        L2[j] += L1[i]
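Since the question says converting to numpy is acceptable, here is a sketch using np.add.at, which performs an unbuffered scatter-add so repeated indices in F accumulate correctly (the array names are my own):
import numpy as np

L1_arr = np.asarray(L1, dtype=float)
L2_arr = np.asarray(L2, dtype=float)
F_arr = np.asarray(F)

mask = F_arr != -1  # skip the -1 entries
np.add.at(L2_arr, F_arr[mask], L1_arr[mask])
For the list-of-lists case, the same masked scatter-add can be applied row by row over the stacked 2-D arrays.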

Question on a solution from Google python class day

Hey,
I'm trying to learn a bit about Python, so I decided to follow Google's tutorial. Anyway, I had a question regarding one of their solutions for an exercise, which I did this way:
# E. Given two lists sorted in increasing order, create and return a merged
# list of all the elements in sorted order. You may modify the passed in lists.
# Ideally, the solution should work in "linear" time, making a single
# pass of both lists.
def linear_merge(list1, list2):
    # +++your code here+++
    return sorted(list1 + list2)
However, they did it in a more complicated way. So is Google's solution quicker? I noticed in the comment lines that the solution should work in "linear" time, which mine probably isn't.
This is their solution:
def linear_merge(list1, list2):
    # +++your code here+++
    # LAB(begin solution)
    result = []
    # Look at the two lists so long as both are non-empty.
    # Take whichever element [0] is smaller.
    while len(list1) and len(list2):
        if list1[0] < list2[0]:
            result.append(list1.pop(0))
        else:
            result.append(list2.pop(0))
    # Now tack on what's left
    result.extend(list1)
    result.extend(list2)
    return result
Could this be another solution?
def linear_merge(list1, list2):
    tmp = []
    while len(list1) and len(list2):
        # pop() from the end is O(1), unlike pop(0)
        if list1[-1] > list2[-1]:
            tmp.append(list1.pop())
        else:
            tmp.append(list2.pop())
    # tmp is in descending order; the ascending leftovers must be
    # reversed before tacking them on (only one list is non-empty here)
    tmp.extend(reversed(list1))
    tmp.extend(reversed(list2))
    tmp.reverse()
    return tmp
Yours is not linear, but that doesn't mean it's slower. Algorithmic complexity ("big-oh notation") is often only a rough guide and always only tells one part of the story.
However, theirs isn't linear either, though it may appear to be at first blush. Popping from a list requires shifting every later item down, so each pop(0) from the front is itself O(n), making the loop quadratic.
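To make that concrete, here is a sketch (my own, not from the tutorial) of the usual index-based merge, which keeps the same spirit but stays O(n) by never popping:
def linear_merge(list1, list2):
    result = []
    i = j = 0
    # advance indices instead of popping, so each step is O(1)
    while i < len(list1) and j < len(list2):
        if list1[i] < list2[j]:
            result.append(list1[i])
            i += 1
        else:
            result.append(list2[j])
            j += 1
    # tack on whatever is left of either list
    result.extend(list1[i:])
    result.extend(list2[j:])
    return result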
It is a good exercise to think about how to make this O(n). The following is in the same spirit as the given solution, but avoids its pitfalls while generalizing to more than two lists for the sake of the exercise. For exactly two lists, you could remove the heap handling and simply test which next item is smaller.
import heapq
from itertools import count

def iter_linear_merge(*args):
    """Yield non-decreasing items from the given sorted iterables."""
    # Technically, [1, 1, 2, 2] isn't an "increasing" sequence,
    # but it is non-decreasing.
    # A tie-breaking counter keeps equal values from forcing the heap
    # to compare two iterators (which would raise TypeError).
    tiebreak = count()
    nexts = []
    for x in args:
        x = iter(x)
        for n in x:
            heapq.heappush(nexts, (n, next(tiebreak), x))
            break
    while len(nexts) >= 2:
        n, _, x = heapq.heappop(nexts)
        yield n
        for n in x:
            heapq.heappush(nexts, (n, next(tiebreak), x))
            break
    if nexts:  # Degenerate case of the heap, not strictly required.
        n, _, x = nexts[0]
        yield n
        for n in x:
            yield n
Instead of the last if-for, the while loop condition could be changed to just "nexts", but it is probably worthwhile to specially handle the last remaining iterator.
If you want to strictly return a list instead of an iterator:
def linear_merge(*args):
    return list(iter_linear_merge(*args))
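A quick usage check; note the standard library also ships this algorithm as heapq.merge:
print(linear_merge([1, 3, 5], [2, 4], [0, 6]))  # [0, 1, 2, 3, 4, 5, 6]

import heapq
print(list(heapq.merge([1, 3, 5], [2, 4], [0, 6])))  # same result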
With mostly-sorted data, timsort approaches linear. Also, your code doesn't have to screw around with the lists themselves. Therefore, your code is possibly just a bit faster.
But that's what timing is for, innit?
I think the issue here is that the tutorial is illustrating how to implement a well-known algorithm called 'merge' in Python. The tutorial is not expecting you to actually use a library sorting function in the solution.
sorted() is O(n log n), so your solution cannot be linear in the worst case.
It is important to understand how merge() works because it is useful in many other algorithms. It exploits the fact that the input lists are individually sorted, moving through each list sequentially and selecting the smallest option. The remaining items are appended at the end.
The question isn't which is 'quicker' for a given input case but about which algorithm is more complex.
There are hybrid variations of merge-sort which fall back on another sorting algorithm once the input list size drops below a certain threshold.
