Splitting Up Lists of Lists by Length in Python

Given the following problem, what is the most efficient (or reasonably efficient) way to do this in Python:
Problem. Given a list of lists,
L = [list_0, list_1, list_2, list_3, ..., list_n]
where len(list_i) <= 3, let's say, for each list inside of L. How can we split up L into L_1, L_2, L_3, where L_1 has only length 1 lists, L_2 has only length 2 lists, and L_3 has only length 3 lists?
Potential Solutions. Here's the best I could do; I've also included a sample set. It runs in around 8.6 seconds on my PC.
import time
# These 4 lines make a large sample list of lists to test on.
asc_sample0 = [[i] for i in range(500)]
asc_sample1 = [[i,j] for i in range(500) for j in range(20)]
asc_sample2 = [[i,j,k] for i in range(20) for j in range(10) for k in range(20)]
asc_sample = asc_sample0 + asc_sample1 + asc_sample2
start = time.perf_counter()  # time.clock() was removed in Python 3.8
cells0 = [i for i in asc_sample if len(i) == 1]
cells1 = [i for i in asc_sample if len(i) == 2]
cells2 = [i for i in asc_sample if len(i) == 3]
print(time.perf_counter() - start)
I also attempted to "pop" elements off and append to lists cells0, etc., but this took significantly longer. I also attempted to append and then remove that element so I could get through in one loop which worked okay when there were, say, 10^10 lists of size 1, but only a few of size 2 and 3, but, in general, it was not efficient.
I'd mostly appreciate some neat ideas. I know that one of the answers will most likely be "Write this in C", but for now I'd just like to look at Python solutions for this.

An old fashioned solution might work better here:
cells0, cells1, cells2 = [], [], []
for lst in asc_sample:
    n = len(lst)
    if n == 1:
        cells0.append(lst)
    elif n == 2:
        cells1.append(lst)
    else:
        cells2.append(lst)

This is definitely one of the best because it makes a single pass over the list instead of three. Another thing you could look at, though, is itertools.groupby and the built-in filter function.
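The comment above names itertools.groupby and filter without showing them. Here is a minimal sketch of both, assuming a small list called lists (a hypothetical stand-in for asc_sample). Note that groupby only groups adjacent items, so the input has to be sorted by length first, which makes this O(n log n) rather than a single pass:

```python
from itertools import groupby

# Hypothetical sample data standing in for asc_sample.
lists = [[0], [0, 1], [2], [0, 1, 2], [3, 4], [5]]

# groupby only merges *adjacent* items with equal keys, so sort by length first.
# sorted() is stable, so the original order is kept within each length.
by_length = {
    length: list(group)
    for length, group in groupby(sorted(lists, key=len), key=len)
}

# filter makes one full pass per length, like the comprehensions in the question.
ones = list(filter(lambda lst: len(lst) == 1, lists))

print(by_length)
print(ones)
```

Neither is likely to beat the plain loop, but groupby is handy when the data is already sorted by the key.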

result = dict()
for lst in L:
    result.setdefault(len(lst), []).append(lst)
print(result)
Output
{
1: [[0], [1], [2], [3]],
2: [[0, 0], [0, 1], [0, 2]],
3: [[0, 0, 0], [0, 0, 1], [0, 0, 2]]
}

Indexing a list/tuple should be faster than doing key lookups. This is about 30% faster than the version given in the question:
cells = [],[],[],[] # first list here isn't used, but it's handy for the second version
for i in asc_sample:
    cells[len(i)].append(i)
Slightly faster again by extracting the append methods (on larger lists this is almost twice as fast as the OP's version):
cells = [],[],[],[]
appends = [x.append for x in cells]
for i in asc_sample:
    appends[len(i)](i)
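To check these speed claims on your own machine, a small timeit harness along these lines may help. It is only a sketch: the function names three_passes, index_by_length and bound_appends are made up for illustration, and the numbers will vary with hardware and Python version.

```python
import timeit

# Sample data built the same way as in the question.
sample = (
    [[i] for i in range(500)]
    + [[i, j] for i in range(500) for j in range(20)]
    + [[i, j, k] for i in range(20) for j in range(10) for k in range(20)]
)

def three_passes(data):
    # The question's approach: one list comprehension per length.
    return tuple([x for x in data if len(x) == n] for n in (1, 2, 3))

def index_by_length(data):
    # Use len(x) directly as an index into a tuple of lists (slot 0 is unused).
    cells = [], [], [], []
    for x in data:
        cells[len(x)].append(x)
    return cells[1:]

def bound_appends(data):
    # Same idea, but look up each bound append method only once.
    cells = [], [], [], []
    appends = [c.append for c in cells]
    for x in data:
        appends[len(x)](x)
    return cells[1:]

for func in (three_passes, index_by_length, bound_appends):
    seconds = timeit.timeit(lambda: func(sample), number=20)
    print(f"{func.__name__}: {seconds:.4f} s for 20 runs")
```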

Related

lookup function for repeated elements in an array

I have a list-like Python object of positive integers and I want to find which locations in that list hold repeated values. For example,
if the input is [0,1,1] the function should return [[1, 2]] because the value 1, which is the element at positions 1 and 2 of the input array, appears twice. Similarly:
[0,13,13] should return [[1, 2]]
[0,1,2,1,3,4,2,2] should return [[1, 3], [2, 6, 7]] because 1 appears twice, at positions [1, 3] of the input array and 2 appears 3 times at positions [2, 6, 7]
[1, 2, 3] should return an empty array []
What I have written is this:
import numpy as np

def get_locations(labels):
    out = []
    label_set = set(labels)
    for label in list(label_set):
        # Full scan of labels for every distinct label
        temp = [i for i, j in enumerate(labels) if j == label]
        if len(temp) > 1:
            out.append(np.array(temp))
    return np.array(out)
While it works OK for small input arrays, it gets too slow as the size grows. For instance, the code below, on my PC, jumps from 0.14 s when n = 1000 to 12 s when n = 10000:
from timeit import default_timer as timer
start = timer()
n = 10000
a = np.arange(n)
b = np.append(a, a[-1]) # append the last element to the end
out = get_locations(b)
end = timer()
print(out)
print(end - start) # Time in seconds
How can I speed this up please? Any ideas highly appreciated
Your nested loop results in O(n^2) time complexity. You can instead build a dict of lists that maps each label to its indices, and keep only the sub-lists whose length is greater than 1, which reduces the time complexity to O(n):
def get_locations(labels):
    positions = {}
    for index, label in enumerate(labels):
        positions.setdefault(label, []).append(index)
    return [indices for indices in positions.values() if len(indices) > 1]
so that get_locations([0, 1, 2, 1, 3, 4, 2, 2]) returns:
[[1, 3], [2, 6, 7]]
Your code is slow because of the nested for-loop. You can solve this in a more efficient way by using another data structure:
from collections import defaultdict
mylist = [0,1,2,1,3,4,2,2]
output = defaultdict(list)
# Loop once over mylist, store the indices of all unique elements
for i, el in enumerate(mylist):
    output[el].append(i)
# Filter out elements that occur only once
output = {k:v for k, v in output.items() if len(v) > 1}
This produces the following output for the example list above:
{1: [1, 3], 2: [2, 6, 7]}
You can turn this result into the desired format:
list(output.values())
> [[1, 3], [2, 6, 7]]
Note, however, that this relies on dictionaries preserving insertion order, which is an implementation detail of CPython 3.6 and is guaranteed by the language only from Python 3.7.
Here's some code I implemented. It runs in linear time:
l = [0,1,2,1,3,4,2,2]
dict1 = {}
for j,i in enumerate(l): # O(n)
    temp = dict1.get(i) # O(1) most cases
    if not temp:
        dict1[i] = [j]
    else:
        dict1[i].append(j) # O(1)
print([item for item in dict1.values() if len(item) > 1]) # O(n)
Output:
[[1, 3], [2, 6, 7]]
This is essentially a time-complexity issue. Your algorithm has nested for loops that iterate through the list twice, so the time complexity is of the order of n^2, where n is the size of the list. So when you multiply the size of the list by 10 (from 1,000 to 10,000), you see an approximate time increase of 10^2 = 100. This is why it goes from 0.14 s to 12 s.
Here is a simple solution with no extra libraries required:
def get_locations(labels):
    locations = {}
    for index, label in enumerate(labels):
        if label in locations:
            locations[label].append(index)
        else:
            locations[label] = [index]
    return [locations[i] for i in locations if len(locations[i]) > 1]
Since the for loops are not nested, the running time is proportional to about 2n, i.e. O(n), so you should see roughly a 2-times increase in time whenever the problem size is doubled (compared with about 4 times for the original O(n^2) version).
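A rough way to see the two growth rates is a doubling experiment. The sketch below assumes plain Python lists instead of NumPy arrays, and the names quadratic and linear are only illustrative; absolute timings depend on the machine, so compare how each function's time changes between the two sizes rather than the raw numbers.

```python
import timeit

def quadratic(labels):
    # One full scan of the list per distinct label: O(n^2) when most labels are unique.
    out = []
    for label in set(labels):
        hits = [i for i, x in enumerate(labels) if x == label]
        if len(hits) > 1:
            out.append(hits)
    return out

def linear(labels):
    # One scan in total: every index is filed under its label in a dict.
    positions = {}
    for i, x in enumerate(labels):
        positions.setdefault(x, []).append(i)
    return [hits for hits in positions.values() if len(hits) > 1]

for n in (1000, 2000):
    # n distinct values plus one repeat of the last value, as in the question.
    labels = list(range(n)) + [n - 1]
    for func in (quadratic, linear):
        seconds = timeit.timeit(lambda: func(labels), number=1)
        print(f"n={n} {func.__name__}: {seconds:.4f} s")
```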
You can try the Counter class from the collections module:
from collections import Counter
list1 = [1,1,2,3,4,4,4]
Counter(list1)
You will get output similar to this (note that it tells you how often each value occurs, not where):
Counter({4: 3, 1: 2, 2: 1, 3: 1})
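Counter on its own does not give positions, so it needs a second pass to answer the question. One possible way to combine the two, as a sketch (the function body is an illustration, not code from the answer above; it relies on dicts keeping insertion order, Python 3.7+):

```python
from collections import Counter

def get_locations(labels):
    counts = Counter(labels)  # value -> number of occurrences
    positions = {}
    for index, label in enumerate(labels):
        if counts[label] > 1:  # skip values that occur only once
            positions.setdefault(label, []).append(index)
    return list(positions.values())

print(get_locations([0, 1, 2, 1, 3, 4, 2, 2]))  # [[1, 3], [2, 6, 7]]
```

This is still O(n): one pass to count and one pass to collect indices.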

How to find peaks of minimal length efficiently?

I have a list/array of integers. Call a subarray a peak if it goes up and then goes down. For example:
[5,5,4,5,4]
contains
[4,5,4]
which is a peak.
Also consider
[6,5,4,4,4,4,4,5,6,7,7,7,7,7,6]
which contains
[6,7,7,7,7,7,6]
which is a peak.
The problem
Given an input list, I would like to find all the peaks contained in it of minimal length and report them. In the example above, [5,6,7,7,7,7,7,6] is also a peak but we remove the first element and it remains a peak so we don't report it.
So for input list:
L = [5,5,5,5,4,5,4,5,6,7,8,8,8,8,8,9,9,8]
we would return
[4,5,4] and [8,9,9,8] only.
I am having problems devising a nice algorithm for this. Any help would be hugely appreciated.
Using itertools
Here is a short solution using itertools.groupby to detect peaks. The groups identifying peaks are then unpacked to yield the actual sequence.
from itertools import groupby, islice
l = [1, 2, 1, 2, 2, 0, 0]
fst, mid, nxt = groupby(l), islice(groupby(l), 1, None), islice(groupby(l), 2, None)
peaks = [[f[0], *m[1], n[0]] for f, m, n in zip(fst, mid, nxt) if f[0] < m[0] > n[0]]
print(peaks)
Output
[[1, 2, 1], [1, 2, 2, 0]]
Using a loop (faster)
The above solution is elegant but since three instances of groupby are created, the list is traversed three times.
Here is a solution using a single traversal.
def peaks(lst):
    first = 0
    last = 1
    while last < len(lst) - 1:
        if lst[first] < lst[last] == lst[last+1]:
            last += 1
        elif lst[first] < lst[last] > lst[last+1]:
            yield lst[first:last+2]
            first = last + 1
            last += 2
        else:
            first = last
            last += 1
l = [1, 2, 1, 2, 2, 0, 0]
print(list(peaks(l)))
Output
[[1, 2, 1], [1, 2, 2, 0]]
Notes on benchmark
Upon benchmarking with timeit, I noticed an increase in performance of about 20% for the solution using a loop. For short lists the overhead of groupby could bring that number up to 40%. The benchmark was done on Python 3.6.
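Another way to express the same idea, shown here only as a sketch, is to run-length encode the list once and then look at each triple of neighbouring runs. The name minimal_peaks is made up for this example. It trades a little extra memory (the list of runs) for a single groupby pass, and it is checked here against the list L from the question:

```python
from itertools import groupby

def minimal_peaks(values):
    # Collapse the list into (value, run_length) pairs, e.g. [8, 8, 9] -> [(8, 2), (9, 1)].
    runs = [(key, len(list(group))) for key, group in groupby(values)]
    result = []
    # A minimal peak is a run that is strictly higher than both neighbouring runs,
    # plus exactly one element from each neighbour.
    for (left, _), (top, count), (right, _) in zip(runs, runs[1:], runs[2:]):
        if left < top > right:
            result.append([left] + [top] * count + [right])
    return result

L = [5, 5, 5, 5, 4, 5, 4, 5, 6, 7, 8, 8, 8, 8, 8, 9, 9, 8]
print(minimal_peaks(L))  # [[4, 5, 4], [8, 9, 9, 8]]
```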

Most Pythonic way to iteratively build up a list? [closed]

I was trying to do something in Python that uses the following general procedure, and I want to know what the best way to approach this is.
First, an initialization step:
Create an item M.
Create a list L and add M to L.
Second, loop through the following:
Create a new item by modifying the last item added to L.
Add the new item to L.
As a simple example, say I want to create a list of lists where the nth list contains the numbers from 1 to n. I could use the following (silly) procedure.
Initially M is [1] and L=[[1]].
Next, modify [1] by adding 2 to it to create the new item [1,2], then add [1,2] to L so L=[[1],[1,2]].
Next, modify [1,2] by adding 3 to it to create the new item [1,2,3], then add [1,2,3] to L so L=[[1],[1,2],[1,2,3]].
Next, modify [1,2,3] by adding 4 to it to create the new item [1,2,3,4], then add [1,2,3,4] to L so L=[[1],[1,2],[1,2,3],[1,2,3,4]].
etc.
I tried a few things, but most of them would modify not just the last item added but also items added to L in previous steps. For the particular problem I was interested in, I did manage to find a solution that behaves properly (at least for small cases), but it seems inelegant, I’m not sure why it works when other things didn’t, and I’m not even confident that it would still behave as desired for large cases. I’m also not confident that I could adapt my approach to similar problems. It's not a case of me not understanding the problem, since I've coded the same thing in other programming languages without issues.
So I’m wondering how more experienced Python programmers would handle this general task.
(I’m omitting my own code in part because I’m new here and I haven’t figured out how to enter it on stackoverflow, but also because it's long-ish and I don’t want help with the particular problem, but rather with how to handle the more general procedure I described above.)
When adding a list object M to another list, you are only adding a reference; continuing to manipulate the list M means you will see those changes reflected through the other reference(s) too:
>>> M = []
>>> resultlist = []
>>> resultlist.append(M)
>>> M is resultlist[0]
True
>>> M.append(1)
>>> resultlist[0]
[1]
>>> M
[1]
Note that M is resultlist[0] is True; it is the same object.
You'd add a copy of M instead:
resultlist.append(M[:])
The whole slice here ([:] means to slice from start to end) creates a new list with a shallow copy of the contents of M.
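The word "shallow" matters when the items are themselves mutable. A minimal sketch of the difference, using copy.deepcopy from the standard library for comparison (the nested sample list is just an illustration):

```python
import copy

M = [[1, 2], [3]]
shallow = M[:]            # new outer list, same inner list objects
deep = copy.deepcopy(M)   # new outer list and new inner lists

M.append([4])     # only affects M itself
M[0].append(99)   # mutates an inner list that shallow still shares with M

print(M)        # [[1, 2, 99], [3], [4]]
print(shallow)  # [[1, 2, 99], [3]]
print(deep)     # [[1, 2], [3]]
```

For a flat list of numbers, as in the question's example, the shallow copy is all you need.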
The generic way to produce a series L from a continuously altered starting point M is to use a generator function. Your simple "add the next number to M" series could be implemented as:
def growing_sequence():
    M = []
    counter = 0
    while True:
        M.append(counter)
        counter += 1
        yield M[:]
This will yield ever longer lists each time you iterate, on demand:
>>> gen = growing_sequence()
>>> next(gen)
[0]
>>> next(gen)
[0, 1]
>>> for i, lst in enumerate(gen):
...     print(i, lst)
...     if i == 2: break
...
0 [0, 1, 2]
1 [0, 1, 2, 3]
2 [0, 1, 2, 3, 4]
You can do:
M=[1]
L=[M]
for e in range(5):
    li=L[-1][:]
    li.append(li[-1]+1)
    L.append(li)
Or more tersely:
for e in range(5):
    L.append(L[-1][:]+[L[-1][-1]+1])
I think that the best way to do this is with a generator. That way, you don't have to deal with list.append, deep-copying lists or any of that nonsense.
def my_generator(max):
    for n in range(max+1):
        yield list(range(n+1))
Then, you just have to list-ify it:
>>> list(my_generator(5))
[[0], [0, 1], [0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5]]
This approach is also more flexible if you want to make it an infinite generator: simply replace the for loop with a while True loop.
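One possible shape for that infinite variant, as a sketch: it uses itertools.count in place of a hand-written while True loop, and itertools.islice to take a finite prefix. The name growing_lists is made up for this example.

```python
from itertools import count, islice

def growing_lists():
    # Infinite variant: count() never stops, so the caller decides how many to take.
    for n in count():
        yield list(range(n + 1))

first_four = list(islice(growing_lists(), 4))
print(first_four)  # [[0], [0, 1], [0, 1, 2], [0, 1, 2, 3]]
```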
This will be based on iterate from Haskell.
iterate :: (a -> a) -> a -> [a]
iterate f x returns an infinite list of repeated applications of f to x:
iterate f x == [x, f x, f (f x), ...]
In Python:
def iterate(f, x):
    while True:
        yield x
        x = f(x)
Example usage:
>>> from itertools import islice
>>> def take(n, iterable):
...     return list(islice(iterable, n))
>>> take(4, iterate(lambda x: x + [len(x) + 1], [1]))
[[1], [1, 2], [1, 2, 3], [1, 2, 3, 4]]
To produce a finite list, the type signature (again starting in Haskell just for clarity) could be iterateFinitely :: (a -> Maybe a) -> a -> [a].
If we were to use list in place of Maybe in Python:
from itertools import takewhile
def iterateFinitely(f, x):
    return map(lambda a: a[0], takewhile(len, iterate(lambda y: f(y[0]), [x])))
Example usage:
>>> list(iterateFinitely(lambda x: [x // 2] if x else [], 20))
[20, 10, 5, 2, 1, 0]
Since ending with a falsy value is probably pretty common, you might also add a version of this function that does that.
def iterateUntilFalsy(f, x):
    return iterateFinitely(lambda y: [f(y)] if y else [], x)
Example usage:
>>> list(iterateUntilFalsy(lambda x: x // 2, 20))
[20, 10, 5, 2, 1, 0]
>>> list(iterateUntilFalsy(lambda x: x[1:], [1,2,3,4]))
[[1, 2, 3, 4], [2, 3, 4], [3, 4], [4], []]
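When each new item is built from the previous item plus the next value of some input sequence, the standard library's itertools.accumulate covers a similar pattern. This is a sketch of a different technique from iterate, not a drop-in replacement: it folds over an input iterable instead of applying a one-argument function repeatedly, and the initial keyword requires Python 3.8+. The helper name extend_with is made up for this example.

```python
from itertools import accumulate

def extend_with(prefix, n):
    # Build a *new* list each time, so earlier results are never mutated.
    return prefix + [n]

result = list(accumulate(range(2, 5), extend_with, initial=[1]))
print(result)  # [[1], [1, 2], [1, 2, 3], [1, 2, 3, 4]]
```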
Try this:
M = [1]
L = [M]
for _ in range(3):
    L += [L[-1] + [L[-1][-1] + 1]]
After the above code is executed, L will contain [[1], [1, 2], [1, 2, 3], [1, 2, 3, 4]]. Explanation:
The first two lines simply seed the iteration with initial values
The for line states how many loops we want to perform after the initial value has been set, 3 in this case. I'm using _ as the iteration variable because we're not interested in its value, we just want to do a certain number of loops
Now for the interesting part; and remember that in Python a negative index in a list starts counting from the end, so an index of -1 points to the last element.
This: L += … updates the list, appending a new sublist at the end as many times as specified in the loop
This: [L[-1] + …] creates a new sublist by taking the last sublist and adding a new element at the end
And finally this: [L[-1][-1] + 1] obtains the previous last element in the last sublist, adds one to it and returns a single-element list to be concatenated at the end of the previous expression

Complement of list comprehension in python [duplicate]

This question already has answers here:
How can I partition (split up, divide) a list based on a condition?
I'm wondering if there is not a way to compute the complement of a list comprehension in Python.
Something like:
evens = [i for i in range(10) if i % 2 == 0]
odds = [i for i in range(10) if i % 2 != 0]
Is there a way to get both evens and odds in one call? For a very large list, or a more expensive if statement, I think this would save a lot of time.
I believe this question has been asked before, but I am not finding the link currently.
If you are trying to get more than one predicate and you only want to iterate once over the original generator, then you will have to use a simple for loop.
evens = []
odds = []
for i in range(10):
    if i % 2 == 0: evens.append(i)
    else: odds.append(i)
As @dawg pointed out, the logic inside the loop can be made more concise using clever indexing.
for i in range(10):
    (evens,odds)[i%2].append(i)
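The same single-pass idea can be wrapped in a small reusable helper that works for any predicate, not just parity. This is a sketch; the name partition and its argument order are choices made for this example, not a standard-library function.

```python
def partition(predicate, iterable):
    """Split iterable into (items where predicate is true, items where it is false)."""
    matches, rest = [], []
    for item in iterable:
        # The predicate is evaluated exactly once per item.
        (matches if predicate(item) else rest).append(item)
    return matches, rest

evens, odds = partition(lambda i: i % 2 == 0, range(10))
print(evens)  # [0, 2, 4, 6, 8]
print(odds)   # [1, 3, 5, 7, 9]
```

Because the predicate runs only once per item, this also helps when the test itself is expensive.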
itertools.groupby is what I'd use.
In [1]: import itertools as it
In [2]: key = lambda i: i%2 == 0
In [3]: l = list(range(10))
In [4]: l.sort(key=key)
In [5]: [list(i[1]) for i in it.groupby(l, key=key)]
Out[5]: [[1, 3, 5, 7, 9], [0, 2, 4, 6, 8]]
I would do one of the following:
evens = [i for i in range(10) if i % 2 == 0]
odds = [i for i in range(10) if i not in evens]
Or, with better performance:
evens = [i for i in range(10) if i % 2 == 0]
evens_set = set(evens)
odds = [i for i in range(10) if i not in evens_set]
Working with a set performs better, as the not in query costs O(1) on average instead of O(n) for lists.
In short, you can get both True and False cases in one call, but you'd still need to split them into two lists. You could do
range_10 = list(range(10))
odds = range_10[1::2]
evens = range_10[::2]
but the benefit of that would be negligible. (In fact, you'd be creating three lists instead of two.) You'd only want to do that if the cost of range(10) was so high that it would offset creating two lists.
Using slicing like I did should be slightly faster than using a test and explicitly appending.

Using information from the previous step in generators?

I have a generator which tries to mimic realtime. This generator makes sure that the user has no access to the future but only to current time.
To simplify my case, I use this generator:
def generator(n):
    for x in range(n):
        yield [[x],[x+3]]
If run with n = 5, the generator yields:
[[0], [3]]
[[1], [4]]
[[2], [5]]
[[3], [6]]
[[4], [7]]
I want to be able to combine the elements of each generator iteration with the elements of previous generator iteration to compute sum() of inner lists
case 1:
sum([0]), sum([3])
case 2:
sum([0,1]), sum([3,4])
case 3:
sum([0,1,2]), sum([3,4,5])
...
case LAST
sum([0,1,2,3,4]), sum([3,4,5,6,7])
I don't see how this can be achieved by using:
for x in generator(5):
    ...  # do sum operation
The values that I use in the for loop will be gone by the next iteration.
Please do not focus on the numbers and the results, but mainly on the logic and algorithm behind a possible solution. In this case for me it is important to be able to preserve the situation where access to future data is not allowed, only data from the past can be used for calculations, however the data that I consider from the past is already gone when the for iteration is ended!
Any solution? Suggestions?
Thanks in advance!
l1, l2 = [], []
for x1, x2 in generator(5):
    l1.extend(x1)
    l2.extend(x2)
    print(sum(l1), sum(l2))
Seems pretty straightforward. It's not like the generator can force you to forget what it gave you. If the operation you want to do is something as simple as sum, you don't even need to keep all the old data, just its sum.
sum1 = sum2 = 0
for x1, x2 in generator(5):
    sum1 += x1[0]
    sum2 += x2[0]
If the number of lists you need is dynamic, that's easy to handle:
lists = [[] for _ in range(numberoflists)]
for subtuple in generator(5):
    for element, sublist in zip(subtuple, lists):
        sublist.extend(element)
    do_whatever_with(map(sum, lists))
I hope I understood this correctly, but are you trying to do something like this?
>>> tmp = [[],[]]
>>> for x in generator(5):
...     tmp[0] += x[0]
...     tmp[1] += x[1]
...     print(sum(tmp[0]), sum(tmp[1]))
...
0 3
1 7
3 12
6 18
10 25
>>> tmp
[[0, 1, 2, 3, 4], [3, 4, 5, 6, 7]]
Or if you don't want to save the entire list:
>>> tmp = [0,0]
>>> for x in generator(5):
...     tmp[0] += x[0][0]
...     tmp[1] += x[1][0]
...     print(tmp[0], tmp[1])
...
0 3
1 7
3 12
6 18
10 25
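If the running totals should themselves be delivered one step at a time, the bookkeeping can be wrapped in a second generator, so the consumer still never sees future data. This is a sketch; the name running_sums is made up for this example, and it assumes every item from the source generator has the same number of inner lists.

```python
def generator(n):
    # The simplified data source from the question.
    for x in range(n):
        yield [[x], [x + 3]]

def running_sums(stream):
    totals = None
    for chunk in stream:
        sums = [sum(part) for part in chunk]
        # Keep only the running totals, not the full history.
        totals = sums if totals is None else [t + s for t, s in zip(totals, sums)]
        yield tuple(totals)

for pair in running_sums(generator(5)):
    print(pair)
```

Only one total per inner list is stored between steps, so memory use stays constant however long the stream runs.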
