Faster Python List Comprehension - python

I have a bit of code that runs many thousands of times in my project:
def resample(freq, data):
    output = []
    for i, elem in enumerate(freq):
        for _ in range(elem):
            output.append(data[i])
    return output
e.g. resample([1, 2, 3], ['a', 'b', 'c']) => ['a', 'b', 'b', 'c', 'c', 'c']
I want to speed this up as much as possible. It seems like a list comprehension could be faster. I have tried:
def resample(freq, data):
    return [item for sublist in [[data[i]]*elem for i, elem in enumerate(freq)] for item in sublist]
This is hideous and also slow, because it builds a list of lists and then flattens it. Is there a way to do this with a one-line list comprehension that is fast? Or maybe something with numpy?
Thanks in advance!
edit: Answer does not necessarily need to eliminate the nested loops, fastest code is the best

I highly suggest using lazy iterators, like so:
from itertools import repeat, chain
def resample(freq, data):
    return chain.from_iterable(map(repeat, data, freq))
This will probably be one of the fastest methods there is: map(), repeat() and chain.from_iterable() are all implemented in C, so the per-element work never runs Python bytecode. Benchmark it on your own data to be sure.
As for a small explanation:
repeat(i, n) returns an iterator that repeats an item i, n times.
map(repeat, data, freq) returns an iterator that calls repeat every time on an element of data and an element of freq. Basically an iterator that returns repeat() iterators.
chain.from_iterable() flattens the iterator of iterators to return the end items.
No intermediate list is created along the way, so memory overhead stays small, and as an added benefit you can use any type of data, not just one-character strings.
While I don't suggest it, you can convert the result into a list like so:
result = list(resample([1,2,3], ['a','b','c']))
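The three steps described above can be inspected one at a time. This is a minimal sketch using the sample inputs from the question; the variable names are only for illustration.

```python
from itertools import chain, repeat

freq = [1, 2, 3]
data = ['a', 'b', 'c']

# repeat(item, n) yields item exactly n times.
twice = list(repeat('b', 2))

# map(repeat, data, freq) pairs each element with its count and
# produces one repeat() iterator per pair.
groups = [list(r) for r in map(repeat, data, freq)]

# chain.from_iterable() flattens those iterators into a single stream.
flat = list(chain.from_iterable(map(repeat, data, freq)))

print(twice)   # ['b', 'b']
print(groups)  # [['a'], ['b', 'b'], ['c', 'c', 'c']]
print(flat)    # ['a', 'b', 'b', 'c', 'c', 'c']
```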

import itertools
def resample(freq, data):
    return itertools.chain.from_iterable([el]*n for el, n in zip(data, freq))
Besides being faster, this also has the advantage of being lazy: it returns an iterator, and the elements are produced step by step. Note that each [el]*n is still built as a small temporary list.

No need to create lists at all, just use a nested loop:
[e for i, e in enumerate(data) for j in range(freq[i])]
# ['a', 'b', 'b', 'c', 'c', 'c']
You can just as easily make this lazy by removing the brackets:
(e for i, e in enumerate(data) for j in range(freq[i]))
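Since the question is about speed, the claims in these answers are easy to check. The following is only a sketch of a benchmark harness using timeit. The function names and the input sizes are made up for illustration, and the ranking will depend on your data and Python version.

```python
import timeit
from itertools import chain, repeat

def resample_loop(freq, data):
    # The original nested-loop version from the question.
    output = []
    for i, elem in enumerate(freq):
        for _ in range(elem):
            output.append(data[i])
    return output

def resample_chain(freq, data):
    # Iterator pipeline, materialized so the comparison is fair.
    return list(chain.from_iterable(map(repeat, data, freq)))

def resample_comprehension(freq, data):
    # Single nested comprehension, no intermediate lists.
    return [e for i, e in enumerate(data) for _ in range(freq[i])]

# Hypothetical workload: 1000 items, each repeated 0 to 9 times.
freq = [i % 10 for i in range(1000)]
data = list(range(1000))

candidates = [resample_loop, resample_chain, resample_comprehension]
expected = resample_loop(freq, data)
for fn in candidates:
    # Check correctness before timing anything.
    assert fn(freq, data) == expected
    seconds = timeit.timeit(lambda: fn(freq, data), number=200)
    print(f"{fn.__name__}: {seconds:.4f} s for 200 calls")
```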

Related

New list of not repeated elements

I want to create a function that takes a list as an argument, for example:
list = ['a','b','a','d','e','f','a','b','g','b']
and returns a specific number of list elements (I choose the number) such that no element occurs twice. For example, if I choose 3:
new_list = ['a','b','d']
I tried the following:
def func(j, list):
    new_list=[]
    for i in list:
        while(len(new_list)<j):
            for k in new_list:
                if i != k:
                    new_list.append(i)
    return new_list
But the function went into an infinite loop.
def func(j, mylist):
    # dedup, preserving order (dict is insertion-ordered as a language guarantee as of 3.7):
    deduped = list(dict.fromkeys(mylist))
    # Slice off all but the part you care about:
    return deduped[:j]
If performance for large inputs is a concern, that's suboptimal: it processes the whole input even when the first j unique elements appear near the start of a much longer input. For maximum efficiency, a slightly more complicated solution can be used. First, copy the itertools unique_everseen recipe:
from itertools import filterfalse, islice  # At top of file; filterfalse for the recipe, islice for your function
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
Now wrap it with islice to pull off only as many elements as required, exiting immediately once you have them (without processing the rest of the input at all):
def func(j, mylist):  # Note: Renamed list argument to mylist to avoid shadowing built-in
    return list(islice(unique_everseen(mylist), j))
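To see the early exit in action, the input can be wrapped in a small counting generator. This is a sketch only: counting() is a hypothetical helper added for the demonstration, and unique_everseen is shortened to the no-key case.

```python
from itertools import filterfalse, islice

def unique_everseen(iterable):
    # Simplified form of the recipe (no key function).
    seen = set()
    for element in filterfalse(seen.__contains__, iterable):
        seen.add(element)
        yield element

def counting(iterable, counter):
    # Hypothetical helper: records how many items were pulled from the input.
    for item in iterable:
        counter[0] += 1
        yield item

mylist = ['a', 'b', 'a', 'd', 'e', 'f', 'a', 'b', 'g', 'b']
pulled = [0]
result = list(islice(unique_everseen(counting(mylist, pulled)), 3))

print(result)     # ['a', 'b', 'd']
print(pulled[0])  # 4
```

Only the first four input elements are read (a, b, the repeated a, then d); the remaining six are never touched.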
Try this.
lst = ['a','b','a','d','e','f','a','b','g','b']
j = 3
def func(j,list_):
    new_lst = []
    for a in list_:
        if a not in new_lst:
            new_lst.append(a)
    return new_lst[:j]
print(func(j,lst)) # ['a', 'b', 'd']
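I don't know why no one has posted a numpy.unique solution.
Here is a memory-efficient way (I think 😉). Note that np.unique returns the unique values in sorted order, not in order of first appearance, so it matches the expected output here only because the first three unique letters already appear in sorted order.
import numpy as np
lst = ['a','b','a','d','e','f','a','b','g','b']
def func(j,list_):
    return np.unique(list_).tolist()[:j]
print(func(3,lst)) # ['a', 'b', 'd']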
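np.unique returns its result sorted, so it differs from an order-preserving approach as soon as the input is not already in sorted order. A small sketch with a made-up input, using sorted(set(...)) as a pure-Python stand-in for the ordering np.unique would produce on a list of strings:

```python
letters = ['c', 'a', 'b', 'a', 'c']  # made-up input, not already sorted

first_seen = list(dict.fromkeys(letters))  # order of first appearance
sorted_unique = sorted(set(letters))       # sorted order

print(first_seen[:2])     # ['c', 'a']
print(sorted_unique[:2])  # ['a', 'b']
```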
list is the name of a built-in type in Python (not a reserved word), so using it as a variable name shadows the built-in.
If the order of the elements is not a concern, then
def func(j, user_list):
    return list(set(user_list))[:j]
It's bad practice to use "list" as a variable name.
You can solve the problem by just using Counter from Python's collections module:
from collections import Counter
a=['a','b','a','d','e','f','a','b','g','b']
b = list(Counter(a))
print(b[:3])
So your function will be something like this:
def unique_slice(list_in, elements):
    new_list = list(Counter(list_in))
    print("New list: {}".format(new_list))
    if int(elements) <= len(new_list):
        return new_list[:elements]
    return new_list
Hope it solves your question.
As others have said, you should not shadow the built-in name 'list', because that can lead to many issues. This is a simple problem: add each element to a new list after checking that it has not already been added.
Slice notation ([:]) in Python lets you split a list at an index.
>>> l = [1, 2, 3, 4]
>>> l[:1]
[1]
>>> l[1:]
[2, 3, 4]
lst = ['a', 'b', 'a', 'd', 'e', 'f', 'a', 'b', 'g', 'b']
def func(number, _list):
    out = []
    for a in _list:
        if a not in out:
            out.append(a)
    return out[:number]
print(func(4, lst)) # ['a', 'b', 'd', 'e']

How to split up elements of a list separated by ],[ in Python

I have a list that looks like :
mylist=[[["A","B"],["A","C","B"]],[["A","D"]]]
and I want to return :
mylist=[["A","B"],["A","C","B"],["A","D"]]
Using the split() function returns an error of :
list object has no attribute split
Therefore, I am unsure how I should split the elements of this list.
Thanks!
I am not sure why you think splitting will do any good for you; after all, you are, if anything, merging the second-layer lists. But flattening by one level can be done with a comprehension:
mylist = [inner for outer in mylist for inner in outer]
# [['A', 'B'], ['A', 'C', 'B'], ['A', 'D']]
One utility that can simplify this (maybe a matter of taste) is itertools.chain:
from itertools import chain
mylist = list(chain(*mylist))
Use a for loop to do this.
Here is some example code:
output = []
for list_element in mylist:
    for single_list in list_element:
        output.append(single_list)
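chain(*mylist) unpacks the outer list into separate arguments up front. chain.from_iterable takes the outer list directly and gives the same one-level flattening. A brief sketch with the list from the question:

```python
from itertools import chain

mylist = [[["A", "B"], ["A", "C", "B"]], [["A", "D"]]]

# Flatten exactly one level; the innermost lists are left intact.
flat = list(chain.from_iterable(mylist))
print(flat)  # [['A', 'B'], ['A', 'C', 'B'], ['A', 'D']]
```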

How can I use pool.starmap and zip to combine and pass an entire list with a single element

I thought of an interesting question and I hope somebody can help me solve it!
I want to use multiprocessing, so I chose to use pool.starmap(myfunction,zip([1,2,3,4,5],['a','b','c','d','e'])) in order to pass multiple arguments. I want to combine the entire list [1,2,3,4,5] with every single element in the second list, such as
([1,2,3,4,5],'a'),([1,2,3,4,5],'b').....
instead of only combining the single elements of the lists, such as
(1,'a'),(2,'b')
I know how to do it in a clumsy way, which is to repeat the list 5 times
new_list=[[1,2,3,4,5]]*5
and then zip new_list with the second list.
I'm now wondering if there is a better way to do this?
After reading your comment I assume you are looking for itertools.repeat:
import itertools
import multiprocessing
def combine(val, char):
    return f'{val}{char}'
if __name__ == '__main__':  # guard needed on platforms that spawn worker processes (e.g. Windows)
    vals = [1, 2, 3, 4, 5]
    chars = ['a', 'b', 'c', 'd', 'e']
    pool = multiprocessing.Pool(3)
    combs = pool.starmap(combine, zip(itertools.repeat(vals, 5), chars))
    print(combs)
This avoids building the temporary list that the naive approach creates, which is simply
combs = pool.starmap(combine, zip([vals]*5, chars))
If you instead want to generate all combinations of the vals and the chars elements, you could use itertools.product (which is what I first assumed you wanted):
combs = pool.starmap(combine, itertools.product(vals, chars))
As a side note, itertools also contains a starmap function that works more or less the same as the multiprocessing one, except that it executes all calls in one process, in order. It therefore cannot take advantage of multiple cores.
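The pairing itself can be checked without a process pool. The sketch below uses itertools.starmap in a single process; combine here is an illustrative stand-in for the real worker function, and repeat(vals) is called without a count because zip stops at the shorter input anyway.

```python
import itertools

def combine(vals, char):
    # Illustrative stand-in for the real worker function.
    return f'{vals}{char}'

vals = [1, 2, 3, 4, 5]
chars = ['a', 'b', 'c']

# repeat(vals) with no count is endless; zip stops when chars runs out.
pairs = list(zip(itertools.repeat(vals), chars))
print(pairs[0])  # ([1, 2, 3, 4, 5], 'a')

# itertools.starmap unpacks each pair into combine(vals, char),
# lazily and in the current process.
results = list(itertools.starmap(combine, zip(itertools.repeat(vals), chars)))
print(results[0])  # [1, 2, 3, 4, 5]a
```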

Returning semi-unique values from a list

Not sure how else to word this, but say I have a list containing the following sequence:
[a,a,a,b,b,b,a,a,a]
and I would like to return:
[a,b,a]
How would one do this in principle?
You can use itertools.groupby, which groups consecutive equal elements together and returns an iterator of (key, group) pairs, where the key is the unique element you are looking for:
from itertools import groupby
lst = ['a','a','a','b','b','b','a','a','a']
[k for k, _ in groupby(lst)]
# ['a', 'b', 'a']
Psidom's way is a lot better, but I may as well write this so you can see how it'd be possible using just basic loops and statements. It's always good to figure out what steps you'd need to take for any problem, as it usually makes coding the simple things a bit easier :)
original = ['a','a','a','b','b','b','a','a','a']
new = [original[0]]
for letter in original[1:]:
    if letter != new[-1]:
        new.append(letter)
Basically it will append a letter if the previous letter is something different.
Using list comprehension:
original = ['a','a','a','b','b','b','a','a','a']
packed = [original[i] for i in range(len(original)) if i == 0 or original[i] != original[i-1]]
print(packed) # > ['a', 'b', 'a']
Similarly (thanks to pylang) you can use enumerate instead of range:
[ x for i,x in enumerate(original) if i == 0 or x != original[i-1] ]
more_itertools has an implementation of the unique_justseen recipe from itertools:
import more_itertools as mit
list(mit.unique_justseen(["a","a","a","b","b","b","a","a","a"]))
# ['a', 'b', 'a']
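If installing more_itertools is not an option, the same behavior can be written with the standard library alone. The sketch below closely follows a form of the unique_justseen recipe from the itertools documentation; treat it as an illustration rather than the exact more_itertools implementation.

```python
from itertools import groupby
from operator import itemgetter

def unique_justseen(iterable, key=None):
    # Take the first element of each run of consecutive equal items.
    return map(next, map(itemgetter(1), groupby(iterable, key)))

lst = ['a', 'a', 'a', 'b', 'b', 'b', 'a', 'a', 'a']
print(list(unique_justseen(lst)))  # ['a', 'b', 'a']

# With a key function, runs are compared by the key value.
print(list(unique_justseen('ABBCcAD', str.lower)))  # ['A', 'B', 'C', 'A', 'D']
```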

Filtering lists

I want to filter repeated elements in my list
for instance
foo = ['a','b','c','a','b','d','a','d']
I am only interested with:
['a','b','c','d']
What would be the efficient way to do achieve this ?
Cheers
list(set(foo)) if you are using Python 2.5 or greater, but that doesn't maintain order.
Cast foo to a set, if you don't care about element order.
Since there isn't an order-preserving answer with a list comprehension, I propose the following:
>>> temp = set()
>>> [c for c in foo if c not in temp and (temp.add(c) or True)]
['a', 'b', 'c', 'd']
which could also be written as follows (in Python 3, filter returns a lazy iterator, so wrap it in list())
>>> temp = set()
>>> list(filter(lambda c: c not in temp and (temp.add(c) or True), foo))
['a', 'b', 'c', 'd']
Depending on how many elements are in foo, you might have faster results through repeated hash lookups instead of repeated iterative searches through a temporary list.
c not in temp verifies that temp does not already contain c; set.add returns None, so the or True part keeps the condition truthy and c is emitted to the output list as it is added to the set.
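The trick relies on set.add returning None. A short sketch that makes each piece visible; the variable names seen and unique are only for illustration:

```python
temp = set()

# set.add mutates the set and returns None, which is falsy.
print(temp.add('a'))          # None
# `None or True` evaluates to True, so the filter condition still passes.
print(temp.add('b') or True)  # True

foo = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']
seen = set()
unique = [c for c in foo if c not in seen and (seen.add(c) or True)]
print(unique)  # ['a', 'b', 'c', 'd']
```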
>>> bar = []
>>> for i in foo:
...     if i not in bar:
...         bar.append(i)
...
>>> bar
['a', 'b', 'c', 'd']
This would be the most straightforward way of removing duplicates from the list while preserving the order as much as possible (even though "order" here is an inherently wrong concept).
If you care about order, a readable way is the following:
def filter_unique(a_list):
    characters = set()
    result = []
    for c in a_list:
        if c not in characters:
            characters.add(c)
            result.append(c)
    return result
Depending on your requirements for speed, maintainability and space consumption, you could find the above unfitting. In that case, specify your requirements and we can try to do better :-)
If you write a function to do this, I would use a generator; it just wants to be used in this case.
def unique(iterable):
    yielded = set()
    for item in iterable:
        if item not in yielded:
            yield item
            yielded.add(item)
Inspired by Francesco's answer, rather than making our own filter()-type function, let's make the builtin do some work for us:
def unique(a, s=set()):
    if a not in s:
        s.add(a)
        return True
    return False
Usage:
uniq = filter(unique, orig)
This may or may not perform faster or slower than an answer that implements all of the work in pure Python. Benchmark and see. Of course, this only works once, but it demonstrates the concept. The ideal solution is, of course, to use a class:
class Unique(set):
    def __call__(self, a):
        if a not in self:
            self.add(a)
            return True
        return False
Now we can use it as much as we want:
uniq = filter(Unique(), orig)
Once again, we may (or may not) have thrown performance out the window: the gains of using a built-in function may be offset by the overhead of a class. I just thought it was an interesting idea.
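The "only works once" caveat comes from the default argument: the set in s=set() is created a single time, when the function is defined, and every later call shares it. A small sketch contrasting the two versions on a made-up input:

```python
def unique(a, s=set()):
    # The default set is created once, at definition time,
    # and is shared by every later call.
    if a not in s:
        s.add(a)
        return True
    return False

class Unique(set):
    def __call__(self, a):
        if a not in self:
            self.add(a)
            return True
        return False

orig = ['a', 'b', 'a', 'c']

first = list(filter(unique, orig))
second = list(filter(unique, orig))  # the shared set already holds a, b, c
print(first)   # ['a', 'b', 'c']
print(second)  # []

# A fresh Unique() instance starts empty every time.
print(list(filter(Unique(), orig)))  # ['a', 'b', 'c']
print(list(filter(Unique(), orig)))  # ['a', 'b', 'c']
```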
This is what you want if you need a sorted list at the end:
>>> foo = ['a','b','c','a','b','d','a','d']
>>> bar = sorted(set(foo))
>>> bar
['a', 'b', 'c', 'd']
import numpy as np
np.unique(foo)  # returns a sorted NumPy array of the unique values
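You could do a sort of ugly list comprehension hack (it keeps the first occurrence of each element, but takes quadratic time because of the repeated index() calls).
[foo[i] for i in range(len(foo)) if foo.index(foo[i]) == i]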
