I am trying to get a list of all possible ordered pairs (more precisely, tuples that take one element from each inner list) from an existing list of lists.
import itertools
list_of_lists=[[0, 1, 2, 3, 4], [5], [6, 7],[8, 9],[10, 11],[12, 13],[14, 15],[16, 17],[18, 19],[20, 21],[22, 23],[24, 25],[26, 27],[28, 29],[30, 31],[32, 33],[34, 35],[36, 37],[38],[39]]
Ideally, we would just use itertools.product in order to get that list of ordered pairs.
scenarios_list=list(itertools.product(*list_of_lists))
However, for a larger list of lists this raises a MemoryError, so this solution does not scale to inputs that can produce a huge number of ordered pairs.
So, is there a way to iterate through the ordered pairs as they are produced and, before appending each one to a result list, test whether it satisfies a certain criterion (for example, containing a certain number of even numbers, or having a sum that is not equal to the maximum)? If the criterion is not satisfied, the ordered pair would not be appended, so it would not needlessly use up memory when only certain ordered pairs matter.
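For reference, itertools.product already returns a lazy iterator; the MemoryError comes from wrapping it in list(). A minimal sketch of filtering while iterating, so that only the tuples that pass the test are ever kept (filtered_product and exactly_two_evens are hypothetical names for illustration; the built-in filter(condition, itertools.product(*lists)) does the same thing):

```python
import itertools

def filtered_product(lists, condition):
    """Lazily yield only the combinations that satisfy condition."""
    # itertools.product generates one tuple at a time; tuples failing
    # the test are discarded immediately instead of being stored
    for combo in itertools.product(*lists):
        if condition(combo):
            yield combo

# hypothetical criterion: exactly two even numbers in the combination
def exactly_two_evens(combo):
    return sum(1 for x in combo if x % 2 == 0) == 2

result = list(filtered_product([[1, 2], [3, 4, 5], [6, 7]], exactly_two_evens))
print(result)  # [(1, 4, 6), (2, 3, 6), (2, 4, 7), (2, 5, 6)]
```

Calling list() at the end is fine here because only the matching tuples are collected; for very large results you can loop over filtered_product directly instead.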
Start with a basic recursive implementation of product:
def product(*lsts):
    if not lsts:
        yield ()
        return
    first_lst, *rest = lsts
    for element in first_lst:
        for rec_p in product(*rest):
            p = (element,) + rec_p
            yield p
[*product([1, 2], [3, 4, 5])]
# [(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)]
Now you can add a condition and filter out any p that does not meet it:
def product(*lsts, condition=None):
    if condition is None:
        condition = lambda tpl: True
    if not lsts:
        yield ()
        return
    first_lst, *rest = lsts
    for element in first_lst:
        for rec_p in product(*rest, condition=condition):
            p = (element,) + rec_p
            if condition(p):  # stop overproduction right where it happens
                yield p
Now you can, for instance, restrict the output to tuples whose elements are all even:
[*product([1, 2], [3, 4, 5], condition=lambda tpl: not any(x%2 for x in tpl))]
# [(2, 4)]
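One caveat: because condition is passed down the recursion, it is also applied to the partial tuples built from the trailing lists, not only to complete tuples. That is safe for a criterion like "all elements even" (a failing partial tuple can never be completed into a passing one), but it silently drops results for criteria such as "exactly one even number", or the sum/maximum test from the question. A sketch comparing the two behaviours (product_full and exactly_one_even are hypothetical names):

```python
def product(*lsts, condition=None):
    # the recursive version above: condition is checked at every level,
    # i.e. also on partial tuples built from the trailing lists
    if condition is None:
        condition = lambda tpl: True
    if not lsts:
        yield ()
        return
    first_lst, *rest = lsts
    for element in first_lst:
        for rec_p in product(*rest, condition=condition):
            p = (element,) + rec_p
            if condition(p):
                yield p

def product_full(*lsts, condition=lambda tpl: True):
    # hypothetical variant: build complete tuples first, test only those
    def rec(lists):
        if not lists:
            yield ()
            return
        first_lst, *rest = lists
        for element in first_lst:
            for rec_p in rec(rest):
                yield (element,) + rec_p
    for p in rec(lsts):
        if condition(p):
            yield p

def exactly_one_even(tpl):
    return sum(1 for x in tpl if x % 2 == 0) == 1

print([*product([1, 2], [3, 4], condition=exactly_one_even)])       # [(1, 4)]
print([*product_full([1, 2], [3, 4], condition=exactly_one_even)])  # [(1, 4), (2, 3)]
```

The pruning version misses (2, 3) because its partial tuple (3,) has no even number and is discarded before 2 is prepended.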
Given a list of iterables:
li = [(1,2), (3,4,8), (3,4,7), (9,)]
I want to sort by the third element if present, otherwise leave the order unchanged. So here the desired output would be:
[(1,2), (3,4,7), (3,4,8), (9,)]
Using li.sort(key=lambda x: x[2]) raises an IndexError. I tried a custom function:
def safefetch(li, idx):
    try:
        return li[idx]
    except IndexError:
        return None
li.sort(key=lambda x: safefetch(x, 2))
But sorting with None among the keys raises a TypeError, because None cannot be compared with an int.
Broader context: I first want to sort by the first element, then by the second, then by the third, and so on up to the length of the longest element; i.e. I want to run several sorts of decreasing priority (as in SQL's ORDER BY COL1, COL2), while preserving the order among elements that a given column doesn't apply to. So: first sort everything by the first element; then, among ties on el_1, sort on el_2, and so on up to el_n. My feeling is that calling a sort function on the whole list is probably the wrong approach.
(Note that this was an "XY question": for my actual problem, just using sorted on the tuples is simplest, as Patrick Artner pointed out in the comments. But the question as posed is trickier.)
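For the broader ORDER BY COL1, COL2, ... goal, Python's built-in tuple comparison already does this: it is lexicographic, and a shorter tuple that is a prefix of a longer one sorts first. A small check on the example list:

```python
li = [(1, 2), (3, 4, 8), (3, 4, 7), (9,)]
# tuples compare element by element (lexicographically); when one tuple
# is a prefix of another, the shorter one sorts first
print(sorted(li))                   # [(1, 2), (3, 4, 7), (3, 4, 8), (9,)]
print(sorted([(3, 4, 7), (3, 4)]))  # [(3, 4), (3, 4, 7)]
```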
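We can first collect the indices of the elements of each distinct length in a defaultdict, then read each group with NumPy's fancy indexing, sort it, and write the sorted tuples back into the same positions:
import numpy as np
from collections import defaultdict
# {length -> inds} mapping
d = defaultdict(list)
# collect indices per length
for j, tup in enumerate(li):
    d[len(tup)].append(j)
# fill a 1-D object array element by element: np.array(li, dtype=object)
# would build a 2-D array if all tuples had the same length
arr = np.empty(len(li), dtype=object)
for j, tup in enumerate(li):
    arr[j] = tup
# sort each group and write it back one position at a time, because
# arr[inds] = <list of equal-length tuples> would be broadcast as 2-D
for inds in d.values():
    for j, tup in zip(inds, sorted(arr[inds])):
        arr[j] = tup
# convert back to list if desired
li = arr.tolist()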
to get li at the end as
[(1, 2), (3, 4, 7), (3, 4, 8), (9,)]
For some other samples:
In [134]: the_sorter([(12,), (3,4,8), (3,4,7), (9,)])
Out[134]: [(9,), (3, 4, 7), (3, 4, 8), (12,)]
In [135]: the_sorter([(12,), (3,4,8,9), (3,4,7), (11, 9), (9, 11), (2, 4, 4, 4)])
Out[135]: [(12,), (2, 4, 4, 4), (3, 4, 7), (9, 11), (11, 9), (3, 4, 8, 9)]
where the_sorter is the procedure above wrapped in a function (the name lacks imagination...):
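def the_sorter(li):
    # {length -> inds} mapping
    d = defaultdict(list)
    # collect indices per length
    for j, tup in enumerate(li):
        d[len(tup)].append(j)
    # 1-D object array filled element by element (np.array(li) raises
    # an error on ragged input in recent NumPy versions)
    arr = np.empty(len(li), dtype=object)
    for j, tup in enumerate(li):
        arr[j] = tup
    # sort each length group, writing back one position at a time
    for inds in d.values():
        for j, tup in zip(inds, sorted(arr[inds])):
            arr[j] = tup
    return arr.tolist()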
Whatever you return as a fallback value must be comparable to the other key values that might be returned. In your example, that requires a numerical value.
import sys
def safefetch(li, idx):
    try:
        return li[idx]
    except IndexError:
        return sys.maxsize  # a very large int (Python ints are unbounded, so not the largest possible)
This would put all the short ones at the back of the sort order, but maintain a stable order among them.
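An alternative that needs no sentinel value: return a tuple whose first item flags whether the element exists. Tuples compare item by item, so the flag sorts the short tuples last, and real values are only ever compared with each other, whatever their type. A sketch (third_or_missing is a hypothetical name):

```python
def third_or_missing(t, idx=2):
    # (False, value) when the element exists, (True, 0) otherwise; the flag
    # decides first, so the placeholder 0 is never compared with a real value
    if idx < len(t):
        return (False, t[idx])
    return (True, 0)

li = [(1, 2), (3, 4, 8), (3, 4, 7), (9,)]
li.sort(key=third_or_missing)  # list.sort is stable, so (1, 2) stays before (9,)
print(li)  # [(3, 4, 7), (3, 4, 8), (1, 2), (9,)]
```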
Inspired by @Mustafa Aydın, here is a solution in pandas. I would prefer one without the memory overhead of a DataFrame, but this might be good enough.
import pandas as pd
li = [(1,2), (3,4,8), (3,4,7), (9,)]
tmp = pd.DataFrame(li)
[tuple(int(el) for el in t if not pd.isna(el)) for t in tmp.sort_values(by=tmp.columns.tolist()).values]
> [(1, 2), (3, 4, 7), (3, 4, 8), (9,)]
I have a list like this:
A = [1, 2, 3, 4]
After applying enumerate, I have the following list:
A = [(0, 1), (1, 2), (2, 3), (3, 4)]
After checking a condition, I realized that I don't need the elements with indices 0 and 2.
This means my condition returns a list like the one below, which can be different each time:
condA = [(0, 1), (2, 3)]
I know that I can use del or .pop() to remove an element from a list.
However, I was wondering how I can read the indices (0 and 2) from my condA list and remove those elements from my original list.
I don't want to hard-code 0 and 2, because they will be different each time.
The result should look like this:
A_reduced = [2, 4]
If you want to read the indices from the condA list, the list of indices of the elements to be removed is:
rm_lst = [x[0] for x in condA]
Now, to remove the elements from your main list:
A = [(0, ((11), (12))), (1, ((452), (54))), (2, ((545), (757))), (3, ((42), (37)))]
A_reduced = [x[1] for x in A if x[0] not in rm_lst]
Final Code:
A = [(0, ((11), (12))), (1, ((452), (54))), (2, ((545), (757))), (3, ((42), (37)))]
condA = [(0, ((11), (452))), (2, ((545), (757)))]
rm_lst = [x[0] for x in condA]
A_reduced = [x[1] for x in A if x[0] not in rm_lst]
print(A_reduced)
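Applied to the question's original data (A = [1, 2, 3, 4]), the same idea fits in two lines. Storing the indices in a set rather than a list makes each `not in` check O(1), which matters for long lists:

```python
A = [1, 2, 3, 4]
condA = [(0, 1), (2, 3)]
# collect the indices to drop into a set for O(1) membership checks
rm = {idx for idx, _ in condA}
# keep every element whose position is not in rm
A_reduced = [x for i, x in enumerate(A) if i not in rm]
print(A_reduced)  # [2, 4]
```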
If you want to delete elements from a list inside a loop, you should iterate from last to first:
for i in range(len(A) - 1, -1, -1):
    if condition:  # replace with your deletion condition
        del A[i]
Update:
You can also use a list comprehension for this, but you have to invert your condition (A[i][0] != 11 => A[i][0] == 11):
A = [A[i] for i in range(len(A)) if inverted_condition]
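Using the indices stored in condA directly, the backward deletion looks like this; sorting them in descending order means each del only shifts elements that have already been handled:

```python
A = [1, 2, 3, 4]
condA = [(0, 1), (2, 3)]
# delete from the highest index down, so earlier positions stay valid
for i in sorted((idx for idx, _ in condA), reverse=True):
    del A[i]
print(A)  # [2, 4]
```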
Loop through condA and pop each element off A. You need a counter to adjust the indices, since A shrinks with every pop. Make sure condA is sorted:
A = [1, 2, 3, 4]
condA = [(0, 1), (2, 3)]
i = 0
for item in sorted(condA):  # indices must be processed in ascending order
    A.pop(item[0] - i)  # shift left by the number of items already removed
    i += 1
# result: [2, 4]
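If I understand correctly, a function may be the cleanest way:
A = [1, 2, 3, 4]
def remove_items(lis, idx):
    lis2 = lis.copy()  # work on a copy so the original list is kept
    # pop from the highest index down; popping in ascending order would
    # shift the remaining elements and remove the wrong ones
    for i in sorted(idx, reverse=True):
        lis2.pop(i)
    return lis2
A_reduced = remove_items(A, [0, 2])
Output:
Out[32]: [2, 4]
You can pass any list of indexes you want, and it will drop them from the list, as long as they are valid indices (pop raises an IndexError otherwise).
Edit: adjusted to your new values, and modified the function so that you also keep your original list (if that's needed).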
Suppose I have a list a = [-1,-1,-1,1,1,1,2,2,2,-1,-1,-1,1,1,1] in Python. Is there any built-in function to which we pass a list and which returns the index ranges at which each element is present? For example:
>>> index_range(a)
{-1 :'0-2,9-11', 1:'3-5,12-14', 2:'6-8'}
I have tried collections.Counter, but it only outputs the count of each element.
If there is no built-in function, can you please guide me on how to achieve this in my own function? Not the whole code, just a guideline.
You can create a custom function using itertools.groupby and collections.defaultdict to get the ranges as lists of (start, end) tuples:
from itertools import groupby
from collections import defaultdict
def index_range(my_list):
    my_dict = defaultdict(list)
    # group consecutive (index, value) pairs that share the same value
    for _, group in groupby(enumerate(my_list), key=lambda x: x[1]):
        indices, values = zip(*group)
        # record the first and last index of each run
        my_dict[values[0]].append((indices[0], indices[-1]))
    return dict(my_dict)
Sample Run:
>>> index_range([-1,-1,-1,1,1,1,2,2,2,-1,-1,-1,1,1,1])
{-1: [(0, 2), (9, 11)], 1: [(3, 5), (12, 14)], 2: [(6, 8)]}
To get the values as strings, as in your example, you can either modify the above function or post-process its return value with a dictionary comprehension:
>>> result_dict = index_range([-1,-1,-1,1,1,1,2,2,2,-1,-1,-1,1,1,1])
>>> {k: ','.join('{}-{}'.format(*i) for i in v) for k, v in result_dict.items()}
{-1: '0-2,9-11', 1: '3-5,12-14', 2: '6-8'}
You can use a dict with the list items as keys and their indexes as values:
>>> lst = [-1,-1,-1,1,1,1,2,2,2,-1,-1,-1,1,1,1]
>>> indexes = {}
>>> for index, item in enumerate(lst):
...     indexes.setdefault(item, []).append(index)
...
>>> indexes
{-1: [0, 1, 2, 9, 10, 11], 1: [3, 4, 5, 12, 13, 14], 2: [6, 7, 8]}
You could then merge the index lists into ranges if that's what you need. I can help you with that too if necessary.
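Merging the sorted index lists into ranges takes one pass per list: extend the current run while the next index is consecutive, otherwise start a new one. A sketch (to_ranges is a hypothetical helper name) that also produces the string format from the question:

```python
def to_ranges(indices):
    """Collapse a sorted list of indices into (start, end) runs."""
    ranges = []
    for idx in indices:
        if ranges and idx == ranges[-1][1] + 1:
            ranges[-1][1] = idx        # consecutive: extend the current run
        else:
            ranges.append([idx, idx])  # gap (or first index): start a new run
    return [tuple(r) for r in ranges]

indexes = {-1: [0, 1, 2, 9, 10, 11], 1: [3, 4, 5, 12, 13, 14], 2: [6, 7, 8]}
result = {k: ','.join(f'{s}-{e}' for s, e in to_ranges(v)) for k, v in indexes.items()}
print(result)  # {-1: '0-2,9-11', 1: '3-5,12-14', 2: '6-8'}
```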
My question is similar to this previous SO question.
I have two very large lists of data (almost 20 million data points each) that contain many consecutive duplicates. I would like to remove the consecutive duplicates as follows:
list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2] # This is 20M long!
list2 = ... # another list of size len(list1), also 20M long!
i = 0
while i < len(list1) - 1:
    if list1[i] == list1[i+1]:
        del list1[i]
        del list2[i]
    else:
        i = i + 1
And the output should be [1, 2, 3, 4, 5, 1, 2] for the first list.
Unfortunately, this is very slow, since deleting an element from the middle of a list is itself slow (every later element has to be shifted). Is there any way I can speed up this process? Please note that, as shown in the code snippet above, I also need to keep track of the index i so that I can remove the corresponding element in list2.
Python has this groupby in the libraries for you:
>>> list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2]
>>> from itertools import groupby
>>> [k for k,_ in groupby(list1)]
[1, 2, 3, 4, 5, 1, 2]
You can tweak it with the key argument to process the second list at the same time:
>>> list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2]
>>> list2 = [9,9,9,8,8,8,7,7,7,6,6,6,5]
>>> from operator import itemgetter
>>> keyfunc = itemgetter(0)
>>> [next(g) for k,g in groupby(zip(list1, list2), keyfunc)]
[(1, 9), (2, 7), (3, 7), (4, 7), (5, 6), (1, 6), (2, 5)]
If you want to split those pairs back into separate sequences:
>>> list(zip(*_))  # "unzip" them
[(1, 2, 3, 4, 5, 1, 2), (9, 7, 7, 7, 6, 6, 5)]
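Since both lists are about 20 million elements long, building the two output lists in a single pass avoids the per-deletion cost of del entirely. A sketch, assuming the lists have equal length (dedupe_parallel and keep_last are hypothetical names): by default it keeps the list2 value from the first element of each run, like the groupby version above, while keep_last=True reproduces the question's while loop, which keeps the value from the last element of each run.

```python
def dedupe_parallel(list1, list2, keep_last=False):
    """Drop consecutive duplicates from list1 and the matching list2 items."""
    out1, out2 = [], []
    for a, b in zip(list1, list2):
        if out1 and a == out1[-1]:
            # still inside a run of equal list1 values
            if keep_last:
                out2[-1] = b  # remember the latest list2 value of the run
        else:
            # a new run starts: keep this pair
            out1.append(a)
            out2.append(b)
    return out1, out2

list1 = [1, 1, 1, 1, 1, 1, 2, 3, 4, 4, 5, 1, 2]
list2 = [9, 9, 9, 8, 8, 8, 7, 7, 7, 6, 6, 6, 5]
print(dedupe_parallel(list1, list2))
# ([1, 2, 3, 4, 5, 1, 2], [9, 7, 7, 7, 6, 6, 5])
print(dedupe_parallel(list1, list2, keep_last=True))
# ([1, 2, 3, 4, 5, 1, 2], [8, 7, 7, 6, 6, 6, 5])
```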
You can use collections.deque with its maxlen argument to get a sliding window of size 2. Then compare the two entries in the window, and append the newer one to the result if they differ.
from collections import deque
def remove_adj_dups(x):
    """
    :param x: an iterable such as a string, list or generator, e.g. 1, 1, 2, 3, 3
    :return: the items without adjacent duplicates as a list, e.g. [1, 2, 3]
    """
    result = []
    d = deque([object()], maxlen=2)  # 1st entry is object(), which only matches itself. Kudos to Trey Hunner --> object()
    for i in x:
        d.append(i)
        a, b = d
        if a != b:
            result.append(b)
    return result
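On memory: remove_adj_dups accepts a generator as input but still stores its whole result list. If you only need to stream the deduplicated values, a generator variant keeps just the previous item in memory. A sketch under that assumption (iter_adj_dedup is a hypothetical name); it behaves like [k for k, _ in groupby(x)], but lazily:

```python
def iter_adj_dedup(iterable):
    """Yield the items of iterable, skipping adjacent duplicates."""
    sentinel = object()  # equal only to itself, so the first item always passes
    prev = sentinel
    for item in iterable:
        if item != prev:
            yield item
            prev = item

# works on any iterable, including a generator, without building a list
print(list(iter_adj_dedup([1, 1, 1, 2, 3, 4, 4, 5, 1, 2])))  # [1, 2, 3, 4, 5, 1, 2]
print(''.join(iter_adj_dedup('aabccc')))  # abc
```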
I generated a random sequence of 2 million numbers from 0 to 9 (so with many adjacent duplicates):
def random_nums_with_dups(number_range=None, range_len=None):
    """
    :param number_range: use the numbers from 0 to number_range - 1; the smaller this is, the more dups
    :param range_len: number of items produced by the generator
    :return: a generator
    Note: if number_range = 2, random binary digits are returned
    """
    import random
    return (random.choice(range(number_range)) for i in range(range_len))
I then tested with
range_len = 2000000
def mytest():
    return remove_adj_dups(random_nums_with_dups(number_range=10, range_len=range_len))
big_result = mytest()
print(len(big_result))
The length was 1800197 (i.e. after removing adjacent duplicates), in under 5 seconds, including the time spent generating the random numbers.
I lack the experience to say whether it is memory efficient as well. Could someone comment, please?