Say I have a list of lists, e.g:
x = [[0,1,2,3],[4,5],[6,7,8,9,10]]
And I have the 'flat' indices of the elements I wish to target, i.e, the indices of the elements I want to select from the list if it were flattened into a 1d list:
flattened_indices = [0,1,4,9]
# # # #
flattened_list = [0,1,2,3,4,5,6,7,8,9,10]
How do I convert the 1D indices into 2D indices that would allow me to recover the elements from the original nested list? I.e., in this example:
indices_2d = [(0,0), (0,1), (1,0), (2,3)]
Here is a way to do that:
from bisect import bisect
import itertools
# Accumulated sum of list lengths
def len_cumsum(x):
    return list(itertools.accumulate(map(len, x)))
# Find 2D index from accumulated list of lengths
def find_2d_idx(c, idx):
    i1 = bisect(c, idx)
    i2 = (idx - c[i1 - 1]) if i1 > 0 else idx
    return (i1, i2)
# Test
x = [[0, 1, 2, 3], [4, 5], [6, 7, 8, 9, 10]]
indices = [0, 4, 9]
flattened_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
c = len_cumsum(x)
idx_2d = [find_2d_idx(c, i) for i in indices]
print(idx_2d)
# [(0, 0), (1, 0), (2, 3)]
print([x[i1][i2] for i1, i2 in idx_2d])
# [0, 4, 9]
If you have many "flat" indices, this is more efficient than iterating over the nested list for each index, since each lookup is a binary search over the cumulative lengths.
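As a minimal sketch, the two helpers above can be wrapped into a single function that converts a whole batch of flat indices. The name flat_to_nested and the bounds check are assumptions added for illustration and are not part of the answer above.

```python
from bisect import bisect_right
from itertools import accumulate


def flat_to_nested(nested, flat_indices):
    """Map flat positions to (outer, inner) index pairs (hypothetical helper)."""
    ends = list(accumulate(map(len, nested)))  # cumulative sublist lengths
    total = ends[-1] if ends else 0
    pairs = []
    for idx in flat_indices:
        if not 0 <= idx < total:
            raise IndexError(f"flat index {idx} out of range")
        outer = bisect_right(ends, idx)  # first sublist that ends after idx
        start = ends[outer - 1] if outer > 0 else 0
        pairs.append((outer, idx - start))
    return pairs


x = [[0, 1, 2, 3], [4, 5], [6, 7, 8, 9, 10]]
print(flat_to_nested(x, [0, 1, 4, 9]))  # [(0, 0), (0, 1), (1, 0), (2, 3)]
```

Because bisect_right moves past equal cumulative values, empty sublists are stepped over without any special handling.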
I guess you could put these index pairs in a dict, then just reference the dict from indices at the end and create a new list:
x = [[0,1,2,3],[4,5],[6,7,8,9,10]]
indices = [0,4,9]
# Key each (row, column) pair by its flat position, not by the element's value
idx_map = dict(enumerate((i, j) for i, l in enumerate(x) for j, _ in enumerate(l)))
result = [idx_map[k] for k in indices]
print(result)
Which results in:
[(0, 0), (1, 0), (2, 3)]
But this is not optimal, since building idx_map visits every element of the nested list even when only a few indices are needed. @jdehesa's solution using bisect is more efficient.
I'm trying to find every item in a NumPy array arr that's also in an arbitrary list lst and replace them. While arr > 0 generates a boolean array for easy masking, arr in lst only works with all() or any(), which isn't what I need.
Example input: array (1, 2, 3, 4, 5), list [2, 4, 6, 8]
Output: array (1, 0, 3, 0, 5)
I managed to get the same result with for loops:
for i in range(len(arr)):
    if arr[i] in lst:
        arr[i] = 0
Just wondering if there are other ways to do it that set arrays apart from lists.
You can use numpy.isin:
import numpy as np
a = np.array((1, 2, 3, 4, 5))
lst = [2, 4, 6, 8]
a[np.isin(a, lst)] = 0
Gives you an a of:
array([1, 0, 3, 0, 5])
You can iterate over lst and still use numpy's indexing.
for element in lst:
    arr[arr == element] = 0
You can use this one also.
arr = (1, 2, 3, 4, 5)
lst = [2, 4, 6, 8]
new_arr = tuple('Replace With Anything' if a in lst else a for a in arr)
print(new_arr)
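For plain Python sequences, a small variation on the comprehension above converts lst to a set first, so each membership test is a hash lookup instead of a scan of the list. This is only a sketch: the function name replace_members and its replacement parameter are hypothetical, and it assumes the values are hashable.

```python
def replace_members(values, targets, replacement=0):
    """Return a list where every value found in targets is replaced."""
    target_set = set(targets)  # one-time conversion for fast lookups
    return [replacement if v in target_set else v for v in values]


print(replace_members((1, 2, 3, 4, 5), [2, 4, 6, 8]))  # [1, 0, 3, 0, 5]
```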
How would you 1) insert 1's between any two adjacent 5's? and then 2) insert a value into the second list at the same index as the 1 that was inserted into the first list?
For example,
list1 = [ 5, 1, 5, 5, 5, 1, etc.]
would become
list1 = [ 5, 1, 5, 1, 5, 1, 5, 1, etc.]
And,
list2 = [ val, val, val, val, val, val, etc.]
would become
list2 = [ val, val, val, Nval, val, Nval, etc.]
(Nval up above = the added value)
I'm a beginner so help is greatly appreciated :O)
You'll want to look at pairs of consecutive values. To do that, let's pair the list with its last item cut off (list1[:-1]) with the same list with its first item cut off (list1[1:]). (The slice notation used here is introduced in the official Python tutorial and explained in this answer.)
zip(list1[:-1], list1[1:])
(The zip function turns a pair of sequences into a sequence of pairs and is introduced here in the tutorial and documented here.)
Let's see which of these pairs are (5, 5):
[pair == (5, 5) for pair in zip(list1[:-1], list1[1:])]
The feature used here is a list comprehension, a way of writing a (new) list by giving a rule to construct it from an existing iterable.
What are the indices of these pairs? Let's number them with enumerate:
[n for (n, pair) in enumerate(zip(list1[:-1], list1[1:])) if pair == (5, 5)]
This is another list comprehension, this time with a condition for the elements (a "predicate"). Note that enumerate returns pairs of numbers and values (which are our original pairs), and that we use implicit unpacking to get them into the loop variables n and pair respectively.
list.insert(i, new_value) inserts the new value before the element currently at index i. A pair with index n covers the elements at positions n and n + 1, so to find the positions at which to insert into the original list (and into list2), we need to add 1 to the pair indices:
idxs = [n + 1 for (n, pair) in enumerate(zip(list1[:-1], list1[1:])) if pair == (5, 5)]
for i in reversed(idxs):
    list1.insert(i, 1)
    list2.insert(i, 'Nval')
(We insert in reverse order, so as to not move the pairs between which we have yet to insert.)
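To make the intermediate values concrete, here is a short run-through of the steps on the example list from the question. The letters in list2 are placeholder values chosen for illustration only.

```python
list1 = [5, 1, 5, 5, 5, 1]
list2 = ["a", "b", "c", "d", "e", "f"]  # placeholder values (assumption)

pairs = list(zip(list1[:-1], list1[1:]))
print(pairs)  # [(5, 1), (1, 5), (5, 5), (5, 5), (5, 1)]

# Position at which to insert = index of the pair + 1
idxs = [n + 1 for n, pair in enumerate(pairs) if pair == (5, 5)]
print(idxs)  # [3, 4]

# Insert from the back so earlier positions are not shifted
for i in reversed(idxs):
    list1.insert(i, 1)
    list2.insert(i, "Nval")

print(list1)  # [5, 1, 5, 1, 5, 1, 5, 1]
print(list2)  # ['a', 'b', 'c', 'Nval', 'd', 'Nval', 'e', 'f']
```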
You can recover the insertion indices with a single list comprehension. You are looking for indices i such that 5 == list1[i-1] == list1[i].
You then need to insert in decreasing order of indices.
list1 = [5, 1, 5, 5, 5, 1]
list2 = ['val', 'val', 'val', 'val', 'val', 'val']
indices = [i for i in range(1, len(list1)) if 5 == list1[i-1] == list1[i]]
for i in reversed(indices):
    list1.insert(i, 1)
    list2.insert(i, 'Nval')
print(list1) # [5, 1, 5, 1, 5, 1, 5, 1]
print(list2) # ['val', 'val', 'val', 'Nval', 'val', 'Nval', 'val', 'val']
You can use itertools.groupby:
import itertools
list1 = [5, 1, 5, 5, 5, 1]
copied = iter(['val' for _ in list1])
grouped = [[a, list(b)] for a, b in itertools.groupby(list1)]
new_result = [list(b) if a != 5 else [[5, 1] if c < len(b) - 1 else [5] for c, _ in enumerate(b)] for a, b in grouped]
final_result = [[i] if not isinstance(i, list) else i for b in new_result for i in b]
new_copied = [[next(copied)] if len(i) == 1 else [next(copied), 'Nval'] for i in final_result]
new_list1, new_list2 = list(itertools.chain(*final_result)), list(itertools.chain(*new_copied))
print(new_list1)
print(new_list2)
Output:
[5, 1, 5, 1, 5, 1, 5, 1]
['val', 'val', 'val', 'Nval', 'val', 'Nval', 'val', 'val']
list1 = [5,1,5,5,5,1,5,5,1,5,5,5,5]
list2 = []
templist = []
for idx,val in enumerate(list1):
    if (idx+1) <= (len(list1)-1) and list1[idx+1] == 5 and list1[idx] == 5:
        templist.append(val)
        templist.append(1)
        list2.append(val)
        list2.append(42)
    else:
        # also reached for the last element, which has no successor
        templist.append(val)
        list2.append(val)
Gives me as output:
templist [5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5]
list2 [5, 1, 5, 42, 5, 42, 5, 1, 5, 42, 5, 1, 5, 42, 5, 42, 5, 42, 5]
And to finish off:
list1 = templist
One-line solution based on zip and reduce:
from functools import reduce
new_val = 10 # value to use for list2 (Nval)
new_list1 = []
new_list2 = []
reduce(lambda x, y: ((y[0] == 5 and x == 5) and (new_list1.append(1) or new_list2.append(new_val))) or
new_list1.append(y[0]) or new_list2.append(y[1]) or y[0],
zip(list1, list2), (None, new_list1, new_list2))
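The reduce call above threads the previous element of list1 through as the accumulator and appends to the two output lists as a side effect. The same idea can be sketched as an explicit loop, which may be easier to follow. The function name and its marker, filler and new_val parameters are assumptions made for this illustration.

```python
def insert_between_markers(list1, list2, marker=5, filler=1, new_val=10):
    """Build new lists with filler/new_val inserted between adjacent markers."""
    new_list1, new_list2 = [], []
    prev_was_marker = False
    for a, b in zip(list1, list2):
        if a == marker and prev_was_marker:
            # Two markers in a row: insert before copying the current pair
            new_list1.append(filler)
            new_list2.append(new_val)
        new_list1.append(a)
        new_list2.append(b)
        prev_was_marker = (a == marker)
    return new_list1, new_list2


out1, out2 = insert_between_markers([5, 1, 5, 5, 5, 1], list("abcdef"))
print(out1)  # [5, 1, 5, 1, 5, 1, 5, 1]
print(out2)  # ['a', 'b', 'c', 10, 'd', 10, 'e', 'f']
```

Unlike the insert-based answers, this version leaves the input lists untouched and does a single pass.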
Given two lists of equal length:
_list = [1, 4, 8, 7, 3, 15, 5, 0, 6]
_list2 = [7, 4, 0, 1, 5, 5, 7, 2, 2]
How do I try getting an output like this:
output = [(0,3), (1,1), (3,0), (6,4), (6,5), (7,2)]
Here the intersection of the two lists is obtained, and the indices of the common elements are collected in the output list:
output = list of (index of an element in _list, where it appears in _list2)
Trying intersection with sets is not an option since the set removes the repeating elements.
Basic-Intermediate: As a generator:
def find_matching_indices(a, b):
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                yield i, j
list(find_matching_indices(_list, _list2))
# [(0, 3), (1, 1), (3, 0), (3, 6), (6, 4), (6, 5), (7, 2)]
Basic-Intermediate: As a list comprehension:
[(i, j) for i, x in enumerate(_list) for j, y in enumerate(_list2) if x == y]
# [(0, 3), (1, 1), (3, 0), (3, 6), (6, 4), (6, 5), (7, 2)]
These solutions involve two nested loops, so every element of one list is compared with every element of the other.
Intermediate-Advanced: For fun, a dictionary is another data structure you might consider:
import collections as ct
import more_itertools as mit
def locate_indices(a, b):
    """Return a dictionary of `a` index keys found at `b` indices."""
    dd = ct.defaultdict(list)
    for i, y in enumerate(a):
        idxs = list(mit.locate(b, lambda z: z == y))
        if idxs: dd[i].extend(idxs)
    return dd
locate_indices(_list, _list2)
# defaultdict(list, {0: [3], 1: [1], 3: [0, 6], 6: [4, 5], 7: [2]})
Note the index of list a is the key in the dictionary. All indices in list b that share the same value are appended.
A defaultdict was used since it is helpful in building dictionaries with list values. See more on the third-party tool more_itertools.locate(), which simply yields all indices that satisfy the lambda condition - an item in list a is also found in b.
from itertools import product
from collections import defaultdict
def matching_indices(*lists):
    d = defaultdict(lambda: tuple([] for _ in range(len(lists))))
    for l_idx, l in enumerate(lists):
        for i, elem in enumerate(l):
            d[elem][l_idx].append(i)
    return sorted([tup for _, v in d.items() for tup in product(*v)])
This solution builds a dictionary that tracks the indices that values appear at in the input lists. So if the value 5 appears at indices 0 and 2 of the first list and index 3 of the second, the value for 5 in the dictionary would be ([0, 2], [3])
It then uses itertools.product to build all the combinations of those indices.
This looks more complicated than the other answers here, but because it is O(n log n) rather than O(n**2) when duplicates are rare, it is significantly faster, especially for large inputs. Two length-1000 lists of random numbers in 0-1000 complete 100 tests in ~0.4 seconds using the above algorithm, versus 6-13 seconds using some of the others here.
Here is a solution that runs in O(n log n):
import numpy
ind1 = numpy.argsort(_list)
ind2 = numpy.argsort(_list2)
pairs = []
i = 0
j = 0
# Note: a match advances both pointers, so not every pair is reported
# when a value occurs more than once
while i<ind1.size and j<ind2.size:
    e1 = _list[ind1[i]]
    e2 = _list2[ind2[j]]
    if e1==e2:
        pairs.append((ind1[i],ind2[j]))
        i = i + 1
        j = j + 1
    elif e1<e2:
        i = i +1
    elif e2<e1:
        j = j + 1
print(pairs)
n,m=map(int,input().split())
arr=[i%m for i in (map(int,(input().split())))]
Suppose n = 5, m = 3 and the input array is [3, 2, 1, 4, 5], so arr = [0, 2, 1, 1, 2] in this case. Now I want to store the elements of equal value together in a list, i.e. [1, 1] and [2, 2]. What's the best way to group them efficiently? I also want their indices at the end, so:
output: [[1,1],[2,2]] from indices (2,3) and (1,4)
What I am looking for is the indices of the original array elements (before taking the mod) that have the same value after the mod operation.
A set keeps only the unique values, so you can count each of them:
arr=[0, 2, 1, 1, 2]
counts = [(s, arr.count(s)) for s in set(arr)]
# [(0, 1), (1, 2), (2, 2)]
Update (thanks to @JonClements):
s = {}
for i, v in enumerate(arr):
    s.setdefault(v % 3, []).append(i)
print(s)
# {0: [0], 1: [2, 3], 2: [1, 4]}
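Building on the setdefault loop above, a small sketch that works directly on the original numbers and keeps only the remainders shared by more than one element might look like this. The function name group_by_remainder is hypothetical.

```python
from collections import defaultdict


def group_by_remainder(values, m):
    """Map each remainder to the indices sharing it, dropping singletons."""
    groups = defaultdict(list)
    for i, v in enumerate(values):
        groups[v % m].append(i)
    return {r: idxs for r, idxs in groups.items() if len(idxs) > 1}


original = [3, 2, 1, 4, 5]
shared = group_by_remainder(original, 3)
print(sorted(shared.items()))  # [(1, [2, 3]), (2, [1, 4])]
```

Each list of indices can then be used to pull out the original elements, for example [original[i] for i in shared[1]] gives [1, 4].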
I have a sorted list and would like to identify consecutive multiple numbers in that list. The list can contain consecutive multiples of different order, which makes it more difficult.
Some test cases:
[1,3,4,5] -> [[1], [3,4,5]]
[1,3,5,6,7] -> [[1], [3], [5,6,7]]
# consecutive multiples of 1 and 2 (or n)
[1,2,3,7,9,11] -> [[1,2,3], [7,9,11]]
[1,2,3,7,10,12,14,25] -> [[1,2,3], [7], [10,12,14], [25]]
# overlapping consecutives !!!
[1,2,3,4,6,8,10] -> [[1,2,3,4], [6,8,10]]
Now, I have no idea what I'm doing. What I have done is to group pairwise by the distance between numbers, which was a good start, but then I am having a lot of issues identifying which element in each pair goes where, i.e.
# initial list
[1,3,4,5]
# pairs of same distance
[[[1,3]], [[3,4], [4,5]]]
# algo to get the final result ?
[[1], [3,4,5]]
Any help is greatly appreciated.
EDIT: Maybe mentioning what I want this for would make it more clear.
I want to transform something like:
[1,5,10,11,12,13,14,15,17,20,22,24,26,28,30]
into
1, 5, 10 to 15 by 1, 17, 20 to 30 by 2
Here is a version that incorporates @Bakuriu's optimization:
MINIMAL_MATCH = 3

def find_some_sort_of_weird_consecutiveness(data):
    """
    >>> find_some_sort_of_weird_consecutiveness([1,3,4,5])
    [[1], [3, 4, 5]]
    >>> find_some_sort_of_weird_consecutiveness([1,3,5,6,7])
    [[1, 3, 5], [6], [7]]
    >>> find_some_sort_of_weird_consecutiveness([1,2,3,7,9,11])
    [[1, 2, 3], [7, 9, 11]]
    >>> find_some_sort_of_weird_consecutiveness([1,2,3,7,10,12,14,25])
    [[1, 2, 3], [7], [10, 12, 14], [25]]
    >>> find_some_sort_of_weird_consecutiveness([1,2,3,4,6,8,10])
    [[1, 2, 3, 4], [6, 8, 10]]
    >>> find_some_sort_of_weird_consecutiveness([1,5,10,11,12,13,14,15,17,20,22,24,26,28,30])
    [[1], [5], [10, 11, 12, 13, 14, 15], [17], [20, 22, 24, 26, 28, 30]]
    """
    def pair_iter(series):
        from itertools import tee
        _first, _next = tee(series)
        next(_next, None)
        for i, (f, n) in enumerate(zip(_first, _next), start=MINIMAL_MATCH - 1):
            yield i, f, n

    result = []
    while len(data) >= MINIMAL_MATCH:
        test = data[1] - data[0]
        if (data[2] - data[1]) == test:
            for i, f, n in pair_iter(data):
                if (n - f) != test:
                    i -= 1
                    break
        else:
            i = 1
        data, match = data[i:], data[:i]
        result.append(match)
    for d in data:
        result.append([d])
    return result

if __name__ == '__main__':
    from doctest import testmod
    testmod()
It handles all your current test cases. Give me new failing test cases if you have any.
As mentioned in comments below, I am assuming that the shortest sequence is now three elements since a sequence of two is trivial.
See http://docs.python.org/2/library/itertools.html for an explanation of the pairwise iterator.
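To get from a list of groups like the one returned above to the one-line summary asked for in the question, a small formatter could be used. This is only a sketch: the name describe_groups is hypothetical, and it assumes that any group with three or more elements is an evenly spaced run.

```python
def describe_groups(groups):
    """Render groups as 'a to b by step', listing short groups element-wise."""
    parts = []
    for g in groups:
        if len(g) >= 3:
            parts.append(f"{g[0]} to {g[-1]} by {g[1] - g[0]}")
        else:
            parts.extend(str(v) for v in g)
    return ", ".join(parts)


groups = [[1], [5], [10, 11, 12, 13, 14, 15], [17], [20, 22, 24, 26, 28, 30]]
print(describe_groups(groups))  # 1, 5, 10 to 15 by 1, 17, 20 to 30 by 2
```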
I'd start out with a difference list.
length_a = len(list1)
diff_v = [list1[j+1] - list1[j] for j in range(length_a-1)]
So [1,2,3,7,11,13,15,17] becomes [1,1,4,4,2,2,2].
Now it is easy: each run of equal differences corresponds to a group of consecutive multiples.
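One way to turn the difference list into groups is a greedy scan that extends a run while the difference stays the same. The sketch below is an illustration rather than the answerer's code: the function name and the min_len threshold of three elements (borrowed from the earlier answer) are assumptions.

```python
def split_arithmetic_runs(values, min_len=3):
    """Greedily split a sorted list into evenly spaced runs and singletons."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    result = []
    n = len(values)
    start = 0
    while start < n:
        end = start + 1  # exclusive end of the candidate run
        if start < n - 1:
            step = diffs[start]
            # diffs[end - 1] is the gap between values[end - 1] and values[end]
            while end < n and diffs[end - 1] == step:
                end += 1
        if end - start >= min_len:
            result.append(values[start:end])
            start = end
        else:
            result.append([values[start]])
            start += 1
    return result


print(split_arithmetic_runs([1, 2, 3, 7, 11, 13, 15, 17]))
# [[1, 2, 3], [7], [11, 13, 15, 17]]
```

Because the scan is greedy from the left, an element is claimed by the first run that reaches it, so 3, 7, 11 is not reported as a run here.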
You can just keep track of your last output value as you go along:
in_ = [1, 2, 3, 4, 5]
out = [[in_[0]]]
for item in in_[1:]:
    if out[-1][-1] != item - 1:
        out.append([])
    out[-1].append(item)
I would group the list by its difference between index and value:
from itertools import groupby
lst = [1,3,4,5]
result = []
for key, group in groupby(enumerate(lst), key = lambda pair: pair[1] - pair[0]):
    result.append([value for i, value in group])
print(result)
[[1], [3, 4, 5]]
What did I do?
# First I enumerate every item of the list:
print(list(enumerate(lst)))
[(0, 1), (1, 3), (2, 4), (3, 5)]
# Then I subtract the index of each item from the item itself:
print([value - i for i, value in enumerate(lst)])
[1, 2, 2, 2]
# As you see, consecutive numbers turn out to have the same difference between index and value
# We can use this feature and group the list by the difference of value minus index
print(list(groupby(enumerate(lst), key = lambda pair: pair[1] - pair[0])))
[(1, <itertools._grouper object at 0x104bff050>), (2, <itertools._grouper object at 0x104bff410>)]
# Now you can see how it works. Now I just want to add how to write this in one logical line:
result = [[value for i, value in group]
          for key, group in groupby(enumerate(lst), key = lambda pair: pair[1] - pair[0])]
print(result)
[[1], [3, 4, 5]]
Approach for identifying consecutive multiples of n
Let's have a look at this list,
lst = [1,5,10,11,12,13,14,15,17,21,24,26,28,30]
especially at the differences between neighbor elements and the differences of differences of three consecutive elements:
1, 5, 10, 11, 12, 13, 14, 15, 17, 21, 24, 26, 28, 30
4, 5, 1, 1, 1, 1, 1, 2, 4, 3, 2, 2, 2
1, -4, 0, 0, 0, 0, 1, 2, -1, -1, 0, 0
We see that there are zeros in the third row whenever there are consecutive multiples in the first row. Thinking of it mathematically, the 2nd derivative of a function's linear sections is also zero. So let's use this property...
The "2nd derivative" of a list lst can be calculated like this
lst[i+2]-2*lst[i+1]+lst[i]
Note that this definition of the second order difference "looks" two indexes ahead.
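As a quick check, the three rows shown above can be reproduced with two comprehensions. The variable names first and second are chosen here for illustration.

```python
lst = [1, 5, 10, 11, 12, 13, 14, 15, 17, 21, 24, 26, 28, 30]

# First-order differences between neighbouring elements
first = [b - a for a, b in zip(lst, lst[1:])]
# Second-order differences, looking two indexes ahead
second = [lst[i + 2] - 2 * lst[i + 1] + lst[i] for i in range(len(lst) - 2)]

print(first)   # [4, 5, 1, 1, 1, 1, 1, 2, 4, 3, 2, 2, 2]
print(second)  # [1, -4, 0, 0, 0, 0, 1, 2, -1, -1, 0, 0]
```

The second row is the same as taking differences of the first row, which is why a zero marks three evenly spaced elements.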
Now here is the code detecting the consecutive multiples:
from itertools import groupby
# We have to keep track of the indexes in the list that have already been used
available_indexes = set(range(len(lst)))
for second_order_diff, grouper in groupby(range(len(lst)-2), key = lambda i: lst[i+2]-2*lst[i+1]+lst[i]):
    # store all not-consumed indexes in a list
    grp_indexes = [i for i in grouper if i in available_indexes]
    if grp_indexes and second_order_diff == 0:
        # There are consecutive multiples
        min_index, max_index = grp_indexes[0], grp_indexes[-1] + 2
        print("Group from", lst[min_index], "to", lst[max_index], "by", lst[min_index+1]-lst[min_index])
        available_indexes -= set(range(min_index, max_index+1))
    else:
        # The not "consumed" indexes in this group are not consecutive
        for i in grp_indexes:
            print(lst[i])
            available_indexes.discard(i)
# The last two elements could be lost without the following two lines
for i in sorted(available_indexes):
    print(lst[i])
Output:
1
5
Group from 10 to 15 by 1
17
21
Group from 24 to 30 by 2