Suppose I have two NumPy arrays
x = [[1, 2, 8],
[2, 9, 1],
[3, 8, 9],
[4, 3, 5],
[5, 2, 3],
[6, 4, 7],
[7, 2, 3],
[8, 2, 2],
[9, 5, 3],
[10, 2, 3],
[11, 2, 4]]
y = [0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0]
Note:
(Values in x are not sorted in any way. I chose this example to better illustrate the problem.)
(These are just example values for x and y. Both can contain arbitrarily many entries, and y can contain arbitrarily many distinct values, but x always has exactly as many rows as y has entries.)
I want to efficiently split the array x into sub-arrays according to the values in y.
My desired outputs would be
z_0 = [[1, 2, 8],
[2, 9, 1],
[4, 3, 5],
[10, 2, 3],
[11, 2, 4]]
z_1 = [[3, 8, 9],
[5, 2, 3],
[6, 4, 7]]
z_2 = [[7, 2, 3],
[8, 2, 2],
[9, 5, 3]]
Assuming that y starts with zero and is neither sorted nor grouped, what is the most efficient way to do this?
Note: This question is the unsorted version of this question:
Split a NumPy array into subarrays according to the values (sorted in ascending order) of another array
One way to solve this is to build up a list of filter indexes for each y value and then simply select those elements of x (this assumes x is a NumPy array, so that it accepts a list of indexes). For example:
z_0 = x[[i for i, v in enumerate(y) if v == 0]]
z_1 = x[[i for i, v in enumerate(y) if v == 1]]
z_2 = x[[i for i, v in enumerate(y) if v == 2]]
Output
array([[ 1, 2, 8],
[ 2, 9, 1],
[ 4, 3, 5],
[10, 2, 3],
[11, 2, 4]])
array([[3, 8, 9],
[5, 2, 3],
[6, 4, 7]])
array([[7, 2, 3],
[8, 2, 2],
[9, 5, 3]])
If you want to be more generic and support different sets of numbers in y, you could use a comprehension to produce a list of arrays e.g.
z = [x[[i for i, v in enumerate(y) if v == m]] for m in set(y)]
Output:
[array([[ 1, 2, 8],
[ 2, 9, 1],
[ 4, 3, 5],
[10, 2, 3],
[11, 2, 4]]),
array([[3, 8, 9],
[5, 2, 3],
[6, 4, 7]]),
array([[7, 2, 3],
[8, 2, 2],
[9, 5, 3]])]
If y is also an np.array and the same length as x you can simplify this to use boolean indexing:
z = [x[y==m] for m in set(y)]
Output is the same as above.
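If y contains many distinct labels, a single stable argsort pass avoids scanning x once per label. A minimal sketch of that idea (not part of the answer above; it assumes x and y are already NumPy arrays):
import numpy as np

order = np.argsort(y, kind='stable')                # rows grouped by label, original order kept within each group
labels, counts = np.unique(y, return_counts=True)   # unique labels (sorted) and their counts
z = np.split(x[order], np.cumsum(counts)[:-1])      # one sub-array per label, in label order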
Just use list comprehension and boolean indexing
x = np.array(x)
y = np.array(y)
z = [x[y == i] for i in range(y.max() + 1)]
z
Out[]:
[array([[ 1, 2, 8],
[ 2, 9, 1],
[ 4, 3, 5],
[10, 2, 3],
[11, 2, 4]]),
array([[3, 8, 9],
[5, 2, 3],
[6, 4, 7]]),
array([[7, 2, 3],
[8, 2, 2],
[9, 5, 3]])]
Slight variation.
from operator import itemgetter
label = itemgetter(1)
Associate the implied index with each label ... (index, label)
y1 = list(enumerate(y))
Sort on the label
y1.sort(key=label)
Group by label and construct the results
import itertools
d = {}
for key, group in itertools.groupby(y1, label):
    d[f'z{key}'] = [x[i] for i, k in group]
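For the example data (with x as a plain list of lists) the dict then maps each label to its rows, e.g.:
print(d['z0'])
# [[1, 2, 8], [2, 9, 1], [4, 3, 5], [10, 2, 3], [11, 2, 4]]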
Pandas solution:
>>> import pandas as pd
>>> df = pd.DataFrame({'points': list(x), 'cat': y})
>>> z = df.groupby('cat').agg(list)
>>> z
points
cat
0 [[1, 2, 8], [2, 9, 1], [4, 3, 5], [10, 2, 3], ...
1 [[3, 8, 9], [5, 2, 3], [6, 4, 7]]
2 [[7, 2, 3], [8, 2, 2], [9, 5, 3]]
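If you need the groups back as NumPy arrays rather than lists inside a DataFrame, one option is a dict comprehension over the groupby object (a sketch, reusing the df above):
>>> import numpy as np
>>> groups = {cat: np.vstack(g['points'].to_list()) for cat, g in df.groupby('cat')}
>>> groups[0]
array([[ 1,  2,  8],
       [ 2,  9,  1],
       [ 4,  3,  5],
       [10,  2,  3],
       [11,  2,  4]])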
I have a nested list:
x=[[[0, 1, 2], [0, 1, 2]], [[0, 1], [0, 1, 2]], [[0]], [[0, 1], [0, 1, 2, 3, 4]]]
Now, the goal is to get a nested list with the same structure but with elements replaced by their "global" counting numbers. So, the desired output should look like this:
y=[[[0, 1, 2], [3, 4, 5]], [[6, 7], [8, 9, 10]], [[11]], [[12, 13], [14, 15, 16, 17, 18]]]
I have been fighting with it for the last couple of hours without success.
Ideally, I'd like a universal solution that works with an arbitrary depth of nesting.
Any help would be very much appreciated. Thank you in advance!
Here's a recursive solution that does the replacement in-place and relies on the type of the element being replaced. The idea is to keep track of the "global counter" and pass it into the recursive calls so that it knows what to replace elements with:
x = [[[0, 1, 2], [0, 1, 2]], [[0, 1], [0, 1, 2]], [[0]], [[0, 1], [0, 1, 2, 3, 4]]]
def replace(lst, i):
    for j in range(len(lst)):
        if isinstance(lst[j], list):
            lst[j], i = replace(lst[j], i)
        else:
            lst[j] = i
        i += 1
    return lst, i - 1
replace(x, 0)
print(x)
# [[[0, 1, 2], [3, 4, 5]], [[6, 7], [8, 9, 10]], [[11]], [[12, 13], [14, 15, 16, 17, 18]]]
Here's another recursive solution. It uses itertools.count and builds a new list. Personally, I like to avoid integer indexing when possible for readability.
from itertools import count
def structured_enumerate(lst, counter=None):
    'enumerate elements in nested list, preserve structure'
    result = []
    if counter is None:
        counter = count()
    for x in lst:
        if isinstance(x, list):
            result.append(structured_enumerate(x, counter))
        else:
            result.append(next(counter))
    return result
Demo:
>>> x = [[[0, 1, 2], [0, 1, 2]], [[0, 1], [0, 1, 2]], [[0]], [[0, 1], [0, 1, 2, 3, 4]]]
>>> structured_enumerate(x)
[[[0, 1, 2], [3, 4, 5]],
[[6, 7], [8, 9, 10]],
[[11]],
[[12, 13], [14, 15, 16, 17, 18]]]
Edit:
Here's an attempt at a generic solution that works with any iterable, indexable or not, where you can specify iterable types to exclude from iteration.
from itertools import count
def structured_enumerate(iterable, dontiter=(str,), counter=None):
    'enumerate elements in nested iterable, preserve structure'
    result = []
    if counter is None:
        counter = count()
    for x in iterable:
        # check if x should be iterated
        try:
            iter(x)
            is_iterable = True
        except TypeError:
            is_iterable = False
        # strings of length zero and one are a special case:
        # a single character iterates to itself, which would recurse forever
        if isinstance(x, str) and len(x) < 2:
            is_iterable = False
        if is_iterable and not isinstance(x, dontiter):
            subresult = structured_enumerate(x, dontiter, counter)
            result.append(subresult)
        else:
            result.append(next(counter))
    return result
Demo:
>>> fuzzy = [{0, 0}, '000', [0, [0, 0]], (0,0), 0]
>>> structured_enumerate(fuzzy)
[[0, 1], 2, [3, [4, 5]], [6, 7], 8]
>>> structured_enumerate(fuzzy, dontiter=())
[[0, 1], [2, 3, 4], [5, [6, 7]], [8, 9], 10]
>>> structured_enumerate(fuzzy, dontiter=(tuple, set))
[0, [1, 2, 3], [4, [5, 6]], 7, 8]
I have an integer S and a collection A of integers. I need to find a set of integers from the collection where the sum of those integers is equal to S. It could be 1 integer or 50 - it doesn't matter.
I'm trying to do this like this:
I have an array res and an array grp
res starts with [0], grp is the initially given collection, and S is the sum we're trying to find, S is global
my function takes (res, grp)
I want to do this and stop when the sum of the res elements is equal to S.
but I suck with recursion and I have no idea what I should be doing
this is my code
S = 7
grp = [0,5,6,4,3]
def sumFinder(res, grp):
    if grp == []:
        return grp  # in case no pair was found [] is returned
    if sum(res) == S:
        return res
    for i in range(0, len(grp)):
        print(res)
        print(grp[i])
        res += [grp[i]]
        newgrp = grp[:i]
        newgrp += grp[i+1:]
        return sumFinder(res, newgrp)
print(sumFinder([0],grp))
UPDATE:
Thank you everyone for your answers.
Thank you englealuze for giving me a better idea about approaching the problem; thanks to you I got to this:
1 - this is for finding the first combination and returning it (this was my goal)
grp = [1,0,1,0,1,2,6,2,3,5,6,7,8,9,2,1,1,9,6,7,4,1,2,3,2,2]
S = 55
grps = []
def findCombination(comb, grp):
    for i in range(0, len(grp)):
        comb += [grp[i]]
        newgrp = grp[:i]
        newgrp += grp[i+1:]
        if sum(comb) == S:
            return comb
        if newgrp not in grps:
            grps.append(newgrp)
            res = findCombination([], newgrp)
            if res != None:
                return res
print(findCombination([],grp))
2 - this is for finding all the combinations (this is the problem englealuze talked about, but I didn't understand his method that well even though it seems better)
grp = [1,0,1,1,9,6,7,4,1,2,3,2,2]
S = 12
grps = []
combinations = []
def findAllCombinations(comb, grp):
    global combinations
    for i in range(0, len(grp)):
        comb += [grp[i]]
        newgrp = grp
        newgrp = grp[:i]
        newgrp += grp[i+1:]
        if sum(comb) == S and tuple(comb) not in combinations:
            combinations.append(tuple(comb))
        if newgrp not in grps:
            grps.append(newgrp)
            findAllCombinations([], newgrp)
findAllCombinations([],grp)
print(combinations)
My only problem now is that when S > 50 (in the first one), it takes longer to find the answer.
What advice could you guys give me to improve both algorithms?
Instead of just providing code, I will show you how to think about this problem and how to tackle this type of problem in a general sense.
First, let us rephrase your question. What you want is: for a given set of numbers, find the combinations within the set which fulfill a certain condition. So, you can decompose your question into 2 distinct steps.
Find all combinations of your set
Filter out the combinations that fulfill certain conditions
Let us think about how to solve the first task recursively. Remember, if a problem can be solved in a recursive way, it generally means there are recursive patterns within your data, and it usually can be solved in a very simple and clear way. If you end up with a messy recursive solution, it pretty much means you are heading in the wrong direction.
Let us see the pattern of your data first. If you have a very small set (1, 2), then the combinations out of this set are
1
2
1, 2
Let us add one member to the set, (1, 2, 3). For this bigger set, all combinations are
1 | 1, 3
2 | 2, 3
1, 2 | 1, 2, 3
| 3
Let us look at an even bigger set, (1, 2, 3, 4). The possible combinations are
1 1, 3 | 1, 3, 4
2 2, 3 | 2, 3, 4
1, 2 1, 2, 3 | 1, 2, 3, 4
3 | 3, 4
| 4
Now you see something interesting: the combinations of the bigger set are the combinations of the smaller set, plus the additional element appended to every previous combination, plus the additional element on its own.
Assume you already have the solution of all combinations of a set of a certain size; the solution for a bigger set can be derived from it. This naturally forms a recursion. You can translate this plain English directly into recursive code as below
# assume you have got all combinations of a smaller set -> combinations(smaller(L))
# the solution of your bigger set can be directly derived from it with known new element
def combinations(L):
    if L == []:
        return []
    else:
        return next_solution(combinations(smaller(L)), newitem(L))
Notice how we decompose our task of solving a larger problem into solving smaller problems. You need the helper functions below
# in your case a smaller set is just the set with its first element dropped
def smaller(L):
    return L[1:]

# the new element would be the first element of the new set
def newitem(L):
    return L[0]

# to derive the new solution from previous combinations, you need three parts
# 1. all the previous combinations -> L
# 2. the new item appended to each previous combination -> [i + [newelement] for i in L]
# 3. the new item itself -> [[newelement]]
def next_solution(L, newelement):
    return L + [i + [newelement] for i in L] + [[newelement]]
Now we know how to get all combinations out of a set.
Then, to filter them: you cannot simply insert the filter into the recursive steps, since we rely on the previous solution to build up the result list recursively. The simple way is to filter the list once we have obtained the full result of all combinations.
You will end up with a solution like this.
def smaller(L):
    return L[1:]

def newitem(L):
    return L[0]

def next_solution(L, newelement):
    return L + [i + [newelement] for i in L] + [[newelement]]

def filtersum(L, N, f=sum):
    return list(filter(lambda x: f(x) == N, L))

def combinations(L):
    if L == []:
        return []
    else:
        return next_solution(combinations(smaller(L)), newitem(L))

def filter_combinations(L, N, f=filtersum):
    return f(combinations(L), N)
print(filter_combinations([0,5,6,4,3], 7))
# -> [[3, 4], [3, 4, 0]]
You can save some computation by filtering out, in each recursive call, the combinations whose sum is already bigger than your defined value, such as
def combinations(L):
    if L == []:
        return []
    else:
        return next_solution(list(filter(lambda x: sum(x) <= 5, combinations(smaller(L)))), newitem(L))
print(combinations([1,2,3,4]))
# -> [[4], [3], [3, 2], [2], [4, 1], [3, 1], [3, 2, 1], [2, 1], [1]]
In fact there are different ways to do the recursion, depending on how you decompose your problem into smaller problems. There exist some smarter ways, but the approach I showed above is a typical and general approach for solving this type of problem.
I have an example of solving another problem in this way:
Python: combinations of map tuples
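For reference, one of those smarter decompositions is the classic include/exclude recursion, which also lets you prune branches whose running sum already exceeds the target. A minimal sketch (not from the answer above; it assumes the numbers are non-negative, otherwise the pruning is not valid):
def subset_sums(nums, target):
    'Yield every combination of nums (as a list) whose sum equals target.'
    def helper(i, remaining, chosen):
        if remaining == 0:
            yield list(chosen)
        if i == len(nums) or remaining < 0:  # pruning step: assumes non-negative numbers
            return
        chosen.append(nums[i])                         # include nums[i]
        yield from helper(i + 1, remaining - nums[i], chosen)
        chosen.pop()                                   # exclude nums[i]
        yield from helper(i + 1, remaining, chosen)
    yield from helper(0, target, [])

print(list(subset_sums([0, 5, 6, 4, 3], 7)))
# [[0, 4, 3], [4, 3]]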
The code below works (the direct 'return' inside the loop was removed and changed to a conditional 'return').
But I don't think it is a good solution. You need to improve your algorithm.
P.S.: This code will also only return one match instead of all of them.
S = 7
grp = [0,3,6,4,6]
result = []
def sumFinder(res, grp, result):
    for i in range(0, len(grp)):
        temp = list(res)  # clone res instead of direct-reference
        if grp == [] or not temp:
            return grp  # in case no pair was found [] is returned
        if sum(temp) == S:
            result.append(tuple(temp))
            return temp
        temp.append(grp[i])
        newgrp = grp[:i] + grp[i+1:]
        sumFinder(list(temp), newgrp, result)

sumFinder([0], grp, result)
print(result)
Test Cases:
S = 7
grp = [0,3,6,4,6]
result = [(0, 0, 3, 4), (0, 0, 4, 3), (0, 3, 0, 4), (0, 3, 4), (0, 4, 0, 3), (0, 4, 3)]
[Finished in 0.823s]
Can you let me know where you found this problem? I love solving this type of problem. By the way, here is my approach:
a=[[[0],[0,5,6,4,3]]]
s=7
def recursive_approach(array_a):
    print(array_a)
    sub = []
    for mm in array_a:
        array_1 = mm[0]
        array_2 = mm[1]
        if sum(array_2) == s:
            return "result is", array_2
        else:
            track = []
            for i in range(len(array_2)):
                c = array_2[:]
                del c[i]
                track.append([array_1 + [array_2[i]], c])
            sub.append(track)
    print(sub)
    return recursive_approach(sub[0])
print(recursive_approach(a))
output:
[[[0], [0, 5, 6, 4, 3]]]
[[[[0, 0], [5, 6, 4, 3]], [[0, 5], [0, 6, 4, 3]], [[0, 6], [0, 5, 4, 3]], [[0, 4], [0, 5, 6, 3]], [[0, 3], [0, 5, 6, 4]]]]
[[[0, 0], [5, 6, 4, 3]], [[0, 5], [0, 6, 4, 3]], [[0, 6], [0, 5, 4, 3]], [[0, 4], [0, 5, 6, 3]], [[0, 3], [0, 5, 6, 4]]]
[[[[0, 0, 5], [6, 4, 3]], [[0, 0, 6], [5, 4, 3]], [[0, 0, 4], [5, 6, 3]], [[0, 0, 3], [5, 6, 4]]], [[[0, 5, 0], [6, 4, 3]], [[0, 5, 6], [0, 4, 3]], [[0, 5, 4], [0, 6, 3]], [[0, 5, 3], [0, 6, 4]]], [[[0, 6, 0], [5, 4, 3]], [[0, 6, 5], [0, 4, 3]], [[0, 6, 4], [0, 5, 3]], [[0, 6, 3], [0, 5, 4]]], [[[0, 4, 0], [5, 6, 3]], [[0, 4, 5], [0, 6, 3]], [[0, 4, 6], [0, 5, 3]], [[0, 4, 3], [0, 5, 6]]], [[[0, 3, 0], [5, 6, 4]], [[0, 3, 5], [0, 6, 4]], [[0, 3, 6], [0, 5, 4]], [[0, 3, 4], [0, 5, 6]]]]
[[[0, 0, 5], [6, 4, 3]], [[0, 0, 6], [5, 4, 3]], [[0, 0, 4], [5, 6, 3]], [[0, 0, 3], [5, 6, 4]]]
[[[[0, 0, 5, 6], [4, 3]], [[0, 0, 5, 4], [6, 3]], [[0, 0, 5, 3], [6, 4]]], [[[0, 0, 6, 5], [4, 3]], [[0, 0, 6, 4], [5, 3]], [[0, 0, 6, 3], [5, 4]]], [[[0, 0, 4, 5], [6, 3]], [[0, 0, 4, 6], [5, 3]], [[0, 0, 4, 3], [5, 6]]], [[[0, 0, 3, 5], [6, 4]], [[0, 0, 3, 6], [5, 4]], [[0, 0, 3, 4], [5, 6]]]]
[[[0, 0, 5, 6], [4, 3]], [[0, 0, 5, 4], [6, 3]], [[0, 0, 5, 3], [6, 4]]]
('result is', [4, 3])
Is there an easy way to merge, let's say, n spectra (i.e. arrays of shape (y_n, 2)) with varying lengths y_n into one array (or list) of shape (y_n_max, 2*n), filling the shorter spectra up with zeros?
Basically I want to have all spectra next to each other.
For example
a = [[1,2],[2,3],[4,5]]
b = [[6,7],[8,9]]
into
c = [[1,2,6,7],[2,3,8,9],[4,5,0,0]]
Either an array or a list would be fine. I guess it comes down to filling up arrays with zeros?
If you're dealing with native Python lists, then you can do:
from itertools import zip_longest
c = [a + b for a, b in zip_longest(a, b, fillvalue=[0, 0])]
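With the example lists this gives the desired result (loop variables renamed here only to avoid reusing the names a and b):
from itertools import zip_longest

a = [[1, 2], [2, 3], [4, 5]]
b = [[6, 7], [8, 9]]
c = [u + v for u, v in zip_longest(a, b, fillvalue=[0, 0])]
print(c)  # [[1, 2, 6, 7], [2, 3, 8, 9], [4, 5, 0, 0]]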
You could also do this with extend and zip without itertools, provided a will always be longer than b. If b could be longer than a, then you could add a bit of logic as well (see the sketch after this example).
a = [[1,2],[2,3],[4,5]]
b = [[6,7],[8,9]]
b.extend([[0,0]]*(len(a)-len(b)))
[x + y for x, y in zip(a, b)]
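If either list could be the longer one, a small helper that pads both to the common length keeps the same idea (a sketch, not part of the original answer):
def pad_rows(a, b, fill=(0, 0)):
    'Pad the shorter list of rows with `fill`, then join the rows side by side.'
    n = max(len(a), len(b))
    a = a + [list(fill)] * (n - len(a))
    b = b + [list(fill)] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

print(pad_rows([[1, 2], [2, 3], [4, 5]], [[6, 7], [8, 9]]))
# [[1, 2, 6, 7], [2, 3, 8, 9], [4, 5, 0, 0]]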
Trying to generalize the other solutions to multiple lists:
In [114]: a
Out[114]: [[1, 2], [2, 3], [4, 5]]
In [115]: b
Out[115]: [[6, 7], [8, 9]]
In [116]: c
Out[116]: [[3, 4]]
In [117]: d
Out[117]: [[1, 2], [2, 3], [4, 5], [6, 7], [8, 9]]
In [118]: ll=[a,d,c,b]
zip_longest pads
In [120]: [l for l in itertools.zip_longest(*ll,fillvalue=[0,0])]
Out[120]:
[([1, 2], [1, 2], [3, 4], [6, 7]),
([2, 3], [2, 3], [0, 0], [8, 9]),
([4, 5], [4, 5], [0, 0], [0, 0]),
([0, 0], [6, 7], [0, 0], [0, 0]),
([0, 0], [8, 9], [0, 0], [0, 0])]
itertools.chain flattens the inner lists (or use .from_iterable(l))
In [121]: [list(itertools.chain(*l)) for l in _]
Out[121]:
[[1, 2, 1, 2, 3, 4, 6, 7],
[2, 3, 2, 3, 0, 0, 8, 9],
[4, 5, 4, 5, 0, 0, 0, 0],
[0, 0, 6, 7, 0, 0, 0, 0],
[0, 0, 8, 9, 0, 0, 0, 0]]
More ideas at Convert Python sequence to NumPy array, filling missing values
Adapting @Divakar's solution to this case:
def divakars_pad(ll):
    lens = np.array([len(item) for item in ll])
    mask = lens[:, None] > np.arange(lens.max())
    out = np.zeros((mask.shape + (2,)), int)
    out[mask, :] = np.concatenate(ll)
    out = out.transpose(1, 0, 2).reshape(lens.max(), -1)  # lens.max() rows (5 in this example)
    return out
In [142]: divakars_pad(ll)
Out[142]:
array([[1, 2, 1, 2, 3, 4, 6, 7],
[2, 3, 2, 3, 0, 0, 8, 9],
[4, 5, 4, 5, 0, 0, 0, 0],
[0, 0, 6, 7, 0, 0, 0, 0],
[0, 0, 8, 9, 0, 0, 0, 0]])
For this small size the itertools solution is faster, even with an added conversion to array.
With an array as target we don't need the chain flattener; reshape takes care of that:
In [157]: np.array(list(itertools.zip_longest(*ll,fillvalue=[0,0]))).reshape(-1, len(ll)*2)
Out[157]:
array([[1, 2, 1, 2, 3, 4, 6, 7],
[2, 3, 2, 3, 0, 0, 8, 9],
[4, 5, 4, 5, 0, 0, 0, 0],
[0, 0, 6, 7, 0, 0, 0, 0],
[0, 0, 8, 9, 0, 0, 0, 0]])
Use the zip built-in function and the chain.from_iterable function from itertools. This has the benefit of being more type agnostic than the other posted solution -- it only requires that your spectra are iterables.
from itertools import chain

a = [[1,2],[2,3],[4,5]]
b = [[6,7],[8,9]]
c = list(list(chain.from_iterable(zs)) for zs in zip(a, b))
If you want more than 2 spectra, you can change the zip call to zip(a,b,...)
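For an arbitrary number of spectra with unequal lengths, the same idea combines with zip_longest so the shorter ones are padded with zeros (a sketch, not part of the answer above):
from itertools import chain, zip_longest

spectra = [a, b]  # any number of row lists
c = [list(chain.from_iterable(rows))
     for rows in zip_longest(*spectra, fillvalue=[0, 0])]
print(c)  # [[1, 2, 6, 7], [2, 3, 8, 9], [4, 5, 0, 0]]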
Suppose I have two lists containing the same number of elements which are lists of integers. For instance:
a = [[1, 7, 3, 10, 4], [1, 3, 8], ..., [2, 5, 10, 91, 54, 0]]
b = [[5, 4, 23], [1, 2, 0, 4], ..., [5, 15, 11]]
For each index, I want to pad the shorter list with trailing zeros. The example above should look like:
a = [[1, 7, 3, 10, 4], [1, 3, 8, 0], ..., [2, 5, 10, 91, 54, 0]]
b = [[5, 4, 23, 0, 0], [1, 2, 0, 4], ..., [5, 15, 11, 0, 0, 0]]
Is there an elegant way to perform this comparison and padding built into Python lists, or perhaps NumPy? I am aware that numpy.pad can perform the padding, but it's the iteration and comparison over the lists that has got me stuck.
I'm sure there's an elegant Python one-liner for this sort of thing, but sometimes a straightforward imperative solution will get the job done:
for i in range(0, len(a)):
    x = len(a[i])
    y = len(b[i])
    diff = max(x, y)
    a[i].extend([0] * (diff - x))
    b[i].extend([0] * (diff - y))

print(a, b)
Be careful with "elegant" solutions too, because they can be very difficult to comprehend (I can't count the number of times I've come back to a piece of code I wrote using reduce() and had to struggle to figure out how it worked).
One line? Yes. Elegant? No.
In [2]: from itertools import izip_longest
In [3]: A, B = map(list, zip(*[map(list, zip(*izip_longest(l1,l2, fillvalue=0)))
for l1,l2 in zip(a,b)]))
In [4]: A
Out[4]: [[1, 7, 3, 10, 4], [1, 3, 8, 0], [2, 5, 10, 91, 54, 0]]
In [5]: B
Out[5]: [[5, 4, 23, 0, 0], [1, 2, 0, 4], [5, 15, 11, 0, 0, 0]]
Note: Creates 2 new lists. Preserves the old lists.
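Under Python 3, izip_longest has been renamed zip_longest; otherwise the same expression works unchanged (a quick sketch):
from itertools import zip_longest

A, B = map(list, zip(*[map(list, zip(*zip_longest(l1, l2, fillvalue=0)))
                       for l1, l2 in zip(a, b)]))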
>>> from itertools import repeat
>>> b = [[5, 4, 23], [1, 2, 0, 4],[5, 15, 11]]
>>> a = [[1, 7, 3, 10, 4], [1, 3, 8],[2, 5, 10, 91, 54, 0]]
>>> [y+list(repeat(0, len(x)-len(y))) for x,y in zip(a,b)]
[[5, 4, 23, 0, 0], [1, 2, 0, 4], [5, 15, 11, 0, 0, 0]]
>>> [x+list(repeat(0, len(y)-len(x))) for x,y in zip(a,b)]
[[1, 7, 3, 10, 4], [1, 3, 8, 0], [2, 5, 10, 91, 54, 0]]
a = [[1, 7, 3, 10, 4], [1, 3, 8], [2, 5, 10, 91, 54, 0]]
b = [[5, 4, 23], [1, 2, 0, 4], [5, 15, 11]]
for idx in range(len(a)):
    size_diff = len(a[idx]) - len(b[idx])
    if size_diff < 0:
        a[idx].extend([0] * abs(size_diff))
    elif size_diff > 0:
        b[idx].extend([0] * size_diff)
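Since the question mentions numpy.pad, here is a minimal NumPy-based sketch of the same per-index padding (not taken from any of the answers above; it builds new padded arrays rather than extending a and b in place):
import numpy as np

a = [[1, 7, 3, 10, 4], [1, 3, 8], [2, 5, 10, 91, 54, 0]]
b = [[5, 4, 23], [1, 2, 0, 4], [5, 15, 11]]

a_pad, b_pad = [], []
for u, v in zip(a, b):
    n = max(len(u), len(v))
    a_pad.append(np.pad(u, (0, n - len(u)), mode='constant'))  # pad with trailing zeros
    b_pad.append(np.pad(v, (0, n - len(v)), mode='constant'))

# a_pad[1] -> array([1, 3, 8, 0]); b_pad[0] -> array([ 5,  4, 23,  0,  0])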