Merging arrays of varying size in Python

Is there an easy way to merge, say, n spectra (i.e. arrays of shape (y_n, 2)) with varying lengths y_n into an array (or list) of shape (y_n_max, 2*n), padding each shorter spectrum with zeros?
Basically I want to have all spectra next to each other.
For example
a = [[1,2],[2,3],[4,5]]
b = [[6,7],[8,9]]
into
c = [[1,2,6,7],[2,3,8,9],[4,5,0,0]]
Either Array or List would be fine. I guess it comes down to filling up arrays with zeros?

If you're dealing with native Python lists, then you can do:
from itertools import zip_longest
c = [x + y for x, y in zip_longest(a, b, fillvalue=[0, 0])]

You could also do this with extend and zip, without itertools, provided a is always at least as long as b. If b could be longer than a, then you could add a bit of logic as well.
a = [[1,2],[2,3],[4,5]]
b = [[6,7],[8,9]]
b.extend([[0,0]]*(len(a)-len(b)))
[x + y for x, y in zip(a, b)]
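If either list may be the longer one, the "bit of logic" could look like the sketch below. The helper name merge_two is hypothetical, and it assumes every row has the same width as the fill row (two entries here). It pads copies, so the input lists are left unchanged.

```python
def merge_two(a, b, fill=(0, 0)):
    """Place the rows of a and b side by side, padding the shorter list with `fill`."""
    n = max(len(a), len(b))
    # Pad copies rather than the originals; a non-positive count adds nothing.
    a_pad = list(a) + [list(fill)] * (n - len(a))
    b_pad = list(b) + [list(fill)] * (n - len(b))
    return [list(x) + list(y) for x, y in zip(a_pad, b_pad)]


a = [[1, 2], [2, 3], [4, 5]]
b = [[6, 7], [8, 9]]
print(merge_two(a, b))  # [[1, 2, 6, 7], [2, 3, 8, 9], [4, 5, 0, 0]]
print(merge_two(b, a))  # [[6, 7, 1, 2], [8, 9, 2, 3], [0, 0, 4, 5]]
```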

Trying to generalize the other solutions to multiple lists:
In [114]: a
Out[114]: [[1, 2], [2, 3], [4, 5]]
In [115]: b
Out[115]: [[6, 7], [8, 9]]
In [116]: c
Out[116]: [[3, 4]]
In [117]: d
Out[117]: [[1, 2], [2, 3], [4, 5], [6, 7], [8, 9]]
In [118]: ll=[a,d,c,b]
zip_longest pads
In [120]: [l for l in itertools.zip_longest(*ll,fillvalue=[0,0])]
Out[120]:
[([1, 2], [1, 2], [3, 4], [6, 7]),
([2, 3], [2, 3], [0, 0], [8, 9]),
([4, 5], [4, 5], [0, 0], [0, 0]),
([0, 0], [6, 7], [0, 0], [0, 0]),
([0, 0], [8, 9], [0, 0], [0, 0])]
itertools.chain flattens the inner lists (or .from_iterable(l))
In [121]: [list(itertools.chain(*l)) for l in _]
Out[121]:
[[1, 2, 1, 2, 3, 4, 6, 7],
[2, 3, 2, 3, 0, 0, 8, 9],
[4, 5, 4, 5, 0, 0, 0, 0],
[0, 0, 6, 7, 0, 0, 0, 0],
[0, 0, 8, 9, 0, 0, 0, 0]]
More ideas at Convert Python sequence to NumPy array, filling missing values
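Adapting @Divakar's solution to this case:
import numpy as np
def divakars_pad(ll):
    lens = np.array([len(item) for item in ll])
    # mask[i, j] is True where list i has a row at position j
    mask = lens[:,None] > np.arange(lens.max())
    out = np.zeros((mask.shape+(2,)), int)
    out[mask,:] = np.concatenate(ll)
    # put the row position first, then flatten each row's (list, column) axes
    out = out.transpose(1,0,2).reshape(lens.max(),-1)
    return out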
In [142]: divakars_pad(ll)
Out[142]:
array([[1, 2, 1, 2, 3, 4, 6, 7],
[2, 3, 2, 3, 0, 0, 8, 9],
[4, 5, 4, 5, 0, 0, 0, 0],
[0, 0, 6, 7, 0, 0, 0, 0],
[0, 0, 8, 9, 0, 0, 0, 0]])
For this small size the itertools solution is faster, even with an added conversion to array.
With an array as target we don't need the chain flattener; reshape takes care of that:
In [157]: np.array(list(itertools.zip_longest(*ll,fillvalue=[0,0]))).reshape(-1, len(ll)*2)
Out[157]:
array([[1, 2, 1, 2, 3, 4, 6, 7],
[2, 3, 2, 3, 0, 0, 8, 9],
[4, 5, 4, 5, 0, 0, 0, 0],
[0, 0, 6, 7, 0, 0, 0, 0],
[0, 0, 8, 9, 0, 0, 0, 0]])
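Another way to reach the same array, shown here only as a sketch, is to preallocate a zero array and copy each spectrum into its own block of columns. The function name pad_and_stack and its width parameter are hypothetical, and the sketch assumes at least one spectrum and that each has shape (length, width).

```python
import numpy as np


def pad_and_stack(spectra, width=2):
    """Copy each spectrum into its own column block of a zero-filled array."""
    arrays = [np.asarray(s) for s in spectra]
    max_len = max(len(arr) for arr in arrays)
    # Use a dtype that can hold every input (e.g. float if any spectrum is float).
    out = np.zeros((max_len, width * len(arrays)), dtype=np.result_type(*arrays))
    for k, arr in enumerate(arrays):
        # Rows past len(arr) keep their initial zeros.
        out[:len(arr), k * width:(k + 1) * width] = arr
    return out


a = [[1, 2], [2, 3], [4, 5]]
b = [[6, 7], [8, 9]]
c = [[3, 4]]
d = [[1, 2], [2, 3], [4, 5], [6, 7], [8, 9]]
print(pad_and_stack([a, d, c, b]))
```

The loop runs once per spectrum rather than once per row, so the Python-level overhead stays small even when the spectra themselves are long.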

Use zip_longest and chain.from_iterable from itertools (the built-in zip would stop at the shortest spectrum and drop the unmatched rows). This has the benefit of being more type agnostic than the other posted solution -- it only requires that your spectra are iterables.
from itertools import chain, zip_longest
a = [[1,2],[2,3],[4,5]]
b = [[6,7],[8,9]]
c = [list(chain.from_iterable(zs)) for zs in zip_longest(a, b, fillvalue=(0, 0))]
If you want more than 2 spectra, you can change the call to zip_longest(a, b, ..., fillvalue=(0, 0))

Related

Fast merge elements of two arrays of arrays only if the element is different than zero

So I have two numpy arrays of arrays
a = [[[1, 2, 3, 4], [3, 3, 3, 3], [4, 4, 4, 4]]]
b = [[[0, 0, 4, 0], [0, 0, 0, 0], [0, 1, 0, 1]]]
Both arrays are always of the same size.
The result should be like
c = [[[1, 2, 4, 4], [3, 3, 3, 3], [4, 1, 4, 1]]]
How can I do that in a very fast way in numpy?
Use numpy.where:
import numpy as np
a = np.array([[1, 2, 3, 4], [3, 3, 3, 3], [4, 4, 4, 4]])
b = np.array([[0, 0, 4, 0], [0, 0, 0, 0], [0, 1, 0, 1]])
res = np.where(b == 0, a, b)
print(res)
Output
[[1 2 4 4]
[3 3 3 3]
[4 1 4 1]]
For better speed, use b itself as the condition: nonzero entries count as True, so the extra b == 0 comparison is not needed.
Instead of
np.where(b == 0, a, b)
# array([[1, 2, 4, 4],
# [3, 3, 3, 3],
# [4, 1, 4, 1]])
timeit(lambda:np.where(b==0,a,b))
# 2.6133874990046024
better do
np.where(b,b,a)
# array([[1, 2, 4, 4],
# [3, 3, 3, 3],
# [4, 1, 4, 1]])
timeit(lambda:np.where(b,b,a))
# 1.5850481310044415
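To reproduce the comparison, a sketch like the following first checks that both forms give the same result and then times them. The repetition count of 10_000 is an arbitrary choice, and absolute timings will differ by machine and NumPy version, so none are quoted here.

```python
import timeit

import numpy as np

a = np.array([[1, 2, 3, 4], [3, 3, 3, 3], [4, 4, 4, 4]])
b = np.array([[0, 0, 4, 0], [0, 0, 0, 0], [0, 1, 0, 1]])

# Both forms take b where it is nonzero and fall back to a elsewhere.
by_comparison = np.where(b == 0, a, b)
by_truthiness = np.where(b, b, a)
assert np.array_equal(by_comparison, by_truthiness)

t_cmp = timeit.timeit(lambda: np.where(b == 0, a, b), number=10_000)
t_truth = timeit.timeit(lambda: np.where(b, b, a), number=10_000)
print(f"b == 0 as condition: {t_cmp:.4f}s, b as condition: {t_truth:.4f}s")
```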

How to repeat a numpy array along a new dimension with padding?

Given two arrays, an input array and a repeat array, I would like to get an array in which each input element is repeated along a new dimension the specified number of times for its row and padded with zeros to the end of the row.
to_repeat = np.array([1, 2, 3, 4, 5, 6])
repeats = np.array([1, 2, 2, 3, 3, 1])
# I want final array to look like the following:
#[[1, 0, 0],
# [2, 2, 0],
# [3, 3, 0],
# [4, 4, 4],
# [5, 5, 5],
# [6, 0, 0]]
The issue is that I'm operating with large datasets (10M or so) so a list comprehension is too slow - what is a fast way to achieve this?
Here's one with masking based on this idea -
m = repeats[:,None] > np.arange(repeats.max())
out = np.zeros(m.shape,dtype=to_repeat.dtype)
out[m] = np.repeat(to_repeat,repeats)
Sample output -
In [44]: out
Out[44]:
array([[1, 0, 0],
[2, 2, 0],
[3, 3, 0],
[4, 4, 4],
[5, 5, 5],
[6, 0, 0]])
Or with broadcasted-multiplication -
In [67]: m*to_repeat[:,None]
Out[67]:
array([[1, 0, 0],
[2, 2, 0],
[3, 3, 0],
[4, 4, 4],
[5, 5, 5],
[6, 0, 0]])
For large datasets/sizes, we can leverage multi-cores and be more efficient on memory with numexpr module on that broadcasting -
In [64]: import numexpr as ne
# Re-using mask `m` from previous method
In [65]: ne.evaluate('m*R',{'m':m,'R':to_repeat[:,None]})
Out[65]:
array([[1, 0, 0],
[2, 2, 0],
[3, 3, 0],
[4, 4, 4],
[5, 5, 5],
[6, 0, 0]])
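The masking approach can be wrapped in a small function, sketched below. The name repeat_with_padding is hypothetical. It relies on boolean-mask assignment filling the True positions in row-major order, which is the same order in which np.repeat emits the repeated values.

```python
import numpy as np


def repeat_with_padding(values, repeats):
    """Row i holds values[i] repeated repeats[i] times, then zeros."""
    values = np.asarray(values)
    repeats = np.asarray(repeats)
    # mask[i, j] is True for the first repeats[i] columns of row i.
    mask = repeats[:, None] > np.arange(repeats.max())
    out = np.zeros(mask.shape, dtype=values.dtype)
    # True cells are filled row by row, matching the order of np.repeat.
    out[mask] = np.repeat(values, repeats)
    return out


to_repeat = np.array([1, 2, 3, 4, 5, 6])
repeats = np.array([1, 2, 2, 3, 3, 1])
print(repeat_with_padding(to_repeat, repeats))
```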

How do I repeatedly shift and pad elements in a list to get a list of lists?

I have a list A = [1, 2, 3, ..., n] and want to repeatedly shift the list to get a list of lists. The first row should be A, the second row [2, 3, 4, ...], the third row [3, 4, 5, ...], until the last row [n, 0, 0, ...]. The missing elements in the last columns should be zeros. I was trying to put them individually, but n is >= 100 so manually padding the zeros would take long. How do I do this?
Edit: same question for numpy arrays, which is what I really have.
>>> a = [1, 2, 3, 4]
>>> [ a[i:] + i*[0] for i in range(len(a))]
[[1, 2, 3, 4], [2, 3, 4, 0], [3, 4, 0, 0], [4, 0, 0, 0]]
How it works
To get the i'th shifted list, we can use: a[i:] + i*[0]. The list comprehension does this repeatedly for all the i's that we need.
Using numpy
+ means something different for numpy arrays than it does for normal python lists. Consequently, the above code needs some tweaks to adapt it to numpy:
>>> import numpy as np
>>> a = np.arange(1, 5)
>>> [ np.concatenate((a[i:], np.zeros(i))) for i in range(len(a))]
[array([1., 2., 3., 4.]),
array([2., 3., 4., 0.]),
array([3., 4., 0., 0.]),
array([4., 0., 0., 0.])]
If you want the final result to be a numpy array:
>>> np.array([np.concatenate((a[i:], np.zeros(i))) for i in range(len(a)) ])
array([[ 1., 2., 3., 4.],
[ 2., 3., 4., 0.],
[ 3., 4., 0., 0.],
[ 4., 0., 0., 0.]])
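The float output above comes from np.zeros defaulting to float64. If an integer result is preferred, one option (a sketch, not part of the original answer) is to give the padding the same dtype as a:

```python
import numpy as np

a = np.arange(1, 5)
# Zeros with a's dtype keep the concatenated rows integer-valued.
rows = [np.concatenate((a[i:], np.zeros(i, dtype=a.dtype))) for i in range(len(a))]
shifted = np.array(rows)
print(shifted.dtype == a.dtype)
print(shifted)
```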
Extra verbose for clarity:
import pprint
A = [1,2,3,4,5]
lists = []
lists.append(A)
for _ in range(len(A) - 1):  # one shift fewer than len(A), so the last row is [5, 0, 0, 0, 0]
    last_list = lists[-1] # Grab the last list element from lists
    new_list = last_list[1:] + [0] # (See below)
    lists.append(new_list) # Add new_list to the end of lists
pprint.pprint(lists)
Output:
[[1, 2, 3, 4, 5],
[2, 3, 4, 5, 0],
[3, 4, 5, 0, 0],
[4, 5, 0, 0, 0],
[5, 0, 0, 0, 0]]
The new_list = last_list[1:] + [0] just means take the "1th" through last element in last_list and concatenate it with a zero to form a new list called new_list.
Note I say "1th" because python lists are 0-indexed so "1th" is the second element.
A = [1, 2, 3, 4]
matrix = []
for i in range(len(A)):
    row = A[i:]  # slicing copies, so A itself is not modified
    row.extend([0] * (len(A) - len(row)))  # pad back to the original length
    matrix.append(row)
print(matrix)
Output:
[[1, 2, 3, 4], [2, 3, 4, 0], [3, 4, 0, 0], [4, 0, 0, 0]]
The array that you are creating is known as a Hankel matrix. If you don't mind the dependency on scipy, you can use the function scipy.linalg.hankel to create the array with a single function call:
In [21]: from scipy.linalg import hankel
In [22]: A = [1, 2, 3, 4, 5]
In [23]: hankel(A)
Out[23]:
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 0],
[3, 4, 5, 0, 0],
[4, 5, 0, 0, 0],
[5, 0, 0, 0, 0]])
If this is the only reason to use scipy, I wouldn't bother--it is a pretty heavy dependency. But if you are already using scipy, then you might as well take advantage of the convenience of hankel.
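If scipy is not available, the same array can be built with NumPy index arithmetic. The sketch below is not how scipy implements hankel; the function name shifted_rows is hypothetical. It uses the fact that entry (i, j) of the result is values[i + j] whenever i + j is still inside the input.

```python
import numpy as np


def shifted_rows(values):
    """Row i is values shifted left by i positions, padded with zeros."""
    values = np.asarray(values)
    n = len(values)
    idx = np.arange(n)[:, None] + np.arange(n)  # idx[i, j] = i + j
    valid = idx < n                             # positions that still index into values
    out = np.zeros((n, n), dtype=values.dtype)
    out[valid] = values[idx[valid]]
    return out


print(shifted_rows([1, 2, 3, 4, 5]))
```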
a = [0,1,2,3,4,5,6,7,8,9]
tam = len(a)
cont = 0  # current shift
rdo = []
for item in a:
    rdo.append(a[cont:tam] + cont*[0])  # shifted slice plus cont zeros of padding
    cont += 1
print(rdo)
Output:
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8, 9, 0], [2, 3, 4, 5, 6, 7, 8, 9, 0, 0], [3, 4, 5, 6, 7, 8, 9, 0, 0, 0], [4, 5, 6, 7, 8, 9, 0, 0, 0, 0], [5, 6, 7, 8, 9, 0, 0, 0, 0, 0], [6, 7, 8, 9, 0, 0, 0, 0, 0, 0], [7, 8, 9, 0, 0, 0, 0, 0, 0, 0], [8, 9, 0, 0, 0, 0, 0, 0, 0, 0], [9, 0, 0, 0, 0, 0, 0, 0, 0, 0]]

Python — How can I find the square matrix of a lower triangular numpy matrix? (with a symmetrical upper triangle)

I generated a lower triangular matrix, and I want to complete the matrix using the values in the lower triangular matrix to form a square matrix, symmetrical around the diagonal zeros.
lower_triangle = numpy.array([
[0,0,0,0],
[1,0,0,0],
[2,3,0,0],
[4,5,6,0]])
I want to generate the following complete matrix, maintaining the zero diagonal:
complete_matrix = numpy.array([
[0, 1, 2, 4],
[1, 0, 3, 5],
[2, 3, 0, 6],
[4, 5, 6, 0]])
Thanks.
You can simply add it to its transpose:
>>> m
array([[0, 0, 0, 0],
[1, 0, 0, 0],
[2, 3, 0, 0],
[4, 5, 6, 0]])
>>> m + m.T
array([[0, 1, 2, 4],
[1, 0, 3, 5],
[2, 3, 0, 6],
[4, 5, 6, 0]])
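Adding the transpose works here because the diagonal is zero. With a nonzero diagonal, m + m.T would count the diagonal twice. A sketch that also covers that case (the name symmetrize_lower is hypothetical) subtracts the diagonal once:

```python
import numpy as np


def symmetrize_lower(m):
    """Mirror a lower-triangular matrix into its upper triangle."""
    m = np.asarray(m)
    # np.diag(m) extracts the diagonal; np.diag of that rebuilds a diagonal matrix.
    return m + m.T - np.diag(np.diag(m))


lower_triangle = np.array([
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [2, 3, 0, 0],
    [4, 5, 6, 0]])
print(symmetrize_lower(lower_triangle))
print(symmetrize_lower([[7, 0], [1, 8]]))  # [[7 1]
                                           #  [1 8]]
```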
You can use the numpy.triu_indices or numpy.tril_indices:
>>> a=np.array([[0, 0, 0, 0],
... [1, 0, 0, 0],
... [2, 3, 0, 0],
... [4, 5, 6, 0]])
>>> irows,icols = np.triu_indices(len(a),1)
>>> a[irows,icols]=a[icols,irows]
>>> a
array([[0, 1, 2, 4],
[1, 0, 3, 5],
[2, 3, 0, 6],
[4, 5, 6, 0]])

How can I find the square matrix of a lower triangular numpy matrix? (with a rotated upper triangle) [duplicate]

This question already has answers here:
Python — How can I find the square matrix of a lower triangular numpy matrix? (with a symmetrical upper triangle)
(2 answers)
Closed 9 years ago.
I generated a lower triangular matrix, and I want to complete the matrix using the values in the lower triangular matrix to form a square matrix.
lower_triangle = numpy.array([
[0,0,0,0],
[1,0,0,0],
[2,3,0,0],
[4,5,6,0]])
I want to generate the following complete matrix, maintaining the zero diagonal:
complete_matrix = numpy.array([
[0, 6, 5, 4],
[1, 0, 3, 2],
[2, 3, 0, 1],
[4, 5, 6, 0]])
Thanks.
How about:
>>> m
array([[0, 0, 0, 0],
[1, 0, 0, 0],
[2, 3, 0, 0],
[4, 5, 6, 0]])
>>> np.rot90(m,2)
array([[0, 6, 5, 4],
[0, 0, 3, 2],
[0, 0, 0, 1],
[0, 0, 0, 0]])
>>> m + np.rot90(m, 2)
array([[0, 6, 5, 4],
[1, 0, 3, 2],
[2, 3, 0, 1],
[4, 5, 6, 0]])
See also fliplr(m)[::-1], etc.
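As a quick check of why this works: rotating by 180 degrees is the same as reversing both axes, which is also what fliplr(m)[::-1] does. A small sketch using the matrix from the question:

```python
import numpy as np

m = np.array([
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [2, 3, 0, 0],
    [4, 5, 6, 0]])

rotated = np.rot90(m, 2)
# Three equivalent ways to rotate the matrix by 180 degrees.
assert np.array_equal(rotated, m[::-1, ::-1])
assert np.array_equal(rotated, np.fliplr(m)[::-1])

complete = m + rotated
print(complete)
```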
without any addition:
>>> a=np.array([[0, 0, 0, 0],
... [1, 0, 0, 0],
... [2, 3, 0, 0],
... [4, 5, 6, 0]])
>>> irows,icols = np.triu_indices(len(a),1)
>>> a[irows,icols]=a[icols,irows]
>>> a
array([[0, 1, 2, 4],
[1, 0, 3, 5],
[2, 3, 0, 6],
[4, 5, 6, 0]])
