concatenate arrays of different lengths into one multidimensional array - python

I know that you can't stack or concatenate arrays of different lengths in NumPy, since all matrices need to be rectangular, but is there any other way to achieve this?
For example:
a = [1, 2, 3]
b = [9, 8]
Stacking them would give:
c = [[1, 2, 3],
     [9, 8]]
Alternatively, if there is no way to create the above, how could I write a function to get this (with 0 in place of the missing element to fill the matrix)?
c = [[1, 2, 3],
     [9, 8, 0]]

This code worked for me:
import numpy as np

a = [1, 2, 3]
b = [9, 8]
while len(b) != len(a):
    if len(b) > len(a):
        a.append(0)
    else:
        b.append(0)
final = np.array([a, b])
print(final)
The code is fairly self-explanatory, but here is a short description:
We take two lists (say a and b) and compare their lengths; if they are unequal, we append an element (in this case 0) to the shorter one. This loops until their lengths are equal, and then we simply convert them into a 2D NumPy array.
Also you can replace 0 with np.NaN if you want NaN values
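If you need to do this for more than two lists, here is a minimal sketch of the same padding idea generalized to any number of lists (pad_to_matrix is a name I made up, not an existing function):
import numpy as np

def pad_to_matrix(lists, fill=0):
    # right-pad each list with `fill` to the longest length, then stack
    width = max(len(lst) for lst in lists)
    return np.array([lst + [fill] * (width - len(lst)) for lst in lists])

print(pad_to_matrix([[1, 2, 3], [9, 8]]))
# [[1 2 3]
#  [9 8 0]]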

I think what you are looking for is:
In:
import numpy as np
from itertools import zip_longest

a = [1, 2, 3]
b = [9, 8]
c = np.array(list(zip_longest(*[a, b])), dtype=float).transpose()
print(c)
Out:
[[ 1.  2.  3.]
 [ 9.  8. nan]]
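If you want zeros instead of NaN, as in the second example from the question, zip_longest also accepts a fillvalue; a quick variant of the answer above (a sketch of the same idea):
import numpy as np
from itertools import zip_longest

a = [1, 2, 3]
b = [9, 8]
c = np.array(list(zip_longest(a, b, fillvalue=0))).transpose()
print(c)
# [[1 2 3]
#  [9 8 0]]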

Add repeated elements of array indexed by another array

I have a relatively simple problem that I cannot solve without using loops. It is difficult for me to figure out the correct title for this problem.
Let's say we have two numpy arrays:
array_1 = np.array([[0, 1, 2],
                    [3, 3, 3],
                    [3, 3, 4],
                    [3, 6, 2]])
array_2 = np.array([[0, 0, 0],
                    [1, 1, 1],
                    [2, 2, 2],
                    [3, 3, 3],
                    [4, 4, 4],
                    [5, 5, 5],
                    [6, 6, 6]])
array_1 holds row indices into the result: each value array_1[i, j] says that row i of array_2 should be added into that row of the result. So, for example, the 4th row of the result should be the sum of the rows of array_2 corresponding to every 3 that appears in array_1.
It is much easier to understand it in the code:
result = np.zeros_like(array_2)
for i in range(array_1.shape[0]):
    for j in range(array_1.shape[1]):
        index = array_1[i, j]
        result[index] = result[index] + array_2[i]
Result should be:
[[ 0  0  0]
 [ 0  0  0]
 [ 3  3  3]
 [10 10 10]
 [ 2  2  2]
 [ 0  0  0]
 [ 3  3  3]]
I tried to use np.einsum but I need to use both elements in array as indices and also its rows as indices so I'm not sure if np.einsum is the best path here.
This is a problem I have in graphics: array_1 represents the vertex indices of triangles, and array_2 represents normals, where the index of a row corresponds to the index of the vertex.
Any time you're adding something from a repeated index, normal ufuncs like np.add don't work out of the box because they only process a repeated fancy index once. Instead, you have to use the unbuffered version, which is np.add.at.
Here, you have a pair of indices: the row in array_1 is the row index into array_2, and the element of array_1 is the row index into the output.
First, construct the indices explicitly as fancy indices. This will make it much simpler to use them:
output_row = array_1.ravel()
input_row = np.repeat(np.arange(array_1.shape[0]), array_1.shape[1]).ravel()
You can apply input_row directly to array_2, but you need add.at to use output_row:
output = np.zeros_like(array_2)
np.add.at(output, output_row, array_2[input_row])
You really only use the first four rows of array_2, so it could be truncated to
array_2 = array_2[:array_1.shape[0]]
In that case, you would want to initialize the output as:
output = np.zeros_like(array_2, shape=(output_row.max() + 1, array_2.shape[1]))
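For reference, a minimal end-to-end version of this approach with the arrays from the question (same names, nothing beyond the pieces above):
import numpy as np

array_1 = np.array([[0, 1, 2],
                    [3, 3, 3],
                    [3, 3, 4],
                    [3, 6, 2]])
array_2 = np.array([[0, 0, 0],
                    [1, 1, 1],
                    [2, 2, 2],
                    [3, 3, 3],
                    [4, 4, 4],
                    [5, 5, 5],
                    [6, 6, 6]])

output_row = array_1.ravel()                        # row index into the output
input_row = np.repeat(np.arange(array_1.shape[0]),  # row index into array_2
                      array_1.shape[1])

output = np.zeros_like(array_2)
np.add.at(output, output_row, array_2[input_row])   # unbuffered accumulation
print(output)
# [[ 0  0  0]
#  [ 0  0  0]
#  [ 3  3  3]
#  [10 10 10]
#  [ 2  2  2]
#  [ 0  0  0]
#  [ 3  3  3]]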

How to move items in a numpy array in Python?

I haven't found a simple solution to move elements in a NumPy array.
Given an array, for example:
>>> A = np.arange(10).reshape(2,5)
>>> A
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
and given the indexes of the elements (columns in this case) to move, for example [2, 4], I want to move them to a certain position and the consecutive places after it, for example p = 1, shifting the other elements to the right. The result should be the following:
array([[0, 2, 4, 1, 3],
       [5, 7, 9, 6, 8]])
You can create a mask m for the sorting order. First we set the columns before p to -1, then the to-be-inserted columns to 0; the remaining columns stay at 1. The default sorting kind 'quicksort' is not stable, so to be safe we specify kind='stable' when using argsort on the mask, and build the new array from that order:
import numpy as np
A = np.arange(10).reshape(2,5)
p = 1
c = [2,4]
m = np.full(A.shape[1], 1)
m[:p] = -1 # leave up to position p as is
m[c] = 0 # insert columns c
print(A[:,m.argsort(kind='stable')])
#[[0 2 4 1 3]
# [5 7 9 6 8]]
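If you need this more than once, here is a small helper wrapping the same mask/argsort trick (move_columns is a name I made up; cols are the columns to move, p is the target position):
import numpy as np

def move_columns(A, cols, p):
    m = np.full(A.shape[1], 1)
    m[:p] = -1                 # columns before p stay in front
    m[cols] = 0                # columns to move come next
    return A[:, m.argsort(kind='stable')]

A = np.arange(10).reshape(2, 5)
print(move_columns(A, [2, 4], 1))
# [[0 2 4 1 3]
#  [5 7 9 6 8]]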

Eliminating redundant numpy rows

If I have an array
arr = [[0, 1],
       [1, 2],
       [2, 3],
       [4, 3],
       [5, 6],
       [3, 4],
       [2, 1],
       [6, 7]]
how could I eliminate redundant rows where the column values may be swapped? In the example above, the code would reduce the array to
arr = [[0, 1],
       [1, 2],
       [2, 3],
       [4, 3],
       [5, 6],
       [6, 7]]
I have thought about using a combination of slicing arr[:,::-1], np.all, and np.any, but what I have come up with so far simply gives me True or False per row when comparing rows, and this doesn't discriminate between similar rows.
j = np.any([np.all(y == arr, axis=1) for y in arr[:,::-1]], axis=0)
which yields [False, True, False, True, False, True, True, False].
Thanks in advance.
Basically you want to Find Unique Rows, and these answers borrow heavily from the top two answers there - but you need to sort the rows first to eliminate different orders.
If you don't care about order of rows at the end, this is the short way (but slower than below):
np.vstack({tuple(row) for row in np.sort(arr,-1)})
If you do want to maintain order, you can turn each sorted row into a void object and use np.unique with return_index
b = np.ascontiguousarray(np.sort(arr,-1)).view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[1])))
_, idx = np.unique(b, return_index=True)
unique_arr = arr[idx]
It might be tempting to use set row-wise instead of using np.sort(arr, -1) and np.void to make an object array, but this only works if there are no repeated values within rows. If there are, a row of [1, 2, 2] will be considered equivalent to a row of [1, 1, 2]: both become set([1, 2]).
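As a side note, on NumPy 1.13 and later the same sort-then-deduplicate idea can use np.unique with axis=0 directly (a sketch; like the set approach, it does not preserve the original row order or the original column order within a row):
import numpy as np

arr = np.array([[0, 1], [1, 2], [2, 3], [4, 3],
                [5, 6], [3, 4], [2, 1], [6, 7]])

unique_rows = np.unique(np.sort(arr, axis=1), axis=0)
print(unique_rows)
# [[0 1]
#  [1 2]
#  [2 3]
#  [3 4]
#  [5 6]
#  [6 7]]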
A solution without using numpy,
In [27]: result_ = set(([tuple(sorted(row)) for row in arr]))
In [28]: result = [list(i) for i in result_]
In [29]: result
Out[29]: [[0, 1], [1, 2], [6, 7], [5, 6], [2, 3], [3, 4]]
The solution using numpy.lexsort routine:
import numpy as np
arr = np.array([
    [0, 1], [1, 2], [2, 3], [4, 3], [5, 6], [3, 4], [2, 1], [6, 7]
])
order = np.lexsort(arr.T)
a = arr[order] # sorted rows
arr= a[[i for i,r in enumerate(a) if i == len(a)-1 or set(a[i]) != set(a[i+1])]]
print(arr)
The output:
[[0 1]
[1 2]
[2 3]
[3 4]
[5 6]
[6 7]]
After getting the boolean list, you can use the following technique to drop the rows where x and y are swapped, and then remove exact duplicates:
# This block removes elements where x and y are swapped, given the boolean list j
j = [True, False, ...]  # your boolean list
finalArray = []
for (flag, value) in zip(j, arr):
    if not flag:
        finalArray.append(value)
# This removes exact duplicate rows
finalArray = [list(x) for x in set(tuple(x) for x in finalArray)]

Remove head and tail from numpy array PYTHON

I have a numpy.ndarray and want to remove the first h elements and the last t.
As I see it, the more general way is by slicing:
h, t = 1, 1
my_array = [0,1,2,3,4,5]
middle = my_array[h:-t]
and the middle is [1, 2, 3, 4]. This is correct, but when I don't want to remove anything I use h = 0 and t = 0, and this returns an empty array. I know it is because of t = 0, and I also know that an if condition for this border case would solve it with my_array[h:], but I don't want this solution (my problem is a little more complex, with more dimensions, and the code would become ugly).
Any ideas?
Instead, use
middle = my_array[h:len(my_array)-t]
For completeness, here's the trial run:
my_array = [0,1,2,3,4,5]
h,t = 0,0
middle = my_array[h:len(my_array)-t]
print(middle)
Output: [0, 1, 2, 3, 4, 5]
This example was just for a standard array. Since your ultimate goal is to work with numpy multidimensional arrays, this problem is actually a bit trickier. When you say you want to remove the first h elements and the last t elements, are we guaranteed that h and t satisfy the proper divisibility criteria so that the result will be a well-formed array?
I actually think the cleanest solution is simply to use this solution, but divide out by the appropriate factor first. For example, in two dimensions:
import numpy

h = 3
t = 6
a = numpy.array([[ 1,  2,  3],
                 [ 4,  5,  6],
                 [ 7,  8,  9],
                 [10, 11, 12]])
d = numpy.prod(numpy.shape(a)[1:])  # number of elements per row
mid_a = a[int(h/d):int(len(a)-t/d)]
print(mid_a)
Output: [[4 5 6]]
I have included the int casts in the indices because Python 3 automatically promotes division to float, even when the numerator evenly divides the denominator.
The i:j can be replaced with a slice object, and ':j' with slice(None, j), etc.:
In [55]: alist = [0,1,2,3,4,5]
In [56]: h,t=1,-1; alist[slice(h,t)]
Out[56]: [1, 2, 3, 4]
In [57]: h,t=None,-1; alist[slice(h,t)]
Out[57]: [0, 1, 2, 3, 4]
In [58]: h,t=None,None; alist[slice(h,t)]
Out[58]: [0, 1, 2, 3, 4, 5]
This works for lists and arrays. For multidimensional arrays use a tuple of indices, which can include slice objects
x[i:j, k:l]
x[(slice(i,j), Ellipsis, slice(k,l))]
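Building on the same idea, here is a hedged sketch of a small helper that trims h items from the front and t from the back along a chosen axis (trim is a name I made up), and it handles h = t = 0 as well:
import numpy as np

def trim(arr, h, t, axis=0):
    index = [slice(None)] * arr.ndim
    index[axis] = slice(h, arr.shape[axis] - t)  # len - 0 keeps the tail
    return arr[tuple(index)]

a = np.arange(12).reshape(3, 4)
print(trim(a, 1, 1, axis=0))  # drop first and last row -> [[4 5 6 7]]
print(trim(a, 0, 0, axis=1))  # drop nothing; returns the full array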

How to compare two numpy arrays and add missing values to the other with a tweak

I have two numpy arrays of different sizes. I want to append the extra rows of the bigger array to the smaller array, keeping only their 0th element and setting their 1st element to 0.
For example :
a = [[2, 4], [4, 5], [8, 9], [7, 5]]
b = [[2, 5], [4, 6]]
After adding the missing elements to b, b would become as follows:
b = [[2, 5], [4, 6], [8, 0], [7, 0]]
I have the logic working up to some extent; however, some values are getting redundantly added, as I am not able to check whether an element has already been added to b or not.
Secondly, I am doing it with the help of an additional array c, which is a copy of b, and then doing the desired operations on c. If somebody can show me how to do it without the third array c, that would be very helpful.
import numpy as np
a = [[2, 3], [4, 5], [6, 8], [9, 6]]
b = [[2, 3], [4, 5]]
a = np.array(a)
b = np.array(b)
c = np.array(b)
for i in range(len(b)):
    for j in range(len(a)):
        if a[j, 0] == b[i, 0]:
            print "matched "
        else:
            print "not matched"
            c = np.insert(c, len(c), [a[j, 0], 0], axis=0)
print c
##### For explanation #####
# basic set operation to get the missing elements
c = set([i[0] for i in a]) - set([i[0] for i in b])
# c will just store the missing elements...
# then just append the elements
for i in c:
    b.append([i, 0])
Output -
[[2, 5], [4, 6], [8, 0], [7, 0]]
Edit -
But as they are numpy arrays you can just do this (and without using c as an intermediate) - just two lines
for i in set(a[:, 0]) - set(b[:, 0]):
    b = np.append(b, [[i, 0]], axis=0)
Output -
array([[2, 5],
       [4, 6],
       [8, 0],
       [7, 0]])
You can use np.in1d to look for matching rows from b in a to get a mask and based on the mask choose rows from a or set to zeros. Thus, we would have a vectorized approach as shown below -
np.vstack((b,a[~np.in1d(a[:,0],b[:,0])]*[1,0]))
Sample run -
In [47]: a
Out[47]:
array([[2, 4],
       [4, 5],
       [8, 9],
       [7, 5]])
In [48]: b
Out[48]:
array([[8, 7],
       [4, 6]])
In [49]: np.vstack((b,a[~np.in1d(a[:,0],b[:,0])]*[1,0]))
Out[49]:
array([[8, 7],
       [4, 6],
       [2, 0],
       [7, 0]])
First we should clear up one misconception. c does not have to be a copy. A new variable assignment is sufficient.
c = b
...
c= np.insert(c, len(c), [a[j,0], 0], axis = 0)
np.insert is not modifying any of its inputs. Rather it makes a new array. And the c=... just assigns that to c, replacing the original assignment. So the original c assignment just makes writing the iteration easier.
Since you are adding this new [a[j,0], 0] at the end, you could use concatenate (the underlying function used by insert and the stack functions):
c = np.concatenate((c, [[a[j, 0], 0]]), axis=0)
That won't make much of a change in the run time. It's better to find all the a[j] and add them all at once.
In this case you want to add a[2,0] and a[3,0]. Leaving aside, for the moment, the question of how we find [2,3], we can do:
In [595]: a=np.array([[2,3],[4,5],[6,8],[9,6]])
In [596]: b=np.array([[2,3],[4,5]])
In [597]: ind = [2,3]
An assign and fill approach would look like:
In [605]: c = np.zeros_like(a) # target array
In [607]: c[0:b.shape[0],:] = b # fill in the b values
In [608]: c[b.shape[0]:,0] = a[ind,0] # fill in the selected a column
In [609]: c
Out[609]:
array([[2, 3],
[4, 5],
[6, 0],
[9, 0]])
A variation would be to construct a temporary array with the new a values, and concatenate:
In [613]: a1 = np.zeros((len(ind),2),a.dtype)
In [614]: a1[:,0] = a[ind,0]
In [616]: np.concatenate((b,a1),axis=0)
Out[616]:
array([[2, 3],
[4, 5],
[6, 0],
[9, 0]])
I'm using the a1 create and fill approach because I'm too lazy to figure out how to concatenate a[ind,0] with enough 0s to make the same thing. :)
As Divakar shows, np.in1d is a handy way of finding the matches
In [617]: np.in1d(a[:,0],b[:,0])
Out[617]: array([ True, True, False, False], dtype=bool)
In [618]: np.nonzero(~np.in1d(a[:,0],b[:,0]))
Out[618]: (array([2, 3], dtype=int32),)
In [619]: np.nonzero(~np.in1d(a[:,0],b[:,0]))[0]
Out[619]: array([2, 3], dtype=int32)
In [620]: ind=np.nonzero(~np.in1d(a[:,0],b[:,0]))[0]
If you don't care about the order a[ind,0] can also be gotten with np.setdiff1d(a[:,0],b[:,0]) (the values will be sorted).
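Putting that together, here is a compact sketch of the same idea using np.setdiff1d (assuming the first column is the key and that it's fine for the appended rows to come out sorted):
import numpy as np

a = np.array([[2, 3], [4, 5], [6, 8], [9, 6]])
b = np.array([[2, 3], [4, 5]])

missing = np.setdiff1d(a[:, 0], b[:, 0])                    # keys in a but not in b
new_rows = np.column_stack((missing, np.zeros_like(missing)))
b = np.concatenate((b, new_rows), axis=0)
print(b)
# [[2 3]
#  [4 5]
#  [6 0]
#  [9 0]]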
Assuming you are working on a single dimensional array:
import numpy as np
a = np.linspace(1, 90, 90)
b = np.array([1,2,3,4,5,6,7,8,9,10,11,13,14,15,16,17,18,19,20,
              21,22,23,24,25,27,28,31,32,33,34,35,36,37,38,39,
              40,41,42,43,44,46,47,48,49,50,51,52,53,54,55,56,
              57,58,59,60,61,62,63,64,65,67,70,72,73,74,75,76,
              77,78,79,80,81,82,84,85,86,87,88,89,90])
m_num = np.setxor1d(a, b).astype(np.uint8)
print("Total {0} numbers missing: {1}".format(len(m_num), m_num))
This also works in a 2D space:
t1 = np.reshape(a, (10, 9))
t2 = np.reshape(b, (10, 8))
m_num2 = np.setxor1d(t1, t2).astype(np.uint8)
print("Total {0} numbers missing: {1}".format(len(m_num2), m_num2))
