Concatenate arrays of different sizes row-wise using numpy - python

I managed to join two csv datasets of same size (i.e. same number of columns) row-wise using np.concatenate.
combined = np.concatenate((price1,price2))
How can I join two csv datasets of different sizes (they contain common headers except that one of the datasets has an additional column) row-wise using numpy?
dataset1's headers : a,b,c,d,e,f,g,h,i,k
dataset2's headers : a,b,c,d,e,f,g,h,i,j (additional column which is not required for analysis),k
Thanks much.

You can use np.delete to remove the extra column and then use np.concatenate
headers = list('abcdefghik')
a = np.arange(len(headers)).reshape(1, -1)
#Output: array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
headers_2 = list('abcdefghijk')
b = np.arange(len(headers_2)*2).reshape(2,-1)
#Output: array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
# [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]])
col_to_remove = headers_2.index('j')
np.delete(b, col_to_remove, axis = 1) #note that this does not modify original array, returns a copy.
#Output: array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 10],
# [11, 12, 13, 14, 15, 16, 17, 18, 19, 21]])
result = np.concatenate((a, np.delete(b, col_to_remove, axis = 1)))
#Output: array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 10],
# [11, 12, 13, 14, 15, 16, 17, 18, 19, 21]])
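If the extra column is not always in the same position, a variant of the approach above is to pick the shared columns by header name. This is only a minimal sketch, assuming the header names of both files are available as Python lists; `select_columns` is a hypothetical helper written for illustration, not a NumPy function.

```python
import numpy as np

def select_columns(data, headers, wanted):
    """Return the columns of `data` named in `wanted`, in that order.

    Hypothetical helper: `headers` lists the column names of `data`.
    """
    idx = [headers.index(name) for name in wanted]
    return data[:, idx]

headers_1 = list('abcdefghik')
headers_2 = list('abcdefghijk')

a = np.arange(len(headers_1)).reshape(1, -1)      # shape (1, 10)
b = np.arange(len(headers_2) * 2).reshape(2, -1)  # shape (2, 11)

# Keep only the columns of b that also exist in a, then stack row-wise.
b_common = select_columns(b, headers_2, headers_1)
combined = np.concatenate((a, b_common), axis=0)
print(combined.shape)  # (3, 10)
```

Indexing with a list of column positions returns a copy, so `b` itself is left unchanged, just as with np.delete.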

Related

Remove rows in a 2d-numpy array if they contain a specific element

I have a matrix as 2d-np.array and I would like to remove all rows that contain an element x in a specific column. My goal is to return a matrix without these rows, so it should be smaller.
My function looks like this:
def delete_rows(matrix, x, col):
    for i in range(matrix.shape[0]-1):
        if(matrix[i,col] == x):
            np.delete(matrix, i, axis = 0)
    return matrix
Sadly in my test the shape of the matrix stayed the same after removing rows. I think the deleted rows were substituted by rows with 0s.
Any advice on how I can achieve my goal?
EDIT: added condition for the specific column to check
You don't need to use any apply methods for this. It can be solved with basic boolean indexing as follows -
arr[~(arr[:,col] == val),:]
arr[:,col] selects the specific column from array
arr[:,col] == val checks for the value and returns True where it exists, else False
~(arr[:,col] == val) inverts the True and False values
arr[~(arr[:,col] == val),:] keeps only the rows which have the boolean index as True and discards all False
Example solution
arr = np.array([[12, 10, 12, 0, 9, 4, 12, 11],
[ 3, 10, 14, 5, 4, 3, 6, 6],
[12, 10, 1, 0, 5, 7, 5, 10],
[12, 8, 14, 14, 12, 3, 14, 10],
[ 9, 14, 3, 8, 1, 10, 9, 6],
[10, 3, 11, 3, 12, 13, 11, 10],
[ 0, 6, 8, 8, 5, 5, 1, 10], #<- this to remove
[13, 6, 1, 10, 7, 10, 10, 13],
[ 3, 3, 8, 10, 13, 0, 0, 10], #<- this to remove
[ 6, 2, 13, 5, 8, 2, 8, 10]])
# ^
# this column to check
#boolean indexing approach
val, col = 8,2 #value to check is 8 and column to check is 2
out = arr[~(arr[:,col] == val),:] #<-----
out
array([[12, 10, 12, 0, 9, 4, 12, 11],
[ 3, 10, 14, 5, 4, 3, 6, 6],
[12, 10, 1, 0, 5, 7, 5, 10],
[12, 8, 14, 14, 12, 3, 14, 10],
[ 9, 14, 3, 8, 1, 10, 9, 6],
[10, 3, 11, 3, 12, 13, 11, 10],
[13, 6, 1, 10, 7, 10, 10, 13],
[ 6, 2, 13, 5, 8, 2, 8, 10]])
If you want to check for the value in all columns then try this -
arr[~(arr == val).any(1),:]
And if you want to keep ONLY rows with the value instead, just remove ~ from the condition.
arr[(arr[:,col] == val),:]
If you want to remove the column as well, using np.delete -
np.delete(arr[~(arr[:,col] == val),], col, axis=1)
Note: np.delete works along a single axis per call, so you cannot remove both rows and columns at once. If you plan to use it, you will need to call np.delete twice: once with axis = 0 (rows) and once with axis = 1 (columns).
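A minimal sketch of that two-call approach, using a small made-up array rather than the one above. Since np.delete returns a new array each time, each result has to be assigned to a name.

```python
import numpy as np

arr = np.array([[1, 2, 8],
                [4, 5, 6],
                [7, 8, 8],
                [3, 9, 0]])
val, col = 8, 2

# Call 1 (axis=0): drop every row whose entry in column `col` equals `val`.
rows_to_drop = np.where(arr[:, col] == val)[0]
without_rows = np.delete(arr, rows_to_drop, axis=0)

# Call 2 (axis=1): drop the checked column itself.
out = np.delete(without_rows, col, axis=1)

print(out)
# [[4 5]
#  [3 9]]
```

The original `arr` keeps its shape (4, 3), which is also why calling np.delete inside a loop without assigning the result appears to do nothing.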
This can be simply done in one line:
import numpy as np
def delete_rows(matrix, x, col):
    return matrix[matrix[:,col]!=x,:]
For example, if we want to remove all the rows that contain 5 in the second column from matrix A:
>>> A = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> print(delete_rows(A, 5, 1))
[[1 2 3]
[7 8 9]]
Assuming you have an array like this:
array([[12, 5, 0, 3, 11, 3, 7, 9, 3, 5],
[ 2, 4, 7, 6, 8, 8, 12, 10, 1, 6],
[ 7, 7, 14, 8, 1, 5, 9, 13, 8, 9],
[ 4, 3, 0, 3, 5, 14, 0, 2, 3, 8],
[ 1, 3, 13, 3, 3, 14, 7, 0, 1, 9],
[ 9, 0, 10, 4, 7, 3, 14, 11, 2, 7],
[12, 2, 0, 0, 4, 5, 5, 6, 8, 4],
[ 1, 4, 9, 10, 10, 8, 1, 1, 7, 9],
[ 9, 3, 6, 7, 11, 14, 2, 11, 0, 14],
[ 3, 5, 12, 9, 10, 4, 11, 4, 6, 4]])
You can remove all rows containing a 3 like this:
row_mask = np.apply_along_axis(np.any, 1, arr == 3)
arr = arr[~row_mask]
Your new array looks like this
array([[ 2, 4, 7, 6, 8, 8, 12, 10, 1, 6],
[ 7, 7, 14, 8, 1, 5, 9, 13, 8, 9],
[12, 2, 0, 0, 4, 5, 5, 6, 8, 4],
[ 1, 4, 9, 10, 10, 8, 1, 1, 7, 9]])
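The same row mask can also be built without np.apply_along_axis, by calling `.any(axis=1)` on the boolean array directly. A small sketch on a made-up array, checking that both masks agree:

```python
import numpy as np

arr = np.array([[12, 5,  0, 3],
                [ 2, 4,  7, 6],
                [ 7, 7, 14, 8],
                [ 4, 3,  0, 3]])

# Mask built as in the answer above.
mask_apply = np.apply_along_axis(np.any, 1, arr == 3)

# Equivalent mask using the ndarray method; True where a row contains a 3.
mask_any = (arr == 3).any(axis=1)

filtered = arr[~mask_any]
print(filtered)
# [[ 2  4  7  6]
#  [ 7  7 14  8]]
```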

How do I shift specific elements of a tensor with torch.roll?

I have a tensor x, that looks like this:
x = tensor([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[ 11, 12, 13, 14, 15]])
I'm trying to swap the first two and last two numbers of each row, like this:
x = tensor([[ 4, 5, 3, 1, 2],
[ 9, 10, 8, 6, 7],
[ 14, 15, 13, 11, 12]])
How could I do this with torch.roll()? How would I switch 3 instead of 1?
Not sure if that can be done with torch.roll alone... However, you can get the desired result by using a temporary tensor and a pair assignment:
>>> x = torch.arange(1, 16).reshape(3,-1)
tensor([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15]])
>>> tmp = x.clone()
# swap the two sets of columns
>>> x[:,:2], x[:,-2:] = tmp[:,-2:], tmp[:,:2]
Such that tensor x has been mutated as:
>>> x
tensor([[ 4, 5, 3, 1, 2],
[ 9, 10, 8, 6, 7],
[14, 15, 13, 11, 12]])
You can pull off this operation with torch.roll and some indexing:
>>> x = torch.arange(1, 21).reshape(4,-1)
tensor([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20]])
>>> rolled = x.roll(-2,0)
tensor([[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10]])
# overwrite columns [1,-1[ from rolled with those from x
>>> rolled[:, 1:-1] = x[:, 1:-1]
So that in the end you get:
>>> rolled
tensor([[11, 2, 3, 4, 15],
[16, 7, 8, 9, 20],
[ 1, 12, 13, 14, 5],
[ 6, 17, 18, 19, 10]])

Preserving sequential order of numpy 2D arrays

Given this 2D numpy array:
a=numpy.array([[31,22,43],[44,55,6],[17,68,19],[12,11,18],...,[99,98,97]])
given the need to flatten it using numpy.ravel:
b=numpy.ravel(a)
and given the need to later dump it into a pandas dataframe, how can I make sure the sequential order of the values in a is preserved when applying numpy.ravel? e.g., How can I check/ensure that numpy.ravel does not mess up with the original sequential order?
Of course the intended result should be that the numbers coming before and after 17 in b, for instance, are the same as in a.
First of all, you need to formulate what "sequential" order means for you, as numpy.ravel() does preserve order. Here is a tip on how to formulate what you need: try it with the simplest possible toy example:
import numpy as np
X = np.arange(20).reshape(-1,4)
X
#array([[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11],
# [12, 13, 14, 15],
# [16, 17, 18, 19]])
X.ravel()
# array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
# 13, 14, 15, 16, 17, 18, 19])
Does it meet your expectation? Or do you want to see this order:
Z = X.T
Z
# array([[ 0, 4, 8, 12, 16],
# [ 1, 5, 9, 13, 17],
# [ 2, 6, 10, 14, 18],
# [ 3, 7, 11, 15, 19]])
Z.ravel()
# array([ 0, 4, 8, 12, 16, 1, 5, 9, 13, 17, 2, 6, 10,
# 14, 18, 3, 7, 11, 15, 19])
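One way to check the order yourself, sketched below on the same toy array: by default ravel() reads the array in row-major (C) order, so reshaping the flat result back to the original shape should reproduce the array exactly, and passing order='F' gives the column-by-column order shown for Z.

```python
import numpy as np

X = np.arange(20).reshape(-1, 4)
flat = X.ravel()

# Round trip: reshaping the flat array back must reproduce X.
round_trip_ok = np.array_equal(flat.reshape(X.shape), X)

# In C order, element (i, j) lands at flat position i * n_columns + j.
i, j = 2, 1
position = i * X.shape[1] + j
same_value = flat[position] == X[i, j]

# Column-major flattening matches raveling the transpose.
f_order_ok = np.array_equal(X.ravel(order='F'), X.T.ravel())

print(round_trip_ok, same_value, f_order_ok)  # True True True
```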

Shuffle ordering of some rows in numpy array

I want to shuffle the ordering of only some rows in a numpy array. These rows will always be contiguous (e.g. shuffling rows 23-80). The number of elements in each row can vary from 1 (such that the array is actually 1D) to 100.
Below is example code to demonstrate how I see the method shuffle_rows() could work. How would I design such a method to do this shuffling efficiently?
import numpy as np
>>> a = np.arange(20).reshape(4, 5)
>>> a
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
>>> shuffle_rows(a, [1, 3]) # including rows 1, 2 and 3 in the shuffling
array([[ 0, 1, 2, 3, 4],
[15, 16, 17, 18, 19],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
You can use np.random.shuffle. This shuffles the rows themselves, not the elements within the rows.
From the docs:
This function only shuffles the array along the first index of a multi-dimensional array
As an example:
import numpy as np
def shuffle_rows(arr,rows):
    np.random.shuffle(arr[rows[0]:rows[1]+1])
a = np.arange(20).reshape(4, 5)
print(a)
# array([[ 0, 1, 2, 3, 4],
# [ 5, 6, 7, 8, 9],
# [10, 11, 12, 13, 14],
# [15, 16, 17, 18, 19]])
shuffle_rows(a,[1,3])
print(a)
#array([[ 0, 1, 2, 3, 4],
# [10, 11, 12, 13, 14],
# [15, 16, 17, 18, 19],
# [ 5, 6, 7, 8, 9]])
shuffle_rows(a,[1,3])
print(a)
#array([[ 0, 1, 2, 3, 4],
# [10, 11, 12, 13, 14],
# [ 5, 6, 7, 8, 9],
# [15, 16, 17, 18, 19]])
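Note that shuffle_rows above shuffles a in place (the slice is a view of a) and returns None. If a shuffled copy is preferred, as in the question's example, one possible sketch uses a Generator from np.random.default_rng. The name `shuffled_rows_copy` and its `seed` argument are hypothetical, chosen only for illustration.

```python
import numpy as np

def shuffled_rows_copy(arr, rows, seed=None):
    """Return a copy of `arr` with rows rows[0]..rows[1] (inclusive) shuffled.

    Hypothetical helper; the input array is left untouched.
    """
    rng = np.random.default_rng(seed)
    first, last = rows
    out = arr.copy()
    # permutation() returns a shuffled copy along the first axis.
    out[first:last + 1] = rng.permutation(arr[first:last + 1])
    return out

a = np.arange(20).reshape(4, 5)
b = shuffled_rows_copy(a, [1, 3], seed=0)

print(np.array_equal(b[0], a[0]))  # True: row 0 is outside the shuffled range
```

Because only the first axis is permuted, the same function also works when the array is 1D.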

numpy reshaping multidimensional array through arbitrary axis

This is a question about the use of reshape and how this function uses each axis of a multidimensional array.
Suppose I have the following array that contains matrices indexed by the first index. What I want to achieve is to instead index the columns of each matrix with the first index. In order to illustrate this problem, consider the following example where the given numpy array that indexes matrices with its first index is z.
x = np.arange(9).reshape((3, 3))
y = np.arange(9, 18).reshape((3, 3))
z = np.dstack((x, y)).T
Where z looks like:
array([[[ 0, 3, 6],
[ 1, 4, 7],
[ 2, 5, 8]],
[[ 9, 12, 15],
[10, 13, 16],
[11, 14, 17]]])
And its shape is (2, 3, 3). Here, the first index selects one of the two images, and each image is a 3 x 3 matrix.
The question more specifically phrased then, is how to use reshape to obtain the following desired output:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]])
Whose shape is (6, 3). This achieves that the dimension of the array indexes the columns of the matrix x and y as presented above. My natural inclination was to use reshape directly on z in the following way:
out = z.reshape(2 * 3, 3)
But its output is the following which indexes the rows of the matrices and not the columns:
array([[ 0, 3, 6],
[ 1, 4, 7],
[ 2, 5, 8],
[ 9, 12, 15],
[10, 13, 16],
[11, 14, 17]])
Could reshape be used to obtain the desired output above? Or more general, can you control how each axis is used when you use the reshape function?
Two things:
I know how to solve the problem. I can go through each element of the big matrix (z), transpose it, and then apply reshape in the way above. This increases computation time a little bit and is not really problematic. But it does not generalize and it does not feel Pythonic. So I was wondering if there is a standard, enlightened way of doing this.
I was not clear on how to phrase this question. If anyone has suggestion on how to better phrase this problem I am all ears.
Every array has a natural (1D flattened) order to its elements. When you reshape an array, it is as though it were flattened first (thus obtaining the natural order), and then reshaped:
In [54]: z.ravel()
Out[54]:
array([ 0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12, 15, 10, 13, 16, 11, 14,
17])
In [55]: z.ravel().reshape(2*3, 3)
Out[55]:
array([[ 0, 3, 6],
[ 1, 4, 7],
[ 2, 5, 8],
[ 9, 12, 15],
[10, 13, 16],
[11, 14, 17]])
Notice that in the "natural order", 0 and 1 are far apart. However you reshape it, 0 and 1 will not be next to each other along the last axis, which is what you want in the desired array:
desired = np.array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]])
This requires some reordering, which in this case can be done by swapaxes:
In [53]: z.swapaxes(1,2).reshape(2*3, 3)
Out[53]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]])
because swapaxes(1,2) places the values in the desired order
In [56]: z.swapaxes(1,2).ravel()
Out[56]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17])
In [57]: desired.ravel()
Out[57]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17])
Note that the reshape method also has an order parameter which can be used to control the (C- or F-) order with which the elements are read from the array and placed in the reshaped array. However, I don't think this helps in your case.
Another way to think about the limits of reshape is to say that all reshapes followed by ravel are the same:
In [71]: z.reshape(3,3,2).ravel()
Out[71]:
array([ 0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12, 15, 10, 13, 16, 11, 14,
17])
In [72]: z.reshape(3,2,3).ravel()
Out[72]:
array([ 0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12, 15, 10, 13, 16, 11, 14,
17])
In [73]: z.reshape(3*2,3).ravel()
Out[73]:
array([ 0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12, 15, 10, 13, 16, 11, 14,
17])
In [74]: z.reshape(3*3,2).ravel()
Out[74]:
array([ 0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12, 15, 10, 13, 16, 11, 14,
17])
So if the ravel of the desired array is different, there is no way to obtain it only by reshaping.
The same goes for reshaping with order='F', provided you also ravel with order='F':
In [109]: z.reshape(2,3,3, order='F').ravel(order='F')
Out[109]:
array([ 0, 9, 1, 10, 2, 11, 3, 12, 4, 13, 5, 14, 6, 15, 7, 16, 8,
17])
In [110]: z.reshape(2*3*3, order='F').ravel(order='F')
Out[110]:
array([ 0, 9, 1, 10, 2, 11, 3, 12, 4, 13, 5, 14, 6, 15, 7, 16, 8,
17])
In [111]: z.reshape(2*3,3, order='F').ravel(order='F')
Out[111]:
array([ 0, 9, 1, 10, 2, 11, 3, 12, 4, 13, 5, 14, 6, 15, 7, 16, 8,
17])
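The invariant described above can be checked mechanically. A small sketch, rebuilding z from the question: every reshape leaves the flattened order unchanged as long as the same order is used for reshaping and raveling, and that flattened order differs from the one of the desired array.

```python
import numpy as np

x = np.arange(9).reshape((3, 3))
y = np.arange(9, 18).reshape((3, 3))
z = np.dstack((x, y)).T          # shape (2, 3, 3), as in the question
desired = np.vstack((x, y))      # shape (6, 3)

shapes = [(3, 3, 2), (3, 2, 3), (6, 3), (9, 2), (18,)]
invariant_holds = True
for order in ('C', 'F'):
    base = z.ravel(order=order)
    for shape in shapes:
        reshaped = z.reshape(shape, order=order)
        # Raveling with the same order always gives back the same sequence.
        if not np.array_equal(reshaped.ravel(order=order), base):
            invariant_holds = False

# The desired array has a different flattened order in both conventions.
differs_c = not np.array_equal(z.ravel(), desired.ravel())
differs_f = not np.array_equal(z.ravel(order='F'), desired.ravel(order='F'))

print(invariant_holds, differs_c, differs_f)  # True True True
```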
It is possible to obtain the desired array using two reshapes:
In [83]: z.reshape(2, 3*3, order='F').reshape(2*3, 3)
Out[83]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]])
but I stumbled upon this serendipitously.
If I've totally misunderstood your question and x and y are the givens (not z), then you could obtain the desired array using row_stack instead of dstack (np.row_stack is an alias of np.vstack, which newer NumPy versions recommend using directly):
In [88]: z = np.row_stack([x, y])
In [89]: z
Out[89]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]])
If you look at the dstack code you'll discover that
np.dstack((x, y)).T
is effectively:
np.concatenate([i[:,:,None] for i in (x,y)],axis=2).transpose([2,1,0])
It reshapes each component array and then joins them along this new axis. Finally it transposes axes.
Your target is the same as (row stack)
np.concatenate((x,y),axis=0)
So with a bit of reverse engineering we can create it from z with
np.concatenate([i[...,0] for i in np.split(z.T,2,axis=2)],axis=0)
np.concatenate([i.T[:,:,0] for i in np.split(z,2,axis=0)],axis=0)
or
np.concatenate(np.split(z.T,2,axis=2),axis=0)[...,0]
or with a partial transpose we can keep the split-and-rejoin axis first, and just use concatenate:
np.concatenate(z.transpose(0,2,1),axis=0)
or its reshape equivalent
(z.transpose(0,2,1).reshape(-1,3))
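A quick sketch confirming that these variants agree, rebuilding x, y and z from the question. The last entry, which takes the row length from z.shape instead of hard-coding 3, is an assumption about how one might generalize this and is not part of the answers above.

```python
import numpy as np

x = np.arange(9).reshape((3, 3))
y = np.arange(9, 18).reshape((3, 3))
z = np.dstack((x, y)).T
target = np.concatenate((x, y), axis=0)

variants = [
    np.concatenate([i[..., 0] for i in np.split(z.T, 2, axis=2)], axis=0),
    np.concatenate([i.T[:, :, 0] for i in np.split(z, 2, axis=0)], axis=0),
    np.concatenate(np.split(z.T, 2, axis=2), axis=0)[..., 0],
    np.concatenate(z.transpose(0, 2, 1), axis=0),
    z.transpose(0, 2, 1).reshape(-1, 3),
    # Assumed generalization: swap the last two axes, then merge the first two.
    z.swapaxes(1, 2).reshape(-1, z.shape[1]),
]

all_match = all(np.array_equal(v, target) for v in variants)
print(all_match)  # True
```

The transpose-then-reshape forms avoid the Python-level loop over np.split, which may matter when the first axis is long.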
