Numpy array (list of lists), add values at the end - python

I have two arrays:
In [32]: a
Out[32]:
array([[1, 2, 3],
[2, 3, 4]])
In [33]: b
Out[33]:
array([[ 8, 9],
[ 9, 10]])
I would like to get the following:
In [35]: c
Out[35]:
array([[ 1, 2, 3, 8, 9],
[ 2, 3, 4, 9, 10]])
i.e. append the first and second values of b[0] = array([8, 9]) as the last two values of a[0]
and append the first and second values of b[1] = array([9, 10]) as the last two values of a[1].
The second answer to the linked question, "How to add multiple extra columns to a NumPy array", does not work, and I do not understand the accepted answer.

You could try with np.hstack:
a=np.array([[1, 2, 3],
[2, 3, 4]])
b=np.array([[ 8, 9],
[ 9, 10]])
print(np.hstack((a,b)))
output:
[[ 1 2 3 8 9]
[ 2 3 4 9 10]]
Alternatively, you can use the first answer from the link you attached, which is reported to be faster than concatenate (in G.Anderson's timings, concatenate was the fastest of the functions compared). Here is an explanation of how that answer works:
# First, create an array with the same shape as the expected concatenated output:
res = np.zeros((2,5),int)
res
[[0 0 0 0 0]
[0 0 0 0 0]]
# Then assign the first array to res[:,:3], which is the first 3 elements of each row
res[:,:3]
[[0 0 0]
[0 0 0]]
res[:,:3]=a #assign
res[:,:3]
[[1, 2, 3]
[2, 3, 4]]
# Then assign the second array to res[:,3:], which is the last two elements of each row
res[:,3:]
[[0 0]
[0 0]]
res[:,3:]=b #assign
res[:,3:]
[[ 8, 9]
[ 9, 10]]
#And finally:
res
[[ 1 2 3 8 9]
[ 2 3 4 9 10]]
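The preallocation steps above can be wrapped in a small helper. This is only a minimal sketch: the name append_columns is hypothetical (it is not a NumPy function), and it assumes both inputs are 2-D arrays with the same number of rows.

```python
import numpy as np

def append_columns(a, b):
    """Hypothetical helper: place the columns of b after the columns of a."""
    # Preallocate the result; result_type picks a dtype that can hold both inputs.
    res = np.empty((a.shape[0], a.shape[1] + b.shape[1]),
                   dtype=np.result_type(a, b))
    res[:, :a.shape[1]] = a   # left block: the columns of a
    res[:, a.shape[1]:] = b   # right block: the columns of b
    return res

a = np.array([[1, 2, 3], [2, 3, 4]])
b = np.array([[8, 9], [9, 10]])
c = append_columns(a, b)
print(c)
```

For everyday use, np.hstack or np.concatenate is shorter; the sketch mainly shows what those calls do to the data.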

You can do concatenate:
np.concatenate([a,b], axis=1)
Output:
array([[ 1, 2, 3, 8, 9],
[ 2, 3, 4, 9, 10]])

You can use np.append with the axis parameter for joining two arrays on a given axis
np.append(a,b, axis=1)
array([[ 1, 2, 3, 8, 9],
[ 2, 3, 4, 9, 10]])
Adding timings for the top three answers, for completeness' sake. Note that these timings will vary with the machine running the code and may scale differently for different array sizes.
%timeit np.append(a,b, axis=1)
2.81 µs ± 438 ns per loop
%timeit np.concatenate([a,b], axis=1)
2.32 µs ± 375 ns per loop
%timeit np.hstack((a,b))
4.41 µs ± 489 ns per loop

From the NumPy documentation for numpy.concatenate:
Join a sequence of arrays along an existing axis.
From the question, I understand that this is what you want:
import numpy as np
a = np.array([[1, 2, 3],
[2, 3, 4]])
b = np.array([[ 8, 9],
[ 9, 10]])
c = np.concatenate((a, b), axis=1)
print ("a: ", a)
print ("b: ", b)
print ("c: ", c)
output:
a: [[1 2 3]
[2 3 4]]
b: [[ 8 9]
[ 9 10]]
c: [[ 1 2 3 8 9]
[ 2 3 4 9 10]]

Related

how to understand such shuffling data code in Numpy

I am learning NumPy and I want to understand the following data-shuffling code:
# x is an m*n np.array
# return a shuffled-rows array
def shuffle_col_vals(x):
    rand_x = np.array([np.random.choice(x.shape[0], size=x.shape[0], replace=False) for i in range(x.shape[1])]).T
    grid = np.indices(x.shape)
    rand_y = grid[1]
    return x[(rand_x, rand_y)]
So I input an np.array object as follows:
x1 = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16]], dtype=int)
And the comments show an output of shuffle_col_vals(x1) like the following:
array([[ 1, 5, 11, 15],
[ 3, 8, 9, 14],
[ 4, 6, 12, 16],
[ 2, 7, 10, 13]], dtype=int64)
I am confused by the way rand_x is initialized; I have not seen this pattern with numpy.array before.
I have been thinking about it for a long time, but I still don't understand why return x[(rand_x, rand_y)] produces a shuffled-rows array.
If you don't mind, could anyone explain the code to me?
Thanks in advance.
When indexing NumPy arrays, you can take single elements. Let's use a 3x4 array to be able to differentiate between the axes:
In [1]: x1 = np.array([[1, 2, 3, 4],
...: [5, 6, 7, 8],
...: [9, 10, 11, 12]], dtype=int)
In [2]: x1[0, 0]
Out[2]: 1
If you review NumPy's advanced indexing, you will find that you can do more when indexing by providing a list for each dimension. Consider indexing with x1[rows..., cols...] and let's take two elements.
Pick from the first and second row, but always from the first column:
In [3]: x1[[0, 1], [0, 0]]
Out[3]: array([1, 5])
You can even index with arrays:
In [4]: x1[[[0, 0], [1, 1]], [[0, 1], [0, 1]]]
Out[4]:
array([[1, 2],
[5, 6]])
np.indices creates a row-index array and a column-index array that, when used together for indexing, give back the original array:
In [5]: grid = np.indices(x1.shape)
In [6]: np.all(x1[grid[0], grid[1]] == x1)
Out[6]: True
Now if you shuffle the values of grid[0] column-wise but keep grid[1] as-is, and then use both for indexing, you get an array in which the values within each column are shuffled.
Each column of grid[0] is the row-index vector [0, 1, 2]. The code shuffles such a vector independently for each column and stacks the results into rand_x, which has the same shape as x1.
Create a single shuffled index vector:
In [7]: np.random.seed(0)
In [8]: np.random.choice(x1.shape[0], size=x1.shape[0], replace=False)
Out[8]: array([2, 1, 0])
The stacking works by (pseudo-code) stacking with [random-index-col-vec for cols in range(x1.shape[1])] and then transposing (.T).
To make it a little clearer we can rewrite i as col and use column_stack instead of np.array([... for col]).T:
In [9]: np.random.seed(0)
In [10]: col_list = [np.random.choice(x1.shape[0], size=x1.shape[0], replace=False)
    ...:             for col in range(x1.shape[1])]
In [11]: col_list
Out[11]: [array([2, 1, 0]), array([2, 0, 1]), array([0, 2, 1]), array([2, 0, 1])]
In [12]: rand_x = np.column_stack(col_list)
In [13]: rand_x
Out[13]:
array([[2, 2, 0, 2],
[1, 0, 2, 0],
[0, 1, 1, 1]])
In [14]: x1[rand_x, grid[1]]
Out[14]:
array([[ 9, 10, 3, 12],
[ 5, 2, 11, 4],
[ 1, 6, 7, 8]])
Details to note:
The example output you give differs from what the function you provide returns. It seems to be transposed.
The names rand_x and rand_y in the sample code can be confusing if you are used to the convention of x = column index, y = row index.
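The per-column shuffle explained above can also be written with np.take_along_axis. This is a hedged sketch rather than the original author's code: the helper name shuffle_within_columns is hypothetical, and it draws the permutations by argsorting random keys instead of calling np.random.choice once per column.

```python
import numpy as np

def shuffle_within_columns(x, rng=None):
    """Hypothetical helper: shuffle each column of a 2-D array independently."""
    rng = np.random.default_rng() if rng is None else rng
    # Argsorting random keys down each column yields an independent
    # permutation of the row indices for every column (the role of rand_x).
    row_idx = rng.random(x.shape).argsort(axis=0)
    # Picks x[row_idx[i, j], j], the same lookup as x[rand_x, rand_y].
    return np.take_along_axis(x, row_idx, axis=0)

x1 = np.arange(1, 13).reshape(3, 4)
shuffled = shuffle_within_columns(x1, np.random.default_rng(0))
print(shuffled)
```

Every column of the result contains the same values as the matching column of x1, only in a different order; values never move between columns.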
See output:
import numpy as np
def shuffle_col_val(x):
    print("----------------------------\n A rand_x\n")
    f = np.random.choice(x.shape[0], size=x.shape[0], replace=False)
    print(f, "\nNow I transpose an array.")
    rand_x = np.array([f]).T
    print(rand_x)
    print("----------------------------\n B rand_y\n")
    print("Grid gives you two possibilities\n you choose second:")
    grid = np.indices(x.shape)
    print(format(grid))
    rand_y = grid[1]
    print("\n----------------------------\n C Our rand_x, rand_y:")
    print("\nThe order of values in the column CHANGE:\n has random order\n{}".format(rand_x))
    print("\nThe order of values in the row NO CHANGE:\n has normal order 0, 1, 2, 3\n{}".format(rand_y))
    return x[(rand_x, rand_y)]
x1 = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16]], dtype=int)
print("\n----------------------------\n D Our shuffled-rows: \n{}\n".format(shuffle_col_val(x1)))
Output:
A rand_x
[2 3 0 1]
Now I transpose an array.
[[2]
[3]
[0]
[1]]
----------------------------
B rand_y
Grid gives you two possibilities, you choose second:
[[[0 0 0 0]
[1 1 1 1]
[2 2 2 2]
[3 3 3 3]]
[[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]]]
----------------------------
C Our rand_x, rand_y:
The order of values in the column CHANGE: has random order
[[2]
[3]
[0]
[1]]
The order of values in the row NO CHANGE: has normal order 0, 1, 2, 3
[[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]]
----------------------------
D Our shuffled-rows:
[[ 9 10 11 12]
[13 14 15 16]
[ 1 2 3 4]
[ 5 6 7 8]]

How to select value from array that is closest to value in array using vectorization?

I have an array of values that I want to replace with values from an array of choices, based on which choice is linearly closest.
The catch is that the size of choices is defined at runtime.
import numpy as np
a = np.array([[0, 0, 0], [4, 4, 4], [9, 9, 9]])
choices = np.array([1, 5, 10])
If choices were static in size, I would simply use np.where:
d = np.where(np.abs(a - choices[0]) < np.abs(a - choices[1]),
             np.where(np.abs(a - choices[0]) < np.abs(a - choices[2]), choices[0], choices[2]),
             np.where(np.abs(a - choices[1]) < np.abs(a - choices[2]), choices[1], choices[2]))
To get the output:
>>d
>>[[1, 1, 1], [5, 5, 5], [10, 10, 10]]
Is there a way to do this more dynamically while still preserving the vectorization?
Subtract choices from a, find the index of the minimum of the result, substitute.
a = np.array([[0, 0, 0], [4, 4, 4], [9, 9, 9]])
choices = np.array([1, 5, 10])
b = a[:,:,None] - choices
np.absolute(b,b)
i = np.argmin(b, axis = -1)
a = choices[i]
print(a)
>>>
[[ 1 1 1]
[ 5 5 5]
[10 10 10]]
a = np.array([[0, 3, 0], [4, 8, 4], [9, 1, 9]])
choices = np.array([1, 5, 10])
b = a[:,:,None] - choices
np.absolute(b,b)
i = np.argmin(b, axis = -1)
a = choices[i]
print(a)
>>>
[[ 1 1 1]
[ 5 10 5]
[10 1 10]]
>>>
The extra dimension was added to a so that each element of choices would be subtracted from each element of a. choices was broadcast against a in the third dimension. This link has a decent graphic. b.shape is (3,3,3). EricsBroadcastingDoc is a pretty good explanation and has a graphic 3-d example at the end.
For the second example:
>>> print(b)
[[[ 1 5 10]
[ 2 2 7]
[ 1 5 10]]
[[ 3 1 6]
[ 7 3 2]
[ 3 1 6]]
[[ 8 4 1]
[ 0 4 9]
[ 8 4 1]]]
>>> print(i)
[[0 0 0]
[1 2 1]
[2 0 2]]
>>>
The final assignment uses an Index Array or Integer Array Indexing.
In the second example, notice that there was a tie for element a[0,1]: either one or five could have been substituted. np.argmin returns the index of the first minimum, so one was chosen.
To explain wwii's excellent answer in a little more detail:
The idea is to create a new dimension which does the job of comparing each element of a to each element in choices using numpy broadcasting. This is easily done for an arbitrary number of dimensions in a using the ellipsis syntax:
>>> b = np.abs(a[..., np.newaxis] - choices)
array([[[ 1, 5, 10],
[ 1, 5, 10],
[ 1, 5, 10]],
[[ 3, 1, 6],
[ 3, 1, 6],
[ 3, 1, 6]],
[[ 8, 4, 1],
[ 8, 4, 1],
[ 8, 4, 1]]])
Taking argmin along the axis you just created (the last axis, with label -1) gives you the desired index in choices that you want to substitute:
>>> np.argmin(b, axis=-1)
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]])
Which finally allows you to choose those elements from choices:
>>> d = choices[np.argmin(b, axis=-1)]
>>> d
array([[ 1, 1, 1],
[ 5, 5, 5],
[10, 10, 10]])
For a non-symmetric shape:
Let's say a had shape (2, 5):
>>> a = np.arange(10).reshape((2, 5))
>>> a
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
Then you'd get:
>>> b = np.abs(a[..., np.newaxis] - choices)
>>> b
array([[[ 1, 5, 10],
[ 0, 4, 9],
[ 1, 3, 8],
[ 2, 2, 7],
[ 3, 1, 6]],
[[ 4, 0, 5],
[ 5, 1, 4],
[ 6, 2, 3],
[ 7, 3, 2],
[ 8, 4, 1]]])
This is hard to read, but what it's saying is, b has shape:
>>> b.shape
(2, 5, 3)
The first two dimensions came from the shape of a, which is also (2, 5). The last dimension is the one you just created. To get a better idea:
>>> b[:, :, 0] # = abs(a - 1)
array([[1, 0, 1, 2, 3],
[4, 5, 6, 7, 8]])
>>> b[:, :, 1] # = abs(a - 5)
array([[5, 4, 3, 2, 1],
[0, 1, 2, 3, 4]])
>>> b[:, :, 2] # = abs(a - 10)
array([[10, 9, 8, 7, 6],
[ 5, 4, 3, 2, 1]])
Note how b[:, :, i] is the absolute difference between a and choices[i], for each i = 0, 1, 2.
Hope that helps explain this a little more clearly.
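Putting the broadcasting steps above into one function gives something like the following. It is a minimal sketch under stated assumptions: the name nearest_choice is hypothetical, and choices is assumed to be a 1-D array. Ties go to the earlier entry in choices because np.argmin returns the first minimum.

```python
import numpy as np

def nearest_choice(a, choices):
    """Hypothetical helper: replace each element of a with its closest choice."""
    a = np.asarray(a)
    choices = np.asarray(choices)
    # New trailing axis: diff[..., k] is |a - choices[k]| for every element of a.
    diff = np.abs(a[..., np.newaxis] - choices)
    # Index of the closest choice for each element, then look the value up.
    return choices[diff.argmin(axis=-1)]

a = np.array([[0, 3, 0], [4, 8, 4], [9, 1, 9]])
choices = np.array([1, 5, 10])
d = nearest_choice(a, choices)
print(d)
```

Because of the ellipsis, the same function works for the (2, 5) example and for inputs with any number of dimensions, at the cost of a temporary array of shape a.shape + (len(choices),).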
I love broadcasting and would have gone that way myself too. But, with large arrays, I would like to suggest another approach with np.searchsorted that keeps it memory efficient and thus achieves performance benefits, like so -
def searchsorted_app(a, choices):
    lidx = np.searchsorted(choices, a, 'left').clip(max=choices.size-1)
    ridx = (np.searchsorted(choices, a, 'right')-1).clip(min=0)
    cl = np.take(choices,lidx) # Or choices[lidx]
    cr = np.take(choices,ridx) # Or choices[ridx]
    mask = np.abs(a - cl) > np.abs(a - cr)
    cl[mask] = cr[mask]
    return cl
Please note that if the elements in choices are not sorted, we need to add in the additional argument sorter with np.searchsorted.
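For the unsorted case mentioned above, one possible way to use the sorter argument is sketched below. The function name nearest_unsorted is hypothetical, and this is an illustration of the idea rather than the answer author's code: searchsorted returns positions in the sorted order, so they are mapped back through the argsort result before indexing choices.

```python
import numpy as np

def nearest_unsorted(a, choices):
    """Hypothetical variant of the searchsorted approach for unsorted choices."""
    order = np.argsort(choices)   # positions that would sort choices
    n = choices.size
    # Positions are relative to the sorted order because of sorter=order.
    lidx = np.searchsorted(choices, a, side='left', sorter=order).clip(max=n - 1)
    ridx = (np.searchsorted(choices, a, side='right', sorter=order) - 1).clip(min=0)
    cl = choices[order[lidx]]     # nearest choice at or above each value
    cr = choices[order[ridx]]     # nearest choice at or below each value
    # Keep whichever neighbour is closer.
    return np.where(np.abs(a - cl) > np.abs(a - cr), cr, cl)

a = np.array([[0, 4, 9], [11, 6, 2]])
choices = np.array([10, 1, 5])    # deliberately unsorted
result = nearest_unsorted(a, choices)
print(result)
```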
Runtime test -
In [160]: # Setup inputs
...: a = np.random.rand(100,100)
...: choices = np.sort(np.random.rand(100))
...:
In [161]: def broadcasting_app(a, choices): # @wwii's solution
     ...:     return choices[np.argmin(np.abs(a[:,:,None] - choices),-1)]
...:
In [162]: np.allclose(broadcasting_app(a,choices),searchsorted_app(a,choices))
Out[162]: True
In [163]: %timeit broadcasting_app(a, choices)
100 loops, best of 3: 9.3 ms per loop
In [164]: %timeit searchsorted_app(a, choices)
1000 loops, best of 3: 1.78 ms per loop
Related post : Find elements of array one nearest to elements of array two

Python NumPy: Performing different column operations over every N rows

I have a large NumPy array (OriginalArray) with many rows and 8 columns.
I want to create a new array (NewArray) in which each row has the following properties:
Columns 1, 3, 5, and 7 of NewArray are the sum over N rows of columns 1, 3, 5, and 7 of OriginalArray
Columns 2, 4, 6, and 8 of NewArray are the mean over N rows of columns 2, 4, 6, and 8 of OriginalArray
So, the NewArray has 1/N as many rows as the OriginalArray.
For example:
Original Array = [1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 ]
with N = 2
NewArray = [2 1 2 1 2 1 2 1
2 1 2 1 2 1 2 1]
Please excuse the messy formatting. I'm still very new at this (my first question here, actually).
Thanks!
Here's a vectorized approach making heavy use of slicing -
nrows = a.shape[0]//N # a is input array
out = np.empty((nrows,8))
out[:,::2] = a[:,::2].reshape(-1,N,4).sum(1)
out[:,1::2] = a[:,1::2].reshape(-1,N,4).mean(1)
Sample run -
In [64]: a # Input array
Out[64]:
array([[5, 1, 5, 8, 5, 0, 3, 1],
[0, 7, 8, 7, 0, 3, 5, 1],
[8, 6, 6, 4, 1, 6, 1, 2],
[4, 5, 5, 7, 5, 2, 1, 2]])
In [65]: N = 2 # Summing/averaging length
In [66]: a[:,::2] # Select [1,3,5,7] cols
Out[66]:
array([[5, 5, 5, 3],
[0, 8, 0, 5],
[8, 6, 1, 1],
[4, 5, 5, 1]])
In [67]: a[:,::2].reshape(-1,N,4).sum(1) # Sum N rows by splitting axis
Out[67]:
array([[ 5, 13, 5, 8],
[12, 11, 6, 2]])
In [68]: a[:,1::2] # Select [2,4,6,8] cols
Out[68]:
array([[1, 8, 0, 1],
[7, 7, 3, 1],
[6, 4, 6, 2],
[5, 7, 2, 2]])
In [69]: a[:,1::2].reshape(-1,N,4).mean(1) # Similarly average across N rows
Out[69]:
array([[ 4. , 7.5, 1.5, 1. ],
[ 5.5, 5.5, 4. , 2. ]])
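The slicing approach above can be generalized so the column count is not hard-coded. This is a sketch under assumptions: the name sum_mean_blocks is hypothetical, and the row count is assumed to be a multiple of N (the function raises otherwise).

```python
import numpy as np

def sum_mean_blocks(a, N):
    """Hypothetical helper: sum even-indexed columns and average odd-indexed
    columns over consecutive groups of N rows."""
    rows, cols = a.shape
    if rows % N:
        raise ValueError("number of rows must be a multiple of N")
    # Split the row axis into (groups, N) so axis 1 runs over each group.
    blocks = a.reshape(rows // N, N, cols)
    out = blocks.mean(axis=1)                      # float array, mean in every column
    out[:, ::2] = blocks[:, :, ::2].sum(axis=1)    # overwrite columns 1, 3, 5, ... with sums
    return out

a = np.array([[5, 1, 5, 8, 5, 0, 3, 1],
              [0, 7, 8, 7, 0, 3, 5, 1],
              [8, 6, 6, 4, 1, 6, 1, 2],
              [4, 5, 5, 7, 5, 2, 1, 2]])
out = sum_mean_blocks(a, 2)
print(out)
```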
I'm assuming that your original_array (note the PEP8 style) is already formatted in rows and columns. By this I mean, original_array = np.array([[1,1...],[1,...],[1,...],[1,...]])
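An easy one-liner to create a single row of new_array from the first N rows would be as follows (columns 1, 3, 5, and 7 sit at the even 0-based indices):
import numpy as np
row = [np.sum(original_array[:N, x]) if x % 2 == 0 else np.mean(original_array[:N, x]) for x in range(len(original_array[0]))]
Copying that row only reproduces the expected result when every group of N rows is identical, as in your example:
new_array = [row] * (len(original_array) // N)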

numpy subtract every row of matrix by vector

So I have an n x d matrix and an n x 1 vector. I'm trying to write code to subtract the vector from every row in the matrix.
I currently have a for loop that iterates through and subtracts the vector from the i-th row in the matrix. Is there a way to simply subtract the vector from the entire matrix?
Thanks!
Current code:
for i in xrange( len( X1 ) ):
    X[i,:] = X1[i,:] - X2
Here X1[i,:] is the matrix's i-th row and X2 is the vector. Can I make it so that I don't need a for loop?
That works in numpy but only if the trailing axes have the same dimension. Here is an example of successfully subtracting a vector from a matrix:
In [27]: print(m); m.shape
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
Out[27]: (4, 3)
In [28]: print(v); v.shape
[0 1 2]
Out[28]: (3,)
In [29]: m - v
Out[29]:
array([[0, 0, 0],
[3, 3, 3],
[6, 6, 6],
[9, 9, 9]])
This worked because the trailing axis of both had the same dimension (3).
In your case, the leading axes had the same dimension. Here is an example, using the same v as above, of how that can be fixed:
In [35]: print(m); m.shape
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
Out[35]: (3, 4)
In [36]: (m.transpose() - v).transpose()
Out[36]:
array([[0, 1, 2, 3],
[3, 4, 5, 6],
[6, 7, 8, 9]])
The rules for broadcasting axes are explained in depth here.
In addition to @John1024's answer, "transposing" a one-dimensional vector in numpy can be done like this:
In [1]: v = np.arange(3)
In [2]: v
Out[2]: array([0, 1, 2])
In [3]: v = v[:, np.newaxis]
In [4]: v
Out[4]:
array([[0],
[1],
[2]])
From here, subtracting v from every column of m is trivial using broadcasting:
In [5]: print(m)
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
In [6]: m - v
Out[6]:
array([[0, 1, 2, 3],
[3, 4, 5, 6],
[6, 7, 8, 9]])
If you were just creating the vector that gets subtracted, you can also create it with
column_vector = np.array([0,1,2], ndmin=2).T
to get a column vector. This works because ndmin=2 makes the array two-dimensional before it is transposed.
A one-dimensional numpy array has no row or column orientation, so transposing it has no effect.
Then you can just do
each_column_of_matrix_minus_vector = matrix - column_vector
to subtract column_vector from every column of matrix.
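To tie this back to the loop in the question, here is a small sketch comparing the loop with the broadcast form. It assumes the vector holds one value per row of the matrix (length n); the array contents are made up for illustration, and the names loop_result, broadcast_result, row_vector and per_row are hypothetical.

```python
import numpy as np

X1 = np.arange(12).reshape(4, 3)   # n x d matrix with n = 4, d = 3
X2 = np.array([0, 1, 2, 3])        # one value per row (length n)

# Loop version: subtract X2[i] from every entry of row i.
loop_result = np.empty_like(X1)
for i in range(len(X1)):
    loop_result[i, :] = X1[i, :] - X2[i]

# Broadcast version: X2[:, np.newaxis] has shape (4, 1) and is stretched across columns.
broadcast_result = X1 - X2[:, np.newaxis]
print(broadcast_result)

# If the vector instead has one value per column (length d), no reshaping is needed.
row_vector = np.array([10, 20, 30])
per_row = X1 - row_vector
```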

Conditional index in 2d array in python

I have a 2D array, g, like so:
np.array([
[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]
])
So g[0] returns the first row, in other words when I give an index of 0, I get the first row. When I use an index of 1, I get the second row:
g[1] = [5 6 7 8]
and so on.
But I want to return all rows where the index of g is NOT a certain value.
Eg. I want to return g[x] for all x where x != 1.
I know how to use conditional indexing with 1D arrays, but what about 2D arrays? I'm confused here because I'm not putting conditions on what indices to retrieve according to the values, but I need a condition dependent on the indices themselves.
You could use np.arange(len(g)) != 1 to create a boolean index:
In [137]: g
Out[137]:
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
In [138]: g[np.arange(len(g)) != 1]
Out[138]:
array([[ 1, 2, 3, 4],
[ 9, 10, 11, 12]])
If you really want to eliminate just one row, you could, alternatively, use np.concatenate to join two basic slices:
In [143]: np.concatenate([g[:1], g[2:]])
Out[143]:
array([[ 1, 2, 3, 4],
[ 9, 10, 11, 12]])
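For large arrays, the first method appears to be faster in the timings below. Note, however, that the mask in In [153] is built from len(g) rather than len(g2), so it does not match the length of g2; recent NumPy versions raise an IndexError for a mismatched boolean index, and the comparison should be re-run with np.arange(len(g2)):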
In [150]: g2 = np.tile(g, (10000,1))
In [153]: %timeit g2[np.arange(len(g)) != 1]
100000 loops, best of 3: 6.9 µs per loop
In [152]: %timeit np.concatenate([g2[:1], g2[2:]])
10000 loops, best of 3: 51.8 µs per loop
unutbu's answer works, but I find placing the computation in the indices... icky. :/
I would do something like this:
rowsidontwant = [1, 3]
listofrows = [ g[i] for i in filter(lambda x: x not in rowsidontwant, range(len(g))) ]
It's a little more... general. The list of rows may not be what you want, but you can put the data in whatever form you like after that.
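Both ideas from this thread (a mask computed from the indices, and a list of unwanted rows) can be combined without leaving NumPy. This is a hedged sketch: rows_to_drop, keep, masked and deleted are hypothetical names, while np.isin and np.delete are standard NumPy functions.

```python
import numpy as np

g = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])
rows_to_drop = [1]

# Boolean mask over the row indices: True for every row we keep.
keep = ~np.isin(np.arange(len(g)), rows_to_drop)
masked = g[keep]

# np.delete returns a copy of g without the listed rows.
deleted = np.delete(g, rows_to_drop, axis=0)
print(masked)
```

The mask version tolerates indices that do not exist in g, whereas np.delete raises an IndexError for an out-of-range index.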
