I have 2 numpy arrays, one 2D and the other 1D, for example like this:
import numpy as np
a = np.array(
[
[1, 2],
[3, 4],
[5, 6]
]
)
b = np.array(
[7, 8, 9, 10]
)
I want to get all possible combinations of the elements in a and b, treating a like a 1D array, so that it leaves the rows in a intact, but also joins the rows in a with the items in b. It would look something like this:
>>> combine1d(a, b)
[ [1 2 7] [1 2 8] [1 2 9] [1 2 10]
[3 4 7] [3 4 8] [3 4 9] [3 4 10]
[5 6 7] [5 6 8] [5 6 9] [5 6 10] ]
I know that there are slow solutions for this (like a for loop), but I need a fast solution to this as I am working with datasets with millions of integers.
Any ideas?
This is one of those cases where it's easier to build a higher dimensional object, and then fix the axes when you're done. The first two dimensions are the length of b and the length of a. The third dimension is the number of elements in each row of a plus 1. We can then use broadcasting to fill in this array.
x, y = a.shape
z, = b.shape
result = np.empty((z, x, y + 1), dtype=np.result_type(a, b))  # match the inputs' dtype instead of the float64 default
result[...,:y] = a
result[...,y] = b[:,None]
At this point, to get the exact answer you asked for, you'll need to swap the first two axes, and then merge those two axes into a single axis.
result.swapaxes(0, 1).reshape(-1, y + 1)
An hour later. . . .
I realized by being a little bit more clever, I didn't need to swap axes. This also has the nice benefit that the result is a contiguous array.
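def convert1d(a, b):
    x, y = a.shape
    z, = b.shape
    # Use the inputs' common dtype; np.empty defaults to float64 otherwise.
    result = np.empty((x, z, y + 1), dtype=np.result_type(a, b))
    result[...,:y] = a[:,None,:]
    result[...,y] = b
    return result.reshape(-1, y + 1)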
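For comparison, here is a minimal sketch that builds the same rows with np.repeat and np.tile instead of filling a preallocated array. The name combine1d comes from the question; everything else is an assumption about how one might cross-check the broadcasting version, not a claim that it is faster.

```python
import numpy as np

def combine1d(a, b):
    # Repeat each row of a once per element of b: rows 0,0,0,0,1,1,1,1,...
    left = np.repeat(a, len(b), axis=0)
    # Cycle through b once per row of a: 7,8,9,10,7,8,9,10,...
    right = np.tile(b, len(a))
    # Attach the b values as one extra column.
    return np.column_stack((left, right))

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([7, 8, 9, 10])
out = combine1d(a, b)
print(out.shape)  # (12, 3)
```

Both versions should give the same 12 rows in the same order, so either one can serve as a test oracle for the other.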
This is a very "scotch tape" solution:
import numpy as np
a = np.array(
[
[1, 2],
[3, 4],
[5, 6]
]
)
b = np.array(
[7, 8, 9, 10]
)
z = []
# Loop over the rows of a first so each row's combinations stay together.
for y in a:
    for x in b:
        z.append(np.append(y, x))
np.array(z).reshape(3, 4, 3)
You can use np.c_ to join the two arrays column-wise. I also used np.full to build a column from each element of the second array (b). The result looks like this:
result = [np.c_[a, np.full((a.shape[0],1), x)] for x in b]
result
Output
[array([[1, 2, 7],
[3, 4, 7],
[5, 6, 7]]),
array([[1, 2, 8],
[3, 4, 8],
[5, 6, 8]]),
array([[1, 2, 9],
[3, 4, 9],
[5, 6, 9]]),
array([[ 1, 2, 10],
[ 3, 4, 10],
[ 5, 6, 10]])]
The output might look a bit messy, but it contains the same rows as your desired output, grouped by the element of b rather than by the row of a. To check, you can run the following to see the first element of the result list:
print(result[0])
Output
[[1 2 7]
 [3 4 7]
 [5 6 7]]
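If a single 2D array in the row order of the question is needed, the list can be stacked and reshaped. A small sketch, assuming result is the list built above (it is rebuilt here so the snippet runs on its own; the name combined is made up for illustration):

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([7, 8, 9, 10])
result = [np.c_[a, np.full((a.shape[0], 1), x)] for x in b]

# Stack along a new middle axis so the shape is (rows of a, len(b), 3),
# then flatten the first two axes: all b values for row 0 come first.
combined = np.stack(result, axis=1).reshape(-1, a.shape[1] + 1)
print(combined[:4])
```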
I have two arrays:
In [32]: a
Out[32]:
array([[1, 2, 3],
[2, 3, 4]])
In [33]: b
Out[33]:
array([[ 8, 9],
[ 9, 10]])
I would like to get the following:
In [35]: c
Out[35]:
array([[ 1, 2, 3, 8, 9],
[ 2, 3, 4, 9, 10]])
i.e. append the first and second value of b[0] = array([8, 9]) as the last two values of a[0]
and append the first and second value of b[1] = array([9,10]) as the last two values of a[1].
The second answer in this link: How to add multiple extra columns to a NumPy array does not work and I do not understand the accepted answer.
You could try with np.hstack:
a=np.array([[1, 2, 3],
[2, 3, 4]])
b=np.array([[ 8, 9],
[ 9, 10]])
print(np.hstack((a,b)))
output:
[[ 1 2 3 8 9]
[ 2 3 4 9 10]]
Alternatively, you can use the first answer in the link you attached, which is reported to be faster than concatenate (in G.Anderson's timings, concatenate was the fastest of the one-liners). Since you said you did not understand it, here is an explanation:
# First create an array with the same shape as the expected concatenated output:
res = np.zeros((2,5),int)
res
[[0 0 0 0 0]
[0 0 0 0 0]]
# Then assign the first array to res[:,:3], i.e. the first 3 elements of each row
res[:,:3]
[[0 0 0]
[0 0 0]]
res[:,:3]=a #assign
res[:,:3]
[[1, 2, 3]
[2, 3, 4]]
# Then assign the second array to res[:,3:], i.e. the last two elements of each row
res[:,3:]
[[0 0]
[0 0]]
res[:,3:]=b #assign
res[:,3:]
[[ 8, 9]
[ 9, 10]]
#And finally:
res
[[ 1 2 3 8 9]
[ 2 3 4 9 10]]
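The steps above can be wrapped into a small function. This is only a sketch: hstack_prealloc is a hypothetical helper name, and it assumes both inputs are 2D with the same number of rows.

```python
import numpy as np

def hstack_prealloc(a, b):
    # Hypothetical helper; assumes a and b are 2D with equal row counts.
    rows, a_cols = a.shape
    # Allocate the full result once, using a dtype that can hold both inputs.
    res = np.empty((rows, a_cols + b.shape[1]), dtype=np.result_type(a, b))
    res[:, :a_cols] = a   # left block: the columns of a
    res[:, a_cols:] = b   # right block: the columns of b
    return res

a = np.array([[1, 2, 3], [2, 3, 4]])
b = np.array([[8, 9], [9, 10]])
c = hstack_prealloc(a, b)
print(c)
```

Using np.empty instead of np.zeros skips the zero fill, which is safe here because every element is overwritten.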
You can do concatenate:
np.concatenate([a,b], axis=1)
Output:
array([[ 1, 2, 3, 8, 9],
[ 2, 3, 4, 9, 10]])
You can use np.append with the axis parameter to join two arrays along a given axis:
np.append(a,b, axis=1)
array([[ 1, 2, 3, 8, 9],
[ 2, 3, 4, 9, 10]])
Adding timings for the top three answers, for completeness' sake. Note that these timings will vary with the machine running the code, and may scale at different rates for different array sizes.
%timeit np.append(a,b, axis=1)
2.81 µs ± 438 ns per loop
%timeit np.concatenate([a,b], axis=1)
2.32 µs ± 375 ns per loop
%timeit np.hstack((a,b))
4.41 µs ± 489 ns per loop
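To repeat the comparison outside IPython and on larger inputs, something like the following could be used. It is a rough sketch: the array sizes, the seed and the candidates dictionary are arbitrary choices, and no particular ranking is implied.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 100, size=(1000, 3))
b = rng.integers(0, 100, size=(1000, 2))

candidates = {
    "append": lambda: np.append(a, b, axis=1),
    "concatenate": lambda: np.concatenate([a, b], axis=1),
    "hstack": lambda: np.hstack((a, b)),
}

# Keep one result per method so they can be checked against each other.
results = {name: fn() for name, fn in candidates.items()}

for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=1000)
    print(f"{name}: {seconds / 1000 * 1e6:.2f} µs per call")
```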
From the NumPy documentation for numpy.concatenate:
Join a sequence of arrays along an existing axis.
From the question, I understand that this is what you want:
import numpy as np
a = np.array([[1, 2, 3],
[2, 3, 4]])
b = np.array([[ 8, 9],
[ 9, 10]])
c = np.concatenate((a, b), axis=1)
print ("a: ", a)
print ("b: ", b)
print ("c: ", c)
output:
a: [[1 2 3]
[2 3 4]]
b: [[ 8 9]
[ 9 10]]
c: [[ 1 2 3 8 9]
[ 2 3 4 9 10]]
I have several 3-dimensional arrays. From each one I want to take the value at the same position and copy it into an array named after that position.
E.g. I have three 2x2x2 arrays and I want to take the value at position (1,1,1) of each of them and copy it into an array called 111array. This array should then contain three values.
The same should be done for every position in the arrays.
I have a for loop which iterates over all values in one array, but I don't know how to save the results so that each array's name contains the position.
My first array is called b.
for i in range(b.shape[0]):
    for j in range(b.shape[1]):
        for k in range(b.shape[2]):
            print(b[i,j,k])
Looking for help!
Looks like someone else beat me to an answer, but here is another way of doing it. I used a dictionary to corral all the arrays and return it from a function.
import numpy as np
b = np.array([0, 1, 2, 3, 4, 5, 6, 7])
b = np.reshape(b, (2, 2, 2))
print(b, type(b))
# [[[0 1],
# [2 3]],
# [[4 5],
# [6 7]]] <class 'numpy.ndarray'>
def myfunc(arr):
    # Walk over every position of the array passed in (arr, not the global b).
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            for k in range(arr.shape[2]):
                # Create a new array name from string parts.
                name = "arr"+str(i)+str(j)+str(k)
                print(name, arr[i, j, k])
                # Example: 'arr000', 0.
                # Add a new key-value pair to the (module-level) dictionary.
                mydict.update({name: arr[i,j,k]})
    return mydict
mydict = {}
result = myfunc(b)
print(result)
# {'arr000': 0, 'arr001': 1, 'arr010': 2, 'arr011': 3, 'arr100': 4,
# 'arr101': 5, 'arr110': 6, 'arr111': 7}
# You would need to unpack the dictionary to use the arrays separately.
# use "mydict.keys()" to get all array names.
# "for key in keys" to loop through all array names.
# mydict['arr000'] will return the value 0.
Your question tags "numpy" but does not use it in your code snippet. If you are trying to stick with numpy, there is another method called "structured data array". It's similar to a dictionary in that "name" and "value" can be stored as paired sets in a numpy array. This keeps numpy's efficient memory management and fast calculation (C optimization). This matters if you are working with large datasets.
Also if working with numpy, there may be a way to use the index values in variable names.
Later, I will think of examples for both and update my answer if possible.
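One possible shape for the structured-array idea, as a rough sketch. The three input arrays, the field names arr000 ... arr111 and the variable names are assumptions for illustration: each record corresponds to one source array and each named field to one position.

```python
import numpy as np

a = np.arange(8).reshape(2, 2, 2)
b = a + 1
c = a + 2

stacked = np.stack([a, b, c])          # shape (3, 2, 2, 2)
flat = stacked.reshape(3, -1)          # shape (3, 8), positions in C order

# One field name per position, in the same C order as the reshape above.
names = [f"arr{i}{j}{k}" for i, j, k in np.ndindex(a.shape)]

# One record per source array, one named field per position.
records = np.zeros(3, dtype=[(n, flat.dtype) for n in names])
for col, n in enumerate(names):
    records[n] = flat[:, col]

print(records["arr111"])  # [7 8 9]
```

Accessing a field by name returns a view with one value per source array, so no separate variables are needed.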
See if this is what you want. This is based on your example.
import numpy as np
from itertools import product
a = np.arange(8).reshape(2,2,2)
b = a + 1
c = a + 2
indices = product(range(2), repeat=3)
all_arrays = []
for i in indices:
    suffix = ''.join(map(str,i))
    array_name = 'array'+suffix
    value = np.array([a[i],b[i],c[i]])
    exec(array_name+'= value')
    exec(f'all_arrays.append({array_name})')
for name in all_arrays:
    print(name)
print('\n')
print(all_arrays)
print('\n')
print(array111)
print('\n')
print(array101)
Output:
[0 1 2]
[1 2 3]
[2 3 4]
[3 4 5]
[4 5 6]
[5 6 7]
[6 7 8]
[7 8 9]
[array([0, 1, 2]), array([1, 2, 3]), array([2, 3, 4]), array([3, 4, 5]), array([4, 5, 6]), array([5, 6, 7]), array([6, 7, 8]), array([7, 8, 9])]
[7 8 9]
[5 6 7]
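The same values can be collected without exec by keeping them in a dictionary keyed by the generated name. A minimal sketch, assuming the same a, b and c as above (the names stacked and arrays are made up here):

```python
import numpy as np

a = np.arange(8).reshape(2, 2, 2)
b = a + 1
c = a + 2

stacked = np.stack([a, b, c])   # shape (3, 2, 2, 2)
arrays = {}
for idx in np.ndindex(a.shape):
    name = "array" + "".join(map(str, idx))
    # stacked[:, i, j, k] collects the value at (i, j, k) from every input.
    arrays[name] = stacked[(slice(None),) + idx]

print(arrays["array111"])  # [7 8 9]
print(arrays["array101"])  # [5 6 7]
```

A dictionary keeps the names as data, so they can be looped over or built at runtime without touching the namespace.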
As others have pointed out, this seems like a weird request. But just for fun, here's a shorter solution:
In [1]: import numpy as np
...: A = np.arange(8).reshape((2,2,2))
...: B = 10*A
...: C = 100*A
In [2]: A
Out[2]:
array([[[0, 1],
[2, 3]],
[[4, 5],
[6, 7]]])
In [3]: D = np.concatenate((A[None], B[None], C[None]))
...: for (a,b,c) in np.ndindex((2,2,2)):
...:     locals()[f'array{a}{b}{c}'] = D[:,a,b,c]
...:
In [4]: array000
Out[4]: array([0, 0, 0])
In [5]: array001
Out[5]: array([ 1, 10, 100])
In [6]: array010
Out[6]: array([ 2, 20, 200])
In [7]: array011
Out[7]: array([ 3, 30, 300])
In [8]: array100
Out[8]: array([ 4, 40, 400])
In [9]: array101
Out[9]: array([ 5, 50, 500])
In [10]: array110
Out[10]: array([ 6, 60, 600])
In [11]: array111
Out[11]: array([ 7, 70, 700])
I would like to translate a matlab code into a python one. The matlab code is equivalent to the following toy example:
a = [1 2 3; 4 5 6; 7 8 9]
b = a(:, ones(1,3))
It returns
a =
1 2 3
4 5 6
7 8 9
b =
1 1 1
4 4 4
7 7 7
I tried to translate it like this:
from numpy import array
from numpy import ones
a = array([ [1,2,3], [4,5,6], [7,8,9] ])
b = a[:, ones((1,3))]
but it returns the following error message:
Traceback (most recent call last):
File "example_slice.py", line 6, in <module>
b =a[:, ones((1,3))]
IndexError: arrays used as indices must be of integer (or boolean) type
EDIT: maybe ones should be replaced by zeros in this particular case but it is not the problem here. The question deals with the problem of giving a list containing the same index many times to the array a in order to get the same array b as the one computed with Matlab.
The MATLAB code can also be written (more idiomatically and more clearly) as:
b = repmat(a(:,1),1,3);
In NumPy you'd write:
b = np.tile(a[:,None,0],(1,3))
(Note the None needed to preserve the orientation of the vector extracted).
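Two other ways to get the same result, shown as a small sketch: slicing with :1 keeps the column two-dimensional without needing None, np.repeat copies it, and np.broadcast_to gives a read-only view instead of a copy. The variable names are illustrative only.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

first_col = a[:, :1]                          # shape (3, 1); the slice keeps it 2D
b_repeat = np.repeat(first_col, 3, axis=1)    # a real (3, 3) copy
b_view = np.broadcast_to(first_col, (3, 3))   # read-only view, no data copied

print(b_repeat)
```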
You could use list comprehension with np.full() to create arrays of certain values.
import numpy as np
a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = np.array([np.full(len(i), i[0]) for i in a])
print(b)
Output:
[[1 1 1]
[4 4 4]
[7 7 7]]
In [568]: a = np.array([ [1,2,3], [4,5,6], [7,8,9] ])
In [569]: a[:,0]
Out[569]: array([1, 4, 7])
In [570]: a[:,[0,0,0]]
Out[570]:
array([[1, 1, 1],
[4, 4, 4],
[7, 7, 7]])
In [571]: a[:, np.zeros(3, dtype=int)] # int dtype to avoid your error
Out[571]:
array([[1, 1, 1],
[4, 4, 4],
[7, 7, 7]])
====
In [572]: np.zeros(3)
Out[572]: array([0., 0., 0.])
In [573]: np.zeros(3, int)
Out[573]: array([0, 0, 0])
Earlier numpy versions allowed float indices, but newer ones have tightened the requirement.
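When the index array comes from a direct translation of MATLAB code, it is typically a float array of 1-based indices. A short sketch of the conversion, assuming a recent NumPy where float index arrays raise IndexError (matlab_style and raised are names invented for this example):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

matlab_style = np.ones(3)   # float array of 1-based indices, as in the MATLAB code
try:
    a[:, matlab_style]
    raised = False
except IndexError:
    raised = True

# Convert to an integer dtype and shift from 1-based to 0-based indexing.
idx = matlab_style.astype(int) - 1
b = a[:, idx]
print(raised)
print(b)
```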
I am learning NumPy and I want to understand the following data-shuffling code:
# x is a m*n np.array
# return a shuffled-rows array
def shuffle_col_vals(x):
    rand_x = np.array([np.random.choice(x.shape[0], size=x.shape[0], replace=False) for i in range(x.shape[1])]).T
    grid = np.indices(x.shape)
    rand_y = grid[1]
    return x[(rand_x, rand_y)]
So I pass in an np.array object like this:
x1 = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16]], dtype=int)
And I get the following output from shuffle_col_vals(x1):
array([[ 1, 5, 11, 15],
[ 3, 8, 9, 14],
[ 4, 6, 12, 16],
[ 2, 7, 10, 13]], dtype=int64)
I am confused by the way rand_x is initialized; I have not seen this construction with numpy.array before.
I have been thinking about it for a long time, but I still don't understand why return x[(rand_x, rand_y)] gives a shuffled-rows array.
If you don't mind, could anyone explain the code to me?
Thanks in advance.
In indexing Numpy arrays, you can take single elements. Let's use a 3x4 array to be able to differentiate between the axes:
In [1]: x1 = np.array([[1, 2, 3, 4],
...: [5, 6, 7, 8],
...: [9, 10, 11, 12]], dtype=int)
In [2]: x1[0, 0]
Out[2]: 1
If you review Numpy Advanced indexing, you will find that you can do more in indexing, by providing lists for each dimension. Consider indexing with x1[rows..., cols...], let's take two elements.
Pick from the first and second row, but always from the first column:
In [3]: x1[[0, 1], [0, 0]]
Out[3]: array([1, 5])
You can even index with arrays:
In [4]: x1[[[0, 0], [1, 1]], [[0, 1], [0, 1]]]
Out[4]:
array([[1, 2],
[5, 6]])
np.indices creates a row and col array, that if used for indexing, give back the original array:
In [5]: grid = np.indices(x1.shape)
In [6]: np.array_equal(x1[grid[0], grid[1]], x1)  # np.alltrue is gone in recent NumPy
Out[6]: True
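To make the role of the two index arrays concrete, here is a small sketch: element [i, j] of the result is x1[rows[i, j], cols[i, j]], so rearranging the row-index array rearranges where values are read from. The names rows, cols, same and swapped are chosen here for illustration.

```python
import numpy as np

x1 = np.arange(1, 13).reshape(3, 4)
rows, cols = np.indices(x1.shape)   # both have shape (3, 4)

# With the untouched grid, indexing reproduces the original array.
same = x1[rows, cols]

# Swapping the first two rows of the row-index array swaps the
# matching rows of the result; the column indices are unchanged.
rows_swapped = rows[[1, 0, 2]]
swapped = x1[rows_swapped, cols]
print(swapped)
```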
Now if you shuffle the values of grid[0] col-wise, but keep grid[1] as-is, and then use these for indexing, you get an array with the values of the columns shuffled.
Each column of grid[0] is the row-index vector [0, 1, 2]. The code shuffles this vector independently for each column and stacks the results into rand_x, which has the same shape as x1.
Create a single shuffled column index vector:
In [7]: np.random.seed(0)
In [8]: np.random.choice(x1.shape[0], size=x1.shape[0], replace=False)
Out[8]: array([2, 1, 0])
The stacking works by (pseudo-code) stacking with [random-index-col-vec for cols in range(x1.shape[1])] and then transposing (.T).
To make it a little clearer we can rewrite i as col and use column_stack instead of np.array([... for col]).T:
In [9]: np.random.seed(0)
In [10]: col_list = [np.random.choice(x1.shape[0], size=x1.shape[0], replace=False)
for col in range(x1.shape[1])]
In [11]: col_list
Out[11]: [array([2, 1, 0]), array([2, 0, 1]), array([0, 2, 1]), array([2, 0, 1])]
In [12]: rand_x = np.column_stack(col_list)
In [13]: rand_x
Out[13]:
array([[2, 2, 0, 2],
[1, 0, 2, 0],
[0, 1, 1, 1]])
In [14]: x1[rand_x, grid[1]]
Out[14]:
array([[ 9, 10, 3, 12],
[ 5, 2, 11, 4],
[ 1, 6, 7, 8]])
Details to note:
The example output you give is different from what the function you provide does. It seems to be transposed.
The use of rand_x and rand_y in the sample code can be confusing if you are used to the convention of x = column index, y = row index.
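The same per-column shuffle can be written more compactly. This is a sketch of an alternative technique, not the code from the question: it draws the permutations by argsorting random numbers and applies them with np.take_along_axis, and it takes a Generator argument rng, which is an assumption added here.

```python
import numpy as np

def shuffle_col_vals(x, rng):
    # Sorting random numbers down each column yields an independent
    # permutation of the row indices for every column.
    rand_rows = np.argsort(rng.random(x.shape), axis=0)
    # take_along_axis picks x[rand_rows[i, j], j] for every (i, j).
    return np.take_along_axis(x, rand_rows, axis=0)

rng = np.random.default_rng(0)
x1 = np.arange(1, 13).reshape(3, 4)
shuffled = shuffle_col_vals(x1, rng)
print(shuffled)
```

Every column of the result holds the same values as the matching column of x1, only in a different order.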
See output:
import numpy as np
def shuffle_col_val(x):
    print("----------------------------\n A rand_x\n")
    f = np.random.choice(x.shape[0], size=x.shape[0], replace=False)
    print(f, "\nNow I transpose an array.")
    rand_x = np.array([f]).T
    print(rand_x)
    print("----------------------------\n B rand_y\n")
    print("Grid gives you two possibilities\n you choose second:")
    grid = np.indices(x.shape)
    print(format(grid))
    rand_y = grid[1]
    print("\n----------------------------\n C Our rand_x, rand_y:")
    print("\nThe order of values in the column CHANGE:\n has random order\n{}".format(rand_x))
    print("\nThe order of values in the row NO CHANGE:\n has normal order 0, 1, 2, 3\n{}".format(rand_y))
    return x[(rand_x, rand_y)]
x1 = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16]], dtype=int)
print("\n----------------------------\n D Our shuffled-rows: \n{}\n".format(shuffle_col_val(x1)))
Output:
A rand_x
[2 3 0 1]
Now I transpose an array.
[[2]
[3]
[0]
[1]]
----------------------------
B rand_y
Grid gives you two possibilities, you choose second:
[[[0 0 0 0]
[1 1 1 1]
[2 2 2 2]
[3 3 3 3]]
[[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]]]
----------------------------
C Our rand_x, rand_y:
The order of values in the column CHANGE: has random order
[[2]
[3]
[0]
[1]]
The order of values in the row NO CHANGE: has normal order 0, 1, 2, 3
[[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]]
----------------------------
D Our shuffled-rows:
[[ 9 10 11 12]
[13 14 15 16]
[ 1 2 3 4]
[ 5 6 7 8]]
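Note that this version draws a single permutation and broadcasts it across all columns, so whole rows move together, as the output above shows. Under that reading, the effect can be sketched with one permutation of the row indices (rng, order and shuffled_rows are names chosen here):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = np.arange(1, 17).reshape(4, 4)

order = rng.permutation(x1.shape[0])   # one random ordering of the row indices
shuffled_rows = x1[order]              # rows move as units; columns stay aligned
print(order)
print(shuffled_rows)
```

To shuffle each column independently, as the function in the question does, a separate permutation per column is needed.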