I have a NumPy array with data [3, 5], and I want to create a function that takes this array and returns the following (2 x 3 x 2) NumPy array:
[[[3, 3],
[3, 3],
[3, 3]],
[[5, 5],
[5, 5],
[5, 5]]]
However, I have not been able to achieve this using NumPy's repeat() or tile() functions.
For example:
x = np.array([3, 5])
y = np.repeat(x, [2, 3, 2])
Gives the following error:
ValueError: a.shape[axis] != len(repeats)
And:
x = np.array([3, 5])
y = np.tile(x, [2, 3, 2])
Creates a (2 x 3 x 4) array:
[[[3, 5, 3, 5],
[3, 5, 3, 5],
[3, 5, 3, 5]],
[[3, 5, 3, 5],
[3, 5, 3, 5],
[3, 5, 3, 5]]]
What should my function be?
You could use np.tile; you just need to divide the repetition count along the repeated axis by the number of elements of x along that axis (in your case x is 1D):
import numpy as np
x = np.array([3, 5])
y = np.tile(x, [2, 3, 2 // x.shape[0]])
def get_nd(a, shape):
    shape = np.array(shape)
    # shape of a, left-padded with 1s to the target number of dimensions
    a_shape = np.ones_like(shape)
    a_shape[-a.ndim:] = a.shape
    # number of copies of a needed along each axis
    shape = shape // a_shape
    return np.tile(a, shape)
get_nd(x, (2, 3, 2))
Update
Tiling alone repeats x as a whole, so every innermost pair is [3, 5]. To repeat each element instead, ask for the reversed target shape and transpose the result: if you are targeting (2, 3, 6), ask for (6, 3, 2):
get_nd(x, (6, 3, 2)).T
Or use the following function instead, which does the reversal and transpose for you:
def get_nd_rep(a, shape):
    # tile for the reversed target shape, then transpose back
    shape = np.array(shape[::-1])
    x_shape = np.ones_like(shape)
    x_shape[-a.ndim:] = a.shape
    shape = shape // x_shape
    return np.tile(a, shape).T
get_nd_rep(x, (2, 3, 2))
The signature for repeat: np.repeat(a, repeats, axis=None)
It can be used as:
In [345]: np.repeat(x, [2,3])
Out[345]: array([3, 3, 5, 5, 5])
In other words, if repeats is a list, it must match a in size: each entry says how many times the corresponding element is repeated.
While we could expand the dimensions of x and try repeat (or tile), a simpler approach is to repeat each element 6 times (3 * 2) and reshape:
In [349]: np.repeat(x,6)
Out[349]: array([3, 3, 3, 3, 3, 3, 5, 5, 5, 5, 5, 5])
In [350]: np.repeat(x,6).reshape(2,3,2)
Out[350]:
array([[[3, 3],
[3, 3],
[3, 3]],
[[5, 5],
[5, 5],
[5, 5]]])
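Since the question asks for a function, the repeat-and-reshape idea can be wrapped up in one. This is a minimal sketch, assuming the first entry of the target shape equals len(x). The name repeat_fill is hypothetical, and math.prod needs Python 3.8 or later:

```python
import math
import numpy as np

def repeat_fill(x, shape):
    """Hypothetical helper: block i of the result is filled with x[i].

    Assumes shape[0] == len(x).
    """
    x = np.asarray(x)
    if shape[0] != x.shape[0]:
        raise ValueError("shape[0] must equal len(x)")
    # Each element of x has to fill every cell of the trailing dimensions.
    block_size = math.prod(shape[1:])
    return np.repeat(x, block_size).reshape(shape)

y = repeat_fill(np.array([3, 5]), (2, 3, 2))
```

Because np.repeat keeps copies of the same element contiguous, reshaping puts each element's copies into its own block along the first axis.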
multiple repeats
Another approach is to expand x to 3d, and apply 2 repeats. I had to try several things before I got it right:
In [357]: x[:,None,None]
Out[357]:
array([[[3]],
[[5]]])
In [358]: x[:,None,None].repeat(2,2)
Out[358]:
array([[[3, 3]],
[[5, 5]]])
In [359]: x[:,None,None].repeat(2,2).repeat(3,1)
Out[359]:
array([[[3, 3],
[3, 3],
[3, 3]],
[[5, 5],
[5, 5],
[5, 5]]])
tile
np.tile does something similar (multiple repeats):
In [361]: np.tile(x[:,None,None],(1,3,2))
Out[361]:
array([[[3, 3],
[3, 3],
[3, 3]],
[[5, 5],
[5, 5],
[5, 5]]])
To use repeat or tile, I usually need to review the docs and/or try several things, especially when expanding the dimensions.
broadcasting
The problem is even simpler with broadcasting:
In [370]: res = np.zeros((2,3,2), int)
In [371]: res[:] = x[:,None,None]
In [372]: res
Out[372]:
array([[[3, 3],
[3, 3],
[3, 3]],
[[5, 5],
[5, 5],
[5, 5]]])
Though getting the expansion of x's dimensions right took a few tries.
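np.broadcast_to, which the answer above does not use, is another way to produce the same broadcast result without preallocating res. The sketch below shows it. The returned array is a read-only view, so .copy() is needed if you intend to modify it:

```python
import numpy as np

x = np.array([3, 5])
# x[:, None, None] has shape (2, 1, 1); broadcast_to stretches the two
# size-1 axes to (3, 2) without copying any data.
view = np.broadcast_to(x[:, None, None], (2, 3, 2))
# The view is read-only; make a writable copy.
res = view.copy()
```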
from functools import reduce
import numpy as np
import operator as op
DIM = (2, 3, 2) # size of each dimension
CAP = reduce(op.mul, DIM[1:])  # product of DIM[1:], i.e. cells per element of x
x = np.array([3, 5])
y = np.repeat(x, CAP).reshape(*DIM)
This generates, for each element of x, a 3x2 block filled with that element.
DIM[0] must equal len(x); otherwise np.reshape raises an exception, because the number of elements produced by np.repeat (len(x) * CAP) does not match the requested shape (see the NumPy docs).
A related size mismatch caused the ValueError in your case: when repeats is a list, it must have one entry per element of x, and [2, 3, 2] has three entries for a two-element array.
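The repeats rule can be checked directly. A scalar applies to every element, and a list needs exactly one count per element. The exact wording of the error message varies between NumPy versions, so the sketch only prints it:

```python
import numpy as np

x = np.array([3, 5])

# A scalar repeats value applies to every element.
print(np.repeat(x, 3))       # [3 3 3 5 5 5]

# A list of repeats must have one entry per element of x.
print(np.repeat(x, [2, 3]))  # [3 3 5 5 5]

# Three counts for two elements raises a ValueError.
try:
    np.repeat(x, [2, 3, 2])
except ValueError as err:
    print("ValueError:", err)
```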
Related
I have a NumPy array containing several inner arrays. I want to group their elements by position within each inner array.
For example, if the matrix is:
[[1, 2], [2, 3], [4, 5], [6, 7]]
then the code should return:
[[1, 2, 4, 6], [2, 3, 5, 7]]
This is because 1, 2, 4, 6 are the first elements of their respective arrays, and 2, 3, 5, 7 are the second elements.
Does anyone know a function that could do this in Python?
Using NumPy's transpose should do the trick:
import numpy as np
a = np.array([[1, 2], [2, 3], [4, 5], [6, 7]])
a_t = a.T
print(a_t)
# [[1 2 4 6]
#  [2 3 5 7]]
Your data as a list:
In [101]: alist = [[1, 2], [2, 3], [4, 5], [6, 7]]
In [102]: alist
Out[102]: [[1, 2], [2, 3], [4, 5], [6, 7]]
and as a numpy array:
In [103]: arr = np.array(alist)
In [104]: arr
Out[104]:
array([[1, 2],
[2, 3],
[4, 5],
[6, 7]])
A standard idiom for 'transposing' lists is:
In [105]: list(zip(*alist))
Out[105]: [(1, 2, 4, 6), (2, 3, 5, 7)]
With arrays, there's a transpose method:
In [106]: arr.transpose()
Out[106]:
array([[1, 2, 4, 6],
[2, 3, 5, 7]])
The original array has shape (4, 2); its transpose has shape (2, 4).
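Putting the two idioms side by side confirms that they group the same values. zip(*rows) yields one tuple per column, and .T swaps the two axes of the array:

```python
import numpy as np

alist = [[1, 2], [2, 3], [4, 5], [6, 7]]

# Pure-Python transpose: zip(*rows) yields one tuple per column.
cols_list = [list(col) for col in zip(*alist)]

# NumPy transpose: .T turns shape (4, 2) into (2, 4).
cols_arr = np.array(alist).T
```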
I am trying to pair up 2 NumPy arrays element-wise as [x, y], similar to what zip does for lists and tuples.
I have 2 arrays as follows:
arr1 = [1, 2, 3, 4]
arr2 = [5, 6, 7, 8]
I am looking for the output np.array([[[1, 5], [2, 6], [3, 7], [4, 8]]])
I tried the following, but it pairs every value with every other value instead of pairing values at the same index. I could add if conditions, but is there a way to do this without them?
res = [[a1, a2] for a1 in arr1 for a2 in arr2]
You are looking for np.dstack
Stack arrays in sequence depth wise (along third axis).
np.dstack([arr1, arr2])
array([[[1, 5],
[2, 6],
[3, 7],
[4, 8]]])
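If the extra leading axis that np.dstack adds to 1-D input is not needed, np.stack with axis=-1 (a swapped-in alternative, not part of the answer above) gives the (4, 2) pairs directly. np.column_stack does the same for 1-D inputs:

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([5, 6, 7, 8])

# Stack along a new last axis: shape (4, 2), one [x, y] pair per row.
pairs = np.stack([arr1, arr2], axis=-1)
pairs_cols = np.column_stack([arr1, arr2])

# np.dstack also adds a leading axis for 1-D input: shape (1, 4, 2).
pairs_3d = np.dstack([arr1, arr2])
```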
IIUC, one way is to use numpy.vstack() followed by transpose():
import numpy as np
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([5, 6, 7, 8])
print(np.vstack([arr1, arr2]).transpose())
#[[1 5]
# [2 6]
# [3 7]
# [4 8]]
Or you could pass the output of zip to the array constructor (in Python 3, zip returns an iterator, so convert it to a list first):
print(np.array(list(zip(arr1, arr2))))
#[[1 5]
# [2 6]
# [3 7]
# [4 8]]
The built-in zip function does the job here: it pairs up elements that share the same index (note that it yields tuples, not a NumPy array).
arr1 = [1,2,3,4]
arr2 = [5,6,7,8]
list(zip(arr1, arr2))
[(1, 5), (2, 6), (3, 7), (4, 8)]
https://docs.python.org/3/library/functions.html#zip
I'm trying to find a way to fill an array with rows of values. It's much easier to express my desired output with an example. Given the input of an N x M matrix, array1,
array1 = np.array([[2, 3, 4],
[4, 8, 3],
[7, 6, 3]])
I would like to output an array of arrays in which each row becomes an N x M block made of N copies of that row. The output would be
[[[2, 3, 4],
[2, 3, 4],
[2, 3, 4]],
[[4, 8, 3],
[4, 8, 3],
[4, 8, 3]],
[[7, 6, 3],
[7, 6, 3],
[7, 6, 3]]]
You can reshape the array from 2d to 3d, then use numpy.repeat() along the desired axis:
np.repeat(array1[:, None, :], 3, axis=1)
#array([[[2, 3, 4],
# [2, 3, 4],
# [2, 3, 4]],
# [[4, 8, 3],
# [4, 8, 3],
# [4, 8, 3]],
# [[7, 6, 3],
# [7, 6, 3],
# [7, 6, 3]]])
Or equivalently you can use numpy.tile:
np.tile(array1[:, None, :], (1,3,1))
Another solution, which is sometimes useful, is to preallocate the output and assign the expanded array into it via broadcasting:
out = np.empty((3,3,3), dtype=array1.dtype)
out[...] = array1[:, None, :]
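The three approaches above can be checked against each other. The sketch also adds np.broadcast_to, which is not used in the answers above and returns a read-only view instead of a new array. It assumes, as in the example, that the number of copies equals the number of rows N:

```python
import numpy as np

array1 = np.array([[2, 3, 4],
                   [4, 8, 3],
                   [7, 6, 3]])
n, m = array1.shape

# array1[:, None, :] has shape (N, 1, M); each method expands the
# middle axis to N copies of each row.
via_repeat = np.repeat(array1[:, None, :], n, axis=1)
via_tile = np.tile(array1[:, None, :], (1, n, 1))
via_broadcast = np.broadcast_to(array1[:, None, :], (n, n, m))

out = np.empty((n, n, m), dtype=array1.dtype)
out[...] = array1[:, None, :]
```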
Say I have a NumPy array
a = array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
I have an array 'replication' of the same size, where replication[i, j] (>= 0) denotes how many times a[i][j] should be repeated along the row. The replication array satisfies the invariant that np.sum(replication[i]) has the same value for all i.
For example, if
replication = array([[1, 2, 1],
[1, 1, 2],
[2, 1, 1]])
then the final array after replicating is:
new_a = array([[1, 2, 2, 3],
[4, 5, 6, 6],
[7, 7, 8, 9]])
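Presently, I am doing this to create new_a:
h = a.shape[0]
w = a.shape[1]
# allocate new_a: by the invariant, every row expands to the same length
new_a = np.empty((h, replication[0].sum()), dtype=a.dtype)
for row in range(h):
    ll = [[a[row][j]] * replication[row][j] for j in range(w)]
    new_a[row] = np.array([item for sublist in ll for item in sublist])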
However, this seems too slow because it uses Python lists. Can I do this entirely in NumPy, without Python lists?
You can flatten out your replication array, then use the .repeat() method of a:
import numpy as np
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
replication = np.array([[1, 2, 1],
                        [1, 1, 2],
                        [2, 1, 1]])
new_a = a.repeat(replication.ravel()).reshape(a.shape[0], -1)
print(repr(new_a))
# array([[1, 2, 2, 3],
# [4, 5, 6, 6],
# [7, 7, 8, 9]])
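The trick works because ravel() reads both arrays in row-major order, so each row's repeated elements stay contiguous. The final reshape only succeeds when every row expands to the same length. The sketch below wraps the one-liner with a check for that invariant. replicate_rows is a hypothetical name:

```python
import numpy as np

def replicate_rows(a, replication):
    """Hypothetical helper: repeat a[i, j] replication[i, j] times along row i."""
    a = np.asarray(a)
    replication = np.asarray(replication)
    row_lengths = replication.sum(axis=1)
    # The reshape below requires every row to expand to the same length.
    if not np.all(row_lengths == row_lengths[0]):
        raise ValueError("every row of replication must have the same sum")
    # With no axis given, repeat flattens a in row-major order, matching ravel().
    return a.repeat(replication.ravel()).reshape(a.shape[0], -1)

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
replication = np.array([[1, 2, 1], [1, 1, 2], [2, 1, 1]])
new_a = replicate_rows(a, replication)
```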
I have a function called gen_data which makes a single pass through a list and constructs a 3D array. I then iterate across a list of lists, applying gen_data to each, and concatenate the results together.
fst = lambda x: x[0]
snd = lambda x: x[1]
def gen_data(data,p=0, batch_size = BATCH_SIZE, n_session = N_SESSION,
    x = np.zeros((batch_size,SEQ_LENGTH,vocab_size))
    y = np.zeros(batch_size)
    for n in range(batch_size):
        ptr = n
        for i in range(SEQ_LENGTH):
            x[n,i,char_to_ix[data[p+ptr+i]]] = 1.
        if(return_target):
            y[n] = char_to_ix[data[p+ptr+SEQ_LENGTH]]
    return x, np.array(y,dtype='int32')
def batch_data(data):
    nest = [gen_data(datum) for datum in data]
    # in Python 3, map returns an iterator; np.concatenate needs a sequence
    x = np.concatenate(list(map(fst,nest)))
    y = np.concatenate(list(map(snd,nest)))
    return (x,y)
What is the best way to combine these functions so I do not need to make multiple passes back through the data to concatenate the results?
To clarify, the goal would be to remove the need to zip/concat/splat/list comp in general: to initialize the x tensor to the correct dimensions and then iterate across each datum/SEQ_LENGTH, batch_size in a single pass.
Without testing things, here are a few quick fixes:
def gen_data(data,p=0, batch_size = BATCH_SIZE, n_session = N_SESSION,
    x = np.zeros((batch_size,SEQ_LENGTH,vocab_size))
    y = np.zeros(batch_size, dtype=int) # initialize with the desired dtype
    for n in range(batch_size):
        ptr = n
        for i in range(SEQ_LENGTH):
            x[n,i,char_to_ix[data[p+ptr+i]]] = 1.
        if(return_target):
            y[n] = char_to_ix[data[p+ptr+SEQ_LENGTH]]
    return x, y
    # y is already an array; don't need this: np.array(y,dtype='int32')
nest = [gen_data(datum) for datum in data] produces, I think,
[(x0,y0), (x1,y1),...], where each x is 3d with shape (batch_size, SEQ_LENGTH, vocab_size) and each y is 1d with shape (batch_size,).
x = np.concatenate([n[0] for n in nest]) (I prefer this format to mapping) looks OK to me. Compared to all the list comprehension operations, concatenate is relatively cheap. Look at the source of np.vstack etc. to see how those functions combine comprehensions with concatenate.
A small example:
In [515]: def gen():
     ...:     return np.arange(8).reshape(2,4),np.arange(1,3)
     ...:
In [516]: gen()
Out[516]:
(array([[0, 1, 2, 3],
[4, 5, 6, 7]]), array([1, 2]))
In [517]: nest=[gen() for _ in range(3)]
In [518]: nest
Out[518]:
[(array([[0, 1, 2, 3],
[4, 5, 6, 7]]), array([1, 2])),
(array([[0, 1, 2, 3],
[4, 5, 6, 7]]), array([1, 2])),
(array([[0, 1, 2, 3],
[4, 5, 6, 7]]), array([1, 2]))]
In [519]: np.concatenate([x[0] for x in nest])
Out[519]:
array([[0, 1, 2, 3],
[4, 5, 6, 7],
[0, 1, 2, 3],
[4, 5, 6, 7],
[0, 1, 2, 3],
[4, 5, 6, 7]])
In [520]: np.concatenate([x[1] for x in nest])
Out[520]: array([1, 2, 1, 2, 1, 2])
zip(*nest) effectively does a 'transpose' on a nested list, so the arrays could be constructed with (in Python 3, zip returns an iterator, so wrap it in list):
In [532]: nest1=list(zip(*nest))
In [533]: np.concatenate(nest1[0])
Out[533]:
array([[0, 1, 2, 3],
[4, 5, 6, 7],
[0, 1, 2, 3],
[4, 5, 6, 7],
[0, 1, 2, 3],
[4, 5, 6, 7]])
In [534]: np.concatenate(nest1[1])
Out[534]: array([1, 2, 1, 2, 1, 2])
This still requires concatenate calls.
Since nest is a list of tuples, it could serve as input to a structured array:
In [524]: arr=np.array(nest,dtype=[('x','(2,4)int'),('y','(2,)int')])
In [525]: arr['x']
Out[525]:
array([[[0, 1, 2, 3],
[4, 5, 6, 7]],
[[0, 1, 2, 3],
[4, 5, 6, 7]],
[[0, 1, 2, 3],
[4, 5, 6, 7]]])
In [526]: arr['y']
Out[526]:
array([[1, 2],
[1, 2],
[1, 2]])
Another possibility is to initialize x and y, and iterate. But you are already doing this in gen_data; the only new thing is that I'd be assigning larger blocks.
x = ...
y = ...
for i in range(...):
    x[i,...], y[i] = gen(data[i])
I like the comprehension solutions better, but I won't speculate on speeds.
In terms of speed I think it's the low level iteration in gen_data that is the time consumer. Concatenating larger blocks is relatively fast.
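A runnable version of the preallocate-and-fill loop follows, using a hypothetical gen modeled on the small example above. The block sizes are assumptions for illustration. The outputs are allocated once and filled block by block, with no final concatenate:

```python
import numpy as np

def gen(block_rows=2, width=4):
    # Hypothetical stand-in for gen_data: returns one (x, y) block.
    x = np.arange(block_rows * width).reshape(block_rows, width)
    y = np.arange(1, block_rows + 1)
    return x, y

n_blocks, block_rows, width = 3, 2, 4
# Allocate the full outputs once, then fill them block by block.
x = np.empty((n_blocks * block_rows, width), dtype=int)
y = np.empty(n_blocks * block_rows, dtype=int)
for i in range(n_blocks):
    rows = slice(i * block_rows, (i + 1) * block_rows)
    x[rows], y[rows] = gen(block_rows, width)
```

The result matches what concatenating the blocks would produce. The allocation and assignment, though, are done in a single pass over the data.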
Another idea: since you are iterating over rows within gen_data, how about passing views of preallocated arrays to that function and filling those?
def gen_data(data,x=None,y=None):
    # accept array or make own
    if x is None:
        x = np.zeros((3,4),int)
    if y is None:
        y = np.zeros(3,int)
    for n in range(3):
        x[n,...] = np.arange(4)+n
        y[n] = n
    return x,y
With no x or y given, it generates its own arrays as before:
In [543]: gen_data(None)
Out[543]:
(array([[0, 1, 2, 3],
[1, 2, 3, 4],
[2, 3, 4, 5]]),
array([0, 1, 2]))
Or initialize a pair of arrays and iterate over views:
In [544]: x,y = np.zeros((9,4),int),np.zeros(9,int)
In [546]: for i in range(0,9,3):
.....: gen_data(None,x[i:i+3,...],y[i:i+3])
In [547]: x
Out[547]:
array([[0, 1, 2, 3],
[1, 2, 3, 4],
[2, 3, 4, 5],
[0, 1, 2, 3],
[1, 2, 3, 4],
[2, 3, 4, 5],
[0, 1, 2, 3],
[1, 2, 3, 4],
[2, 3, 4, 5]])
In [548]: y
Out[548]: array([0, 1, 2, 0, 1, 2, 0, 1, 2])