How to concatenate 2 Numpy arrays efficiently? - python

I have 2 Numpy arrays (<type 'numpy.ndarray'>) with shapes (10,) and (10, 6), and I would like to concatenate the first one with the second. The numpy arrays are provided below,
r1
['467c8100-7f13-4244-81ee-5e2a0f8218a8',
'71a4b5b2-80d6-4c12-912f-fc71be8d923e',
'7a3e0168-e47d-4203-98f2-a54a46c62ae0',
'7dfd43e7-ced1-435f-a0f9-80cfd00ae246',
'85dbc70e-c773-43ee-b434-8f458d295d10',
'a56b2bc3-4a81-469e-bc5f-b3aaa520db05',
'a9e8996f-ff35-4bfb-bbd9-ede5ffecd4d8',
'c3037410-0c2e-40f8-a844-ac0664a05783',
'c5618563-10c0-425b-a11b-2fcf931f0ff7',
'f65e6cea-892e-4335-8e86-bf7f083b5f53']
r2
[[1.55000000e+02, 5.74151515e-01, 1.55000000e+02, 5.74151515e-01, 3.49000000e+02, 1.88383585e+00],
[5.00000000e+00, 1.91871554e-01, 1.03000000e+02, 1.22893828e+00, 2.95000000e+02, 3.21148368e+00],
[7.10000000e+01, 1.15231270e-01, 2.42000000e+02, 5.78527276e-01, 4.09000000e+02, 2.67915246e+00],
[3.60000000e+01, 7.10066720e-01, 2.42000000e+02, 1.80213634e+00, 4.12000000e+02, 4.16314391e+00],
[1.15000000e+02, 1.05120284e+00, 1.30000000e+02, 1.71697773e+00, 2.53000000e+02, 2.73640301e+00],
[4.70000000e+01, 2.19434656e-01, 3.23000000e+02, 4.84093786e+00, 5.75000000e+02, 7.00530186e+00],
[5.50000000e+01, 1.22614463e+00, 1.04000000e+02, 1.55392099e+00, 4.34000000e+02, 4.13661261e+00],
[3.90000000e+01, 3.34816889e-02, 1.10000000e+02, 2.54431753e-01, 2.76000000e+02, 1.52322736e+00],
[3.43000000e+02, 2.93550948e+00, 5.84000000e+02, 5.27968165e+00, 7.45000000e+02, 7.57657633e+00],
[1.66000000e+02, 1.01436635e+00, 2.63000000e+02, 2.69197514e+00, 8.13000000e+02, 7.96477735e+00]]
I tried to concatenate with the command np.concatenate((r1, r2)), but it fails with the message ValueError: all the input arrays must have same number of dimensions, which I don't understand. r1 should be able to concatenate with r2 column-wise and form a whole new 10 x 7 array as the result.
How can I solve this problem?

Numpy offers an easy way to concatenate along the second axis.
np.c_[r2,r1]
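A minimal numeric sketch of this, with small made-up arrays standing in for r1 and r2 (note that with the question's actual data, mixing strings and floats would upcast the whole result to strings, as a later answer explains):

```python
import numpy as np

# Hypothetical stand-ins for r1 and r2
r1 = np.arange(4.0)           # shape (4,)
r2 = np.zeros((4, 2))         # shape (4, 2)

# np.c_ treats a 1-D array as a column, so it joins along the second axis
out = np.c_[r1, r2]           # r1 becomes the first column
print(out.shape)              # (4, 3)
```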

You can reshape r1 to make it two-dimensional and specify the axis along which the arrays should be joined:
import numpy as np
r1 = np.ones((10,))
r2 = np.zeros((10, 6))
np.concatenate((r1.reshape(10, 1), r2), axis=1)

These 2 arrays have a dtype and a shape mismatch:
In [174]: r1.shape
Out[174]: (10,)
In [175]: r1.dtype
Out[175]: dtype('<U36')
In [177]: r2.shape
Out[177]: (10, 6)
In [178]: r2.dtype
Out[178]: dtype('float64')
If you add a dimension to r1, so it is now (10,1), you can concatenate on axis=1. But note the dtype - the floats have been turned into strings:
In [181]: r12 =np.concatenate((r1[:,None], r2), axis=1)
In [182]: r12.shape
Out[182]: (10, 7)
In [183]: r12.dtype
Out[183]: dtype('<U36')
In [184]: r12[0,:]
Out[184]:
array(['467c8100-7f13-4244-81ee-5e2a0f8218a8', '155.0', '0.574151515',
'155.0', '0.574151515', '349.0', '1.88383585'],
dtype='<U36')
A way to mix strings and floats is with a structured array, for example:
In [185]: res=np.zeros((10,),dtype='U36,(6)f')
In [186]: res.dtype
Out[186]: dtype([('f0', '<U36'), ('f1', '<f4', (6,))])
In [187]: res['f0']=r1
In [188]: res['f1']=r2
In [192]: res.shape
Out[192]: (10,)
In [193]: res[0]
Out[193]: ('467c8100-7f13-4244-81ee-5e2a0f8218a8', [ 155. , 0.57415152, 155. , 0.57415152, 349. , 1.88383579])
We could also make a (10,7) array with dtype=object. But most array operations won't work with such a mix of strings and floats. And the ones that work are slower.
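A minimal sketch of that dtype=object route, again with small made-up arrays rather than the question's data:

```python
import numpy as np

# Small stand-ins for the string and float arrays
r1 = np.array(['a', 'b', 'c'])           # shape (3,)
r2 = np.arange(6.0).reshape(3, 2)        # shape (3, 2)

# Cast both sides to object before joining, so the floats
# are not upcast to strings
mixed = np.concatenate((r1[:, None].astype(object),
                        r2.astype(object)), axis=1)
print(mixed.shape, mixed.dtype)          # (3, 3) object
```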
Why do you want to concatenate these arrays? What do you intend to do with the result? That dtype mismatch is more serious than the shape mismatch.

Related

numpy issue with concatenating arrays

I am porting some Matlab code to python and I have the following statement in Matlab:
cross([pt1,1]-[pp,0],[pt2,1]-[pp,0]);
pt1, pt2 and pp are 2D points.
So, my corresponding python code looks as follows:
np.cross(np.c_[pt1 - pp, 1], np.c_[pt2 - pp, 1])
The points are defined as:
pt1 = np.asarray((440.0, 59.0))
pt2 = np.asarray((-2546.23, 591.03))
pp = np.asarray([563., 456.5])
When I execute the statement with the cross product, I get the following error:
ValueError: all the input array dimensions except for the concatenation axis must match exactly
So, looking at some other posts, I thought I would try np.column_stack, but I get the same error:
np.cross(np.column_stack((pt1 - pp, 1)), np.column_stack((pt2 - pp, 1)))
This might be what you are looking for:
np.cross(np.append(pt1-pp, 1), np.append(pt2-pp, 1))
If you use np.r_ instead it works:
In [40]: np.cross(np.r_[pt1 - pp, 1], np.r_[pt2 - pp, 1])
Out[40]: array([-5.32030000e+02, -2.98623000e+03, -1.25246611e+06])
Your pt1 and pp are (2,) arrays. To add a 1 to them you need to use a 1d concatenate, np.r_ for 'row', as opposed to columns.
There are lots of ways of constructing a 3 element array:
In [43]: np.r_[pt1 - pp, 1]
Out[43]: array([-123. , -397.5, 1. ])
In [44]: np.append(pt1 - pp, 1)
Out[44]: array([-123. , -397.5, 1. ])
In [45]: np.concatenate((pt1 - pp, [1]))
Out[45]: array([-123. , -397.5, 1. ])
concatenate is the base operation. The others tweak the 1 to produce a 1d array that can be joined with the (2,) shape array to make a (3,).
Concatenate turns all of its inputs into arrays, if they aren't already: np.concatenate((pt1 - pp, np.array([1]))).
Note that np.c_ docs say it is the equivalent of
np.r_['-1,2,0', index expression]
That initial string expression is a bit complicated. The key point is it tries to concatenate 2d arrays (whereas your pt1 is 1d).
It is like column_stack, joining (2,1) arrays to make a (2,n) array.
In [48]: np.c_[pt1, pt2]
Out[48]:
array([[ 440. , -2546.23],
[ 59. , 591.03]])
In [50]: np.column_stack((pt1, pt2))
Out[50]:
array([[ 440. , -2546.23],
[ 59. , 591.03]])
In MATLAB everything has at least 2 dimensions, and because it is Fortran based, the outer dimensions are last. So in a sense its most natural 'vector' shape is n x 1, a column matrix. numpy is built on Python, with a natural interface to its scalars and nested lists. Ordering is C-based; the initial dimensions are outermost. So numpy code can have true scalars (Python numbers without shape or size), or arrays with 0 or more dimensions. A 'vector' most naturally has shape (n,) (a 1-element tuple). It can easily be reshaped to (1,n) or (n,1) if needed.
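A short sketch of those reshapes:

```python
import numpy as np

v = np.arange(3.0)      # numpy's natural 'vector': shape (3,)
col = v[:, None]        # column, shape (3, 1) - MATLAB's natural vector
row = v[None, :]        # row, shape (1, 3)
print(v.shape, col.shape, row.shape)
```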
If you want a (3,1) array (instead of (3,) shaped), you'd need to use some sort of 'vertical' concatenation, joining a (2,1) array with a (1,1):
In [51]: np.r_['0,2,0', pt1-pp, 1]
Out[51]:
array([[-123. ],
[-397.5],
[ 1. ]])
In [53]: np.vstack([(pt1-pp)[:,None], 1])
Out[53]:
array([[-123. ],
[-397.5],
[ 1. ]])
(But np.cross wants (n,3) or (3,) arrays, not (3,1)!)
In [58]: np.cross(np.r_['0,2,0', pt1-pp, 1], np.r_['0,2,0', pt2-pp, 1])
...
ValueError: incompatible dimensions for cross product
(dimension must be 2 or 3)
To get around this specify an axis:
In [59]: np.cross(np.r_['0,2,0', pt1-pp, 1], np.r_['0,2,0', pt2-pp, 1], axis=0)
Out[59]:
array([[-5.32030000e+02],
[-2.98623000e+03],
[-1.25246611e+06]])
Study np.cross if you want an example of manipulating dimensions. In this axis=0 case it transposes the arrays so they are (1,3) and then does the calculation.

Create ndarray from list with mismatched axis>0 size

I want to save a list of Numpy arrays to a file. The list is of the following shape:
my_list = [np.ones((2, 515, 3)), np.ones((2, 853, 3))]
However, when I try to save it using np.savez, the list gets converted into a Numpy array. Doing np.array(my_list, dtype='object') gives the error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-8-6fcbd172df30> in <module>()
----> 1 np.array([np.ones((2, 515, 3)), np.ones((2, 853, 3))], dtype='object')
ValueError: could not broadcast input array from shape (2,515,3) into shape (2)
However, if the axis=0 dimension is mismatched instead of the axis=1 dimension, such as my_list = [np.ones((515, 3)), np.ones((853, 3))], I no longer get this error.
Why does the mismatched axis dimension affect the ability to create a Numpy array from objects?
Although there are work-arounds possible to break up the array into a save-able format, I'm mostly interested about why the conversion failure is happening and how to get around it.
In [77]: my_list = [np.ones((2, 515, 3)), np.ones((2, 853, 3))]
Save with the *args parameter, or with a **kwargs dictionary
In [78]: np.savez('test',*my_list)
In [79]: ll = np.load('test.npz')
In [80]: list(ll.keys())
Out[80]: ['arr_0', 'arr_1']
In [81]: ll['arr_0'].shape
Out[81]: (2, 515, 3)
In [82]: ll['arr_1'].shape
Out[82]: (2, 853, 3)
or with named keywords/dictionary
In [85]: np.savez('test',x=my_list[0],y=my_list[1])
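Saving under explicit names makes the .npz keys predictable on load. A sketch, writing to a temporary directory so nothing is left behind:

```python
import os
import tempfile
import numpy as np

my_list = [np.ones((2, 515, 3)), np.ones((2, 853, 3))]

path = os.path.join(tempfile.mkdtemp(), 'test.npz')
np.savez(path, x=my_list[0], y=my_list[1])

with np.load(path) as ll:
    print(sorted(ll.keys()))    # ['x', 'y']
    print(ll['y'].shape)        # (2, 853, 3)
```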
np.savez('test', my_list) first turns my_list into an array - or tries to:
In [83]: np.array(my_list)
...
ValueError: could not broadcast input array from shape (2,515,3) into shape (2)
When trying to create an array from a list of arrays there are 3 possible outcomes: a higher dimensional array (if dimensions match), an object array (if dimensions don't match), or this error (if the dimensions sort-of match).
The object dtype case:
In [86]: arr=np.array([np.ones((515, 3)), np.ones((853, 3))])
In [87]: arr.shape
Out[87]: (2,)
In [88]: arr.dtype
Out[88]: dtype('O')
The surest way to create an object array is to preallocate it
In [90]: arr = np.zeros((2,), object)
In [91]: arr[...]=my_list
The shape of arr has to match the nesting of the sublists/arrays in my_list, otherwise you'll get broadcasting errors. arr can be reshaped after loading.
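If the ellipsis assignment runs into broadcasting trouble (it can, depending on how the sub-array shapes line up), an element-by-element fill always works:

```python
import numpy as np

my_list = [np.ones((2, 515, 3)), np.ones((2, 853, 3))]

# Preallocate the object array, then fill one slot at a time
arr = np.empty(len(my_list), dtype=object)
for i, a in enumerate(my_list):
    arr[i] = a          # each slot holds a whole array; shapes may differ

print(arr.shape, arr[0].shape, arr[1].shape)
```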

About Numpy: a = np.array([1,2,3,4]), print a.shape[0]. Why does it output 4?

import numpy as np
a = np.array([1,2,3,4])
print a.shape[0]
Why does it output 4?
The array [1,2,3,4] should have 1 row, I think, so can someone explain the reason to me?
because
print(a.shape) # -> (4,)
what you think (or want?) to have is
a = np.array([[1],[2],[3],[4]])
print(a.shape) # -> (4, 1)
or rather (?)
a = np.array([[1, 2 , 3 , 4]])
print(a.shape) # -> (1, 4)
If you'll print a.ndim you'll get 1. That means that a is a one-dimensional array (has rank 1 in numpy terminology), with axis length = 4. It's different from 2D matrix with a single row or column (rank 2).
More on ranks
Related questions:
numpy: 1D array with various shape
Python: Differentiating between row and column vectors
The shape attribute for numpy arrays returns the dimensions of the array. If a has n rows and m columns, then a.shape is (n,m). So a.shape[0] is n and a.shape[1] is m.
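Note that this describes the 2-D case; the question's array is 1-D, so its shape has only one entry:

```python
import numpy as np

a = np.array([1, 2, 3, 4])      # 1-D: a single axis of length 4
print(a.ndim, a.shape)          # 1 (4,)

b = np.array([[1, 2, 3, 4]])    # 2-D: one row, four columns
print(b.ndim, b.shape)          # 2 (1, 4)
```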
The shape attribute describes the dimensions of the array. So, when you create an array using,
a = np.array([1,2,3,4])
you get a one-dimensional array with 4 elements. You can check it by printing the shape,
print(a.shape) #(4,)
So, what you get is NOT a 1x4 matrix. If you want that do,
a = numpy.array([1,2,3,4]).reshape((1,4))
print(a.shape)
Or even better,
a = numpy.array([[1,2,3,4]])
a = np.array([1, 2, 3, 4])
by doing this, you get a as an ndarray, and it is a one-dimensional array. Here, the shape (4,) means the array is indexed by a single index which runs from 0 to 3. You can access the elements by index 0~3. It is different from multi-dimensional arrays.
You can refer to more help from this link Difference between numpy.array shape (R, 1) and (R,).

Append numpy array into an element

I have a Numpy array of shape (5,5,3,2). I want to take the element (1,4) of that matrix, which is also a matrix of shape (3,2), and add an element to it -so it becomes a (4,2) array.
The code I'm using is the following:
import numpy as np
a = np.random.rand(5,5,3,2)
a = np.array(a, dtype = object) #So I can have different size sub-matrices
a[2][3] = np.append(a[2][3],[[1.0,1.0]],axis=0) #a[2][3] shape = (3,2)
I'm always obtaining the error:
ValueError: could not broadcast input array from shape (4,2) into shape (3,2)
I understand that the shape returned by the np.append function is not the same as that of the a[2][3] sub-array, but I thought that dtype=object would solve my problem. However, I need to do this. Is there any way to get around this limitation?
I also tried to use the insert function but I don't know how could I add the element in the place I want.
Make sure you understand what you have produced. That requires checking the shape and dtype, and possibly looking at the values
In [29]: a = np.random.rand(5,5,3,2)
In [30]: b=np.array(a, dtype=object)
In [31]: a.shape
Out[31]: (5, 5, 3, 2) # a is a 4d array
In [32]: a.dtype
Out[32]: dtype('float64')
In [33]: b.shape
Out[33]: (5, 5, 3, 2) # so is b
In [34]: b.dtype
Out[34]: dtype('O')
In [35]: b[2,3].shape
Out[35]: (3, 2)
In [36]: c=np.append(b[2,3],[[1,1]],axis=0)
In [37]: c.shape
Out[37]: (4, 2)
In [38]: c.dtype
Out[38]: dtype('O')
b[2][3] is also an array. b[2,3] is the proper numpy way of indexing 2 dimensions.
I suspect you wanted b to be a (5,5) array containing arrays (as objects), and you think that you can simply replace one of those with a (4,2) array. But the b constructor simply changes the floats of a to objects, without changing the shape (or 4d nature) of b.
I could construct a (5,5) object array, and fill it with values from a. And then replace one of those values with a (4,2) array:
In [39]: B=np.empty((5,5),dtype=object)
In [40]: for i in range(5):
...: for j in range(5):
...: B[i,j]=a[i,j,:,:]
...:
In [41]: B.shape
Out[41]: (5, 5)
In [42]: B.dtype
Out[42]: dtype('O')
In [43]: B[2,3]
Out[43]:
array([[ 0.03827568, 0.63411023],
[ 0.28938383, 0.7951006 ],
[ 0.12217603, 0.304537 ]])
In [44]: B[2,3]=c
In [46]: B[2,3].shape
Out[46]: (4, 2)
This constructor for B is a bit crude. I've answered other questions about creating/filling object arrays, but I'm not going to take the time here to streamline this case. It's for illustration purposes only.
In an array of objects, any element can indeed be an array (or any other kind of object).
import numpy as np
a = np.random.rand(5,5,3,2)
a = np.array(a, dtype=object)
# Assign a 1D array to the array element ``a[2][3][0][0]``:
a[2][3][0][0] = np.arange(10)
a[2][3][0][0][9] # 9
However a[2][3] is not an array element, it is a whole array.
a[2][3].ndim # 2
Therefore when you do a[2][3] = (something) you are using broadcasting instead of assigning an element: numpy tries to replace the content of the subarray a[2][3] and fails because of the shape mismatch. The memory layout of numpy arrays does not allow changing the shape of subarrays.
Edit: Instead of using numpy arrays you could use nested lists. These nested lists can have arbitrary sizes. Note that memory usage is higher and access time is slower compared to numpy arrays.
import numpy as np
a = np.random.rand(5,5,3,2)
a = np.array(a, dtype=object)
b = np.append(a[2][3], [[1.0,1.0]],axis=0)
a_list = a.tolist()
a_list[2][3] = b.tolist()
The problem here is that you try to assign to a[2][3].
Make a new array instead:
new_array = np.append(a[2][3],np.array([[1.0,1.0]]),axis=0)

How do I get the strides from a dtype in numpy?

I think I can do: np.zeros((), dtype=dt).strides, but this doesn't seem efficient when the dtype is a large array type like: ('<f8', (200, 100)). Is there a way of going directly from dtype to strides in numpy?
You can actually get the strides of a sub-array within a structured array without creating the "full" array.
Sub-arrays within a structured array are required to be contiguous and in C-order according to the documentation. Note the sentence just above the first example:
Sub-arrays always have a C-contiguous memory layout.
Therefore, for a structured array with no fields such as the one in your example, you can do (as an unreadable one-liner):
import numpy as np
x = np.dtype(('<f8', (200, 100)))
strides = x.base.itemsize * np.r_[1, np.cumprod(x.shape[::-1][:-1])][::-1]
Avoiding the code golf:
shape = list(x.shape)
# First, let's make the strides for an array with an itemsize of 1 in C-order
tmp_strides = shape[::-1]
tmp_strides[1:] = list(np.cumprod(tmp_strides[:-1]))
tmp_strides[0] = 1
# Now adjust it for the real itemsize:
tmp_strides = x.base.itemsize * np.array(tmp_strides)
# And convert it to a tuple, reversing it back for proper C-order
strides = tuple(tmp_strides[::-1])
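The same computation packaged as a small function, assuming (as the docs guarantee) that sub-arrays are C-contiguous; subarray_strides is a made-up helper name:

```python
import numpy as np

def subarray_strides(dt):
    """C-order strides of a sub-array dtype, without building an array."""
    dt = np.dtype(dt)
    strides = []
    step = dt.base.itemsize
    # Walk the shape from the innermost axis outward, accumulating
    # the byte step for each axis
    for n in reversed(dt.shape):
        strides.append(step)
        step *= n
    return tuple(reversed(strides))

print(subarray_strides(('<f8', (200, 100))))   # (800, 8)
```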
This gets more complex when there are multiple fields, however. You'd need to put in appropriate checks in general. For example: Does the dtype have a shape attribute? Does it have fields? Do any fields have shape attributes?
I think you are talking about an array with:
In [257]: dt=np.dtype([('f0',float, (200,100))])
In [258]: x=np.zeros((),dtype=dt)
The array itself is 0d with one item.
In [259]: x.strides
Out[259]: ()
That item has shape and strides determined by the dtype:
In [260]: x['f0'].strides
Out[260]: (800, 8)
In [261]: x['f0'].shape
Out[261]: (200, 100)
But is constructing x any different than constructing a plain float array with the same shape?
In [262]: y=np.zeros((200,100),float)
In [263]: y.strides
Out[263]: (800, 8)
You can't get the strides of a potential y without actually constructing it.
IPython's whos command shows that x and y take up about the same space:
x ndarray : 1 elems, type `[('f0', '<f8', (200, 100))]`,
160000 bytes (156.25 kb)
y ndarray 200x100: 20000 elems, type `float64`,
160000 bytes (156.25 kb)
An interesting question is whether such an x['f0'] has all the properties of y. You can probably read all the properties, but may be limited in which ones you can change.
You can parse the dtype:
In [309]: dt=np.dtype([('f0',float, (200,100))])
In [310]: dt.fields
Out[310]: mappingproxy({'f0': (dtype(('<f8', (200, 100))), 0)})
In [311]: dt[0]
Out[311]: dtype(('<f8', (200, 100)))
In [312]: dt[0].shape
Out[312]: (200, 100)
In [324]: dt[0].base
Out[324]: dtype('float64')
I don't see a strides-like attribute on dt or dt[0]. There may be some numpy function that calculates the strides based on shape, but it is probably hidden. You could search the np.lib.stride_tricks module; that's where as_strided is found.
From the (200,100) shape, and float64 taking 8 bytes, it is possible to calculate that the normal (default) strides are (8*100, 8).
For a dtype that isn't further nested, this seems to work:
In [374]: dt[0]
Out[374]: dtype(('<f8', (200, 100)))
In [375]: tuple(np.array(dt[0].shape[1:]+(1,))*dt[0].base.itemsize)
Out[375]: (800, 8)
Let's make a more complex array with this dtype:
In [346]: x=np.zeros((3,1),dtype=dt)
In [347]: x.shape
Out[347]: (3, 1)
In [348]: x.strides
Out[348]: (160000, 160000)
Its strides depend on the shape and itemsize. But the shape and strides of a field are 4d. Can we say they exist without actually accessing the field?
In [349]: x['f0'].strides
Out[349]: (160000, 160000, 800, 8)
strides for an item:
In [350]: x[0,0]['f0'].strides
Out[350]: (800, 8)
How about double nesting?
In [390]: dt1=np.dtype([('f0',np.dtype([('f00',int,(3,4))]), (20,10))])
In [391]: z=np.zeros((),dt1)
In [392]: z['f0']['f00'].shape
Out[392]: (20, 10, 3, 4)
In [393]: z['f0']['f00'].strides
Out[393]: (480, 48, 16, 4)
In [399]: (np.cumprod(np.array((10,3,4,1))[::-1])*4)[::-1]
Out[399]: array([480, 48, 16, 4], dtype=int32)
Correction: the striding for a field is a combination of the striding for the array as a whole plus the striding for the field. It can be seen with a multi-field dtype:
In [430]: dt=np.dtype([('f0',float, (3,4)),('f1',int),('f2',int,(2,))])
In [431]: x=np.zeros((3,2),dt)
In [432]: x.shape
Out[432]: (3, 2)
In [433]: x.strides
Out[433]: (216, 108)
In [434]: x['f0'].shape
Out[434]: (3, 2, 3, 4)
In [435]: x['f0'].strides
Out[435]: (216, 108, 32, 8)
(216,108) is the striding for the whole array (itemsize 108), concatenated with the striding for the f0 field, (32,8) (itemsize 8).
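That relationship can be checked directly. A sketch using np.int64 so the numbers are platform-independent (they differ from the transcript above, which used a 4-byte int, but the relationship holds):

```python
import numpy as np

dt = np.dtype([('f0', np.float64, (3, 4)),
               ('f1', np.int64),
               ('f2', np.int64, (2,))])
x = np.zeros((3, 2), dt)

# field strides = whole-array strides + the field's own C-order strides;
# for f0, the inner strides are (4 * 8, 8) = (32, 8)
print(x['f0'].strides == x.strides + (32, 8))   # True
```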
