a = np.array([[0, 1, 2, 3], [4, 5, 6, 7]], dtype=int)
b = np.array([[8], [9]], dtype=int)
result wanted:
alist = [[0, 1, 2, 3, 8], [4, 5, 6, 7, 9]] # as np.array
I tried:
np.concatenate(alist,blist)
np.concatenate((alist,blist))
np.concatenate(alist, blist[0])
for a,b in zip(alist,blist): np.concatenate(a,b)
alist = [*map(np.concatenate, alist, blist)]
Each attempt produced a different error message, which I tried to fix with the next attempt. Nothing has worked so far.
You are just missing the axis=1 keyword argument.
np.concatenate((a, b), axis=1)
By default, np.concatenate joins along axis 0 (going down the array, adding rows). In this case you want to join along axis 1 (going across the array, adding columns). See the glossary for more information.
You can also achieve this with np.hstack, which concatenates these two arrays along the second axis (axis 1).
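As a minimal sketch of the difference between the two axes, using the arrays from the question (the names joined and axis0_error are illustrative only):

```python
import numpy as np

a = np.array([[0, 1, 2, 3], [4, 5, 6, 7]], dtype=int)  # shape (2, 4)
b = np.array([[8], [9]], dtype=int)                    # shape (2, 1)

# axis=1 joins columns: all dimensions except axis 1 must match (2 rows each).
joined = np.concatenate((a, b), axis=1)
print(joined.shape)  # (2, 5)

# The default axis=0 would need matching column counts (4 vs. 1), so it raises.
axis0_error = None
try:
    np.concatenate((a, b))
except ValueError as exc:
    axis0_error = exc
    print("axis=0 fails:", exc)
```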
a = np.array([[0, 1, 2, 3], [4, 5, 6, 7]], dtype=int)
b = np.array([[8], [9]], dtype=int)
>>> np.hstack((a,b))
array([[0, 1, 2, 3, 8],
[4, 5, 6, 7, 9]])
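If b were a flat 1-D array instead of a column (an assumption, this is not the case in the question), np.hstack would need it reshaped first. A small sketch of two ways to handle that; b_flat, out1 and out2 are hypothetical names:

```python
import numpy as np

a = np.array([[0, 1, 2, 3], [4, 5, 6, 7]])
b_flat = np.array([8, 9])  # 1-D, shape (2,)

# Option 1: add a trailing axis so b_flat becomes a (2, 1) column.
out1 = np.hstack((a, b_flat[:, np.newaxis]))

# Option 2: np.column_stack turns 1-D inputs into columns itself.
out2 = np.column_stack((a, b_flat))

print(out1)
print(np.array_equal(out1, out2))
```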
Say I have some time-series data in the form of a simple array.
X1 = np.array([1, 2, 3, 4])
The Hankel matrix can be obtained by using scipy.linalg.hankel, which would look something like this:
hankel(X1)
array([[1, 2, 3, 4],
[2, 3, 4, 0],
[3, 4, 0, 0],
[4, 0, 0, 0]])
Now assume I had a larger array in the form of
X2 = np.array([1, 2, 3, 4, 5, 6, 7])
What I want to do is fill in the zeros in this matrix with the numbers that are next in the index (specific to each row). Taking the same Hankel matrix earlier by using the first four values in the array X2, I'd like to see the following output:
hankel(X2[:4])
array([[1, 2, 3, 4],
[2, 3, 4, 5],
[3, 4, 5, 6],
[4, 5, 6, 7]])
How would I do this? I'd ideally like to use this for larger data.
Appreciate any tips or pointers given. Thanks!
If you have a matrix with the appropriate index values into your dataset, you can use integer array indexing directly into your dataset.
To create the index matrix, you can simply use the upper-left quadrant of a double-sized Hankel array. There are likely simpler ways to create the index matrix, but this does the trick.
>>> X = np.array([9, 8, 7, 6, 5, 4, 3])
>>> N = 4 # the size of the "window"
>>> indices = scipy.linalg.hankel(np.arange(N*2))[:N, :N]
>>> indices
array([[0, 1, 2, 3],
[1, 2, 3, 4],
[2, 3, 4, 5],
[3, 4, 5, 6]])
>>> X[indices]
array([[9, 8, 7, 6],
[8, 7, 6, 5],
[7, 6, 5, 4],
[6, 5, 4, 3]])
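One possibly simpler way to build the same index matrix, sketched here with NumPy only: the entry at row i, column j of the index matrix is just i + j, so broadcasting two ranges produces it directly. The second half assumes NumPy 1.20 or later, where sliding_window_view is available; the names windows and view are illustrative.

```python
import numpy as np

X = np.array([9, 8, 7, 6, 5, 4, 3])
N = 4  # the size of the "window"

# A column of row offsets plus a row of column offsets broadcasts to an
# N x N grid whose entry [i, j] equals i + j.
indices = np.arange(N)[:, None] + np.arange(N)
windows = X[indices]
print(windows)

# Assumes NumPy >= 1.20: every length-N window of X as a read-only view.
view = np.lib.stride_tricks.sliding_window_view(X, N)
print(np.array_equal(view, windows))
```

The indexing version copies the data, while the view does not, which may matter for larger inputs.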
So I've created a numpy array:
import numpy as np
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
I'm trying to delete the end element of this array's subarray:
a[0] = (a[0])[:-1]
And encounter this issue:
a[0] = (a[0])[:-1]
ValueError: could not broadcast input array from shape (2) into shape (3)
Why can't I change it ?
How do I do it?
Given:
>>> a
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
You can do:
>>> a[:,0:2]
array([[1, 2],
[4, 5],
[7, 8]])
Or:
>>> np.delete(a,2,1)
array([[1, 2],
[4, 5],
[7, 8]])
Then in either case, assign that back to a since the result is a new array.
So:
>>> a=a[:,0:2]
>>> a
array([[1, 2],
[4, 5],
[7, 8]])
If you wanted to delete only the 3 in the first row, that is a different problem. You can only do that if you have an array of Python lists, since the sublists would no longer be the same length.
Example:
>>> a = np.array([[1,2],[4,5,6],[7,8,9]], dtype=object)  # dtype=object is required for ragged input in NumPy >= 1.24
>>> a
array([list([1, 2]), list([4, 5, 6]), list([7, 8, 9])], dtype=object)
If you do that, just stick to Python. You will have lost all the speed and other advantages of Numpy.
If by 'universal' you mean the last element of each row of an N x M array, just use .shape to find the dimensions:
>>> a
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
>>> a.shape
(3, 4)
>>> np.delete(a,a.shape[1]-1,1)
array([[ 1, 2, 3],
[ 5, 6, 7],
[ 9, 10, 11]])
Or,
>>> a[:,0:a.shape[1]-1]
array([[ 1, 2, 3],
[ 5, 6, 7],
[ 9, 10, 11]])
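A negative index avoids looking up .shape at all. The following is a small sketch on the same 3 x 4 array; trimmed and deleted are illustrative names:

```python
import numpy as np

a = np.arange(1, 13).reshape(3, 4)

# Slicing with :-1 drops the last column whatever the width; it returns a view.
trimmed = a[:, :-1]

# np.delete also accepts a negative index; it returns a new array (a copy).
deleted = np.delete(a, -1, axis=1)

print(trimmed)
print(np.shares_memory(a, trimmed), np.shares_memory(a, deleted))
```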
>>> a = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> a
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
>>> type(a)
<class 'numpy.ndarray'>
>>> a.shape
(3, 3)
The variable a is a matrix (2D array) with a fixed number of rows and columns. In a matrix, all rows must have the same length, so the matrix above cannot exist with a first row of length 2 and the other rows of length 3. Deleting the last element of only the first sub-array (or of any other subset of sub-arrays) is therefore not possible.
Instead you have to delete the last element of all the sub-arrays at the same time.
That can be done as
>>> a[:,0:2]
array([[1, 2],
[4, 5],
[7, 8]])
Or,
>>> np.delete(a,2,1)
array([[1, 2],
[4, 5],
[7, 8]])
This also applies to elements at other positions: you can delete any element of the sub-arrays, as long as all the sub-arrays end up with the same length.
However, you can modify the last element (or any other) of any sub-array, as long as the shape stays the same.
>>> a[0][-1] = 19
>>> a
array([[ 1, 2, 19],
[ 4, 5, 6],
[ 7, 8, 9]])
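To make the shape rule concrete, here is a short sketch: assignments that keep the row length at 3 succeed, while the assignment from the question tries to fit 2 values into a 3-element row and raises. The variable error is only there to capture the message.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

a[0, -1] = 19        # replace one element: shape unchanged
a[1] = [40, 50, 60]  # replace a whole row with 3 new values: shape unchanged

error = None
try:
    a[0] = a[0][:-1]  # 2 values cannot fill a row of length 3
except ValueError as exc:
    error = str(exc)

print(a)
print(error)
```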
If you try to form a matrix with rows of unequal length, you get a 1D array of lists instead, on which NumPy operations such as vectorized arithmetic and 2D slicing do not work (only the list operations do). Recent NumPy versions require dtype=object to build such an array.
>>> b = np.array([[1,2,3],[1,2,3]])
>>> c = np.array([[1,2],[1,2,3]], dtype=object)
>>> b
array([[1, 2, 3],
[1, 2, 3]])
>>> b.shape
(2, 3)
>>> c
array([list([1, 2]), list([1, 2, 3])], dtype=object)
>>> c.shape
(2,)
>>> print(type(b),type(c))
<class 'numpy.ndarray'> <class 'numpy.ndarray'>
Both are ndarrays, but you can see that the second variable, c, is a 1D array of lists.
>>> b+b
array([[2, 4, 6],
[2, 4, 6]])
>>> c+c
array([list([1, 2, 1, 2]), list([1, 2, 3, 1, 2, 3])], dtype=object)
Likewise, b+b performs element-wise addition of b with itself, but c+c concatenates each pair of lists.
For further reference:
How to make a multidimension numpy array with a varying row size?
Here is how:
import numpy as np
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
a = a[:-1]
print(a)
Output:
[[1 2 3]
[4 5 6]]
I have two arrays, values and indexes
>>> values
array([[5, 4, 2, 4, 6],
[7, 9, 7, 3, 6]])
>>> indexes
array([[2, 4],
[0, 3],
[0, 1],
[1, 3]])
What I would like is a fast way (my arrays are very large) to get, for each row of values, the sums of the elements selected by each index collection in indexes.
I.e., for the first row [5, 4, 2, 4, 6] I want to get
>>> values[0][indexes.flatten()].reshape(indexes.shape)
array([[2, 6],
[5, 4],
[5, 4],
[4, 4]])
>>> values[0][indexes.flatten()].reshape(indexes.shape).sum(axis=1)
array([8, 9, 9, 8])
Using this technique and looping over all rows of values is the fastest I could come up with. Is there a better way? Thank you in advance for your time.
Approach #1
Simply index into columns and sum along the last axis -
values[:,indexes].sum(axis=-1)
Sample run -
In [39]: values
Out[39]:
array([[5, 4, 2, 4, 6],
[7, 9, 7, 3, 6]])
In [40]: indexes
Out[40]:
array([[2, 4],
[0, 3],
[0, 1],
[1, 3]])
In [41]: values[:,indexes].sum(axis=-1)
Out[41]:
array([[ 8, 9, 9, 8],
[13, 10, 16, 12]])
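To see why this works, it may help to look at the intermediate array before summing. A brief sketch with the same data (picked and sums are illustrative names):

```python
import numpy as np

values = np.array([[5, 4, 2, 4, 6],
                   [7, 9, 7, 3, 6]])
indexes = np.array([[2, 4],
                    [0, 3],
                    [0, 1],
                    [1, 3]])

# Indexing the column axis with a (4, 2) integer array replaces that axis
# with the index array's shape: (2, 5) -> (2, 4, 2).
picked = values[:, indexes]
print(picked.shape)  # (2, 4, 2)

# Summing over the last axis adds up each index pair.
sums = picked.sum(axis=-1)
print(sums)
```

The intermediate array has values.shape[0] * indexes.size elements, which is worth keeping in mind for very large inputs.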
Approach #2
If there are no duplicates within each row of indexes, we can use matrix multiplication to do the sum-reductions, which can be much faster -
m,n = indexes.shape[0], values.shape[1]
mask = np.zeros((n,m),dtype=bool) # faster with float dtype
mask[indexes, np.arange(m)[:,None]] = 1
out = values.dot(mask)
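A self-contained version of this approach on the sample data, checked against the indexing approach. This is only a sketch; the comparison variable expected is an addition for illustration.

```python
import numpy as np

values = np.array([[5, 4, 2, 4, 6],
                   [7, 9, 7, 3, 6]])
indexes = np.array([[2, 4],
                    [0, 3],
                    [0, 1],
                    [1, 3]])

m, n = indexes.shape[0], values.shape[1]

# mask[k, i] is True when column k of values belongs to index collection i.
mask = np.zeros((n, m), dtype=bool)
mask[indexes, np.arange(m)[:, None]] = True

# (2, 5) @ (5, 4) -> (2, 4): each output column sums the selected columns.
out = values.dot(mask)

expected = values[:, indexes].sum(axis=-1)
print(out)
print(np.array_equal(out, expected))
```

A repeated index within one row of indexes would set the same mask entry only once, so that value would be counted once instead of twice; that is why the no-duplicates condition matters.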
I have a numpy array say
a = array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
I have an array 'replication' of the same shape, where replication[i,j] (>= 0) denotes how many times a[i][j] should be repeated along its row. Obviously, the replication array satisfies the invariant that np.sum(replication[i]) has the same value for all i.
For example, if
replication = array([[1, 2, 1],
[1, 1, 2],
[2, 1, 1]])
then the final array after replicating is:
new_a = array([[1, 2, 2, 3],
[4, 5, 6, 6],
[7, 7, 8, 9]])
Presently, I am doing this to create new_a:
# allocate new_a; every row of replication sums to the same width
h = a.shape[0]
w = a.shape[1]
new_a = np.empty((h, replication[0].sum()), dtype=a.dtype)
for row in range(h):
    ll = [[a[row][j]] * replication[row][j] for j in range(w)]
    new_a[row] = np.array([item for sublist in ll for item in sublist])
However, this seems to be too slow as it involves using lists. Can I do this entirely in NumPy, without the use of Python lists?
You can flatten out your replication array, then use the .repeat() method of a:
import numpy as np
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
replication = np.array([[1, 2, 1],
                        [1, 1, 2],
                        [2, 1, 1]])
new_a = a.repeat(replication.ravel()).reshape(a.shape[0], -1)
print(repr(new_a))
# array([[1, 2, 2, 3],
# [4, 5, 6, 6],
# [7, 7, 8, 9]])
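The one-liner can be unpacked into its steps, roughly as follows. The explicit invariant check is an addition for illustration: the final reshape only gives meaningful rows if every row of replication sums to the same width.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
replication = np.array([[1, 2, 1],
                        [1, 1, 2],
                        [2, 1, 1]])

# Check the invariant from the question: all rows repeat to the same width.
row_widths = replication.sum(axis=1)
assert (row_widths == row_widths[0]).all()

# Without an axis argument, repeat works on the flattened array and
# repeats element k exactly replication.ravel()[k] times.
flat = a.repeat(replication.ravel())
print(flat)  # [1 2 2 3 4 5 6 6 7 7 8 9]

# Cut the flat result back into one row per original row.
new_a = flat.reshape(a.shape[0], -1)
print(new_a)
```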
I want to extract the second column and the third through fifth columns of a NumPy array. How would I go about it?
A = array([[0, 1, 2, 3, 4, 5, 6], [4, 5, 6, 7, 4, 5, 6]])
A[:, [1, 4:6]]
This obviously doesn't work.
Assuming I've understood you -- it's usually a good idea to explicitly specify the output you want, because it's not obvious -- you could use numpy.r_:
In [27]: A
Out[27]:
array([[0, 1, 2, 3, 4, 5, 6],
[4, 5, 6, 7, 4, 5, 6]])
In [28]: A[:, [1,3,4,5]]
Out[28]:
array([[1, 3, 4, 5],
[5, 7, 4, 5]])
In [29]: A[:, r_[1, 3:6]]
Out[29]:
array([[1, 3, 4, 5],
[5, 7, 4, 5]])
In [37]: A[1:, r_[1, 3:6]]
Out[37]: array([[5, 7, 4, 5]])
which you can then flatten or reshape as you like. r_ is basically a convenience function to generate the right indices, e.g.
In [30]: r_[1, 3:6]
Out[30]: array([1, 3, 4, 5])
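Outside an interactive session that imports everything from NumPy, the same thing can be written with the np. prefix. A short sketch; the second selection (columns 0, 2, 3 and 6) is a made-up example to show that several scalars and slices can be mixed:

```python
import numpy as np

A = np.array([[0, 1, 2, 3, 4, 5, 6],
              [4, 5, 6, 7, 4, 5, 6]])

# np.r_ expands slice notation and joins the pieces into one index array.
cols = np.r_[1, 3:6]
print(cols)        # [1 3 4 5]
picked = A[:, cols]
print(picked)

# Scalars and slices can be combined freely.
other = A[:, np.r_[0, 2:4, 6]]
print(other)
```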
Perhaps you are looking for this?
In [10]: A[1:, [1] + list(range(3, 6))]
Out[10]: array([[5, 7, 4, 5]])
Note this gives you the second, fourth, fifth and sixth columns of all rows but the first.
The second element is A[:,1]. Elements 3-5 (I'm assuming you want inclusive) are A[:,2:5]. You won't be able to extract them with a single call. To get them as an array, you could do
import numpy as np
A = np.array([[0, 1, 2, 3, 4, 5, 6], [4, 5, 6, 7, 4, 5, 6]])
my_cols = np.hstack((A[:,1][...,np.newaxis], A[:,2:5]))
The np.newaxis stuff is just to make A[:,1] a 2D array, consistent with A[:,2:5].
Hope this helps.
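For comparison, a sketch of a few equivalent ways to keep column 1 two-dimensional, so that it can be stacked next to A[:,2:5]. The names col_newaxis, col_slice and col_list are illustrative only.

```python
import numpy as np

A = np.array([[0, 1, 2, 3, 4, 5, 6],
              [4, 5, 6, 7, 4, 5, 6]])

col_newaxis = A[:, 1][..., np.newaxis]  # 1-D column, then add an axis: (2, 1)
col_slice = A[:, 1:2]                   # a length-1 slice keeps both axes
col_list = A[:, [1]]                    # a one-element index list does too

my_cols = np.hstack((col_slice, A[:, 2:5]))
print(my_cols)

# Columns 1 and 2..4 happen to be adjacent here, so one slice gives the same.
print(np.array_equal(my_cols, A[:, 1:5]))
```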