I need to accomplish the following task:
from:
a = array([[1,3,4],[1,2,3]...[1,2,1]])
(add one element to each row) to:
a = array([[1,3,4,x],[1,2,3,x]...[1,2,1,x]])
I have tried things like a[n] = array([1,3,4,x])
but numpy complained about a shape mismatch. I tried iterating through a and appending element x to each item, but the changes are not reflected.
Any ideas on how I can accomplish this?
Appending data to an existing array is a natural thing to want to do for anyone with Python experience. However, if you find yourself regularly appending to large arrays, you'll quickly discover that NumPy doesn't easily or efficiently do this the way a Python list does. Every "append" requires reallocating the array memory and temporarily doubling the memory requirements. So the more general solution is to allocate arrays as large as the final output of your algorithm up front, and then perform all your operations on subsets (slices) of that array. Array creation and destruction should ideally be minimized.
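A minimal sketch of that idea for this question, assuming the final width (one extra column) is known up front: allocate the output once, then fill it through slices.

```python
import numpy as np

a = np.array([[1, 3, 4], [1, 2, 3], [1, 2, 1]])
x = 10

# Allocate the final-size array once, keeping a's dtype.
out = np.empty((a.shape[0], a.shape[1] + 1), dtype=a.dtype)
out[:, :-1] = a  # copy the existing columns into a slice
out[:, -1] = x   # broadcast the scalar into the new last column
print(out)
```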
That said, it's often unavoidable, and the functions that do this are:
for 2-D arrays:
np.hstack
np.vstack
np.column_stack
np.row_stack (an alias of np.vstack; deprecated since NumPy 2.0)
for 3-D arrays (the above plus):
np.dstack
for N-D arrays:
np.concatenate
import numpy as np
a = np.array([[1,3,4],[1,2,3],[1,2,1]])
b = np.array([10,20,30])
c = np.hstack((a, np.atleast_2d(b).T))
returns c:
array([[ 1, 3, 4, 10],
[ 1, 2, 3, 20],
[ 1, 2, 1, 30]])
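For comparison, np.concatenate from the list gives the same result once b has an explicit column shape (here `b[:, np.newaxis]`, equivalent to `np.atleast_2d(b).T` for a 1-D b), while np.column_stack does that reshaping for you:

```python
import numpy as np

a = np.array([[1, 3, 4], [1, 2, 3], [1, 2, 1]])
b = np.array([10, 20, 30])

# b[:, np.newaxis] turns the 1-D b (shape (3,)) into a column of shape (3, 1)
c1 = np.concatenate((a, b[:, np.newaxis]), axis=1)
# column_stack treats 1-D inputs as columns, so no reshaping is needed
c2 = np.column_stack((a, b))
print(c1)
```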
One way to do it (maybe not the best) is to create another array with the new elements and use column_stack, i.e.:
>>> a = array([[1,3,4],[1,2,3]...[1,2,1]])
[[1 3 4]
[1 2 3]
[1 2 1]]
>>> b = array([1,2,3])
>>> column_stack((a,b))
array([[1, 3, 4, 1],
[1, 2, 3, 2],
[1, 2, 1, 3]])
Appending a single scalar can be done a bit more simply (and, unlike the np.ones approach, without converting to float) by expanding the scalar into a nested Python list:
import numpy as np
a = np.array([[1,3,4],[1,2,3],[1,2,1]])
x = 10
b = np.hstack((a, [[x]] * len(a)))  # [[x]] * 3 -> [[10], [10], [10]], a (3, 1) column
returns b as:
array([[ 1, 3, 4, 10],
[ 1, 2, 3, 10],
[ 1, 2, 1, 10]])
Appending a row could be done by:
c = np.vstack((a, [x] * len(a[0])))  # [x] * 3 -> [10, 10, 10], one new row
returns c as:
array([[ 1, 3, 4],
[ 1, 2, 3],
[ 1, 2, 1],
[10, 10, 10]])
np.insert can also be used for this purpose:
import numpy as np
a = np.array([[1, 3, 4],
[1, 2, 3],
[1, 2, 1]])
x = 5
index = 3 # the position for x to be inserted before
np.insert(a, index, x, axis=1)
array([[1, 3, 4, 5],
[1, 2, 3, 5],
[1, 2, 1, 5]])
index can also be a list or tuple:
>>> index = [1, 1, 3] # equivalently (1, 1, 3)
>>> np.insert(a, index, x, axis=1)
array([[1, 5, 5, 3, 4, 5],
[1, 5, 5, 2, 3, 5],
[1, 5, 5, 2, 1, 5]])
or a slice:
>>> index = slice(0, 3)
>>> np.insert(a, index, x, axis=1)
array([[5, 1, 5, 3, 5, 4],
[5, 1, 5, 2, 5, 3],
[5, 1, 5, 2, 5, 1]])
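Note that np.insert leaves a unchanged and returns a new array. As a small sketch: inserting at index a.shape[1] (one past the last column) appends a column, and with a scalar index a sequence of values is inserted as a single column, so each row can get its own value (10, 20, 30 are just example values):

```python
import numpy as np

a = np.array([[1, 3, 4],
              [1, 2, 3],
              [1, 2, 1]])

# Index a.shape[1] is one past the last column, so this appends a column.
c = np.insert(a, a.shape[1], [10, 20, 30], axis=1)
print(c)
print(a)  # unchanged: np.insert returns a copy
```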
If x is just a single scalar value, you could build a column of the right shape and append it as the rightmost column of a. Note that np.ones returns floats, so the result is upcast to float:
import numpy as np
a = np.array([[1,3,4],[1,2,3],[1,2,1]])
x = 10
b = np.hstack((a,x*np.ones((a.shape[0],1))))
returns b as:
array([[ 1., 3., 4., 10.],
[ 1., 2., 3., 10.],
[ 1., 2., 1., 10.]])
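If the upcast to float is unwanted, a sketch of one alternative is to build the column with a's own dtype, here using np.full instead of multiplying np.ones by x:

```python
import numpy as np

a = np.array([[1, 3, 4], [1, 2, 3], [1, 2, 1]])
x = 10

# np.full creates an (n_rows, 1) column filled with x, using a's integer dtype
col = np.full((a.shape[0], 1), x, dtype=a.dtype)
b = np.hstack((a, col))
print(b)
```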
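A plain-Python alternative converts each row to a list, appends x, and rebuilds the array:
import numpy as np
target = []
for line in a.tolist():
    line.append(x)  # list.append modifies the row in place and returns None
    target.append(line)
b = np.array(target)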
My code is as follows:
import numpy as np
z = np.array([
[1, 2],
[3]
], dtype=object)  # dtype=object is required for ragged input on NumPy >= 1.24
x = np.array([
[4, 5]
])
print(np.multiply(x,z))
The output of this program is a list of lists. This is different from the regular broadcasting rules that apply to arrays with equal dimensions. Is there a name for this property? Also, why does the output explicitly mention the word list?
[[list([1, 2, 1, 2, 1, 2, 1, 2]) list([3, 3, 3, 3, 3])]]
This is just normal element-by-element multiplication. Because your z array is ragged (its rows have different lengths, so it is not rectangular), NumPy stores it as a row of two objects:
>>> z
array([[1, 2], [3]], dtype=object)
>>> z.shape
(2,)
From here you multiply normally: the first object is multiplied by 4, the second by 5:
>>> [1, 2]*4
[1, 2, 1, 2, 1, 2, 1, 2]
>>> [3]*5
[3, 3, 3, 3, 3]
This is just normal Python list multiplication (repetition), and that is the result you get. Indeed, your result is not a "list of lists": it is an array of shape (1, 2) with dtype=object, i.e. a row of two objects (which happen to be lists):
>>> np.multiply(x,z)
array([[[1, 2, 1, 2, 1, 2, 1, 2], [3, 3, 3, 3, 3]]], dtype=object)
>>> np.multiply(x,z).shape
(1, 2)
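A small sketch that builds the same object array explicitly. To my knowledge, NumPy 1.24 and later raise a ValueError for ragged nested lists unless dtype=object is passed, so filling an empty object array sidesteps that version difference. Broadcasting then pairs shape (1, 2) with shape (2,), and each element-wise product is int * list:

```python
import numpy as np

# A 1-D object array holding two Python lists of different lengths.
z = np.empty(2, dtype=object)
z[0] = [1, 2]
z[1] = [3]

x = np.array([[4, 5]])  # shape (1, 2)

# (1, 2) broadcast with (2,) gives (1, 2); each element is int * list.
r = np.multiply(x, z)
print(r.shape, r.dtype)
print(r[0, 0], r[0, 1])
```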
I have the following numpy array
import numpy as np
a = np.array([1,2,6,8])
I want to create another numpy array from a that contains all the possible sums of TWO elements of a. There are int(a.size*(a.size-1)/2) such sums, composed from:
a[0] + a[1]
a[0] + a[2]
a[0] + a[3]
a[1] + a[2]
a[1] + a[3]
a[2] + a[3]
How can I construct a numpy array with the above sums as elements without using a double for loop (the only way I can think of)? For the above example, the output should be [3,7,9,8,10,14]
MWE
eff = int(a.size*(a.size-1)/2)
c = np.empty((0, eff))
You can use triu_indices:
i0,i1 = np.triu_indices(4,1)
a[i0]
# array([1, 1, 1, 2, 2, 6])
a[i1]
# array([2, 6, 8, 6, 8, 8])
a[i0]+a[i1]
# array([ 3, 7, 9, 8, 10, 14])
For more terms we need to build our own "nd_triu_idx". Here is how to do it for 3 terms out of a list of 5:
n = 5
full = np.mgrid[:n,:n,:n]
nd_triu_idx = full[:,(np.diff(full,axis=0)>0).all(axis=0)]
nd_triu_idx
# array([[0, 0, 0, 0, 0, 0, 1, 1, 1, 2],
# [1, 1, 1, 2, 2, 3, 2, 2, 3, 3],
# [2, 3, 4, 3, 4, 4, 3, 4, 4, 4]])
To fully generalize the number of terms use something like
k = 4
full = np.mgrid[k*(slice(n),)]
etc.
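The generalization can be wrapped in a small function; k_sums is a hypothetical name, and the approach is the mgrid mask from above followed by fancy indexing and a sum over the k index rows:

```python
import numpy as np


def k_sums(a, k):
    """Return the sums of all k-element combinations of a (hypothetical helper)."""
    n = len(a)
    # k stacked index grids, each of shape (n,) * k; full[m] holds the m-th index
    full = np.mgrid[k * (slice(n),)]
    # Keep only strictly increasing index tuples i0 < i1 < ... < i(k-1)
    mask = (np.diff(full, axis=0) > 0).all(axis=0)
    idx = full[:, mask]  # shape (k, number of combinations)
    # a[idx] has shape (k, number of combinations); summing axis 0 adds the k terms
    return a[idx].sum(axis=0)


a = np.array([1, 2, 6, 8])
print(k_sums(a, 2))
print(k_sums(a, 3))
```

np.mgrid materializes n**k index values per row, so this is only practical for small n and k; for larger inputs, itertools.combinations avoids building the full grid.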
You can take all combinations of size 2 from your array and sum each one:
import numpy as np
from itertools import combinations
a = np.array([1,2,6,8])
print(list(map(sum, combinations(a.tolist(), 2))))  # tolist() gives plain ints in the output
# [3, 7, 9, 8, 10, 14]
Or using numpy:
import numpy as np
a = np.array([1,2,6,8])
b = a + a[:,None]  # b[i, j] = a[i] + a[j]
print(b[np.triu_indices(len(a), 1)])
# [ 3 7 9 8 10 14]
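What about taking the outer product of the exponentiated array? Since exp(x)*exp(y) = exp(x + y), the log of the outer product holds every pairwise sum. This assumes all sums are positive (the [1:] drops the zeros left by np.tril); note also that np.unique sorts the result and removes duplicate sums, and that np.exp overflows for large values: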
>>> a = np.array([1, 2, 6, 8])[:, None]
>>> b = np.exp(a)
>>> np.unique(np.tril(np.log(np.dot(b, b.T)), k=-1))[1:]
array([ 3., 7., 8., 9., 10., 14.])
I have these two numpy arrays:
a = np.array([
[
[1,2,3,0,0],
[4,5,6,0,0],
[7,8,9,0,0]
],
[
[1,3,5,0,0],
[2,4,6,0,0],
[1,1,1,0,0]
]
])
b = np.array([
[
[1,2],
[2,3],
[3,4]
],
[
[4,1],
[5,2],
[6,3]
]
])
with shapes:
"a" shape: (2, 3, 5), "b" shape: (2, 3, 2)
I want to replace the last two elements of each innermost row of a with those from b, so that I get:
c = np.array([
[
[1,2,3,1,2],
[4,5,6,2,3],
[7,8,9,3,4]
],
[
[1,3,5,4,1],
[2,4,6,5,2],
[1,1,1,6,3]
]
])
However, np.hstack((a[:,:,:-2], b)) throws a ValueError:
all the input array dimensions except for the concatenation axis must match exactly
and in general it doesn't look like the right function to use. Append doesn't work either.
Is there a method in numpy that can do that or do I need to iterate over the arrays with a for loop and manipulate them manually?
You can assign directly into the last two columns (this modifies a in place):
a[:, :, 3:] = b
Non-overwriting method:
a[:,:,:-2] already selects the first three columns (it is the same as a[:,:,:3]), so the slice is fine; the problem is the axis that hstack joins along.
According to the documentation, for arrays with at least two dimensions np.hstack(x) is equivalent to np.concatenate(x, axis=1). Since you want to join the arrays along their innermost (last) axis, you should use axis=2 (or axis=-1).
Code:
>>> np.concatenate((a[:,:,:3], b), axis=2)
array([[[1, 2, 3, 1, 2],
[4, 5, 6, 2, 3],
[7, 8, 9, 3, 4]],
[[1, 3, 5, 4, 1],
[2, 4, 6, 5, 2],
[1, 1, 1, 6, 3]]])
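A sketch comparing three equivalent ways to get c, using the arrays from the question: concatenation along the last axis, np.dstack (which joins along the third axis, the last one for these 3-D arrays), and overwriting the last two columns of a copy so that a itself is left untouched:

```python
import numpy as np

a = np.array([[[1, 2, 3, 0, 0], [4, 5, 6, 0, 0], [7, 8, 9, 0, 0]],
              [[1, 3, 5, 0, 0], [2, 4, 6, 0, 0], [1, 1, 1, 0, 0]]])
b = np.array([[[1, 2], [2, 3], [3, 4]],
              [[4, 1], [5, 2], [6, 3]]])

# Join along the last axis (axis=-1 is the same as axis=2 for 3-D arrays).
c1 = np.concatenate((a[:, :, :-2], b), axis=-1)
# np.dstack joins along the third axis, which is the last axis here.
c2 = np.dstack((a[:, :, :-2], b))
# Overwrite the last two columns of a copy, leaving a unchanged.
c3 = a.copy()
c3[:, :, -2:] = b
print(c1)
```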
I have a very large numpy array like this:
np.array([1, 2, 3, 4, 5, 6, 7, ... , 12345])
I need to create subgroups of n elements (in the example n = 3) in another array like this:
np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [...], [12340, 12341, 12342], [12343, 12344, 12345]])
I accomplished that using normal Python lists, just appending the subgroups to another list, but I'm having a hard time doing it in numpy.
Any ideas how I can do that?
Thanks!
You can use a.reshape(-1, 3) (or np.reshape(a, (-1, 3))), where the -1 means "infer this dimension from whatever is left".
>>> array = np.arange(1, 12346)
>>> array
array([ 1, 2, 3, ..., 12343, 12344, 12345])
>>> array.reshape(-1, 3)
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
...,
[12337, 12338, 12339],
[12340, 12341, 12342],
[12343, 12344, 12345]])
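For a contiguous array like this one, reshape returns a view, so no data is copied. If the length is not a multiple of n, reshape raises a ValueError; the sketch below assumes that dropping the incomplete tail is acceptable, which may not be what you want (padding is another option):

```python
import numpy as np

n = 3
arr = np.arange(1, 12346)    # 12345 elements, a multiple of 3
groups = arr.reshape(-1, n)  # -1: NumPy infers the number of rows

# Assumption: for a length that is not a multiple of n, drop the leftover tail.
short = np.arange(1, 12)     # 11 elements
trimmed = short[: len(short) // n * n].reshape(-1, n)
print(groups.shape)
print(trimmed)
```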
You can use np.reshape():
From the documentation:
numpy.reshape(a, newshape, order='C')
Gives a new shape to an array without changing its data.
Here is an example of how you can apply it to your situation:
>>> import numpy as np
>>> a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 12345])
>>> a.reshape((len(a) // 3, 3))
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 12345]])
Note that the length of the array (len(a)) has to be a multiple of 3 to reshape it into a 2-D NumPy array, because arrays must be rectangular.
Suppose I have a numpy array as below
a = np.asarray([[1,2,3],[1,4,3],[2,5,4],[2,7,5]])
array([[1, 2, 3],
[1, 4, 3],
[2, 5, 4],
[2, 7, 5]])
How can I flatten columns 2 and 3 for each unique element in column 1, like below:
array([[1, 2, 3, 4, 3],
[2, 5, 4, 7, 5]])
Thank you for your help.
Another option using a list comprehension:
np.array([np.insert(a[a[:,0] == k, 1:].flatten(), 0, k) for k in np.unique(a[:,0])])
# array([[1, 2, 3, 4, 3],
# [2, 5, 4, 7, 5]])
import numpy as np
a = np.asarray([[1,2,3],[1,4,3],[2,5,4],[2,7,5]])
d = {}
for row in a:
    d[row[0]] = np.concatenate((d.get(row[0], []), row[1:]))
r = np.array([np.concatenate(([key], d[key])) for key in d])
print(r)
This prints:
[[ 1. 2. 3. 4. 3.]
[ 2. 5. 4. 7. 5.]]
As mentioned in the comments, each unique element in column 0 occurs in the same fixed number of rows, so we can use a vectorized approach. We sort the rows by column 0 and look for the first shift in that column, which marks a group change and thus gives the number of rows per unique element; let's call it L. Finally, we slice the sorted array to select columns 1 and 2 and group every L rows together by reshaping. The implementation would be:
sa = a[a[:,0].argsort(kind='stable')]  # stable sort keeps the row order within each group
L = np.unique(sa[:,0],return_index=True)[1][1]
out = np.column_stack((sa[::L,0],sa[:,1:].reshape(-1,2*L)))
For a further performance boost, we can use np.diff to calculate L, like so:
L = np.where(np.diff(sa[:,0])>0)[0][0]+1
Sample run -
In [103]: a
Out[103]:
array([[1, 2, 3],
[3, 7, 8],
[1, 4, 3],
[2, 5, 4],
[3, 8, 2],
[2, 7, 5]])
In [104]: sa = a[a[:,0].argsort(kind='stable')]
...: L = np.unique(sa[:,0],return_index=True)[1][1]
...: out = np.column_stack((sa[::L,0],sa[:,1:].reshape(-1,2*L)))
...:
In [105]: out
Out[105]:
array([[1, 2, 3, 4, 3],
[2, 5, 4, 7, 5],
[3, 7, 8, 8, 2]])
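The vectorized approach can be wrapped in a function; group_flatten is a hypothetical name. This sketch uses a stable sort so rows keep their original order within each group, finds L with np.diff, and generalizes the hard-coded 2 to a.shape[1] - 1 columns. Like the original, it assumes every key occurs the same number of times:

```python
import numpy as np


def group_flatten(a):
    """Hypothetical helper: one output row per unique value in column 0.

    Assumes every value in column 0 occurs the same number of times.
    """
    # Stable sort keeps the original row order inside each group.
    sa = a[a[:, 0].argsort(kind="stable")]
    # Positions where column 0 changes mark group boundaries.
    changes = np.flatnonzero(np.diff(sa[:, 0]))
    L = changes[0] + 1 if changes.size else len(sa)  # rows per group
    width = (a.shape[1] - 1) * L
    # First column: one key per group; the rest: each group's rows flattened.
    return np.column_stack((sa[::L, 0], sa[:, 1:].reshape(-1, width)))


a = np.asarray([[1, 2, 3], [3, 7, 8], [1, 4, 3], [2, 5, 4], [3, 8, 2], [2, 7, 5]])
print(group_flatten(a))
```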