How does one add rows to a numpy array?
I have an array A:
A = array([[0, 1, 2], [0, 2, 0]])
I wish to add rows to this array from another array X if the first element of each row in X meets a specific condition.
Numpy arrays do not have a method 'append' like that of lists, or so it seems.
If A and X were lists I would merely do:
for i in X:
if i[0] < 3:
A.append(i)
Is there a numpythonic way to do the equivalent?
Thanks,
S ;-)
You can do this:
newrow = [1, 2, 3]
A = numpy.vstack([A, newrow])
What is X? If it is a 2D-array, how can you then compare its row to a number: i < 3?
EDIT after OP's comment:
A = array([[0, 1, 2], [0, 2, 0]])
X = array([[0, 1, 2], [1, 2, 0], [2, 1, 2], [3, 2, 0]])
add to A all rows from X where the first element < 3:
import numpy as np
A = np.vstack((A, X[X[:,0] < 3]))
# returns:
array([[0, 1, 2],
[0, 2, 0],
[0, 1, 2],
[1, 2, 0],
[2, 1, 2]])
As this question is been 7 years before, in the latest version which I am using is numpy version 1.13, and python3, I am doing the same thing with adding a row to a matrix, remember to put a double bracket to the second argument, otherwise, it will raise dimension error.
In here I am adding on matrix A
1 2 3
4 5 6
with a row
7 8 9
same usage in np.r_
A = [[1, 2, 3], [4, 5, 6]]
np.append(A, [[7, 8, 9]], axis=0)
>> array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
#or
np.r_[A,[[7,8,9]]]
Just to someone's intersted, if you would like to add a column,
array = np.c_[A,np.zeros(#A's row size)]
following what we did before on matrix A, adding a column to it
np.c_[A, [2,8]]
>> array([[1, 2, 3, 2],
[4, 5, 6, 8]])
If you want to prepend, you can just flip the order of the arguments, i.e.:
np.r_([[7, 8, 9]], A)
>> array([[7, 8, 9],
[1, 2, 3],
[4, 5, 6]])
If no calculations are necessary after every row, it's much quicker to add rows in python, then convert to numpy. Here are timing tests using python 3.6 vs. numpy 1.14, adding 100 rows, one at a time:
import numpy as np
from time import perf_counter, sleep
def time_it():
# Compare performance of two methods for adding rows to numpy array
py_array = [[0, 1, 2], [0, 2, 0]]
py_row = [4, 5, 6]
numpy_array = np.array(py_array)
numpy_row = np.array([4,5,6])
n_loops = 100
start_clock = perf_counter()
for count in range(0, n_loops):
numpy_array = np.vstack([numpy_array, numpy_row]) # 5.8 micros
duration = perf_counter() - start_clock
print('numpy 1.14 takes {:.3f} micros per row'.format(duration * 1e6 / n_loops))
start_clock = perf_counter()
for count in range(0, n_loops):
py_array.append(py_row) # .15 micros
numpy_array = np.array(py_array) # 43.9 micros
duration = perf_counter() - start_clock
print('python 3.6 takes {:.3f} micros per row'.format(duration * 1e6 / n_loops))
sleep(15)
#time_it() prints:
numpy 1.14 takes 5.971 micros per row
python 3.6 takes 0.694 micros per row
So, the simple solution to the original question, from seven years ago, is to use vstack() to add a new row after converting the row to a numpy array. But a more realistic solution should consider vstack's poor performance under those circumstances. If you don't need to run data analysis on the array after every addition, it is better to buffer the new rows to a python list of rows (a list of lists, really), and add them as a group to the numpy array using vstack() before doing any data analysis.
You can also do this:
newrow = [1,2,3]
A = numpy.concatenate((A,newrow))
import numpy as np
array_ = np.array([[1,2,3]])
add_row = np.array([[4,5,6]])
array_ = np.concatenate((array_, add_row), axis=0)
I use 'np.vstack' which is faster, EX:
import numpy as np
input_array=np.array([1,2,3])
new_row= np.array([4,5,6])
new_array=np.vstack([input_array, new_row])
I use numpy.insert(arr, i, the_object_to_be_added, axis) in order to insert object_to_be_added at the i'th row(axis=0) or column(axis=1)
import numpy as np
a = np.array([[1, 2, 3], [5, 4, 6]])
# array([[1, 2, 3],
# [5, 4, 6]])
np.insert(a, 1, [55, 66], axis=1)
# array([[ 1, 55, 2, 3],
# [ 5, 66, 4, 6]])
np.insert(a, 2, [50, 60, 70], axis=0)
# array([[ 1, 2, 3],
# [ 5, 4, 6],
# [50, 60, 70]])
Too old discussion, but I hope it helps someone.
If you can do the construction in a single operation, then something like the vstack-with-fancy-indexing answer is a fine approach. But if your condition is more complicated or your rows come in on the fly, you may want to grow the array. In fact the numpythonic way to do something like this - dynamically grow an array - is to dynamically grow a list:
A = np.array([[1,2,3],[4,5,6]])
Alist = [r for r in A]
for i in range(100):
newrow = np.arange(3)+i
if i%5:
Alist.append(newrow)
A = np.array(Alist)
del Alist
Lists are highly optimized for this kind of access pattern; you don't have convenient numpy multidimensional indexing while in list form, but for as long as you're appending it's hard to do better than a list of row arrays.
You can use numpy.append() to append a row to numpty array and reshape to a matrix later on.
import numpy as np
a = np.array([1,2])
a = np.append(a, [3,4])
print a
# [1,2,3,4]
# in your example
A = [1,2]
for row in X:
A = np.append(A, row)
Related
I have a list of coefficients and a list of times
a = np.array([0,1,2,3])
t = np.array([1,2,3])
I would like to perform some multiplicative operation on the two where each coefficient is multiplied by each of the times to result in an array like:
array([[0, 0, 0],
[1, 2, 3],
[2, 4, 6],
[3, 6, 9]])
I can do this with a for loop like:
np.array([i * t for i in a])
However I was wondering whether there was a more efficient numpythonic way of performing this operation without the for loop as in reality I have much bigger arrays and multiple sets of coefficients?
Try this (uses broadcasting).
>>> import numpy as np
>>> a = np.array([0, 1, 2, 3])
>>> t = np.array([1, 2, 3])
>>> res1 = t * a[:, None]
>>> res1
array([[0, 0, 0],
[1, 2, 3],
[2, 4, 6],
[3, 6, 9]])
My prefered way is:
a[:, None] * t
But there is also a special method for that:
np.outer(a, t)
I have the following grid(actually a dataframe):
params = pd.DataFrame(np.array([(alpha, gamma) for alpha in np.linspace(0,1,10) for gamma in np.linspace(0,2,10)]),
columns = ['alpha','gamma'])
Then I use apply
params['res'] = params.apply(lambda row: func(x=params['alpha'],y=params['gamma'],axis=1)
How do I unpack the above into a matrix/dataframe below?
pd.DataFrame(elements of params['res'],
index = np.linspace(0,1,10),
columns = np.linspace(0,2,10))
You can first trasform the res Series to a numpy array, then use the reshape method:
result_df = pd.DataFrame(params['res'].to_numpy().reshape(10,10),
index = np.linspace(0,1,10),
columns = np.linspace(0,2,10))
Using a smaller size example:
# Simulating res
res = np.random.randint(0,10, 9)
res
array([9, 3, 5, 9, 3, 1, 4, 0, 6])
res.reshape(3,3)
array([[9, 3, 5],
[9, 3, 1],
[4, 0, 6]])
If this is not your expected result, you can traspose it:
res.reshape(3,3).T
array([[9, 9, 4],
[3, 3, 0],
[5, 1, 6]])
You are looking for pivot:
params.pivot(index='alpha', columns='gamma', values='res')
If the question only ask the fastest way to make a 10 by 10 numpy 2d array/matrix from the res column, then I think this solution is the one:
params.res.values.reshape(10,10)
I have a NumPy array with each row representing some (x, y, z) coordinate like so:
a = array([[0, 0, 1],
[1, 1, 2],
[4, 5, 1],
[4, 5, 2]])
I also have another NumPy array with unique values of the z-coordinates of that array like so:
b = array([1, 2])
How can I apply a function, let's call it "f", to each of the groups of rows in a which correspond to the values in b? For example, the first value of b is 1 so I would get all rows of a which have a 1 in the z-coordinate. Then, I apply a function to all those values.
In the end, the output would be an array the same shape as b.
I'm trying to vectorize this to make it as fast as possible. Thanks!
Example of an expected output (assuming that f is count()):
c = array([2, 2])
because there are 2 rows in array a which have a z value of 1 in array b and also 2 rows in array a which have a z value of 2 in array b.
A trivial solution would be to iterate over array b like so:
for val in b:
apply function to a based on val
append to an array c
My attempt:
I tried doing something like this, but it just returns an empty array.
func(a[a[:, 2]==b])
The problem is that the groups of rows with the same Z can have different sizes so you cannot stack them into one 3D numpy array which would allow to easily apply a function along the third dimension. One solution is to use a for-loop, another is to use np.split:
a = np.array([[0, 0, 1],
[1, 1, 2],
[4, 5, 1],
[4, 5, 2],
[4, 3, 1]])
a_sorted = a[a[:,2].argsort()]
inds = np.unique(a_sorted[:,2], return_index=True)[1]
a_split = np.split(a_sorted, inds)[1:]
# [array([[0, 0, 1],
# [4, 5, 1],
# [4, 3, 1]]),
# array([[1, 1, 2],
# [4, 5, 2]])]
f = np.sum # example of a function
result = list(map(f, a_split))
# [19, 15]
But imho the best solution is to use pandas and groupby as suggested by FBruzzesi. You can then convert the result to a numpy array.
EDIT: For completeness, here are the other two solutions
List comprehension:
b = np.unique(a[:,2])
result = [f(a[a[:,2] == z]) for z in b]
Pandas:
df = pd.DataFrame(a, columns=list('XYZ'))
result = df.groupby(['Z']).apply(lambda x: f(x.values)).tolist()
This is the performance plot I got for a = np.random.randint(0, 100, (n, 3)):
As you can see, approximately up to n = 10^5 the "split solution" is the fastest, but after that the pandas solution performs better.
If you are allowed to use pandas:
import pandas as pd
df=pd.DataFrame(a, columns=['x','y','z'])
df.groupby('z').agg(f)
Here f can be any custom function working on grouped data.
Numeric example:
a = np.array([[0, 0, 1],
[1, 1, 2],
[4, 5, 1],
[4, 5, 2]])
df=pd.DataFrame(a, columns=['x','y','z'])
df.groupby('z').size()
z
1 2
2 2
dtype: int64
Remark that .size is the way to count number of rows per group.
To keep it into pure numpy, maybe this can suit your case:
tmp = np.array([a[a[:,2]==i] for i in b])
tmp
array([[[0, 0, 1],
[4, 5, 1]],
[[1, 1, 2],
[4, 5, 2]]])
which is an array with each group of arrays.
c = np.array([])
for x in np.nditer(b):
c = np.append(c, np.where((a[:,2] == x))[0].shape[0])
Output:
[2. 2.]
I try to have a matrix like:
M= [[1,1,..,1],
[2,2,..,2],
...
[40000, 40000, ..,40000]
It's what I tried:
data = np.mat((40000,8))
print(data.shape)
for i in range(data.shape[0]):
data[i,:] = i
print(data[:5])
The above code prints:
(1, 2)
[[0 0]]
I know how to fill a matrix with constant values, but I couldn't find a similar question for this case.
Use a simple array and don't forget that Python starts indexing at 0:
data = np.zeros((40000,8))
for i in range(data.shape[0]):
data[i,:] = i+1
Here's a way using numpy:
rows = 10
cols = 3
l = np.arange(1,rows)
np.tile(l,cols).reshape(cols,rows-1).T
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3],
[4, 4, 4],
[5, 5, 5],
[6, 6, 6],
[7, 7, 7],
[8, 8, 8],
[9, 9, 9]])
Matthieu Brucher's answer will perfectly do for your case. If you are looking at numbers much higher than 4000 and if time is an issue, you might want to get rid of the for-loop and create a list of lists with list comprehension before turning it into a numpy array:
a = [[i]*8 for i in range(1,4001)]
m = np.asarray(a)
In my case, this solution was ~7 times faster.
To use numpy broadcast over iterations u can do,
import numpy as np
M = np.ones((40000,8), dtype=np.int).T * np.arange(1, 40001)
M = M.T
print(M)
This should be faster than any above iterations.
If that's what u are looking for
Very simple:
data = np.arange(1, 40001).repeat(8).reshape(-1,8)
Though this is pure numpy as well, this is considerably slower than #yatu's solution.
Suppose I have an array, I want to have a matrix from that array by a matrix of index.
import numpy as np
arr = np.array([1,5])
mtxidx = np.array([[0,1,0],[0,1,1],[0,0,0]])
How can I get a matrix [[1,5,1],[1,5,5],[1,1,1]] ?
An initial thought is simply say
arr(mtxidx)
however it doesn't work
Is there any function/method that do this elegantly?
"Fancy" indexing works for me (NB in your question you are trying to call the array object (round brackets) but NumPy "ndarray" objects are not callable):
In [61]: arr[mtxidx]
Out[61]:
array([[1, 5, 1],
[1, 5, 5],
[1, 1, 1]])
Your initial thought was pretty close, simply replacing the parenthesis with [] would make it work.
arr[mtxidx]
A list comprehension would work as well.
>>> np.array([arr[row] for row in mtxidx])
array([[1, 5, 1],
[1, 5, 5],
[1, 1, 1]])
I upvote the fancy indexing proposed by #xnx but if you would have done something in same range but involving an operation (or ..anything else) you can also try this :
arr = np.array([1,5])
mtxidx = np.array([[0,1,0],[0,1,1],[0,0,0]])
def func(v):
return arr[v]
vfunc = np.vectorize(func)
vfunc(mtxidx)
# array([[1, 5, 1],
# [1, 5, 5],
# [1, 1, 1]])