best way to create numpy array from FOR loop [duplicate] - python

This question already has answers here:
Cartesian product of x and y array points into single array of 2D points
(17 answers)
Closed 6 months ago.
Is there a better way to create a multidimensional array in numpy using a FOR loop, rather than creating a list? This is the only method I could come up with:
import numpy as np
a = []
for x in range(1,6):
    for y in range(1,6):
        a.append([x,y])
a = np.array(a)
print(f'Type(a) = {type(a)}. a = {a}')
EDIT: I tried doing something like this:
a = np.array([range(1,6),range(1,6)])
a.shape = (5,2)
print(f'Type(a) = {type(a)}. a = {a}')
however, the output is not the same. I'm sure I'm missing something basic.

You could preallocate the array before assigning the respective values:
a = np.empty(shape=(25, 2), dtype=int)
for x in range(1, 6):
    for y in range(1, 6):
        index = (x-1)*5 + (y-1)
        a[index] = x, y
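If you want to avoid the Python loop entirely, here is a small vectorized sketch (my addition, not part of this answer) that builds the same 25x2 array of pairs with np.repeat and np.tile:
import numpy as np

x = np.arange(1, 6)
y = np.arange(1, 6)
# every x is repeated once per y value, and the y values are tiled once per x value
a = np.column_stack((np.repeat(x, len(y)), np.tile(y, len(x))))
print(a.shape)  # (25, 2)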

Have you had a look at numpy.ndindex? This could do the trick:
a = np.ndindex(6,6)
You can find more information in Is there a Python equivalent of range(n) for multidimensional ranges?
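Note that np.ndindex returns an iterator of zero-based index tuples, so to reproduce the 1-to-5 pairs from the question you would still have to materialize and shift it; a small sketch of that idea (my addition):
import numpy as np

# ndindex(5, 5) yields (0, 0) ... (4, 4); adding 1 shifts the pairs to the range 1..5
a = np.array(list(np.ndindex(5, 5))) + 1
print(a[:3])  # first rows: [1 1], [1 2], [1 3]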

You can replace the double for-loop with itertools.product.
from itertools import product
import numpy as np
np.array(list(product(range(1,6), range(1,6))))
To me, building the array from a list feels natural here; I don't know how to avoid the intermediate list in this case.

Sometimes it is difficult to predict the total number of elements, and hence the shape, at the stage of selecting array elements, for example because of an if statement inside the loop.
In this case, put all selected elements into a flat array first:
a = np.empty((0), int)
for x in range(1,6):          # x-coordinate
    for y in range(1,6):      # y-coordinate
        if x != y:            # `if` statement
            a = np.append(a, [x, y])
Then, given the length of one array dimension (in our case each element has 2 coordinates), one can use -1 for the unknown dimension:
a.shape = (-1, 2)
a
array([[1, 2],
[1, 3],
[1, 4],
[1, 5],
[2, 1],
[2, 3],
[2, 4],
[2, 5],
[3, 1],
[3, 2],
[3, 4],
[3, 5],
[4, 1],
[4, 2],
[4, 3],
[4, 5],
[5, 1],
[5, 2],
[5, 3],
[5, 4]])
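A side note beyond the original answer: np.append copies the whole array on every call, so for larger loops it is usually cheaper to collect the selected elements in a plain Python list and convert once at the end; a minimal sketch of that pattern:
import numpy as np

rows = []
for x in range(1, 6):
    for y in range(1, 6):
        if x != y:
            rows.append((x, y))
a = np.array(rows)  # single conversion at the end, shape (20, 2)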

Related

How to delete particular array in 2 dimensional NumPy array by value?

Let the 2-dimensional array be as below:
In [1]: a = [[1, 2], [3, 4], [5, 6], [1, 2], [7, 8]]
a = np.array(a)
a, type(a)
Out [1]: (array([[1, 2],
[3, 4],
[5, 6],
[1, 2],
[7, 8]]),
numpy.ndarray)
I have tried to do this procedure:
In [2]: a = a[a != [1, 2]]
a = np.reshape(a, (int(a.size/2), 2))  # needed because the first line flattens a to 1-D ([3, 4, 5, 6, 7, 8]), while the initial array is 2-dimensional
a
Out[2]: array([[3, 4],
[5, 6],
[7, 8]])
My question is, is there any function in NumPy that can directly do that?
Updated Question
Here's the semi-full source code that I've been working on:
import numpy as np
import pandas as pd
from sklearn import datasets
data = datasets.load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['Target'] = pd.DataFrame(data.target)
bucket = df[df['Target'] == 0]
bucket = bucket.iloc[:, [0, 1]].values
lp, rp = leftestRightest(bucket)
bucket = np.array([x for x in bucket if list(x) != lp])
bucket = np.array([x for x in bucket if list(x) != rp])
Notes:
leftestRightest(arg) is a function that returns two one-dimensional NumPy arrays of size 2 (lp and rp). For instance, lp = [1, 3], rp = [2, 4], and the parameter is a 2-dimensional NumPy array.
There is probably a more elegant approach, but here is what I have come up with:
np.array([x for x in a if list(x) != [1,2]])
Output
[[3, 4], [5, 6], [7, 8]]
Note that I wouldn't recommend list comprehensions for large arrays, since they would be highly time-consuming.
Your approach is correct, but the mask needs to be single-dimensional:
a[(a != [1, 2]).all(-1)]
Output:
array([[3, 4],
[5, 6],
[7, 8]])
Alternatively, you can collect the elements and infer the dimension with -1:
a[a != [1, 2]].reshape(-1, 2)
The boolean condition creates a 2D array of True/False values. You have to apply an AND operation across the columns to make sure the match is not a partial match. Consider a row [5, 2] in the array above: the expression you wrote would keep the 5 and drop the 2 in the resulting 1D array. It can be done as follows:
a[np.all(a != [1, 2], axis=1)]
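One caution worth adding: all of the masks above drop any row that merely shares an element with [1, 2] (for example, a row [1, 5] would also be affected). If the intent is to remove only rows exactly equal to [1, 2], negating an equality test is safer; a small sketch:
import numpy as np

a = np.array([[1, 2], [3, 4], [1, 5], [1, 2], [7, 8]])
# drop only rows that are exactly [1, 2]; the partial match [1, 5] survives
result = a[~(a == [1, 2]).all(axis=1)]
print(result)  # [[3 4] [1 5] [7 8]]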

NumPy apply function to groups of rows corresponding to another numpy array

I have a NumPy array with each row representing some (x, y, z) coordinate like so:
a = array([[0, 0, 1],
[1, 1, 2],
[4, 5, 1],
[4, 5, 2]])
I also have another NumPy array with unique values of the z-coordinates of that array like so:
b = array([1, 2])
How can I apply a function, let's call it "f", to each of the groups of rows in a which correspond to the values in b? For example, the first value of b is 1 so I would get all rows of a which have a 1 in the z-coordinate. Then, I apply a function to all those values.
In the end, the output would be an array the same shape as b.
I'm trying to vectorize this to make it as fast as possible. Thanks!
Example of an expected output (assuming that f is count()):
c = array([2, 2])
because there are 2 rows in array a which have a z value of 1 in array b and also 2 rows in array a which have a z value of 2 in array b.
A trivial solution would be to iterate over array b like so:
c = []
for val in b:
    c.append(f(a[a[:, 2] == val]))  # apply f to the rows of a whose z equals val
c = np.array(c)
My attempt:
I tried doing something like this, but it just returns an empty array.
func(a[a[:, 2]==b])
The problem is that groups of rows with the same Z can have different sizes, so you cannot stack them into one 3D numpy array, which would let you apply a function along the third dimension easily. One solution is to use a for-loop; another is to use np.split:
a = np.array([[0, 0, 1],
[1, 1, 2],
[4, 5, 1],
[4, 5, 2],
[4, 3, 1]])
a_sorted = a[a[:,2].argsort()]
inds = np.unique(a_sorted[:,2], return_index=True)[1]
a_split = np.split(a_sorted, inds)[1:]
# [array([[0, 0, 1],
# [4, 5, 1],
# [4, 3, 1]]),
# array([[1, 1, 2],
# [4, 5, 2]])]
f = np.sum # example of a function
result = list(map(f, a_split))
# [19, 15]
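Since the question asks for an output array with the same shape as b, the per-group results can be converted at the end (my addition to the snippet above):
result = np.array(result)  # array([19, 15]) for f = np.sum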
But imho the best solution is to use pandas and groupby as suggested by FBruzzesi. You can then convert the result to a numpy array.
EDIT: For completeness, here are the other two solutions
List comprehension:
b = np.unique(a[:,2])
result = [f(a[a[:,2] == z]) for z in b]
Pandas:
import pandas as pd
df = pd.DataFrame(a, columns=list('XYZ'))
result = df.groupby(['Z']).apply(lambda x: f(x.values)).tolist()
This is the performance plot I got for a = np.random.randint(0, 100, (n, 3)):
As you can see, approximately up to n = 10^5 the "split solution" is the fastest, but after that the pandas solution performs better.
If you are allowed to use pandas:
import pandas as pd
df=pd.DataFrame(a, columns=['x','y','z'])
df.groupby('z').agg(f)
Here f can be any custom function working on grouped data.
Numeric example:
a = np.array([[0, 0, 1],
[1, 1, 2],
[4, 5, 1],
[4, 5, 2]])
df=pd.DataFrame(a, columns=['x','y','z'])
df.groupby('z').size()
z
1 2
2 2
dtype: int64
Note that .size() is the way to count the number of rows per group.
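If you need the result as a NumPy array rather than a pandas Series, as the question implies, one extra line (my addition) converts it:
c = df.groupby('z').size().to_numpy()  # array([2, 2])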
To keep it in pure numpy, maybe this can suit your case:
tmp = np.array([a[a[:,2]==i] for i in b])
tmp
array([[[0, 0, 1],
[4, 5, 1]],
[[1, 1, 2],
[4, 5, 2]]])
which is an array containing one group of rows per value of b.
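A caution to add here: this only yields a clean 3D array because both groups happen to contain the same number of rows; with ragged groups, np.array falls back to an object array, and recent NumPy versions raise an error unless dtype=object is passed. Keeping the groups in a plain list avoids that:
groups = [a[a[:, 2] == i] for i in b]  # list of 2-D arrays, safe for unequal group sizes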
c = np.array([])
for x in np.nditer(b):
    c = np.append(c, np.where((a[:,2] == x))[0].shape[0])
Output:
[2. 2.]
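Since the question explicitly asks about vectorizing, here is a loop-free sketch of my own for the counting case only (it does not cover an arbitrary f):
import numpy as np

a = np.array([[0, 0, 1], [1, 1, 2], [4, 5, 1], [4, 5, 2]])
b = np.array([1, 2])
# compare every z value against every entry of b, then count matches per column of the comparison
c = (a[:, 2][:, None] == b).sum(axis=0)
print(c)  # [2 2]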

Numpy - Try to split an array according to a monotonic condition

I have a numpy array that is a sequence of (x, y) coordinates. I'm trying to split it according to a monotonic condition. To exemplify this:
cords = np.array([[1,1],[2,3],[2,4],[2,5],[4,3],[4,5],[4,6],[4,7],[5,7],[5,5]])
I would like to split the array and make sure that, within each sub-array, x is monotonic (each x value appears only once). The result should be:
cord1 = np.array([[1,1],[2,3],[4,3],[5,7]])
cord2 = np.array([[2,4],[4,5],[5,5]])
cord3 = np.array([[2,5],[4,6]])
cord4 = np.array([[4,7]])
Any help is appreciated.
You will have to do this iteratively by extracting monotonic coordinates progressively from the remainder of your array:
import numpy as np
cords = np.array([[1,1],[2,3],[2,4],[2,5],[4,3],[4,5],[4,6],[4,7],[5,7],[5,5]])
result = []
while cords.size > 0:
    mask = np.insert(cords[:-1,0] != cords[1:,0], 0, [True])
    result.append(cords[mask,:])
    cords = cords[mask == False,:]
Output:
for mono in result: print(list(map(list,mono)))
[[1, 1], [2, 3], [4, 3], [5, 7]]
[[2, 4], [4, 5], [5, 5]]
[[2, 5], [4, 6]]
[[4, 7]]
Note: this assumes that the points are in order of their x coordinate. You will need to sort them beforehand if that is not the case.
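If the points are not already ordered, a stable sort on x (my suggestion, assuming ties should keep their original relative order) can be applied beforehand:
cords = cords[cords[:, 0].argsort(kind='stable')]  # stable sort keeps equal x values in their original order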

How to apply slice operator to get n-th column of an array in python? [duplicate]

This question already has an answer here:
Python: slicing a multi-dimensional array
(1 answer)
Closed 3 years ago.
I have a 2-D matrix of size n, and I want to get the values of the (n-1)th column into another list. For example,
a = [[1, 2], [3, 4], [5, 6]]
a[:][0]  # returns [1, 2]
How do I get 1, 3, 5 from the above 2-D array into a list using the slice operator?
To my knowledge, the array slice operator is not suited for what you're looking for.
I would recommend python's list comprehensions.
a = [[1, 2], [3, 4], [5, 6]]
result = [x[0] for x in a]
print(result)
You can do this using the numpy library:
import numpy as np
a = np.array([[1, 2], [3, 4], [5, 6]])
result = a[:, 0] # Returns a 1-D numpy array [1, 3, 5]
More advanced indexing and slicing options can be found here.
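Since the question mentions the (n-1)th, i.e. last, column: negative indices work with the same slice syntax, as this small sketch of my own shows:
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
last_col = a[:, -1]  # array([2, 4, 6])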

Numpy - add row to array

How does one add rows to a numpy array?
I have an array A:
A = array([[0, 1, 2], [0, 2, 0]])
I wish to add rows to this array from another array X if the first element of each row in X meets a specific condition.
Numpy arrays do not have a method 'append' like that of lists, or so it seems.
If A and X were lists I would merely do:
for i in X:
    if i[0] < 3:
        A.append(i)
Is there a numpythonic way to do the equivalent?
Thanks,
S ;-)
You can do this:
newrow = [1, 2, 3]
A = numpy.vstack([A, newrow])
What is X? If it is a 2D-array, how can you then compare its row to a number: i < 3?
EDIT after OP's comment:
A = array([[0, 1, 2], [0, 2, 0]])
X = array([[0, 1, 2], [1, 2, 0], [2, 1, 2], [3, 2, 0]])
add to A all rows from X where the first element < 3:
import numpy as np
A = np.vstack((A, X[X[:,0] < 3]))
# returns:
array([[0, 1, 2],
[0, 2, 0],
[0, 1, 2],
[1, 2, 0],
[2, 1, 2]])
This question was asked 7 years ago, but with the latest versions I am using (numpy 1.13 and Python 3) I do the same thing when adding a row to a matrix. Remember to put double brackets around the second argument, otherwise it will raise a dimension error.
Here I am adding to matrix A
1 2 3
4 5 6
the row
7 8 9
The same usage works with np.r_.
A = [[1, 2, 3], [4, 5, 6]]
np.append(A, [[7, 8, 9]], axis=0)
>> array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
#or
np.r_[A,[[7,8,9]]]
In case someone is interested, if you would like to add a column:
new_array = np.c_[A, np.zeros(len(A))]  # len(A) is the number of rows of A
Following what we did before on matrix A, adding a column to it:
np.c_[A, [2,8]]
>> array([[1, 2, 3, 2],
[4, 5, 6, 8]])
If you want to prepend, you can just flip the order of the arguments, i.e.:
np.r_[[[7, 8, 9]], A]
>> array([[7, 8, 9],
[1, 2, 3],
[4, 5, 6]])
If no calculations are necessary after every row, it's much quicker to add rows in python, then convert to numpy. Here are timing tests using python 3.6 vs. numpy 1.14, adding 100 rows, one at a time:
import numpy as np
from time import perf_counter, sleep

def time_it():
    # Compare performance of two methods for adding rows to a numpy array
    py_array = [[0, 1, 2], [0, 2, 0]]
    py_row = [4, 5, 6]
    numpy_array = np.array(py_array)
    numpy_row = np.array([4, 5, 6])
    n_loops = 100

    start_clock = perf_counter()
    for count in range(0, n_loops):
        numpy_array = np.vstack([numpy_array, numpy_row])  # 5.8 micros
    duration = perf_counter() - start_clock
    print('numpy 1.14 takes {:.3f} micros per row'.format(duration * 1e6 / n_loops))

    start_clock = perf_counter()
    for count in range(0, n_loops):
        py_array.append(py_row)  # .15 micros
    numpy_array = np.array(py_array)  # 43.9 micros
    duration = perf_counter() - start_clock
    print('python 3.6 takes {:.3f} micros per row'.format(duration * 1e6 / n_loops))
    sleep(15)
#time_it() prints:
numpy 1.14 takes 5.971 micros per row
python 3.6 takes 0.694 micros per row
So, the simple solution to the original question, from seven years ago, is to use vstack() to add a new row after converting the row to a numpy array. But a more realistic solution should consider vstack's poor performance under those circumstances. If you don't need to run data analysis on the array after every addition, it is better to buffer the new rows to a python list of rows (a list of lists, really), and add them as a group to the numpy array using vstack() before doing any data analysis.
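A minimal sketch of that buffering pattern (my example, reusing the A and X arrays from the earlier answer):
import numpy as np

A = np.array([[0, 1, 2], [0, 2, 0]])
X = np.array([[0, 1, 2], [1, 2, 0], [2, 1, 2], [3, 2, 0]])
buffered_rows = []                   # plain Python list used as a row buffer
for row in X:
    if row[0] < 3:
        buffered_rows.append(row)
A = np.vstack([A] + buffered_rows)   # one stacking call after the loop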
You can also do this:
newrow = [1, 2, 3]
A = numpy.concatenate((A, [newrow]), axis=0)  # wrap newrow so it is 2-D, matching A
import numpy as np
array_ = np.array([[1,2,3]])
add_row = np.array([[4,5,6]])
array_ = np.concatenate((array_, add_row), axis=0)
I use np.vstack, which is faster. Example:
import numpy as np
input_array=np.array([1,2,3])
new_row= np.array([4,5,6])
new_array=np.vstack([input_array, new_row])
I use numpy.insert(arr, i, the_object_to_be_added, axis) to insert the_object_to_be_added at the i-th row (axis=0) or column (axis=1):
import numpy as np
a = np.array([[1, 2, 3], [5, 4, 6]])
# array([[1, 2, 3],
# [5, 4, 6]])
np.insert(a, 1, [55, 66], axis=1)
# array([[ 1, 55, 2, 3],
# [ 5, 66, 4, 6]])
np.insert(a, 2, [50, 60, 70], axis=0)
# array([[ 1, 2, 3],
# [ 5, 4, 6],
# [50, 60, 70]])
This is an old discussion, but I hope it helps someone.
If you can do the construction in a single operation, then something like the vstack-with-fancy-indexing answer is a fine approach. But if your condition is more complicated or your rows come in on the fly, you may want to grow the array. In fact the numpythonic way to do something like this - dynamically grow an array - is to dynamically grow a list:
A = np.array([[1,2,3],[4,5,6]])
Alist = [r for r in A]
for i in range(100):
    newrow = np.arange(3) + i
    if i % 5:
        Alist.append(newrow)
A = np.array(Alist)
del Alist
Lists are highly optimized for this kind of access pattern; you don't have convenient numpy multidimensional indexing while in list form, but for as long as you're appending it's hard to do better than a list of row arrays.
You can use numpy.append() to append a row to a numpy array and reshape it to a matrix later on.
import numpy as np
a = np.array([1,2])
a = np.append(a, [3,4])
print(a)
# [1 2 3 4]
# in your example
A = [1,2]
for row in X:
    A = np.append(A, row)
