Apply Variable Index Across Columns of An Array [duplicate] - python

This question already has an answer here:
Pythonic way for numpy array of array (with index of rows)
(1 answer)
Closed 5 years ago.
I have a 2x3 array like the following:
import numpy as np
y = np.array([[1,2,3], [4,5,6]])
I want to index one element from each column. For example, the 1st element in column 1, the 2nd element in column 2, and the 1st element in column 3. The output should look like this:
ans = [1,5,3]
I tried to use
y[0,1,0]
and
np.take(y, [0,1,0,1], axis=1)
but neither worked. Can you help?

In [448]: y = np.array([[1,2,3], [4,5,6]])
In [450]: idx = [0,1,0]
idx selects the row for successive columns, so you need to pair it with a column indexing list (or array):
In [454]: y[idx,[0,1,2]]
Out[454]: array([1, 5, 3])
In [455]: y[idx, np.arange(y.shape[1])]
Out[455]: array([1, 5, 3])
It may help to visualize this by taking the 'transpose' of the 2 lists:
In [456]: list(zip([0,1,0],[0,1,2]))
Out[456]: [(0, 0), (1, 1), (0, 2)]
In [457]: [y[i,j] for i,j in Out[456]]
Out[457]: [1, 5, 3]
You'd have to do it this way with lists, but numpy does the pairing for you.
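The same pairing can also be written with np.take_along_axis, which may read more naturally when the row choices are already in an array. This is only a sketch, assuming a NumPy version that provides take_along_axis (added, as far as I know, in 1.15); idx mirrors the list used above.

```python
import numpy as np

y = np.array([[1, 2, 3], [4, 5, 6]])
idx = np.array([0, 1, 0])  # row to pick in each column

# Explicit pairing of row and column indices, as in the answer above.
paired = y[idx, np.arange(y.shape[1])]

# take_along_axis needs an index array with the same number of
# dimensions as y, so idx gets a leading axis of length 1.
along = np.take_along_axis(y, idx[None, :], axis=0)[0]

print(paired.tolist())  # [1, 5, 3]
print(along.tolist())   # [1, 5, 3]
```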

You should access it this way:
>>> y
array([[1, 2, 3],
       [4, 5, 6]])
>>> y[0,0]
1
>>> y[1,1]
5
>>> y[0,2]
3
>>> [y[0,0], y[1,1],y[0,2]]
[1, 5, 3]
A 2-D NumPy array is indexed with [row, column] coordinates.
From there you can build whatever selection logic your specific problem needs, for example alternating between the two rows:
>>> y
array([[1, 2, 3],
       [4, 5, 6]])
>>> res=[y[j%2,j] for j in range(y.shape[1])]
>>> res
[1, 5, 3]
>>> y = np.array([[1,1,1,1,1,1,1,1],[4,5,6,7,8,9,10,11]])
>>> res=[y[j%2,j] for j in range(y.shape[1])]
>>> res
[1, 5, 1, 7, 1, 9, 1, 11]
EDIT (using the original 2x3 y and an explicit list of row indices, one per column):
>>> idxCol=[0,1,0]
>>> res=[y[idxCol[i],i] for i in range(len(idxCol))]
>>> res
[1, 5, 3]

Related

NumPy apply function to groups of rows corresponding to another numpy array

I have a NumPy array with each row representing some (x, y, z) coordinate like so:
a = array([[0, 0, 1],
           [1, 1, 2],
           [4, 5, 1],
           [4, 5, 2]])
I also have another NumPy array with unique values of the z-coordinates of that array like so:
b = array([1, 2])
How can I apply a function, let's call it "f", to each of the groups of rows in a which correspond to the values in b? For example, the first value of b is 1 so I would get all rows of a which have a 1 in the z-coordinate. Then, I apply a function to all those values.
In the end, the output would be an array the same shape as b.
I'm trying to vectorize this to make it as fast as possible. Thanks!
Example of an expected output (assuming that f is count()):
c = array([2, 2])
because there are 2 rows in array a which have a z value of 1 in array b and also 2 rows in array a which have a z value of 2 in array b.
A trivial solution would be to iterate over array b like so:
c = []
for val in b:
    # apply f to the rows of a whose z-coordinate equals val
    c.append(f(a[a[:, 2] == val]))
My attempt:
I tried doing something like this, but it just returns an empty array.
func(a[a[:, 2]==b])
The problem is that the groups of rows with the same z can have different sizes, so you cannot stack them into one 3D NumPy array, which would otherwise make it easy to apply a function along the third dimension. One solution is to use a for-loop, another is to use np.split:
a = np.array([[0, 0, 1],
              [1, 1, 2],
              [4, 5, 1],
              [4, 5, 2],
              [4, 3, 1]])
a_sorted = a[a[:,2].argsort()]
inds = np.unique(a_sorted[:,2], return_index=True)[1]
a_split = np.split(a_sorted, inds)[1:]
# [array([[0, 0, 1],
#         [4, 5, 1],
#         [4, 3, 1]]),
#  array([[1, 1, 2],
#         [4, 5, 2]])]
f = np.sum # example of a function
result = list(map(f, a_split))
# [19, 15]
But imho the best solution is to use pandas and groupby as suggested by FBruzzesi. You can then convert the result to a numpy array.
EDIT: For completeness, here are the other two solutions
List comprehension:
b = np.unique(a[:,2])
result = [f(a[a[:,2] == z]) for z in b]
Pandas:
df = pd.DataFrame(a, columns=list('XYZ'))
result = df.groupby(['Z']).apply(lambda x: f(x.values)).tolist()
This is the performance plot I got for a = np.random.randint(0, 100, (n, 3)):
As you can see, approximately up to n = 10^5 the "split solution" is the fastest, but after that the pandas solution performs better.
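When f is a plain reduction such as a sum, the per-group Python call can be avoided entirely. The following is only a sketch under that assumption (f = np.sum over whole rows, as in the example above); np.add.reduceat is a different technique from the ones benchmarked here and was not part of the original comparison.

```python
import numpy as np

a = np.array([[0, 0, 1],
              [1, 1, 2],
              [4, 5, 1],
              [4, 5, 2],
              [4, 3, 1]])

# Sort rows by z; a stable sort keeps the original order inside each group.
order = a[:, 2].argsort(kind="stable")
a_sorted = a[order]

# Unique z values and the offset where each group starts in a_sorted.
groups, starts = np.unique(a_sorted[:, 2], return_index=True)

# Sum every row, then add up the row totals within each group.
group_sums = np.add.reduceat(a_sorted.sum(axis=1), starts)

print(groups.tolist())      # [1, 2]
print(group_sums.tolist())  # [19, 15]
```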
If you are allowed to use pandas:
import pandas as pd
df=pd.DataFrame(a, columns=['x','y','z'])
df.groupby('z').agg(f)
Here f can be any custom function working on grouped data.
Numeric example:
a = np.array([[0, 0, 1],
              [1, 1, 2],
              [4, 5, 1],
              [4, 5, 2]])
df=pd.DataFrame(a, columns=['x','y','z'])
df.groupby('z').size()
z
1 2
2 2
dtype: int64
Note that .size() is the way to count the number of rows per group.
To keep it in pure NumPy, maybe this suits your case:
tmp = np.array([a[a[:,2]==i] for i in b])
tmp
array([[[0, 0, 1],
        [4, 5, 1]],
       [[1, 1, 2],
        [4, 5, 2]]])
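which is an array holding each group of rows. Note that this only yields a regular 3-D array when every group has the same number of rows; with unequal groups NumPy cannot stack them this way.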
c = np.array([])
for x in np.nditer(b):
    c = np.append(c, np.where((a[:,2] == x))[0].shape[0])
Output:
[2. 2.]
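For the counting case specifically, the loop can be replaced by one broadcasted comparison, which also keeps the result as integers. A minimal sketch, using the a and b from the question; the names matches and c are only illustrative.

```python
import numpy as np

a = np.array([[0, 0, 1],
              [1, 1, 2],
              [4, 5, 1],
              [4, 5, 2]])
b = np.array([1, 2])

# Compare every z value against every entry of b: shape (len(a), len(b)).
matches = a[:, 2][:, None] == b

# Counting the True entries per column gives the number of rows per value of b.
c = matches.sum(axis=0)

print(c.tolist())  # [2, 2]
```

The comparison builds a len(a) x len(b) boolean array, so this trades memory for speed when b is large.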

How to create n arrays with given names?

I have some 3-dimensional arrays, and I want to take the value at the same position from each array and copy it into an array named after that position.
E.g. I have three 2x2x2 arrays and I want to take the value at position (1,1,1) of each of them and copy it into an array called 111array. This array should then contain three values.
The same should be done for all values and all positions in a matrix.
I have a for loop which iterates over all values in one array, but I don't know how to save the results correctly so that each array's name shows the position number.
My first array is called b.
for i in range(b.shape[0]):
    for j in range(b.shape[1]):
        for k in range(b.shape[2]):
            print(b[i,j,k])
Looking for help!
Looks like someone else beat me to an answer, but here is another way of doing it. I used a dictionary to corral all the arrays and return it from a function.
import numpy as np
b = np.array([0, 1, 2, 3, 4, 5, 6, 7])
b = np.reshape(b, (2, 2, 2))
print(b, type(b))
# [[[0 1],
# [2 3]],
# [[4 5],
# [6 7]]] <class 'numpy.ndarray'>
def myfunc(arr):
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            for k in range(arr.shape[2]):
                # Create a new array name from string parts.
                name = "arr"+str(i)+str(j)+str(k)
                print(name, arr[i, j, k])
                # Example: 'arr000', 0.
                # Add a new key-value pair to the (module-level) dictionary.
                mydict.update({name: arr[i,j,k]})
    return mydict
mydict = {}
result = myfunc(b)
print(result)
# {'arr000': 0, 'arr001': 1, 'arr010': 2, 'arr011': 3, 'arr100': 4,
# 'arr101': 5, 'arr110': 6, 'arr111': 7}
# You would need to unpack the dictionary to use the arrays separately.
# use "mydict.keys()" to get all array names.
# "for key in keys" to loop through all array names.
# mydict['arr000'] will return the value 0.
Your question tags "numpy" but does not use it in your code snippet. If you are trying to stick with numpy, there is another method called "structured data array". It's similar to a dictionary in that "name" and "value" can be stored as paired sets in a numpy array. This keeps numpy's efficient memory management and fast calculation (C optimization). This matters if you are working with large datasets.
Also if working with numpy, there may be a way to use the index values in variable names.
Later, I will think of examples for both and update my answer if possible.
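A structured array along the lines described above might look like the following. This is only a sketch of one possible layout: the field names "name" and "value" and the "arr" prefix are assumptions made for illustration, not something the question specifies.

```python
import numpy as np

b = np.arange(8).reshape(2, 2, 2)

# One record per element: a short text name plus the integer value.
dt = np.dtype([("name", "U6"), ("value", "i8")])
table = np.zeros(b.size, dtype=dt)

# np.ndindex walks the positions in C order, the same order as b.ravel().
table["name"] = [f"arr{i}{j}{k}" for i, j, k in np.ndindex(b.shape)]
table["value"] = b.ravel()

# Look a value up by name with a boolean mask.
print(table["value"][table["name"] == "arr111"].tolist())  # [7]
```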
See if this is what you want. This is based on your example.
import numpy as np
from itertools import product
a = np.arange(8).reshape(2,2,2)
b = a + 1
c = a + 2
indices = product(range(2), repeat=3)
all_arrays = []
for i in indices:
    suffix = ''.join(map(str,i))
    array_name = 'array'+suffix
    value = np.array([a[i],b[i],c[i]])
    exec(array_name+'= value')
    exec(f'all_arrays.append({array_name})')
for name in all_arrays:
    print(name)
print('\n')
print(all_arrays)
print('\n')
print(array111)
print('\n')
print(array101)
Output:
[0 1 2]
[1 2 3]
[2 3 4]
[3 4 5]
[4 5 6]
[5 6 7]
[6 7 8]
[7 8 9]
[array([0, 1, 2]), array([1, 2, 3]), array([2, 3, 4]), array([3, 4, 5]), array([4, 5, 6]), array([5, 6, 7]), array([6, 7, 8]), array([7, 8, 9])]
[7 8 9]
[5 6 7]
As others have pointed out, this seems like a weird request. But just for fun, here's a shorter solution:
In [1]: import numpy as np
...: A = np.arange(8).reshape((2,2,2))
...: B = 10*A
...: C = 100*A
In [2]: A
Out[2]:
array([[[0, 1],
        [2, 3]],
       [[4, 5],
        [6, 7]]])
In [3]: D = np.concatenate((A[None], B[None], C[None]))
   ...: for (a,b,c) in np.ndindex((2,2,2)):
   ...:     locals()[f'array{a}{b}{c}'] = D[:,a,b,c]
   ...:
In [4]: array000
Out[4]: array([0, 0, 0])
In [5]: array001
Out[5]: array([ 1, 10, 100])
In [6]: array010
Out[6]: array([ 2, 20, 200])
In [7]: array011
Out[7]: array([ 3, 30, 300])
In [8]: array100
Out[8]: array([ 4, 40, 400])
In [9]: array101
Out[9]: array([ 5, 50, 500])
In [10]: array110
Out[10]: array([ 6, 60, 600])
In [11]: array111
Out[11]: array([ 7, 70, 700])
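If dynamically named variables are not a hard requirement, a dictionary keyed by the position tuple gives the same lookups without exec or locals(). A minimal sketch, assuming the same A, B and C as above; the names stacked and by_position are made up for this example.

```python
import numpy as np

A = np.arange(8).reshape((2, 2, 2))
B = 10 * A
C = 100 * A

# Stack along a new leading axis: shape (3, 2, 2, 2).
stacked = np.stack([A, B, C])

# Map each position tuple to the three values found at that position.
by_position = {pos: stacked[(slice(None),) + pos] for pos in np.ndindex(A.shape)}

print(by_position[(1, 1, 1)].tolist())  # [7, 70, 700]
print(by_position[(0, 0, 1)].tolist())  # [1, 10, 100]
```

Note that stacked[:, i, j, k] already gives the same answer directly, so the dictionary is only needed if the positions must be iterated or passed around by key.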

Assigning to slices of 2D NumPy array

I want to assign 0 to different length slices of a 2d array.
Example:
import numpy as np
arr = np.array([[1,2,3,4],
                [1,2,3,4],
                [1,2,3,4],
                [1,2,3,4]])
idxs = np.array([0,1,2,0])
Given the above array arr and indices idxs, how can you assign to slices of different lengths, such that the result is:
arr = np.array([[0,2,3,4],
                [0,0,3,4],
                [0,0,0,4],
                [0,2,3,4]])
These don't work
slices = np.array([np.arange(i) for i in idxs])
arr[slices] = 0
arr[:, :idxs] = 0
You can use broadcasted comparison to generate a mask, and index into arr accordingly:
arr[np.arange(arr.shape[1]) <= idxs[:, None]] = 0
print(arr)
array([[0, 2, 3, 4],
       [0, 0, 3, 4],
       [0, 0, 0, 4],
       [0, 2, 3, 4]])
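To see why the broadcasted comparison works, it can help to print the mask on its own. The sketch below uses the arr and idxs from the question; the np.where variant at the end is an optional extra for when the original array should stay untouched.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4]] * 4)
idxs = np.array([0, 1, 2, 0])

# Column numbers (shape (4,)) compared against the per-row limits
# (shape (4, 1)) broadcast to a boolean array of shape (4, 4).
mask = np.arange(arr.shape[1]) <= idxs[:, None]
print(mask.astype(int))
# [[1 0 0 0]
#  [1 1 0 0]
#  [1 1 1 0]
#  [1 0 0 0]]

# np.where builds a zeroed copy and leaves arr itself unchanged.
zeroed = np.where(mask, 0, arr)
print(zeroed.tolist())
```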
This does the trick:
import numpy as np
arr = np.array([[1,2,3,4],
                [1,2,3,4],
                [1,2,3,4],
                [1,2,3,4]])
idxs = [0,1,2,0]
for i,j in zip(range(arr.shape[0]),idxs):
    arr[i,:j+1]=0
import numpy as np
arr = np.array([[1, 2, 3, 4],
                [1, 2, 3, 4],
                [1, 2, 3, 4],
                [1, 2, 3, 4]])
idxs = np.array([0, 1, 2, 0])
for i, idx in enumerate(idxs):
    arr[i,:idx+1] = 0
Here is a sparse solution that may be useful in cases where only a small fraction of places should be zeroed out:
>>> idx = idxs+1
>>> I = idx.cumsum()
>>> cidx = np.ones((I[-1],), int)
>>> cidx[0] = 0
>>> cidx[I[:-1]]-=idx[:-1]
>>> cidx=np.cumsum(cidx)
>>> ridx = np.repeat(np.arange(idx.size), idx)
>>> arr[ridx, cidx]=0
>>> arr
array([[0, 2, 3, 4],
       [0, 0, 3, 4],
       [0, 0, 0, 4],
       [0, 2, 3, 4]])
Explanation: We need to construct the coordinates of the positions we want to put zeros in.
The row indices are easy: we just need to go from 0 to 3 repeating each number to fill the corresponding slice.
The column indices start at zero and most of the time are incremented by 1. So to construct them we use cumsum on mostly ones. Only at the start of each new row we have to reset. We do that by subtracting the length of the corresponding slice such as to cancel the ones we have summed in that row.
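The coordinate construction described above can be traced with intermediate values. The sketch below builds the same row and column indices, but computes the column indices with arange minus a repeated start offset instead of the cumsum-with-resets trick; that substitution is mine, chosen because it may be easier to follow, and the variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4]] * 4)
idxs = np.array([0, 1, 2, 0])

lengths = idxs + 1          # slice length per row: [1, 2, 3, 1]
ends = lengths.cumsum()     # end of each row's run: [1, 3, 6, 7]
starts = ends - lengths     # start of each row's run: [0, 1, 3, 6]

# Row index of every cell to zero: each row number repeated `length` times.
ridx = np.repeat(np.arange(lengths.size), lengths)

# Column index: position in the flat list minus the start of its own run.
cidx = np.arange(ends[-1]) - np.repeat(starts, lengths)

print(ridx.tolist())  # [0, 1, 1, 2, 2, 2, 3]
print(cidx.tolist())  # [0, 0, 1, 0, 1, 2, 0]

arr[ridx, cidx] = 0
print(arr.tolist())
```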

Extracting required indices from an array of tuples

import numpy as np
from scipy import signal
y = np.array([[2, 1, 2, 3, 2, 0, 1, 0],
              [2, 1, 2, 3, 2, 0, 1, 0]])
maximas = signal.argrelmax(y, axis=1)
print(maximas)
(array([0, 0, 1, 1], dtype=int64), array([3, 6, 3, 6], dtype=int64))
maximas holds the indices as a tuple of arrays: (0,3) and (0,6) are for the first row [2, 1, 2, 3, 2, 0, 1, 0]; and (1,3) and (1,6) are for the second row [2, 1, 2, 3, 2, 0, 1, 0].
The following prints all the results, but I want to extract only the first maxima of both rows, i.e., [3,3] using the tuples. So, the tuples I need are (0,3) and (1,3).
How can I extract them from the array of tuples, i.e., 'maximas'?
>>> print(y[kk])
[3 1 3 1]
Given the tuple maximas, here's one possible NumPy way:
>>> a = np.column_stack(maximas)
>>> a[np.unique(a[:,0], return_index=True)[1]]
array([[0, 3],
       [1, 3]], dtype=int64)
This stacks the coordinate lists returned by signal.argrelmax into an array a. The return_index parameter of np.unique is used to find the first index of each row number. We can then retrieve the relevant rows from a using these first indexes.
This returns an array, but you could turn it into a list of lists with tolist().
To return the first column index of the maximum in each row, you just need to take the indices returned by np.unique from maximas[0] and use them to index maximas[1]. In one line, it's this:
>>> maximas[1][np.unique(maximas[0], return_index=True)[1]]
array([3, 3], dtype=int64)
To retrieve the corresponding values from each row of y, you can use np.choose:
>>> cols = maximas[1][np.unique(maximas[0], return_index=True)[1]]
>>> np.choose(cols, y.T)
array([3, 3])
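Plain integer-array indexing is an alternative to np.choose for this last step, and it sidesteps the cap on the number of choice arrays that np.choose has (historically 32, as far as I know). In the sketch below maximas is a hard-coded copy of the signal.argrelmax output shown in the question, so that the example runs without SciPy.

```python
import numpy as np

y = np.array([[2, 1, 2, 3, 2, 0, 1, 0],
              [2, 1, 2, 3, 2, 0, 1, 0]])

# Hard-coded copy of the signal.argrelmax(y, axis=1) result from the question.
maximas = (np.array([0, 0, 1, 1]), np.array([3, 6, 3, 6]))

# First occurrence of each row number, then the matching column index.
rows, first = np.unique(maximas[0], return_index=True)
cols = maximas[1][first]

# Pair each row with its column: one element per row.
values = y[rows, cols]

print(cols.tolist())    # [3, 3]
print(values.tolist())  # [3, 3]
```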
Well, a pure Python approach will be to use itertools.groupby(group on the row's index) and a list comprehension:
>>> from itertools import groupby
>>> from operator import itemgetter
>>> [max(g, key=lambda x: y[x])
...  for k, g in groupby(zip(*maximas), itemgetter(0))]
[(0, 3), (1, 3)]

Read flat list into multidimensional array/matrix in python

I have a list of numbers that represents the flattened output of a matrix or array produced by another program. I know the dimensions of the original array and want to read the numbers back into either a list of lists or a NumPy matrix. There could be more than 2 dimensions in the original array.
e.g.
data = [0, 2, 7, 6, 3, 1, 4, 5]
shape = (2,4)
print(some_func(data, shape))
Would produce:
[[0,2,7,6],
[3,1,4,5]]
Cheers in advance
Use numpy.reshape:
>>> import numpy as np
>>> data = np.array( [0, 2, 7, 6, 3, 1, 4, 5] )
>>> shape = ( 2, 4 )
>>> data.reshape( shape )
array([[0, 2, 7, 6],
       [3, 1, 4, 5]])
You can also assign directly to the shape attribute of data to reshape it in place, without creating a new array object (reshape itself already returns a view of the same memory whenever it can):
>>> data.shape = shape
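Since the question mentions more than 2 dimensions and lists of lists, here is a short sketch of both with the same data. The 2x2x2 target shape is just an example chosen so that the sizes multiply to len(data).

```python
import numpy as np

data = [0, 2, 7, 6, 3, 1, 4, 5]

flat = np.array(data)
cube = flat.reshape((2, 2, 2))  # any shape whose sizes multiply to len(data)

# tolist() converts back to nested Python lists.
print(cube.tolist())                 # [[[0, 2], [7, 6]], [[3, 1], [4, 5]]]

# The reshaped array is a view on the same memory, not a copy.
print(np.shares_memory(flat, cube))  # True

# One dimension may be given as -1 and is then inferred from the rest.
print(flat.reshape((2, -1)).shape)   # (2, 4)
```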
If you don't want to use NumPy, there is a simple one-liner for the 2D case:
group = lambda flat, size: [flat[i:i+size] for i in range(0,len(flat), size)]
It can be generalized to multiple dimensions by adding recursion:
import operator
from functools import reduce  # reduce is no longer a builtin in Python 3
def shape(flat, dims):
    subdims = dims[1:]
    subsize = reduce(operator.mul, subdims, 1)
    if dims[0]*subsize!=len(flat):
        raise ValueError("Size does not match or invalid")
    if not subdims:
        return flat
    return [shape(flat[i:i+subsize], subdims) for i in range(0,len(flat), subsize)]
For those one liners out there:
>>> data = [0, 2, 7, 6, 3, 1, 4, 5]
>>> col = 4 # just grab the number of columns here
>>> [data[i:i+col] for i in range(0, len(data), col)]
[[0, 2, 7, 6], [3, 1, 4, 5]]
>>> # for pretty print, use either np.array or np.asmatrix
>>> np.array([data[i:i+col] for i in range(0, len(data), col)])
array([[0, 2, 7, 6],
       [3, 1, 4, 5]])
Without NumPy, a list whose length is a perfect square can be turned into a square matrix like this:
l1 = [1,2,3,4,5,6,7,8,9]
def convintomatrix(x):
    # side length of the square matrix
    sqrt = int(len(x) ** 0.5)
    matrix = []
    while x != []:
        matrix.append(x[:sqrt])
        x = x[sqrt:]
    return matrix
print(convintomatrix(l1))
[list(x) for x in zip(*[iter(data)]*shape[1])]
(found this post searching for how this works)
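For anyone else wondering how that last one-liner works, the sketch below spells it out step by step. The names n_cols, it and same_iterator are only for illustration.

```python
data = [0, 2, 7, 6, 3, 1, 4, 5]
n_cols = 4

# [iter(data)] * n_cols is a list holding the SAME iterator n_cols times.
it = iter(data)
same_iterator = [it] * n_cols

# zip asks each argument for one item per output tuple; since all arguments
# are one iterator, every tuple consumes n_cols consecutive items.
rows = [list(chunk) for chunk in zip(*same_iterator)]
print(rows)  # [[0, 2, 7, 6], [3, 1, 4, 5]]

# Leftover items that do not fill a whole row are dropped silently.
short = [list(chunk) for chunk in zip(*[iter(range(9))] * n_cols)]
print(short)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

If dropping the remainder is not acceptable, itertools.zip_longest with a fill value is the usual substitute for zip here.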
