How to create n arrays with given names? - python

I have some 3-dimensional arrays, and from each of them I want to take the value at the same position and copy it into an array named after that position.
E.g. I have three 2x2x2 arrays and I want to take the value at position (1,1,1) of each of them and copy it into an array called 111array. This array should then contain three values.
The same should be done for all values and all positions in a matrix.
I have a for loop which iterates over all values in one array, but I don't know how to save the result in such a way that the array name shows the position number.
My first array is called b.
for i in range(b.shape[0]):
    for j in range(b.shape[1]):
        for k in range(b.shape[2]):
            print(b[i,j,k])
Looking for help!

Looks like someone else beat me to an answer, but here is another way of doing it. I used a dictionary to corral all the arrays and return it from a function.
import numpy as np
b = np.array([0, 1, 2, 3, 4, 5, 6, 7])
b = np.reshape(b, (2, 2, 2))
print(b, type(b))
# [[[0 1],
# [2 3]],
# [[4 5],
# [6 7]]] <class 'numpy.ndarray'>
def myfunc(arr):
    # Collect the values in a dictionary keyed by the generated names.
    mydict = {}
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            for k in range(arr.shape[2]):
                # Create a new array name from string parts.
                name = "arr"+str(i)+str(j)+str(k)
                print(name, arr[i, j, k])
                # Example: 'arr000', 0.
                # Add a new key-value pair to the dictionary.
                mydict[name] = arr[i, j, k]
    return mydict
result = myfunc(b)
print(result)
# {'arr000': 0, 'arr001': 1, 'arr010': 2, 'arr011': 3, 'arr100': 4,
# 'arr101': 5, 'arr110': 6, 'arr111': 7}
# You would need to unpack the dictionary to use the arrays separately.
# use "result.keys()" to get all array names.
# "for key in result" to loop through all array names.
# result['arr000'] will return the value 0.
Your question is tagged "numpy", but your code snippet does not really use it. If you want to stick with numpy, there is another option called a "structured array". It is similar to a dictionary in that "name" and "value" can be stored as paired sets in a numpy array. This keeps numpy's efficient memory management and fast calculation (C optimization), which matters if you are working with large datasets.
Also if working with numpy, there may be a way to use the index values in variable names.
Later, I will think of examples for both and update my answer if possible.
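The dictionary above stores a single value per name because it only looks at one array. A minimal sketch of the same idea extended to the three arrays from the question is shown below. The arrays a, b and c and the name by_position are made up for illustration, and the keys are plain position strings such as "111".

```python
import numpy as np

# Three hypothetical 2x2x2 input arrays (stand-ins for the asker's data).
a = np.arange(8).reshape(2, 2, 2)
b = a + 10
c = a + 20

# Stack along a new leading axis: shape (3, 2, 2, 2).
stacked = np.stack([a, b, c])

# Map each position string, e.g. "111", to the three values found there.
by_position = {
    "".join(map(str, idx)): stacked[(slice(None),) + idx]
    for idx in np.ndindex(a.shape)
}

print(by_position["111"])  # [ 7 17 27]
```

Keeping the results in one dictionary means they can be looped over or looked up by position without creating variables dynamically.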

See if this is what you want. This is based on your example.
import numpy as np
from itertools import product
a = np.arange(8).reshape(2,2,2)
b = a + 1
c = a + 2
indices = product(range(2), repeat=3)
all_arrays = []
for i in indices:
    suffix = ''.join(map(str,i))
    array_name = 'array'+suffix
    value = np.array([a[i],b[i],c[i]])
    exec(array_name+'= value')
    exec(f'all_arrays.append({array_name})')
for name in all_arrays:
    print(name)
print('\n')
print(all_arrays)
print('\n')
print(array111)
print('\n')
print(array101)
Output:
[0 1 2]
[1 2 3]
[2 3 4]
[3 4 5]
[4 5 6]
[5 6 7]
[6 7 8]
[7 8 9]
[array([0, 1, 2]), array([1, 2, 3]), array([2, 3, 4]), array([3, 4, 5]), array([4, 5, 6]), array([5, 6, 7]), array([6, 7, 8]), array([7, 8, 9])]
[7 8 9]
[5 6 7]

As others have pointed out, this seems like a weird request. But just for fun, here's a shorter solution:
In [1]: import numpy as np
   ...: A = np.arange(8).reshape((2,2,2))
   ...: B = 10*A
   ...: C = 100*A
In [2]: A
Out[2]:
array([[[0, 1],
        [2, 3]],

       [[4, 5],
        [6, 7]]])
In [3]: D = np.concatenate((A[None], B[None], C[None]))
   ...: for (a,b,c) in np.ndindex((2,2,2)):
   ...:     locals()[f'array{a}{b}{c}'] = D[:,a,b,c]
   ...:
In [4]: array000
Out[4]: array([0, 0, 0])
In [5]: array001
Out[5]: array([ 1, 10, 100])
In [6]: array010
Out[6]: array([ 2, 20, 200])
In [7]: array011
Out[7]: array([ 3, 30, 300])
In [8]: array100
Out[8]: array([ 4, 40, 400])
In [9]: array101
Out[9]: array([ 5, 50, 500])
In [10]: array110
Out[10]: array([ 6, 60, 600])
In [11]: array111
Out[11]: array([ 7, 70, 700])
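Writing into locals() is only dependable at module or interactive top level; inside a function the assignment is not guaranteed to create a variable. As a hedged alternative, the sketch below keeps all per-position vectors in one 2-D table instead of separate names. The names table, pos and n are illustrative and not part of the answer above.

```python
import numpy as np

A = np.arange(8).reshape((2, 2, 2))
B = 10 * A
C = 100 * A

# np.stack adds the new leading axis that A[None] etc. create by hand.
D = np.stack([A, B, C])          # shape (3, 2, 2, 2)

# One row per position, one column per source array: shape (8, 3).
table = D.reshape(3, -1).T

# Row n belongs to position np.unravel_index(n, A.shape).
pos = (1, 0, 1)
n = np.ravel_multi_index(pos, A.shape)
print(n, table[n])  # 5 [  5  50 500]
```

Row 5 holds the same values as array101 in the session above.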

Related

Scipy's linear_sum_assignment giving incorrect result

When I tried using scipy.optimize.linear_sum_assignment as shown, it gives the assignment vector [0 2 3 1] with a total cost of 15.
However, from the cost matrix c, you can see that for the second task, the 5th agent has a cost of 1. So the expected assignment should be [0 3 None 2 1] (total cost of 9)
Why is linear_sum_assignment not returning the optimal assignments?
from scipy.optimize import linear_sum_assignment
c = [
[1, 5, 9, 5],
[5, 8, 3, 2],
[3, 2, 6, 8],
[7, 3, 5, 4],
[2, 1, 9, 9],
]
results = linear_sum_assignment(c)
print(results[1]) # [0 2 3 1]
linear_sum_assignment returns a tuple of two arrays. These are the row indices and column indices of the assigned values. For your example (with c converted to a numpy array):
In [51]: c
Out[51]:
array([[1, 5, 9, 5],
[5, 8, 3, 2],
[3, 2, 6, 8],
[7, 3, 5, 4],
[2, 1, 9, 9]])
In [52]: row, col = linear_sum_assignment(c)
In [53]: row
Out[53]: array([0, 1, 3, 4])
In [54]: col
Out[54]: array([0, 2, 3, 1])
The corresponding index pairs from row and col give the selected entries. That is, the indices of the selected entries are (0, 0), (1, 2), (3, 3) and (4, 1). It is these pairs that are the "assignments".
The sum associated with this assignment is 9:
In [55]: c[row, col].sum()
Out[55]: 9
In the original version of the question (since edited), it looks like you wanted to know the row index for each column, so you expected [0, 4, 1, 3]. The values that you want are in row, but the order is not what you expect, because the indices in col are not simply [0, 1, 2, 3]. To get the result in the form that you expected, you have to reorder the values in row based on the order of the indices in col. Here are two ways to do that.
First:
In [56]: result = np.zeros(4, dtype=int)
In [57]: result[col] = row
In [58]: result
Out[58]: array([0, 4, 1, 3])
Second:
In [59]: result = row[np.argsort(col)]
In [60]: result
Out[60]: array([0, 4, 1, 3])
Note that the example in the linear_sum_assignment docstring is potentially misleading; because it displays only col_ind in the python session, it gives the impression that col_ind is "the answer". In general, however, the answer involves both of the returned arrays.
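Because the cost matrix has five rows and only four columns, one row is always left unassigned. A small sketch of turning the two returned arrays into a per-row list, with None for the unassigned row, is shown below. The row and col values are copied from the session above rather than recomputed, and col_for_row is a name chosen for illustration.

```python
# Cost matrix and the row/col indices copied from the session above.
c = [
    [1, 5, 9, 5],
    [5, 8, 3, 2],
    [3, 2, 6, 8],
    [7, 3, 5, 4],
    [2, 1, 9, 9],
]
row = [0, 1, 3, 4]
col = [0, 2, 3, 1]

# Column assigned to each row, or None when the row was left unassigned.
col_for_row = [None] * len(c)
for r, c_idx in zip(row, col):
    col_for_row[r] = c_idx

total = sum(c[r][c_idx] for r, c_idx in zip(row, col))

print(col_for_row)  # [0, 2, None, 3, 1]
print(total)        # 9
```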

how to understand such shuffling data code in Numpy

I am learning NumPy and I want to understand the following data-shuffling code:
# x is a m*n np.array
# return a shuffled-rows array
def shuffle_col_vals(x):
    rand_x = np.array([np.random.choice(x.shape[0], size=x.shape[0], replace=False) for i in range(x.shape[1])]).T
    grid = np.indices(x.shape)
    rand_y = grid[1]
    return x[(rand_x, rand_y)]
So I pass in an np.array object as follows:
x1 = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]], dtype=int)
The output of shuffle_col_vals(x1) looks like this:
array([[ 1, 5, 11, 15],
[ 3, 8, 9, 14],
[ 4, 6, 12, 16],
[ 2, 7, 10, 13]], dtype=int64)
I am confused by the way rand_x is initialized; I have not seen this construction with numpy.array before.
I have been thinking about it for a long time, but I still don't understand why return x[(rand_x, rand_y)] produces a shuffled-rows array.
If you don't mind, could anyone explain the code to me?
Thanks in advance.
When indexing NumPy arrays, you can take single elements. Let's use a 3x4 array so that the axes can be told apart:
In [1]: x1 = np.array([[1, 2, 3, 4],
   ...:                [5, 6, 7, 8],
   ...:                [9, 10, 11, 12]], dtype=int)
In [2]: x1[0, 0]
Out[2]: 1
If you review NumPy's advanced indexing, you will find that you can do more by providing a list of indices for each dimension. Consider indexing with x1[rows..., cols...] and let's take two elements.
Pick from the first and second row, but always from the first column:
In [3]: x1[[0, 1], [0, 0]]
Out[3]: array([1, 5])
You can even index with arrays:
In [4]: x1[[[0, 0], [1, 1]], [[0, 1], [0, 1]]]
Out[4]:
array([[1, 2],
[5, 6]])
np.indices creates a row and col array, that if used for indexing, give back the original array:
In [5]: grid = np.indices(x1.shape)
In [6]: np.array_equal(x1[grid[0], grid[1]], x1)
Out[6]: True
Now if you shuffle the values of grid[0] col-wise, but keep grid[1] as-is, and then use these for indexing, you get an array with the values of the columns shuffled.
Each column index vector is [0, 1, 2]. The code now shuffles these column index vectors for each column individually, and stacks them together into rand_x into the same shape as x1.
Create a single shuffled column index vector:
In [7]: np.random.seed(0)
In [8]: np.random.choice(x1.shape[0], size=x1.shape[0], replace=False)
Out[8]: array([2, 1, 0])
The stacking works by (pseudo-code) stacking with [random-index-col-vec for cols in range(x1.shape[1])] and then transposing (.T).
To make it a little clearer we can rewrite i as col and use column_stack instead of np.array([... for col]).T:
In [9]: np.random.seed(0)
In [10]: col_list = [np.random.choice(x1.shape[0], size=x1.shape[0], replace=False)
    ...:             for col in range(x1.shape[1])]
In [11]: col_list
Out[11]: [array([2, 1, 0]), array([2, 0, 1]), array([0, 2, 1]), array([2, 0, 1])]
In [12]: rand_x = np.column_stack(col_list)
In [13]: rand_x
Out[13]:
array([[2, 2, 0, 2],
[1, 0, 2, 0],
[0, 1, 1, 1]])
In [14]: x1[rand_x, grid[1]]
Out[14]:
array([[ 9, 10, 3, 12],
[ 5, 2, 11, 4],
[ 1, 6, 7, 8]])
Details to note:
The example output you give differs from what the function you provide does. It seems to be transposed.
The names rand_x and rand_y in the sample code can be confusing if you are used to the convention x = column index, y = row index.
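The same per-column shuffle can also be written without a Python-level loop. The sketch below is one possible alternative and not the code from the question: it sorts random numbers to get an independent row permutation for each column, then gathers with np.take_along_axis. The seed and the names rng, rand_rows and shuffled are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable

x = np.arange(1, 13).reshape(3, 4)

# Sorting random numbers column by column yields an independent
# permutation of the row indices 0..2 for every column.
rand_rows = np.argsort(rng.random(x.shape), axis=0)

# Pick x[rand_rows[i, j], j] for every (i, j).
shuffled = np.take_along_axis(x, rand_rows, axis=0)

# Each column holds the same values as before, possibly in a new order.
assert (np.sort(shuffled, axis=0) == x).all()
```

Indexing with x[rand_rows, np.indices(x.shape)[1]] would give the same result, which is the form used in the question.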
See output:
import numpy as np
def shuffle_col_val(x):
    print("----------------------------\n A rand_x\n")
    f = np.random.choice(x.shape[0], size=x.shape[0], replace=False)
    print(f, "\nNow I transpose an array.")
    rand_x = np.array([f]).T
    print(rand_x)
    print("----------------------------\n B rand_y\n")
    print("Grid gives you two possibilities\n you choose second:")
    grid = np.indices(x.shape)
    print(format(grid))
    rand_y = grid[1]
    print("\n----------------------------\n C Our rand_x, rand_y:")
    print("\nThe order of values in the column CHANGE:\n has random order\n{}".format(rand_x))
    print("\nThe order of values in the row NO CHANGE:\n has normal order 0, 1, 2, 3\n{}".format(rand_y))
    return x[(rand_x, rand_y)]
x1 = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]], dtype=int)
print("\n----------------------------\n D Our shuffled-rows: \n{}\n".format(shuffle_col_val(x1)))
Output:
A rand_x
[2 3 0 1]
Now I transpose an array.
[[2]
[3]
[0]
[1]]
----------------------------
B rand_y
Grid gives you two possibilities, you choose second:
[[[0 0 0 0]
[1 1 1 1]
[2 2 2 2]
[3 3 3 3]]
[[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]]]
----------------------------
C Our rand_x, rand_y:
The order of values in the column CHANGE: has random order
[[2]
[3]
[0]
[1]]
The order of values in the row NO CHANGE: has normal order 0, 1, 2, 3
[[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]]
----------------------------
D Our shuffled-rows:
[[ 9 10 11 12]
[13 14 15 16]
[ 1 2 3 4]
[ 5 6 7 8]]
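Note that this variant draws a single permutation, so whole rows move together; the function in the question draws a separate permutation per column. A short sketch of why the (4, 1) index array has that effect, using the permutation printed above (the names perm and result are illustrative):

```python
import numpy as np

x = np.arange(1, 17).reshape(4, 4)
perm = np.array([2, 3, 0, 1])        # the permutation from the output above

rand_x = perm[:, None]               # shape (4, 1), same as np.array([perm]).T
rand_y = np.indices(x.shape)[1]      # shape (4, 4), column index of each cell

# rand_x is broadcast across the columns, so every column uses the same
# row order and whole rows move together.
result = x[rand_x, rand_y]
assert (result == x[perm]).all()
print(result[0])  # [ 9 10 11 12]
```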

Replace values in array of indexes corresponding to another array

I have an array A of size [1, x] of values and an array B of size [1, y] (y > x) of indexes corresponding to array A. I want as result an array C of size [1,y] filled with values of A.
Here is an example of inputs and outputs:
>>> A = [6, 7, 8]
>>> B = [0, 2, 0, 0, 1]
>>> C = #Some operations
>>> C
[6, 8, 6, 6, 7]
Of course I could solve it like this:
>>> C = []
>>> for val in B:
...     C.append(A[val])
But I was actually expecting a nicer way to do it, especially because I want to use it as an argument of another function. An expression looking like A[B] (but a working one) would be ideal. I don't mind a solution using NumPy or pandas.
Simple with a list comprehension:
A = [6, 7, 8]
B = [0, 2, 0, 0, 1]
C = [A[i] for i in B]
print(C)
This yields
[6, 8, 6, 6, 7]
For fetching multiple items operator.itemgetter comes in handy:
from operator import itemgetter
A = [6, 7, 8]
B = [0, 2, 0, 0, 1]
itemgetter(*B)(A)
# (6, 8, 6, 6, 7)
Also as you've mentioned numpy, this could be done directly by indexing the array as you've specified, i.e. A[B]:
import numpy as np
A = np.array([6, 7, 8])
B = np.array([0, 2, 0, 0, 1])
A[B]
# array([6, 8, 6, 6, 7])
Another option is to use np.take:
np.take(A,B)
# array([6, 8, 6, 6, 7])
This is one way, using numpy ndarrays:
import numpy as np
A = [6, 7, 8]
B = [0, 2, 0, 0, 1]
C = list(np.array(A)[B]) # No need to convert B into an ndarray
# list() is for converting ndarray back into a list,
# (if that's what you finally want)
print (C)
Explanation
Given a numpy ndarray (np.array(A)), we can index into it using an
array of integers (which happens to be exactly what your preferred
form of solution is). The array of integers that you use for
indexing into the ndarray need not be another ndarray. It can even
be a list, and that suits us too, since B happens to be a list. So,
what we have is:
np.array(A)[B]
The result of such an indexing would be another ndarray, having the
same shape (dimensions) as the array of indexes. So, in our case, as
we are indexing into an ndarray using a list of integer indexes, the
result of that indexing would be a one-dimensional ndarray of the
same length as the list of indexes.
Finally, if we want to convert the above result, from a
one-dimensional ndarray back into a list, we can pass it as an
argument to list():
list(np.array(A)[B])
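To illustrate the point about shapes, here is a small sketch with a 2-D index array (idx is a made-up example, not from the question). It also shows ndarray.tolist(), which, unlike list(), converts the elements to plain Python ints.

```python
import numpy as np

A = np.array([6, 7, 8])
B = [0, 2, 0, 0, 1]

# A 2-D index array produces a 2-D result of the same shape.
idx = np.array([[0, 2], [1, 0]])
picked = A[idx]
print(picked.shape)     # (2, 2)
print(picked.tolist())  # [[6, 8], [7, 6]]

# tolist() returns built-in ints rather than NumPy scalar objects.
C = A[B].tolist()
print(C)                # [6, 8, 6, 6, 7]
```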
You could do it with list comprehension:
>>> A = [6, 7, 8]
>>> B = [0, 2, 0, 0, 1]
>>> C = [A[x] for x in B]
>>> print(C)
[6, 8, 6, 6, 7]
I think you need a list comprehension:
A = [1, 2, 3]
B = [0, 2, 0, 0, 1]
C = [A[i] for i in B]
Once you're using numpy.array, you can do exactly what you want with the syntax you expect:
>>> from numpy import array
>>> a = array([6, 7, 8])
>>> b = array([0, 2, 0, 0, 1])
>>> a[b]
array([6, 8, 6, 6, 7])

Search Numpy array with multiple values

I have a 2D NumPy array that contains duplicate values.
I am searching the array like this:
In [104]: import numpy as np
In [105]: array = np.array
In [106]: a = array([[1, 2, 3],
     ...:            [1, 2, 3],
     ...:            [2, 5, 6],
     ...:            [3, 8, 9],
     ...:            [4, 8, 9],
     ...:            [4, 2, 3],
     ...:            [5, 2, 3]])
In [107]: num_list = [1, 4, 5]
In [108]: for i in num_list:
     ...:     print(a[np.where(a[:,0] == i)])
     ...:
[[1 2 3]
[1 2 3]]
[[4 8 9]
[4 2 3]]
[[5 2 3]]
The input is a list of numbers to match against the values in column 0.
The end result I want is the matching rows in any format (array, list or tuple), for example:
array([[1, 2, 3],
[1, 2, 3],
[4, 8, 9],
[4, 2, 3],
[5, 2, 3]])
My code works fine but doesn't seem Pythonic. Is there a better strategy for searching with multiple values,
like a[np.where(a[:,0] == l)], where only a single lookup is done to get all the values?
My real array is large.
Approach #1 : Using np.in1d -
a[np.in1d(a[:,0], num_list)]
Approach #2 : Using np.searchsorted -
num_arr = np.sort(num_list) # Sort num_list and get as array
# Get indices of occurrences of first column in num_list
idx = np.searchsorted(num_arr, a[:,0])
# Take care of out of bounds cases
idx[idx==len(num_arr)] = 0
out = a[a[:,0] == num_arr[idx]]
You can do
a[numpy.in1d(a[:, 0], num_list), :]
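The NumPy documentation recommends np.isin over np.in1d for new code. A minimal sketch of the same lookup with np.isin, using the array from the question (mask and result are names chosen for illustration):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [1, 2, 3],
              [2, 5, 6],
              [3, 8, 9],
              [4, 8, 9],
              [4, 2, 3],
              [5, 2, 3]])
num_list = [1, 4, 5]

# True for every row whose first column appears in num_list.
mask = np.isin(a[:, 0], num_list)
result = a[mask]
print(result)
```

The boolean mask keeps the rows in their original order, so the result matches the desired output in the question.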

How can I find the indices in a numpy array that meet multiple conditions?

I have an array in Python like so:
Example:
>>> scores = numpy.asarray([[8,5,6,2], [9,4,1,4], [2,5,3,8]])
>>> scores
array([[8, 5, 6, 2],
[9, 4, 1, 4],
[2, 5, 3, 8]])
I want to find all [row, col] indices in scores where the value is:
1) the minimum in its row
2) larger than a threshold
3) at most .8 times the next largest value in the row
I would like to do it as efficiently as possible, preferably without any loops. I've been struggling with this for a while, so any help you can provide would be greatly appreciated!
It should go something along the lines of
In [1]: scores = np.array([[8,5,6,2], [9,4,1,4], [2,5,3,8]]); threshold = 1.1; scores
Out[1]:
array([[8, 5, 6, 2],
[9, 4, 1, 4],
[2, 5, 3, 8]])
In [2]: part = np.partition(scores, 2, axis=1); part
Out[2]:
array([[2, 5, 6, 8],
[1, 4, 4, 9],
[2, 3, 5, 8]])
In [3]: row_mask = (part[:,0] > threshold) & (part[:,0] <= 0.8 * part[:,1]); row_mask
Out[3]: array([ True, False, True], dtype=bool)
In [4]: rows = row_mask.nonzero()[0]; rows
Out[4]: array([0, 2])
In [5]: cols = np.argmin(scores[row_mask], axis=1); cols
Out[5]: array([3, 0])
At this point, if you're looking for actual coordinate pairs, you can just zip them:
In [6]: coords = list(zip(rows.tolist(), cols.tolist())); coords
Out[6]: [(0, 3), (2, 0)]
Or if you're planning to look those elements up, you can use them as is:
In [7]: scores[rows, cols]
Out[7]: array([2, 2])
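One caveat: np.partition with kth=2 only guarantees that index 2 holds its sorted value; the order of the two smaller elements before it is documented as undefined. The sketch below is a variant under that reading, not the original answer's code. It uses kth=1, which is enough to place the row minimum at index 0 and the second-smallest value at index 1.

```python
import numpy as np

scores = np.array([[8, 5, 6, 2], [9, 4, 1, 4], [2, 5, 3, 8]])
threshold = 1.1

# kth=1 puts the second-smallest value at index 1 and everything that is
# not larger before it, so index 0 holds the row minimum.
part = np.partition(scores, 1, axis=1)
lowest, second = part[:, 0], part[:, 1]

row_mask = (lowest > threshold) & (lowest <= 0.8 * second)
rows = np.flatnonzero(row_mask)
cols = scores[rows].argmin(axis=1)

print(list(zip(rows.tolist(), cols.tolist())))  # [(0, 3), (2, 0)]
```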
I think that you're going to have a hard time doing this without any for loops (or at least something that performs such a loop but might be disguising it as something else), seeing as how the operation depends only on the row and you want to do it for each row. It's not the most efficient approach (and what is most efficient may depend on how frequently conditions 2 and 3 are true), but this will work:
import heapq
import numpy
threshold = 1.5
ratio = .8
scores = numpy.asarray([[8,5,6,2], [9,4,1,4], [2,5,3,8]])
found_points = []
for i, row in enumerate(scores):
    # The two smallest values in the row, in ascending order.
    lowest, second_lowest = heapq.nsmallest(2, row)
    if lowest > threshold and lowest <= ratio*second_lowest:
        found_points.append([i, numpy.where(row == lowest)[0][0]])
You get (for the example):
found_points = [[0, 3], [2, 0]]
