I'm trying to merge two 2D NumPy arrays with a specific condition. Let's say we have:
A=[[100.121,200.129,1,2,3],
[105.343,203.347,2,2,1],
[107.426,201.657,1,3,1],
[100.121,300.010,1,1,1]]
and
B=[[107.426,201.657,80],
[100.121,200.129,70],
[100.121,300.010,90]]
I want to obtain:
C=[[100.121,200.129,1,2,3,70],
[105.343,203.347,2,2,1,0],
[107.426,201.657,1,3,1,80],
[100.121,300.010,1,1,1,90]]
So, when the values in the first and second columns of a row in A match a row in B, take the third column of B and append it to that row of A (appending 0 when there is no match).
How can I do this?
Thanks.
You can try this
A=[[100,200,1,2,3],
[105,203,2,2,1],
[107,201,1,3,1]]
B=[[107,201,80],
[100,200,70],
[105,203,50]]
d_B = {(row[0], row[1]): row[2] for row in B}
A = [row + [d_B.get((row[0], row[1]), 0)] for row in A]
# [[100, 200, 1, 2, 3, 70], [105, 203, 2, 2, 1, 50], [107, 201, 1, 3, 1, 80]]
Note that the dictionary is keyed on the pair of the first two columns, as the question asks; keying on the first column alone would break when it repeats (as 100.121 does in the original A).
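If you'd rather stay in NumPy with the original float data, here is a minimal sketch (it assumes the key columns compare exactly equal, as the posted literals do; for computed floats use np.isclose instead of ==):
import numpy as np

A = np.array([[100.121, 200.129, 1, 2, 3],
              [105.343, 203.347, 2, 2, 1],
              [107.426, 201.657, 1, 3, 1],
              [100.121, 300.010, 1, 1, 1]])
B = np.array([[107.426, 201.657, 80],
              [100.121, 200.129, 70],
              [100.121, 300.010, 90]])

# matches[i, j] is True when row i of A agrees with row j of B
# on both key columns
matches = (A[:, None, :2] == B[None, :, :2]).all(-1)

# take B's third column where a row matches, else 0
extra = np.where(matches.any(1), B[matches.argmax(1), 2], 0)
C = np.column_stack((A, extra))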
I have a 2D numpy array called arm_resets that holds positive integers. The first column contains only positive integers < 360. For all columns other than the first, I need to replace every value of 360 or more with the value in the same row of the first column. I thought this would be relatively easy to do; here's what I have:
i = 300
over_360 = arm_resets[:, [i]] >= 360
print(arm_resets[:, [i]][over_360])
print(arm_resets[:, [0]][over_360])
arm_resets[:, [i]][over_360] = arm_resets[:, [0]][over_360]
print(arm_resets[:, [i]][over_360])
And here's what prints:
[3600 3609 3608 ... 3600 3611 3605]
[ 0 9 8 ... 0 11 5]
[3600 3609 3608 ... 3600 3611 3605]
Since all of the numbers shown in the first print (the first three and last three) are at least 360, the third print should show them replaced by the values from the second print. Why is this not working?
edit: reproducible example:
import numpy as np
import pandas as pd

df = pd.DataFrame({"start": [1, 2, 5, 6], "freq": [1, 5, 6, 9]})
periods = 6
arm_resets = df[["start"]].values
freq = df[["freq"]].values
arm_resets = np.pad(arm_resets, ((0, 0), (0, periods - 1)))
for i in range(1, periods):
    arm_resets[:, [i]] = arm_resets[:, [i - 1]] + freq
    #over_360 = arm_resets[:, [i]] >= periods
    #arm_resets[:, [i]][over_360] = arm_resets[:, [0]][over_360]
arm_resets
With that code commented out as shown, here's what prints:
array([[ 1, 2, 3, 4, 5, 6],
[ 2, 7, 12, 17, 22, 27],
[ 3, 9, 15, 21, 27, 33],
[ 4, 13, 22, 31, 40, 49]])
What I would expect:
array([[ 1, 2, 3, 4, 5, 1],
[ 2, 2, 2, 2, 2, 2],
[ 3, 3, 3, 3, 3, 3],
[ 4, 4, 4, 4, 4, 4]])
Now if it helps, the final 2d array I'm actually trying to create is a 1/0 array that indicates which are filled in, so in this example I'd want this:
array([[ 0, 1, 1, 1, 1, 1],
[ 0, 0, 1, 0, 0, 0],
[ 0, 0, 0, 1, 0, 0],
[ 0, 0, 0, 0, 1, 0]])
The code I use to achieve this from the above arm_resets is this:
fin = np.zeros((len(arm_resets), periods), dtype=int)
for i in range(len(arm_resets)):
    fin[i, arm_resets[i]] = 1
The slice arm_resets[:, [i]] is a fancy index, and therefore makes a copy of the ith column of the data. arm_resets[:, [i]][over_360] = ... therefore calls __setitem__ on a temporary array that is discarded as soon as the statement executes. If you want to assign to the mask, call __setitem__ on the sliced object directly:
arm_resets[over_360, [i]] = ...
You also don't need to make the index into a list. It's generally better to use simple indices, especially when doing assignments, since they create views rather than copies:
arm_resets[over_360, i] = ...
With slicing, even the following should work, since it calls __setitem__ on a view:
arm_resets[:, i][over_360] = ...
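A tiny demonstration of the difference, on toy data:
import numpy as np

x = np.arange(6).reshape(3, 2)
mask = np.array([True, False, True])

x[:, [1]][mask] = 99    # fancy index: assigns into a temporary copy
print(x[:, 1])          # [1 3 5] -- unchanged

x[:, 1][mask] = 99      # simple index: assigns through a view
print(x[:, 1])          # [99  3 99]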
This still does not let you match each element with its own row, since i selects a column. In fact, you can process the entire matrix in one step, without looping, if you use integer indices rather than a boolean mask. The reason indices are useful is that you can match each offending item with the first-column value from the correct row (note cols + 1: the mask is computed on the slice arm_resets[:, 1:], so its column indices are shifted by one):
rows, cols = np.nonzero(arm_resets[:, 1:] >= 360)
arm_resets[rows, cols + 1] = arm_resets[rows, 0]
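Putting the fix together on your reproducible example, a minimal sketch (using the periods threshold from the toy data, in place of 360):
import numpy as np
import pandas as pd

df = pd.DataFrame({"start": [1, 2, 5, 6], "freq": [1, 5, 6, 9]})
periods = 6
arm_resets = df[["start"]].values
freq = df[["freq"]].values
arm_resets = np.pad(arm_resets, ((0, 0), (0, periods - 1)))
for i in range(1, periods):
    arm_resets[:, [i]] = arm_resets[:, [i - 1]] + freq

# one vectorized replacement pass, no column loop
rows, cols = np.nonzero(arm_resets[:, 1:] >= periods)
arm_resets[rows, cols + 1] = arm_resets[rows, 0]
print(arm_resets)
# [[1 2 3 4 5 1]
#  [2 2 2 2 2 2]
#  [5 5 5 5 5 5]
#  [6 6 6 6 6 6]]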
You can use np.where()
first_col = arm_resets[:,0] # first col
first_col = first_col.reshape(first_col.size, 1)  # reshape into a 2D column vector
arm_resets = np.where(arm_resets >= 360,first_col,arm_resets)
You can see in detail how np.where works in the NumPy documentation, but basically it evaluates arm_resets >= 360 elementwise: where True it puts the first_col value in place (broadcasting stretches the column across all columns), and where False it keeps the arm_resets value.
Edit: as suggested by Mad Physicist, you can use arm_resets[:,0,None] directly instead of creating the first_col variable.
arm_resets = np.where(arm_resets >= 360,arm_resets[:,0,None],arm_resets)
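To illustrate the broadcasting detail on toy data (not the poster's array):
import numpy as np

x = np.array([[  1, 400,   2],
              [  3,   5, 999]])
# x[:, 0, None] has shape (2, 1), so it broadcasts across the columns
# and each offending entry is replaced by its own row's first value
print(np.where(x >= 360, x[:, 0, None], x))
# [[1 1 2]
#  [3 5 3]]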
I have two two-dimensional arrays a and b (the number of columns in a is <= the number of columns in b). I would like to find an efficient way of matching a row in array a to a contiguous part of a row in array b.
a = np.array([[ 25, 28],
[ 84, 97],
[105, 24],
[ 28, 900]])
b = np.array([[ 25, 28, 84, 97],
[ 22, 25, 28, 900],
[ 11, 12, 105, 24]])
The output should be np.array([[0,0], [0,1], [1,0], [2,2], [3,1]]). Row 0 in array a matches Row 0 in array b (first two positions). Row 1 in array a matches row 0 in array b (third and fourth positions).
We can leverage scikit-image's view_as_windows (built on np.lib.stride_tricks.as_strided) for efficient patch extraction, and then compare those patches against each row of a, all of it in a vectorized manner. Then, get the matching indices with np.argwhere -
# a and b from posted question
In [325]: from skimage.util.shape import view_as_windows
In [428]: w = view_as_windows(b,(1,a.shape[1]))
In [429]: np.argwhere((w == a).all(-1).any(-2))[:,::-1]
Out[429]:
array([[0, 0],
[1, 0],
[0, 1],
[3, 1],
[2, 2]])
Alternatively, we could get the indices by the order of rows in a by pushing forward the first axis of a while performing broadcasted comparisons -
In [444]: np.argwhere((w[:,:,0] == a[:,None,None,:]).all(-1).any(-1))
Out[444]:
array([[0, 0],
[0, 1],
[1, 0],
[2, 2],
[3, 1]])
Another way I can think of is to loop over each row in a and perform a 2D correlation between b, which you can consider as a 2D signal, and the row from a.
We would look for results that are equal to the sum of squares of the row's values. If we subtract that sum of squares from the correlation result, matches show up as zeros: any row that gives a 0 result contains the subarray. If you are using floating-point numbers, you may want to compare against some small threshold just above 0 instead.
If you can use SciPy, the scipy.signal.correlate2d method is what I had in mind.
import numpy as np
from scipy.signal import correlate2d

a = np.array([[ 25,  28],
              [ 84,  97],
              [105,  24]])
b = np.array([[ 25,  28,  84,  97],
              [ 22,  25,  28, 900],
              [ 11,  12, 105,  24]])

EPS = 1e-8
result = []
for (i, row) in enumerate(a):
    out = correlate2d(b, row[None, :], mode='valid') - np.square(row).sum()
    locs = np.where(np.abs(out) <= EPS)[0]
    unique_rows = np.unique(locs)
    for res in unique_rows:
        result.append((i, res))
We get:
In [32]: result
Out[32]: [(0, 0), (0, 1), (1, 0), (2, 2)]
The time complexity of this could be better, especially since we're looping over each row of a to find any subarrays in b.
import numpy as np

m = []
k = []
a = np.array([[1, 2, 3, 4, 5, 6], [50, 51, 52, 40, 20, 30], [60, 71, 82, 90, 45, 35]])
for i in range(len(a)):
    m.append(a[i, -1:])
    for j in range(len(a[i]) - 1):
        n = abs(m[i] - a[i, j])
        k.append(n)
    k.append(m[i])
print(k)
Expected output in k:
[[5,4,3,2,1,6], [20,21,22,10,10,30], [25,36,47,55,10,35]]
which should also be a numpy array.
But the output that I am getting is
[array([5]), array([4]), array([3]), array([2]), array([1]), array([6]), array([20]), array([21]), array([22]), array([10]), array([10]), array([30]), array([25]), array([36]), array([47]), array([55]), array([10]), array([35])]
How can I solve this situation?
You want to subtract the last column of each row from the row's other items. Why not use a vectorized approach? You can do all the subtractions at once by subtracting the last column from the rest of the items, and then column_stack the result together with the unchanged last column. Note that you need to add a dimension to the last column in order for it to be subtractable from the 2D array; broadcasting takes care of the rest.
In [71]: np.column_stack((abs(a[:, :-1] - a[:, None, -1]), a[:,-1]))
Out[71]:
array([[ 5, 4, 3, 2, 1, 6],
[20, 21, 22, 10, 10, 30],
[25, 36, 47, 55, 10, 35]])
I have a numpy array with shape (3, 600219), which is a list of indices.
i.e.
array([[ 0, 0, 0, ..., 2879, 2879, 2879],
[ 40, 40, 40, ..., 162, 165, 168],
[ 249, 250, 251, ..., 195, 196, 198]])
The first row holds time indices; the second and third rows are indices of the coordinates. I am trying to figure out which pair of coordinates occurred most frequently, disregarding the time.
e.g. Was it (40,249) or (40,250)...etc.?
I just used a small sample of your data, but I think you'll get the point:
import numpy as np
array = np.array([[ 0, 0, 0, 2879, 2879, 2879],
[ 40, 40, 40, 162, 165, 168],
[ 249, 250, 251, 195, 196, 198]])
# Zip together only the second and third rows
only_coords = zip(array[1,:], array[2,:])
from collections import Counter
Counter(only_coords).most_common()
Produces:
Out[11]:
[((40, 249), 1),
((165, 196), 1),
((162, 195), 1),
((168, 198), 1),
((40, 251), 1),
((40, 250), 1)]
Here's one vectorized approach -
IDs = (a[1].max()+1)*a[2] + a[1]
unq, idx, count = np.unique(IDs, return_index=1,return_counts=1)
out = a[1:,idx[count.argmax()]]
If there could be negative coordinates, use (a[1].max()-a[1].min()+1)*a[2] + a[1] to compute IDs.
Sample run -
In [44]: a
Out[44]:
array([[8, 3, 6, 6, 8, 5, 1, 6, 6, 5],
[5, 2, 1, 1, 5, 1, 5, 1, 1, 4],
[8, 2, 3, 3, 8, 1, 7, 3, 3, 3]])
In [47]: IDs = (a[1].max()+1)*a[2] + a[1]
In [48]: unq, idx, count = np.unique(IDs, return_index=1,return_counts=1)
In [49]: a[1:,idx[count.argmax()]]
Out[49]: array([1, 3])
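On NumPy 1.13+, a simpler (if slower) sketch of the same counting idea is to let np.unique treat each coordinate pair as one item via its axis argument, skipping the ID encoding entirely:
import numpy as np

a = np.array([[8, 3, 6, 6, 8, 5, 1, 6, 6, 5],
              [5, 2, 1, 1, 5, 1, 5, 1, 1, 4],
              [8, 2, 3, 3, 8, 1, 7, 3, 3, 3]])

# treat every (row-1, row-2) column pair as one item and count duplicates
pairs, counts = np.unique(a[1:], axis=1, return_counts=True)
print(pairs[:, counts.argmax()])   # [1 3]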
This might seem a little abstract, but you could try saving each co-ordinate pair as a single number, e.g. [2,1] = 2.1, and putting your data into a list of these numbers. For example, a 2nd row of [1,1,2] and a 3rd row of [2,2,1] would become [1.2, 1.2, 2.1]. You could then use the code:
from collections import Counter
list1=[1.2,1.2,2.1]
data = Counter(list1)
print (data.most_common(1)) # Returns the highest occurring item
which prints the most common number and how many times it occurs; you can then convert the number back to a co-ordinate if you need to use it in your code.
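A literal sketch of this encoding (it only stays unambiguous while the second coordinate has a fixed number of digits, so the tuple-based Counter approaches are more robust in general):
from collections import Counter

coords = [(1, 2), (1, 2), (2, 1)]
# encode each pair as major.minor -- safe here because the second
# value is a single digit
encoded = [x + y / 10 for x, y in coords]
print(Counter(encoded).most_common(1))   # [(1.2, 2)]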
Here is a sample code that does the count:
import numpy as np
import collections
a = np.array([[0, 1, 2, 3], [10, 10, 30, 40], [25, 25, 10, 50]])
# You don't care about time
b = np.transpose(a[1:])
# convert list items to tuples
c = map(tuple, b)
collections.Counter(c)
The output:
Counter({(10, 25): 2, (30, 10): 1, (40, 50): 1})
I want to create a numpy array from two different numpy arrays. For example:
Say I have 2 arrays a and b.
a = np.array([1,3,4])
b = np.array([[1,5,51,52],[2,6,61,62],[3,7,71,72],[4,8,81,82],[5,9,91,92]])
I want to loop through each index in array a, find it in the first column of array b, and then save that row of b into c, like below:
c = np.array([[1,5,51,52],
[3,7,71,72],
[4,8,81,82]])
I have tried doing:
c = np.zeros(shape=(len(b), 4))
for i in b:
    c[i] = a[b[i][:]]
but get this error: "arrays used as indices must be of integer (or boolean) type"
Approach #1
If a is sorted, we can use np.searchsorted, like so -
idx = np.searchsorted(a,b[:,0])
idx[idx==a.size] = 0
out = b[a[idx] == b[:,0]]
Sample run -
In [160]: a
Out[160]: array([1, 3, 4])
In [161]: b
Out[161]:
array([[ 1, 5, 51, 52],
[ 2, 6, 61, 62],
[ 3, 7, 71, 72],
[ 4, 8, 81, 82],
[ 5, 9, 91, 92]])
In [162]: out
Out[162]:
array([[ 1, 5, 51, 52],
[ 3, 7, 71, 72],
[ 4, 8, 81, 82]])
If a is not sorted, we need to use sorter argument with searchsorted.
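A minimal sketch of that unsorted case, on made-up keys:
import numpy as np

a = np.array([4, 1, 3])                        # unsorted lookup keys
b0 = np.array([1, 2, 3, 4, 5])                 # stand-in for b[:, 0]

s = np.argsort(a)                              # sorter for searchsorted
idx = np.searchsorted(a, b0, sorter=s)
idx[idx == a.size] = 0                         # guard out-of-bounds hits
mask = a[s[idx]] == b0                         # True where b0 is in a
print(mask)                                    # [ True False  True  True False]
b[mask] would then select the matching rows, exactly as in Approach #1.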
Approach #2
We can also use np.in1d -
b[np.in1d(b[:,0],a)]
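Sample run with the posted a and b; np.isin is the newer spelling of the same membership test -
import numpy as np

a = np.array([1, 3, 4])
b = np.array([[1, 5, 51, 52],
              [2, 6, 61, 62],
              [3, 7, 71, 72],
              [4, 8, 81, 82],
              [5, 9, 91, 92]])

print(b[np.isin(b[:, 0], a)])
# [[ 1  5 51 52]
#  [ 3  7 71 72]
#  [ 4  8 81 82]]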