numpy - conditional change with closest elements - python

In a numpy array, I want to replace every occurrence of 4 whose top and left neighbours are both 5.
So for instance:
0000300
0005000
0054000
0000045
0002050
should become:
0000300
0005000
0058000
0000045
0002050
I'm sorry I can't share what I tried; it's a very specific question.
I've had a look at things like
map[map == 4] = 8
and np.where(), but I really have no idea how to check the nearby elements of a specific value.

This might seem tricky, but an AND between three shifted boolean versions of the matrix will do it: shift the x == 5 mask down by one row, shift another copy of it right by one column, and take the x == 4 mask as the third array.
first_array = np.zeros(x.shape,dtype=bool)
second_array = np.zeros(x.shape,dtype=bool)
equals_5 = x == 5
equals_4 = x == 4
first_array[1:] = equals_5[:-1] # shift down
second_array[:,1:] = equals_5[:,:-1] # shift right
third_array = equals_4 # the x == 4 mask, used as it is
# and operation on the 3 arrays above
results = np.logical_and(np.logical_and(first_array,second_array),third_array)
x[results] = 8
Now results is the boolean array you need.
This is an O(n) algorithm; it scales badly if the requested pattern is very complex, but even then it is doable.
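For reference, a self-contained run of the same idea on the example grid from the question (illustrative only; the names mirror the code above):
import numpy as np

x = np.array([
    [0, 0, 0, 0, 3, 0, 0],
    [0, 0, 0, 5, 0, 0, 0],
    [0, 0, 5, 4, 0, 0, 0],
    [0, 0, 0, 0, 0, 4, 5],
    [0, 0, 0, 2, 0, 5, 0],
])

equals_5 = x == 5
equals_4 = x == 4

above_is_5 = np.zeros(x.shape, dtype=bool)
left_is_5 = np.zeros(x.shape, dtype=bool)
above_is_5[1:] = equals_5[:-1]       # True where the element above is 5
left_is_5[:, 1:] = equals_5[:, :-1]  # True where the element to the left is 5

x[equals_4 & above_is_5 & left_is_5] = 8
print(x)  # only the 4 at row 2, column 3 becomes an 8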

Related

How to search for the position of specific XY pairs in a 2 dimensional numpy array?

I have an image stored as 3 Numpy arrays:
# Int arrays of coordinates
# Not continuous, some points are omitted
X_image = np.array([1,2,3,4,5,6,7,9])
Y_image = np.array([9,8,7,6,5,4,3,1])
# Float array of RGB values.
# Same index
rgb = np.array([
[0.5543,0.2665,0.5589],
[0.5544,0.1665,0.5589],
[0.2241,0.6645,0.5249],
[0.2242,0.6445,0.2239],
[0.2877,0.6425,0.5829],
[0.5543,0.3165,0.2839],
[0.3224,0.4635,0.5879],
[0.5534,0.6693,0.5889],
])
The RGB information is not convertible to int, so it has to stay as floats.
I have another array that defines the position of an area of some pixels in the image:
X_area = np.array([3,4,6])
Y_area = np.array([7,6,4])
I need to find the RGB information for these pixels, using the first 4 arrays as a reference.
My idea was to search for the index of these area points in the full image and then use this index to find back the RGB information.
index = search_for_index_of_array_1_in_array_2((X_area,Y_area),(X_image,Y_image))
# index shall be [2, 3, 5]
rgb_area = rgb[index]
The search_for_index_of_array_1_in_array_2 function can be implemented with a for loop. I tried it; it is too slow, and I actually have millions of points.
I know this is probably more of a use case for Julia than Python, as it is low-level data manipulation with a performance requirement, but I'm obliged to use Python. So the only performance trick I see is to use a vectorized solution with NumPy.
I'm not used to manipulating NumPy arrays. I tried numpy.where:
index = np.where(X_area in X_image and Y_area in Y_image )
index
Gives:
<ipython-input-18-0e434ab7a291>:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
index = np.where(X_area in X_image and Y_area in Y_image )
(array([], dtype=int64),)
It should not be empty, as we have 3 matching points.
I also tested, with the same result:
XY_image = np.vstack((X_image,Y_image))
XY_area = np.vstack((X_area,Y_area))
index = np.where(XY_area == XY_image)
and even:
np.extract(XY_image == XY_area, XY_image)
If I understand correctly, the issue is that the arrays do not have the same length. But that is what I have to work with.
Do you have an idea of how to proceed?
Thanks
Edit: here is a loop that works but... is not fast:
indexes = []
for i in range(XY_area.shape[1]):
    XY_area_b = np.broadcast_to(XY_area[:, i], (XY_image.shape[1], 2)).transpose()
    where_in_image = np.where(XY_area_b == XY_image)
    index_in_image = where_in_image[1][1]
    indexes.append(index_in_image)
indexes
The classical way to solve this problem is to use a hashmap. However, NumPy does not provide such a data structure. That being said, an alternative (generally slower) solution is to sort the values and then perform a binary search. Fortunately, NumPy provides useful functions to do that. This solution, running in O(n log(m)) (with n the number of values to search and m the number of values searched in), should be much faster than a linear search running in O(n m) time. Here is an example:
# Format the inputs
valType = X_image.dtype
assert Y_image.dtype == valType and X_area.dtype == valType and Y_area.dtype == valType
pointType = [('x', valType),('y', valType)]
XY_image = np.ravel(np.column_stack((X_image, Y_image))).view(pointType)
XY_area = np.ravel(np.column_stack((X_area, Y_area))).view(pointType)
# Build an index to sort XY_image and then generate the sorted points
sortingIndex = np.argsort(XY_image)
sorted_XY_image = XY_image[sortingIndex]
# Search each value of XY_area in XY_image and find the location in the unsorted array
tmp = np.searchsorted(sorted_XY_image, XY_area)
index = sortingIndex[tmp]
rgb_area = rgb[index]
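With the sample arrays from the question, this should give index == array([2, 3, 5]), so rgb_area picks rows 2, 3 and 5 of rgb, matching the result of the hashmap approach below.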
Thanks to Jérôme's answer, I understand better the value of using a hashmap:
def hashmap(X, Y):
    return X + 10000*Y
h_area = hashmap(X_area,Y_area)
h_image = hashmap(X_image,Y_image)
np.where(np.isin(h_image,h_area))
This hashmap is a bit brutal, but it actually returns the indexes:
(array([2, 3, 5], dtype=int64),)
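A hedged follow-up (not part of the original answers): np.isin returns the matching positions in the order of h_image, so if the result needs to line up with the order of the area points, one option is to sort the hashed image values once and binary-search them, assuming the hash is collision-free for the coordinate range:
import numpy as np

X_image = np.array([1, 2, 3, 4, 5, 6, 7, 9])
Y_image = np.array([9, 8, 7, 6, 5, 4, 3, 1])
X_area = np.array([3, 4, 6])
Y_area = np.array([7, 6, 4])

def hashmap(X, Y):
    # Assumes X is always below 10000, so two points cannot hash to the same value.
    return X + 10000 * Y

h_image = hashmap(X_image, Y_image)
h_area = hashmap(X_area, Y_area)

# Sort the hashed image points once, binary-search each hashed area point,
# then map the positions back to the original (unsorted) indices.
order = np.argsort(h_image)
index = order[np.searchsorted(h_image[order], h_area)]
print(index)  # [2 3 5], in the same order as X_area / Y_area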

Python - Adjusting a function to accept an array instead of a single value

EDIT:
I've made some progress on testing this out on a simple level, and now want to expand to a for loop. I've updated the question.
I have a function that takes a three-dimensional array and masks certain elements within the array based on specific conditions. See below:
#function for array masking
def masc(arr, z):
    return np.ma.masked_where((arr[:, :, 2] <= z + 0.05) * (arr[:, :, 2] >= z - 0.05), arr[:, :, 2])
arr is a 3D array and z is a single value.
I now want to iterate this for multiple Z values. Here is an example with 2 z values:
masked_array1_1 = masc(xyz,z1)
masked_array1_2 = masc(xyz,z2)
masked_1 = masked_array1_1.mask + masked_array1_2.mask
masked_array1 = np.ma.array(xyz[:,:,2],mask=masked_1)
The masked_array1 gives me exactly what I'm looking for.
I've started to write a for loop to iterate this over a 1D array of Z values:
mask_1 = xyz[:,:,2]
for i in range(Z_all_dim):
    mask_1 += (masc(xyz,IWX_new[0],IWY_new[0],MWX[0],MWY[0],Z_all[i]).mask)
masked_array1 = np.ma.array(xyz[:,:,2], mask = mask_1)
Z_all is an array of 7 unique z values. This code does not work, but I feel like I'm very close. Does anyone see if I'm doing something wrong?
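A minimal sketch of one way the loop could be written (assuming masc keeps its two-argument signature from above and Z_all is a 1D array of z values; the xyz data below is a hypothetical stand-in): start from an all-False mask and OR in the mask produced for each z.
import numpy as np

def masc(arr, z):
    # Mask the third channel where it lies within +/- 0.05 of z.
    return np.ma.masked_where(
        (arr[:, :, 2] <= z + 0.05) * (arr[:, :, 2] >= z - 0.05), arr[:, :, 2])

xyz = np.random.rand(4, 5, 3)      # hypothetical stand-in for the real data
Z_all = np.array([0.1, 0.4, 0.7])  # hypothetical z values

combined_mask = np.zeros(xyz.shape[:2], dtype=bool)
for z in Z_all:
    # getmaskarray always returns a full boolean array, even when nothing is masked
    combined_mask |= np.ma.getmaskarray(masc(xyz, z))

masked_array1 = np.ma.array(xyz[:, :, 2], mask=combined_mask)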

Extracting the maximum element of an array by index based on conditional of other arrays

I believe my problem is really straightforward and there must be an easy way to solve it, but as I am quite new to Python I could not sort it out on my own.
I made up the following example, which is a much simpler scenario than what I have been working on, since I am looking for a general solution applicable to other cases as well. So please consider:
import numpy as np
x = np.array([300,300,450,500,510,750,600,300])
x_validate1 = np.array([0,27,3,4,6,4,13,5])
x_validate2 = np.array([0,27,3,4,6,4,3,5])
x_validate3 = np.array([0,7,3,14,6,16,6,5])
x_validate4 = np.array([0,3,3,5,7,4,9,5])
What I need is to extract the maximum value in x whose index in the other arrays (x_validate1, 2, 3, 4) holds elements between 5 and 10. The plain maximum of x would logically be 750, but with this condition applied the script should return 510, since that is the index at which all the other arrays meet the condition.
Hope that I managed to be succinct and precise. I would really appreciate your help on this one!
Here's one approach:
# combine all the above arrays into one
a = np.array([x_validate1, x_validate2, x_validate3, x_validate4])
# check in which columns all rows satisfy the condition
m = ((a > 5) & (a < 10)).all(0)
# index x, and find the maximum value
x[m].max()
# 510
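One hedged aside: if no column satisfied the condition, x[m] would be empty and calling .max() on it would raise a ValueError, so a small guard can help (self-contained version reusing the question's data):
import numpy as np

x = np.array([300, 300, 450, 500, 510, 750, 600, 300])
a = np.array([
    [0, 27, 3, 4, 6, 4, 13, 5],
    [0, 27, 3, 4, 6, 4, 3, 5],
    [0, 7, 3, 14, 6, 16, 6, 5],
    [0, 3, 3, 5, 7, 4, 9, 5],
])

m = ((a > 5) & (a < 10)).all(0)
# Only take the maximum if at least one column passed the check.
result = x[m].max() if m.any() else None
print(result)  # 510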

Why does changing values in one array affect the values in another array (with the exact same values) that isn't involved in the operation?

When I run the command "negative_only[negative_only>0]=0" (which should set positive values to 0 in the array "negative_only"), the values in a similar array ("positive_only") are also changed. Why is this happening? I'm using Python 3.7 (Windows 10 / Spyder IDE).
The code where the two arrays are being manipulated is below. "long_dollars_ch" is a ~2700 x 60 array with some positive values, some negative values and a lot of zeros. This code is part of a loop that cycles through each row of the array "long_dollars_ch".
# calculations to isolate top contributors to NAV change for audits
top_check = 3 # number of top values changes to track
# calculate dollar change (for longs), and create array with most positive/negative values
long_dollars_ch[c_day,:] = long_shares[c_day,:]*hist_prices_zeros[c_day,:]-long_shares[c_day,:]*hist_prices_zeros[c_day-1,:]
positive_only = long_dollars_ch[c_day,:]
positive_only[positive_only<0]=0 #makes non-positive values zero
idx = np.argsort(positive_only) # create index representing sorted values of positive_only for c_day
non_top_vals = idx[:-top_check]
negative_only = long_dollars_ch[c_day,:]
negative_only[negative_only>0]=0 #makes non-negative values zero
idx = np.argsort(negative_only) # create index representing sorted values of negative_only for c_day
non_bottom_vals = idx[:-top_check]
# create array that shows the most positive/negative dollar change for "top-check" securities
long_dollars_ch_pos[c_day,:] = positive_only
long_dollars_ch_pos[c_day,:][non_top_vals] *= 0
long_dollars_ch_neg[c_day,:] = negative_only
long_dollars_ch_neg[c_day,:][non_bottom_vals] *= 0
The objective of this code is to create two arrays: one that holds only the top "top_check" positive values (if any) and another that holds the bottom "top_check" negative values (if any) for each row of the original array "long_dollars_ch". However, it appears that Python is treating "positive_only" and "negative_only" as the same variable, so an operation on one of them affects the values inside the other (which was not part of the operation).
It's quite simple.
In NumPy, writing y = x does not copy the array :)
It only makes y another name for the array x.
In other words, you do not have two arrays after using "="; you still have one array x, and a reference to it (y is that reference).
positive_only = long_dollars_ch[c_day,:]
...
negative_only = long_dollars_ch[c_day,:]
do not make copies of long_dollars_ch; they only create views that share its data.
You need to use the copy method, or another method (NumPy provides a few of them), to make it work.
Here is the documentation.
EDIT: I had posted the wrong link; it is fixed now.
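A minimal sketch of the fix, with hypothetical data standing in for the question's arrays: taking an explicit copy keeps the two arrays independent of the original row.
import numpy as np

long_dollars_ch = np.array([[1.5, -2.0, 0.0, 3.2, -0.7]])  # hypothetical data
c_day = 0

positive_only = long_dollars_ch[c_day, :].copy()  # copy, not a view
positive_only[positive_only < 0] = 0

negative_only = long_dollars_ch[c_day, :].copy()
negative_only[negative_only > 0] = 0

print(positive_only)    # negatives zeroed: [1.5 0.  0.  3.2 0. ]
print(negative_only)    # positives zeroed: [ 0.  -2.   0.   0.  -0.7]
print(long_dollars_ch)  # the original row is left untouched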

Octave/Matlab version of this Python for-loop

I just want to know if there is any Octave/Matlab equivalent syntax for this particular for loop in Python:
for (i,j) in [(1,2),(2,3),(3,4),(4,5),(5,6),(6,7)]:
    a[i,j] = 1
I need it to ease my image processing assignments, so that I can construct an image matrix without having to enter the value of almost every pixel by hand. If there are other ways of implementing the above functionality in Octave/Matlab, please let me know.
Thanks.
In Octave, and I guess also in MATLAB, you can do:
for ij = [{1;2} {2;3} {3;4} {4;5} {5;6} {6;7}]
    a(ij{:}) = 1;
end
But in general, in MATLAB and Python, it is better to avoid loops. There are much more efficient indexing methods in both Python and MATLAB.
If you want to set a series of pixels in a, given by coordinates, to the same value, you can do as follows:
coord = [1,2; 2,3; 3,4; 4,5; 5,6; 6,7];
ind = sub2ind(size(a), coord(:,1), coord(:,2));
a(ind) = 1;
You can replace that last 1 with a vector with as many elements as coordinates in coord to assign a different value to each pixel.
Note that MATLAB indexes rows with the first index, so the first column of coord corresponds to the y coordinate.
The simplest here would be:
for i = 1 : 6
    a(i, i+1) = 1; % Alternatively: j=i+1; a(i,j)=1;
end
The more flexible alternative is to construct the pairs:
vals = [1,2; … ; 6,7]; % Your i,j pairs. Possibly even put 3 numbers there, i,j,value.
for i = 1 : size(vals, 1)
    a(vals(i,1), vals(i,2)) = 1;
end
