So I have an image which I imported into Python. The imread command gives me a three-dimensional array, where X and Y are the coordinates of the pixels and the Z axis (which has four entries per pixel) gives me the RGBA values at a given point (X,Y).
import matplotlib.image as img
import numpy as np

RawImg = img.imread('tek0000.bmp','RGB')
CrpImg = RawImg[14:208,12:256,:]

x_values = []
y_values = []

for row in CrpImg:
    for cell in row:
        print(np.nonzero)
        if (cell == [136,136,0,255]).all:
My goal is to find the exact points in the array where the RGBA value is [136,136,0,255]. These points are greenish-yellow. I want to add the X and Y values to lists or arrays so I can plot them.
In order to achieve this, I iterate over every point X and Y (row and column) of the array, and analyze the Z values. What I need is the coordinate (X,Y) of the cell in the for loop.
Basically, if the color in the point (X,Y) of the image is yellow, add that point (X,Y) to the list.
Surprisingly, I cannot find much online about what I think is a relatively simple thing. I realize that I could iterate using something like for i in range(len(x_axis)), but I want to know if it is possible this way.
Not completely sure this is what you're looking for, but I think you want to get the index from inside the loop. The main ways to do this would be
loop using the index, e.g. for i in range(0,255): and then index into the array
iterate using enumerate, which returns an index as well as value in a collection
use the index method
Of those, enumerate will be the easiest option here: the index method only exists on Python lists (CrpImg is a NumPy array), and even on a list it would return the position of the first matching element rather than the one currently being looked at.

for y, row in enumerate(CrpImg):
    for x, cell in enumerate(row):
        if (cell == [136,136,0,255]).all():
            print(y, x)
Note that this is going to give you the index inside your crop rather than the full image. You can either adjust (by adding 14 and 12), or you can iterate over the full image.
If you use the built-in enumerate, you get back a tuple containing a count as well as your values. The count starts at 0 by default.
for row in CrpImg:

becomes

for num, row in enumerate(CrpImg):
    print(num)
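Putting the pieces together, a minimal sketch of the whole loop (assuming the RGBA target colour and the 14/12 crop offsets from the question), appending the matching coordinates to the two lists so they can be plotted later:

x_values = []
y_values = []

for y, row in enumerate(CrpImg):
    for x, cell in enumerate(row):
        # compare all four channels of this pixel against the target colour
        if (cell == [136, 136, 0, 255]).all():
            x_values.append(x + 12)   # shift back to full-image column
            y_values.append(y + 14)   # shift back to full-image row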
Try using numpy.where, collapsing the channel axis first so that whole pixels are compared rather than individual channels:

indices = np.where((my_array == [136,136,0,255]).all(axis=-1))
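A short sketch of how that result could be turned into plottable coordinate lists (assuming the cropped RGBA array and crop offsets from the question above):

# per-pixel mask: True where all four channels match the target colour
mask = (CrpImg == [136, 136, 0, 255]).all(axis=-1)

# row (y) and column (x) indices of every matching pixel in the crop
ys, xs = np.nonzero(mask)

# shift back to full-image coordinates if needed
x_values = xs + 12
y_values = ys + 14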
I want to generate a number of random points in a hexagon. To do so, I generate random points in a square and then try to use conditions to drop the unsuitable pairs. I tried a solution like this:
import scipy.stats as sps
import numpy as np

size = 100
kx = 1/np.sqrt(3)*sps.uniform.rvs(loc=-1, scale=2, size=size)
ky = 2/3*sps.uniform.rvs(loc=-1, scale=2, size=size)
pairs = [(i, j) for i in kx for j in ky]

def conditions(pair):
    return (-1/np.sqrt(3) < pair[0] < 1/np.sqrt(3)) & (-2/3 < pair[1] < 2/3)

mask = np.apply_along_axis(conditions, 1, pairs)
hex_pairs = np.extract(mask, pairs)

L = len(hex_pairs)
print(L)
In this example I try to construct a logical mask for later use with np.extract to pull out the values I need. I try to apply the conditional function to all pairs from the list, but it seems I am misunderstanding something, because with this mask the output of the code is:

10000

That means that no pairs were dropped and all the booleans in the mask were True. Can anyone suggest how to correct this solution, or another way to end up with a set of randomly distributed points in a hexagon?
The reason why none of your pairs gets eliminated is that they are created such that the condition is already fulfilled (all x-values are in [-1/sqrt(3), 1/sqrt(3)], and similarly for the y-values).

I think an intuitive and easy way to get there is to create a hexagonal polygon, generate uniformly distributed random numbers within a square that encloses this hexagon, and then apply the respective point-in-polygon method from one of the existing polygon libraries, such as shapely. See e.g. https://stackoverflow.com/a/36400130/7084566
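A minimal sketch of that approach, assuming a regular hexagon with unit circumradius centred at the origin (the hexagon geometry and sample size here are only for illustration):

import numpy as np
from shapely.geometry import Point, Polygon

# regular hexagon with its six vertices on the unit circle
angles = np.arange(6) * np.pi / 3
hexagon = Polygon(list(zip(np.cos(angles), np.sin(angles))))

rng = np.random.default_rng()
points = []
while len(points) < 100:
    # sample uniformly in the enclosing square, keep only points inside the hexagon
    x, y = rng.uniform(-1, 1, size=2)
    if hexagon.contains(Point(x, y)):
        points.append((x, y))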
EDIT:
I've made some progress on testing this out on a simple level, and now want to expand to a for loop. I've updated the question.
I have a function that takes a three-dimensional array and masks certain elements within the array based on specific conditions. See below:
# function for array masking
def masc(arr, z):
    return np.ma.masked_where((arr[:,:,2] <= z+0.05) * (arr[:,:,2] >= z-0.05), arr[:,:,2])
arr is a 3D array and z is a single value.
I now want to repeat this for multiple z values. Here is an example with two z values:
masked_array1_1 = masc(xyz,z1)
masked_array1_2 = masc(xyz,z2)
masked_1 = masked_array1_1.mask + masked_array1_2.mask
masked_array1 = np.ma.array(xyz[:,:,2],mask=masked_1)
masked_array1 gives me exactly what I'm looking for.

I've started to write a for loop to iterate this over a 1D array of z values:
mask_1 = xyz[:,:,2]
for i in range(Z_all_dim):
    mask_1 += (masc(xyz,IWX_new[0],IWY_new[0],MWX[0],MWY[0],Z_all[i]).mask)

masked_array1 = np.ma.array(xyz[:,:,2], mask = mask_1)
Z_all is an array of 7 unique z values. This code does not work, but I feel like I'm very close. Does anyone see if I'm doing something wrong?
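For comparison, a minimal sketch of what such a loop could look like when masc is called exactly as defined above (two arguments) and the boolean masks are OR-combined instead of being added onto the data slice:

# start from an all-False mask and fold in the mask for every z value
combined_mask = np.zeros(xyz[:, :, 2].shape, dtype=bool)
for z in Z_all:
    combined_mask |= masc(xyz, z).mask

masked_array1 = np.ma.array(xyz[:, :, 2], mask=combined_mask)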
When I run the command negative_only[negative_only>0]=0 (which should set the positive values in the array "negative_only" to zero), the values in a similar array ("positive_only") are also changed. Why is this happening? I'm using Python 3.7 (Windows 10 / Spyder IDE).

The code where the two arrays are being manipulated is below. "long_dollars_ch" is an array of ~2700 x 60 with some positive values, some negative values and a lot of zeros. This code is part of a loop that cycles through each row of the array "long_dollars_ch".
# calculations to isolate top contributors to NAV change for audits
top_check = 3 # number of top values changes to track
# calculate dollar change (for longs), and create array with most positive/negative values
long_dollars_ch[c_day,:] = long_shares[c_day,:]*hist_prices_zeros[c_day,:]-long_shares[c_day,:]*hist_prices_zeros[c_day-1,:]
positive_only = long_dollars_ch[c_day,:]
positive_only[positive_only<0]=0 #makes non-positive values zero
idx = np.argsort(positive_only) #create index representing sorted values of positive_only for c_day
non_top_vals = idx[:-top_check]
negative_only = long_dollars_ch[c_day,:]
negative_only[negative_only>0]=0 #makes non-negative values zero
idx = np.argsort(negative_only) #create index representing sorted values of negative_only for c_day
non_bottom_vals = idx[:-top_check]
# create array that shows the most positive/negative dollar change for "top-check" securities
long_dollars_ch_pos[c_day,:] = positive_only
long_dollars_ch_pos[c_day,:][non_top_vals] *= 0
long_dollars_ch_neg[c_day,:] = negative_only
long_dollars_ch_neg[c_day,:][non_bottom_vals] *= 0
The objective of this code is to create two arrays: one that has only the top "top_check" positive values (if any) and another that has the bottom "top_check" negative values (if any) for each row of the original array "long_dollars_ch". However, it appears that Python treats "positive_only" and "negative_only" as the same variable, so an operation on one of them affects the values inside the other (which was not part of the operation).
It's quite simple.

In NumPy, writing y = x does not copy the array, and basic slicing such as arr[i, :] returns a view that shares the same underlying data. You only make a new name (or view) that refers to the existing array. In other words, you do not have two independent arrays after using "="; you still have one block of data, reachable through more than one name.
positive_only = long_dollars_ch[c_day,:]
...
negative_only = long_dollars_ch[c_day,:]

do not make copies of that row of long_dollars_ch; they only create views of it. You need to use the copy method (or one of the other ways NumPy provides) to get independent arrays.
Here is the documentation.
EDIT: I had posted the wrong link; it is correct now.
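A minimal sketch of the difference, using a tiny made-up array rather than the data from the question:

import numpy as np

row = np.array([1.0, -2.0, 3.0])

view = row[:]             # basic slicing returns a view that shares the same data
independent = row.copy()  # .copy() allocates new, independent memory

view[view > 0] = 0                  # this also changes row
independent[independent < 0] = 0    # this leaves row untouched

print(row)          # [ 0. -2.  0.]  -- modified through the view
print(independent)  # [ 1.  0.  3.]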
I'm starting to use numpy. I get slice notation and element-wise computations, but I can't understand this:
for i, (I,J) in enumerate(zip(data_list[0], data_list[1])):
    joint_hist[int(np.floor(I/self.bin_size))][int(np.floor(J/self.bin_size))] += 1
Variables:
data_list contains two np.array().flatten() images (eventually more)
joint_hist[] is the joint histogram of those two images, it's displayed later with plt.imshow()
bin_size is the number of slots in the histogram
I can't understand why the coordinates in the final histogram are I, J. So it's not just that the value at a position in joint_hist[] is the result of some slicing/element-wise computation; I need to take the result of that computation and use THAT as the index into joint_hist...
EDIT:
I indeed do not use the i in the loop; it's a leftover from previous iterations and I simply hadn't noticed I didn't need it anymore.

I do want to remain in control of the bin sizes and the details of how this is done, so I am not particularly looking to use histogram2d. I will later be using this for further image processing, so I'd rather have the flexibility to adapt my approach than have to figure out if/how to do particular things with built-in functions.
You can indeed gussy up that for loop using some numpy notation. Assuming you don't actually need i (since it isn't used anywhere):
for I, J in (data_list.T // self.bin_size).astype(int):
    joint_hist[I, J] += 1
Explanation
data_list.T flips data_list on its side. Each row of data_list.T will contain the data for the pixels at a particular coordinate.
data_list.T // self.bin_size will produce the same result as np.floor(I/self.bin_size), only it will operate on all of the pixels at once, instead of one at a time.
.astype(int) does the same thing as int(...), but again operates on the entire array instead of a single element.
When you iterate over a 2D array with a for loop, the rows are returned one at a time. Thus, the for I,J in arr syntax will give you back one pair of pixels at a time, just like your zip statement did originally.
Alternative
You could also just use histogramdd to calculate joint_hist, in place of your for loop. For your application it would look like:
import numpy as np
joint_hist,edges = np.histogramdd(data_list.T)
This would have different bins than the ones you specified above, though (numpy would determine them automatically).
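If the bins should match the manual loop rather than be chosen automatically, explicit bin edges can be passed through the bins argument; a sketch assuming 8-bit pixel values and a stand-in value for self.bin_size:

import numpy as np

bin_size = 16  # stands in for self.bin_size from the question
# explicit, equally wide bin edges over the 0-255 range, one set per image
edges_1d = np.arange(0, 256 + bin_size, bin_size)
joint_hist, edges = np.histogramdd(data_list.T, bins=[edges_1d, edges_1d])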
If I understand correctly, your goal is to make a histogram of correlated values in your images? Well, to get the right bin index, the computation you used is not valid. Instead of np.floor(I/self.bin_size), use np.floor(I/(I_max/bin_size)).astype(int). You want to divide I and J by their respective resolutions. The result you will get is a diagonal matrix for joint_hist if data_list[0] and data_list[1] are the same flattened image.
So all put together:
I_max = data_list[0].max()+1
J_max = data_list[1].max()+1
joint_hist = np.zeros((I_max, J_max))
bin_size = 256
for i, (I, J) in enumerate(zip(data_list[0], data_list[1])):
    joint_hist[np.floor(I / (I_max / bin_size)).astype(int), np.floor(J / (J_max / bin_size)).astype(int)] += 1
So I have image data that I am iterating through in order to find the pixels which hold useful data. I then need to find the coordinates of those pixels subject to a conditional statement and put them into an array or DataFrame. The code I have so far is:
pix_coor = np.empty((0,2))

for (x,y), value in np.ndenumerate(data_int):
    if value >= sigma3:
        pix_coor.append([x,y])
where data_int is just an image array of shape (129, 129). All the pixels that have a value larger than sigma3 are useful; the other ones I don't need.

Creating an empty array works fine, but when I append to it, it doesn't seem to work. I need to end up with an array which has two columns of x and y values for the useful pixels. Any ideas?
You could simply use np.argwhere for a vectorized solution -
pix_coor = np.argwhere(data_int >= sigma3)
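If a DataFrame is preferred, as mentioned in the question, the same two-column result can be wrapped directly (the column names follow the (x, y) index order used in the question's loop):

import pandas as pd

# one row per useful pixel, in the same (x, y) index order as np.ndenumerate
df = pd.DataFrame(pix_coor, columns=['x', 'y'])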
In NumPy, appending is not an in-place operation: ndarrays have no append method of their own, and np.append copies the entire array into newly allocated memory (big enough to hold it along with the new values) and returns the new array. It should therefore be used as:

new_arr = np.append(arr, values)
Obviously, this is not an efficient way to add elements one by one. You should probably use a regular Python list for this.

Alternatively, preallocate the numpy array at its maximum possible size and then resize it:
pix_coor = np.empty((data_int.size, 2), int)

c = 0
for (x, y), value in np.ndenumerate(data_int):
    if value >= sigma3:
        pix_coor[c] = (x, y)
        c += 1

pix_coor = np.resize(pix_coor, (c, 2))
Note that I used np.empty((data_int.size, 2), int), since your coordinates are integral, while numpy defaults to floats.