Breaking down numpy code [closed] - python

I have been meticulously reading documentation and rereading/running the code below in order to understand exactly what is occurring. There are still gaps in my knowledge, though. I wanted to present the code to you, with comments that mark those gaps, in the hope that some of you are willing to fill them.
So here are my requests, friends:
1) Help me fill in the gaps in my knowledge.
2) Explain what is going on here, step by step, in a simple, non-technical format.
import numpy
import scipy.misc
import matplotlib.pyplot
lena = scipy.misc.lena()
''' Generates an artificial range within the framework of the original array (which is an image).
This artificial range will be paired with another one and used to 'climb'
through the original array and make changes. '''
def get_indices(size):
    arr = numpy.arange(size)
    # This sets every fourth element to False? How?
    return arr % 4 == 0
lena1 = lena.copy()
xindices = get_indices(lena.shape[0])
yindices = get_indices(lena.shape[1])
'''I am unsure of HOW the below code is executing. I know something is being
Set to zero, but what? And how can I verify it?'''
lena[xindices, yindices] = 0
#What does the argument 211 do exactly?
matplotlib.pyplot.subplot(211)
matplotlib.pyplot.imshow(lena1)
matplotlib.pyplot.show()
Thanks mates!

Using the Python debugger is always useful for stepping through your code while it is executing. Write the following at any place you choose:
import pdb; pdb.set_trace()
Execution will stop there, and you can inspect any variable, use any defined function, and advance line by line.
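For example, a session could look roughly like the following (output shown for illustration; the variable names assume the breakpoint sits just before the line lena[xindices, yindices] = 0):
(Pdb) xindices[:8]            # inspect any variable
array([ True, False, False, False,  True, False, False, False])
(Pdb) lena.shape              # check what you are working with
(512, 512)
(Pdb) n                       # execute the current line and stop at the next one
(Pdb) c                       # continue running the script normally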
Here is a commented version of your code. The comment on the function has been turned into a docstring with a doctest that can be executed.
import numpy
import scipy.misc
import matplotlib.pyplot
# Get classic image processing example image, Lena, at 8-bit grayscale
# bit-depth, 512 x 512 size.
lena = scipy.misc.lena()
# lena is now a NumPy array of integers, with values between 25 and 245,
# of 512 rows and 512 columns.
def get_indices(size):
    """
    Returns each fourth index in a Numpy vector of the passed in size.

    Specifically, return a vector of booleans, where all indices are set to
    False except those of every fourth element. This vector can be used to
    index another Numpy array and select *only* those elements. Example use:

    >>> import numpy as np
    >>> vector = np.array([0, 1, 2, 3, 4])
    >>> get_indices(vector.size)
    array([ True, False, False, False,  True], ...)
    """
    arr = numpy.arange(size)
    return arr % 4 == 0
# Keep a copy of the original image
lena1 = lena.copy()
# Use the defined function to get every fourth index, first in the x direction,
# then in the y direction
xindices = get_indices(lena.shape[0])
yindices = get_indices(lena.shape[1])
# Set to 0 every pixel whose row index and column index are both True in the
# vectors above. This selects **each fourth pixel on the main diagonal**
# (from top left to bottom right).
lena[xindices, yindices] = 0
# Create a Matplotlib figure with a grid of 2 rows and 1 column of subplots,
# and select the first one (the digits mean: 2 rows, 1 column, subplot 1).
# Each call to subplot specifies the grid again, i.e. if you later call
# `subplot(212)` you keep the vertical layout (one column, two rows) and
# select the second subplot; calling `subplot(121)` instead would give a
# horizontal layout with one row and two columns.
matplotlib.pyplot.subplot(211)
# Show the unaltered image on the first subplot
matplotlib.pyplot.imshow(lena1)
# You could plot the modified original image in the second subplot, and compare
# to the unmodified copy by issuing:
#matplotlib.pyplot.subplot(212)
#matplotlib.pyplot.imshow(lena)
matplotlib.pyplot.show()
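To make the indexing line lena[xindices, yindices] = 0 concrete, here is a small self-contained sketch of the same pattern on a toy 8 x 8 array (hypothetical data, not the actual image):
import numpy

a = numpy.arange(64).reshape(8, 8)    # a small stand-in for the image
rows = numpy.arange(8) % 4 == 0       # [True, False, False, False, True, ...]
cols = numpy.arange(8) % 4 == 0
# When two boolean masks are used together like this, NumPy converts each one
# to the integer indices of its True entries ([0, 4] here) and pairs them
# element-wise, so only the pixels (0, 0) and (4, 4) are touched.
a[rows, cols] = 0
print(a[0, 0], a[4, 4])               # both are now 0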

Related

How to find region bounding box of an object in image [closed]

I have the image and its mask like this:
How can I use the mask to identify the rectangle bounding box around the object? So the final result should be this (with the background removed):
import urllib.request
from io import BytesIO
from PIL import Image

url_mask = "https://i.stack.imgur.com/rIyJ6.png"
f = urllib.request.urlopen(url_mask)
mask = Image.open(BytesIO(f.read()))

url_im = "https://i.stack.imgur.com/msm7L.jpg"
f2 = urllib.request.urlopen(url_im)
img = Image.open(BytesIO(f2.read()))
There's a fast way to do this (cv2.boundingRect()), but here's a way to think about how to do it from scratch.
Let's call your image im and the mask mask, which I assume are NumPy arrays (or similar). Your goal is to find row indices [row_low, row_high] and column indices [col_low, col_high] such that the array im[row_low:row_high, col_low:col_high] is the sub-image that you're looking for.
If mask is an array of pixel values (probably 0's (black) and 255's (white)), start by converting it to a two-dimensional boolean array where an entry being True means you have a white pixel at that part of the mask (this isn't strictly necessary but it helps to see what's going on).
>>> mask.shape
(758, 734, 3) # The original mask, with RGB layers.
>>> mask2d = mask.mean(axis=2) # Get a single black-and-white mask.
>>> mask2d.shape
(758, 734)
>>> bmask = (mask2d == 255) # Or maybe (mask2d >= 200) to be safe.
Now for each row and column, you can use np.max() to determine if that row or column has True in it or not (meaning there is a white pixel in that row or column of the mask). You can do this for all of the columns at once by specifying axis in np.max(): axis=0 will check if there's a True in the column, and axis=1 will check if there's a True in the row.
>>> import numpy as np
>>> bmask.shape # Here's the boolean mask.
(758, 734) # It has 758 rows and 734 columns.
>>> rows_with_white = np.max(bmask, axis=1)
>>> cols_with_white = np.max(bmask, axis=0)
# Check shapes.
>>> rows_with_white.shape
(758,)
>>> cols_with_white.shape
(734,)
The locations of the first and last True in rows_with_white give you row_low and row_high, respectively, and similarly for cols_with_white. We can get both with np.argmax(), which finds the first location of the largest value (which, for boolean arrays, is True). To get the location of the last True, we reverse the array, repeat the process, and negate the result, so row_high and col_high come out as negative indices that count backward from the end. (One caveat: if the white region touches the last row or column, the reversed argmax is 0 and a slice ending at 0 is empty; using, e.g., bmask.shape[0] - np.argmax(rows_with_white[::-1]) as row_high avoids that.)
>>> row_low = np.argmax(rows_with_white)
>>> row_high = -np.argmax(rows_with_white[::-1])
>>> col_low = np.argmax(cols_with_white)
>>> col_high = -np.argmax(cols_with_white[::-1])
>>> print((row_low, row_high), (col_low, col_high))
(85, -85) (174, -164)
Now that you have the indices, you can simply slice the original image to get the cropped one.
>>> im_cropped = im[row_low:row_high, col_low:col_high]
And here's the whole thing put together, which assumes you already have mask and im defined as NumPy arrays (e.g. via np.array(mask)).
>>> import numpy as np
>>> bmask = (mask.mean(axis=2) == 255)
>>> rows_with_white = np.max(bmask, axis=1)
>>> cols_with_white = np.max(bmask, axis=0)
>>> row_low = np.argmax(rows_with_white)
>>> row_high = -np.argmax(rows_with_white[::-1])
>>> col_low = np.argmax(cols_with_white)
>>> col_high = -np.argmax(cols_with_white[::-1])
>>> im_cropped = im[row_low:row_high, col_low:col_high]
If you are not using cv2, you could also loop through all pixels and find the xmin, xmax, ymin, and ymax of the pixels that equal 1, since black is usually represented as 0 and white as 1.
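For completeness, here is a minimal sketch of the cv2.boundingRect() route mentioned at the top, assuming mask is the PIL image from the question (white object on a black background); treat it as a starting point rather than a drop-in answer:
import cv2
import numpy as np

# Convert the PIL mask to a single-channel uint8 array and binarize it,
# so that only the white object pixels are non-zero.
mask_u8 = (np.array(mask.convert("L")) > 127).astype(np.uint8)
# boundingRect of the non-zero points gives (x, y, width, height).
x, y, w, h = cv2.boundingRect(cv2.findNonZero(mask_u8))
im_cropped = np.array(img)[y:y + h, x:x + w]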

Replacing masked data with previous values of same dataset

I am working on filling in missing data in a large (4 GB) netCDF data file (3 dimensions: time, longitude and latitude). The method is to fill in the masked values in data1 either with:
1) previous values from data1, or
2) data from another (also masked) dataset, data2, if the value found in data1 < the value found in data2.
So far I have tried a couple of things. One was a very complex script with long for loops that never finished running after 24 hours. I have tried to reduce it, but I think it is still much too complicated. I believe there is a much simpler procedure than the way I am doing it now; I just can't see how.
I have made a script where masked data is first replaced with zeroes so that np.where can be used to get the indices of my masked data (I did not find a function that returns the coordinates of masked data, so this is my workaround). My problem is that my code is very long and, I think, time consuming to run through for large datasets. I believe there is a simpler way of doing it, but I haven't found another workaround.
Here is what I have so far (the first part just generates some matrices that are easy to work with):
if __name__ == '__main__':
    import numpy as np
    import numpy.ma as ma
    from sortdata_helpers import decision_tree

    # Generating some (easy) test data to try the algorithm on:
    # data1
    rand1 = np.random.randint(10, size=(10, 10, 10))
    rand1 = ma.masked_where(rand1 > 5, rand1)
    rand1 = ma.filled(rand1, fill_value=0)
    rand1[0, :, :] = 1

    # data2
    rand2 = np.random.randint(10, size=(10, 10, 10))
    rand2[0, :, :] = 1

    coordinates1 = np.asarray(np.where(rand1 == 0))  # gives the locations of where in the data there are zeros
    filled_data = decision_tree(rand1, rand2, coordinates1)
    print(filled_data)
The functions that I defined to be called in the main script are these, in the same order as they are used:
def decision_tree(data1, data2, coordinates):
    # This is the main function,
    # where the choice between data1 and data2 is made.
    import numpy as np
    from sortdata_helpers import generate_vector
    from sortdata_helpers import find_value

    for i in range(coordinates.shape[1]):
        coordinate = [coordinates[0, i], coordinates[1, i], coordinates[2, i]]

        AET_vec = generate_vector(data1, coordinate)  # makes a vector to go back in time
        AET_value = find_value(AET_vec)               # takes the vector and finds the closest day with data

        PET_vec = generate_vector(data2, coordinate)
        PET_value = find_value(PET_vec)

        if PET_value > AET_value:
            data1[coordinate[0], coordinate[1], coordinate[2]] = AET_value
        else:
            data1[coordinate[0], coordinate[1], coordinate[2]] = PET_value
    return data1

def generate_vector(data, coordinate):
    # This one generates the vector used to go back in time.
    vector = data[0:coordinate[0], coordinate[1], coordinate[2]]
    return vector

def find_value(vector):
    # Here the first non-zero value, counting backwards through the vector
    # (i.e. the most recent one in time), is chosen as "value".
    from itertools import dropwhile
    value = list(dropwhile(lambda x: x == 0, reversed(vector)))[0]
    return value
I hope someone has a good idea or suggestions on how to improve my code. I am still struggling to understand indexing in Python, and I think this can definitely be done in a smoother way than I have done here.
Thanks for any suggestions or comments,
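One possible way to avoid the per-coordinate Python loop is to forward-fill along the time axis with vectorized NumPy. Below is a rough sketch using the rand1/rand2 test arrays from above; note that it only approximates the loop version (for data2 it may pick up the current time step rather than strictly earlier ones), so it is a starting point rather than a drop-in replacement:
import numpy as np

def forward_fill_time(arr):
    # For every (time, lat, lon) cell, find the index of the most recent time
    # step (including the current one) where arr is non-zero, then gather
    # those values.  Assumes the first time slice has no gaps.
    t = np.arange(arr.shape[0])[:, None, None]
    idx = np.where(arr != 0, t, 0)
    np.maximum.accumulate(idx, axis=0, out=idx)
    lat, lon = np.ogrid[:arr.shape[1], :arr.shape[2]]
    return arr[idx, lat, lon]

filled1 = forward_fill_time(rand1)   # most recent non-zero value in data1
filled2 = forward_fill_time(rand2)   # most recent non-zero value in data2
# At the gap locations take the smaller of the two candidates, mirroring the
# comparison in decision_tree(); keep the original values everywhere else.
filled_data = np.where(rand1 == 0, np.minimum(filled1, filled2), rand1)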

(Python)How to rotate an image so that a feature becomes vertical? [closed]

I would like to rotate an image (such as the one below) in a way that one of its features (which resembles a line) becomes vertical. However, I can't seem to find a way to programmatically do it in Python.
Example_Image
The rotation itself can be done by the scipy.ndimage.interpolation.rotate operation.
The first part below solves the issue for the example scenario in the original question (a single elongated data blob); see further down for a more general (but slower) approach. Hope this helps!
First Approach: To find the axis, and with it the angle, of your line, I suggest using a PCA on the non-zero values:
from scipy.ndimage.interpolation import rotate
#from skimage.transform import rotate ## Alternatively
from sklearn.decomposition import PCA  ## Or use its numpy variant
import numpy as np
def verticalize_img(img):
    """
    Method to rotate a greyscale image based on its principal axis.

    :param img: Two-dimensional array-like object, values > 0 being
                interpreted as belonging to the line
    :return rotated_img: The rotated image
    """
    # Get the coordinates of the points of interest:
    X = np.array(np.where(img > 0)).T
    # Perform a PCA and compute the angle of the first principal axis:
    pca = PCA(n_components=2).fit(X)
    angle = np.arctan2(*pca.components_[0])
    # Rotate the image by the computed angle:
    rotated_img = rotate(img, angle / np.pi * 180 - 90)
    return rotated_img
As usual, this function could also be written as a one-liner:
rotated_img = rotate(img, np.arctan2(*PCA(2).fit(np.array(np.where(img > 0)).T).components_[0]) / np.pi * 180 - 90)
And here is an example:
from matplotlib import pyplot as plt
# Example data:
img = np.array([[0,0,0,0,0,0,0],
                [0,1,0,0,0,0,0],
                [0,0,1,1,0,0,0],
                [0,0,0,1,1,0,0],
                [0,0,1,0,0,1,0],
                [0,0,0,0,0,0,1]])
# Or alternatively a straight line:
img = np.diag(np.ones(15))
img = np.around(rotate(img,25))
# Or a distorted blob:
from sklearn import cluster, datasets
X, y = datasets.make_blobs(n_samples=100, centers = [[0,0]])
distortion = [[0.6, -0.6], [-0.4, 0.8]]
theta = np.radians(20)
rotation = np.array(((np.cos(theta), -np.sin(theta)), (np.sin(theta), np.cos(theta))))
X = np.dot(np.dot(X, distortion),rotation)
img = np.histogram2d(*X.T)[0] # > 0 ## uncomment for making the example binary
rotated_img = verticalize_img(img)
# Plot the results
plt.matshow(img)
plt.title('Original')
plt.matshow(rotated_img)
plt.title('Rotated')
Note that for highly noisy data or images with no clear orientation this method will come up with arbitrary rotations.
And here is an example output:
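If you would rather avoid the scikit-learn dependency (the "## Or use its numpy variant" comment above), the principal axis can also be taken from the covariance matrix with plain NumPy. A minimal sketch, assuming img is the same binary/greyscale array as before (the eigenvector sign may flip compared with the PCA version, which only changes the result by 180 degrees):
import numpy as np
from scipy.ndimage.interpolation import rotate

pts = np.argwhere(img > 0).astype(float)    # (row, col) coordinates of the feature
pts -= pts.mean(axis=0)                     # centre them
eigvals, eigvecs = np.linalg.eigh(np.cov(pts, rowvar=False))
principal = eigvecs[:, -1]                  # eigenvector of the largest eigenvalue
angle = np.arctan2(principal[0], principal[1])
rotated_img = rotate(img, angle / np.pi * 180 - 90)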
Second Approach: OK, after clarification of the actual task in a more complicated setting (see the comments), here is a second approach, based on template matching:
from matplotlib import pyplot as plt
import numpy as np
import pandas
from scipy.ndimage.interpolation import rotate
from scipy.signal import correlate2d#, fftconvolve
# Data from CSV file:
img = pandas.read_csv('/home/casibus/testdata.csv')
# Create a template:
template = np.zeros_like(img.values)
template[:,int(len(template[0])*1./2)] = 1
suggested_angles = np.arange(0,180,1) # Change to any resolution you like
overlaps = [np.amax(correlate2d(rotate(img,alpha,reshape=False),template,mode='same')) for alpha in suggested_angles]
# Determine the angle resulting in maximal overlap and rotate:
rotated_img = rotate(img.values,-suggested_angles[np.argmax(overlaps)])
plt.matshow(rotated_img)
plt.matshow(template)
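A note on this design: running correlate2d for all 180 candidate angles is quite slow for larger images. The commented-out fftconvolve import hints at the usual speed-up: cross-correlation can be computed in the frequency domain, e.g. as scipy.signal.fftconvolve(rotate(img, alpha, reshape=False), template[::-1, ::-1], mode='same'), which should give essentially the same maxima at a fraction of the cost for large inputs.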

Maximum intensity projection from image stack

I'm trying to recreate the function
max(array, [], 3)
from MATLAB, which can take my 300x300 px image stack of N images (I'm saying "image" here because I'm processing images; really this is just a big double array), 300x300xN, and create a 300x300 array. What I think is happening in this function, if it were to operate inefficiently, is that it walks through each (x, y) point, takes the maximum value at that point across the z-axis, and then normalizes with the maximum and minimum values of the entire array.
I've tried recreating this in python with
# Shape of dataset: (300, 300, 181)
# Type of dataset: <type 'numpy.ndarray'>
for x in range(numpy.size(self.dataset, 0)):
    for y in range(numpy.size(self.dataset, 1)):
        print "Point is", x, y
        # more would go here to find the maximum (x, y) value over the Z axis in self.dataset
A very simple (x, y) iterator -- but not only does my IDE crash after a few milliseconds of running this code, it also feels gross and inefficient.
Is there something I'm missing? I'm new to Python, and therefore the answer here isn't clear to me. Is there an existing function that does this operation?
import numpy as np
import matplotlib.pyplot as plt
from skimage import io
path = "test.tif"
IM = io.imread(path)
IM_MAX= np.max(IM, axis=0)
plt.imshow(IM_MAX)
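A note on axes: skimage.io.imread typically returns a multi-page TIFF as an (N, rows, cols) array, which is why axis=0 is used above. If your stack is already laid out as (300, 300, N), as in the MATLAB call, the direct equivalent is np.max(self.dataset, axis=2).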

Bootstrapping function grinds to a halt, due to python pseudorandom generator?

I am working on a kind of bootstrapping procedure for visual fixation data, and would be helped by the insights of others on this issue I am having. I suspect that either I'm missing something related to the functioning of the random number generator (random.randrange), or it shows my currently novice understanding of numpy array iteration and slicing. Being a psychologist with only hobby-level programming experience, I would not be surprised if it turns out I'm doing this in a really backwards way.
When you want to perform statistical analysis on visual fixation data, you often need to take center-bias into account, which is the bias whereby observers tend to fixate more to the center of an image at first and more randomly in the image later. This bias causes a temporal correlation between fixations, and an ROC-analysis (Receiver Operator Characteristic) performed on such data needs a baseline based on a specific kind of bootstrap method.
In this case, the data resides in a numpy array named original. This array is of shape (22, 800, 15, 2), where the dimensions indicate [observer, image, fixation, (x, y)]. So, 15 fixations per observer per image.
In the bootstrap, we generally want to replace each fixation with another fixation that occurs somewhere in the set of all other images and all observers, but at the same time (in this case: the same fixation index, index 2 of original).
I think this means that we have to do the following:
create a new array of the same dimensions as original. This array will be called shuffled.
check if current x or y in original == NaN. If so, do not change this fixation. Otherwise continue;
choose a random fixation from the subset of original that satisfies the following index: [all observers, all images except the current image, current fixation]. Make sure it does not contain NaN, otherwise pick another random fixation until it does not contain NaN;
Set the value at the current location in shuffled to the randomly chosen fixation.
I have a function that takes the array original and does what is described above, with the slight modification that when only one of the original x, y pair is NaN, it only sets that x or y in the random fixation to np.nan. When I stepped through the loops I saw good results. After stepping through about 10 iterations I was satisfied, as all the data looked perfect, so I removed the raw_input() breakpoints I had set and let the function process all of the data without interruption. When I did so, I noticed that the function slows down with each loop and grinds to a halt when it reaches observer=0, image=48.
My code is as follows:
for obs_index, obs in enumerate(original):
    for img_index, img in enumerate(obs):
        print obs_index, img_index
        for fix_index, fix in enumerate(img):
            # do the following because sometimes only x or y in the original is NaN
            rand_fix = (np.nan, np.nan)
            while np.isnan(rand_fix[0]) or np.isnan(rand_fix[1]):
                rand_obs = randrange(observers)
                rand_img = img_index
                while rand_img == img_index:
                    rand_img = randrange(images)
                rand_fix = original[rand_obs, rand_img, fix_index]
            # do the following because sometimes only x or y in the original is NaN
            if np.isnan(fix[0]):
                rand_fix[0] = np.nan
            if np.isnan(fix[1]):
                rand_fix[1] = np.nan
            shuffled[obs_index, img_index, fix_index] = rand_fix
When this function finishes, shuffled should contain correctly shuffled fixation data for use in ROC-analysis.
SOLVED
I came up with the following code, which no longer slows down:
for obs_index, obs in enumerate(original):
    for img_index, img in enumerate(obs):
        for fix_index, fix in enumerate(img):
            x = fix[0]
            y = fix[1]
            rand_x = np.nan
            rand_y = np.nan
            if not (np.isnan(x) or np.isnan(y)):
                while np.isnan(rand_x) or np.isnan(rand_y):
                    rand_obs = randrange(observers)
                    rand_img = img_index
                    while rand_img == img_index:
                        rand_img = randrange(images)
                    rand_x = original[rand_obs, rand_img, fix_index, 0]
                    rand_y = original[rand_obs, rand_img, fix_index, 1]
            shuffled[obs_index, img_index, fix_index, 0] = rand_x
            shuffled[obs_index, img_index, fix_index, 1] = rand_y
I also fixed the way the new fixation was assigned to the location in shuffled, to follow numpy indexing properly. The slowdown most likely came from the original version: rand_fix = original[rand_obs, rand_img, fix_index] returns a view into original, so the later rand_fix[0] = np.nan and rand_fix[1] = np.nan assignments wrote NaNs back into the source data, which made it progressively harder for the rejection loop to find a NaN-free fixation. Copying out plain scalars (rand_x, rand_y) avoids that.
