Order of numpy (logical_and vs '&') statements leads to different results - python

I have a simple piece of code where I am trying to compare the numpy function logical_and vs the "&" operator.
I encounter very strange behavior where the order in which the statements are executed seems to have an effect on the final result, when in fact it shouldn't. Weird!
In the code below, if I interchange the order of the final_mask1 and final_mask statements, I get a different value for the variable "test" as well as a different output image (for the case where final_mask is used as the output). Am I missing something here? How can I resolve this?
TIA
import numpy as np
from scipy import misc
import matplotlib.pyplot as plt
photo_data = misc.imread('./sd-3layers.jpg')
red_mask = photo_data[:, : ,0] < 150
green_mask = photo_data[:, : ,1] > 100
blue_mask = photo_data[:, : ,2] < 100
final_mask1 = np.logical_and(red_mask, green_mask, blue_mask)
final_mask = red_mask & green_mask & blue_mask
test = (final_mask1 == final_mask)
print(np.all(test))
photo_data[final_mask] = 0
plt.figure(figsize=(15,15))
plt.imshow(photo_data)

Looking up the documentation of logical_and, one finds that it compares only two arrays, and that the third positional argument is out, used for storing the result in a pre-allocated array. You can use reduce to avoid having to write two calls to logical_and, so what you're trying to do ends up looking like
np.logical_and.reduce((red_mask, green_mask, blue_mask))
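To see why the statement order mattered at all: that third positional argument is interpreted as out, so np.logical_and(red_mask, green_mask, blue_mask) computes red_mask AND green_mask and writes the result into blue_mask, silently overwriting it. Whichever of the two statements runs first therefore changes the blue_mask the other one sees. A minimal sketch of the effect:
import numpy as np
a = np.array([True, True, False])
b = np.array([True, False, True])
c = np.array([False, True, True])
print(np.logical_and.reduce((a, b, c)))  # [False False False] -- the true three-way AND
res = np.logical_and(a, b, c)  # third argument is treated as out, not as an operand
print(res)  # [ True False False] -- just a AND b
print(c)    # [ True False False] -- c has been silently overwritten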

Replacing masked data with previous values of same dataset

I am working on filling in missing data in a large (4GB) netcdf datafile (3 dimensions: time, longitude and latitude). The method is to fill in the masked values in data1 either with:
1) previous values from data1 or
2) the corresponding look-back value from another (also masked) dataset, data2, whichever of the two values is smaller.
So far I have tried a couple of things; one was a very complex script with long for loops, which still had not finished running after 24 hours. I have tried to reduce it, but I think it is still much too complicated. I believe there is a much simpler procedure than the way I am doing it now, I just can't see how.
I have made a script where masked data is first replaced with zeroes in order to use np.where to get the indices of my masked data (I did not find a function that returns the coordinates of masked data, so this is my workaround). My problem is that my code is very long and, I think, time-consuming to run on large datasets. I believe there is a simpler way of doing it, but I haven't found another workaround.
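A side note on that workaround: numpy.ma exposes the mask directly, so the coordinates of masked cells can be read off without zero-filling first. A minimal sketch:
import numpy as np
import numpy.ma as ma
data = np.random.randint(10, size=(4, 4))
arr = ma.masked_where(data > 5, data)
# ma.getmaskarray returns the boolean mask; np.where turns it into coordinates
coords = np.asarray(np.where(ma.getmaskarray(arr)))
print(coords)  # one row of indices per dimension, one column per masked cell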
Here is what I have so far (the first part just generates some matrices that are easy to work with):
if __name__ == '__main__':
    import numpy as np
    import numpy.ma as ma
    from sortdata_helpers import decision_tree

    # Generating some (easy) test data to try the algorithm on:
    # data1
    rand1 = np.random.randint(10, size=(10, 10, 10))
    rand1 = ma.masked_where(rand1 > 5, rand1)
    rand1 = ma.filled(rand1, fill_value=0)
    rand1[0, :, :] = 1

    # data2
    rand2 = np.random.randint(10, size=(10, 10, 10))
    rand2[0, :, :] = 1

    # locations of the zeros (the previously masked cells) in data1
    coordinates1 = np.asarray(np.where(rand1 == 0))
    filled_data = decision_tree(rand1, rand2, coordinates1)
    print(filled_data)
The functions that I defined to be called in the main script are these, in the same order as they are used:
def decision_tree(data1, data2, coordinates):
    # This is the main function,
    # where the choice between data1 and data2 is made.
    import numpy as np
    from sortdata_helpers import generate_vector
    from sortdata_helpers import find_value

    for i in range(coordinates.shape[1]):
        coordinate = [coordinates[0, i], coordinates[1, i], coordinates[2, i]]
        AET_vec = generate_vector(data1, coordinate)  # vector for going back in time
        AET_value = find_value(AET_vec)  # takes the vector and finds the closest day with data
        PET_vec = generate_vector(data2, coordinate)
        PET_value = find_value(PET_vec)
        if PET_value > AET_value:
            data1[coordinate[0], coordinate[1], coordinate[2]] = AET_value
        else:
            data1[coordinate[0], coordinate[1], coordinate[2]] = PET_value
    return data1

def generate_vector(data, coordinate):
    # Generates the vector of earlier time steps at this location.
    vector = data[0:coordinate[0], coordinate[1], coordinate[2]]
    return vector

def find_value(vector):
    # The most recent value in the vector that is not zero is chosen as "value".
    from itertools import dropwhile
    value = list(dropwhile(lambda x: x == 0, reversed(vector)))[0]
    return value
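For comparison, the same look-back-and-take-the-minimum idea can be sketched without the Python-level loop, which should matter for a 4GB file. This is an untested sketch: it assumes the first time step always holds valid data (as in the test matrices above), and unlike generate_vector it looks at values up to and including the current time step, so the indexing would need adjusting if strictly earlier steps are required.
import numpy as np

def last_nonzero_fill(arr):
    # For every (time, lat, lon) cell, take the value of the most recent
    # nonzero entry at or before that time step (axis 0 is time).
    t = np.arange(arr.shape[0]).reshape(-1, 1, 1)
    idx = np.maximum.accumulate(np.where(arr != 0, t, 0), axis=0)
    return np.take_along_axis(arr, idx, axis=0)

aet = last_nonzero_fill(rand1)
pet = last_nonzero_fill(rand2)
mask = rand1 == 0
rand1[mask] = np.minimum(aet, pet)[mask]  # same choice as in decision_tree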
Hope someone has a good idea or suggestions on how to improve my code. I am still struggling with understanding indexing in Python, and I think this can definitely be done in a smoother way than I have done here.
Thanks for any suggestions or comments,

Code of plotting a function in an interval (graph result)

I need your help with coding a graph result - plotting a function in an interval.
The question which I got is:
"Plot the following composite function. You probably want to use 'if' statements and a loop to 'build' it. Plot the function in the interval from [-3, 5].
f(x) = |x|      if x < 0
       -1       if 0 <= x < 1
       +1       if 1 <= x < 2
       ln(x)    if x >= 2
Can anyone please write code for me that shows a GRAPH of the above function, without the separate pieces of the graph being connected to each other?
Thank you very much in advance!
Using if statements would be a more involved way. You can directly make use of NumPy indexing and masking to get the task done. Below is how I would do it.
Explanation: First you create a mesh of x data points in the interval (-3, 5). Then you initialize a y-array of zeros of the same length. Next, you use the conditions on x to get the indices of the x-array. This is done using masks. mask1 = ((x>=0) & (x<1)) defines a condition, and y[mask1] = -1 then assigns -1 at exactly the indices where the condition holds True. You do this for all 4 conditions. I used two named masks for the middle two conditions; you could just as well use 4 variables (masks) to do the same thing. It's a matter of personal taste.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-3, 5, 100)
y = np.zeros(len(x))
mask1 = ((x>=0) & (x<1))
mask2 = ((x>=1) & (x<2))
y[x<0] = np.abs(x[x<0])
y[mask1] = -1
y[mask2] = 1
y[x>=2] = np.log(x[x>=2])
plt.plot(x, y)
plt.xlabel('$x$')
plt.ylabel(r'$f(x)$')
plt.show()
Usually, simple composite functions can easily be written like any other function by multiplying each term by its respective condition(s). The only place one needs to be careful is the logarithm, which is not defined over the complete interval. This problem is circumvented by taking the absolute value here, because it's only relevant in the range x >= 2 anyway.
import numpy as np
import matplotlib.pyplot as plt
f = lambda x: np.abs(x)*(x<0) - ((0<=x) & (x < 1)) + ((1<=x) & (x < 2)) + np.log(np.abs(x))*(2<=x)
x = np.linspace(-3,5,200)
plt.plot(x,f(x))
plt.show()
According to a comment below the answer, one can also evaluate the function in each of the intervals separately,
intervals = [(-3, -1e-6), (0, 1-1e-6), (1, 2-1e-6), (2, 5)]
for (s, e) in intervals:
    x = np.linspace(s, e, 100)
    plt.plot(x, f(x), color="C0")
Thank you very much for your help, it is really useful :)
In addition, I would like to know how I can eliminate the lines connecting each step of the interval to the next one.
I need to show only 4 separate pieces on the graph, one per step, without the "continuity" of the lines that connect them.
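One way to get four disconnected pieces is to plot each piece with a separate call to plt.plot, forcing the same color, since matplotlib only connects points within a single call. A minimal sketch along the lines of the masking answer above:
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-3, 5, 400)
pieces = [x < 0, (x >= 0) & (x < 1), (x >= 1) & (x < 2), x >= 2]
funcs = [np.abs,
         lambda v: np.full_like(v, -1.0),
         lambda v: np.full_like(v, 1.0),
         np.log]
for mask, fn in zip(pieces, funcs):
    plt.plot(x[mask], fn(x[mask]), color='C0')  # one isolated segment per piece
plt.show()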

Eigen vectors in python giving seemingly random element-wise signs

I'm running the following code:
import numpy as np
import matplotlib
matplotlib.use("TkAgg")
import matplotlib.pyplot as plt
N = 100
t = 1
a1 = np.full((N-1,), -t)
a2 = np.full((N,), 2*t)
Hamiltonian = np.diag(a1, -1) + np.diag(a2) + np.diag(a1, 1)
eval, evec = np.linalg.eig(Hamiltonian)
idx = eval.argsort()[::-1]
eval, evec = eval[idx], evec[:,idx]
wave2 = evec[2] / np.sum(abs(evec[2]))
prob2 = evec[2]**2 / np.sum(evec[2]**2)
_ = plt.plot(wave2)
_ = plt.plot(prob2)
plt.show()
And the plot that comes out is this:
But I'd expect the blue line to be a sinusoid as well. This has got me confused, and I can't find what's causing the sudden sign changes. Plotting the absolute values shows that the magnitudes associated with each x are fine, but the signs are scrambled.
Any ideas on what might cause this or how to solve it?
Here's a modified version of your script that does what you expected. The changes are:
Corrected the indexing for the eigenvectors; they are the columns of evec.
Use np.linalg.eigh instead of np.linalg.eig. This isn't strictly necessary, but you might as well use the more efficient code.
Don't reverse the order of the sorted eigenvalues. I keep the eigenvalues sorted from lowest to highest. Because eigh returns the eigenvalues in ascending order, I just commented out the code that sorts the eigenvalues.
(Only the first change is a required correction.)
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
N = 100
t = 1
a1 = np.full((N-1,), -t)
a2 = np.full((N,), 2*t)
Hamiltonian = np.diag(a1, -1) + np.diag(a2) + np.diag(a1, 1)
eval, evec = np.linalg.eigh(Hamiltonian)
#idx = eval.argsort()[::-1]
#eval, evec = eval[idx], evec[:,idx]
k = 2
wave2 = evec[:, k] / np.sum(abs(evec[:, k]))
prob2 = evec[:, k]**2 / np.sum(evec[:, k]**2)
_ = plt.plot(wave2)
_ = plt.plot(prob2)
plt.show()
The plot:
I may be wrong, but aren't they all valid eigenvectors/values? The sign shouldn't matter, as the definition of an eigenvector is:
In linear algebra, an eigenvector or characteristic vector of a linear transformation is a non-zero vector that changes only by an overall scale when that linear transformation is applied to it.
Just because the scale is negative doesn't mean it isn't valid.
See this post about Matlab's eig, which has a similar problem.
One way to fix this is to simply pick a sign convention at the start, and multiply every vector that doesn't fit it by -1 (or take the absolute value of every element and multiply by your expected sign). For your results this should work (nothing crosses 0).
Neither Matlab nor numpy cares about what you are trying to solve; simple mathematics dictates that both signed eigenvector/value combinations are valid. Your values are sinusoidal; it's just that two sets of eigenvectors/values work (negative and positive).
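A minimal sketch of that sign-fixing idea, assuming (as in the corrected script above) that the eigenvectors are the columns of evec:
import numpy as np

def fix_signs(evec):
    # Multiply each column by the sign of its first entry so that every
    # eigenvector starts non-negative; each column remains a valid eigenvector.
    signs = np.sign(evec[0, :])
    signs[signs == 0] = 1.0  # leave columns whose first entry is exactly zero alone
    return evec * signs

evec = fix_signs(evec)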

Rebinning ndarray while conserving summation

I am looking for a function that rebins an ndarray and satisfies:
The result can have arbitrary dimensions, either upscaling or downscaling.
After the rebinning, the summation should be the same as before.
It should not distort the overall image; in other words, it should be reversible in the case of upscaling.
The second constraint is not just summation-normalization or the like; the rebinning algorithm itself should calculate the fraction by which each original array element overlaps each resulting array element.
The third condition can be tested in this way:
# image is ndarray with shape of 20x20
func(image, func(image, [40,40]),[20,20])==image # if func works as intended
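For the special case of integer zoom factors, a sum-conserving and exactly reversible rebin is short to write in plain numpy. This sketch is not the general fractional-overlap algorithm asked about, but it does pass the round-trip test above:
import numpy as np

def expand_sum(arr, f):
    # upscale by pixel duplication, dividing by f**2 so the total sum is conserved
    return np.kron(arr, np.ones((f, f))) / f**2

def rebin_sum(arr, f):
    # downscale by summing f-by-f blocks; the total sum is conserved by construction
    h, w = arr.shape
    return arr.reshape(h // f, f, w // f, f).sum(axis=(1, 3))

image = np.random.rand(20, 20)
print(np.allclose(rebin_sum(expand_sum(image, 2), 2), image))  # True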
So far I am aware of only two candidate functions:
ndarray.resize: I don't fully understand what it does, but it is basically not what I am looking for.
scipy.misc.imresize: It interpolates the values of each element, which is not so good for my purpose.
Neither satisfies the conditions I mentioned. As an example, I attached code to demonstrate the behaviour of scipy.misc.imresize.
import numpy as np
from scipy.special import erf
import matplotlib.pyplot as plt
from scipy.misc import imresize
def gaussian(size, center, width, a):
    xcoord = np.arange(size[0])[:, np.newaxis] + np.zeros(size[1])[np.newaxis, :]
    ycoord = np.zeros(size[0])[:, np.newaxis] + np.arange(size[1])[np.newaxis, :]
    return a*((erf((xcoord+1-center[0])/(width[0]*np.sqrt(2))) - erf((xcoord-center[0])/(width[0]*np.sqrt(2)))) *
              (erf((ycoord+1-center[1])/(width[1]*np.sqrt(2))) - erf((ycoord-center[1])/(width[1]*np.sqrt(2)))))
size=np.asarray([20,20])
c=[[0.1,0.2],[0.4,0.6],[0.8,0.4]]
c=[np.asarray(x) for x in c]
s=[[0.02,0.02],[0.05,0.05],[0.03,0.01]]
s=[np.asarray(x) for x in s]
im = gaussian(size, c[0]*size, s[0]*size, 1) \
     + gaussian(size, c[1]*size, s[1]*size, 3) \
     + gaussian(size, c[2]*size, s[2]*size, 2)
sciim=imresize(imresize(im,[40,40]),[20,20])
plt.imshow(im/np.sum(im)-sciim/np.sum(sciim))
plt.show()
So, is there any function, preferably built-in function to some package, that satisfies my requirements?
As for other languages, I know that frebin in IDL does what I described. Of course I could re-write the function myself, and perhaps someone already has, but I wonder whether there is an existing solution.
frebin implements pixel duplication when the expansion is by an integer factor (like the 2x increase in your toy problem). If you want similar reversibility in such cases, try this:
import numpy as np
import scipy.misc

def py_frebin(im, shape):
    # pixel duplication only works when one shape is an integer
    # multiple of the other; otherwise fall back to Lanczos
    old, new = np.asarray(im.shape), np.asarray(shape)
    if np.all(new % old == 0) or np.all(old % new == 0):
        interp = 'nearest'
    else:
        interp = 'lanczos'
    im2 = scipy.misc.imresize(im, shape, interp=interp, mode='F')
    im2 *= im.sum() / im2.sum()  # rescale so the total sum is conserved
    return im2
It should be better than frebin for non-integer expansions (frebin seems to use interp='bilinear', which is less reversible), and similar for integer expansions.
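Usage would then mirror the round-trip test from the question (this assumes an older SciPy in which scipy.misc.imresize is still available; expect the round trip to be approximately, not bit-exactly, reversible):
up = py_frebin(im, (40, 40))    # integer 2x expansion -> pixel duplication
down = py_frebin(up, (20, 20))  # back to the original grid
print(np.allclose(down, im, atol=1e-3))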

Different results from the same function (numpy, OCR)

import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
I am writing a program with Python 3.4.1 to analyze a certain type of captcha.
Here's the part where I have the problem.
def f(path):
    i = Image.open(path)
    a = np.array(i)    # transform the image into an array using numpy
    b = combinator(a)  # a function I created to process the image (thresholding...)
    capreader(b)       # a function that divides the array and recognizes the character
                       # in each segment (I created functions for each character previously)
Now when I call f() on a path 'p', it gives me a certain result (which is wrong), and when I call each of the instructions inside f() individually, it gives me another result (which is correct):
i = Image.open('p')
ia = np.array(i)
ib = combinator(ia)
capreader(ib)
This is weird, because I think they should logically give the same result.
I then tried to see whether the array b inside f() is the same as the one computed outside f():
def x(path):
    i = Image.open(path)
    a = np.array(i)
    b = combinator(a)
    print(np.array_equal(b, ib))
and the result was False.
I then tested this:
def y(path):
    i = Image.open(path)
    a = np.array(i)
    b = combinator(a)
    plt.imshow(b)
    plt.show()
    capreader(b)
This time (after I close the pyplot window) capreader() gives me the correct answer!!
So I continued testing, this time with this:
def test(path):
    a = Image.open(path)
    b = np.array(a)
    c = combinator(b)
    print(np.array_equal(c, ib))  # False
    plt.imshow(c)
    plt.show()
    print(np.array_equal(c, ib))  # True
    capreader(c)
I don't understand what's happening or how I can solve it. Obviously what makes the difference here are the plt calls between the two comparisons, which give opposite results (False, True). I read on the internet that plt.show() is a blocking function. I don't know what that means, but I put it here in case it helps solve the case.
