Different results from the same function (numpy, OCR)

I am writing a program with Python 3.4.1 to analyze a certain type of captcha.
Here is the part where I have the problem:
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

def f(path):
    i = Image.open(path)
    a = np.array(i)    # transform the image into an array using numpy
    b = combinator(a)  # a function I wrote to process the image (thresholding, ...)
    capreader(b)       # a function that divides the array and recognizes the character in each segment (I previously wrote functions for each character)
When I call f() with a path 'p', it gives me a certain result (which is wrong). But when I run each of the instructions inside f() individually, I get another result (which is correct):
i = Image.open('p')
ia = np.array(i)
ib = combinator(ia)
capreader(ib)
This is weird, because logically they should give the same result.
I then checked whether the array b computed inside f() is the same as the one computed outside f():
def x(path):
    i = Image.open(path)
    a = np.array(i)
    b = combinator(a)
    print(np.array_equal(b, ib))
The result was False.
I then tested this:
def y(path):
    i = Image.open(path)
    a = np.array(i)
    b = combinator(a)
    plt.imshow(b)
    plt.show()
    capreader(b)
This time (after I close the pyplot window), capreader() gives me the correct answer!
So I continued testing, this time with this:
def test(path):
    a = Image.open(path)
    b = np.array(a)
    c = combinator(b)
    print(np.array_equal(c, ib))  # False
    plt.imshow(c)
    plt.show()
    print(np.array_equal(c, ib))  # True
    capreader(c)
I don't understand what is happening or how to solve it. Evidently the plt calls between the two comparisons make the difference, since the comparisons give opposite results (False, then True). I read that plt.show() is a blocking function. I don't know what that means, but I mention it in case it helps solve the problem.
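To narrow down a mismatch like the one reported by np.array_equal above, it can help to see how the two arrays differ rather than only that they differ. The following is a minimal sketch; describe_difference is a hypothetical helper name, not part of NumPy, and it says nothing about why combinator() returns different data.

```python
import numpy as np


def describe_difference(first, second):
    """Summarize how two arrays differ (hypothetical debugging helper)."""
    first, second = np.asarray(first), np.asarray(second)
    report = {
        "same_shape": first.shape == second.shape,
        "same_dtype": first.dtype == second.dtype,
    }
    if report["same_shape"]:
        # Indices of every element that is not equal in the two arrays.
        differing = np.argwhere(first != second)
        report["n_differing"] = len(differing)
        report["first_differing_index"] = (
            tuple(int(k) for k in differing[0]) if len(differing) else None
        )
    return report


report = describe_difference(np.array([[1, 2], [3, 4]]),
                             np.array([[1, 9], [3, 4]]))
```

Calling it on the array computed inside the function and on ib would show whether the shape, the dtype, or only some pixel values change.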

Related

Replacing masked data with previous values of same dataset

I am working on filling in missing data in a large (4 GB) netCDF data file (3 dimensions: time, longitude and latitude). The method is to fill in the masked values in data1 either with:
1) previous values from data1, or
2) data from another (also masked) dataset, data2, if the found value from data1 < the found value from data2.
So far I have tried a couple of things. One was a very complex script with long for loops, which had not finished after running for 24 hours. I have tried to reduce it, but I think it is still far too complicated. I believe there is a much simpler way to do this than what I am doing now; I just can't see how.
I have made a script where the masked data is first replaced with zeros so that I can use np.where to get the indices of the masked data (I did not find a function that returns the coordinates of masked data, so this is my workaround). My problem is that my code is very long and, I think, too time-consuming to run on large datasets. I believe there is a simpler way of doing it, but I haven't found one.
Here is what I have so far (the first part just generates some matrices that are easy to work with):
if __name__ == '__main__':
    import numpy as np
    import numpy.ma as ma
    from sortdata_helpers import decision_tree

    # Generating some (easy) test data to try the algorithm on:
    # data1
    rand1 = np.random.randint(10, size=(10, 10, 10))
    rand1 = ma.masked_where(rand1 > 5, rand1)
    rand1 = ma.filled(rand1, fill_value=0)
    rand1[0, :, :] = 1
    # data2
    rand2 = np.random.randint(10, size=(10, 10, 10))
    rand2[0, :, :] = 1
    coordinates1 = np.asarray(np.where(rand1 == 0))  # gives the locations of where in the data there are zeros
    filled_data = decision_tree(rand1, rand2, coordinates1)
    print(filled_data)
The functions that I defined to be called in the main script are these, in the same order as they are used:
def decision_tree(data1, data2, coordinates):
    # This is the main function,
    # where the decision between data1 or data2 is chosen.
    import numpy as np
    from sortdata_helpers import generate_vector
    from sortdata_helpers import find_value
    for i in range(coordinates.shape[1]):
        coordinate = [coordinates[0, i], coordinates[1, i], coordinates[2, i]]
        AET_vec = generate_vector(data1, coordinate)  # makes vector to go back in time
        AET_value = find_value(AET_vec)  # takes the vector and finds the closest day with data
        PET_vec = generate_vector(data2, coordinate)
        PET_value = find_value(PET_vec)
        if PET_value > AET_value:
            data1[coordinate[0], coordinate[1], coordinate[2]] = AET_value
        else:
            data1[coordinate[0], coordinate[1], coordinate[2]] = PET_value
    return data1

def generate_vector(data, coordinate):
    # This one generates the vector to go back in time.
    vector = data[0:coordinate[0], coordinate[1], coordinate[2]]
    return vector

def find_value(vector):
    # Here the first value in vector (going back in time) that is not zero is chosen as "value"
    from itertools import dropwhile
    value = list(dropwhile(lambda x: x == 0, reversed(vector)))[0]
    return value
Hope someone has a good idea or suggestions on how to improve my code. I am still struggling to understand indexing in Python, and I think this can definitely be done more smoothly than I have done here.
Thanks for any suggestions or comments.
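One possible direction, sketched under assumptions rather than offered as a drop-in replacement: the coordinates of masked cells can be read directly from the mask, and "take the previous valid value in time" can be vectorized with a running maximum over time indices. The function names below are hypothetical, the first time step is assumed to contain no masked cells (as in the test data above), and the comparison with data2 is left out.

```python
import numpy as np
import numpy.ma as ma


def masked_coordinates(data):
    """Return an (N, ndim) array with the indices of all masked cells."""
    return np.argwhere(ma.getmaskarray(data))


def forward_fill_time(data):
    """Fill masked cells with the latest unmasked value along axis 0.

    Assumes the first time step contains no masked cells.
    """
    mask = ma.getmaskarray(data)
    values = ma.getdata(data)
    # Time index of every cell, broadcast over the remaining axes.
    shape = (-1,) + (1,) * (values.ndim - 1)
    time_index = np.arange(values.shape[0]).reshape(shape)
    # Valid cells keep their own time index, masked cells get 0 ...
    last_valid = np.where(mask, 0, time_index)
    # ... so a running maximum yields the index of the latest valid step.
    last_valid = np.maximum.accumulate(last_valid, axis=0)
    return np.take_along_axis(values, last_valid, axis=0)


# Tiny example: 4 time steps, 1 x 2 grid, values above 5 are masked.
raw = np.array([[[1, 2]], [[9, 3]], [[9, 9]], [[4, 9]]])
data1 = ma.masked_where(raw > 5, raw)
filled = forward_fill_time(data1)
```

Because there is no Python-level loop over cells, the cost is a handful of whole-array operations; for a 4 GB file the array would still need to fit in memory or be processed in spatial chunks.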

Order of numpy (logical_and vs '&') statements leads to different results

I have a simple piece of code in which I compare the numpy function logical_and with the "&" operator.
I encounter a very strange behavior: the order in which the statements are executed seems to affect the final result, when in fact it shouldn't.
In the code below, if I interchange the order of the final_mask1 and final_mask statements, I get a different value for the variable "test" as well as a different output image. This is for the case where I use final_mask as the output. Am I missing something here? How can I resolve this?
TIA
import numpy as np
from scipy import misc
import matplotlib.pyplot as plt
photo_data = misc.imread('./sd-3layers.jpg')
red_mask = photo_data[:, : ,0] < 150
green_mask = photo_data[:, : ,1] > 100
blue_mask = photo_data[:, : ,2] < 100
final_mask1 = np.logical_and(red_mask, green_mask, blue_mask)
final_mask = red_mask & green_mask & blue_mask
test = (final_mask1 == final_mask)
print(np.all(test))
photo_data[final_mask] = 0
plt.figure(figsize=(15,15))
plt.imshow(photo_data)

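Looking up the documentation of logical_and, one finds that it compares only two arrays, and that the third argument is used for storing the result in a different array. In your code, the result of combining red_mask and green_mask is therefore written into blue_mask, which is why the order of the two statements matters. You can use reduce to avoid having to write two calls to logical_and, so what you're trying to do ends up looking like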
np.logical_and.reduce((red_mask, green_mask, blue_mask))
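The effect of the third positional argument can be seen on small arrays. This is only an illustrative sketch; the array names a, b and c are made up and stand in for the three masks.

```python
import numpy as np

a = np.array([True, True, False, True])
b = np.array([True, False, True, True])
c = np.array([False, True, True, True])
c_original = c.copy()

# The third positional argument is `out`: the result of a AND b is
# written into c, and c's previous contents are lost.
result = np.logical_and(a, b, c)

# A genuine three-way AND, using the untouched copy of c.
three_way = np.logical_and.reduce((a, b, c_original))
```

After the first call, c no longer holds the original mask, so any later expression that uses it (such as a & b & c) silently works on different data.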

How to properly use semilogy?

For class, we are supposed to calculate the absolute error and the relative error for a series approximation of e. In the end, we have to plot both the relative error and the absolute error using a "semilogy" graph. The code itself works fine and produces the numerical results as expected, but the graph is nowhere near the expected result.
Any idea why this isn't working?
An image of the graph is attached at the end.
import numpy as np
import math as m
import matplotlib.pyplot as plt
Exp_List = []
Relative_Error = []
Absolute_Error = []
def ExpCalc(x,N,t):
    exp = 0.0
    for i in range(N-1):
        fac = m.factorial(i)
        next_term = x**i/fac
        Exp_List.append(next_term)
        exp += next_term
        Relative = abs((sum(Exp_List)-t)/sum(Exp_List))
        Relative_Error.append(Relative)
        Absolute = abs(sum(Exp_List)-t)
        Absolute_Error.append(Absolute)
        i += 1
    return (Absolute_Error, Relative_Error)
A = m.exp(1)
B = m.exp(20)
C = m.exp(100)
print(ExpCalc(1,20,A))
plt.figure()
plt.semilogy(ExpCalc(1,20,A))
plt.show()
The image shows the numerical calculations as well as the graph obtained from the code

If you look at the result of your print(ExpCalc(1,20,A)), it is actually a tuple, which is causing the plotting behaviour you are seeing.
To fix this, you can call the function, unpack the values, and then do the plotting separately to ensure you are plotting the correct values.
x, y = ExpCalc(1,20,A)
plt.figure()
plt.semilogy(x,y)
plt.show()
Which gives the corrected plot (figure not included).
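A variation on the same idea, as a sketch rather than the answer's exact code: each error list is plotted against the number of terms with its own semilogy call, and the lists are local to the function so that calling it twice does not accumulate values. The function name series_errors is hypothetical, and the relative error is taken with respect to the true value here, which differs from the division by the partial sum used in the question.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def series_errors(x, n_terms, true_value):
    """Absolute and relative error of the exp(x) series after each term."""
    partial_sum = 0.0
    absolute, relative = [], []
    for i in range(n_terms):
        partial_sum += x**i / math.factorial(i)
        absolute.append(abs(partial_sum - true_value))
        relative.append(abs(partial_sum - true_value) / abs(true_value))
    return absolute, relative


absolute, relative = series_errors(1.0, 20, math.exp(1))
terms = list(range(1, len(absolute) + 1))

fig, ax = plt.subplots()
ax.semilogy(terms, absolute, label="absolute error")
ax.semilogy(terms, relative, label="relative error")
ax.set_xlabel("number of terms")
ax.legend()
# Use fig.savefig(...) or plt.show() with an interactive backend to view it.
```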

FFT results Matlab VS Numpy (Python) : not the same results

I have a Matlab script to compute the DFT of a signal and plot it:
(data can be found here)
clc; clear; close all;
fid = fopen('s.txt');
txt = textscan(fid,'%f');
s = cell2mat(txt);
nFFT = 100;
fs = 24000;
deltaF = fs/nFFT;
FFFT = [0:nFFT/2-1]*deltaF;
win = hann(length(s));
sw = s.*win;
FFT = fft(sw, nFFT)/length(s);
FFT = [FFT(1); 2*FFT(2:nFFT/2)];
absFFT = 20*log10(abs(FFT));
plot(FFFT, absFFT)
grid on
I am trying to translate it to Python and can't get the same result.
import numpy as np
from matplotlib import pyplot as plt
x = np.genfromtxt("s.txt", delimiter=' ')
nfft = 100
fs = 24000
deltaF = fs/nfft;
ffft = [n * deltaF for n in range(nfft/2-1)]
ffft = np.array(ffft)
window = np.hanning(len(x))
xw = np.multiply(x, window)
fft = np.fft.fft(xw, nfft)/len(x)
fft = fft[0]+ [2*fft[1:nfft/2]]
fftabs = 20*np.log10(np.absolute(fft))
plt.figure()
plt.plot(ffft, np.transpose(fftabs))
plt.grid()
The plots I get (Matlab on the left, Python on the right):
What am I doing wrong?

The two codes differ. In the Matlab code you concatenate two vectors:
FFT = [FFT(1); 2*FFT(2:nFFT/2)];
In the Python code you add the first value of fft to the rest of the vector:
fft = fft[0]+ [2*fft[1:nfft/2]]
Here '+' does not concatenate, because you have a NumPy array and the addition is applied element-wise.
In Python, it should be:
fft = fft[0:nfft/2]
fft[1:nfft/2] = 2*fft[1:nfft/2]
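The difference between the two operations is easy to see on a short array. This is a small illustration with made-up values; spectrum and half are placeholder names.

```python
import numpy as np

spectrum = np.array([1.0, 2.0, 3.0, 4.0])
half = 3

# What the question's Python line does: the scalar spectrum[0] is added
# to every element, and the enclosing list adds an extra dimension.
added = spectrum[0] + [2 * spectrum[1:half]]

# What the Matlab line does: keep the first value, then append the
# doubled remaining values.
concatenated = np.concatenate(([spectrum[0]], 2 * spectrum[1:half]))
```

The extra dimension in added is also why the question's plotting line needed np.transpose.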

I am not a Matlab user, so I am not sure, but there are a few things I'd check to see if I can help you.
You called np.array after the list (ffft) had already been built. That probably does not change the nature of the array as much as you hoped; perhaps it would be better to build it directly inside the call, e.g. np.array([n * deltaF for n in range(nfft/2-1)]). I am not sure of the formatting, but you get the idea. The other thing is that the range doesn't seem right to me. Do you really want 49 elements?
Another one is fft = fft[0]+ [2*fft[1:nfft/2]] compared to FFT = [FFT(1); 2*FFT(2:nFFT/2)]; I am not sure whether the two are equivalent. They look like different kinds of definition to me.
Also, when I do this type of calculation, I print out the intermediate steps so I can compare the numbers and see where it breaks.
Hope this helps.

I found out that using np.fft.rfft instead of np.fft.fft and modifying the code as follows does the job:
import numpy as np
from matplotlib import pyplot as pl
x = np.genfromtxt("../Matlab/s.txt", delimiter=' ')
nfft = 100
fs = 24000
deltaF = fs/nfft;
ffft = np.array([n * deltaF for n in range(nfft/2+1)])
window = np.hanning(len(x))
xw = np.multiply(x, window)
fft = np.fft.rfft(xw, nfft)/len(x)
fftabs = 20*np.log10(np.absolute(fft))
pl.figure()
pl.plot(np.transpose(ffft), fftabs)
pl.grid()
The resulting plot: [image: right result with Python]
I can see that the first and last points, as well as the amplitudes, are not the same. That isn't a problem for me (I am more interested in the general shape), but if someone can explain it, I'd be happy.
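Two differences are visible from the code alone and may account for part of this: the Matlab script doubles every bin except the first (a factor of 2 is about 6 dB) and plots nFFT/2 points, whereas the rfft output is not doubled and has nfft/2 + 1 points, including the Nyquist bin. The sketch below applies the Matlab-style trimming and doubling to the rfft result. The function name one_sided_db is hypothetical, and no claim is made that it reproduces the Matlab plot exactly.

```python
import numpy as np


def one_sided_db(x, nfft, fs):
    """One-sided, Hann-windowed spectrum in dB, scaled like the Matlab script."""
    window = np.hanning(len(x))
    spectrum = np.fft.rfft(x * window, nfft) / len(x)
    # Drop the Nyquist bin so that nfft // 2 points remain, as in FFT(1:nFFT/2).
    spectrum = spectrum[: nfft // 2].copy()
    # Double every bin except the DC term.
    spectrum[1:] *= 2
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)[: nfft // 2]
    return freqs, 20 * np.log10(np.abs(spectrum))


# Test signal: a 2400 Hz cosine sampled at 24 kHz, which falls on bin 10.
fs = 24000
signal = np.cos(2 * np.pi * 2400 * np.arange(100) / fs)
freqs, level_db = one_sided_db(signal, 100, fs)
```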

imshow() returns invalid dimensions for 2D array when using multiprocessing.Pool

I'm trying to use the multiprocessing module to create figures from 2D arrays faster. In the code below I create a 2D array from an HDF5 data file (please message me if you would like a sample file to test on). Using multiprocessing.Pool, I try to pass this array to the map function, but it raises TypeError: Invalid dimensions for image data. I've checked that my array has 2 dimensions using da.shape, so I'm not sure why it isn't working for me. Any help is much appreciated!
To import yt, see yt-project.org/#getyt.
P.S. This is my first question on Stack Overflow, so please let me know if/how I can improve.
import yt
import numpy as np
import multiprocessing
from multiprocessing import Pool, Process, Array

fl_nm = raw_input("enter filename: ").strip()
level = int(raw_input("resolution level: ").strip())
ds = yt.load(fl_nm)
all_data_level_x = ds.covering_grid(level=level,left_edge=[-3.70281620e+21,0.00000000e+00,-3.70281620e+21],dims=ds.domain_dimensions*2**level)
disp_array = []
for x in xrange(0,16*2**level):
    vbin = []
    for z in xrange(0,80*2**level):
        v = []
        for y in xrange(0,8*2**level):
            vel = all_data_level_x["velocity_magnitude"][x,y,z].in_units("km/s")
            v.append(vel)
        sigma = np.sqrt(np.sum((v - np.mean(v))**2) / np.size(v))
        vbin.append(sigma)
    disp_array.append(vbin)
    print "{0:.1f} %".format((x+1)*100/float(16*2**level))
da = np.array(disp_array)
print "fixed resolution array created"

def __main__(data_array):
    import matplotlib
    matplotlib.use('Agg')
    from matplotlib import pyplot as plt
    plt.imshow(data_array, origin = "lower", aspect = "equal", extent=[-1.2,10.8,-1.2,1.2])
    plt.colorbar(fraction=0.046, pad=0.04)
    print "plot created. Saving figure..."
    fig_nm = 'velocity_disp_{0}_lvl_{1}.png'.format(fl_nm[-4:],level)
    plt.savefig(fig_nm)
    plt.close()
    print "File saved as: " + fig_nm
    return

pool = multiprocessing.Pool(4)
pool.map(__main__,da)

pool.map(func, iterable[, chunksize]) iterates over da. So if da is a 2-D array like [[1,2],[3,4]], your __main__ function receives [1,2] in one call and [3,4] in another. Each call therefore gets a 1-D row, which imshow rejects as invalid image data.
I'm not sure what you want to do, so if you want more complete help, you can upload your executable project (to GitHub or elsewhere) and I will check.
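The row-by-row splitting can be checked without any plotting. The sketch below uses ThreadPool only so that it runs without a __main__ guard; it offers the same map method as multiprocessing.Pool and splits the iterable the same way. The names shape_of and da are illustrative.

```python
import numpy as np
from multiprocessing.pool import ThreadPool


def shape_of(chunk):
    """Report the shape of whatever a single map call receives."""
    return np.shape(chunk)


da = np.arange(12).reshape(3, 4)

with ThreadPool(2) as pool:
    # Iterating over a 2-D array yields its rows, so each call sees 1-D data.
    per_row = pool.map(shape_of, da)
    # Wrapping the array in a list passes it to a single call as a whole.
    whole = pool.map(shape_of, [da])
```

Passing a list that contains one 2-D array per figure keeps every call two-dimensional, although with a single array there is nothing left to run in parallel.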
