Dear fellow coders and science guys :)
I am using Python with numpy and matplotlib to simulate a perceptron, and I am proud to say it works pretty well. I chose Python even though I had never seen it before, because I heard matplotlib offered amazing graph visualisation capabilities.
Using the functions below I get a 2D array that looks like this:
[[alpha_1, 900], [alpha_2, 600], ..., [alpha_99, 900]]
So I get this 2D array and would love to write a function that lets me analyze the convergence. I am looking for something that will easily and intuitively (I don't have time to study a whole new library for five hours right now) plot the data produced by a function like this sketch:
def get_convergence_for_alpha(self, _alpha):
    epochs = []
    for i in range(0, 5):
        epochs.append(self.perceptron_algorithm())
        self.weights = self.generate_weights()
    avg = sum(epochs, 0) / len(epochs)
    res = [_alpha, avg]
    return res
And this is the whole generation function.
def alpha_convergence_function(self):
    res = []
    for i in range(1, 100):
        res.append(self.get_convergence_for_alpha(i / 100))
    return res
Is this easily doable?
You can convert your nested list to a 2D numpy array and then use slicing to get the alphas and epoch counts (just like in MATLAB).
import numpy as np
import matplotlib.pyplot as plt
# code to simulate the perceptron goes here...
res = your_object.alpha_convergence_function()
res = np.asarray(res)
print('array size:', res.shape)
plt.xkcd() # so you get the sketchy look :)
# first column -> x-axis, second column -> y-axis
plt.plot(res[:,0], res[:,1])
plt.show()
Remove the plt.xkcd() line if you don't actually want the plot to look like a sketch...
I am working on a project that involves both MATLAB and Python, and I am producing some images. Although the matrices I want to transform into images are the same, the images I get are not. I assume this has something to do with the equivalence between Python and MATLAB commands for displaying images, which is why I am here.
MATLAB CODE:
fmn0 = imread('cameraman.tif');
fmn=double(ifftshift(fmn0,2));
Fun=fftshift(fft(fmn,[],2),2);
imshow(real(Fun))
MATLAB OUTPUT: (image)
PYTHON CODE:
import numpy as np
import matplotlib.pyplot as plt
import cv2
def row_wise_fft(A):
    A = np.asarray(A)
    rowWiseFFT = np.zeros((A.shape[0], A.shape[1]), dtype='complex')
    for i in range(0, A.shape[0]):
        rowWiseFFT[i, :] = np.fft.fft(A[i, :])
    return rowWiseFFT

def row_wise_ifftshift(A):
    for i in range(0, len(A)):
        A[i] = np.fft.ifftshift(A[i])
    return A

def row_wise_fftshift(A):
    for i in range(0, len(A)):
        A[i] = np.fft.fftshift(A[i])
    return A
fmn = cv2.imread("cameraman.tif", cv2.IMREAD_GRAYSCALE)
fun = row_wise_fftshift(row_wise_fft(row_wise_ifftshift(fmn)))
plt.set_cmap("Greys_r")
plt.imshow(fun.real)
plt.show()
PYTHON OUTPUT: (image)
I can see some similarities, but how would one make the Python output exactly match the MATLAB one? Note that the Fun/fun matrices are exactly the same.
MATLAB's imshow assumes double-valued image data lies in [0, 1], so most of your data in the MATLAB plot is extremely saturated and not really visible.
Use imshow(real(Fun), []) to remove the saturation and actually see all your data (MATLAB).
Use plt.clim(0, 1) to saturate the visualization of your data in Python the same way.
You can also give either MATLAB or Python a different range of values to visualize (e.g. [0, 15]).
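For instance, a minimal sketch on the Python side, reusing the fun array from the code above (the colormap choice is just carried over):

import matplotlib.pyplot as plt

plt.imshow(fun.real, cmap="Greys_r")
plt.clim(0, 1)  # saturate like MATLAB's default double-image scaling
plt.show()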
I am working on filling in missing data in a large (4GB) netcdf datafile (3 dimensions: time, longitude and latitude). The method is to fill in the masked values in data1 either with:
1) previous values from data1 or
2) with data from another (also masked dataset, data2) if the found value from data1 < the found value from data2.
So far I have tried a couple of things. One was a very complex script with long for loops that never finished running after 24 hours. I have tried to reduce it, but I think it is still much too complicated. I believe there is a much simpler procedure than the one I am using now; I just can't see it.
I have made a script where the masked data is first replaced with zeroes so that I can use np.where to get the indices of my masked data (I did not find a function that returns the coordinates of masked data, so this is my workaround). My problem is that my code is very long and, I think, time consuming for large datasets. I believe there is a simpler way of doing it, but I haven't found another workaround.
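For what it's worth, numpy.ma can return those coordinates directly from the mask, without zero-filling first; a minimal sketch with a small made-up array:

import numpy as np
import numpy.ma as ma

a = np.random.randint(10, size=(3, 3, 3))
a = ma.masked_where(a > 5, a)
# indices of the masked entries, without filling with zeros first
coordinates = np.asarray(np.where(ma.getmaskarray(a)))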
Here is what I have so far (the first part just generates some matrices that are easy to work with):
if __name__ == '__main__':
    import numpy as np
    import numpy.ma as ma
    from sortdata_helpers import decision_tree

    # Generating some (easy) test data to try the algorithm on:
    # data1
    rand1 = np.random.randint(10, size=(10, 10, 10))
    rand1 = ma.masked_where(rand1 > 5, rand1)
    rand1 = ma.filled(rand1, fill_value=0)
    rand1[0, :, :] = 1

    # data2
    rand2 = np.random.randint(10, size=(10, 10, 10))
    rand2[0, :, :] = 1

    # locations of the zeros (the formerly masked entries) in the data
    coordinates1 = np.asarray(np.where(rand1 == 0))

    filled_data = decision_tree(rand1, rand2, coordinates1)
    print(filled_data)
The functions that I defined to be called in the main script are these, in the same order as they are used:
def decision_tree(data1, data2, coordinates):
    # This is the main function,
    # where the choice between data1 and data2 is made.
    from sortdata_helpers import generate_vector
    from sortdata_helpers import find_value

    for i in range(coordinates.shape[1]):
        coordinate = [coordinates[0, i], coordinates[1, i], coordinates[2, i]]
        AET_vec = generate_vector(data1, coordinate)  # vector going back in time
        AET_value = find_value(AET_vec)  # closest earlier day with data
        PET_vec = generate_vector(data2, coordinate)
        PET_value = find_value(PET_vec)
        if PET_value > AET_value:
            data1[coordinate[0], coordinate[1], coordinate[2]] = AET_value
        else:
            data1[coordinate[0], coordinate[1], coordinate[2]] = PET_value
    return data1

def generate_vector(data, coordinate):
    # Generates the vector of all earlier time steps at this grid point.
    vector = data[0:coordinate[0], coordinate[1], coordinate[2]]
    return vector

def find_value(vector):
    # The most recent (last) non-zero value in the vector is chosen as "value".
    from itertools import dropwhile
    value = list(dropwhile(lambda x: x == 0, reversed(vector)))[0]
    return value
I hope someone has a good idea or suggestions on how to improve my code. I am still struggling with indexing in Python, and I think this can definitely be done more smoothly than I have done here.
Thanks for any suggestions or comments.
I'm trying to plot a simple moving averages function but the resulting array is a few numbers short of the full sample size. How do I plot such a line alongside a more standard line that extends for the full sample size? The code below results in this error message:
ValueError: x and y must have same first dimension, but have shapes (96,) and (100,)
This is using standard matplotlib.pyplot. I've tried deleting X values with remove and del, switching all arrays to numpy arrays (since that's the output format of my moving-averages function), and adding an if condition to the append in the while loop, but none of it has worked.
import random
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
def movingaverage(values, window):
    weights = np.repeat(1.0, window) / window
    smas = np.convolve(values, weights, 'valid')
    return smas
sampleSize = 100
min = -10
max = 10
window = 5
vX = np.array([])
vY = np.array([])
x = 0
val = 0
while x < sampleSize:
    val += random.randint(min, max)
    vY = np.append(vY, val)
    vX = np.append(vX, x)
    x += 1
plt.plot(vX, vY)
plt.plot(vX, movingaverage(vY, window))
plt.show()
Expected results would be two lines on the same graph - one a simple moving average of the other.
Just change this line to the following:
smas = np.convolve(values, weights,'same')
The 'valid' option only produces output where the window fully overlaps the values array, so the result is shorter than the input. What you want is 'same', which returns output the same length as the input.
Edit: This, however, also comes with its own issues as it acts like there are extra bits of data with value 0 when your window does not fully sit on top of the data. This can be ignored if chosen, as is done in this solution, but another approach is to pad the array with specific values of your choosing instead (see Mike Sperry's answer).
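To make the difference concrete, a small illustrative example (made-up arrays):

import numpy as np

a = np.ones(10)
w = np.ones(3) / 3
print(np.convolve(a, w, 'valid').shape)  # (8,): output only where the window fits fully
print(np.convolve(a, w, 'same').shape)   # (10,): input length, edges computed as if zero-padded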
Here is how you would pad a numpy array out to the desired length with NaNs (replace np.nan with other values, or 'constant' with another mode, depending on the desired result):
https://docs.scipy.org/doc/numpy/reference/generated/numpy.pad.html
import numpy as np
bob = np.asarray([1, 2, 3], dtype=float)  # float dtype, since integer arrays cannot hold NaN
alice = np.pad(bob, (0, 100 - len(bob)), 'constant', constant_values=(np.nan, np.nan))
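(A handy side effect: matplotlib simply leaves gaps where the y-values are NaN, so the padded region will not be drawn.)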
So in your code it would look something like this:
import random
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
def movingaverage(values, window):
    weights = np.repeat(1.0, window) / window
    smas = np.convolve(values, weights, 'valid')
    shorted = int((100 - len(smas)) / 2)
    print(shorted)
    smas = np.pad(smas, (shorted, shorted), 'constant', constant_values=(np.nan, np.nan))
    return smas
sampleSize = 100
min = -10
max = 10
window = 5
vX = np.array([])
vY = np.array([])
x = 0
val = 0
while x < sampleSize:
    val += random.randint(min, max)
    vY = np.append(vY, val)
    vX = np.append(vX, x)
    x += 1
plt.plot(vX,vY)
plt.plot(vX,(movingaverage(vY,window)))
plt.show()
To answer your basic question, the key is to take a slice of the x-axis appropriate to the data of the moving average. Since you have a convolution of 100 data elements with a window of size 5, the result is valid for the last 96 elements. You would plot it like this:
plt.plot(vX[window - 1:], movingaverage(vY, window))
That being said, your code could stand to have some optimization done on it. For example, numpy arrays are stored in fixed size static buffers. Any time you do append or delete on them, the entire thing gets reallocated, unlike Python lists, which have amortization built in. It is always better to preallocate if you know the array size ahead of time (which you do).
Secondly, running an explicit loop is rarely necessary. You are generally better off using the under-the-hood loops implemented at the lowest level in the numpy functions instead. This is called vectorization. Random number generation, cumulative sums and incremental arrays are all fully vectorized in numpy. More generally, it's usually inefficient to mix plain-Python computational functions, including the random module, with numpy ones.
Finally, you may want to consider a different convolution method. I would suggest something based on numpy.lib.stride_tricks.as_strided. This is a somewhat arcane, but very effective way to implement a sliding window with numpy arrays. I will show it here as an alternative to the convolution method you used, but feel free to ignore this part.
All in all:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
def movingaverage(values, window):
    # this step creates a strided view into the same buffer:
    # each column is one run of `window` consecutive values
    values = np.lib.stride_tricks.as_strided(
        values, shape=(window, values.size - window + 1), strides=values.strides * 2)
    smas = values.sum(axis=0)
    smas /= window  # in-place to avoid a temporary array
    return smas
sampleSize = 100
min = -10
max = 10
window = 5
v_x = np.arange(sampleSize)
v_y = np.cumsum(np.random.random_integers(min, max, sampleSize))
plt.plot(v_x, v_y)
plt.plot(v_x[window - 1:], movingaverage(v_y, window))
plt.show()
A note on names: in Python, variable and function names are conventionally name_with_underscore. CamelCase is reserved for class names. np.random.random_integers uses inclusive bounds just like random.randint, but allows you to specify the number of samples to generate. Confusingly, np.random.randint has an exclusive upper bound, more like random.randrange.
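As an aside (this depends on your numpy version): np.random.random_integers is deprecated in recent numpy releases, so the equivalent draw with randint just bumps the upper bound:

# equivalent to np.random.random_integers(min, max, sampleSize) on newer numpy
v_y = np.cumsum(np.random.randint(min, max + 1, sampleSize))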
I need to create a histogram of a very large data set in python 3. However, I cannot use a list to create a histogram because the list would be too large given my data. I need a way to update a histogram as each data point is created. That way my computer is only ever dealing with a single point and updating the plot.
I've been using matplotlib. I tried plt.draw() but couldn't get it to work (see the code below).
# Proof of concept code
import matplotlib.pyplot as plt

l = [1, 2, 3, 2, 3, 2]
n = 0
p = False
for x in range(0, 6):
    n = l[x]
    if p == False:
        fig = plt.hist(n)
        p = True
    else:
        plt.draw()
I need a plot that looks like plt.hist(l), but I have only been getting the first point plotted.
Are you familiar with Numpy? Numpy handles large arrays pretty well.
Here's an example using a random integer set from 1 to 3 (inclusive).
import matplotlib.pyplot as plt
import numpy as np
arr_random = np.random.randint(1,4,10000)
plt.hist(arr_random)
plt.show()
It's very efficient to compute plt.hist() with Numpy arrays.
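If the data really is too large to hold at once, a minimal sketch of a streaming variant (the chunk generator below is a stand-in for your real data source) is to fix the bin edges up front and accumulate counts with np.histogram, then draw the totals with plt.bar:

import matplotlib.pyplot as plt
import numpy as np

bins = np.linspace(1, 4, 7)       # fixed bin edges, chosen up front
counts = np.zeros(len(bins) - 1)

def data_chunks():
    # stand-in for reading the real data incrementally
    for _ in range(100):
        yield np.random.randint(1, 4, 100)

for chunk in data_chunks():
    counts += np.histogram(chunk, bins=bins)[0]

plt.bar(bins[:-1], counts, width=np.diff(bins), align='edge')
plt.show()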
I'm struggling to find an easy example to parallelize nested loops in NDim data in Python.
As a simple example suppose we have a gridded precipitation data of dimensions (time, lat, lon), and are to find the temporal mean at each lat-lon grid point, i.e. to obtain same result as data.mean(axis=0).
def mean(data):
    result = np.zeros(data[0, :, :].shape, dtype=float)
    for i in range(data.shape[1]):
        for j in range(data.shape[2]):
            result[i, j] = data[:, i, j].mean()
    return result
What would be the most elegant way to parallelize this function?
Update: The precip data can be downloaded at: https://www.esrl.noaa.gov/psd/data/gridded/data.gpcp.html.
Test Code:
%matplotlib inline
import xarray
import numpy as np
import matplotlib.pyplot as plt
# Load data:
ds = xarray.open_dataset('precip.mon.mean.nc')

# Select a small subset; the shape is now (442, 50, 50)
data = ds.precip[:, :50, :50].to_masked_array()

# Define the function that computes the temporal mean at each grid point:
def mean(data):
    result = np.zeros(data[0, :, :].shape, dtype=float)
    for i in range(data.shape[1]):
        for j in range(data.shape[2]):
            result[i, j] = data[:, i, j].mean()
    return result

# Call the function
result = mean(data)

# A quick plot for visual reference
plt.figure()
plt.imshow(result, origin='upper', interpolation='None')
plt.colorbar()
My working code involves more complex techniques (than just taking the mean), but the basic structure is similar: a nested double loop that visits each grid point to perform the analysis and saves the result as a 2D or ND array. So being able to parallelize this would be immensely beneficial.
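For reference, a minimal sketch of one way to split the outer latitude loop across worker processes with the standard library's multiprocessing module (the pool size of four is an arbitrary choice):

import numpy as np
from multiprocessing import Pool

def mean_of_row(row):
    # temporal mean of one latitude row, shape (time, lon) -> (lon,)
    return row.mean(axis=0)

def parallel_mean(data, processes=4):
    rows = [data[:, i, :] for i in range(data.shape[1])]  # one slice per latitude
    with Pool(processes) as pool:  # call this from an `if __name__ == '__main__':` block
        result = pool.map(mean_of_row, rows)
    return np.stack(result, axis=0)  # shape (lat, lon), same as data.mean(axis=0)

Note that Pool.map pickles each slice to send it to a worker, so for a 4 GB file it is usually better to read or share chunks rather than copy the whole array through the pool.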