I have a question about how matplotlib works. Basically, I want to flip x and y in my image, just as this person asked.
However, I do not want to resort to transposing the array before sending it.
I assume this would result in a loss of performance. Here is my reasoning. My guess is that matplotlib copies an image from numpy by iterating over the rapidly varying index in both (where the rapidly varying index is taken to be the one that accesses contiguous elements in physical memory). If I transpose the array, one of two things can probably happen:
What matplotlib believes is the rapidly varying index no longer matches the memory layout, so it stops reading the numpy array contiguously, resulting in slower readout (i.e. numpy just changes its "view" into the matrix).
The numpy array is actually copied into a new array whose rapidly varying index is transposed. Reading into matplotlib is fast, at the cost of copying a new matrix into memory.
Neither case is ideal for me, as I would like to achieve reasonably high refresh rates. The arrays are loaded from images already stored on the hard drive in this layout.
If my assumption is true, is there some method to have matplotlib change its rapidly varying index for the case of an image? I believe this would be a very useful feature.
I hope to have communicated my line of reasoning. I want to make sure every read and write in the chain is memory contiguous, from the hard drive to the numpy array to matplotlib. I believe that simply finding (or offering in the future) an option for matplotlib to reverse its ordering would save time.
I am definitely open to other solutions. If there's something obvious I have missed, I'd like to hear it. Thanks :-)
Edit: Thanks for the comments. I believe you are right in that I should have done some testing beforehand. I have an interesting result: the transpose seems to be faster than the non-transpose. Since the numpy reference is simply a view into the array, is it possible matplotlib uses this to its advantage, cleverly deciding at the last minute whether or not to transpose? This result suggests so; that, or my code has a flaw (which is very possible). Seriously, if that's true, good friggin' job matplotlib developers! :-)
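For anyone who wants to check the "view" behaviour directly: transposing swaps the strides without copying any data, which NumPy exposes through the flags attribute (a minimal check I'd add, not part of the original test):

import numpy as np

a = np.random.random((1000, 1000))
b = a.T  # a view: no data is copied, only the strides change

print(a.flags['C_CONTIGUOUS'])  # True: rows of a are contiguous in memory
print(b.flags['C_CONTIGUOUS'])  # False: b's rows are strided
print(np.shares_memory(a, b))   # True: both use the same buffer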
Here are the figures:
Figure 1: Transposed image (blue) and non-transposed image (green). Each data point is the time taken to draw all 10 frames in sequence; this is repeated 100 times. I did not use the machine for the duration of these experiments.
Figure 2: Zoom of the same figure.
The code (it's crude; I'm learning and had limited time to spend on this). I ran the first animation for 1000 frames, closed the window, then commented it out, uncommented the second, ran that for another 1000 frames, and plotted the two:
from matplotlib.pyplot import *
from matplotlib import animation
from time import time
import numpy as np

data = np.random.random((10, 1000, 1000))

ion()
fig = figure(0)
im = imshow(data[0, :, :])
data[0, :, :] *= 0

time1 = time()
time0 = time()
timingarraynoT = np.zeros(100)
timingarrayT = np.zeros(100)

def animatenoT(i):
    global time0, time1, timingarraynoT
    if i % 10 == 0:
        time0 = time()
    if i % 10 == 9:
        time1 = time()
        #print("Time for 10 frames: {}".format(time1-time0))
        timingarraynoT[i // 10] = time1 - time0  # // so the index is an integer

    im.set_data(data[i % 10, :, :])
    if i == 1000:
        return -1
    return im

def animateT(i):
    global time0, time1, timingarrayT
    if i % 10 == 0:
        time0 = time()
    if i % 10 == 9:
        time1 = time()
        #print("Time for 10 frames: {}".format(time1-time0))
        timingarrayT[i // 10] = time1 - time0

    im.set_data(data[i % 10, :, :].T)
    if i == 1000:
        return -1
    return im

#anim = animation.FuncAnimation(fig, animateT, interval=0)
anim = animation.FuncAnimation(fig, animatenoT, interval=0)
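As an additional sanity check outside of FuncAnimation, one could time set_data plus a canvas draw directly for both orderings (a rough sketch using perf_counter; absolute numbers will depend on the machine and backend):

import numpy as np
import matplotlib.pyplot as plt
from time import perf_counter

frame = np.random.random((1000, 1000))
fig, ax = plt.subplots()
im = ax.imshow(frame)

for label, f in (("no transpose", frame), ("transpose", frame.T)):
    t0 = perf_counter()
    for _ in range(100):
        im.set_data(f)
        fig.canvas.draw()
    print(label, perf_counter() - t0)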
Related
I have to plot a large amount of data in Python (a list of size 3 million). Are there any methods/libraries to plot it easily, since matplotlib does not seem to work?
What do you mean, matplotlib does not work? It worked when I tried it. Is your data one-dimensional or multi-dimensional? Are you expecting to see 3 million ticks on the x axis? That would not be possible.
import numpy as np
import matplotlib.pyplot as plt

d = 3 * 10**6
a = np.random.rand(d)
a[0] = 5
a[-1] = -5
print(a.shape)
plt.plot(a)
plt.show()
[the resulting plot]
I use matplotlib quite intensively to plot arrays of size n > 10**6.
You can use plt.xscale('log'), which allows you to display your results.
Furthermore, if your dataset shows great disparity in value, you can use plt.yscale('log') to plot it nicely if you use the plt.plot() function.
If not (i.e. you use imshow, hist2d, and so on), you can write this in your preamble:
from matplotlib.colors import LogNorm and just pass the optional argument norm=LogNorm().
One last thing: you shouldn't use numpy.loadtxt if the size of the text file is greater than your available RAM. In that case, the best option is to read the file line by line, even if it takes more time. You can speed up the process with from numba import jit and declare @jit(nopython=True, parallel=True).
With that in mind, you should be able to plot arrays of about ten million elements in a reasonably short time.
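For completeness, a minimal sketch of the norm=LogNorm() suggestion with imshow (synthetic data; not from the original answer):

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

# Values spanning several orders of magnitude render poorly on a linear scale.
data = np.random.rand(100, 100) ** 4 + 1e-6
plt.imshow(data, norm=LogNorm())
plt.colorbar()
plt.show()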
I essentially have three sets of data points in three separate numpy arrays, and I want to plot the time along the x-axis, the frequency along the y-axis, with the magnitude represented by the colors. But this is all happening in real time, so it should look like the spectrogram is drawing itself as the data updates.
I have simplified the code to show that I am running a loop that takes at most 10 ms per iteration, during which I get new data from functions.
import numpy as np
import random
from time import sleep

def new_val_freq():
    return random.randint(0, 22)

def new_val_mag():
    return random.randint(100, 220)

x_axis_time = np.array([1])
y_axis_frequency = np.array([10])
z_axis_magnitude = np.array([100])
t = 1

while True:
    x_axis_time = np.append(x_axis_time, [t + 1])
    t += 1
    y_axis_frequency = np.append(y_axis_frequency, [new_val_freq()])
    z_axis_magnitude = np.append(z_axis_magnitude, [new_val_mag()])
    sleep(0.01)  # the import is `from time import sleep`, so call it directly
    # Trying to figure out how to create/update a spectrogram plot with the
    # above additional data in real time without lag
Ideally I would like this to be as fast as it possibly can be, rather than having to redraw the whole spectrogram. It seems matplotlib is not good at plotting dynamic spectrograms, and I have not come across any dynamic spectrogram examples, so I was wondering how I could do this?
I'm trying to get the maximum points in a one-dimensional array, where it makes several curves. To do this I use scipy.signal.argrelextrema, along with np.greater. Given here, where the array is y:
argrelextrema(y, np.greater)
The issue is that this one-dimensional data has inaccuracies due to the way the data for y was gathered. As a result, there are a lot of "false positives" at the bottom of the curve, where technically there is a maximum just because one value is bigger than the surrounding ones.
For reference, here's y plotted over x (which is just the index of each y value) to demonstrate the array I'm working with. The inaccuracies at the bottom are not visible. Ignore the axes; I used what I had in the code.
Also, here's the result of using the found maxima to calculate a value. As seen, this is not what I want, as the expected result should have been a smooth falling curve. The graph was made with one point for each maximum, in increasing order, and it's clearly wrong, as one can observe from the actual graph.
So, what's the best solution to avoid this? I failed to find something that could approximate the graph well enough for me to be able to use it. I looked into smoothing, but the methods I found, like savgol_filter from scipy.signal, were something I could not understand.
The current workaround is to ignore values of y below 5, which is roughly a bit above the bottom of the curve, but this is not an ideal solution at all.
Update:
Found out that find_peaks_cwt from scipy.signal works for this too. It's a tad more complex, and I have absolutely no clue how most of it works even after reading up on it a bit. However, I managed to make a slightly better graph, I think, using: find_peaks_cwt(y, [3], noise_perc=2). The result, seen below, only came from dropping noise_perc from 10 to 2, without really knowing how that affects the result.
Edit:
Here's is the 1D-array i'm working on: https://pastebin.com/GZrBBRce
Sorry for the bad representation, but each line is the next value in the list. It's a bit large.
Edit: Minimum working example; y is from the pastebin, a bit too large to include inline:
import matplotlib.pyplot as plt
from scipy.signal import find_peaks_cwt

energy = []
for i in find_peaks_cwt(y, [3], noise_perc=2):  # y: the array from the pastebin
    energy.append(y[i])
plt.plot([i for i in range(len(energy))], energy)
plt.show()
This was made with some guessing, and the result is seen in the last image in this question.
Update 2: Further progress. I smoothed out the y function using numpy.polyfit with a 15th-degree approximation. It's surprisingly accurate. And since that is smooth, I can revert to the first function, argrelextrema(y, np.greater), and it gives me a pretty decent answer as well, without including false positives, as seen in the above graphs. (Previously I got 30-40 maxima when my graph only has a little over 20 of them.)
I'll let it stand a bit before marking this solved, in case anyone wants to have a go at a better solution than approximating the graph with numpy.polyfit. However, this solved the issue for my use case.
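For reference, a minimal sketch of the polyfit-then-argrelextrema route described in Update 2 (with synthetic stand-in data, since the pastebin array is too large to inline):

import numpy as np
from scipy.signal import argrelextrema

# Synthetic stand-in for the real data: a decaying curve with ripples.
x = np.arange(500)
y = np.exp(-x / 200.0) * (2 + np.sin(x / 10.0)) + 0.05 * np.random.randn(500)

coeffs = np.polyfit(x, y, 15)      # 15th-degree fit, as in the update
y_smooth = np.polyval(coeffs, x)   # smooth approximation of y
maxima = argrelextrema(y_smooth, np.greater)[0]
print("maxima at indices:", maxima)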
I would use: scipy.signal.find_peaks_cwt().
From its documentation:
Attempt to find the peaks in a 1-D array.
The general approach is to smooth vector by convolving it with wavelet(width) for each width in widths. Relative maxima which appear at enough length scales, and with sufficiently high SNR, are accepted.
UPDATE (with actual data)
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import scipy.signal
y_arr = np.loadtxt('/home/raid1/metere/Downloads/1d_array.txt')
print('array size: ', y_arr.shape)
arr_size = len(y_arr)
expected_num = 30
expected_width = arr_size // expected_num // 2
print('expected width of peaks: ', expected_width)
peaks = sp.signal.find_peaks_cwt(y_arr, np.linspace(2, expected_width, 10))
print('num peaks: ', len(peaks))
print('peaks: ', peaks)
plt.plot(y_arr)
for peak in peaks:
    plt.axvline(peak)
plt.show()
This can probably be tweaked further, for example to increase the accuracy.
If it's not too big an array, the simplest way I can think of right now is just one loop over the array's values. Something like:
if ar[i-1] < ar[i] and ar[i] > ar[i+1]:
    add ar[i] to maxArray
You also need to check that ar[i] != ar[i+1]; when neighbouring values are equal, take only the first of the equal values.
Edit:
count = 1
leng = len(list1) - 2  # -2 (not -1) so that list1[count+1] stays in range
while count < leng:
    count = count + 1
    if list1[count-1] < list1[count] and list1[count] > list1[count+1]:
        print(list1[count])
21.55854026
4.205178829
16.6062412
16.60490417
13.14358751
11.76675489
10.71131948
10.34922208
9.703966466
4.440605216
9.557176225
9.163999162
4.530660664
9.067259599
4.482917884
8.628552441
4.443787789
8.340760319
7.9779415
4.411471328
4.415029139
7.840661767
7.858075487
4.413923668
7.555398794
7.533918443
4.445146914
7.58446368
7.56834833
7.264249919
7.34901701
7.349173404
7.315796894
7.235120197
4.577840109
7.24188544
7.243943576
7.205527364
4.480817125
4.483523464
4.526264151
6.90592723
6.903067763
6.905932124
4.513352307
4.464000858
6.848673936
6.831810008
6.819620162
4.485243384
6.606738091
Your data is a bit noisy, so I got some "additional" values.
EDIT 2:
So you can add a filter. I'm sure you can find a better way to do it, but the simplest for me right now is:
list2 = []
count = 3
leng = len(list1) - 4
while count < leng:
    count = count + 1
    # 7-point moving average centred on `count`
    avr = (list1[count-3] + list1[count-2] + list1[count-1] + list1[count]
           + list1[count+1] + list1[count+2] + list1[count+3]) / 7
    list2.append(avr)

count = 1
leng = len(list2) - 2
while count < leng:
    count = count + 1
    if list2[count-1] < list2[count] and list2[count] > list2[count+1]:
        print(list2[count])
21.5255838514
16.5808501743
13.1294409014
11.75618281
10.7026162129
10.3274025343
9.68175366729
9.53899509229
9.15257714671
9.06056034386
8.615976868
8.33681455
7.971226556
7.84655894214
7.54856005157
7.57721360586
7.34372518657
7.23259654857
6.90384834786
6.83781572657
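As an aside, the same 7-point moving average can be written much more compactly with NumPy's convolve (a sketch, essentially equivalent to the filtering loop above; list1 stands in for the actual data):

import numpy as np

list1 = np.random.rand(100)  # stand-in for the actual data
kernel = np.ones(7) / 7      # 7-point moving-average window
list2 = np.convolve(list1, kernel, mode='valid')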
I am creating a simulation of diffusion in a complex system, taking arbitrary images as a substrate, allowing arbitrary creation of diffusion fronts, and allowing both surface reactions and deposition of new material on the starting substrates. I'm quite proud of the results so far, and you can check out the movies I made with it here for CVD and SFD deposition on particles.
CVD Movie
SFD Movie
Unfortunately I cannot generate more than 50 or so frames because it runs out of memory. I have tried clearing things as much as possible throughout the simulation, but I think I must be missing something. To summarize:
I start out by creating an empty list
ims = []
Then, each time my "simulation" runs, if frame number % frame "rate" == 0, it generates a frame which is:
displayed using plt.ion() through plt.draw() and
uses ims.append() to add the rendered plot to an array of animated frames.
Before each frame render, I run plt.clf() to prevent the plot from just having increasing numbers of overlaid plots.
Without the ims.append() step, the code consumes between 140 and 170 MB of RAM. With that step, 50 frames consume nearly 1.4 GB of RAM. Obviously, this is very limiting. 50 frames is nice, but I'd really like at least 350. That may be impossible with this route, since it suggests the ims array alone uses roughly 24 MB per frame (plausible: a single 800×800 RGBA array of float64 is 800·800·4·8 bytes ≈ 20 MB).
A workaround is to create each frame, render it to an .svg or .png file inside the loop, and save it to disk. I find that this rendering process is very CPU-intensive, so doing it that way makes the code quite slow. Additionally, creating 350 PNG files and then converting them manually into a video is pretty messy, so I'd love to keep it all inside the program itself.
Does anyone have an idea for how to decrease the memory usage of this example code without resorting to rendering and writing each frame to disk?
In this toy code, I just used a random number generator to populate the two datasets as described in the comments to speed things up.
The code:
import matplotlib.pyplot as plt
import matplotlib.animation as anim
from numpy import *
from matplotlib import *
import time
# Defines the number of frames of animation to render.
outputframes = 50
# Defines the size of the canned simulation.
nx = 800
ny = 800
# Defines the number of actual simulation timesteps
nt = 100
# This gets the number of timesteps between outputframes.
framestep = 2
# For reporting.
framenum = 0
# Creates two steps, one for the stepped simulated step,
# and one for the prior state. There are two independently
# changing materials, each of which will have half the simulation
# space containing random values here, plus 10% overlap in the
# middle.
p1 = zeros((nx, ny, 2))
p1[360:800,:,0] = random.rand(440, ny)
p2 = zeros((nx, ny, 2))
p2[0:440,:,0] = random.rand(440, ny)
# Animation colormap setup
norm = colors.Normalize(vmin=0, vmax = 1)
# And sets up two corresponding colormaps, one blue and one
# red for p1 and p2 respectively (goal is overlaid).
cmap1 = cm.Blues
cmap2 = cm.Reds
# Sets up an empty array to hold animation frames.
ims = []
# Sets up and uses ion to draw the figure without blocking.
plt.ion()
fig = plt.figure()
plt.draw()
# Run the simulation.
for t in range(nt):
    # This looks to see how far we are, and if we're at a point
    # where t is an even multiple of framestep, we should render
    # a new frame.
    if t % framestep == 0:
        print('Frame ' + str(framenum))
        framenum = framenum + 1
        plt.clf()
        # In here I did a bunch of stuff to get special colors in
        # the colormap to get substrates and surfaces and other
        # features clearly identified. I am creating a new frame1
        # and frame2 object because in reality I will be doing a
        # log plot math to convert to the graphic frame.
        frame1 = p1[:, :, 0]
        # This part is necessary in my real program because
        # I manually modify the colormap after it's created
        # to include the above mentioned special colors.
        frame1_colors = cmap1(norm(frame1))
        # This is my (not quite right) attempt to do overlaid plots.
        plt.imshow(frame1_colors, alpha=0.5)
        # Do the same for the second set of data.
        frame2 = p2[:, :, 0]
        frame2_colors = cmap2(norm(frame2))
        # The goal here was to take the combined output and make
        # it into an animation frame to append to ims, the image
        # array.
        # This is where I start to run into problems. Without the
        # ims.append, the program has constant memory usage. With
        # it, I am using 1340MB by the 50th frame. This is the
        # biggest issue. Even throwing away all other simulation
        # data, this image array for animation is *enormous*.
        # With the ims.append line replaced with the plt.imshow
        # line alone, memory usage is much smaller, ranging from
        # 140-170MB depending on execution point, but relatively
        # constant.
        ims.append([plt.imshow(frame2_colors, alpha=0.5)])
        # plt.imshow(frame2_colors, alpha = 0.5)
        # Then try to draw updating animation to show progress
        # using draw(). As best I can tell, this basically works,
        # in that the plot is displaying with all components.
        plt.draw()
    # I'll put in a timer so that this doesn't go too fast, since
    # the actual calculation is very complex.
    time.sleep(0.01)
    # Proxy for the actual calculation. Just overwrite with new
    # random data in the overlapping ranges to show some change
    # visually.
    p1[360:800, :, 1] = random.rand(440, ny)
    p2[0:440, :, 1] = random.rand(440, ny)
    # In this version, it is trivial, but in the real simulation
    # p1[:,:,1] does not end up equal to p1[:,:,0], so the following
    # resets the simulation for the next timestep, overwriting the
    # old values to avoid memory overflow from the p1 and p2 arrays
    # being enormous.
    # Copy new values into old values.
    p1[:, :, 0] = p1[:, :, 1]
    p2[:, :, 0] = p2[:, :, 1]
# This is just a repeat for the final frame.
plt.clf()
frame1 = p1[:,:,0]
frame1_colors = cmap1(norm(frame1))
plt.imshow(frame1_colors, alpha = 0.5)
frame2 = p2[:,:,0]
frame2_colors = cmap2(norm(frame2))
# As above, the ims.append uses tons of memory, the imshow alone works well.
ims.append([plt.imshow(frame2_colors, alpha = 0.5)])
# plt.imshow(frame2_colors, alpha = 0.5)
plt.draw()
ani = anim.ArtistAnimation(fig, ims, blit=True)  # renamed so the anim module isn't shadowed
ani.save('test.mp4', fps=10, writer='avconv')
In the end, I decided the only reasonable way to do this was to render each frame I needed to disk as a .png and then generate a movie from the images afterwards with avconv.
Thanks for all the suggestions, but it looks like this is just a limitation due to the RAM usage of uncompressed images.
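For anyone hitting the same wall: assuming a matplotlib build where an ffmpeg-based writer is available, frames can also be streamed straight to the encoder as they are drawn, so neither an ims list nor intermediate PNGs are needed (a sketch, not the approach the author settled on):

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

fig, ax = plt.subplots()
im = ax.imshow(np.random.rand(800, 800))

writer = animation.FFMpegWriter(fps=10)
with writer.saving(fig, 'test.mp4', dpi=100):
    for t in range(350):
        # Stand-in for one simulation step producing a new frame.
        im.set_data(np.random.rand(800, 800))
        writer.grab_frame()  # encode immediately; nothing accumulates in RAM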
I have a 3D ndarray object containing spectral data (i.e. spatial x-y dimensions and an energy dimension). I would like to extract and plot the spectrum from each individual pixel on a line plot. At present, I am doing this using np.ndenumerate along the axis I'm interested in, but it's quite slow. I was hoping to try np.apply_along_axis to see if it was faster, but I keep getting a strange error.
What works:
# Setup environment, and generate sample data (much smaller than real thing!)
import numpy as np
import matplotlib.pyplot as plt
ax = range(0,10) # the scale to use when plotting the axis of interest
ar = np.random.rand(4,4,10) # the 3D data volume
# Plot all lines along axis 2 (i.e. the spectrum contained in each pixel)
# on a single line plot:
for (x, y) in np.ndenumerate(ar[:,:,1]):
    plt.plot(ax, ar[x[0], x[1], :], alpha=0.5, color='black')
It is my understanding that this is basically a loop, which is less efficient than array-based methods, so I would like to try an approach using np.apply_along_axis to see if it's faster. This is my first attempt at Python, however, and I am still finding out how it works, so please put me right if this idea is fundamentally flawed!
What I would like to try:
# define a function to pass to apply_along_axis
def pa(y, x):
    # `not np.all(...)` rather than the original `~all(...)`: applying ~ to a
    # Python bool yields an always-truthy int, so the guard never skipped anything.
    if not np.all(np.isnan(y)):  # only do the plot if there is actually data there...
        plt.plot(x, y, alpha=0.15, color='black')
    return
# check that the function actually works...
pa(ar[1,1,:],ax) # should produce a plot - does for me :)
# try to apply to to the whole array, along the axis of interest:
np.apply_along_axis(pa,2,ar,ax) # does not work... booo!
The resulting error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-109-5192831ba03c> in <module>()
12 # pa(ar[1,1,:],ax)
13
---> 14 np.apply_along_axis(pa,2,ar,ax)
//anaconda/lib/python2.7/site-packages/numpy/lib/shape_base.pyc in apply_along_axis(func1d, axis, arr, *args)
101 holdshape = outshape
102 outshape = list(arr.shape)
--> 103 outshape[axis] = len(res)
104 outarr = zeros(outshape, asarray(res).dtype)
105 outarr[tuple(i.tolist())] = res
TypeError: object of type 'NoneType' has no len()
Any ideas about what's going wrong here, or advice on how to do this better, would be great.
Thanks!
apply_along_axis creates a new array from the output of your function.
You're returning None (by not returning anything). Thus the error. Numpy checks the length of the returned output to see if it makes sense for the new array.
Because you're not constructing a new array from the results, there's no reason to use apply_along_axis. It's not going to be any faster.
However, your current ndenumerate statement is exactly equivalent to:
import numpy as np
import matplotlib.pyplot as plt
ar = np.random.rand(4,4,10) # the 3D data volume
plt.plot(ar.reshape(-1, 10).T, alpha=0.5, color='black')
In general, you probably want to do something like:
for pixel in ar.reshape(-1, ar.shape[-1]):
    plt.plot(x_values, pixel, ...)
That way you can easily iterate over the spectra at each pixel in your hyperspectral array.
Your bottleneck here probably isn't how you're using the array. Plotting each line separately with identical parameters like this in matplotlib is going to be somewhat inefficient.
It will take slightly longer to construct, but a LineCollection will render much faster. (Basically, using a LineCollection tells matplotlib to not bother checking what the properties of each line are, and just pass them all to the low-level renderer to be drawn in the same way. You bypass a bunch of individual draw calls in favor of a single draw of a large object.)
On the downside, the code will be a bit less readable.
I'll add an example in a bit.
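In the meantime, here is a minimal sketch of the LineCollection approach (synthetic data matching the shapes used earlier):

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

ar = np.random.rand(4, 4, 10)  # the 3D data volume
x = np.arange(ar.shape[-1])

# One (npoints, 2) array of xy pairs per pixel's spectrum.
segments = [np.column_stack([x, spectrum])
            for spectrum in ar.reshape(-1, ar.shape[-1])]

fig, ax = plt.subplots()
ax.add_collection(LineCollection(segments, colors='black', alpha=0.5))
ax.autoscale()  # add_collection does not update the data limits by itself
plt.show()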