This is a very general question.
I have a series of data with a quantity (y) versus time (x). It is a very long series, and the data are sometimes pretty noisy, sometimes better.
I would like to write Python code that lets me look at these data one x-range at a time (just a snapshot, so to say) and then decide whether I want to "store" that sequence or not, then pass to the next sequence and do the same, and so on. At the end I will have a stack of sequences that I can analyze separately.
I need some suggestions about the graphical part: I have no idea which modules I need.
Matplotlib is probably one of the best options for the graphical part. For example:
import numpy as np
import matplotlib.pyplot as plt
plt.ion()  # interactive mode: drawing does not block the script
# make some data of some size
nblocks = 10
block_size = 1000
size = block_size*nblocks
data = np.random.normal(0., 1., size=size)
# create a matplotlib figure with some plotting axes
fig = plt.figure()
ax = fig.add_subplot(111)
# display the figure
plt.show()
# storage for blocks to keep
kept_blocks = []
for block in data.reshape(nblocks, block_size):
    # plot the block
    ax.plot(block)
    # force matplotlib to re-render and process GUI events
    plt.pause(0.1)
    # ask the user for some input (use raw_input on Python 2)
    answer = input("Keep block? [Y/n]")
    if answer.lower() != "n":
        kept_blocks.append(block)
    # clear the plotting axes
    ax.cla()
# turn the kept blocks into a 2D array
kept_blocks = np.vstack(kept_blocks)
# or a 1D array
#kept_blocks = np.hstack(kept_blocks)
Matplotlib is well supported and is the de facto plotting standard in Python.
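The loop above splits the data by sample count (`reshape`), while the question asks for a fixed x-range per snapshot. A minimal sketch of that windowing, assuming `x` is sorted ascending and non-empty; `windows_by_x` is a hypothetical helper name, and `np.searchsorted` finds each window's index range:

```python
import numpy as np


def windows_by_x(x, y, width):
    """Yield (x_window, y_window) pairs, one per consecutive x-range [start, start + width).

    Assumes `x` is non-empty and sorted in ascending order (true for a time series).
    """
    x = np.asarray(x)
    y = np.asarray(y)
    start = x[0]
    while start <= x[-1]:
        # index range of the samples with start <= x < start + width
        lo = np.searchsorted(x, start, side="left")
        hi = np.searchsorted(x, start + width, side="left")
        if hi > lo:  # skip empty windows (gaps in the time axis)
            yield x[lo:hi], y[lo:hi]
        start = start + width
```

In the review loop, `for xs, ys in windows_by_x(t, y, width): ax.plot(xs, ys)` would take the place of iterating over `data.reshape(...)`; `t`, `y` and `width` stand for your own time axis, values and snapshot length.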
I'm trying to plot a number of segments of a timeseries, let's say 5 segments.
I want each segment to be plotted individually, one after another, each after a given input (key press).
For example: 1) plot the first segment, 2) wait for input, and only after my input 3) plot the next segment. I need Python to wait for an input (key press) before plotting the next segment.
I've managed to almost make it work, but in a Jupyter notebook all figures are displayed at once, only after I have given input for all the plots (i.e. 5 inputs).
segments = segments.iloc[0:5]  # reduced number for testing
list = []
for i in segments.itertuples():  # loop over df
    f, ax = plt.subplots()
    ax.plot(time, yy)  # plot timeseries
    plt.xlim([i.start_time, i.end_time])  # only show between limits of this segment
    plt.show()
    # get user input
    a = input()
    list.append(a)  # add input to the list
I've been banging my head but haven't managed to solve this. Any suggestion on how to solve this issue?
Here is one that works, adapted from an example I had used before, but note that I don't use subplots here:
import matplotlib.pyplot as plt
inp_ = []
for i in range(3):
    labels = ['part_1','part_2','part_3']
    pie_portions = [5,6,7]
    plt.pie(pie_portions,labels=labels,autopct = '%1.1f%%')
    plt.title(f'figure_no : {i+1}')
    plt.show()
    # get user input
    a = input()
    inp_.append(a) # add input to the list
If you use subplots, you get what you are seeing: the notebook waits and shows them all at the end, because the figure is only complete and available to display after the last subplot is specified; until then it is blocked. The easiest solution is to switch away from subplots, as in my block of code above.
If it absolutely has to work with subplots, you can in fact update the figure afterwards, like so:
#Using subplots based on https://matplotlib.org/stable/gallery/pie_and_polar_charts/pie_demo2.html
import matplotlib.pyplot as plt
import numpy as np
def update_subplot():
    '''
    based on https://stackoverflow.com/a/36279629/8508004
    '''
    global fig, axs
    ax_list = axs.ravel()
    # ax_list[0] refers to the first subplot
    ax_list[1].imshow(np.random.randn(100, 100))
    #plt.draw()
# Some data
labels = 'Frogs', 'Hogs', 'Dogs', 'Logs'
fracs = [15, 30, 45, 10]
# Make figure and axes
fig, axs = plt.subplots(1, 3)
# A standard pie plot
axs[0].pie(fracs, labels=labels, autopct='%1.1f%%', shadow=True)
axs[1].axis('off') # based on https://stackoverflow.com/a/10035974/8508004
axs[2].axis('off')
plt.show()
import time
time.sleep(2)
update_subplot()
fig
However, if you run that, you'll see two successive views, first with one plot and then with two, and the first (showing just one of the two subplots) stays in the notebook output, so it is less than desirable.
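A different approach, not the one taken in this answer: instead of blocking on `input()`, let Matplotlib call a function on every key press and redraw a single figure. This is a sketch assuming an interactive backend (for example `%matplotlib widget`, which needs the ipympl package, or `%matplotlib qt`); with the default inline backend the figure is a static image and never receives key events. `SegmentBrowser` and its 'y'/'n' keys are hypothetical choices; `mpl_connect("key_press_event", ...)` is Matplotlib's standard event API.

```python
import matplotlib.pyplot as plt
import numpy as np


class SegmentBrowser:
    """Show one time segment at a time; press 'y' to keep it, 'n' to skip it.

    Hypothetical helper: `t`/`y` are the full series and `bounds` is a list of
    (start, end) pairs, e.g. zip(segments.start_time, segments.end_time).
    """

    def __init__(self, t, y, bounds):
        self.t = np.asarray(t)
        self.y = np.asarray(y)
        self.bounds = list(bounds)
        self.index = 0
        self.kept = []
        self.fig, self.ax = plt.subplots()
        # the callback replaces input(): Matplotlib calls on_key for every key press
        self.fig.canvas.mpl_connect("key_press_event", self.on_key)
        self.show_current()

    def show_current(self):
        self.ax.cla()
        if self.index < len(self.bounds):
            start, end = self.bounds[self.index]
            mask = (self.t >= start) & (self.t <= end)
            self.ax.plot(self.t[mask], self.y[mask])
            self.ax.set_title(f"segment {self.index + 1}/{len(self.bounds)}: keep? [y/n]")
        else:
            self.ax.set_title("all segments reviewed")
        self.fig.canvas.draw_idle()  # ask the canvas to repaint when it is idle

    def on_key(self, event):
        if self.index >= len(self.bounds) or event.key not in ("y", "n"):
            return  # ignore other keys and any presses after the last segment
        if event.key == "y":
            self.kept.append(self.bounds[self.index])
        self.index += 1
        self.show_current()
```

Create it with `browser = SegmentBrowser(time, yy, zip(segments.start_time, segments.end_time))` and keep that reference: Matplotlib holds bound-method callbacks only weakly, so an unreferenced browser can be garbage-collected and stop responding. Once every segment has been reviewed, `browser.kept` holds the selected (start, end) pairs.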
It is always best to provide a minimal reproducible example when posting your question. That way you get an answer close to what works for your case.
Also, it is a bad idea to use a built-in type as a variable name (list = []). It can lead to errors you aren't expecting later. Imagine you wanted to convert a set back to a list later in your code, for example.
Compare:
list = []
my_set= {1,2,3}
a = list(my_set)
to
my_list = []
my_set= {1,2,3}
a = list(my_set)
The first will give TypeError: 'list' object is not callable.
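If the name has already been shadowed (for example in an interactive session), deleting the global binding brings the built-in back, because name lookup falls back to the builtins namespace once the global name is gone. A minimal demonstration:

```python
list = []  # shadows the built-in type in this module's global namespace
try:
    list({1, 2, 3})
except TypeError as exc:
    print("TypeError:", exc)  # TypeError: 'list' object is not callable

del list  # drop the global binding; lookup now finds the built-in again
a = list({1, 2, 3})
print(sorted(a))  # [1, 2, 3]
```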
I am on Python 3.7. I am trying to read data from a serial port; each read returns 7 different bytes. I would like to plot each byte on a different subplot. I want to read the serial port every 500 ms and, each time I read, add the new data to the subplots, so every read gives one more point to plot on every subplot. It's basically a sensor readout.
Here is the code I have written:
from time import sleep
import serial
import matplotlib.pyplot as plt
f=plt.figure(1)
ax=[0 for x in range(7)]
for i in range(0,7):
    ax[i]=f.add_subplot(4,2,1+i)
ser = serial.Serial('COM3', 115200) # Establish the connection on a specific port
counter = 0
byte=ser.readline() #first line not to be plotted
while True:
    counter +=1
    ser.write(b'9') # send a command to the arduino
    byte=ser.read(7) #read 7 bytes back
    for i in range(0,7):
        ax[i].plot(counter, byte[i]) # Trying to plot the new values to each different subplots
    plt.pause(0.01)
    sleep(.5) # Delay for one half of a second
The figure is showing, and the x and y axes adapt to the values I want to plot, but there is no data at all on the plot. If I use scatter instead of plot it works, but then it is less versatile and I can't draw the type of graph I want.
I also tried to reproduce the problem without serial data, just displaying the points of a list one after the other, like this:
import matplotlib.pyplot as plt
from time import sleep
f=plt.figure()
series=[[4,3,2,1],[8,7,6,5],[12,11,10,9]]
counter=0
ax=[0 for x in range(7)]
for i in range(0,3):
    ax[i]=f.add_subplot(4,2,1+i)
for j in range (0,4):
    counter=counter+1
    for i in range(0,3):
        ax[i].plot(counter,series[i][j])
    plt.pause(0.01)
    sleep(1)
It does exactly the same thing: in the final figure the axes have adapted to the values I wanted to plot, but nothing is drawn.
The point is that I do not want to clear the whole plot and redraw everything, because I will have about 30 days of sensor data to display continuously.
What am I doing wrong with the code I have written?
EDIT:
After ImportanceOfBeingErnest's comment, I tried implementing the answer given here. The code is now:
from time import sleep
import serial
import matplotlib.pyplot as plt
import numpy
plt.ion()
f=plt.figure()
ax=[0 for x in range(7)]
lines=[0 for x in range(7)]
for i in range(0,7):
    ax[i]=f.add_subplot(4,2,1+i)
    lines[i]=ax[0].plot([],[])
def update_line(hl, new_datax, new_datay):
    hl.set_xdata(numpy.append(hl.get_xdata(), new_datax))
    hl.set_ydata(numpy.append(hl.get_ydata(), new_datay))
    plt.draw()
ser = serial.Serial('COM3', 115200) # Establish the connection on a specific port
counter = 0
byte=ser.readline() #first line not to be plotted
while True:
    counter +=1
    ser.write(b'9') # send a command to the arduino
    byte=ser.read(7) #read 7 bytes back
    for i in range(0,7):
        update_line(lines[i][0], counter, byte[i]) # Trying to plot the new values to each different subplots
    plt.pause(0.01)
    sleep(.5) # Delay for one half of a second
But it still does not show anything. I guess I am missing a plot and/or clear call somewhere, but after trying several options I can't get it to work.
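Two details in the code above likely explain the empty plots. In the first version, every `ax[i].plot(counter, byte[i])` call creates a new `Line2D` holding a single point; a line needs at least two points to draw a segment, and the default marker is none, so nothing is visible (`scatter` draws a marker per point, which is why it works). In the EDIT, all seven lines are created on `ax[0]` (`lines[i]=ax[0].plot(...)`), and `set_xdata`/`set_ydata` do not change the axis limits. A sketch of a fix, with `make_live_axes` and `append_point` as hypothetical helper names:

```python
import matplotlib.pyplot as plt


def make_live_axes(n):
    """Create n stacked subplots, each holding one (initially empty) line."""
    fig, axes = plt.subplots(n, 1, squeeze=False)
    axes = list(axes[:, 0])
    # create each line on ITS OWN axes; a marker makes single points visible too
    lines = [ax.plot([], [], marker=".")[0] for ax in axes]
    return fig, axes, lines


def append_point(ax, line, x, y):
    """Extend an existing line by one point and rescale its axes."""
    xs = list(line.get_xdata())
    ys = list(line.get_ydata())
    xs.append(x)
    ys.append(y)
    line.set_data(xs, ys)
    # set_data does not touch the axis limits; recompute them from the data
    ax.relim()
    ax.autoscale_view()
```

In the serial loop this would become `fig, ax, lines = make_live_axes(7)` before the loop and `append_point(ax[i], lines[i], counter, byte[i])` for each channel, followed by `plt.pause(0.01)`. Copying the data on every append costs time proportional to its length, so for weeks of data a bounded display window is preferable.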
As someone who worked in an optics lab and struggled to get Matplotlib to perform real-time plotting, I feel your pain and I strongly suggest choosing something other than Matplotlib for this purpose (such as pyqtgraph).
That said, I've gotten Matplotlib to perform some real-time plotting from sensor data. I've found it to be buggy. Here are some thoughts as well as a solution that uses matplotlib:
Use dictionaries where possible.
Why? Because accessing dictionaries is fast, and I find that a dictionary key is easier to use than a list index for these purposes.
Use lists instead of NumPy arrays.
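Why? Because a NumPy array cannot grow in place: resizing or appending (e.g. with np.append) allocates a new array and copies every element, so adding one sample at a time costs time proportional to the array's length. Python lists over-allocate, so appending to them costs amortized constant time.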
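To make the difference concrete: `np.append` returns a copied array rather than growing the original, while `list.append` works in place. For a display that only shows the last `xwidth` points, `collections.deque` with `maxlen` goes one step further and discards the oldest sample automatically, so memory stays bounded over a 30-day run. This deque variant is a swapped-in technique, not part of this answer's code; `push_sample` is a hypothetical helper.

```python
from collections import deque

import numpy as np

xwidth = 10  # number of points kept for display, as in the answer's code

# a deque with maxlen drops its oldest item automatically on append
x_window = deque(maxlen=xwidth)
y_window = {i: deque(maxlen=xwidth) for i in range(7)}


def push_sample(x, values):
    """Append one reading (7 channel values) to the rolling display window."""
    x_window.append(x)
    for i, v in enumerate(values):
        y_window[i].append(v)


# np.append never grows an array in place; it returns a new, copied array
a = np.zeros(3)
b = np.append(a, 1.0)
print(a.size, b.size)  # 3 4
```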
The code below uses random data to simulate incoming sensor data and to make troubleshooting easier.
1. Imports
from time import sleep
import matplotlib.pyplot as plt
import numpy as np
#import serial
2. Set up your matplotlib objects and data containers
# specify how many points to show on the x-axis
xwidth = 10
# use real-time plotting
plt.ion()
# setup each of the subplots (ax is an array of 7 Axes)
fig, ax = plt.subplots(7, 1, sharex=False, sharey=False)
# set up each of the lines/curves to be plotted on their respective subplots
lines = {index: Axes_object.plot([],[])[0] for index, Axes_object in enumerate(ax)}
# initial drawing of the canvas
fig.canvas.draw()
# cache background of each plot for fast re-drawing, AKA blit
# (copy it after the first draw; update_graph below does not use it yet)
ax_bgs = {index: fig.canvas.copy_from_bbox(Axes_object.bbox)
          for index, Axes_object in enumerate(ax)}
# setup variable to contain incoming serial port data
y_data = {index:[0] for index in range(len(ax))}
x_data = [-1]
3. Write functions for updating the plot and your data containers
def update_data(new_byte):
    x_data.append(x_data[-1] + 1)
    for i, val in enumerate(new_byte):
        y_data[i].append(val)
def update_graph():
    for i in y_data.keys():
        # update each line object
        lines[i].set_data(x_data, y_data[i])
        # try to set new axes limits
        try:
            ax[i].set_xlim([x_data[-1] - xwidth, x_data[-1]])
            if max(y_data[i][-xwidth:]) > ax[i].get_ylim()[1]:
                new_min = min(y_data[i][-xwidth:])
                new_max = max(y_data[i][-xwidth:])
                ax[i].set_ylim([new_min-abs(new_min)*0.2, new_max+abs(new_max)*0.2])
        except Exception:
            continue
    fig.canvas.draw()
    # let the GUI event loop process the redraw
    fig.canvas.flush_events()
4. Finally, run the loop
#ser = serial.Serial('COM3', 115200) # Establish the connection on a specific port
#byte=ser.readline() #first line not to be plotted
while x_data[-1] < 30:
    # ser.write(b'9') # send a command to the arduino
    # byte=ser.read(7) #read 7 bytes back
    byte = np.random.rand(7)
    update_data(byte)
    update_graph()
    sleep(.1) # Delay for an arbitrary amount of time
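The setup step caches per-axes backgrounds in `ax_bgs`, but `update_graph` never uses them and redraws the whole figure. A minimal blitting sketch, following the pattern of Matplotlib's blitting tutorial: draw once, copy each axes' background, then on every update restore the background, draw only the line with `draw_artist`, and blit that region. It assumes fixed axis limits (here x is the position within the last `xwidth` samples, and y is fixed to `ylim`), because blitting does not redraw ticks; whenever the limits change you need a full `fig.canvas.draw()` and fresh backgrounds. `setup_blit` and `blit_update` are hypothetical names.

```python
import matplotlib.pyplot as plt
import numpy as np


def setup_blit(n_axes, xwidth, ylim=(0.0, 1.0)):
    """Create subplots with fixed limits and cache each axes' empty background."""
    fig, axes = plt.subplots(n_axes, 1, squeeze=False)
    axes = list(axes[:, 0])
    lines = []
    for ax in axes:
        ax.set_xlim(0, xwidth - 1)
        ax.set_ylim(*ylim)
        # animated=True keeps the line out of normal draws; we draw it by hand
        (line,) = ax.plot([], [], animated=True)
        lines.append(line)
    fig.canvas.draw()  # render once so the backgrounds exist
    backgrounds = [fig.canvas.copy_from_bbox(ax.bbox) for ax in axes]
    return fig, axes, lines, backgrounds


def blit_update(fig, axes, lines, backgrounds, windows):
    """Redraw only the lines: restore background, draw line, blit the region."""
    for ax, line, bg, ys in zip(axes, lines, backgrounds, windows):
        fig.canvas.restore_region(bg)
        line.set_data(np.arange(len(ys)), ys)  # x = position inside the window
        ax.draw_artist(line)
        fig.canvas.blit(ax.bbox)
    fig.canvas.flush_events()
```

With an interactive backend this would be used as `fig, axes, lines, bgs = setup_blit(7, xwidth)`, then `plt.show(block=False)`, and inside the loop `blit_update(fig, axes, lines, bgs, [y_data[i][-xwidth:] for i in range(7)])`. For raw serial bytes, `ylim=(0, 255)` would be the natural fixed range.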
I hope that helps.
I've got a sensor that I'm collecting data for. Basically, whether it's on or off depends on its distance from what it's designed to sense. I've got a little rig on wheels that I use to move the sensor towards the material and then pull it away, then push it back and pull it away again. Imagine the motion you would make when vacuuming.
When I'm far away, the sensor is off; when I'm really close, the sensor is on. No issues there.
What I'm interested in is the grey zone (sweet spot, butter zone, whatever you want to call it) where sometimes it turns on at one distance and, on the next pass, it turns on 5 mm closer.
I'd like to encode that somehow in the graph of the data that I'm plotting with Matplotlib. My thought was to heat-map the transitions based on how often they occur in the data.
As you can see, it's pretty boring and uninteresting. I'd like to make it look prettier and fancier by incorporating some other mechanism for conveying the variance of the transitions. At this point I might have transitioned from high to low and low to high at distance 15 a thousand times, while every other transition happened only once; there's no way to tell.
How can I add that kind of mapping to this kind of data?
And here's the python script...pretty simple stuff:
import numpy as np
import matplotlib.pyplot as plt
data = np.genfromtxt('/home/dale/test1.txt', delimiter=',', skip_header=10, names=['x','y'])
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.set_title("Distance vs Level")
ax1.set_xlabel('Distance')
ax1.set_ylabel('Logic Level')
ax1.step(data['x'],data['y'], c='r', label='the data')
leg = ax1.legend()
plt.show()
I disagree with Thomas' comment. The graph has several issues that are immediately apparent from the data.
1) Your distance measures are discrete, whereas you are plotting a continuous line. The graph should not hide that.
2) Because the line is continuous, it ends up lying on top of itself a lot. That hides the true frequencies.
Instead of your suggested heatmap, I think that you just need a barplot:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# grab data
df = pd.read_csv('sensor_data.csv')
position = df.Counter.to_numpy()
status = df.Value.to_numpy()
# bincount only accepts non-negative integers, so shift positions to start at 0
min_pos = position.min()
position = position - min_pos
# determine the fraction of samples that are ON at each position
counts_on = np.bincount(position[status==1], minlength=position.max()+1)
counts_all = np.bincount(position, minlength=position.max()+1)
fraction_on = counts_on.astype(float) / counts_all  # NaN where a position was never visited
# plot
x = np.arange(min_pos, position.max()+1+min_pos)
fig, ax = plt.subplots(1,1)
ax.bar(x, fraction_on, width=1., facecolor='gray', edgecolor=None)  # bars are centred on x by default
ax.set_xlabel('Distance')
ax.set_ylabel(r'Logic Level $\frac{ON}{ON + OFF}$')
plt.show()
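If you want the transitions themselves (the heat-map idea in the question) rather than the ON fraction, you can count where the level changes. A minimal sketch, assuming the samples are in acquisition order and the level is 0/1; `transition_counts` is a hypothetical helper, and it records the distance at which the new level was first seen:

```python
import numpy as np


def transition_counts(distance, level):
    """Count OFF->ON and ON->OFF switches per distance.

    Assumes samples are in acquisition order and `level` holds 0/1 values.
    Returns two dicts mapping distance -> number of switches observed there.
    """
    distance = np.asarray(distance)
    level = np.asarray(level)
    changed = np.diff(level) != 0        # True where sample k+1 differs from sample k
    where = distance[1:][changed]        # distance at which the new level was first seen
    turned_on = level[1:][changed] == 1  # True for OFF->ON, False for ON->OFF
    on_d, on_n = np.unique(where[turned_on], return_counts=True)
    off_d, off_n = np.unique(where[~turned_on], return_counts=True)
    return (dict(zip(on_d.tolist(), on_n.tolist())),
            dict(zip(off_d.tolist(), off_n.tolist())))
```

The two dictionaries can be drawn side by side with `ax.bar`, or used as colour weights if you still prefer a heat map.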
I have a 100,000,000-sample dataset and I want to make a histogram with pyplot. But reading this large file drains my memory critically (the cursor stops moving, ...), so I'm looking for ways to 'help' pyplot.hist. I was thinking that breaking the file up into several smaller files might help, but I wouldn't know how to combine them afterwards.
You can combine the output of pyplot.hist, or, as @titusjan suggested, numpy.histogram, as long as you keep your bins fixed each time you call it. For example:
import matplotlib.pyplot as plt
import numpy as np
# Generate some fake data
data=np.random.rand(1000)
# The fixed bins (change depending on your data)
bins=np.arange(0,1.1,0.1)
sub_hist = []
# Split into 100 chunks of 10 samples and histogram each chunk
for i in np.arange(0,1000,10):
    sub_hist_temp, bins_out = np.histogram(data[i:i+10],bins=bins)
    sub_hist.append(sub_hist_temp)
# Sum the histograms
hist_sum = np.array(sub_hist).sum(axis=0)
# Plot the new summed data, using plt.bar
fig=plt.figure()
ax1=fig.add_subplot(211)
ax1.bar(bins[:-1],hist_sum,width=0.1,align='edge') # Change width depending on your bins
# Plot the histogram of all data to check
ax2=fig.add_subplot(212)
hist_all, bins_out, patches = ax2.hist(data,bins=bins)
fig.savefig('histsplit.png')
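Applied to the original problem, the chunks can come straight from the file, so the full 100,000,000 samples never need to be in memory at once. A sketch, assuming the chunks arrive as an iterable of 1-D arrays; `chunked_histogram` is a hypothetical helper, and the CSV path and column name in the comments are placeholders:

```python
import numpy as np


def chunked_histogram(chunks, bins):
    """Accumulate np.histogram counts over an iterable of 1-D arrays.

    `bins` must be a fixed array of edges so every chunk is binned identically;
    values outside [bins[0], bins[-1]] are silently dropped by np.histogram.
    """
    total = np.zeros(len(bins) - 1, dtype=np.int64)
    for chunk in chunks:
        counts, _ = np.histogram(chunk, bins=bins)
        total += counts
    return total

# Usage sketch (placeholder file and column names):
#   import pandas as pd
#   import matplotlib.pyplot as plt
#   bins = np.linspace(0.0, 1.0, 101)
#   chunks = (df["value"].to_numpy() for df in pd.read_csv("big.csv", chunksize=1_000_000))
#   counts = chunked_histogram(chunks, bins)
#   fig, ax = plt.subplots(); ax.stairs(counts, bins)  # Axes.stairs needs Matplotlib >= 3.4
```

Fixing the bin edges in advance is what makes the per-chunk counts additive; if the data range is unknown, a first cheap pass over the chunks to find the minimum and maximum is one option.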
I am looping through a bunch of CSV files containing various measurements.
Each file might be from one of 4 different data sources.
In each file, I merge the data into monthly datasets, which I then plot in a 3x4 grid. After this plot has been saved, the loop moves on and does the same for the next file.
I have this part figured out; however, I would like to add a visual cue to the plots indicating which data source they come from. As far as I understand it (and have tried it),
plt.subplot(4,3,1)
plt.hist(Jan_Data,facecolor='Red')
plt.ylabel('value count')
plt.title('January')
does work; however, this way I would have to add facecolor='Red' by hand to all 12 subplots. Looping through the plots won't work in this situation, since I want the y-label only on the leftmost plots and x-labels only on the bottom row.
Setting facecolor at the beginning in
fig = plt.figure(figsize=(20,15),facecolor='Red')
does not work, since it only changes the background color of the 20-by-15 figure, which then gets ignored when I save it to a PNG because it is only set for screen output.
So is there just a simple setthecolorofallbars='Red' option for plt.hist(… or plt.savefig(… that I am missing, or should I just copy and paste it for all twelve months?
You can use mpl.rc("axes", prop_cycle=mpl.cycler(color=["red"])) to set the default color cycle for all your axes. (Older Matplotlib versions used a color_cycle parameter for this, which has since been removed.)
In this little toy example, I use a with mpl.rc_context() block to limit the effect of mpl.rc to just that block. This way you don't change the default parameters for your whole session.
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(42)
# create some toy data
n, m = 2, 2
data = []
for i in range(n*m):
    data.append(np.random.rand(30))
# and do the plotting
with mpl.rc_context():
    # every Axes created inside this block starts its colour cycle at red
    mpl.rc("axes", prop_cycle=mpl.cycler(color=["red"]))
    fig, axes = plt.subplots(n, m, figsize=(8,8))
    for ax, d in zip(axes.flat, data):
        ax.hist(d)
The problem with the x- and y-labels (when you use loops) can be solved by using plt.subplots, since it lets you access every axis separately.
import matplotlib.pyplot as plt
import numpy.random
# creating figure with 4 plots
fig,ax = plt.subplots(2,2)
# some data
data = numpy.random.randn(4,1000)
# some titles
title = ['Jan','Feb','Mar','April']
xlabel = ['xlabel1','xlabel2']
ylabel = ['ylabel1','ylabel2']
for i in range(ax.size):
    a = ax[i//2,i%2]  # integer division gives the row index
    a.hist(data[i],facecolor='r',bins=50)
    a.set_title(title[i])
# write the ylabels on all axes on the left-hand side
for j in range(ax.shape[0]):
    ax[j,0].set_ylabel(ylabel[j])
# write the xlabels on all axes in the bottom row
for j in range(ax.shape[1]):
    ax[-1,j].set_xlabel(xlabel[j])
fig.tight_layout()
All features (like titles) that are not constant can be put into lists and applied to the appropriate axes.
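Since each file comes from one of four data sources, one option is to look up the colour once per file and pass it to every `hist` call through a keyword dictionary, keeping the row/column label logic from the answer above. A sketch, with the source names and `plot_year` as hypothetical placeholders:

```python
import matplotlib.pyplot as plt
import numpy as np

# hypothetical source names; map each of the 4 data sources to one bar colour
SOURCE_COLORS = {"source_a": "red", "source_b": "tab:blue",
                 "source_c": "tab:green", "source_d": "tab:orange"}


def plot_year(monthly, source, nrows=4, ncols=3):
    """Draw the monthly histograms in one grid, all in the colour of `source`.

    `monthly` maps month name -> 1-D array of values, e.g. {"January": Jan_Data, ...}.
    """
    hist_kwargs = {"facecolor": SOURCE_COLORS[source]}  # set once, reused for every subplot
    fig, axes = plt.subplots(nrows, ncols, figsize=(20, 15), squeeze=False)
    for ax, (month, values) in zip(axes.flat, monthly.items()):
        ax.hist(values, **hist_kwargs)
        ax.set_title(month)
    for ax in axes[:, 0]:   # y-labels only on the leftmost column
        ax.set_ylabel("value count")
    for ax in axes[-1, :]:  # x-labels only on the bottom row
        ax.set_xlabel("value")
    return fig
```

Inside the file loop, `fig = plot_year(months, source)` followed by `fig.savefig(...)` and `plt.close(fig)` would produce one coloured grid per file without repeating `facecolor` twelve times.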