Python - Graphing contents of multiple files - python

I have lists of ~10 corresponding input files, each containing columns of tab-separated data with approximately 300 lines (data points).
I'm looking to plot the contents of each set of data so that I have 2 plots for each set: one is simply x vs (y1, y2, y3, ...), and one is transformed by a function, e.g. x vs (f(y1), f(y2), f(y3), ...).
I am not sure of the best way to achieve this. I thought about using a simple array of filenames, but then couldn't work out how to store all the data without overwriting it. Something like this:
import numpy as np
import matplotlib.pyplot as plt
def ReadDataFile(file):
    print (file)
    x,y = np.loadtxt(file, unpack=True, usecols=(8,9))
    return x, y
inputFiles = ['data1.txt','data2.txt','data2.txt',...]
for file in inputFiles:
    x1,y1 = ReadDataFile(file) ## ? ##
    p1,q1 = function(x1,y1) ## ? ##
plt.figure(1)
plt.plot(x1,y1)
plt.plot(x2,y2)
...
# plt.savefig(...)
plt.figure(2)
plt.plot(p1,q1)
plt.plot(p2,q2)
...
# plt.savefig(...)
plt.show()
I guess my question is how best to read and store all the data while keeping the ability to access it, without needing to put all the code in the read loop. Can I read two data sets into a list of pairs? Is that a thing in Python? If so, how do I access them?
Thanks in advance for any help!
Best regards!

Basically, I think you should put all your code in the readloop, because that will work easily. There's a slightly different way of using matplotlib that makes it easy to use the existing organization of your data AND write shorter code. Here's a toy, but complete, example:
import matplotlib.pyplot as plt
from numpy.random import random
fig, axs = plt.subplots(2)
for c in 'abc': # In your case, for filename in [file-list]:
    x, y = random((2, 5))
    axs[0].plot(x, y, label=c) # filename instead of c in your case
    axs[1].plot(x, y**2, label=c) # Plot p(x,y), q(x,y) in your case
axs[0].legend() # handy to get this from the data list
fig.savefig('two_plots.png')
You can also create two figures and plot into each of them explicitly, if you need them in different files for page layout, etc:
import matplotlib.pyplot as plt
from numpy.random import random
fig1, ax1 = plt.subplots(1)
fig2, ax2 = plt.subplots(1)
for c in 'abc': # or, for filename in [file-list]:
    x, y = random((2, 5))
    ax1.plot(x, y, label=c)
    ax2.plot(x, y**2, label=c)
ax1.legend()
ax2.legend()
fig1.savefig('two_plots_1.png')
fig2.savefig('two_plots_2.png')
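To address the "list of pairs" part of the question directly: yes, a plain Python list of tuples works. The following is only a minimal sketch under stated assumptions. The in-memory `sources` stand in for the real files, `usecols=(0, 1)` replaces the question's `(8, 9)` so the tiny sample data fits, and `transform` is a hypothetical placeholder for the question's `function(x, y)`.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def read_data_file(source, usecols=(0, 1)):
    """Return the two selected columns of a whitespace/tab separated file as (x, y)."""
    x, y = np.loadtxt(source, unpack=True, usecols=usecols)
    return x, y


def transform(x, y):
    """Hypothetical stand-in for the question's function(x, y)."""
    return x, y ** 2


# Assumption: in-memory text replaces the real files; np.loadtxt also accepts file names.
sources = {
    "data1.txt": "0\t1\n1\t2\n2\t3\n",
    "data2.txt": "0\t2\n1\t4\n2\t6\n",
}

# Read everything once; each entry is a (name, x, y) tuple, so nothing is overwritten.
datasets = []
for name, text in sources.items():
    x, y = read_data_file(io.StringIO(text))
    datasets.append((name, x, y))

# Plot later, outside the read loop, by unpacking each tuple.
fig_raw, ax_raw = plt.subplots()
fig_tr, ax_tr = plt.subplots()
for name, x, y in datasets:
    ax_raw.plot(x, y, label=name)
    p, q = transform(x, y)
    ax_tr.plot(p, q, label=name)
ax_raw.legend()
ax_tr.legend()
```

Individual data sets remain accessible by index afterwards, e.g. `datasets[1][2]` is the y column of the second file. A dict keyed by filename would work equally well if lookup by name is more convenient.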

Related

Proper Matplotlib axes construction / reuse

I am currently building a set of scatter plot charts using pandas plot.scatter. The construction is built off two base axes.
My current construction looks akin to
ax1 = pandas.scatter.plot()
ax2 = pandas.scatter.plot(ax=ax1)
for dataframe in list:
    output_ax = pandas.scatter.plot(ax2)
    output_ax.get_figure().save("outputfile.png")
total_output_ax = total_list.scatter.plot(ax2)
total_output_ax.get_figure().save("total_output.png")
This seems inefficient. For 1...N permutations I want to reuse a base axes that has 50% of the data already plotted. What I am trying to do is:
Add base data to scatter plot
For item x in y: (save data to base scatter and save image)
Add all data to scatter plot and save image
Here's one way to do it with plt.scatter.
I plot column 0 on x-axis, and all other columns on y axis, one at a time.
Notice that there is only 1 ax object, and I don't replot all points, I just add points using the same axes with a for loop.
Each time I get a corresponding png image.
import numpy as np
import pandas as pd
np.random.seed(2)
testdf = pd.DataFrame(np.random.rand(20,4))
testdf.head(5) looks like this:
          0         1         2         3
0  0.435995  0.025926  0.549662  0.435322
1  0.420368  0.330335  0.204649  0.619271
2  0.299655  0.266827  0.621134  0.529142
3  0.134580  0.513578  0.184440  0.785335
4  0.853975  0.494237  0.846561  0.079645
#I put the first axis out of a loop, that can be in the loop as well
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.scatter(testdf[0],testdf[1], color='red')
fig.legend()
fig.savefig('fig_1.png')
colors = ['pink', 'green', 'black', 'blue']
for i in range(2,4):
    ax.scatter(testdf[0], testdf[i], color=colors[i])
    fig.legend()
    fig.savefig('full_' + str(i) + '.png')
Then you get these 3 images (fig_1.png, full_2.png, full_3.png).
Axes objects cannot be simply copied or transferred. However, it is possible to set artists to visible/invisible in a plot. Given your ambiguous question, it is not fully clear how your data are stored but it seems to be a list of dataframes. In any case, the concept can easily be adapted to different input data.
import matplotlib.pyplot as plt
#test data generation
import pandas as pd
import numpy as np
rng = np.random.default_rng(123456)
df_list = [pd.DataFrame(rng.integers(0, 100, (7, 2))) for _ in range(3)]
#plot all dataframes into an axis object to ensure
#that all plots have the same scaling
fig, ax = plt.subplots()
patch_collections = []
for i, df in enumerate(df_list):
    pc = ax.scatter(x=df[0], y=df[1], label=str(i))
    pc.set_visible(False)
    patch_collections.append(pc)
#store individual plots
for i, pc in enumerate(patch_collections):
    pc.set_visible(True)
    ax.set_title(f"Dataframe {i}")
    fig.savefig(f"outputfile{i}.png")
    pc.set_visible(False)
#store summary plot
[pc.set_visible(True) for pc in patch_collections]
ax.set_title("All dataframes")
ax.legend()
fig.savefig(f"outputfile_0_{i}.png")
plt.show()

How to save multiple plots in a folder

Here is my program in python and I am trying to save multiple plots in a single folder but it doesn't seem to work. How could I do this please?
for i in range(0:244):
plt.figure()
y = numpy.array(Data_EMG[i,:])
x = pylab.linspace(EMG_start, EMG_stop, Amount_samples)
plt.xlabel('Time(ms)')
plt.ylabel('EMG voltage(microV)')
pylab.plot(x, y)
pylab.show(block=True)
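You can use the savefig function. Note that range takes its bounds as separate arguments, so range(0:244) must be written range(244):
for i in range(244):
    plt.figure()
    y = numpy.array(Data_EMG[i,:])
    x = pylab.linspace(EMG_start, EMG_stop, Amount_samples)
    plt.xlabel('Time(ms)')
    plt.ylabel('EMG voltage(microV)')
    plt.plot(x, y) # draw the data before saving
    plt.savefig('EMG {0}.jpg'.format(i))
    plt.close()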
First of all, check the indentation. Hopefully your code actually reads
for i in range(0, 244):
    plt.figure()
    y = numpy.array(Data_EMG[i,:])
    x = pylab.linspace(EMG_start, EMG_stop, Amount_samples)
    plt.xlabel('Time(ms)')
    plt.ylabel('EMG voltage(microV)')
    pylab.plot(x, y)
    pylab.show(block=True)
At each iteration you generate a completely new figure, which is very inefficient. Also, you only show the figure on screen and never actually save it. Better is
from os import path
data = numpy.array(Data_EMG) # convert complete dataset into numpy-array
x = pylab.linspace(EMG_start, EMG_stop, Amount_samples) # doesn't change in loop anyway
outpath = "path/of/your/folder/"
fig, ax = plt.subplots() # generate figure with axes
image, = ax.plot(x,data[0]) # initialize plot
ax.set_xlabel('Time(ms)')
ax.set_ylabel('EMG voltage(microV)')
plt.draw()
fig.savefig(path.join(outpath,"dataname_0.png"))
for i in range(1, len(data)):
    image.set_data(x,data[i])
    ax.relim() # recompute data limits for the new curve
    ax.autoscale_view() # rescale the axes to those limits
    plt.draw()
    fig.savefig(path.join(outpath,"dataname_{0}.png".format(i)))
Should be much faster.
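One thing neither answer covers is that savefig does not create a missing folder. Below is a minimal sketch of saving every plot into one folder that is created first. The random `data_emg` array, the time axis, the folder name `emg_plots` and the file name pattern are all assumptions made for illustration, not the asker's real data.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for Data_EMG: 5 recordings of 100 samples each.
rng = np.random.default_rng(0)
data_emg = rng.normal(size=(5, 100))
x = np.linspace(0.0, 50.0, data_emg.shape[1])

# Assumption: a temporary directory keeps the sketch self-contained;
# replace it with the real target folder.
out_dir = os.path.join(tempfile.mkdtemp(), "emg_plots")
os.makedirs(out_dir, exist_ok=True)  # create the folder if it is missing

saved = []
for i, y in enumerate(data_emg):
    fig, ax = plt.subplots()
    ax.plot(x, y)
    ax.set_xlabel("Time (ms)")
    ax.set_ylabel("EMG voltage (microV)")
    target = os.path.join(out_dir, f"EMG_{i:03d}.png")
    fig.savefig(target)
    plt.close(fig)  # release the figure so memory does not grow with the loop
    saved.append(target)
```

Zero-padding the index (`EMG_000.png`, `EMG_001.png`, ...) keeps the files in order when the folder is sorted by name.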

How to plot a CSV file with python? My plot comes up blank

I have the code below that seems to run without issues until I try to plot it. A blank plot will show when asked to plot.
import numpy as np
import matplotlib.pyplot as plt
data = np.genfromtxt('/home/oem/Documents/620157.csv', delimiter=',', skip_header=01, skip_footer=01, names=['x', 'y'])
plt.plot(data,'o-')
plt.show()
I'm not sure what your data looks like, but I believe you need to do something like this:
data = np.genfromtxt('/home/oem/Documents/620157.csv',
                     delimiter=',',
                     skip_header=1,
                     skip_footer=1)
name, x, y, a, b = zip(*data)
plt.plot(x, y, 'o-')
As per your comment, the data is currently an array containing tuples of the station name and the x and y data. Using zip with the * symbol assigns them back to individual variables which can then be used for plotting.
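To illustrate the column unpacking described above, here is a small sketch on made-up data. The two-column `csv_text` is an assumption (the real file apparently has more columns, including a station name), so treat it as a demonstration of the mechanism rather than a drop-in fix.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Assumption: a tiny two-column CSV with one header line.
csv_text = "x,y\n0,1.5\n1,2.5\n2,4.0\n"

# With names=..., genfromtxt returns a structured array: one record per row.
# Columns are then accessed by field name rather than passed to plot() as a whole.
named = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                      skip_header=1, names=["x", "y"])
x, y = named["x"], named["y"]

# Without names, the result is a plain 2-D array; zip(*rows) turns the
# sequence of rows into one tuple per column.
plain = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
x2, y2 = zip(*plain)

fig, ax = plt.subplots()
ax.plot(x, y, "o-")
```

For a plain numeric array, `plain.T` or passing `unpack=True` to genfromtxt gives the same column-wise split as `zip(*plain)`.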

How to assign a plot to a variable and use the variable as the return value in a Python function

I am creating two Python scripts to produce some plots for a technical report. In the first script I am defining functions that produce plots from raw data on my hard-disk. Each function produces one specific kind of plot that I need. The second script is more like a batch file which is supposed to loop around those functions and store the produced plots on my hard-disk.
What I need is a way to return a plot in Python. So basically I want to do this:
fig = some_function_that_returns_a_plot(args)
fig.savefig('plot_name')
But what I do not know is how to make a plot a variable that I can return. Is this possible? If so, how?
You can define your plotting functions like
import numpy as np
import matplotlib.pyplot as plt
# an example graph type
def fig_barh(ylabels, xvalues, title=''):
    # create a new figure
    fig = plt.figure()
    # plot to it
    yvalues = 0.1 + np.arange(len(ylabels))
    plt.barh(yvalues, xvalues, figure=fig)
    yvalues += 0.4
    plt.yticks(yvalues, ylabels, figure=fig)
    if title:
        plt.title(title, figure=fig)
    # return it
    return fig
then use them like
from matplotlib.backends.backend_pdf import PdfPages
def write_pdf(fname, figures):
    doc = PdfPages(fname)
    for fig in figures:
        fig.savefig(doc, format='pdf')
    doc.close()
def main():
    a = fig_barh(['a','b','c'], [1, 2, 3], 'Test #1')
    b = fig_barh(['x','y','z'], [5, 3, 1], 'Test #2')
    write_pdf('test.pdf', [a, b])
if __name__=="__main__":
    main()
If you don't want the picture to be displayed and only get a variable in return, then you can try the following (with some additional stuff to remove axis):
def myplot(t,x):
    fig = Figure(figsize=(2,1), dpi=80)
    canvas = FigureCanvasAgg(fig)
    ax = fig.add_subplot()
    ax.fill_between(t,x)
    ax.autoscale(tight=True)
    ax.axis('off')
    canvas.draw()
    buf = canvas.buffer_rgba()
    X = np.asarray(buf)
    return X
The returned variable X can then be used with OpenCV, for example:
cv2.imshow('',X)
These imports must be included:
import numpy as np
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg
The currently accepted answer didn't work for me as such, as I was using scipy.stats.probplot() to plot. I used matplotlib.pyplot.gca() to access an Axes instance directly instead:
"""
For my plotting ideas, see:
https://pythonfordatascience.org/independent-t-test-python/
For the dataset, see:
https://github.com/Opensourcefordatascience/Data-sets
"""
# Import modules.
from scipy import stats
import matplotlib.pyplot as plt
import pandas as pd
from tempfile import gettempdir
from os import path
from slugify import slugify
# Define plot func.
def get_plots(df):
    # plt.figure(): Create a new P-P plot. If we're inside a loop, and want
    # a new plot for every iteration, this is important!
    plt.figure()
    stats.probplot(df, plot=plt) # use the argument, not the global diff
    plt.title('Sepal Width P-P Plot')
    pp_p = plt.gca() # Assign an Axes instance of the plot.
    # Plot histogram. This uses pandas.DataFrame.plot(), which returns
    # an instance of the Axes directly.
    hist_p = df.plot(kind = 'hist', title = 'Sepal Width Histogram Plot',
                     figure=plt.figure()) # Create a new plot again.
    return pp_p, hist_p
# Import raw data.
df = pd.read_csv('https://raw.githubusercontent.com/'
'Opensourcefordatascience/Data-sets/master//Iris_Data.csv')
# Subset the dataset.
setosa = df[(df['species'] == 'Iris-setosa')]
setosa.reset_index(inplace= True)
versicolor = df[(df['species'] == 'Iris-versicolor')]
versicolor.reset_index(inplace= True)
# Calculate a variable for analysis.
diff = setosa['sepal_width'] - versicolor['sepal_width']
# Create plots, save each of them to a temp file, and show them afterwards.
# As they're just Axes instances, we need to call get_figure() at first.
for plot in get_plots(diff):
    outfn = path.join(gettempdir(), slugify(plot.title.get_text()) + '.png')
    print('Saving a plot to "' + outfn + '".')
    plot.get_figure().savefig(outfn)
    plot.get_figure().show()
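The answers above can be condensed into one pattern: build the plot on an explicit Figure/Axes pair inside the function and return the Figure, so that the calling script decides where to save it. The sketch below assumes nothing about the asker's data. `make_line_figure`, the sample curves and the temporary output folder are hypothetical names chosen for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def make_line_figure(x, y, title=""):
    """Hypothetical plotting function: draw on its own Axes and return the Figure."""
    fig, ax = plt.subplots()
    ax.plot(x, y)
    ax.set_title(title)
    return fig


# The "batch" side: call the plotting functions, then save what they return.
x = np.linspace(0.0, 1.0, 11)
figures = {
    "linear": make_line_figure(x, x, "Linear"),
    "squared": make_line_figure(x, x ** 2, "Squared"),
}

out_dir = tempfile.mkdtemp()  # assumption: replace with the report's figure folder
paths = []
for name, fig in figures.items():
    target = os.path.join(out_dir, name + ".png")
    fig.savefig(target)
    plt.close(fig)  # release the figure once it is on disk
    paths.append(target)
```

Working on `ax` rather than on the implicit "current" pyplot figure means several such functions can be called in any order without drawing into each other's plots.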

Get the list of figures in matplotlib

I would like to:
pylab.figure()
pylab.plot(x)
pylab.figure()
pylab.plot(y)
# ...
for i, figure in enumerate(pylab.MagicFunctionReturnsListOfAllFigures()):
figure.savefig('figure%d.png' % i)
What is the magic function that returns a list of current figures in pylab?
Websearch didn't help...
Pyplot has get_fignums method that returns a list of figure numbers. This should do what you want:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(100)
y = -x
plt.figure()
plt.plot(x)
plt.figure()
plt.plot(y)
for i in plt.get_fignums():
    plt.figure(i)
    plt.savefig('figure%d.png' % i)
The following one-liner retrieves the list of existing figures:
import matplotlib.pyplot as plt
figs = list(map(plt.figure, plt.get_fignums()))
Edit: As Matti Pastell's solution shows, there is a much better way: use plt.get_fignums().
import numpy as np
import pylab
import matplotlib._pylab_helpers
x=np.random.random((10,10))
y=np.random.random((10,10))
pylab.figure()
pylab.plot(x)
pylab.figure()
pylab.plot(y)
figures=[manager.canvas.figure
         for manager in matplotlib._pylab_helpers.Gcf.get_all_fig_managers()]
print(figures)
# [<matplotlib.figure.Figure object at 0xb788ac6c>, <matplotlib.figure.Figure object at 0xa143d0c>]
for i, figure in enumerate(figures):
    figure.savefig('figure%d.png' % i)
This should help you (from the pylab.figure doc):
call signature::
    figure(num=None, figsize=(8, 6), dpi=80, facecolor='w', edgecolor='k')
Create a new figure and return a :class:matplotlib.figure.Figure instance. If num = None, the figure number will be incremented and a new figure will be created. The returned figure objects have a number attribute holding this number.
If you want to recall your figures in a loop, then a good approach is to store your figure instances in a list and iterate over that list.
>> f = pylab.figure()
>> mylist.append(f)
etc...
>> for fig in mylist:
>>     fig.savefig()
Assuming you haven't manually specified num in any of your figure constructors (so all of your figure numbers are consecutive) and all of the figures that you would like to save actually have things plotted on them...
import matplotlib.pyplot as plt
plot_some_stuff()
# find all figures
figures = []
for i in range(maximum_number_of_possible_figures):
    fig = plt.figure(i)
    if fig.axes:
        figures.append(fig)
    else:
        break
This has the side effect of creating a new blank figure, but it is better if you don't want to rely on an unsupported interface.
I tend to name my figures using strings rather than using the default (and non-descriptive) integer. Here is a way to retrieve that name and save your figures with a descriptive filename:
import matplotlib.pyplot as plt
figures = []
figures.append(plt.figure(num='map'))
# Make a bunch of figures ...
assert figures[0].get_label() == 'map'
for figure in figures:
    figure.savefig('{0}.png'.format(figure.get_label()))
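The get_fignums approach and the string-label approach can be combined. The sketch below collects every open figure and derives a file name from its label, falling back to the figure number when no label was set. The fallback pattern `figure<N>` and the temporary output folder are assumptions for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

plt.close("all")  # start from a clean state so only the figures below are open

plt.figure(num="map")  # a figure with a descriptive string label
plt.plot([0, 1], [0, 1])
plt.figure()  # a figure with the default (empty) label
plt.plot([0, 1], [1, 0])

out_dir = tempfile.mkdtemp()  # assumption: replace with the real output folder
names = []
for num in plt.get_fignums():
    fig = plt.figure(num)  # an existing number returns that figure, not a new one
    name = fig.get_label() or "figure{0}".format(num)
    fig.savefig(os.path.join(out_dir, name + ".png"))
    names.append(name)
```

This stays within the public pyplot interface, so it avoids both the private `_pylab_helpers` module and the blank figure created by probing figure numbers that do not exist.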
