I want to create a Python program that plots multiple graphs into one PDF file, where the number of subplots is variable. I already did this with one plot per page. However, since I sometimes have around 100 plots, that means a lot of scrolling and the result is not clearly laid out. Therefore I would like to get something like 5x4 subplots per page.
I already wrote code for that. The whole code is long, and since I am very new to Python it probably looks terrible to someone who knows what they are doing, but the plotting part looks like this:
rows = (len(tags))/5
fig = plt.figure()
count = 0
for keyInTags in tags:
    count = count + 1
    ax = fig.add_subplot(int(rows), 5, count)
    ax.set_title("cell" + keyInTags)
    ax.plot(x, y_green, color='k')
    ax.plot(x, y_red, color='k')
    plt.subplots_adjust(hspace=0.5, wspace=0.3)
pdf.savefig(fig)
The idea is that I get a PDF with all "cells" (it's for biological research) plotted. The code I wrote works fine so far, but if I have more than 4 rows of subplots I would like to do a "page break". In some cases I get over 21 rows on one page, which makes it impossible to see anything.
So, is there a way to, for example, tell Python to do a page break after 4 rows? In the case with 21 rows I'd like to have 6 pages with nicely visible plots. Or is it done by making 5x4 plots and then somehow iterating over the file?
I would be really happy if someone could help a little or give a hint. I have been sitting here for 4 hours without finding a solution.
A. Loop over pages
You could find out how many pages you need (npages) and create a new figure per page.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
tags = ["".join(np.random.choice(list("ABCDEFG123"), size=5)) for _ in range(53)]
N = len(tags) # number of subplots
nrows = 5 # number of rows per page
ncols = 4 # number of columns per page
# calculate number of pages needed
npages = N // (nrows*ncols)
if N % (nrows*ncols) > 0:
    npages += 1
pdf = PdfPages('out2.pdf')
for page in range(npages):
    fig = plt.figure(figsize=(8,11))
    for i in range(min(nrows*ncols, N-page*(nrows*ncols))):
        # Your plot here
        count = page*ncols*nrows+i
        ax = fig.add_subplot(nrows, ncols, i+1)
        ax.set_title(f"{count} - {tags[count]}")
        ax.plot(np.cumsum(np.random.randn(33)))
        # end of plotting
    fig.tight_layout()
    pdf.savefig(fig)
pdf.close()
plt.show()
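The page arithmetic above is a ceiling division followed by slicing. As a minimal sketch of that idea, the hypothetical helpers `page_count` and `pages` below (neither is part of matplotlib) can be checked without drawing anything. The example uses the case from the question: 21 rows of 5 plots, laid out 4 rows per page.

```python
import math


def page_count(n_plots, nrows, ncols):
    """Number of pages needed when each page holds nrows * ncols subplots."""
    return math.ceil(n_plots / (nrows * ncols))


def pages(items, nrows, ncols):
    """Yield consecutive slices of `items`, one slice per page."""
    per_page = nrows * ncols
    for start in range(0, len(items), per_page):
        yield items[start:start + per_page]


# 105 plots (21 rows of 5) on pages of 4 rows x 5 columns
tags = [f"cell{i}" for i in range(105)]
sizes = [len(chunk) for chunk in pages(tags, nrows=4, ncols=5)]
print(page_count(len(tags), 4, 5), sizes)  # 6 [20, 20, 20, 20, 20, 5]
```

Each slice can then be drawn on its own figure, so the plotting loop no longer needs to compute indices by hand.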
B. Loop over data
Or alternatively you could loop over the tags themselves and create a new figure once it's needed:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
tags = ["".join(np.random.choice(list("ABCDEFG123"), size=5)) for _ in range(53)]
nrows = 5 # number of rows per page
ncols = 4 # number of columns per page
pdf = PdfPages('out2.pdf')
for i, tag in enumerate(tags):
    j = i % (nrows*ncols)
    if j == 0:
        fig = plt.figure(figsize=(8,11))
    ax = fig.add_subplot(nrows, ncols,j+1)
    ax.set_title(f"{i} - {tags[i]}")
    ax.plot(np.cumsum(np.random.randn(33)))
    # end of plotting
    if j == (nrows*ncols)-1 or i == len(tags)-1:
        fig.tight_layout()
        pdf.savefig(fig)
pdf.close()
plt.show()
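Both variants can also be written with `plt.subplots`, which creates the whole grid for a page at once, and with `PdfPages` used as a context manager so the file is closed automatically. The following is only a sketch: the function name `save_paged`, the temporary output path, and the random placeholder data are assumptions. The `Agg` backend is selected just so the example runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages


def save_paged(tags, path, nrows=5, ncols=4):
    """Write one subplot per tag, nrows * ncols per page; return the page count."""
    per_page = nrows * ncols
    rng = np.random.default_rng(0)  # placeholder data instead of real measurements
    with PdfPages(path) as pdf:
        for start in range(0, len(tags), per_page):
            fig, axes = plt.subplots(nrows, ncols, figsize=(8, 11), squeeze=False)
            flat = axes.ravel()
            page_tags = tags[start:start + per_page]
            for ax, tag in zip(flat, page_tags):
                ax.set_title(tag)
                ax.plot(np.cumsum(rng.standard_normal(33)), color="k")
            for ax in flat[len(page_tags):]:
                ax.set_axis_off()  # hide the empty cells on the last page
            fig.tight_layout()
            pdf.savefig(fig)
            plt.close(fig)  # release the figure so memory does not grow per page
        n_pages = pdf.get_pagecount()
    return n_pages


out_path = os.path.join(tempfile.mkdtemp(), "paged.pdf")
n_pages = save_paged([f"cell{i}" for i in range(53)], out_path)
print(n_pages, "pages written to", out_path)
```

Closing each figure after saving matters once the number of pages grows, because pyplot otherwise keeps every figure alive until the program ends.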
You can use matplotlib's PdfPages as follows.
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt
import numpy as np
pp = PdfPages('multipage.pdf')
x=np.arange(1,10)
y=np.arange(1,10)
fig=plt.figure()
ax1=fig.add_subplot(211)
# ax1.set_title("cell" + keyInTags)
# ax1.plot(x, y, color='k')
# ax.plot(x, y_red, color='k')
ax2=fig.add_subplot(212)
pp.savefig(fig)
fig2=plt.figure()
ax1=fig2.add_subplot(321)
ax1.plot(x, y, color='k')
ax2=fig2.add_subplot(322)
ax2.plot(x, y, color='k')
ax3=fig2.add_subplot(313)
pp.savefig(fig2)
pp.close()
Experiment with the subplot numbers a little to see how they determine which graph goes where.
Related
I have a dataset containing 10 features and corresponding labels. I am using scatter plots of each distinct pair of features to see which of them describe the labels perfectly (which means that 45 plots will be created in total). To do that, I used nested loops. The code raises no error and I obtained all the plots. However, there is clearly something wrong with the code, because each new scatter plot that gets created and saved also accumulates the points from the previous ones. I am attaching the complete code I used. How can I fix this problem? Below is the link to the raw dataset:
https://github.com/IITGuwahati-AI/Learning-Content/raw/master/Phase%203%20-%202020%20(Summer)/Week%201%20(Mar%2028%20-%20Apr%204)/assignment/data.txt
import pandas as pd
import matplotlib
from matplotlib import pyplot as plt
data_url ='https://raw.githubusercontent.com/diwakar1412/Learning-Content/master/DiwakarDas_184104503/datacsv.csv'
df = pd.read_csv(data_url)
df.head()
def transform_label(value):
    if value >= 2:
        return "BLUE"
    else:
        return "RED"
df["Label"] = df.Label.apply(transform_label)
df.head()
colors = {'RED':'r', 'BLUE':'b'}
fig, ax = plt.subplots()
for i in range(1,len(df.columns)):
    for j in range(i+1,len(df.columns)):
        for k in range(len(df[str(i)])):
            ax.scatter(df[str(i)][k], df[str(j)][k], color=colors[df['Label'][k]])
        ax.set_title('F%svsF%s' %(i,j))
        ax.set_xlabel('%s' %i)
        ax.set_ylabel('%s' %j)
        plt.savefig('F%svsF%s' %(i,j))
You have to create a new figure each time. Try to put
fig, ax = plt.subplots()
inside your loop:
for i in range(1,len(df.columns)):
    for j in range(i+1,len(df.columns)):
        fig, ax = plt.subplots() # <-------------- here
        for k in range(len(df[str(i)])):
            ax.scatter(df[str(i)][k], df[str(j)][k], color=colors[df['Label'][k]])
        ax.set_title('F%svsF%s' %(i,j))
        ax.set_xlabel('%s' %i)
        ax.set_ylabel('%s' %j)
        plt.savefig('/Users/Alessandro/Desktop/tmp/F%svsF%s' %(i,j))
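Two further points may be worth sketching, as suggestions rather than part of the fix above: `ax.scatter` accepts whole columns, so the innermost loop over rows is not needed, and closing each figure after saving avoids keeping 45 figures open. The data frame below is synthetic, and `plot_feature_pairs` and the temporary output directory are hypothetical names chosen for illustration.

```python
import itertools
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_feature_pairs(df, label_col, out_dir, colors):
    """Save one scatter plot per distinct pair of feature columns."""
    features = [c for c in df.columns if c != label_col]
    point_colors = df[label_col].map(colors).tolist()  # one colour per row
    saved = []
    for a, b in itertools.combinations(features, 2):
        fig, ax = plt.subplots()  # fresh figure, so no points carry over
        ax.scatter(df[a], df[b], c=point_colors)
        ax.set_title(f"F{a} vs F{b}")
        ax.set_xlabel(str(a))
        ax.set_ylabel(str(b))
        path = os.path.join(out_dir, f"F{a}vsF{b}.png")
        fig.savefig(path)
        plt.close(fig)  # release the figure once it is on disk
        saved.append(path)
    return saved


# Synthetic stand-in for the real dataset: 4 features and a two-class label
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((30, 4)), columns=["1", "2", "3", "4"])
demo["Label"] = np.where(rng.random(30) > 0.5, "BLUE", "RED")
files = plot_feature_pairs(demo, "Label", tempfile.mkdtemp(),
                           {"RED": "r", "BLUE": "b"})
print(len(files), "plots saved")
```

With 10 feature columns, `itertools.combinations` yields the same 45 pairs as the nested `i`, `j` loops.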
I am currently building a set of scatter plot charts using pandas plot.scatter. The charts are all built on top of two base axes.
My current construction looks akin to
ax1 = pandas.scatter.plot()
ax2 = pandas.scatter.plot(ax=ax1)
for dataframe in list:
    output_ax = pandas.scatter.plot(ax2)
    output_ax.get_figure().save("outputfile.png")
total_output_ax = total_list.scatter.plot(ax2)
total_output_ax.get_figure().save("total_output.png")
This seems inefficient. For 1...N permutations I want to reuse a base axes that has 50% of the data already plotted. What I am trying to do is:
Add base data to scatter plot
For item x in y: (save data to base scatter and save image)
Add all data to scatter plot and save image
Here's one way to do it with plt.scatter.
I plot column 0 on the x-axis and all other columns on the y-axis, one at a time.
Notice that there is only one ax object, and I don't replot all points; I just add points to the same axes in a for loop.
Each time I save, I get a corresponding PNG image.
import numpy as np
import pandas as pd
np.random.seed(2)
testdf = pd.DataFrame(np.random.rand(20,4))
testdf.head(5) looks like this:
          0         1         2         3
0  0.435995  0.025926  0.549662  0.435322
1  0.420368  0.330335  0.204649  0.619271
2  0.299655  0.266827  0.621134  0.529142
3  0.134580  0.513578  0.184440  0.785335
4  0.853975  0.494237  0.846561  0.079645
#I put the first axis out of a loop, that can be in the loop as well
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.scatter(testdf[0],testdf[1], color='red')
fig.legend()
fig.savefig('fig_1.png')
colors = ['pink', 'green', 'black', 'blue']
for i in range(2,4):
    ax.scatter(testdf[0], testdf[i], color=colors[i])
    fig.legend()
    fig.savefig('full_' + str(i) + '.png')
Then you get these 3 images (fig_1.png, full_2.png, full_3.png).
Axes objects cannot be simply copied or transferred. However, it is possible to set artists to visible/invisible in a plot. Given your ambiguous question, it is not fully clear how your data are stored but it seems to be a list of dataframes. In any case, the concept can easily be adapted to different input data.
import matplotlib.pyplot as plt
#test data generation
import pandas as pd
import numpy as np
rng = np.random.default_rng(123456)
df_list = [pd.DataFrame(rng.integers(0, 100, (7, 2))) for _ in range(3)]
#plot all dataframes into an axis object to ensure
#that all plots have the same scaling
fig, ax = plt.subplots()
patch_collections = []
for i, df in enumerate(df_list):
    pc = ax.scatter(x=df[0], y=df[1], label=str(i))
    pc.set_visible(False)
    patch_collections.append(pc)
#store individual plots
for i, pc in enumerate(patch_collections):
    pc.set_visible(True)
    ax.set_title(f"Dataframe {i}")
    fig.savefig(f"outputfile{i}.png")
    pc.set_visible(False)
#store summary plot
[pc.set_visible(True) for pc in patch_collections]
ax.set_title("All dataframes")
ax.legend()
fig.savefig(f"outputfile_0_{i}.png")
plt.show()
I want to plot data in matplotlib in real time. I want to open a figure once at the start of the programme, then update the figure when new data is acquired. Despite there being a few similar questions out there, none quite answer my specific question.
I want each set of data points new_data1 and new_data2 to be plotted on the same figure at the end of each while loop i.e. one line after the first while loop, two lines on the same figure after the second while loop etc. Currently they are all plotted together, but only right at the end of the programme, which is no use for real time data acquisition.
import matplotlib.pyplot as plt
import numpy
hl, = plt.plot([], [])
def update_line(hl, new_datax, new_datay):
    hl.set_xdata(numpy.append(hl.get_xdata(), new_datax))
    hl.set_ydata(numpy.append(hl.get_ydata(), new_datay))
    plt.xlim(0, 50)
    plt.ylim(0,200)
    plt.draw()
x = 1
while x < 5:
    new_data1 = []
    new_data2 = []
    for i in range(500):
        new_data1.append(i * x)
        new_data2.append(i ** 2 * x)
    update_line(hl, new_data1, new_data2)
    x += 1
else:
    print("DONE")
This programme plots all 5 lines, but at the end of the programme. I want each line to be plotted after one another, after the while loop is completed. I have tried putting in plt.pause(0.001) in the function, but it has not worked.
This programme is different from the one that has been put forward - that programme only plots one graph and does not update with time.
If I understood your specifications correctly, you can modify your MWE slightly as follows:
import matplotlib.pyplot as plt
import numpy
fig = plt.figure(figsize=(11.69,8.27))
ax = fig.gca()
ax.set_xlim(0, 50)
ax.set_ylim(0,200)
hl, = plt.plot([], [])
def update_line(hl, new_datax, new_datay):
    # re initialize line object each time if your real xdata is not contiguous else comment next line
    hl, = plt.plot([], [])
    hl.set_xdata(numpy.append(hl.get_xdata(), new_datax))
    hl.set_ydata(numpy.append(hl.get_ydata(), new_datay))
    fig.canvas.draw_idle()
    fig.canvas.flush_events()
x = 1
while x < 10:
    new_data1 = []
    new_data2 = []
    for i in range(500):
        new_data1.append(i * x)
        new_data2.append(i ** 2 * x)
    update_line(hl, new_data1, new_data2)
    # adjust pause duration here
    plt.pause(0.5)
    x += 1
else:
    print("DONE")
I am not sure I am reading the requirements right, but below is a blueprint. Please change it to suit your needs. You may want to change the function Redraw_Function and edit the frames keyword argument (np.arange(1,5,1)) in the FuncAnimation call. Also, interval=1000 means a delay of 1000 milliseconds between frames.
If you are using Jupyter, comment out the second-to-last line (plt.show()) and uncomment the last line. This defeats the purpose of a real-time update, but I had trouble making it work in real time in Jupyter. If you are using the Python console or the official IDLE, run the code as it is; it should work nicely.
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.animation import FuncAnimation
fig, ax = plt.subplots()
plot, = plt.plot([],[])
def init_function():
    ax.set_xlim(0,50)
    ax.set_ylim(0,250)
    return plot,
def Redraw_Function(UpdatedVal):
    new_x = np.arange(500)*UpdatedVal
    new_y = np.arange(500)**2*UpdatedVal
    plot.set_data(new_x,new_y)
    return plot,
Animated_Figure = FuncAnimation(fig,Redraw_Function,init_func=init_function,frames=np.arange(1,5,1),interval=1000)
plt.show()
# Animated_Figure.save('MyAnimated.gif',writer='imagemagick')
When you run the code, you obtain the below result. I tried to keep very little code but I am sorry, if your requirement was totally different.
Best Wishes,
I am trying to loop through chunks of a pandas DataFrame and append a chart for each chunk to a PDF. Here is sample code:
import random
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from matplotlib.backends import backend_pdf
df = pd.DataFrame({'a':[a + random.random() for a in range(12)] ,
                   'b':[ b + random.random() for b in range(12,24)]})
print(df)
chunk_size = 3 # number of rows in heatmap
n_chunks = len(df)//chunk_size # number of pages in heatmap pdf
with backend_pdf.PdfPages('chart.pdf') as pdf_pages:
    for e,(k,g) in enumerate(df.groupby(np.arange(len(df))//chunk_size)):
        #print(k,g.shape)
        snsplot = sns.heatmap(g, annot=True, cbar=False, linewidths=.5) #fmt="d",cmap="YlGnBu",
        pdf_pages.savefig(snsplot.figure)
This code adds pages all right, but the annotations from previous pages seem to be overlaid (preserved) on all the pages that follow.
Every time you call sns.heatmap it is using plt.gca() so all of your plotting is going to the same Axes object (each loop might be getting slower too as all of the previous artists are rendered, but just occluded by the latest one).
I suggest something like
fig, ax = plt.subplots()
with backend_pdf.PdfPages('chart.pdf') as pdf_pages:
    for e,(k,g) in enumerate(df.groupby(np.arange(len(df))//chunk_size)):
        #print(k,g.shape)
        ax.cla()
        snsplot = sns.heatmap(g, annot=True, cbar=False, linewidths=.5, ax=ax)
        pdf_pages.savefig(snsplot.figure)
This passes in an Axes object so seaborn knows where to draw, and explicitly clears it on each iteration.
I am plotting a confusion matrix with matplotlib with the following code:
from numpy import *
import matplotlib.pyplot as plt
from pylab import *
conf_arr = [[33,2,0,0,0,0,0,0,0,1,3], [3,31,0,0,0,0,0,0,0,0,0], [0,4,41,0,0,0,0,0,0,0,1], [0,1,0,30,0,6,0,0,0,0,1], [0,0,0,0,38,10,0,0,0,0,0], [0,0,0,3,1,39,0,0,0,0,4], [0,2,2,0,4,1,31,0,0,0,2], [0,1,0,0,0,0,0,36,0,2,0], [0,0,0,0,0,0,1,5,37,5,1], [3,0,0,0,0,0,0,0,0,39,0], [0,0,0,0,0,0,0,0,0,0,38] ]
norm_conf = []
for i in conf_arr:
    a = 0
    tmp_arr = []
    a = sum(i,0)
    for j in i:
        tmp_arr.append(float(j)/float(a))
    norm_conf.append(tmp_arr)
plt.clf()
fig = plt.figure()
ax = fig.add_subplot(111)
res = ax.imshow(array(norm_conf), cmap=cm.jet, interpolation='nearest')
cb = fig.colorbar(res)
savefig("confmat.png", format="png")
But I want the confusion matrix to show the numbers on it, like this graphic (the right one). How can I plot the values of conf_arr on the graphic?
You can use text to put arbitrary text in your plot. For example, inserting the following lines into your code will write the numbers (note the first and last lines are from your code to show you where to insert my lines):
res = ax.imshow(array(norm_conf), cmap=cm.jet, interpolation='nearest')
for i, cas in enumerate(conf_arr):
    for j, c in enumerate(cas):
        if c>0:
            plt.text(j-.2, i+.2, c, fontsize=14)
cb = fig.colorbar(res)
The only way I could really see of doing it was to use annotations. Try these lines:
for i,j in ((x,y) for x in range(len(conf_arr))
            for y in range(len(conf_arr[0]))):
    # imshow draws conf_arr[i][j] at x=j (column), y=i (row)
    ax.annotate(str(conf_arr[i][j]),xy=(j,i))
before saving the figure. It adds the numbers, but I'll let you figure out how to get the sizes of the numbers how you want them.
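For reference, here is a sketch of the text approach with centred labels, so that no manual offsets are needed. The small 3x3 matrix, the function name `annotate_matrix`, the white text colour, and the temporary output path are assumptions made for illustration; the `Agg` backend is selected only so the example runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np


def annotate_matrix(ax, counts):
    """Draw a row-normalised confusion matrix and write the raw counts on it."""
    counts = np.asarray(counts)
    normed = counts / counts.sum(axis=1, keepdims=True)
    image = ax.imshow(normed, cmap="jet", interpolation="nearest")
    for (row, col), value in np.ndenumerate(counts):
        # imshow puts element [row, col] at x=col, y=row
        ax.text(col, row, str(value), ha="center", va="center",
                color="white", fontsize=14)
    return image


conf_arr = [[33, 2, 0], [3, 31, 0], [0, 4, 41]]
fig, ax = plt.subplots()
res = annotate_matrix(ax, conf_arr)
fig.colorbar(res)
out_file = os.path.join(tempfile.mkdtemp(), "confmat_annotated.png")
fig.savefig(out_file, format="png")
```

The font size is set through the `fontsize` argument of `ax.text`. White text reads well on the dark ends of the jet colormap, where most cells of a confusion matrix fall, but a different colour may suit other colormaps.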