seaborn annotate overwrites previous - python

I am trying to loop through chunks of a pandas DataFrame and append a chart for each chunk to a PDF. Here is sample code:
import random
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from matplotlib.backends import backend_pdf
df = pd.DataFrame({'a': [a + random.random() for a in range(12)],
                   'b': [b + random.random() for b in range(12, 24)]})
print(df)
chunk_size = 3  # number of rows in heatmap
n_chunks = len(df)//chunk_size  # number of pages in heatmap pdf
with backend_pdf.PdfPages('chart.pdf') as pdf_pages:
    for e,(k,g) in enumerate(df.groupby(np.arange(len(df))//chunk_size)):
        #print(k,g.shape)
        snsplot = sns.heatmap(g, annot=True, cbar=False, linewidths=.5) #fmt="d",cmap="YlGnBu",
        pdf_pages.savefig(snsplot.figure)
This code adds the pages all right, but the annotations from previous pages seem to be overlaid (preserved) on all the pages that follow.

Every time you call sns.heatmap without an ax argument, it draws on plt.gca(), so all of your plotting goes to the same Axes object (each iteration may also get slower, since all of the previous artists are still rendered, just occluded by the latest ones).
I suggest something like
fig, ax = plt.subplots()
with backend_pdf.PdfPages('chart.pdf') as pdf_pages:
    for e,(k,g) in enumerate(df.groupby(np.arange(len(df))//chunk_size)):
        #print(k,g.shape)
        ax.cla()
        snsplot = sns.heatmap(g, annot=True, cbar=False, linewidths=.5, ax=ax)
        pdf_pages.savefig(snsplot.figure)
This passes in an Axes object so seaborn knows where to draw, and explicitly clears it on each iteration.
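An alternative sketch, not taken from the answer above: give every page its own figure and close it after saving, so no artists can carry over between pages. The helper name write_heatmap_pages, the demo data, and the temporary output path are made up for illustration, and the Agg backend is selected only so the snippet runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.backends import backend_pdf


def write_heatmap_pages(df, chunk_size, path):
    """Write one heatmap page per chunk of rows; return the annotation count per page."""
    annotation_counts = []
    with backend_pdf.PdfPages(path) as pdf_pages:
        for _, chunk in df.groupby(np.arange(len(df)) // chunk_size):
            fig, ax = plt.subplots()  # fresh figure and Axes for this page
            sns.heatmap(chunk, annot=True, cbar=False, linewidths=.5, ax=ax)
            annotation_counts.append(len(ax.texts))  # annotations drawn on this page
            pdf_pages.savefig(fig)
            plt.close(fig)  # release the figure so open figures do not pile up
    return annotation_counts


# Hypothetical demo data with the same shape as in the question (12 rows, 2 columns).
demo_df = pd.DataFrame({"a": np.arange(12) + 0.5, "b": np.arange(12, 24) + 0.5})
demo_path = os.path.join(tempfile.mkdtemp(), "chart.pdf")
counts = write_heatmap_pages(demo_df, chunk_size=3, path=demo_path)
print(counts)
```

If the annotations were accumulating, the counts would grow from page to page; with a fresh Axes per page each count should equal rows times columns of one chunk. Reusing a single Axes with ax.cla(), as suggested above, avoids creating many figures and is likely the lighter option for long documents.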

Related

Plotting Errorbars from different DataFrame into SubPlots with matplotlib

I just stumbled upon a problem I simply cannot solve. I have a dataset with raw data, which I have uploaded here: https://file.io/oJqkZjAGyqV1
It's an Excel file with the data inside.
I then wrote some code to open it, read it, and compute the mean and SEM of my data, as below.
# Import required packages
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from pylab import cm
df = pd.read_excel("Chlorophyll_data_mod.xlsx")
#----Calculation of meanvalues and sem from raw_data---------
meandf2 = df.set_index(["Group"])
sets = []
for x in ["A","B","AB","xc"]:
    meandf3 = meandf2.filter(like=f"Chl_{x}_").reset_index()
    sets.append(meandf3)
#---------Grouping DataFrame----------#
means = []
ster = []
for x in range(len(sets)):
    meandf = sets[x].groupby(["Group"]).mean()
    meandf = meandf.reset_index()
    means.append(meandf)
    sems = sets[x].groupby("Group").sem()
    sems = sems.reset_index()
    ster.append(sems)
#----Selecting Dataframe from List-----#
plotdf = means[0]
ploter = ster[0]
plotgroup = plotdf.iloc[:,[0,]]
plotdata = plotdf.iloc[:,[1,]]
grouparray = plotgroup.to_numpy()
dataarray = plotdata.to_numpy()
#-----CreatePlot------#
fig, ax = plt.subplots(nrows=3, ncols=1, sharex="all", figsize=(10,8))
plotdf.plot(ax=ax[0,],x="Group",y="Chl_A_0D", kind="bar", legend=False, color="black")
plt.errorbar(x=plotdf["Group"], y=plotdf["Chl_A_0D"],yerr=ploter["Chl_A_0D"])
plotdf.plot(ax=ax[1,],x="Group",y="Chl_A_10DaT", kind="bar", legend=False, color="blue")
plt.errorbar(x=plotdf["Group"], y=plotdf["Chl_A_10DaT"],yerr=ploter["Chl_A_10DaT"])
plotdf.plot(ax=ax[2,],x="Group",y="Chl_A_7DaR", kind="bar", legend=False, color="magenta")
plt.errorbar(x=plotdf["Group"], y=plotdf["Chl_A_7DaR"],yerr=ploter["Chl_A_7DaR"])
#----Legend of the Plot-----#
fig.legend(loc="lower center", bbox_to_anchor=(0.5,0), fancybox=True, ncol=6)
#----Layout------#
plt.tight_layout(rect=[0, 0.02, 1,1])
plt.show()
With this I manage to create a figure with subplots that shows three of the data columns I am interested in. However, I struggle with the error bars.
My approach was to calculate the SEM and store it in a new DataFrame, and then read it from there for yerr. However, this doesn't work:
plotdf.plot(ax=ax[2,],x="Group",y="Chl_A_7DaR", kind="bar", legend=False, color="magenta", yerr=ploter["Chl_A_7DaR"])
This results in an array error because of the structure.
My current approach, as in the main code above, only draws the error bars in the last subplot, not in each individual plot.
Maybe someone here could help me understand this function?
Best regards
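One possible reading of the problem, offered as a hedged sketch rather than a confirmed fix: plt.errorbar draws on the current Axes, which after plt.subplots is most likely the last one created, so all three calls land in the bottom panel. Calling the plotting method on each Axes keeps bars and error bars together. The data below is synthetic: only the column names Group, Chl_A_0D, Chl_A_10DaT, and Chl_A_7DaR come from the question, while the group labels and values are invented.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the Excel data: 3 groups, 5 replicates each (values invented).
rng = np.random.default_rng(0)
columns = ["Chl_A_0D", "Chl_A_10DaT", "Chl_A_7DaR"]
raw = pd.DataFrame(rng.normal(10, 2, size=(15, 3)), columns=columns)
raw["Group"] = np.repeat(["ctrl", "low", "high"], 5)

means = raw.groupby("Group").mean()
sems = raw.groupby("Group").sem()
groups = list(means.index)

fig, axes = plt.subplots(nrows=3, ncols=1, sharex="all", figsize=(10, 8))
bar_containers = []
for ax, col, color in zip(axes, columns, ["black", "blue", "magenta"]):
    # Axes.bar draws the bars and their error bars on this Axes only.
    bars = ax.bar(groups, means[col].to_numpy(), yerr=sems[col].to_numpy(),
                  capsize=4, color=color, ecolor="gray", label=col)
    bar_containers.append(bars)
    ax.set_ylabel(col)

fig.legend(loc="lower center", bbox_to_anchor=(0.5, 0), ncol=3)
fig.tight_layout(rect=[0, 0.04, 1, 1])
```

Passing plain arrays for the heights and yerr sidesteps any index alignment between the means and SEM frames; the same idea should carry over to ax[i].errorbar(...) if the pandas bar plot is kept.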

Not able to plot box plot separately

I have a lot of features in my data and I want to make a box plot for each feature. So for that
import pandas as pd
import seaborn as sns
plt.figure(figsize=(25,20))
for data in train_df.columns:
    plt.subplot(7,4,i+1)
    plt.subplots_adjust(hspace = 0.5, wspace = 0.5)
    ax =sns.boxplot(train_df[data])
I did this, and in the output all the plots are on one image. I want something like a grid of separate plots (not with skew graphs but with box plots).
What changes do I need to make?
In your code, I cannot see where the i is coming from and also it's not clear how ax was assigned.
Maybe try something like this, first an example data frame:
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
train_df = pd.concat([pd.Series(np.random.normal(i,1,100)) for i in range(12)],axis=1)
Set up fig and a flattened ax for each subplot:
fig,ax = plt.subplots(4,3,figsize=(10,10))
ax = ax.flatten()
The most basic would be to call sns.boxplot assigning ax inside the function:
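for i,data in enumerate(train_df.columns):
    # x= is passed by keyword because newer seaborn versions treat the first positional argument as data
    sns.boxplot(x=train_df[data],ax=ax[i])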
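When the number of features does not fill the grid exactly, the leftover Axes stay empty. A minimal sketch of one way to handle that, assuming a hypothetical helper boxplot_grid and invented column names f0 to f9: size the grid from the column count and hide the unused Axes.

```python
import math

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def boxplot_grid(frame, ncols=4):
    """Draw one box plot per column on a grid that is just large enough."""
    nrows = math.ceil(len(frame.columns) / ncols)
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows), squeeze=False)
    axes = axes.flatten()
    for ax, col in zip(axes, frame.columns):
        sns.boxplot(x=frame[col], ax=ax)
        ax.set_title(str(col))
    for ax in axes[len(frame.columns):]:
        ax.set_visible(False)  # hide grid cells that have no feature
    fig.tight_layout()
    return fig, axes


# Hypothetical data: 10 features, so a 3x4 grid leaves 2 cells unused.
rng = np.random.default_rng(0)
train_df = pd.DataFrame(rng.normal(size=(100, 10)), columns=[f"f{i}" for i in range(10)])
fig, axes = boxplot_grid(train_df, ncols=4)
```

squeeze=False keeps the result a 2D array even for a single row or column, so flatten() behaves the same for any number of features.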

Proper Matplotlib axes construction / reuse

I am currently building a set of scatter plots using pandas plot.scatter. They are all constructed off of two base axes.
My current construction looks akin to
ax1 = pandas.scatter.plot()
ax2 = pandas.scatter.plot(ax=ax1)
for dataframe in list:
    output_ax = pandas.scatter.plot(ax2)
    output_ax.get_figure().save("outputfile.png")
total_output_ax = total_list.scatter.plot(ax2)
total_output_ax.get_figure().save("total_output.png")
This seems inefficient. For 1...N permutations I want to reuse a base axes that has 50% of the data already plotted. What I am trying to do is:
Add base data to scatter plot
For item x in y: (save data to base scatter and save image)
Add all data to scatter plot and save image
Here's one way to do it with plt.scatter.
I plot column 0 on x-axis, and all other columns on y axis, one at a time.
Notice that there is only 1 ax object, and I don't replot all points, I just add points using the same axes with a for loop.
Each time I get a corresponding png image.
import numpy as np
import pandas as pd
np.random.seed(2)
testdf = pd.DataFrame(np.random.rand(20,4))
testdf.head(5) looks like this:
          0         1         2         3
0  0.435995  0.025926  0.549662  0.435322
1  0.420368  0.330335  0.204649  0.619271
2  0.299655  0.266827  0.621134  0.529142
3  0.134580  0.513578  0.184440  0.785335
4  0.853975  0.494237  0.846561  0.079645
# The first scatter is drawn outside the loop, but it could be inside as well
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.scatter(testdf[0], testdf[1], color='red', label='column 1')
ax.legend()
fig.savefig('fig_1.png')
colors = ['pink', 'green', 'black', 'blue']
for i in range(2,4):
    # add the new points to the same Axes; earlier points are not replotted
    ax.scatter(testdf[0], testdf[i], color=colors[i], label='column ' + str(i))
    ax.legend()  # ax.legend() replaces the previous legend instead of stacking a new one
    fig.savefig('fig_' + str(i) + '.png')
Then you get these 3 images (fig_1, fig_2, fig_3)
Axes objects cannot be simply copied or transferred. However, it is possible to set artists to visible/invisible in a plot. Given your ambiguous question, it is not fully clear how your data are stored but it seems to be a list of dataframes. In any case, the concept can easily be adapted to different input data.
import matplotlib.pyplot as plt
#test data generation
import pandas as pd
import numpy as np
rng = np.random.default_rng(123456)
df_list = [pd.DataFrame(rng.integers(0, 100, (7, 2))) for _ in range(3)]
#plot all dataframes into an axis object to ensure
#that all plots have the same scaling
fig, ax = plt.subplots()
patch_collections = []
for i, df in enumerate(df_list):
    pc = ax.scatter(x=df[0], y=df[1], label=str(i))
    pc.set_visible(False)
    patch_collections.append(pc)
#store individual plots
for i, pc in enumerate(patch_collections):
    pc.set_visible(True)
    ax.set_title(f"Dataframe {i}")
    fig.savefig(f"outputfile{i}.png")
    pc.set_visible(False)
#store summary plot
for pc in patch_collections:
    pc.set_visible(True)
ax.set_title("All dataframes")
ax.legend()
fig.savefig(f"outputfile_0_{i}.png")
plt.show()

Plotting 2 table objects as subplots using matplotlib

I have 2 matplotlib table objects in a list, and I'm trying to plot each table as a subplot. So far all the answers on Stack Exchange appear to be concerned with either subplotting figures, or plotting single tables.
The following code produces only the second table I want to plot, but not the first.
import matplotlib.pyplot as plt
import numpy as np
list_of_tables = []
a = np.empty((16,16))
for i in range(0, 2):
    a.fill(i)
    the_table = plt.table(
        cellText=a,
        loc='center',
    )
    list_of_tables.append(the_table)
plt.show()
So I followed advice from various tutorials and came up with the following:
import matplotlib.pyplot as plt
import numpy as np
list_of_tables = []
a = np.empty((16,16))
for i in range(0, 2):
    a.fill(i)
    the_table = plt.table(
        cellText=a,
        loc='center',
    )
    list_of_tables.append(the_table)
fig = plt.figure()
ax1 = fig.add_subplot(list_of_tables[0])
ax2 = fig.add_subplot(list_of_tables[1])
ax1.plot(list_of_tables[0])
ax2.plot(list_of_tables[1])
plt.show()
But when this code calls the add_subplot method, the following error is produced.
TypeError: int() argument must be a string, a bytes-like object or a number, not 'Table'.
How can I plot each table as a subplot?
You are saving the table instances in a list and then trying to pass them to add_subplot and plot, which expect numbers (a subplot position and data values), not Table objects.
One possibility is to create the subplots first, then use the object-oriented API to draw each table on a specific Axes:
import matplotlib.pyplot as plt
import numpy as np
fig, axes = plt.subplots(1, 2)
a = np.empty((16, 16))
for i in range(0, 2):
    a.fill(i)
    the_table = axes[i].table(
        cellText=a,
        loc='center',
    )
    axes[i].axis("off")
plt.show()

Change Error Bar Markers (Caplines) in Pandas Bar Plot

I am plotting error bars from a pandas DataFrame. Right now each error bar ends with a weird arrow at the top, but what I want is a horizontal line.
Here is the code i used to generate it:
plot = meansum.plot(
kind="bar",
yerr=stdsum,
colormap="OrRd_r",
edgecolor="black",
grid=False,
figsize=(8, 2),
ax=ax,
position=0.45,
error_kw=dict(ecolor="black", elinewidth=0.5, lolims=True, marker="o"),
width=0.8,
)
So what should I change to get the error bars I want? Thanks.
Using plt.errorbar from matplotlib makes this easier, as it returns several objects including the caplines, which contain the marker you want to change (the arrow that is used automatically when lolims is set to True, see the docs).
Using pandas, you just need to dig out the correct line among the children of plot and change its marker:
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
df = pd.DataFrame({"val":[1,2,3,4],"error":[.4,.3,.6,.9]})
meansum = df["val"]
stdsum = df["error"]
plot = meansum.plot(kind='bar',yerr=stdsum,colormap='OrRd_r',edgecolor='black',grid=False,figsize=(8,2),ax=ax,position=0.45,error_kw=dict(ecolor='black',elinewidth=0.5, lolims=True),width=0.8)
for ch in plot.get_children():
    if str(ch).startswith('Line2D'): # this is silly, but it appears that the first Line in the children are the caplines...
        ch.set_marker('_')
        ch.set_markersize(10) # to change its size
        break
plt.show()
Just don't set lolims=True and you are good to go. An example with sample data:
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
df = pd.DataFrame({"val":[1,2,3,4],"error":[.4,.3,.6,.9]})
meansum = df["val"]
stdsum = df["error"]
plot = meansum.plot(kind='bar',yerr=stdsum,colormap='OrRd_r',edgecolor='black',grid=False,figsize=(8,2),ax=ax,position=0.45,error_kw=dict(ecolor='black',elinewidth=0.5),width=0.8)
plt.show()
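If visible horizontal caps are wanted, a capsize can be added alongside dropping lolims. This is a minimal sketch under that assumption, reusing the sample data from the answers; the capsize value of 4 points is an arbitrary choice, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.container import BarContainer

df = pd.DataFrame({"val": [1, 2, 3, 4], "error": [.4, .3, .6, .9]})

fig, ax = plt.subplots(figsize=(8, 2))
# No lolims here, so matplotlib draws ordinary symmetric error bars;
# capsize (in points) sets the length of the horizontal cap lines.
df["val"].plot(kind="bar", yerr=df["error"], ax=ax, edgecolor="black",
               error_kw=dict(ecolor="black", elinewidth=0.5, capsize=4))

# Inspect the caps that were drawn for the bars.
bars = next(c for c in ax.containers if isinstance(c, BarContainer))
caplines = bars.errorbar.lines[1]  # lines is (data_line, caplines, barlinecols)
cap_markers = [cap.get_marker() for cap in caplines]
print(cap_markers)
```

The caps here are drawn with the horizontal-line marker; setting lolims=True is what swaps one of them for the caret that shows up as an arrow in the question.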
