pandas multiple plots not working as hists - python

With a DataFrame df in pandas, I am trying to plot histograms on the same page, filtering by 3 different types; the intended outcome is one histogram of the values for each of the three types. The following code works for me with plot, but when I change it to hist, the histograms all stack on top of each other. Any suggestions?
plt.figure(1)
plt.subplot(311)
df[df.Type == 'Type1'].values.plot()
plt.subplot(312)
df[df.Type == 'Type2'].values.plot()
plt.subplot(313)
df[df.Type == 'Type3'].values.plot()
plt.savefig('output.pdf')
Is this a bug in pandas, or are plot and hist not interchangeable?
Thanks in advance

plot and hist aren't interchangeable: hist is its own method on DataFrame and Series. You might be able to force the plots to be independent by passing in an ax keyword. So:
fig = plt.figure()
ax1 = fig.add_subplot(3,1,1)
df[df.Type == 'Type1'].values.hist(ax=ax1)
....
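To make the ax approach above concrete, here is a minimal sketch on synthetic data. The column name Value and the random data are assumptions for illustration only; the original post does not show the real column names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the asker's data; "Value" is a hypothetical column name.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Type": np.repeat(["Type1", "Type2", "Type3"], 100),
    "Value": rng.normal(size=300),
})

types = ["Type1", "Type2", "Type3"]
fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(6, 9), sharex=True)

# Give every histogram its own Axes so they cannot draw on top of each other.
for ax, type_name in zip(axes, types):
    df.loc[df["Type"] == type_name, "Value"].hist(ax=ax, bins=20)
    ax.set_title(type_name)

fig.tight_layout()
```

Calling fig.savefig('output.pdf') afterwards would write all three panels to one page, as in the question.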

Related

Is there something similar to the pandas DataFrame.hist() method that creates connected subplots for bar charts?

I like how the DataFrame method hist selects all numeric columns in a DataFrame and then simply returns a comprehensive plot of histogram subplots. With code as simple and effective as this:
df.hist(bins=50, figsize=(15,10))
plt.show()
But I can't seem to create something similar for all categorical columns that simply returns a plot of barchart subplots.
df.select_dtypes("object").plot(kind="bar", subplots=True) # Error because not numeric values
Is there some variation for the code above so that it works? How is the subplots argument actually used?
Or is there another similarly quick and simple way to get what I'm seeking?
Thanks in advance!
You can use value_counts to get the counts for each categorical variable:
import matplotlib.pyplot as plt
from itertools import zip_longest

cols = df.select_dtypes('object').columns
ncols = 3
# Ceiling division: one extra row if the columns do not fill the last row
nrows = len(cols) // ncols + (len(cols) % ncols != 0)
fig, axs = plt.subplots(nrows, ncols, figsize=(4*ncols, 4*nrows))
for col, ax in zip_longest(cols, axs.flat):
    if col is not None:
        # One bar chart of category counts per categorical column
        df[col].value_counts(sort=False).plot(kind='bar', ax=ax, rot=45, title=col)
    else:
        # Remove the unused axes at the end of the grid
        fig.delaxes(ax)
plt.tight_layout()
plt.show()
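On the question of how the subplots argument is used: plot(subplots=True) draws one subplot per column, but the columns must be numeric, so the categories have to be counted first. The following is a rough sketch on made-up data (the columns color, size and price are hypothetical), selecting non-numeric columns with exclude="number" as one possible choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical example data with two categorical columns and one numeric column.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "red"],
    "size": ["S", "M", "M", "L", "S", "M"],
    "price": [1.0, 2.5, 3.0, 4.2, 0.5, 2.2],
})

cat_cols = df.select_dtypes(exclude="number").columns

# Count the categories in each column; the result is numeric, indexed by the
# union of all categories, with 0 where a category does not occur in a column.
counts = df[cat_cols].apply(pd.Series.value_counts).fillna(0)

# subplots=True now works because every column of `counts` is numeric.
axes = counts.plot(kind="bar", subplots=True, layout=(1, 2),
                   sharex=False, legend=False, figsize=(8, 4))
plt.tight_layout()
```

Because the index is the union of all categories, every panel shows every category, including the zero counts. With many unrelated categorical columns, the per-column loop above probably gives cleaner axes.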

Personalize pandas boxplot with colors

I've been trying to make a boxplot of some gender data that I divided into two separate dataframes, one for male and one for female.
I managed to make the graph basically how I wanted it, but now I would like to make it look better. I'd like to make it look like a seaborn graph, but I wasn't able to find a way to do this using the seaborn library. I tried some ideas I found for coloring the pandas boxplot, but nothing worked.
Is there a way to color these graphs? Or is there a way to make these side-by-side boxplots with seaborn?
dados_generos = dados_sem_zeros[["NU_NOTA_CN","NU_NOTA_CH","NU_NOTA_MT","NU_NOTA_LC","NU_NOTA_REDACAO", "TP_SEXO"]]
sexo_f = dados_generos[dados_generos["TP_SEXO"].str.contains("F")]
sexo_m = dados_generos[dados_generos["TP_SEXO"].str.contains("M")]
# provas was not defined in the original snippet; assumed to be the five score columns
provas = ["NU_NOTA_CN", "NU_NOTA_CH", "NU_NOTA_MT", "NU_NOTA_LC", "NU_NOTA_REDACAO"]
labels = ["CN", "CH", "MT", "LC", "REDAÇÃO"]
fig, (ax, ax2) = plt.subplots(figsize=(10, 7), ncols=2, sharey=True)
# Setting axis titles
ax.set_xlabel('Provas')
ax2.set_xlabel('Provas')
ax.set_ylabel('Notas')
# Making plots
chart1 = sexo_f[provas].boxplot(ax=ax)
chart2 = sexo_m[provas].boxplot(ax=ax2)
# Setting axis labels
chart1.set_xticklabels(labels, rotation=45)
chart2.set_xticklabels(labels, rotation=45)
plt.show()
This is the link to the data I'm using:
https://github.com/KarolDuarte/dados_generos/blob/main/dados_generos.csv
Since seaborn works best with long-form data, let's melt the data and plot it with seaborn.
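import matplotlib.pyplot as plt
import seaborn as sns

# melting the data (df is the dataframe with the score columns and TP_SEXO)
plot_data = df.melt('TP_SEXO')
fig, axes = plt.subplots(figsize=(10, 7), ncols=2, sharey=True)
# one panel per gender
for ax, (gender, data) in zip(axes, plot_data.groupby('TP_SEXO')):
    sns.boxplot(x='variable', y='value', data=data, ax=ax)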
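As a possible variation on the melt approach, the two genders can also share one axes by passing the gender column as hue, which colors the boxes per gender and places them side by side for each exam. This is only a sketch on random data; the numbers are made up, the column names are taken from the question, and the Set2 palette is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Random stand-in for the real scores (assumption: five score columns plus TP_SEXO).
rng = np.random.default_rng(0)
score_cols = ["NU_NOTA_CN", "NU_NOTA_CH", "NU_NOTA_MT", "NU_NOTA_LC", "NU_NOTA_REDACAO"]
dados = pd.DataFrame(rng.normal(500, 100, size=(200, 5)), columns=score_cols)
dados["TP_SEXO"] = rng.choice(["F", "M"], size=200)

# Long form: one row per (student, exam) pair, columns TP_SEXO / variable / value.
plot_data = dados.melt("TP_SEXO")

fig, ax = plt.subplots(figsize=(10, 7))
# hue splits each exam into one colored box per gender.
sns.boxplot(x="variable", y="value", hue="TP_SEXO", data=plot_data,
            palette="Set2", ax=ax)
ax.set_xlabel("Provas")
ax.set_ylabel("Notas")
```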

How to use a 3rd dataframe column as x axis ticks/labels in matplotlib scatter

I'm struggling to wrap my head around matplotlib with dataframes today. I see lots of solutions but I'm struggling to relate them to my needs. I think I may need to start over. Let's see what you think.
I have a dataframe (ephem) with 4 columns - Time, Date, Altitude & Azimuth.
I produce a scatter for alt & az using:
chart = plt.scatter(ephem.Azimuth, ephem.Altitude, marker='x', color='black', s=8)
What's the most efficient way to set the values in the Time column as the labels/ticks on the x axis?
So:
the scale/gridlines etc all remain the same
the chart still plots alt and az
the y axis ticks/labels remain as is
only the x axis ticks/labels are changed to the Time column.
Thanks
This isn't by any means the cleanest piece of code but the following works for me:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.scatter(ephem.Azimuth, ephem.Altitude, marker='x', color='black', s=8)
labels = list(ephem.Time)
ax.set_xticklabels(labels)
plt.show()
Here set_xticklabels explicitly sets the x-axis tick labels to the values of the dataframe's Time column.
In other words, you want to change the x-axis tick labels using a list of values.
labels = ephem.Time.tolist()
# make your plot and before calling plt.show()
# insert the following two lines
ax = plt.gca()
ax.set_xticklabels(labels=labels)
plt.show()
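One caveat with both snippets above: set_xticklabels assigns the labels in order to whatever ticks already exist, so the times are not necessarily aligned with the plotted points. A hedged sketch of one way around this is to pin the tick positions to the Azimuth values first. The data below is invented, and this does move the ticks, which the asker may or may not want.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Invented sample of the ephem dataframe described in the question.
ephem = pd.DataFrame({
    "Time": ["20:00", "21:00", "22:00", "23:00"],
    "Azimuth": [90.0, 120.0, 150.0, 180.0],
    "Altitude": [10.0, 25.0, 35.0, 40.0],
})

fig, ax = plt.subplots()
ax.scatter(ephem.Azimuth, ephem.Altitude, marker="x", color="black", s=8)

# Fix the tick positions at the data's azimuth values, then label each
# position with the matching time so labels and points line up.
ax.set_xticks(ephem.Azimuth.tolist())
ax.set_xticklabels(ephem.Time.tolist(), rotation=45)

tick_text = [t.get_text() for t in ax.get_xticklabels()]
```

With many rows this would crowd the axis, so labeling only every n-th row is probably more practical.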

Creating a matrix of plots with sns distplot

I am plotting 20+ features like so:
for col in dsd_mod["ae_analysis"].columns[:len(dsd_mod["ae_analysis"].columns)]:
    if col != "sae_flag":
        sns.distplot(dsd_mod["ae_analysis"].loc[(dsd_mod["ae_analysis"]['sae_flag'] == 1),col],
                     color='r',
                     kde=True,
                     hist=False,
                     label='sae_ae = 1')
        sns.distplot(dsd_mod["ae_analysis"].loc[(dsd_mod["ae_analysis"]['sae_flag'] == 0),col],
                     color='y',
                     kde=True,
                     hist=False,
                     label='sae_ae = 0')
This creates a separate graph for each feature. How can I put them all in a matrix, like the output of a pair plot?
Right now I get 30 graphs, all in one column.
How can I modify this so that I get 6 rows and 5 columns?
Thanks in advance!
distplot can draw on whatever Axes object you give it. So you just need to create your axes with the desired geometry and pass the relevant Axes to each call.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

fig, axs = plt.subplots(6, 5)
# axs is a 2D array with shape (6,5)
# you can keep track of counters in your for-loop to place the resulting graphs
# using ax=axs[i,j]
# or an alternative is to use a generator that you can use to get the next axes
# instance at every step of the loop
ax_iter = iter(axs.flat)
for _ in range(30):
    ax = next(ax_iter)
    sns.distplot(np.random.normal(loc=0, size=(1000,)), ax=ax)
    sns.distplot(np.random.normal(loc=1, size=(1000,)), ax=ax)

How to plot multiple dataframes to the same plot axes

I have two dataframes, with unique x and y coordinates, and I want to plot them in the same figure.
I am now plotting two dataframes in same figure as such:
plt.plot(df1['x'],df1['y'])
plt.plot(df2['x'],df2['y'])
plt.show()
However, pandas also has plotting functionality.
df.plot()
How could I achieve the same as my first example but use the pandas functionality?
To plot all columns against the index as line plots:
ax = df1.plot()
df2.plot(ax=ax)
A single pandas.DataFrame.plot (not subplots=True) returns a matplotlib.axes.Axes, which you can then pass to the second dataframe.
To plot specific columns as x and y (specifying x and y is required for scatter plots, kind='scatter'):
ax = df1.plot(x='Lat', y='Lon', figsize=(8, 8))
df2.plot(ax=ax, x='Lat', y='Lon')
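Put together with the x and y columns from the question, a small self-contained sketch might look like this. The numbers and the label values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Two made-up dataframes with their own x and y coordinates.
df1 = pd.DataFrame({"x": [0, 1, 2, 3], "y": [0, 1, 4, 9]})
df2 = pd.DataFrame({"x": [0.5, 1.5, 2.5], "y": [3, 2, 1]})

# The first call creates the Axes; the second call draws onto the same one.
ax = df1.plot(x="x", y="y", label="df1")
ax2 = df2.plot(x="x", y="y", label="df2", ax=ax)
```

Both lines end up on one Axes, which is the pandas equivalent of the two plt.plot calls in the question.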
