I'm trying to make a subplot of histograms for each of the features in the dataset.
The code below is what I have tried so far. The train dataset has 9 columns, which I want plotted in a 3x3 grid of subplots.
import matplotlib.pyplot as plt
fig, ax = plt.subplots(nrows=3, ncols=3)
i = 0
for row in ax:
    for col in row:
        train.iloc[:, i].hist()
        i = i + 1
I'm getting all histograms in the last subplot.
Here is my suggestion:
import matplotlib.pyplot as plt
import random
for i in range(1, 10):
    # Cut the figure into 3 rows and 3 columns
    # and make subplot number i (counted from 1) the current one.
    plt.subplot(3, 3, i)
    # plt.hist needs a sequence of values, so draw 100 random integers
    plt.hist([random.randrange(0, 10) for _ in range(100)])
plt.show()
You can find more ideas at this amazing website: The Python Graph Gallery
pandas.DataFrame.hist can take an ax parameter which is the Matplotlib axes to use.
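A minimal sketch of that idea, applied to the 3x3 case from the question. The `train` DataFrame here is a hypothetical stand-in with random data and made-up column names; only the `ax=` argument is the point.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's `train` DataFrame: 9 numeric columns.
rng = np.random.default_rng(0)
train = pd.DataFrame(rng.normal(size=(200, 9)),
                     columns=[f"feature_{k}" for k in range(9)])

fig, axes = plt.subplots(nrows=3, ncols=3, figsize=(9, 9))

# axes is a 3x3 array; flatten it so each column pairs with one subplot.
for ax, name in zip(axes.flatten(), train.columns):
    train[name].hist(ax=ax)   # draw on this specific subplot
    ax.set_title(name)

fig.tight_layout()
```

Call plt.show() afterwards to display the figure. Without ax=, each hist() call draws on the current axes, which after plt.subplots is the last subplot created; that would explain all histograms landing in the last one.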
How can I fix the x-axis on each of the plots in the following situation? Using xlim only affects the second plot axis, not both.
import pandas as pd
import matplotlib.pyplot as plt
sample = pd.DataFrame({'mean':[1,2,3,4,5], 'median':[10,20,30,40,50]})
sample.hist()
plt.xlim(0, 100)
Bonus, what is the correct pandas terminology for the two plots here? Subplots? Facets?
The correct terminology is subplot or axes, since hist returns the Matplotlib Axes instances:
axes = sample.hist()
for ax in axes.ravel():
    ax.set_xlim(0, 100)
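As a possible alternative to the explicit loop, plt.setp can set a property on several Axes at once. This is only a sketch using the sample data from the question; the loop over axes.ravel() does the same job.

```python
import matplotlib.pyplot as plt
import pandas as pd

sample = pd.DataFrame({'mean': [1, 2, 3, 4, 5],
                       'median': [10, 20, 30, 40, 50]})

axes = sample.hist()                     # NumPy array of Axes, one per column
plt.setp(axes.ravel(), xlim=(0, 100))    # same x limits on every subplot
```

plt.xlim on its own only changes the current axes, which is why only one of the two plots was affected in the question.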
I am trying to create a figure which contains 9 subplots (3 x 3). The x- and y-axis data come from a groupby on the dataframe. Here is my code:
fig, axs = plt.subplots(3,3)
for index, cause in enumerate(cause_list):
    df[df['CAT']==cause].groupby('RYQ')['NO_CONSUMERS'].mean().axs[index].plot()
    axs[index].set_title(cause)
plt.show()
However, it does not produce the desired output; in fact, it raises an error. If I remove axs[index] before plot() and pass it inside the call instead, as plot(ax=axs[index]), it runs and produces nine subplots but does not display any data in them (as shown in the figure).
Could anyone point out where I am making the mistake?
You need to flatten axs, otherwise it is a 2D array. You can also pass the ax to the plot function (see the pandas plot documentation). For example:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
cause_list = np.arange(9)
df = pd.DataFrame({'CAT': np.random.choice(cause_list, 100),
                   'RYQ': np.random.choice(['A', 'B', 'C'], 100),
                   'NO_CONSUMERS': np.random.normal(0, 1, 100)})
fig, axs = plt.subplots(3, 3, figsize=(8, 6))
axs = axs.flatten()
for index, cause in enumerate(cause_list):
    df[df['CAT']==cause].groupby('RYQ')['NO_CONSUMERS'].mean().plot(ax=axs[index])
    axs[index].set_title(cause)
plt.tight_layout()
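Another way to get the same grid, sketched here as an alternative rather than as the approach of the answer above, is to compute all the group means in one groupby and let pandas create the subplots through the subplots and layout arguments of DataFrame.plot. The data is the same kind of synthetic data, seeded so the run is repeatable.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data with the same columns as in the question (seeded).
rng = np.random.default_rng(0)
cause_list = np.arange(9)
df = pd.DataFrame({'CAT': rng.choice(cause_list, 100),
                   'RYQ': rng.choice(['A', 'B', 'C'], 100),
                   'NO_CONSUMERS': rng.normal(0, 1, 100)})

# Rows: RYQ, columns: one per CAT value, cells: mean NO_CONSUMERS.
means = df.groupby(['RYQ', 'CAT'])['NO_CONSUMERS'].mean().unstack('CAT')

# One subplot per column, arranged on a 3x3 grid.
axs = means.plot(subplots=True, layout=(3, 3), figsize=(8, 6), legend=False)
for ax, cause in zip(axs.flatten(), means.columns):
    ax.set_title(str(cause))
plt.tight_layout()
```

This avoids filtering the dataframe once per cause, at the cost of building the intermediate means table.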
I am trying to write some code in order to create an animation of scatter plot data through time. In order to do this I have a dataset with multiple columns where each column represents a numbered timestep.
I would like the code to cycle through each timestep column for the y axis and use a constant x axis, so that a separate scatter plot is generated for each timestep. I tried to do this by coding a for loop that specifies an incrementing column number for the y axis.
My current code generates three out of seven scatter plots in my sample data but returns the following error:
IndexError: index 9 is out of bounds for axis 0 with size 9
I have tried other similar solutions on Stack Overflow, but they didn't correct my problem.
The data file is here if anyone wants to use what I am using: https://www.dropbox.com/s/7vwa0lud44td2ak/test_splot_anim_noTS.csv?dl=0
Any help or advice would be much appreciated.
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
data = pd.read_csv("test_splot_anim_noTS.csv")
for n in range(6, 13):
    data.plot(kind='scatter', x='metres', y=n)
    plt.ylim(-4, 4)
    plt.savefig('n.jpeg')
data = pd.read_csv("test_splot_anim_noTS.csv")
for column in data.columns[1:]:
    data.plot(kind='scatter', x='metres', y=column)
    plt.ylim(-4, 4)
    plt.savefig('{}.jpeg'.format(column))
I may have done it!
With pandas.DataFrame.plot, single line plot:
data=pd.read_csv("test_splot_anim_noTS.csv")
data.set_index('metres', drop=True, inplace=True)
data.plot()
With matplotlib, single plot with all columns:
import matplotlib.pyplot as plt
plt.plot(data)
plt.show()
Separate scatter plots, files saved:
for col in data.columns:
    plt.scatter(data.index, data[col])
    plt.ylim(-4, 4)
    plt.savefig(f'{col}.jpeg')
    plt.show()
With Seaborn:
import seaborn as sns
for col in data.columns:
    # pass x and y by keyword; recent seaborn versions require it
    sns.scatterplot(x=data.index, y=data[col])
    plt.ylim(-4, 4)
    plt.savefig(f'{col}.jpeg')
    plt.show()
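When there are many timestep columns, it may help to create one figure per frame and close it after saving, so figures do not accumulate in memory. The sketch below assumes a made-up dataset (a metres column plus seven timestep columns named t0 to t6) and writes PNG files to a temporary directory; the real CSV, column names, and output location will differ.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: a 'metres' column plus 7 timestep columns.
rng = np.random.default_rng(0)
data = pd.DataFrame({'metres': np.arange(20)})
for step in range(7):
    data[f't{step}'] = rng.normal(0, 1, len(data))

out_dir = Path(tempfile.mkdtemp())   # assumed output location
saved = []
for col in data.columns.drop('metres'):
    fig, ax = plt.subplots()
    ax.scatter(data['metres'], data[col])
    ax.set_ylim(-4, 4)               # same y range in every frame
    ax.set_title(col)
    path = out_dir / f'{col}.png'
    fig.savefig(path)
    plt.close(fig)                   # free the figure before the next frame
    saved.append(path)
```

Looping over column names instead of integer positions also sidesteps the out-of-bounds index from the question.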
I tried to plot the subplots using the code below, but I am getting AttributeError: 'numpy.ndarray' object has no attribute 'boxplot'.
If I change the call to plt.subplots(1,2), it plots the box plots but then raises an IndexError.
import matplotlib.pyplot as plt
import seaborn as sns
fig = plt.Figure(figsize=(10,5))
x = [i for i in range(100)]
fig , axes = plt.subplots(2,2)
for i in range(4):
    sns.boxplot(x, ax=axes[i])
plt.show();
I expect four subplots to be plotted, but an AttributeError is raised instead.
A couple of issues in your plot:
You are defining the figure twice, which is not needed. I merged the two calls into one.
You were looping 4 times using range(4) and using axes[i] to access the subplots. This is wrong for the following reason: your axes is 2-dimensional, so you need 2 indices to access it. Each dimension has length 2 because you have 2 rows and 2 columns, so the only indices you can use are 0 and 1 along each axis. For example, axes[0,1], axes[1,0], etc.
As @DavidG pointed out, you don't need the list comprehension. You can use range(100) directly.
The solution is to flatten your 2D axes object and then iterate over it directly, which gives you the individual subplots one at a time. The order of the subplots will be row-wise.
Complete working code:
import matplotlib.pyplot as plt
import seaborn as sns
x = range(100)
fig, axes = plt.subplots(2, 2, figsize=(10, 5))
for ax_ in axes.flatten():
    # pass the data by keyword; recent seaborn versions require it
    sns.boxplot(x=list(x), ax=ax_)
plt.show()
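If you would rather keep the range(4) loop from the question, the counter can be converted into a (row, column) pair instead of flattening. This is a sketch only; it uses Matplotlib's own boxplot in place of seaborn to keep the example short.

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(10, 5))
print(axes.shape)             # (2, 2)

data = list(range(100))
for i in range(4):
    row, col = divmod(i, 2)   # 0 -> (0, 0), 1 -> (0, 1), 2 -> (1, 0), 3 -> (1, 1)
    axes[row, col].boxplot(data)
    axes[row, col].set_title(f"axes[{row}, {col}]")
fig.tight_layout()
```

With plt.subplots(1, 2) the returned array is 1-dimensional with only two entries, so axes[i] works for i = 0 and 1 and then fails with an IndexError, which matches what the question describes.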
I am plotting several data types which share the x axis, so I am using the matplotlib.pyplot subplots command.
The shared x axis is time (in years AD). The last subplot I have is the number of independent observations as a function of time. I have the following code:
import numpy as np
import matplotlib.pyplot as plt
#
# There's a bunch of data analysis here
#
f, ax = plt.subplots(4, sharex=True)
# Here I plot the first 3 subplots with no issue
x = np.arange(900, 2000, 1)  # make x array in steps of 1
ax[3].plot(x[0:28], np.ones(len(x[0:28])), 'k')  # one observation from 900-927 AD
ax[3].plot(x[28:62], 2*np.ones(len(x[28:62])), 'k')  # two observations from 928-961 AD
Now when I run this code, the subplot I get only shows the second ax[3] plot and not the first. How can I fix this?? Thanks
OK, I think I found an answer. The first line was being plotted, but I couldn't see it against the axes, so I changed the y limits:
ax[3].set_ylim([0, 7])
That seemed to work. Is there a way to connect these horizontal lines, perhaps with dashed lines?
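One way to draw such a connector, sketched here on a single standalone axes rather than the ax[3] of the original figure, is to plot an extra dashed segment from the end of one horizontal line to the start of the next. The year ranges are taken from the comments in the question.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(900, 2000, 1)
fig, ax = plt.subplots()

# Solid horizontal segments, as in the question.
first = x[0:28]      # 900-927 AD, one observation
second = x[28:62]    # 928-961 AD, two observations
ax.plot(first, np.ones(len(first)), 'k')
ax.plot(second, 2 * np.ones(len(second)), 'k')

# Dashed connector from the end of the first segment to the start of the second.
ax.plot([first[-1], second[0]], [1, 2], 'k--')

ax.set_ylim(0, 7)
```

The same pattern can be repeated for each pair of neighbouring segments.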