I am plotting one column of a pandas DataFrame as a line plot, using plot():
df.iloc[:,1].plot()
and get the desired result:
Now I want to plot another column of the same DataFrame as a bar chart using
ax=df.iloc[:,3].plot(kind='bar',width=1)
with the result:
And finally I want to combine both by
df.iloc[:,1].plot(ax=ax)
which doesn't produce any plot.
Why are the x-ticks of the bar plot so different from the x-ticks of the line plot? How can I combine both plots in one plot?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# some data
df = pd.DataFrame(np.random.randn(5,2))
print (df)
0 1
0 0.008177 -0.121644
1 0.643535 -0.070786
2 -0.104024 0.872997
3 -0.033835 0.067264
4 -0.576762 0.571293
Then we create an axes object (ax). Notice that we pass ax to both plots:
_, ax = plt.subplots()
df[0].plot(ax=ax)
df[1].plot(kind='bar', ax=ax)
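On the question of why the x-ticks differ: pandas draws bars at the integer positions 0, 1, 2, ... and uses the index values only as tick labels, whereas a line plot is drawn against the index values themselves. With a date index the two plots therefore sit on different x scales. A minimal sketch of one possible workaround, assuming a small made-up frame with a date index (the column names price and volume are illustrative and not from the question): pass use_index=False to the line plot so that it is also drawn at integer positions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: a date index with one column per plot type.
idx = pd.date_range("2023-01-01", periods=5, freq="D")
df = pd.DataFrame(
    {"price": [10.0, 11.0, 10.5, 12.0, 11.5], "volume": [3, 5, 2, 6, 4]},
    index=idx,
)

fig, ax = plt.subplots()

# Bars are placed at integer positions 0..4; the dates are only tick labels.
df["volume"].plot(kind="bar", ax=ax, width=1, color="lightgray")

# use_index=False makes the line use the same integer positions as the bars.
df["price"].plot(ax=ax, use_index=False, color="tab:blue")

# Inspect where both artists were actually drawn.
bar_centers = [patch.get_x() + patch.get_width() / 2 for patch in ax.patches]
line_x = list(ax.lines[0].get_xdata())
```

Comparing bar_centers with line_x shows that both series now share the same x positions.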
I have a CSV with 3 data sets, each corresponding to a line to plot. I use the by argument of pandas plot() to group the entries for the 3 lines. This generates 3 separate diagrams, but I would like to plot all 3 lines on the same diagram.
The CSV:
shop,timestamp,sales
north,2023-01-01,235
north,2023-01-02,147
north,2023-01-03,387
north,2023-01-04,367
north,2023-01-05,197
south,2023-01-01,235
south,2023-01-02,98
south,2023-01-03,435
south,2023-01-04,246
south,2023-01-05,273
east,2023-01-01,197
east,2023-01-02,389
east,2023-01-03,87
east,2023-01-04,179
east,2023-01-05,298
The code (tested in Jupyter Lab):
import pandas as pd
csv = pd.read_csv('./tmp/sample.csv')
csv.timestamp = pd.to_datetime(csv.timestamp)
csv.plot(x='timestamp', by='shop')
This gives the following:
Any idea how to render all 3 of them on one single diagram?
You can create the subplot manually and draw each group on it:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
for name, df in csv.groupby('shop'):
    df.plot(x='timestamp', y='sales', label=name, ax=ax)
ax.set_title('Sales')
plt.show()
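A loop-free variant is to reshape the data so that each shop becomes its own column and then call plot() once. This is a minimal sketch, assuming the same column names as the CSV above and using only the first three days of each shop to keep it short; the variable name wide is illustrative.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; not needed in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# A subset of the CSV from the question, inlined so the sketch is self-contained.
raw = """shop,timestamp,sales
north,2023-01-01,235
north,2023-01-02,147
north,2023-01-03,387
south,2023-01-01,235
south,2023-01-02,98
south,2023-01-03,435
east,2023-01-01,197
east,2023-01-02,389
east,2023-01-03,87
"""
csv = pd.read_csv(io.StringIO(raw), parse_dates=["timestamp"])

# One row per timestamp, one column per shop.
wide = csv.pivot(index="timestamp", columns="shop", values="sales")

# A single plot() call draws one line per column on the same axes.
ax = wide.plot(title="Sales")
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```

Because every column of the wide frame is drawn on the same axes, the legend entries come from the shop names directly.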
Seaborn alternative (to the native pandas DataFrame.plot answer)
This is posted as a separate answer for clarity, rather than lumping the two approaches together.
Seaborn plots the sales per shop (designated by the hue) against the timestamp (formatted as days).
## import seaborn
import seaborn as sns
## date formatter
import matplotlib.dates as mdates
## plot timestamp on horizontal (formatted to days), sales on vertical
## with hue set to shop, seaborn plots sales per shop
ax = sns.lineplot(data=df_csv, x='timestamp', y='sales', hue='shop')
## set datetime to days. Ensure this is set AFTER setting ax
ax.xaxis.set_major_locator(locator=mdates.DayLocator())
Plot using the ax keyword.
df_csv.groupby('shop').plot(x='timestamp', ax=plt.gca())
Working code below.
## load libraries
import pandas as pd
import matplotlib.pyplot as plt
## load dataset
df_csv = pd.read_csv('datasets/SO_shop_timestamp_sale.csv')
## check dataset
df_csv.head(3)
df_csv.describe()
df_csv.shape
## ensure data type
df_csv.timestamp = pd.to_datetime(df_csv.timestamp)
df_csv.sales = pd.to_numeric(df_csv.sales)
## Pandas plot of sales against timestamp grouped by shop, using `ax` keyword to subplot.
df_csv.groupby('shop').plot(x='timestamp', ax=plt.gca())
## Pandas plot of timestamp and sales grouped by shop, use `ax` keyword to plot on combined axes.
df_csv.groupby('shop').plot(x='timestamp', kind='kde', ax=plt.gca())
I have dataframe which looks like
df = pd.DataFrame(data={'ID':[1,1,1,2,2,2], 'Value':[13, 12, 15, 4, 2, 3]})
Index ID Value
0 1 13
1 1 12
2 1 15
3 2 4
4 2 2
5 2 3
and I want to plot it by the IDs (categories) so that each category would have different bar plot,
so in this case I would have two figures,
one figure with bar plot of ID=1,
and second separate figure bar plot of ID=2.
Can I do it (preferably without loops) with something like df.plot(y='Value', kind='bar')?
Two options are possible: one using matplotlib and the other using seaborn, which you should absolutely know, as it works well with pandas.
Pandas with matplotlib
You have to create a figure with subplots, with the number of rows and columns you set. plt.subplots returns a 1-D array of axes if either nrows or ncols is set to 1, or a 2-D array otherwise. Then, you pass one of these axes to the pandas plot method.
If the number of categories is not known or high, you need to use a loop.
import pandas as pd
import matplotlib.pyplot as plt
fig, axes = plt.subplots( nrows=1, ncols=2, sharey=True )
df.loc[ df["ID"] == 1, 'Value' ].plot.bar( ax=axes[0] )
df.loc[ df["ID"] == 2, 'Value' ].plot.bar( ax=axes[1] )
plt.show()
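For an unknown number of categories, the loop mentioned above could look like the following. This is a minimal sketch, assuming the same df as in the question; it creates one column of subplots per ID rather than one figure per ID, and passes squeeze=False so that axes is always a 2-D array, even with a single category.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(data={"ID": [1, 1, 1, 2, 2, 2], "Value": [13, 12, 15, 4, 2, 3]})

groups = df.groupby("ID")

# One subplot per category; squeeze=False keeps axes 2-D for any group count.
fig, axes = plt.subplots(nrows=1, ncols=groups.ngroups, sharey=True, squeeze=False)

# Pair each axes object with one (ID, sub-frame) group.
for ax, (group_id, group) in zip(axes[0], groups):
    group["Value"].plot.bar(ax=ax, title=f"ID={group_id}")

bars_per_axis = [len(ax.patches) for ax in axes[0]]
```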
Pandas with seaborn
Seaborn is the most amazing graphical tool that I know. The function catplot lets you plot a series of graphs according to the values of a column when you set the argument col. You can select the type of plot with kind.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')
df['index'] = [1,2,3] * 2
sns.catplot(kind='bar', data=df, x='index', y='Value', col='ID')
plt.show()
I added a column named index in order to compare with df.plot.bar. If you don't want it, remove x='index' and it will display a single bar with error bars.
I have sample data in dataframe as below
Header=['Date','EmpCount','DeptCount']
2009-01-01,100,200
print(df)
Date EmpCount DeptCount
0 2009-01-01 100 200
Can we generate a scatter plot (or a line chart, etc.) with only this one record?
I tried multiple approaches, but I am getting
TypeError: no numeric data to plot
In X Axis: Dates
In Y Axis: Two dots one for Emp Count , and other one is for dept count
Starting from @the-cauchy-criterion's answer, try this:
import pandas as pd
import matplotlib.pyplot as plt
header=['Date','EmpCount','DeptCount']
df = pd.DataFrame([['2009-01-01',100,200]],columns=header)
b=df.set_index('Date')
ax = plt.plot(b, linewidth=3, markersize=10, marker='.')
What are you using to plot the scatter plot?
Here's how to do it with pyplot.
import pandas as pd
import matplotlib.pyplot as plt
header=['Date','EmpCount','DeptCount']
df = pd.DataFrame([['2009-01-01',100,200]],columns=header)
plt.scatter(*df.iloc[0][1:])
plt.show()
iloc[0] gets the first entry, [1:] takes all the columns except the first and the * operator unpacks the arguments.
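The call above draws a single point at (100, 200). To get what the question describes (the date on the x axis and one dot per count column), each column can be scattered against the Date column. This is a minimal sketch under that reading of the question. The "no numeric data to plot" error is raised by pandas when none of the columns to plot is numeric, which may happen if the counts were read as strings; that is an assumption, since the loading code is not shown.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

header = ["Date", "EmpCount", "DeptCount"]
df = pd.DataFrame([["2009-01-01", 100, 200]], columns=header)

# Make the dtypes explicit: a real datetime for x, numbers for y.
df["Date"] = pd.to_datetime(df["Date"])
df[["EmpCount", "DeptCount"]] = df[["EmpCount", "DeptCount"]].apply(pd.to_numeric)

fig, ax = plt.subplots()

# One scatter call per count column, all sharing the same date on the x axis.
for column in ["EmpCount", "DeptCount"]:
    ax.scatter(df["Date"], df[column], label=column)

ax.legend()
y_values = [float(coll.get_offsets()[0][1]) for coll in ax.collections]
```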
I have two DataFrames like the one below. They are exactly the same in terms of column names and structure. The difference is that one DataFrame holds predictions and the other holds observed data. I need to plot a figure with subplots where each subplot title is the column name, the X-axis is the index values and the Y-axis is the values in the table. I need two lines drawn in each graph, one from the DataFrame with predictions and one from the DataFrame with observations. Below is what I have been trying to do, but it's not working.
08FB006 08FC001 08FC005 08GD004 08GD005
----------------------------------------------------------------
0 1.005910075 0.988765247 0.00500000 0.984376392 5.099999889
1 1.052696367 1.075232414 0.00535313 1.076066586 5.292135227
2 1.101749034 1.169026145 0.005731682 1.176162168 5.491832766
3 1.153183046 1.270744221 0.006137526 1.285419625 5.699405829
4 1.207119522 1.381030962 0.006572672 1.404662066 5.915181534
5 1.263686077 1.500580632 0.007039282 1.534784937 6.139501445
6 1.323017192 1.630141078 0.007539681 1.676762214 6.372722261
7 1.38525461 1.770517606 0.008076372 1.831653101 6.615216537
8 1.450547748 1.922577115 0.008652045 2.000609283 6.867373442
9 1.519054146 2.087252499 0.009269598 2.184882781 7.129599561
10 1.590939931 2.265547339 0.009932148 2.385834436 7.402319731
The sample code (I have 81 stations/columns):
import matplotlib.pyplot as plt
%matplotlib inline
fig, axes = plt.subplots(nrows=81, ncols=1)
newdf1.plot(ax=axes[0,0])
newdf1_pred.plot(ax=axes[0,1])
I set up a dummy dataframe to help visualize what I did.
import pandas as pd
import numpy as np
obs = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
pred = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
Plotting two dataframes on the same plot is no different than plotting two columns. You just have to include them on the same axis. Here is how I did it:
import matplotlib.pyplot as plt
%matplotlib inline
fig, axes = plt.subplots(figsize=(5,7), nrows=4, ncols=1)
for col in range(len(obs.columns)): # loop over the number of columns to plot
    obs.iloc[:, col].plot(ax=axes[col], label='observed') # plot observed data (column col, not row col)
    pred.iloc[:, col].plot(ax=axes[col], label='predicted') # plot predicted data
    axes[col].legend(loc='upper right') # legends to know which is which
    axes[col].set_title(obs.columns[col]) # titles to know which column/variable
plt.tight_layout() # just to make it easier to read the plots
Here was my output:
I hope this helps.
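With 81 stations a single column of subplots becomes very tall, and plt.subplots(nrows=81, ncols=1) returns a 1-D array, so indexing it as axes[0,0] fails. A minimal sketch of a grid layout follows, assuming random placeholder data under the five station names shown in the question; the choice of two columns and the names flat_axes, nrows and ncols are illustrative.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data: random values under the station names from the question.
rng = np.random.default_rng(0)
columns = ["08FB006", "08FC001", "08FC005", "08GD004", "08GD005"]
obs = pd.DataFrame(rng.random((11, 5)), columns=columns)
pred = pd.DataFrame(rng.random((11, 5)), columns=columns)

# Enough rows to hold every station in a grid with a fixed number of columns.
ncols = 2
nrows = math.ceil(len(columns) / ncols)
fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(8, 3 * nrows), squeeze=False)
flat_axes = axes.ravel()

# One subplot per station, with the observed and predicted lines on the same axes.
for ax, name in zip(flat_axes, columns):
    obs[name].plot(ax=ax, label="observed")
    pred[name].plot(ax=ax, label="predicted")
    ax.set_title(name)
    ax.legend(loc="upper right")

# Hide any unused cells at the end of the grid.
for ax in flat_axes[len(columns):]:
    ax.set_visible(False)

fig.tight_layout()
```

Iterating over column names instead of integer positions avoids mixing up rows and columns when selecting the data.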
In Pandas, I am doing:
bp = p_df.groupby('class').plot(kind='kde')
p_df is a dataframe object.
However, this is producing two plots, one for each class.
How do I force one plot with both classes in the same plot?
Version 1:
You can create your axis, and then use the ax keyword of DataFrameGroupBy.plot to add everything to these axes:
import pandas as pd
import matplotlib.pyplot as plt
p_df = pd.DataFrame({"class": [1,1,2,2,1], "a": [2,3,2,3,2]})
fig, ax = plt.subplots(figsize=(8,6))
bp = p_df.groupby('class').plot(kind='kde', ax=ax)
This is the result:
Unfortunately, the labeling of the legend does not make too much sense here.
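One possible way to get a readable legend while keeping the groupby call is to select the column to plot and then relabel the drawn lines. This is a minimal sketch, assuming the same p_df as above; it uses kind='line' instead of kind='kde' only so that it runs without SciPy, and the label text "class N" is illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

p_df = pd.DataFrame({"class": [1, 1, 2, 2, 1], "a": [2, 3, 2, 3, 2]})

fig, ax = plt.subplots(figsize=(8, 6))
grouped = p_df.groupby("class")

# Selecting the column first means exactly one line is drawn per group.
# kind="line" stands in for kind="kde" here (swap it back if SciPy is installed).
grouped["a"].plot(kind="line", ax=ax)

# Relabel the lines in group order (groups are sorted by key by default).
group_labels = [f"class {key}" for key in grouped.groups]
ax.legend(group_labels)

legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```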
Version 2:
Another way would be to loop through the groups and plot the curves manually:
classes = ["class 1"] * 5 + ["class 2"] * 5
vals = [1,3,5,1,3] + [2,6,7,5,2]
p_df = pd.DataFrame({"class": classes, "vals": vals})
fig, ax = plt.subplots(figsize=(8,6))
for label, df in p_df.groupby('class'):
    df.vals.plot(kind="kde", ax=ax, label=label)
plt.legend()
This way you can easily control the legend. This is the result:
import matplotlib.pyplot as plt
p_df.groupby('class').plot(kind='kde', ax=plt.gca())
Another approach would be to use the seaborn module. This plots the two density estimates on the same axes without needing a variable to hold the axes, as follows (using some of the data frame setup from the other answer):
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# data to create an example data frame
classes = ["c1"] * 5 + ["c2"] * 5
vals = [1,3,5,1,3] + [2,6,7,5,2]
# the data frame
df = pd.DataFrame({"cls": classes, "vals": vals})
# this is to plot the kde
sns.kdeplot(df.vals[df.cls == "c1"],label='c1');
sns.kdeplot(df.vals[df.cls == "c2"],label='c2');
# beautifying the labels
plt.xlabel('value')
plt.ylabel('density')
plt.show()
This results in the following image.
There are two easy methods to plot each group in the same plot.
When using pandas.DataFrame.groupby, the column to be plotted (e.g. the aggregation column) should be specified.
Use seaborn.kdeplot or seaborn.displot and specify the hue parameter
Using pandas v1.2.4, matplotlib 3.4.2, seaborn 0.11.1
The OP is specific to plotting the kde, but the steps are the same for many plot types (e.g. kind='line', sns.lineplot, etc.).
Imports and Sample Data
For the sample data, the groups are in the 'kind' column, and the kde of 'duration' will be plotted, ignoring 'waiting'.
import pandas as pd
import seaborn as sns
df = sns.load_dataset('geyser')
# display(df.head())
duration waiting kind
0 3.600 79 long
1 1.800 54 short
2 3.333 74 long
3 2.283 62 short
4 4.533 85 long
Plot with pandas.DataFrame.plot
Reshape the data using .groupby or .pivot
.groupby
Specify the aggregation column, ['duration'], and kind='kde'.
ax = df.groupby('kind')['duration'].plot(kind='kde', legend=True)
.pivot
ax = df.pivot(columns='kind', values='duration').plot(kind='kde')
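To see what the .pivot reshape produces before plotting, here is a minimal sketch that rebuilds only the five rows displayed above instead of calling sns.load_dataset (which downloads the data). It uses kind='hist' rather than kind='kde' so that it runs without SciPy; that substitution is an assumption made for the sketch and is not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# The first five rows of the geyser sample shown above.
df = pd.DataFrame(
    {
        "duration": [3.600, 1.800, 3.333, 2.283, 4.533],
        "waiting": [79, 54, 74, 62, 85],
        "kind": ["long", "short", "long", "short", "long"],
    }
)

# Each value of 'kind' becomes a column; rows keep the original index,
# so every row has a value in exactly one column and NaN in the other.
wide = df.pivot(columns="kind", values="duration")

# One plot call draws every column on the same axes (NaN values are ignored).
ax = wide.plot(kind="hist", alpha=0.5)
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```

The wide frame has one column per group, which is why a single plot call puts all groups on one set of axes with a sensible legend.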
Plot with seaborn.kdeplot
Specify hue='kind'
ax = sns.kdeplot(data=df, x='duration', hue='kind')
Plot with seaborn.displot
Specify hue='kind' and kind='kde'
fig = sns.displot(data=df, kind='kde', x='duration', hue='kind')
Plot
Maybe you can try this:
fig, ax = plt.subplots(figsize=(10,8))
classes = list(df['class'].unique())  # df.class is a syntax error because class is a Python keyword
for c in classes:
    df2 = df.loc[df['class'] == c]
    df2.vals.plot(kind="kde", ax=ax, label=c)
plt.legend()