Pandas bar plot of means aggregates all items - python

I want to display the average of some items in a bar plot and I am using the following code:
import pandas as pd
import matplotlib.pyplot as plt
items = ['a', 'b', 'c']
df = pd.DataFrame({
'a':[1,2,3,4,5],
'b':[3,4,5,6,7],
'c':[5,6,7,8,9]
})
df_mean = df.mean().to_frame().T
print(df_mean)
df_mean.plot.bar()
plt.legend(items)
plt.show()
It works, but all the bars are grouped under a single x value of 0. Can I split them so that each item gets its own x position?

If you remove the transposition (i.e., do df_mean = df.mean().to_frame()), you get one bar per item, with the item names a, b and c on the x-axis.
You can also use something like plt.legend(['Value']) to make a more sensible legend.
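A minimal sketch of the same idea that skips the intermediate frame: df.mean() already returns a Series indexed by column name, and Series.plot.bar() draws one bar per index entry. The rot=0 argument and the y-axis label are assumptions added here for readability; they are not part of the original answer.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [3, 4, 5, 6, 7],
    'c': [5, 6, 7, 8, 9],
})

# A Series whose index holds the column names 'a', 'b', 'c'
means = df.mean()

# One bar per index entry, so each item gets its own x position
ax = means.plot.bar(rot=0)
ax.set_ylabel('Mean')

# plt.show()  # uncomment to display the figure
```

Because the values are plotted from a Series, no legend is needed; the tick labels already identify each item.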

Related

Plotly graph : show number of occurrences in bar-chart

I am trying to plot a bar chart from a given DataFrame.
x-axis = dates
y-axis = number of occurrences for each month
The result should be a barchart. Each x is an occurrence.
x
xx
x
2020-1
2020-2
2020-3
2020-4
2020-5
I tried the following, but I don't get the desired result sketched above.
import datetime as dt
import pandas as pd
import numpy as np
import plotly.offline as pyo
import plotly.graph_objs as go
# initialize list of lists
data = [['a', '2022-01-05'], ['a', '2022-02-14'], ['a', '2022-02-15'],['a', '2022-05-14']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Date'])
# convert the Date column to datetime
df['Date']=pd.to_datetime(df['Date'])
# plot dataframe
trace1=go.Bar(
    x = df.Date.dt.month,
    y = df.Name.groupby(df.Date.dt.month).count()
)
data=[trace1]
fig=go.Figure(data=data)
pyo.plot(fig)
Remove the last line and write instead:
fig.show()
Edit:
It's unclear to me whether you have one-dimensional or two-dimensional data here. Supposing you have 1D data, that is, just a bunch of dates that you want to aggregate in a bar chart, simply do this:
import pandas as pd
import plotly.express as px
# initialize list of dates
data = ['2022-01-05', '2022-02-14', '2022-02-15', '2022-05-14']
# Create the pandas DataFrame
df = pd.DataFrame(data)
# plot dataframe
fig = px.bar(df)
If, instead, you have 2d data then what you want is a scatter plot, not a bar chart.

pd.categorical didn't sort bars by specified orders in plot

I was trying to use pd.Categorical to order the bars in a bar plot, but the result still wasn't sorted.
import pandas as pd
import numpy as np
np.random.seed(10)
df = pd.DataFrame({'x':np.random.randint(1,10,15),'y': ['x']*15})
df.loc[:,'group'] = df['x'].apply(lambda x:'>=5' if x>=5 else x)
df['group'] = df['group'].astype('string')
sample = df['group'].value_counts().reset_index()
sample['index'] = pd.Categorical(sample['index'],categories=['1','2','3','4','5','6','7','8','9','>=5'], ordered=True)
sample.plot(x='index',kind='bar')
After applying ordered=True, the categories still weren't in order and '>=5' was not at the end of the bar plot. Not sure why.
DataFrame.plot.bar() plots the bars in order of occurrence (that is, against a 0..n-1 range) and relabels the ticks with the column specified by x.
This is the case even with numerical data:
pd.DataFrame({'idx': [3,2,1], 'val':[4,5,6]}).plot.bar(x='idx')
would give bars labelled 3, 2, 1 from left to right.
In your case, you will need to sort the data before plotting:
sample.sort_values('index').plot(x='index',kind='bar')
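A self-contained sketch of the sort-then-plot approach. It names the columns explicitly (group and count are names chosen here) because the column names produced by value_counts().reset_index() differ between pandas versions, and it lists only the categories that can actually occur. Both choices are assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(10)
df = pd.DataFrame({'x': np.random.randint(1, 10, 15)})
# Bucket values of 5 and above into a single '>=5' label
df['group'] = df['x'].apply(lambda v: '>=5' if v >= 5 else str(v))

order = ['1', '2', '3', '4', '>=5']

# Explicit column names keep this independent of the pandas version
sample = (df['group'].value_counts()
          .rename_axis('group')
          .reset_index(name='count'))

# The categorical defines the order; sort_values applies it to the rows
sample['group'] = pd.Categorical(sample['group'], categories=order, ordered=True)
sample = sample.sort_values('group')

# Bars are drawn in row order, which is now the category order
ax = sample.plot.bar(x='group', y='count', rot=0, legend=False)

# plt.show()  # uncomment to display the figure
```

Setting the categorical dtype alone does not move any rows; the sort_values call is what changes the order the bars are drawn in.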

Bar plot and coloured categorical variable

I have a dataframe with 3 variables:
data= [["2019/oct",10,"Approved"],["2019/oct",20,"Approved"],["2019/oct",30,"Approved"],["2019/oct",40,"Approved"],["2019/nov",20,"Under evaluation"],["2019/dec",30,"Aproved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])
I want a bar plot grouped by the Period column, showing all the values contained in the Observations column and colored by the Result column.
How can I do this?
I tried sns.barplot, but it merged the values in the Observations column into just one bar per group (the mean of the values).
sns.barplot(x='Period',y='Observations',hue='Result',data=df,ci=None)
Assuming that you want one bar for each row, you can do as follows:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
result_cat = df["Result"].astype("category")
result_codes = result_cat.cat.codes.values
cmap = plt.cm.Dark2(range(df["Result"].unique().shape[0]))
patches = []
for code in result_cat.cat.codes.unique():
    cat = result_cat.cat.categories[code]
    patches.append(mpatches.Patch(color=cmap[code], label=cat))
df.plot.bar(x='Period',
            y='Observations',
            color=cmap[result_codes],
            legend=False)
plt.ylabel("Observations")
plt.legend(handles=patches)
If you would like the bars grouped by month and then stacked, use the following. Note that I changed your data so that one month has more than one status; I am not sure I understood your question completely.
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
data= [["2019/oct",10,"Approved"],["2019/oct",20,"Approved"],["2019/oct",30,"Approved"],["2019/oct",40,"Under evaluation"],["2019/nov",20,"Under evaluation"],["2019/dec",30,"Aproved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])
df.groupby(['Period', 'Result'])['Observations'].sum().unstack('Result').plot(kind='bar', stacked=True)

Plot Multiple DataFrames into one single plot

I have two DataFrames that I would like to plot in a single graph. Here's some basic code:
#!/usr/bin/python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
scenarios = ['scen-1', 'scen-2']
for index, item in enumerate(scenarios):
    df = pd.DataFrame({'A' : np.random.randn(4)})
    print(df)
df.plot()
plt.ylabel('y-label')
plt.xlabel('x-label')
plt.title('Title')
plt.show()
However, this only plots the last DataFrame. If I use pd.concat() it plots one line with the combined values.
How can I plot two lines, one for the first DataFrame and one for the second one?
You need to put your plot call inside the for loop.
If you want them on a single plot, pass the same axis to each call through plot's ax kwarg. Here I have created a fresh axis using subplots, but this could be an already populated axis:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
scenarios = ['scen-1', 'scen-2']
fig, ax = plt.subplots()
for index, item in enumerate(scenarios):
    df = pd.DataFrame({'A' : np.random.randn(4)})
    print(df)
    df.plot(ax=ax)
plt.ylabel('y-label')
plt.xlabel('x-label')
plt.title('Title')
plt.show()
The plot function is only called once, and as you say this is with the last value of df. Put df.plot() inside the loop.
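One side effect of plotting in a loop is that every frame has a column named 'A', so both lines get the same legend entry. A possible variation, sketched here as an assumption about what is wanted, names each column after its scenario so the lines can be told apart.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

scenarios = ['scen-1', 'scen-2']

fig, ax = plt.subplots()
for item in scenarios:
    # The column name becomes the line's legend label
    df = pd.DataFrame({item: np.random.randn(4)})
    # Draw every frame on the same axis
    df.plot(ax=ax)

ax.set_xlabel('x-label')
ax.set_ylabel('y-label')
ax.set_title('Title')

# plt.show()  # uncomment to display the figure
```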

Plot each column of Pandas dataframe pairwise against one column

I have a pandas dataframe where one of the columns is a set of labels that I would like to plot each of the other columns against in subplots. In other words, I want the y-axis of each subplot to use the same column, called 'labels', and I want a subplot for each of the remaining columns with the data from each column on the x-axis. I expected the following code snippet to achieve this, but I don't understand why this results in a single nonsensical plot:
examples.plot(subplots=True, layout=(-1, 3), figsize=(20, 20), y='labels', sharey=False)
The problem with that code is that you didn't specify an x value. It seems nonsensical because it's plotting the labels column against an index from 0 to the number of rows. As far as I know, you can't do what you want in pandas directly. You might want to check out seaborn though, it's another visualization library that has some nice grid plotting helpers.
Here's an example with your data:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
examples = pd.DataFrame(np.random.rand(10,4), columns=['a', 'b', 'c', 'labels'])
g = sns.PairGrid(examples, x_vars=['a', 'b', 'c'], y_vars='labels')
g = g.map(plt.plot)
This creates the following plot:
Obviously it doesn't look great with random data, but hopefully with your data it will look better.
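For comparison, a sketch that stays with pandas and matplotlib: create one subplot per remaining column and call DataFrame.plot.scatter on each axis with y='labels'. This is one possible workaround built from a loop rather than a single built-in pandas call; the figure size, the shared y-axis and the use of a scatter plot are assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

examples = pd.DataFrame(np.random.rand(10, 4),
                        columns=['a', 'b', 'c', 'labels'])

# Every column except 'labels' gets its own subplot
feature_cols = [c for c in examples.columns if c != 'labels']

# squeeze=False keeps axes two-dimensional even with a single column
fig, axes = plt.subplots(1, len(feature_cols), figsize=(12, 4),
                         sharey=True, squeeze=False)

for ax, col in zip(axes[0], feature_cols):
    # x is the feature column, y is always the 'labels' column
    examples.plot.scatter(x=col, y='labels', ax=ax)

# plt.show()  # uncomment to display the figure
```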
