Plotly graph : show number of occurrences in bar-chart - python

I am trying to plot a bar chart from a given dataframe.
x-axis = dates
y-axis = number of occurrences for each month
The result should be a barchart. Each x is an occurrence.
[Sketch of the desired chart: one x per occurrence, stacked above the months 2020-1 to 2020-5 on the x-axis]
I tried but don't get the desired result as above.
import datetime as dt
import pandas as pd
import numpy as np
import plotly.offline as pyo
import plotly.graph_objs as go
# initialize list of lists
data = [['a', '2022-01-05'], ['a', '2022-02-14'], ['a', '2022-02-15'],['a', '2022-05-14']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Date'])
# print dataframe.
df['Date']=pd.to_datetime(df['Date'])
# plot dataframe
trace1=go.Bar(
#
x = df.Date.dt.month,
y = df.Name.groupby(df.Date.dt.month).count()
)
data=[trace1]
fig=go.Figure(data=data)
pyo.plot(fig)

Remove the last line and write instead:
fig.show()
Edit:
It's unclear to me whether you have one-dimensional or two-dimensional data here. Supposing you have 1d data, that is, just a bunch of dates that you want to aggregate in a bar chart, simply do this:
import plotly.express as px
# initialize a flat list of dates
data = ['2022-01-05', '2022-02-14', '2022-02-15', '2022-05-14']
# Create the pandas DataFrame
df = pd.DataFrame(data)
# plot dataframe
fig = px.bar(df)
If, instead, you have 2d data then what you want is a scatter plot, not a bar chart.

Related

pd.categorical didn't sort bars by specified orders in plot

I was trying to use pd categorical to order the bars in a barplot but the result still didn't get sorted.
import pandas as pd
import numpy as np
np.random.seed(10)
df = pd.DataFrame({'x':np.random.randint(1,10,15),'y': ['x']*15})
df.loc[:,'group'] = df['x'].apply(lambda x:'>=5' if x>=5 else x)
df['group'] = df['group'].astype('string')
sample = df['group'].value_counts().reset_index()
sample['index'] = pd.Categorical(sample['index'],categories=['1','2','3','4','5','6','7','8','9','>=5'], ordered=True)
sample.plot(x='index',kind='bar')
After applying ordered=True, the categories still weren't in order and '>=5' was not at the end of the bar plot. I'm not sure why.
DataFrame.plot.bar() plots the bars in order of occurrence (that is, against the row range) and relabels the ticks with the column specified by x.
This is the case even with numerical data:
pd.DataFrame({'idx': [3,2,1], 'val':[4,5,6]}).plot.bar(x='idx')
would draw the bars in row order (3, 2, 1) rather than sorted by idx.
In your case, you will need to sort the data before plotting:
sample.sort_values('index').plot(x='index',kind='bar')
The bars then appear in category order, with '>=5' last.
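The effect of the sort can be checked on the data alone, before anything is drawn. This is a small sketch with made-up counts; the column names `group` and `count` and the list `order` are assumptions chosen for the example, not taken from the question's output.

```python
import pandas as pd

# Hypothetical counts, listed in an arbitrary (unsorted) row order.
sample = pd.DataFrame({'group': ['>=5', '2', '1', '4', '3'],
                       'count': [9, 2, 1, 2, 1]})

order = ['1', '2', '3', '4', '>=5']
sample['group'] = pd.Categorical(sample['group'], categories=order, ordered=True)

# Making the column an ordered categorical does not move any rows.
rows_before = sample['group'].astype(str).tolist()

# sort_values() follows the declared category order.
sorted_sample = sample.sort_values('group')
rows_after = sorted_sample['group'].astype(str).tolist()

print(rows_before)
print(rows_after)
```

Passing `sorted_sample` to `.plot(x='group', kind='bar')` would then draw the bars in that row order.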

Plotly: Change order of elements in Sunburst Chart

I am currently using plotly express to create a Sunburst Chart. However, I realized that children are ordered alphabetically for nominal values. For plotting months in particular, that is unfortunate. Do you know how to handle that issue? Maybe a property or some workaround? Below is an example so you can try it yourself. Thanks in advance!
import plotly.express as px
import pandas as pd
import calendar
months = [x for x in calendar.month_name if x]
#Create Dataframe
data = []
for m in months:
    data.append(['2018', m, 2])
df = pd.DataFrame(data, columns=['Year', 'Month', 'Value'])
#Compute Sunburst
fig = px.sunburst(df, path=['Year', 'Month'], values='Value')
fig.show()
Please check this out. Instead of hardcoding 2, I assigned each month its own number as the value, so each month matches its number:
January-1, February-2, ... December-12
import plotly.express as px
import pandas as pd
import calendar
months = [x for x in calendar.month_name if x]
#Create Dataframe
data = []
for i, m in enumerate(months):
    data.append(['2018', m, i+1])
print(data)
df = pd.DataFrame(data, columns=['Year', 'Month', 'Value'])
#Compute Sunburst
fig = px.sunburst(df, path=['Year', 'Month'], values='Value')
fig.show()
The other solution gives each month an angle proportional to its number. A small tweak to line 8 (the data.append call), as follows:
data.append(['2018', m,0.00001*i+1])
gives each month the same sized piece of the pie.
A better solution is to disable the auto-sorting of the elements:
fig.update_traces(sort=False, selector=dict(type='sunburst'))
which then adds the elements in the order that they are defined in the data.

Pandas bar plot of means aggregates all items

I want to display the average of some items in a bar plot and I am using the following code:
import pandas as pd
import matplotlib.pyplot as plt
items = ['a', 'b', 'c']
df = pd.DataFrame({
'a':[1,2,3,4,5],
'b':[3,4,5,6,7],
'c':[5,6,7,8,9]
})
df_mean = df.mean().to_frame().T
print(df_mean)
df_mean.plot.bar()
plt.legend(items)
plt.show()
It works, but all the bars are grouped under a single x value of 0. Can I split them apart?
If you remove the transposition (i.e., do df_mean = df.mean().to_frame()), the items stay in the index and you get one bar per item, each at its own x position.
You can also use something like plt.legend(['Value']) to make a more sensible legend.
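A minimal sketch of the untransposed version follows. The column name 'Value' and the Agg backend are assumptions made for the example: naming the column gives the legend a sensible label without a separate plt.legend call, and the backend only keeps the sketch runnable without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend, so no window is needed
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [3, 4, 5, 6, 7],
    'c': [5, 6, 7, 8, 9],
})

# Without .T the item names stay in the index, so each mean gets its own x position.
df_mean = df.mean().to_frame(name='Value')

ax = df_mean.plot.bar()

# One rectangle is drawn per item.
n_bars = len(ax.patches)
```

In an interactive session, `plt.show()` after importing `matplotlib.pyplot as plt` would display the figure.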

Pandas+seaborn faceting with multidimensional dataframes

In Python pandas, I need to do a facet grid from a multidimensional DataFrame.
In columns a and b I hold scalar values, which represent conditions of an experiment.
In columns x and y instead I have two numpy arrays. Column x is the x-axis of the data and column y is the value of a function corresponding to f(x).
Obviously both x and y have the same number of elements.
I would now like to make a facet grid with rows and columns specifying the conditions and, in every cell of the grid, plot the values of column y against column x.
This could be a minimal working example:
import pandas as pd
d = [0]*4 # initialize a list with 4 elements
d[0] = {'x':[1,2,3],'y':[4,5,6],'a':1,'b':2} # then fill these elements
d[1] = {'x':[3,1,5],'y':[6,5,1],'a':0,'b':3}
d[2] = {'x':[3,1,5],'y':[6,5,1],'a':1,'b':3}
d[3] = {'x':[3,1,5],'y':[6,5,1],'a':0,'b':2}
df = pd.DataFrame(d) # create the pandas dataframe
How can I use already existing faceting functions to address the issue of plotting y vs x grouped by the conditions a and b?
Since I need to apply this function to general datasets with different column names, I would like to avoid resorting to hard-coded solutions and instead see whether the seaborn FacetGrid function can be extended to this kind of problem.
I think the best way to go is to split the nested arrays first and then create a facet grid with seaborn.
Thanks to this post (Split nested array values from Pandas Dataframe cell over multiple rows) I was able to split the nested array in your dataframe:
unnested_lst = []
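for col in df.columns:
    # expand each cell into its elements, one row per element
    unnested_lst.append(df[col].apply(pd.Series).stack())
# forward-fill the scalar columns a and b across the expanded rows
result = pd.concat(unnested_lst, axis=1, keys=df.columns).ffill()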
Then you can make the facet grid with this code:
import matplotlib.pyplot as plt
import seaborn as sbn
fg = sbn.FacetGrid(result, row='b', col='a')
fg.map(plt.scatter, "x", "y", color='blue')
You need a long-form frame to be able to use FacetGrid, so your best bet is to explode the lists, then recombine and apply:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
d = [0]*4
d[0] = {'x':[1,2,3],'y':[4,5,6],'a':1,'b':2} # then fill these elements
d[1] = {'x':[3,1,5],'y':[6,5,1],'a':0,'b':3}
d[2] = {'x':[3,1,5],'y':[6,5,1],'a':1,'b':3}
d[3] = {'x':[3,1,5],'y':[6,5,1],'a':0,'b':2}
df = pd.DataFrame(d)
df.set_index(['a','b'], inplace=True, drop=True)
x_long = pd.melt(df['x'].apply(pd.Series).reset_index(),
id_vars=['a', 'b'], value_name='x')
y_long = pd.melt(df['y'].apply(pd.Series).reset_index(),
id_vars=['a', 'b'], value_name='y')
long_df = pd.merge(x_long, y_long).drop('variable', axis='columns')
grid = sns.FacetGrid(long_df, row='a', col='b')
grid.map(plt.scatter, 'x', 'y')
plt.show()
This draws a 2×2 grid of scatter plots, one panel for each combination of a (rows) and b (columns).
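On recent pandas versions the same long-form frame can also be built with DataFrame.explode. This is a sketch of that alternative, not the method used above; it assumes pandas 1.3 or later (where explode accepts a list of columns) and that the x and y lists in each row have equal length.

```python
import pandas as pd

d = [
    {'x': [1, 2, 3], 'y': [4, 5, 6], 'a': 1, 'b': 2},
    {'x': [3, 1, 5], 'y': [6, 5, 1], 'a': 0, 'b': 3},
    {'x': [3, 1, 5], 'y': [6, 5, 1], 'a': 1, 'b': 3},
    {'x': [3, 1, 5], 'y': [6, 5, 1], 'a': 0, 'b': 2},
]
df = pd.DataFrame(d)

# Explode both list columns together so x[i] stays paired with y[i];
# the scalar columns a and b are repeated for every element.
long_df = df.explode(['x', 'y'], ignore_index=True)

# explode() leaves the columns with object dtype, so convert them back to integers.
long_df = long_df.astype({'x': int, 'y': int})
```

The result has one row per (x, y) point and can be passed to `sns.FacetGrid(long_df, row='a', col='b')` followed by `grid.map(plt.scatter, 'x', 'y')`.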
I believe the shortest and most readable solution is a purpose-built lambda function. FacetGrid.map passes it the mapped columns for each facet; because each combination of a and b is unique, each facet holds a single row, and .values[0] retrieves the stored list of values.
import pandas as pd
d = [0]*4 # initialize a list with 4 elements
d[0] = {'x':[1,2,3],'y':[4,5,6],'a':1,'b':2} # then fill these elements
d[1] = {'x':[3,1,5],'y':[6,5,1],'a':0,'b':3}
d[2] = {'x':[3,1,5],'y':[6,5,1],'a':1,'b':3}
d[3] = {'x':[3,1,5],'y':[6,5,1],'a':0,'b':2}
df = pd.DataFrame(d) # create the pandas dataframe
import seaborn as sns
import matplotlib.pyplot as plt
grid = sns.FacetGrid(df,row='a',col='b')
grid.map(lambda _x,_y,**kwargs : plt.scatter(_x.values[0],_y.values[0]),'x','y')

Box Plot of many Pandas DataFrames

I have three dataframes containing 17 sets of data with groups A, B, and C, as shown in the following code snippet:
import pandas as pd
import numpy as np
data1 = pd.DataFrame(np.random.rand(17,3), columns=['A','B','C'])
data2 = pd.DataFrame(np.random.rand(17,3)+0.2, columns=['A','B','C'])
data3 = pd.DataFrame(np.random.rand(17,3)+0.4, columns=['A','B','C'])
I would like to plot a box plot to compare the three groups as shown in the figure below
I am trying to make the plot using seaborn's box plot as follows:
import seaborn as sns
sns.boxplot(data1, groupby='A','B','C')
but obviously this does not work. Can someone please help?
Consider assigning an indicator like Location to distinguish your three sets of data. Then concatenate all three and melt the data to retrieve one value column, one Letter categorical column, and one Location column, all inputs into sns.boxplot:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data1 = pd.DataFrame(np.random.rand(17,3), columns=['A','B','C']).assign(Location=1)
data2 = pd.DataFrame(np.random.rand(17,3)+0.2, columns=['A','B','C']).assign(Location=2)
data3 = pd.DataFrame(np.random.rand(17,3)+0.4, columns=['A','B','C']).assign(Location=3)
cdf = pd.concat([data1, data2, data3])
mdf = pd.melt(cdf, id_vars=['Location'], var_name='Letter')
print(mdf.head())
#    Location Letter     value
# 0         1      A  0.223565
# 1         1      A  0.515797
# 2         1      A  0.377588
# 3         1      A  0.687614
# 4         1      A  0.094116
ax = sns.boxplot(x="Location", y="value", hue="Letter", data=mdf)
plt.show()
