Matplotlib vs PivotChart: Grouped Axis Labels - python

How can I format Matplotlib plots on multi-indexed data to resemble Excel's PivotChart axis layout? Excel's PivotChart feature groups similar axis labels together, whereas MPL labels each tick individually as (Index1,Index2). Using the Sample Data, I've provided the outputs for both Excel and MPL; notice how Index1 is grouped in the Excel chart, but not in the MPL plot.
data = {
'Index1': {0: 'A', 1: 'A', 2: 'B', 3: 'B'},
'Index2': {0: 1, 1: 2, 2: 1, 3: 2},
'Value': {0: 50, 1: 100, 2: 50, 3: 100}
}
Matplotlib Chart
Excel Chart
Does anyone have a solution? Ideally, the number of multi-index levels will not matter. Thanks for the help!
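One possible approach, offered as a rough sketch rather than a definitive answer: keep the innermost level as ordinary tick labels and draw each outer level as centred text below the axis, one label per run of equal values. The helper `group_spans` and the vertical offsets are made up for this illustration, and the sketch assumes the index is already sorted so that equal outer labels sit next to each other.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = {
    'Index1': {0: 'A', 1: 'A', 2: 'B', 3: 'B'},
    'Index2': {0: 1, 1: 2, 2: 1, 3: 2},
    'Value': {0: 50, 1: 100, 2: 50, 3: 100},
}
df = pd.DataFrame(data).set_index(['Index1', 'Index2'])


def group_spans(labels):
    """Return (label, first_pos, last_pos) for each run of equal labels.

    Hypothetical helper written for this sketch, not a Matplotlib function.
    """
    spans = []
    for pos, label in enumerate(labels):
        if spans and spans[-1][0] == label:
            spans[-1][2] = pos              # extend the current run
        else:
            spans.append([label, pos, pos])  # start a new run
    return [tuple(s) for s in spans]


fig, ax = plt.subplots()
positions = list(range(len(df)))
ax.bar(positions, df['Value'])

# Innermost level: ordinary tick labels, one per bar.
ax.set_xticks(positions)
ax.set_xticklabels([str(v) for v in df.index.get_level_values(-1)])

# Outer levels: one centred label per run, stacked further below the axis
# for each additional level (offsets are in axes coordinates, chosen by eye).
outer_levels = list(range(df.index.nlevels - 1))
for depth, level in enumerate(reversed(outer_levels), start=1):
    y = -0.04 - 0.08 * depth
    for label, first, last in group_spans(df.index.get_level_values(level)):
        ax.text((first + last) / 2, y, str(label), ha='center', va='top',
                transform=ax.get_xaxis_transform())

fig.subplots_adjust(bottom=0.1 + 0.08 * df.index.nlevels)
# Call plt.show() to display the figure.
```

Because the loop walks over every level except the last, the same code should cope with deeper multi-indexes, although the offsets will likely need tuning for long labels.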

Related

python seaborn: customize line plot and scatterplot together (also legend)

df = pd.DataFrame({
'id': {0: -3, 1: 2, 2: -3, 3: 1},
'val': {0: 0.4, 1: 0.03, 2: 0.88, 3: 1.3},
'indicator': {0: 'A', 1: 'A', 2: 'B', 3: 'B'},
'count': {0: 40000, 1: 5779, 2: 3000, 3: 31090}
})
df
and I hope to get a plot like the following:
With the following code I get something close, but I also want the line width to vary with the "count" variable. When I tried adding size = 'count', I did not get a meaningful plot. As for the legend, I want only one legend for "indicator" rather than two:
plt.figure()
sns.lineplot(x = 'id', y = 'val', hue = 'indicator', data = df)
sns.scatterplot(x = 'id', y = 'val', hue = 'indicator', size = 'count', data = df)
To answer the second part of your question - you can disable the lineplot legend like so:
sns.lineplot(x = 'id', y = 'val', hue = 'indicator', data = df, legend=False)
This will leave you with two legend groups - one for colours and one for sizes. This is the easiest way, but you can also tinker with plt.legend() and build your own from scratch.
As for making the lines vary their thickness dynamically from one point to another, I don't think you can do it using seaborn. For something like that you'd need a more low-level library, like bokeh or use matplotlib directly to draw connecting lines between line markers, adjusting for their varying size.
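A minimal sketch of the matplotlib route, under a few assumptions: each line is split into segments between consecutive points, and each segment gets the mean width of its two end points, so the width changes per segment rather than continuously. The `MIN_W`/`MAX_W` scaling is an arbitrary choice made for this example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.collections import LineCollection

df = pd.DataFrame({
    'id': {0: -3, 1: 2, 2: -3, 3: 1},
    'val': {0: 0.4, 1: 0.03, 2: 0.88, 3: 1.3},
    'indicator': {0: 'A', 1: 'A', 2: 'B', 3: 'B'},
    'count': {0: 40000, 1: 5779, 2: 3000, 3: 31090},
})

# Assumed scaling: map counts linearly onto widths between MIN_W and MAX_W points.
MIN_W, MAX_W = 1.0, 8.0
lo, hi = df['count'].min(), df['count'].max()
df['width'] = MIN_W + (df['count'] - lo) / (hi - lo) * (MAX_W - MIN_W)

fig, ax = plt.subplots()
for i, (name, grp) in enumerate(df.groupby('indicator')):
    grp = grp.sort_values('id')
    pts = grp[['id', 'val']].to_numpy()
    # One segment per pair of consecutive points: shape (n - 1, 2, 2).
    segments = np.stack([pts[:-1], pts[1:]], axis=1)
    # Each segment takes the mean width of its two end points.
    w = grp['width'].to_numpy()
    widths = (w[:-1] + w[1:]) / 2
    color = f'C{i}'
    ax.add_collection(
        LineCollection(segments, linewidths=widths, colors=color, label=name))
    # Markers sized from the same scaled width (scatter sizes are areas).
    ax.scatter(grp['id'], grp['val'], s=grp['width'] ** 2, color=color)

ax.autoscale_view()
# Only the line collections carry labels, so there is a single legend group.
ax.legend(title='indicator')
# Call plt.show() to display the figure.
```

With only two points per indicator, as in the sample data, each line is a single segment; with more points the width would step from segment to segment.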

Plotly: How to specify categorical x-axis elements in a plotly express animation?

I have the following data.
I am using a slider so that I can step through the different dates (see the picture below if you are not sure what a slider is).
Since my categories may change between dates, I want to initialize my x-axis with A, B, C, E, F no matter what the date is. Sometimes a category will have no data points, but this does not matter to me.
So how can I initialize my x-axis range and make my data points adapt to the initialized x-axis?
I am using python3 and plotly express.
This is my code for now :
data.columns = ['price', 'category', 'date']
data = data.sort_values(by=['date', 'price'])
fig = px.scatter(data, x = "category", y = "price", animation_frame="date")
fig.update_layout(
yaxis_title="Price (€)",
)
fig['layout']['updatemenus'][0]['pad']['t'] = 180
fig['layout']['sliders'][0]['pad']['t'] = 200
fig.write_html("/home/**/Desktop/1.html", auto_play=True)
I hope I was clear enough. Please let me know if you need any extra information. Any ideas or tips are welcome :)
The answer:
The only way you can make sure that all categories are represented on the x-axis for all animation frames is to make sure they appear in the first Date = X. So you can't actually fix the x-axis ranges in the figure itself. You'll have to do it through your representation of the data source.
The details:
So sometimes I will have no data points in a category but this does not matter to me.
Maybe not, but it will matter to plotly.express, particularly if by "have no data" you mean that your dataset does not have records for all categories on all dates. You see, plotly seems to set the x-axis values to the categories it finds in the first frame (Date = X), which are A, B, C. But don't worry, we'll handle that too. Let's use a slightly altered version of your data screenshot (next time, do this). I've added actual dates instead of X, Y and reduced the range of the numbers a bit, since your particular data messes up the animation.
If we use an animation approach like this:
fig = px.scatter(df1, x="Category", y="Price", animation_frame="Date",
color="Category", range_y=[0,20])
... you'll get two animation frames:
Plot 1, frame 1
Plot 1, frame 2
Now, let's use an approach that makes sure all categories are represented for all dates, as described in the post Pandas: How to include all columns for all rows although value is missing in a dataframe with a long format?
Now you'll get:
Plot 2, frame 1
Plot 2, frame 2
I hope this is what you were looking for. Don't hesitate to let me know if not!
You'll get a slightly different result if you drop the df1.fillna(0) part. But I'll leave it up to you to mess around with all available options in the
Complete code:
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'Date': {0: '24.08.2020',
1: '24.08.2020',
2: '24.08.2020',
3: '25.08.2020',
4: '25.08.2020',
5: '25.08.2020'},
'Category': {0: 'A', 1: 'B', 2: 'C', 3: 'C', 4: 'E', 5: 'F'},
'Price': {0: 1, 1: 2, 2: 3, 3: 3, 4: 10, 5: 13}})
# make sure that all category variables are represented for
# all dates even though not all variables have values.
df['key']=df.groupby(['Date','Category']).cumcount()
df1 = pd.pivot_table(df,index='Date',columns=['key','Category'],values='Price')
df1 = df1.stack(level=[0,1],dropna=False).to_frame('Price').reset_index()
df1 = df1[df1.key.eq(0) | df1['Price'].notna()]
df1=df1.fillna(0)
# plotly express animation
fig = px.scatter(df1, x="Category", y="Price", animation_frame="Date",
color="Category", range_y=[0,20])
# some extra settings.
fig.update_layout(transition = {'duration': 20000})
fig.show()
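As an alternative to the pivot/stack step in the complete code, the "every category for every date" table can also be sketched with a reindex over the full Date x Category product. This is only a sketch: it assumes each (Date, Category) pair occurs at most once, and it fills missing prices with 0 just like the `fillna(0)` call in the complete code.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['24.08.2020'] * 3 + ['25.08.2020'] * 3,
    'Category': ['A', 'B', 'C', 'C', 'E', 'F'],
    'Price': [1, 2, 3, 3, 10, 13],
})

# Every combination of the dates and categories seen anywhere in the data.
full_index = pd.MultiIndex.from_product(
    [df['Date'].unique(), sorted(df['Category'].unique())],
    names=['Date', 'Category'],
)

# Reindexing adds the missing (Date, Category) rows and fills their price with 0.
# This requires the (Date, Category) pairs in df to be unique.
df_full = (
    df.set_index(['Date', 'Category'])
      .reindex(full_index, fill_value=0)
      .reset_index()
)
print(df_full)
```

The resulting `df_full` has one row per date and category, so it can be passed to `px.scatter(..., animation_frame="Date")` in place of `df1`.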

Is there any way to implement Stacked or Grouped Bar charts in plotly express

I am trying to implement a grouped-bar-chart (or) stacked-bar-chart in plotly express
I have implemented it using plotly (which is pretty straight forward) and below is code for it. There are altogether six columns in dataframe ['Rank', 'NOC', 'Gold', 'Silver', 'Bronze', 'Total']
import plotly.graph_objects as go
# one trace per medal type; olympics_data is the dataframe described above
trace1=go.Bar(x=olympics_data['NOC'],y=olympics_data['Gold'],marker=dict(color='green',opacity=0.5),name="Gold")
trace2=go.Bar(x=olympics_data['NOC'],y=olympics_data['Silver'],marker=dict(color='red',opacity=0.5),name="Silver")
trace3=go.Bar(x=olympics_data['NOC'],y=olympics_data['Bronze'],marker=dict(color='blue',opacity=0.5),name="Bronze")
data=[trace1,trace2,trace3]
layout = go.Layout(title="number of medals in each category for various countries",xaxis=dict(title="countries"),yaxis=dict(title="number of medals"),
barmode="stack")
fig = go.Figure(data,layout)
fig.show()
Output:
I am expecting a similar output using plotly-express.
You can arrange your data to use px.bar() as in this link.
Or you can consider setting the barmode argument to 'relative'. From the documentation:
barmode (str (default 'relative')) – One of 'group', 'overlay' or
'relative' In 'relative' mode, bars are stacked above zero for
positive values and below zero for negative values. In 'overlay' mode,
bars are drawn on top of one another. In 'group' mode, bars are placed
beside each other.
Using overlay:
import plotly.express as px
iris = px.data.iris()
display(iris)
fig = px.histogram(iris, x='sepal_length', color='species',
nbins=19, range_x=[4,8], width=600, height=350,
opacity=0.4, marginal='box')
fig.update_layout(barmode='overlay')
fig.update_yaxes(range=[0,20],row=1, col=1)
fig.show()
Using relative:
fig.update_layout(barmode='relative')
fig.update_yaxes(range=[0,20],row=1, col=1)
fig.show()
Using group:
fig.update_layout(barmode='group')
fig.show()
Yes, Plotly Express supports both stacked and grouped bars with px.bar(). Full documentation with examples is here: https://plot.ly/python/bar-charts/
Here is a reusable function to do this.
def px_stacked_bar(df, color_name='category', y_name='y', **pxargs):
    '''Row-wise stacked bar using plot-express.
    Equivalent of `df.T.plot(kind='bar', stacked=True)`
    `df` must be single-indexed'''
    idx_col = df.index.name
    m = pd.melt(df.reset_index(), id_vars=idx_col, var_name=color_name, value_name=y_name)
    return px.bar(m, x=idx_col, y=y_name, color=color_name, **pxargs)
Example use
df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
'B': {0: 1, 1: 3, 2: 5},
'C': {0: 2, 1: 4, 2: 6}})
px_stacked_bar(df.set_index('A'))

How to plot a pie chart in matplotlib with 3 columns?

I need to plot a pie chart using matplotlib but my DataFrame has 3 columns namely gender, segment and total_amount.
I have tried playing with plt.pie() arguments but it only takes x and labels for data. I tried setting gender as a legend but then it doesn't look right.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'gender': {0: 'Female',
1: 'Female',
2: 'Female',
3: 'Male',
4: 'Male',
5: 'Male'},
'Segment': {0: 'Gold',
1: 'Platinum',
2: 'Silver',
3: 'Gold',
4: 'Platinum',
5: 'Silver'},
'total_amount': {0: 2110045.0,
1: 2369722.0,
2: 1897545.0,
3: 2655970.0,
4: 2096445.0,
5: 2347134.0}})
plt.pie(data = df,x="claim_amount",labels="Segment")
plt.legend(d3.gender)
plt.show()
The result I want is a pie chart of total_amount and its labels as gender and segment. If I can get the percentage, it will be a bonus.
I suggest the following:
# Data to plot
# Take the information from the Segment and gender columns and join them into one string
labels = df["Segment"]+ " " + df["gender"].map(str)
# Extract the sizes of the segments
sizes = df["total_amount"]
# Plot with labels and percentage
plt.pie(sizes, labels=labels,autopct='%1.1f%%')
plt.show()
You should get this:

bar chart with Matplotlib

Here is my data structure:
data = {'2013': {1:25,2:81,3:15}, '2014': {1:28, 2:65, 3:75}, '2015': {1:78,2:91,3:86 }}
My x-axis is the number [1,2,3]
My y-axis is the quantity of each number. For example: In 2013, 1 is x axis while its quantity is 25.
Print each individual graph for each year
I would like to graph a bar chart, which uses matplotlib with legend on it.
import matplotlib.pyplot as plt
import pandas as pd
data = {'2013': {1:25,2:81,3:15}, '2014': {1:28, 2:65, 3:75}, '2015': {1:78,2:91,3:86 }}
df = pd.DataFrame(data)
df.plot(kind='bar')
plt.show()
I like pandas because it takes your data and plots it without you having to manipulate it first.
You can access the keys of a dictionary via dict.keys() and the values via dict.values()
If you wanted to plot, say, the data for 2013 you can do:
import matplotlib.pyplot as pl
x_13 = data['2013'].keys()
y_13 = data['2013'].values()
pl.bar(x_13, y_13, label = '2013')
pl.legend()
That should do the trick. More elegantly, you can simply do:
year = '2013'
pl.bar(data[year].keys(), data[year].values(), label=year)
which would allow you to loop it:
for year in ['2013','2014','2015']:
    pl.bar(data[year].keys(), data[year].values(), label=year)
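Note that the loop above draws every year onto the same axes, so the bars sit on top of each other. Since the question also asks for an individual graph per year, here is a small sketch that gives each year its own subplot; the figure size and shared y-axis are arbitrary choices for this example.

```python
import matplotlib.pyplot as plt

data = {'2013': {1: 25, 2: 81, 3: 15},
        '2014': {1: 28, 2: 65, 3: 75},
        '2015': {1: 78, 2: 91, 3: 86}}

years = sorted(data)
# One subplot per year, sharing the y-axis so bar heights stay comparable.
fig, axes = plt.subplots(1, len(years), sharey=True, figsize=(9, 3))
for ax, year in zip(axes, years):
    xs = list(data[year].keys())
    ys = list(data[year].values())
    ax.bar(xs, ys, label=year)
    ax.set_xticks(xs)   # show only the ticks 1, 2, 3
    ax.legend()
fig.tight_layout()
# Call plt.show() to display the figure.
```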
You can do this a few ways.
The Functional way using bar():
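import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = {'2013': {1: 25, 2: 81, 3: 15}, '2014': {1: 28, 2: 65, 3: 75}, '2015': {1: 78, 2: 91, 3: 86}}
df = pd.DataFrame(data)
X_axis = np.arange(len(df))
# shift each year's bars sideways so the three years sit next to each other
plt.bar(X_axis - 0.1,height=df["2013"], label='2013',width=.1)
plt.bar(X_axis, height=df["2014"], label='2014',width=.1)
plt.bar(X_axis + 0.1, height=df["2015"], label='2015',width=.1)
plt.xticks(X_axis, df.index)  # label the groups 1, 2, 3 instead of 0, 1, 2
plt.legend()
plt.show()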
More info here.
The Object-Oriented way using figure():
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = {'2013': {1: 25, 2: 81, 3: 15}, '2014': {1: 28, 2: 65, 3: 75}, '2015': {1: 78, 2: 91, 3: 86}}
df = pd.DataFrame(data)
fig= plt.figure()
axes = fig.add_axes([.1,.1,.8,.8])
X_axis = np.arange(len(df))
axes.bar(X_axis -.25,df["2013"], color ='b', width=.25)
axes.bar(X_axis,df["2014"], color ='r', width=.25)
axes.bar(X_axis +.25,df["2015"], color ='g', width=.25)
plt.show()
