How to create a stacked bar plot with specific values? - python

I was trying to explore this dataset
https://www.kaggle.com/datasets/thedevastator/analyzing-credit-card-spending-habits-in-india/code?datasetId=2731425&sortBy=voteCount
and I want to create a stacked bar chart of the 4 cities with the highest spend.
I am using this code:
dfg = df.groupby(['City']).sum().sort_values(by='Amount', ascending = False).head(4).reset_index()
fig = px.histogram(dfg, x='City', y = 'Amount')
fig.show()
However, I found it difficult to make the bars stacked. I also tried using a pivot, but that did not work either. Is there any way to make this possible?

If you want a grouped bar plot, you should use px.bar, not px.histogram. To get stacked bars, you need to add a new column holding a dummy group (or a meaningful one, if you have several countries):
px.bar(dfg.assign(country='India'), x='country', color='City', y = 'Amount')
Output:
To get the country from the original City column:
df[['City', 'Country']] = df['City'].str.split(', ', n=1, expand=True)
dfg = (df.groupby(['City', 'Country']).sum().sort_values(by='Amount', ascending = False)
.groupby('Country').head(4).reset_index()
)
px.bar(dfg, x='Country', color='City', y = 'Amount')
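As a minimal sketch of the data preparation above, assuming a made-up frame whose City values look like 'Delhi, India' (the rows and amounts are invented for illustration, not taken from the Kaggle dataset). Selecting only the Amount column before summing avoids aggregating non-numeric columns:

```python
import pandas as pd

# Invented sample rows; the real dataset has many more columns and rows.
df = pd.DataFrame({
    "City": ["Delhi, India", "Delhi, India", "Mumbai, India",
             "Pune, India", "Goa, India", "Agra, India"],
    "Amount": [100, 50, 120, 30, 20, 10],
})

# Split "City, Country" into two separate columns.
df[["City", "Country"]] = df["City"].str.split(", ", n=1, expand=True)

# Total spend per city, then keep the 4 highest-spending cities per country.
top = (
    df.groupby(["Country", "City"], as_index=False)["Amount"].sum()
      .sort_values("Amount", ascending=False)
      .groupby("Country")
      .head(4)
)
print(top)
```

The resulting `top` frame has one row per city, so it can be passed to px.bar with x='Country' and color='City' to get one stacked bar per country.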

Related

How to maintain a fixed width and positioning of bars in a chart if there is data for fewer bars than expected?

I have a 12 months rolling indicator which I need to display as a bar chart.
The source is a .csv file (a few lines shown, number of records could be any number >=1):
Measurement Month,KRI Value
2022-02,25
2022-03,28
2022-04,26
If there are 12 or more months of measurements, the charts look like this:
Meanwhile there are indicators that have been measured only for two months, and the chart looks like this (not nice from the design perspective):
Instead of those two bars taking up the whole space, I'd like to have a chart, where the bar width would be the same as in the 12-month example, yet, only the bars for the months with measurement data would be shown.
To build the chart I am using a function that takes a pandas DataFrame (data) and adds bar color (based on thresholds) and opacity columns (so that the current month is in focus). It then loops through the DataFrame and creates the bars. I have kept only the parts of the function that are related to plotting the bars.
def create_kri_chart(data, amber_threshold, red_threshold):
    # data - Pandas df with the input data
    MAX_MONTHS = 12
    # MARGIN_VALUE = 5
    actual_months = len(data)
    if actual_months > MAX_MONTHS:
        recent_data = data[-MAX_MONTHS:]
    else:
        recent_data = data
    recent_data = recent_data.reset_index(drop=True)
    recent_data['color'] = 'green'
    recent_data.loc[(recent_data['KRI Value'] >= amber_threshold) & (recent_data['KRI Value'] < red_threshold), 'color'] = 'orange'
    recent_data.loc[recent_data['KRI Value'] >= red_threshold, 'color'] = 'red'
    opacities = [1.0 if i >= len(recent_data) - 1 else 0.6 for i in range(len(recent_data))]
    recent_data['opacity'] = opacities
    fig, ax = plt.subplots(figsize=(6,3))
    # this is where the bars are added to the chart
    for i, row in recent_data.iterrows():
        ax.bar(row['Measurement Month'], row['KRI Value'], color=row['color'], alpha=row['opacity'], width=0.6)
    return fig, ax
I just tried to add some dummy bars to fill up the empty slots, like this:
# create the meaningful bars
for i, row in recent_data.iterrows():
    ax.bar(row['Measurement Month'], row['KRI Value'], color=row['color'], alpha=row['opacity'], width=0.6)
# create dummy bars
for i in range(0, MAX_MONTHS-actual_months):
    ax.bar("", 0, color="white", alpha=0, width=0.6)
However, this approach does not work, and I get a StopIteration exception which is triggered by this code:
ax.bar("", 0, color="white", alpha=0, width=0.6)
This is my first time with matplotlib, so any advice on how to solve the problem is much appreciated! I reckon there might also be better solutions than creating invisible dummy bars.
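One possible alternative to dummy bars, sketched here under assumptions rather than tested against the full function: plot the bars at integer positions and fix the x-axis limits so the axis always has room for 12 slots. The two-row sample frame is invented, and the `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

MAX_MONTHS = 12

# Invented sample with only two months of measurements.
data = pd.DataFrame({
    "Measurement Month": ["2022-02", "2022-03"],
    "KRI Value": [25, 28],
})
recent = data.tail(MAX_MONTHS).reset_index(drop=True)

fig, ax = plt.subplots(figsize=(6, 3))

# Use numeric positions 0..n-1 instead of category strings.
positions = list(range(len(recent)))
ax.bar(positions, recent["KRI Value"], width=0.6)

# Always reserve MAX_MONTHS slots, so the bar width on screen stays constant.
ax.set_xlim(-0.5, MAX_MONTHS - 0.5)

# Label only the slots that actually hold a bar.
ax.set_xticks(positions)
ax.set_xticklabels(recent["Measurement Month"])
```

Because the axis limits no longer depend on how many bars exist, two bars drawn with width=0.6 take up the same space as they would in a full 12-month chart. The per-bar color and alpha from the original loop can be passed to ax.bar in the same way.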

How to plot a chart with text variables using plotly dash

I have a df that looks like this:
image of the dataframe
My goal is to make a line chart that sums up the codes for each month and, after this, add a dropdown to be able to filter by 'type', 'group' and 'Spec.'
If I didn't want the dropdown filter, I could achieve this with
`df.groupby('month')['code'].count().reset_index()`
Since I need the filters, ideally the aggregation would happen in the Plotly graph code, so I don't lose the 'type', 'group' and 'Spec.' columns.
I tried this code:
`line_fig1 = px.line(data_frame = df,
x= 'month',
y='code',
labels={'month':'','code':''},
title='',
width=450,
height=250,
template='plotly_white',
color_discrete_sequence= ["rgb(1, 27, 105)"],
markers=True,
text='code'
)`
and this was the result:
image of the chart
I also tried something like
`line_fig1 = px.line(data_frame = df,
x= 'month',
y='code'.count()`
or even tried adding a column containing the number one, so the chart could aggregate it
`df['assign_value'] = 1
line_fig1 = px.line(data_frame = df,
x= 'month',
y='assign_value')`
But this doesn't work either.
Any help here?
I think you should group by month and code, and then use the new DataFrame to make the line graph. Something like this:
df2 = df.groupby(['month', 'code'])['code'].count().reset_index(name='counts')
fig = px.line(df2,x='month',y='counts', color='code')
fig.show()
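If the dropdown needs to filter on another column, one option is to include that column in the groupby so it survives the aggregation. A minimal sketch, assuming an invented frame with 'month', 'code' and 'type' columns (the real data is only shown as an image in the question):

```python
import pandas as pd

# Invented sample rows standing in for the real DataFrame.
df = pd.DataFrame({
    "month": ["Jan", "Jan", "Jan", "Feb", "Feb"],
    "code":  ["A1", "A2", "A3", "A1", "B1"],
    "type":  ["x", "x", "y", "x", "y"],
})

# Count codes per month *and* per type, so 'type' stays available as a filter.
counts = (
    df.groupby(["month", "type"], sort=False)["code"]
      .count()
      .reset_index(name="counts")
)

# What a dropdown callback could do: keep one type, then plot month vs counts.
selected = counts[counts["type"] == "x"]
print(selected)
```

The `selected` frame can then be passed to px.line with x='month' and y='counts'. The same pattern would apply to 'group' or 'Spec.' by swapping the second groupby key.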

How to plot a pie chart from an object dataframe column in python?

I want to plot the values of a column as a pie chart. How can I do that?
redemption_type = redemptions['redemption_type']
redemption_type.describe()
count     641493
unique        12
top       MPPAID
freq      637145
Name: redemption_type, dtype: object
This pie chart should consist of 12 different values with their frequency.
Here is the easiest way
redemptions['redemption_type'].value_counts().plot(kind='pie')
Here is one with plotly-express
temp = pd.DataFrame(redemptions['redemption_type'].value_counts())
temp.index.name = 'val'
temp.columns = ['count']
temp = temp.reset_index()
temp
fig = px.pie(temp, names='val', values='count')
# fig.update_traces(textinfo='value') # uncomment this line if you want actual value on the chart instead of %
fig.show()
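The frame-building steps above can probably be condensed into one chain. A small sketch, using an invented three-row series in place of the real `redemptions` data:

```python
import pandas as pd

# Invented stand-in for redemptions['redemption_type'].
redemption_type = pd.Series(
    ["MPPAID", "MPPAID", "OTHER"], name="redemption_type"
)

# value_counts() gives frequencies; rename_axis/reset_index turn them
# into a two-column frame with explicit column names.
temp = (
    redemption_type.value_counts()
                   .rename_axis("val")
                   .reset_index(name="count")
)
print(temp)
```

The `temp` frame has the same 'val' and 'count' columns as in the answer above, so it can be passed to px.pie unchanged.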

Stacked Area Chart in Python

I'm working on an assignment from school, and have run into a snag when it comes to my stacked area chart.
The data is fairly simple: 4 columns that look similar to this:
Series id    Year    Period    Value
LNS140000    1948    M01       3.4
I'm trying to create a stacked area chart using Year as my x and Value as my y and breaking it up over Period.
#Stacked area chart still using unemployment data
x = d.Year
y = d.Value
plt.stackplot(x, y, labels = d['Period'])
plt.legend(d['Period'], loc = 'upper left')
plt.show()
However, when I do it like this it only picks up M01 and there are M01-M12. Any thoughts on how I can make this work?
You need to preprocess your data a little before passing it to the stackplot function. I took a look at this link to work on an example that could be suitable for your case.
Since I've seen only one row of your data, I added some made-up values to the dataset.
import pandas as pd
import matplotlib.pyplot as plt
dd=[[1948,'M01',3.4],[1948,'M02',2.5],[1948,'M03',1.6],
    [1949,'M01',4.3],[1949,'M02',6.7],[1949,'M03',7.8]]
d=pd.DataFrame(dd,columns=['Year','Period','Value'])
years=d.Year.unique()
periods=d.Period.unique()
#Now group them per period, but in year sequence
d.sort_values(by='Year',inplace=True) # to ensure entire dataset is ordered
pds=[]
for p in periods:
    pds.append(d[d.Period==p]['Value'].values)
plt.stackplot(years,pds,labels=periods)
plt.legend(loc='upper left')
plt.show()
Is that what you want?
So I was able to use Seaborn to help out. First I did a pivot table
df = d.pivot(index = 'Year',
columns = 'Period',
values = 'Value')
df
Then I set up seaborn
import seaborn as sns
plt.style.use('seaborn-v0_8')  # this style was named 'seaborn' before matplotlib 3.6
sns.set_style("white")
sns.set_theme(style = "ticks")
df.plot.area(figsize = (20,9))
plt.title("Unemployment by Year and Month\n", fontsize = 22, loc = 'left')
plt.ylabel("Values", fontsize = 22)
plt.xlabel("Year", fontsize = 22)
It seems to me that the problem you are having relates to the formatting of the data. Look at how the values are formatted in this matplotlib example. I would try grouping the data by period, or pivoting it into the correct format, and then plotting again.
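To illustrate the shape stackplot expects, here is a short sketch that pivots long data into one column per period and passes one row per layer. The four sample rows are invented, and the `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented long-format rows: one row per (Year, Period).
d = pd.DataFrame(
    [[1948, "M01", 3.4], [1948, "M02", 2.5],
     [1949, "M01", 4.3], [1949, "M02", 6.7]],
    columns=["Year", "Period", "Value"],
)

# Wide format: index = Year, one column per Period.
wide = d.pivot(index="Year", columns="Period", values="Value").fillna(0)

fig, ax = plt.subplots()
# stackplot wants x of length N and a 2D array with one row per layer,
# so the wide frame is transposed.
layers = ax.stackplot(wide.index, wide.to_numpy().T, labels=wide.columns)
ax.legend(loc="upper left")
```

Passing the single long Value column, as in the question, gives stackplot only one layer, which is consistent with only M01 showing up.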

Is there a way to set a custom baseline for a stacked area chart in Plotly?

For context, what I'm trying to do is make an emission abatement chart in which the abated emissions are subtracted from the baseline. Mathematically, this is the same as adding the abatement to the residual emission line:
Residual = Baseline - Abated
The expected results should look something like this:
Desired structure of stacked area chart:
I've currently got the stacked area chart to look like this:
As you can see, the stacked area chart is structured so that the stacking starts at zero. However, I'm trying to get the stacking either to be added to the residual (red) line or to be subtracted from the baseline (black).
In Excel I would do this by defining a blank area as the first stacked item, equal to the residual line, so that the stacking occurs on top of it. However, I'm not sure if there is a pythonic way to do this in Plotly while maintaining the structure and interactivity of the chart.
The shaping of the pandas dataframes is pretty simple, just a randomly generated series of abatement values for each of the subcategories I've set up, that are then grouped to form the baseline and the residual forecasts:
scenario = 'High'
# The baseline data as a line
baseline_line = baselines.loc[baselines['Scenario']==scenario].groupby(['Year']).sum()
# The abatement and residual data
df2 = deepcopy(abatement).drop(columns=['tCO2e'])
df2['Baseline'] = baselines['tCO2e']
df2['Abatement'] = abatement['tCO2e']
df2 = df2.fillna(0)
df2['Residual'] = df2['Baseline'] - df2['Abatement']
df2 = df2.loc[abatement['Scenario']==scenario]
display(df2)
# The residual forecast as a line
emissions_lines = df2.loc[df2['Scenario']==scenario].groupby(['Year']).sum()
The charting is pretty simple as well, using the plotly express functionality:
# Just plotting
fig = px.area(df2,
x = 'Year',
y = 'Abatement',
color = 'Site',
line_group = 'Fuel_type'
)
fig2 = px.line(emissions_lines,
x = emissions_lines.index,
y = 'Baseline',
color_discrete_sequence = ['black'])
fig3 = px.line(emissions_lines,
x = emissions_lines.index,
y = 'Residual',
color_discrete_sequence = ['red'])
fig.add_trace(
fig2.data[0],
)
fig.add_trace(
fig3.data[0],
)
fig.show()
To summarise, I wish to have the Plotly stacked area chart be 'elevated' so that it fits between the residual and baseline forecasts.
NOTE: I've used the term 'baseline' with two meanings here: one generic to stacked area charts (in the title) and one specific to my example. The first usage, in the title, means the series from which the stacked area chart starts. Currently, this series is just the x-axis, or zero. I would like to customise this so that I can define a series (in this example, the red residual line) that the stacking starts from.
The second usage of the term 'baseline' refers to the 'baseline forecast', or BAU.
I think I've found a workaround. It is not ideal, but it is similar to the approach I would take in Excel. I added the 'residual' emissions in the same structure as the categories and concatenated them at the start of the DataFrame, so that everything else is bumped up to sit between the residual and baseline forecasts.
Concatenation step:
# Me trying to make it cleanly at the residual line
df2b = deepcopy(emissions_lines)
df2b['Fuel_type'] = "Base"
df2b['Site'] = "Base"
df2b['Abatement'] = df2b['Residual']
df2c = pd.concat([df2b.reset_index(),df2],ignore_index=True)
Rejigged plotting step, with some recolouring/reformatting of the chart:
# Just plotting
fig = px.area(df2c,
x = 'Year',
y = 'Abatement',
color = 'Site',
line_group = 'Fuel_type'
)
# Making the baseline invisible and ignorable
fig.data[0].line.color='rgba(255, 255, 255,1)'
fig.data[0].showlegend = False
fig2 = px.line(emissions_lines,
x = emissions_lines.index,
y = 'Baseline',
color_discrete_sequence = ['black'])
fig3 = px.line(emissions_lines,
x = emissions_lines.index,
y = 'Residual',
color_discrete_sequence = ['red'])
fig.add_trace(
fig2.data[0],
)
fig.add_trace(
fig3.data[0],
)
fig.show()
Outcome:
I'm going to leave this unresolved, as this is not what I originally intended. It currently 'works', but it is not ideal and causes some issues with the legend interaction in the Plotly figure.
