I have a df that looks like this:
image of the dataframe
My goal is to make a line chart that counts the codes for each month and then add a dropdown to filter by 'type', 'group' and 'Spec.'.
If I didn't want the dropdown filter, I could achieve this with
`df.groupby('month')['code'].count().reset_index()`
Since I need the filters, ideally the aggregation would happen in the Plotly graph code, so I don't lose the 'type', 'group' and 'Spec.' columns.
I tried this code:
`line_fig1 = px.line(data_frame = df,
    x= 'month',
    y='code',
    labels={'month':'','code':''},
    title='',
    width=450,
    height=250,
    template='plotly_white',
    color_discrete_sequence= ["rgb(1, 27, 105)"],
    markers=True,
    text='code'
)`
and this was the result:
image of the chart
I also tried something like
`line_fig1 = px.line(data_frame = df,
x= 'month',
y='code'.count()`
or even tried adding a column of ones, so the chart could aggregate it
`df['assign_value'] = 1
line_fig1 = px.line(data_frame = df,
x= 'month',
y='assign_value'`
But this doesn't work either.
Any help here?
I think you should group by month and code, then use the new dataframe to make the line graph. Something like the following:
df2 = df.groupby(['month', 'code'])['code'].count().reset_index(name='counts')
fig = px.line(df2,x='month',y='counts', color='code')
fig.show()
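A minimal sketch of the same idea that also keeps a filter column: grouping by month together with 'type' leaves a column the dropdown can filter on. The sample values below are made up, and 'counts' is just an illustrative column name.

```python
import pandas as pd

# Made-up sample data; only the column names come from the question.
df = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb", "Feb"],
    "type": ["A", "B", "A", "A", "B"],
    "code": [101, 102, 103, 104, 105],
})

# Count codes per month and type; sort=False keeps first-appearance order.
counts = (
    df.groupby(["month", "type"], sort=False)["code"]
      .count()
      .reset_index(name="counts")
)

# A dropdown selection then reduces to a simple row filter.
only_a = counts[counts["type"] == "A"]
print(counts)
print(only_a)
```

The filtered frame could then be passed to px.line with x='month' and y='counts'.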
I was trying to explore this dataset
https://www.kaggle.com/datasets/thedevastator/analyzing-credit-card-spending-habits-in-india/code?datasetId=2731425&sortBy=voteCount
and I want to create a stacked bar of the 4 countries with the highest spend
I use this syntax
dfg = df.groupby(['City']).sum().sort_values(by='Amount', ascending = False).head(4).reset_index()
fig = px.histogram(dfg, x='City', y = 'Amount')
fig.show()
but I found it difficult to make it stacked. I tried using pivot, but that didn't work either. Is there any way to make this possible?
If you want a grouped bar plot, you should use px.bar, not px.histogram. To get stacked bars, you need to add a new column with a dummy group (or a meaningful one if you have several countries):
px.bar(dfg.assign(country='India'), x='country', color='City', y = 'Amount')
Output:
To get the country from the original City column:
df[['City', 'Country']] = df['City'].str.split(', ', n=1, expand=True)
dfg = (df.groupby(['City', 'Country']).sum().sort_values(by='Amount', ascending = False)
.groupby('Country').head(4).reset_index()
)
px.bar(dfg, x='Country', color='City', y = 'Amount')
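To check the top-N-per-country step on its own, here is a small sketch with made-up cities and amounts. It assumes the 'City, Country' string format implied by the split above, and uses N=2 only to keep the sample small.

```python
import pandas as pd

# Made-up data in the assumed "City, Country" format.
df = pd.DataFrame({
    "City": ["Delhi, India", "Mumbai, India", "Pune, India",
             "Paris, France", "Lyon, France"],
    "Amount": [50, 80, 20, 70, 30],
})

# Split the combined column into separate City and Country columns.
df[["City", "Country"]] = df["City"].str.split(", ", n=1, expand=True)

# Sum per city, sort by amount, then keep the top 2 rows of each country.
top2 = (
    df.groupby(["City", "Country"], as_index=False)["Amount"].sum()
      .sort_values("Amount", ascending=False)
      .groupby("Country")
      .head(2)
)
print(top2)
```

Because the frame is sorted before the second groupby, head(2) keeps the two largest amounts within each country.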
With plotly express I've built a bar chart similar to the one shown on their website.
As px.bar did not allow me to run the animation frame on datetime64[ns] I transformed the datetime into a string as follows.
eu_vaccine_df['date_str'] = eu_vaccine_df['date'].apply(lambda x: str(x))
eu_vaccine_df[['date_str', 'date', 'country', 'people_vaccinated_per_hundred']].head()
The dataset on which I then run the px.bar looks as follows and contains 30 different countries.
The code for my barchart including animation looks as follows.
fig = px.bar(
    eu_vaccine_df,
    x='country', y='people_vaccinated_per_hundred',
    color='country',
    animation_frame='date_str',
    animation_group='country',
    hover_name='country',
    range_y=[0,50],
    range_x=[0,30]
)
fig.update_layout(
    template='plotly_dark',
    margin=dict(r=10, t=25, b=40, l=60)
)
fig.show()
In the end result the date on the animation frame is wrong. It first shows all results from 2021 and then all results from 2020 as shown at the bottom of the following screenshot.
Sorting my df by the date solved the issue.
covid_df['date'] = pd.to_datetime(covid_df['date'])
covid_df = covid_df.sort_values('date', ascending=True)
covid_df['date'] = covid_df['date'].dt.strftime('%m-%d-%Y')
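As a quick check of why the order matters, here is a small sketch with made-up data. It assumes, as the fix above suggests, that the animation frames follow the order in which the date strings first appear in the dataframe; the column names are illustrative.

```python
import pandas as pd

# Made-up rows, deliberately out of chronological order.
df = pd.DataFrame({
    "date": ["2021-01-05", "2020-12-30", "2021-01-05", "2020-12-30"],
    "country": ["A", "A", "B", "B"],
    "value": [3, 1, 4, 2],
})

# Sort on real datetimes first, then build the string used for the frames.
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values("date")
df["date_str"] = df["date"].dt.strftime("%m-%d-%Y")

# Order of first appearance is chronological after the sort.
frame_order = df["date_str"].unique().tolist()

# Sorting the strings themselves would put 2021 before 2020.
lexical_order = sorted(frame_order)
print(frame_order, lexical_order)
```

Sorting the month-first strings directly is not a substitute, since they do not sort chronologically across years.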
I'm working on an assignment from school, and have run into a snag when it comes to my stacked area chart.
The data is fairly simple: 4 columns that look similar to this:
Series id  Year  Period  Value
LNS140000  1948  M01     3.4
I'm trying to create a stacked area chart using Year as my x and Value as my y and breaking it up over Period.
#Stacked area chart still using unemployment data
x = d.Year
y = d.Value
plt.stackplot(x, y, labels = d['Period'])
plt.legend(d['Period'], loc = 'upper left')
plt.show()
However, when I do it like this it only picks up M01 and there are M01-M12. Any thoughts on how I can make this work?
You need to preprocess your data a little before passing them to the stackplot function. I took a look at this link to work on an example that could be suitable for your case.
Since I've only seen one row of your data, I added some made-up values to the dataset.
import pandas as pd
import matplotlib.pyplot as plt
dd=[[1948,'M01',3.4],[1948,'M02',2.5],[1948,'M03',1.6],
[1949,'M01',4.3],[1949,'M02',6.7],[1949,'M03',7.8]]
d=pd.DataFrame(dd,columns=['Year','Period','Value'])
d.sort_values(by='Year',inplace=True) # sort first so years and values line up
years=d.Year.unique()
periods=d.Period.unique()
#Now group them per period, in year sequence
pds=[]
for p in periods:
    pds.append(d[d.Period==p]['Value'].values)
plt.stackplot(years,pds,labels=periods)
plt.legend(loc='upper left')
plt.show()
Is that what you want?
So I was able to use Seaborn to help out. First I did a pivot table
df = d.pivot(index = 'Year',
columns = 'Period',
values = 'Value')
df
Then I set up seaborn
plt.style.use('seaborn')  # on Matplotlib 3.6 and later this style is named 'seaborn-v0_8'
sns.set_style("white")
sns.set_theme(style = "ticks")
df.plot.area(figsize = (20,9))
plt.title("Unemployment by Year and Month\n", fontsize = 22, loc = 'left')
plt.ylabel("Values", fontsize = 22)
plt.xlabel("Year", fontsize = 22)
It seems to me that the problem you are having relates to the formatting of the data. Look at how the values are formatted in this matplotlib example. I would try grouping the data by period, or pivoting it into the correct format, and then graphing it again.
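A rough sketch of that reshaping step, assuming one row per (Year, Period) pair. Only the 1948/M01 value comes from the question; the other numbers are made up.

```python
import pandas as pd

# Long format: one row per year and period.
d = pd.DataFrame({
    "Year": [1948, 1948, 1949, 1949],
    "Period": ["M01", "M02", "M01", "M02"],
    "Value": [3.4, 2.5, 4.3, 6.7],
})

# Wide format: one row per year, one column per period.
wide = d.pivot(index="Year", columns="Period", values="Value")

# stackplot takes x of length N and y as rows of length N, one row per series.
years = wide.index.to_numpy()
series = wide.to_numpy().T
labels = list(wide.columns)
print(series.shape, labels)
```

These arrays can then be passed on as plt.stackplot(years, series, labels=labels).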
I want to make weekly line charts from data with several categorical variables, one of which is the country. Initially I drafted line plots with seaborn, but I found it inconvenient for setting labels, legends, the color palette and so on. In my first attempt I used seaborn.relplot, but its parameters are hard to tune and the resulting plot is hard to customize. Is there an efficient way to reshape a dataframe with multiple categorical columns and render a clear line chart?
Reproducible data and my attempt:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
url = 'https://gist.githubusercontent.com/adamFlyn/cb0553e009933574ac7ec3109ffb5140/raw/a277bc00dc08e526a7d5b7ead5425905f7206bfa/export.csv'
dff = pd.read_csv(url, parse_dates=['weekly'])
dff.drop('Unnamed: 0', axis=1, inplace=True)
df2_bf = dff.groupby(['destination', 'weekly'])['FCF_Beef'].sum().unstack()
df2_bf = df2_bf.fillna(0)
mm = df2_bf.T
mm.columns.name = None
mm = mm[~(mm.isna().sum(1)/mm.shape[1]).gt(0.9)].fillna(0)
#Total sum per column:
mm.loc['Total',:]= mm.sum(axis=0)
mm1 = mm.T
mm1 = mm1.nlargest(6, columns=['Total'])
mm1.drop('Total', axis=1, inplace=True)
mm2 = mm1.T
mm2.reset_index(inplace=True)
mm2['weekly'] = pd.to_datetime(mm2['weekly'])
mm2['year'] = mm2['weekly'].dt.year
mm2['week'] = mm2['weekly'].dt.isocalendar().week
df = mm2.melt(id_vars=['weekly','week','year'], var_name='country')
df_ = df.groupby(['country', 'year', 'week'], as_index=False)['value'].sum()
sns.relplot(data=df_, x='week', y='value', hue='year', row='country', kind='line', height=6, aspect=2, facet_kws={'sharey': False, 'sharex': False}, sizes=(20, 10))
current plot
This is one of the current plots that I made with seaborn.relplot.
The structure of the plot is okay for me, but with seaborn.relplot it is hard to tune the parameters, and it is not as flexible as using matplotlib directly. I also realized that the way I aggregate my data is not very efficient. I think there might be a shortcut that makes the above snippet more efficient, like:
plt_data = []
for i in dff.loc[:, ['FCF_Beef','FCF_Beef']]:
...
but doing it this way I ran into a couple of issues getting the right plot. Can anyone point out how to make this simple and efficient, so I can produce the expected line chart with matplotlib? Thanks
desired output
In my desired plot, I iterate over the list of countries, and each country gets one subplot. In each subplot, the x-axis shows the 52 weeks and the y-axis shows the weekly export amount, with one line per year. Here is a draft plot that I made with seaborn.relplot.
Note that I don't like the output from seaborn.relplot, so I am wondering how to make the above attempt more efficient with matplotlib.
As requested by the OP, following is an iterative way to plot the data.
The following example plots each year, for a given 'destination' in a single figure
This is similar to the answer for this question.
import pandas as pd
import matplotlib.pyplot as plt
# load the data
url = 'https://gist.githubusercontent.com/adamFlyn/cb0553e009933574ac7ec3109ffb5140/raw/a277bc00dc08e526a7d5b7ead5425905f7206bfa/export.csv'
df = pd.read_csv(url, parse_dates=['weekly'], usecols=range(1, 6))
# groupby destination and iterate through for plotting
for g, d in df.groupby(['destination']):
    # create the figure
    fig, ax = plt.subplots(figsize=(7, 4))
    # add lines for specific years
    for year in d.weekly.dt.year.unique():
        data = d[d.weekly.dt.year == year].copy()  # select the data from d, by year
        data['week'] = data.weekly.dt.isocalendar().week  # create a week column
        data.sort_values('weekly', inplace=True)
        display(data.head())  # display is for jupyter, if it causes an error, use print
        data.plot(x='week', y='FCF_Beef', ax=ax, label=year)
    plt.show()
Single sample plot
If we look at the tail of one of the dataframes, data.weekly.dt.isocalendar().week is putting the last day of the year in week 1, so a line is drawn back to the last data point, which is placed at week 1.
This function rests on datetime.datetime(2018, 12, 31).isocalendar() and is the expected behavior from the datetime module, as per this closed pandas bug.
Removing the last row with .iloc[:-1, :] is a workaround.
Alternatively, replace data['week'] = data.weekly.dt.isocalendar().week with data['week'] = data.weekly.dt.strftime('%W').astype('int')
data.iloc[:-1, :].plot(x='week', y='FCF_Beef', ax=ax, label=year)
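The week-number difference can be checked with the standard library alone. A small sketch, using the same date mentioned above:

```python
import datetime

last_day = datetime.date(2018, 12, 31)

# ISO 8601 numbering: this Monday already belongs to week 1 of ISO year 2019.
iso_year, iso_week, iso_weekday = last_day.isocalendar()

# %W numbering: weeks start on Monday and are counted within the calendar year.
monday_week = int(last_day.strftime("%W"))

print(iso_year, iso_week, monday_week)
```

With '%W' the last day stays at the end of its own year, which is why the line no longer jumps back to week 1.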
Updated with all code from OP
# load the data
url = 'https://gist.githubusercontent.com/adamFlyn/cb0553e009933574ac7ec3109ffb5140/raw/a277bc00dc08e526a7d5b7ead5425905f7206bfa/export.csv'
dff = pd.read_csv(url, parse_dates=['weekly'], usecols=range(1, 6))
df2_bf = dff.groupby(['destination', 'weekly'])['FCF_Beef'].sum().unstack()
df2_bf = df2_bf.fillna(0)
mm = df2_bf.T
mm.columns.name = None
mm = mm[~(mm.isna().sum(1)/mm.shape[1]).gt(0.9)].fillna(0)
#Total sum per column:
mm.loc['Total',:]= mm.sum(axis=0)
mm1 = mm.T
mm1 = mm1.nlargest(6, columns=['Total'])
mm1.drop('Total', axis=1, inplace=True)
mm2 = mm1.T
mm2.reset_index(inplace=True)
mm2['weekly'] = pd.to_datetime(mm2['weekly'])
mm2['year'] = mm2['weekly'].dt.year
mm2['week'] = mm2['weekly'].dt.strftime('%W').astype('int')
df = mm2.melt(id_vars=['weekly','week','year'], var_name='country')
# groupby destination and iterate through for plotting
for g, d in df.groupby('country'):  # a plain string key keeps g a string for the title
    # create the figure
    fig, ax = plt.subplots(figsize=(7, 4))
    # add lines for specific years
    for year in d.weekly.dt.year.unique():
        data = d[d.weekly.dt.year == year].copy()  # select the data from d, by year
        data.sort_values('weekly', inplace=True)
        display(data.head())  # display is for jupyter, if it causes an error, use print
        data.plot(x='week', y='value', ax=ax, label=year, title=g)
    plt.show()
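If the goal is mainly a shorter reshaping step, one possible alternative is to pivot each country into a week-by-year table, so that each year becomes one column and therefore one line. This is only a sketch with made-up numbers, not the OP's data, and per_country is a hypothetical name.

```python
import pandas as pd

# Made-up long-format data: one row per country and week.
df = pd.DataFrame({
    "country": ["X", "X", "X", "X", "Y", "Y"],
    "weekly": pd.to_datetime([
        "2018-01-08", "2018-01-15", "2019-01-07",
        "2019-01-14", "2018-01-08", "2019-01-07",
    ]),
    "value": [10, 12, 11, 15, 7, 9],
})
df["year"] = df["weekly"].dt.year
df["week"] = df["weekly"].dt.strftime("%W").astype(int)

# One table per country: rows are weeks, columns are years.
per_country = {
    country: grp.pivot_table(index="week", columns="year",
                             values="value", aggfunc="sum")
    for country, grp in df.groupby("country")
}
print(per_country["X"])
```

Each table could then be drawn with something like table.plot(ax=ax, title=country) inside a loop over per_country.items().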
I would like to use seaborn (matplotlib would be good, too) to create a barplot from my DataFrame.
But judging from the docs, the barplot function expects a list of values that looks like this:
Then you can plot it with:
tips = sns.load_dataset("tips")
sns.barplot(x="day", y="total_bill", hue="sex", data=tips)
My data looks different, I have created a multi_index in the columns. Since I can't publish my original data, here is a mockup on how it would look for the tips dataset:
And here is the code that creates the above dataframe:
index_tuples=[]
for sex in ["Male", "Female"]:
    for day in ["Sun", "Mon"]:
        index_tuples.append([sex, day])
index = pd.MultiIndex.from_tuples(index_tuples, names=["sex", "day"])
dataframe = pd.DataFrame(columns = index)
total_bill = {"Male":{"Sun":5, "Mon":3},"Female":{"Sun":10, "Mon":5}}
# DataFrame.append was removed in pandas 2.0, so this line needs an older pandas
dataframe = dataframe.append(pd.DataFrame.from_dict(total_bill).unstack().rename('total_bill'))
Now, my question is: How can I create a barplot from this multiindex ?
The solution should group the bars correctly, as does the hue argument of seaborn. Simply getting the data as an array and passing it to matplotlib doesn't work.
My solution so far is to convert the multiindex into columns by repeatedly stacking the DataFrame, like this:
stacked_frame = dataframe.stack().stack().to_frame().reset_index()
It results in the data layout expected by seaborn:
And you can plot it with
sns.barplot(x="day", y=0, hue="sex", data=stacked_frame)
plt.show()
Can I create a barplot directly from the multiindex ?
Is this what you are looking for?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
idx = pd.MultiIndex.from_product([['M', 'F'], ['Mo', 'Tu', 'We']], names=['sex', 'day'])
df = pd.DataFrame(np.random.randint(2, 10, size=(idx.size, 1)), index=idx, columns=['total bill'])
df.unstack(level=0)['total bill'].plot(
    kind='bar'
)
plt.ylabel('total bill');
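To see what the unstack step produces, here is a small sketch using the mock-up values from the question (held in a Series rather than the original frame, which is an assumption made to keep it short). It shows both the wide layout that pandas plotting groups by column and the long layout that seaborn's hue argument expects.

```python
import pandas as pd

# Mock-up values from the question, indexed by (sex, day).
idx = pd.MultiIndex.from_product(
    [["Male", "Female"], ["Sun", "Mon"]], names=["sex", "day"]
)
bills = pd.Series([5, 3, 10, 5], index=idx, name="total_bill")

# Wide: one row per day, one column per sex (one bar group per row).
wide = bills.unstack(level="sex")

# Long: one row per (sex, day) pair, with plain columns.
long = bills.reset_index()

print(wide)
print(long)
```

wide.plot(kind='bar') would draw the grouped bars directly, while long fits sns.barplot(x='day', y='total_bill', hue='sex', data=long).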