I am currently using Plotly Express to create a sunburst chart. However, I noticed that children are ordered alphabetically for nominal values. For months in particular, that is rather unfortunate. Do you know how to handle this? Is there a property for it, or some workaround? Below is an example so you can try it yourself. Thanks in advance!
import plotly.express as px
import pandas as pd
import calendar
months = [x for x in calendar.month_name if x]
#Create Dataframe
data = []
for m in months:
    data.append(['2018', m, 2])
df = pd.DataFrame(data, columns=['Year', 'Month', 'Value'])
#Compute Sunburst
fig = px.sunburst(df, path=['Year', 'Month'], values='Value')
fig.show()
Please check this out. Instead of hardcoding 2, I assigned each month its own value, so that each month is matched with its number:
January-1, February-2, ... December-12
import plotly.express as px
import pandas as pd
import calendar
months = [x for x in calendar.month_name if x]
#Create Dataframe
data = []
for i, m in enumerate(months):
    data.append(['2018', m, i+1])
print(data)
df = pd.DataFrame(data, columns=['Year', 'Month', 'Value'])
#Compute Sunburst
fig = px.sunburst(df, path=['Year', 'Month'], values='Value')
fig.show()
The other solution gives each month an angle proportional to its number. A small tweak to line 8 as follows:
data.append(['2018', m,0.00001*i+1])
gives each month the same sized piece of the pie.
A better solution is to disable the auto-sorting of the elements:
fig.update_traces(sort=False, selector=dict(type='sunburst'))
which then adds the elements in the order that they are defined in the data.
Related
I am trying to plot a bar chart from a given dataframe.
x-axis = dates
y-axis = number of occurrences for each month
The result should be a bar chart. Each x is an occurrence.
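(Sketch of the desired chart: the x-axis runs from 2020-1 to 2020-5, and the marks x, xx, x are stacked above the months in which they occur.)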
I tried the following but do not get the desired result shown above.
import datetime as dt
import pandas as pd
import numpy as np
import plotly.offline as pyo
import plotly.graph_objs as go
# initialize list of lists
data = [['a', '2022-01-05'], ['a', '2022-02-14'], ['a', '2022-02-15'],['a', '2022-05-14']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Date'])
# print dataframe.
df['Date']=pd.to_datetime(df['Date'])
# plot dataframe
trace1 = go.Bar(
    x = df.Date.dt.month,
    y = df.Name.groupby(df.Date.dt.month).count()
)
data=[trace1]
fig=go.Figure(data=data)
pyo.plot(fig)
Remove the last line and write instead:
fig.show()
Edit:
It's unclear to me whether you have 1-dimensional or 2-dimensional data here. Supposing you have 1d data, that is, just a bunch of dates that you want to aggregate in a bar chart, simply do this:
import pandas as pd
import plotly.express as px
# initialize a flat list of dates
data = ['2022-01-05', '2022-02-14', '2022-02-15', '2022-05-14']
# Create the pandas DataFrame
df = pd.DataFrame(data)
# plot dataframe
fig = px.bar(df)
If, instead, you have 2d data then what you want is a scatter plot, not a bar chart.
I'm trying to create an animated line plot to illustrate the price increase of 3 different asset classes over time (year), but it doesn't work and I don't know why!
What I've done so far:
get closing price data for each asset
start = datetime.datetime(2010, 7, 1)
end = datetime.datetime(2021, 7, 1)
data = pdr.get_data_yahoo(['BTC-USD', 'GC=F', '^GSPC'], start, end)['Adj Close']
transpose columns into rows to avoid a lot of calculations
data['Date'] = data.index
data['Year'] = data.index.year
dataNew =data.melt(['Date', 'Year'], var_name='Asset')
dataNew = dataNew.rename(columns = {'value': 'Price'})
plot
fig = px.line(dataNew,
              x = 'Date',
              y = 'Price',
              range_y=[0,50000],
              color = 'Asset',
              animation_frame = 'Year')
st.write(fig)
Output:
Short answer:
This is in fact very much possible. The only addition you need to make to a standard px.line() time series setup with a date axis is this:
# input data
dfi = px.data.stocks().head(50)
# new datastructure for animation
df = pd.DataFrame() # container for df with new datastructure
for i in np.arange(start,obs):
    dfa = dfi.head(i).copy()
    dfa['ix']=i
    df = pd.concat([df, dfa])
The cool details:
Contrary to what seems to be the most common belief, and contrary to my own comments just a few minutes ago, this is in fact possible with px.line and a set of time series like the one you describe, as long as you massage the dataset a little. The only real drawback seems to be that it might not work well for larger datasets, since the figure structure will contain a huge amount of data. But let's get back to the boring details after the cool stuff. The snippet below, together with the dataset px.data.stocks(), will produce the following figure:
Plot 1 - Animation using the play button:
When the animation has come to an end, you can also subset the lines however you'd like.
Plot 2 - Animation using the slider:
The boring details:
I'll get back to this if the OP or anyone else is interested
Complete code:
import pandas as pd
import numpy as np
import plotly.express as px
# input data
dfi = px.data.stocks().head(50)
dfi['date'] = pd.to_datetime(dfi['date'])
start = 12
obs = len(dfi)
# new datastructure for animation
df = pd.DataFrame() # container for df with new datastructure
for i in np.arange(start,obs):
    dfa = dfi.head(i).copy()
    dfa['ix']=i
    df = pd.concat([df, dfa])
# plotly figure
fig = px.line(df, x = 'date', y = ['GOOG', 'AAPL', 'AMZN', 'FB', 'NFLX', 'MSFT'],
              animation_frame='ix',
              # template = 'plotly_dark',
              width=1000, height=600)
# attribute adjustments
fig.layout.updatemenus[0].buttons[0]['args'][1]['frame']['redraw'] = True
fig.show()
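To make the reshaping step easier to follow, here is a small sketch of what the loop builds: each animation frame ix holds the first ix rows of the input, so the line grows from frame to frame. The toy data and the column name price are made up for illustration, and the single pd.concat over a list is a stylistic variation on the loop above, not a different method.

```python
import pandas as pd

# Hypothetical toy series standing in for px.data.stocks()
dfi = pd.DataFrame({
    'date': pd.date_range('2021-01-01', periods=6, freq='D'),
    'price': [1.0, 1.2, 1.1, 1.4, 1.5, 1.3],
})
start = 3
obs = len(dfi)

# One cumulative slice per frame, concatenated once at the end
frames = [dfi.head(i).assign(ix=i) for i in range(start, obs)]
df = pd.concat(frames, ignore_index=True)

# Number of rows in each frame: the frame with ix=3 has 3 rows, ix=4 has 4, ...
rows_per_frame = df.groupby('ix').size().to_dict()
```

Because every frame repeats all earlier rows, the total row count grows roughly quadratically with the number of observations, which is why this approach may struggle with larger datasets. As in the loop above, the range stops at obs - 1, so the last input row never appears in any frame.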
I am trying to write a for loop that creates distplot subplots.
I have a dataframe with many columns of different lengths (not counting the NaN values).
fig = make_subplots(
    rows=len(assets), cols=1,
    y_title = 'Hourly Price Distribution')
i=1
for col in df_all.columns:
    fig = ff.create_distplot([[df_all[[col]].dropna()]], col)
    fig.append()
    i+=1
fig.show()
Running this loop gives the following error:
PlotlyError: Oops! Your data lists or ndarrays should be the same length.
UPDATE:
This is an example below:
df = pd.DataFrame({'2012': np.random.randn(20),
                   '2013': np.random.randn(20)+1})
df.loc[0, '2012'] = np.nan
fig = ff.create_distplot([df[c].dropna() for c in df.columns],
                         df.columns,show_hist=False,show_rug=False)
fig.show()
I would like to plot each distribution in a different subplot.
Thank you.
Update: Distribution plots
Calculating the correct values is probably both quicker and more elegant using numpy. But I often build parts of my graphs using one plotly approach (figure factory, plotly express) and then combine them with other elements of the plotly library (plotly.graph_objects) to get what I want. The complete snippet below shows how to do just that in order to build a go-based subplot with elements from ff.create_distplot. I'd be happy to give further explanations if the following suggestion suits your needs.
Plot
Complete code
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.figure_factory as ff
from plotly.subplots import make_subplots
import plotly.graph_objects as go
df = pd.DataFrame({'2012': np.random.randn(20),
                   '2013': np.random.randn(20)+1})
# use .loc so the assignment modifies df itself
df.loc[0, '2012'] = np.nan
df = df.reset_index()
dfm = pd.melt(df, id_vars=['index'], value_vars=df.columns[1:])
dfm = dfm.dropna()
dfm.rename(columns={'variable':'year'}, inplace = True)
cols = dfm.year.unique()
nrows = len(cols)
fig = make_subplots(rows=nrows, cols=1)
for r, col in enumerate(cols, 1):
    dfs = dfm[dfm['year']==col]
    # with the default settings, fx1.data[1] is the KDE curve (after the histogram)
    fx1 = ff.create_distplot([dfs['value'].values], ['distplot'],curve_type='kde')
    fig.add_trace(go.Scatter(
        x= fx1.data[1]['x'],
        y =fx1.data[1]['y'],
        ), row = r, col = 1)
fig.show()
First suggestion
You should:
1. Restructure your data with pd.melt(df, id_vars=['index'], value_vars=df.columns[1:]),
2. and then use the resulting column 'variable' to build subplots for each year through the facet_row argument to get this:
In the complete snippet below you'll see that I've changed 'variable' to 'year' in order to make the plot more intuitive. There's one particularly convenient side-effect with this approach, namely that running dfm.dropna() will remove the na value for 2012 only. If you were to do the same thing on your original dataframe, the corresponding value in the same row for 2013 would also be removed.
import numpy as np
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'2012': np.random.randn(20),
                   '2013': np.random.randn(20)+1})
df.loc[0, '2012'] = np.nan
df = df.reset_index()
dfm = pd.melt(df, id_vars=['index'], value_vars=df.columns[1:])
dfm = dfm.dropna()
dfm.rename(columns={'variable':'year'}, inplace = True)
fig = px.histogram(dfm, x="value",
                   facet_row = 'year')
fig.show()
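The side effect of dropna() described above can be checked with a few lines of pandas. This is only a sketch with seeded random data; the variable names wide_counts and long_counts are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({'2012': rng.standard_normal(20),
                   '2013': rng.standard_normal(20) + 1})
df.loc[0, '2012'] = np.nan

# Wide format: dropna removes the whole first row, including the valid 2013 value
wide_counts = df.dropna().count().to_dict()

# Long format: dropna removes only the single missing 2012 observation
dfm = pd.melt(df.reset_index(), id_vars=['index'], value_vars=['2012', '2013'])
dfm = dfm.dropna().rename(columns={'variable': 'year'})
long_counts = dfm.groupby('year')['value'].count().to_dict()
```

In the wide frame both years end up with 19 values, while in the melted frame 2013 keeps all 20.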
I have time series data of retail beef ad counts, and I want to make a stacked line chart showing, on a three-week average basis, the average number of ads that grocers posted per store last week. I managed to aggregate the data for plotting and tried to make the line chart I want. The motivation comes from the context of the problem and the desired plot. In my attempt, I could not get a good line chart, because the result is not informative enough to be easily understood. How can I achieve this in matplotlib? Can anyone suggest what I should change in my current attempt? Any thoughts?
reproducible data and current attempt
Here is minimal reproducible data that I used in my current attempt:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
from datetime import timedelta, datetime
url = 'https://gist.githubusercontent.com/adamFlyn/96e68902d8f71ad62a4d3cda135507ad/raw/4761264cbd55c81cf003a4219fea6a24740d7ce9/df.csv'
df = pd.read_csv(url, parse_dates=['date'])
df.drop(columns=['Unnamed: 0'], inplace=True)
df_grp = df.groupby(['date', 'retail_item']).agg({'number_of_ads': 'sum'})
df_grp["percentage"] = df_grp.groupby(level=0).apply(lambda x:100 * x / float(x.sum()))
df_grp = df_grp.reset_index(level=[0,1])
for item in df_grp['retail_item'].unique():
    dd = df_grp[df_grp['retail_item'] == item].groupby(['date', 'percentage'])[['number_of_ads']].sum().reset_index(level=[0,1])
    dd['weakly_change'] = dd[['percentage']].rolling(7).mean()
    fig, ax = plt.subplots(figsize=(8, 6), dpi=144)
    sns.lineplot(dd.index, 'weakly_change', data=dd, ax=ax)
    ax.set_xlim(dd.index.min(), dd.index.max())
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
    plt.gcf().autofmt_xdate()
    plt.style.use('ggplot')
    plt.xticks(rotation=90)
    plt.show()
Current Result
However, I could not get the line chart I expected. I want to reproduce the plot from this site. Is that doable? Any ideas?
desired plot
Here is an example of the desired plot that I want to make from this minimal reproducible data:
I don't know what changes I should make to my current attempt to get the desired plot above. Does anyone know a possible way of doing this in matplotlib? What else should I do? Any help would be appreciated. Thanks
Also see How to create a min-max plot by month with fill_between?
See in-line comments for details
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import calendar
#################################################################
# setup from question
url = 'https://gist.githubusercontent.com/adamFlyn/96e68902d8f71ad62a4d3cda135507ad/raw/4761264cbd55c81cf003a4219fea6a24740d7ce9/df.csv'
df = pd.read_csv(url, parse_dates=['date'])
df.drop(columns=['Unnamed: 0'], inplace=True)
df_grp = df.groupby(['date', 'retail_item']).agg({'number_of_ads': 'sum'})
df_grp["percentage"] = df_grp.groupby(level=0).apply(lambda x:100 * x / float(x.sum()))
df_grp = df_grp.reset_index(level=[0,1])
#################################################################
# create a month map from long to abbreviated calendar names
month_map = dict(zip(calendar.month_name[1:], calendar.month_abbr[1:]))
# update the month column name
df_grp['month'] = df_grp.date.dt.month_name().map(month_map)
# set month as categorical so they are plotted in the correct order
df_grp.month = pd.Categorical(df_grp.month, categories=month_map.values(), ordered=True)
# use groupby to aggregate min mean and max
dfmm = df_grp.groupby(['retail_item', 'month'])['percentage'].agg(['max', 'min', 'mean']).stack().reset_index(level=[2]).rename(columns={'level_2': 'mm', 0: 'vals'}).reset_index()
# create a palette map for line colors
cmap = {'min': 'k', 'max': 'k', 'mean': 'b'}
# iterate through each retail item and plot the corresponding data
for g, d in dfmm.groupby('retail_item'):
    plt.figure(figsize=(7, 4))
    sns.lineplot(x='month', y='vals', hue='mm', data=d, palette=cmap)
    # select only min or max data for fill_between
    y1 = d[d.mm == 'max']
    y2 = d[d.mm == 'min']
    plt.fill_between(x=y1.month, y1=y1.vals, y2=y2.vals, color='gainsboro')
    # add lines for specific years
    for year in [2016, 2018, 2020]:
        data = df_grp[(df_grp.date.dt.year == year) & (df_grp.retail_item == g)]
        sns.lineplot(x='month', y='percentage', ci=None, data=data, label=year)
    plt.ylim(0, 100)
    plt.margins(0, 0)
    plt.legend(bbox_to_anchor=(1., 1), loc='upper left')
    plt.ylabel('Percentage of Ads')
    plt.title(g)
    plt.show()
I am trying to visualize a Facebook stock dataset that contains data for 2014 to 2018. The dataset looks like this: dataset screenshot
My goal is to visualize the closing column, but by year. That is, year 2014, then 2015 and so on, but they should be in one figure, and one after another. Something like this: expected graph image
But whatever I try, all the graph parts start from index 0, instead of continuing from the end of the previous one. Here's what I got: the graph I generated
Please help me to solve this problem. Thanks!
The most straightforward way is simply to create separate dataframes with empty values for the non-needed dates.
Here I use an example dataset.
import pandas as pd
import numpy as np
df = pd.DataFrame(
    np.random.randint(0, 100, size=100),
    index=pd.date_range(start="2020-01-01", periods=100, freq="D"),
)
Then you can create and select the data to plot
df1 = df.copy()
df2 = df.copy()
df1[df.index > pd.to_datetime('2020-02-01')] = np.nan
df2[df.index < pd.to_datetime('2020-02-01')] = np.nan
And then simply plot these on the same axis.
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1, figsize=(18, 8))
ax.plot(df1)
ax.plot(df2)
The result
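For the yearly case in the question, a possible alternative to masking with NaN is to group by year and plot each group against its own dates. This is a sketch, not the method used above: the random series named close is a hypothetical stand-in for the closing column of the real dataset.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the closing-price column, indexed by date
idx = pd.date_range("2014-01-01", "2018-12-31", freq="D")
rng = np.random.default_rng(0)
close = pd.Series(rng.normal(size=len(idx)).cumsum(), index=idx)

fig, ax = plt.subplots(1, 1, figsize=(18, 8))
# Each yearly slice keeps its own dates on the x-axis,
# so the segments follow one another instead of all starting at 0
for year, part in close.groupby(close.index.year):
    ax.plot(part.index, part.values, label=str(year))
ax.legend()
```

Each year gets its own line and color. Since the slices do not share endpoints, a small gap remains between 31 December and 1 January. Calling plt.show() displays the figure.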