Is it possible to add another x axis to a plotly chart? - python

I have a plotly chart that looks like this:
Is there a way to make a second x axis that only has the years? What I mean is that I want two x axes: a 'sub-axis' that has the months (Sep, Nov, Jan, ...), and another one that has the years (2021, 2022, 2023).

This can be handled by passing the x values as a list of two lists (a multi-category axis), with the years as the outer level. The catch is that the inner level is then a set of categories rather than a true date axis: if it is labelled by month only, a year of daily data (365 points) collapses onto just 12 categories. The closest way to meet the request while keeping every data point is to use the month name and day as the inner level.
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
df = px.data.stocks()
df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].dt.year
# Outer level: year; inner level: month and day (e.g. 'Jan-01')
multi_index = [df['year'].values, df['date'].dt.strftime('%b-%d').values]
fig = go.Figure()
fig.add_scatter(x=multi_index, y=df['GOOG'])
fig.show()

Related

Time series data visualization issue

I have time series data like the one below, where the date consists of a year and a week number. The data runs from the 1st week of 2014 to the 52nd week of 2015.
Below is the line plot of this data:
As you can see, the x-axis labelling is not what I was trying to achieve: the point after 201453 should be 201501, there should not be any straight line between them, and the axis should not run up to 201499. How can I rescale the x-axis exactly according to the Due_date column? Below is the code:
rand_products = np.random.choice(Op_2['Sp_number'].unique(), 3)
selected_products = Op_2[Op_2['Sp_number'].isin(rand_products)][['Due_date', 'Sp_number', 'Billing']]
plt.figure(figsize=(20,10))
plt.grid(True)
g = sns.lineplot(data=selected_products, x='Due_date', y='Billing', hue='Sp_number', ci=False, legend='full', palette='Set1');
The issue is that 201401, etc. are read as numbers, which is why the line chart has that gap. To fix it, convert the numbers to dates before plotting.
As the full data is not available, below is a two-column dataframe with Due_date as an integer in YYYYWW form; the Billing column is a set of random numbers. Convert the integers to dates as shown here and plot, and the gap disappears.
import numpy as np
import pandas as pd
import random
import matplotlib.pyplot as plt
import seaborn as sns
Due_date = list(np.arange(201401,201454)) #Year 2014
Due_date.extend(np.arange(201501,201553)) #Year 2015
Billing = random.sample(range(500, 1000), 105) #billing numbers
df = pd.DataFrame({'Due_date': Due_date, 'Billing': Billing})
df.Due_date = df.Due_date.astype(str)
df.Due_date = pd.to_datetime(df['Due_date']+ '-1',format="%Y%W-%w") #Convert to date
plt.figure(figsize=(20,10))
plt.grid(True)
ax = sns.lineplot(data=df, x='Due_date', y='Billing', ci=False, legend='full', palette='Set1')
Output graph
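The effect of the conversion can be checked on a few values. This is a small sketch (the helper name `yearweek_to_date` is only for illustration) using the same `%Y%W-%w` format as above, where the appended `-1` selects the Monday of each week.

```python
import pandas as pd


def yearweek_to_date(values):
    """Convert integer YYYYWW codes to the Monday of that week."""
    as_text = pd.Series(values).astype(str) + '-1'
    return pd.to_datetime(as_text, format='%Y%W-%w')


codes = [201451, 201452, 201501, 201502]
dates = yearweek_to_date(codes)

# As integers, the step across the year boundary is 49 instead of 1 ...
int_steps = pd.Series(codes).diff().dropna().tolist()
# ... while as dates the steps are measured in days.
day_steps = dates.diff().dropna().dt.days.tolist()
print(int_steps)
print(day_steps)
```

The jump in the integer codes is what produces the long straight segment on a numeric axis; on a date axis consecutive weeks sit next to each other.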

Python pandas scatterplot of year against month-day

So I have a dataframe with a date column.
date
2021-06-17
2020-06-20
What I want to do is to do a scatterplot with the x-axis being the year, and the y-axis being month-day. So what I have already is this:
What I would like is for the y-axis ticks to be the actual month-day values and not the day number for the month-day-year. Not sure if this is possible, but any help is much appreciated.
Some Sample Data:
import pandas as pd
from matplotlib import pyplot as plt, dates as mdates
# Some Sample Data
df = pd.DataFrame({
    'date': pd.date_range(
        start='2000-01-01', end='2020-12-31', freq='D'
    )
}).sample(n=100, random_state=5).sort_values('date').reset_index(drop=True)
Then one option would be to normalize the dates to the same year. Any year works as long as it's a leap year to handle the possibility of a February 29th (leap day).
This becomes the new y-axis.
# Create New Column with all dates normalized to same year
# Any year works as long as it's a leap year in case of a Feb-29
df['month-day'] = pd.to_datetime('2000-' + df['date'].dt.strftime('%m-%d'))
# Plot DataFrame
ax = df.plot(kind='scatter', x='date', y='month-day')
# Set Date Format On Axes
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y')) # Year Only
ax.yaxis.set_major_formatter(mdates.DateFormatter('%m-%d')) # No Year
plt.tight_layout()
plt.show()
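Why the base year must be a leap year can be seen in a short sketch. The helper `to_month_day` below is hypothetical and simply wraps the same string trick with a configurable base year.

```python
import pandas as pd


def to_month_day(dates, base_year=2000):
    """Move every date into base_year, keeping month and day."""
    text = f'{base_year}-' + dates.dt.strftime('%m-%d')
    return pd.to_datetime(text, format='%Y-%m-%d')


dates = pd.Series(pd.to_datetime(['2021-06-17', '2020-02-29', '2020-06-20']))

# Works: 2000 is a leap year, so Feb 29 exists.
normalized = to_month_day(dates, base_year=2000)

# Fails: 2001 has no Feb 29, so parsing raises an error.
try:
    to_month_day(dates, base_year=2001)
    leap_day_failed = False
except ValueError:
    leap_day_failed = True
```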

Plotly: Change order of elements in Sunburst Chart

I am currently using plotly express to create a Sunburst Chart. However, I realized that children are ordered alphabetically for nominal values. For plotting months in particular, that is pretty unfortunate. Do you know how to handle that issue? Maybe a property or some workaround? Below is an example so you can try it yourself. Thanks in advance!
import plotly.express as px
import pandas as pd
import calendar
months = [x for x in calendar.month_name if x]
#Create Dataframe
data = []
for m in months:
    data.append(['2018', m, 2])
df = pd.DataFrame(data, columns=['Year', 'Month', 'Value'])
#Compute Sunburst
fig = px.sunburst(df, path=['Year', 'Month'], values='Value')
fig.show()
Please check this out. Instead of hardcoding 2, I gave each month a value equal to its month number, so each month matches its corresponding number:
January-1, February-2, ... December-12
import plotly.express as px
import pandas as pd
import calendar
months = [x for x in calendar.month_name if x]
#Create Dataframe
data = []
for i, m in enumerate(months):
    data.append(['2018', m, i+1])
print(data)
df = pd.DataFrame(data, columns=['Year', 'Month', 'Value'])
#Compute Sunburst
fig = px.sunburst(df, path=['Year', 'Month'], values='Value')
fig.show()
The other solution gives each month an angle proportional to its number. A small tweak to line 8 as follows:
data.append(['2018', m,0.00001*i+1])
gives each month the same sized piece of the pie.
A better solution is to disable the auto-sorting of the elements:
fig.update_traces(sort=False, selector=dict(type='sunburst'))
which then adds the elements in the order that they are defined in the data.

Plotly: How to style a plotly figure so that it doesn't display gaps for missing dates?

I have a plotly graph of the EUR/JPY exchange rate across a few months in 15-minute intervals, so there is no data from Friday evenings to Sunday evenings.
Here is a portion of the data, note the skip in the index (type: DatetimeIndex) over the weekend:
Plotting this data in plotly results in a gap over the missing dates. Using the dataframe above:
import plotly.graph_objs as go
candlesticks = go.Candlestick(x=data.index, open=data['Open'], high=data['High'],
low=data['Low'], close=data['Close'])
fig = go.Figure(layout=cf_layout)
fig.add_trace(trace=candlesticks)
fig.show()
Output:
As you can see, there are gaps where the missing dates are. One solution I've found online is to change the index to text using:
data.index = data.index.strftime("%d-%m-%Y %H:%M:%S")
and plotting it again, which admittedly does work, but has its own problem. The x-axis labels look atrocious:
I would like to produce a graph like the second plot, with no gaps, but with the x-axis displayed as it is on the first graph, or at least in a much more concise and responsive format, as close to the first graph as possible.
Thank you in advance for any help!
Even if some dates are missing in your dataset, plotly interprets your dates as date values, and shows even missing dates on your timeline. One solution is to grab the first and last dates, build a complete timeline, find out which dates are missing in your original dataset, and include those dates in:
fig.update_xaxes(rangebreaks=[dict(values=dt_breaks)])
This will turn this figure:
Into this:
Complete code:
import plotly.graph_objects as go
from datetime import datetime
import pandas as pd
import numpy as np
# sample data
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv')
# remove some dates to build a similar case as in the question
df = df.drop(df.index[75:110])
df = df.drop(df.index[210:250])
df = df.drop(df.index[460:480])
# build complete timeline from start date to end date
dt_all = pd.date_range(start=df['Date'].iloc[0],end=df['Date'].iloc[-1])
# retrieve the dates that ARE in the original dataset
dt_obs = [d.strftime("%Y-%m-%d") for d in pd.to_datetime(df['Date'])]
# define dates with missing values
dt_breaks = [d for d in dt_all.strftime("%Y-%m-%d").tolist() if d not in dt_obs]
# make figure
fig = go.Figure(data=[go.Candlestick(x=df['Date'],
open=df['AAPL.Open'], high=df['AAPL.High'],
low=df['AAPL.Low'], close=df['AAPL.Close'])
])
# hide dates with no values
fig.update_xaxes(rangebreaks=[dict(values=dt_breaks)])
fig.update_layout(yaxis_title='AAPL Stock')
fig.show()
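The missing-date step can also be written with pandas index operations instead of string comparison. This is a sketch under the assumption that the data is daily; `missing_days` is a made-up helper name.

```python
import pandas as pd


def missing_days(dates):
    """Return the calendar days between the first and last date
    that do not occur in `dates`, as 'YYYY-MM-DD' strings."""
    observed = pd.DatetimeIndex(pd.to_datetime(dates)).normalize()
    full = pd.date_range(start=observed.min(), end=observed.max(), freq='D')
    return full.difference(observed).strftime('%Y-%m-%d').tolist()


dates = ['2015-02-17', '2015-02-18', '2015-02-20', '2015-02-23']
dt_breaks = missing_days(dates)
print(dt_breaks)
```

The resulting list can be passed to `fig.update_xaxes(rangebreaks=[dict(values=dt_breaks)])` in the same way as in the code above.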
In case someone wants to remove the gaps for weekends and for hours outside trading time, rangebreaks is the way to do it, as shown below.
fig = go.Figure(data=[go.Candlestick(x=df['date'], open=df['Open'], high=df['High'], low=df['Low'], close=df['Close'])])
fig.update_xaxes(
rangeslider_visible=True,
rangebreaks=[
# NOTE: Below values are bound (not single values), ie. hide x to y
dict(bounds=["sat", "mon"]), # hide weekends, eg. hide sat to before mon
dict(bounds=[16, 9.5], pattern="hour"), # hide hours outside of 9.30am-4pm
# dict(values=["2020-12-25", "2021-01-01"]) # hide holidays (Christmas and New Year's, etc)
]
)
fig.update_layout(
title='Stock Analysis',
yaxis_title=f'{symbol} Stock'
)
fig.show()
Here's Plotly's doc.
Thanks for the amazing sample! It works on daily data, but with intraday (5-minute) data the rangebreaks leave only one day on the chart:
# build complete timeline
dt_all = pd.date_range(start=df.index[0],end=df.index[-1], freq="5min")
# retrieve the dates that ARE in the original dataset
dt_obs = [d.strftime("%Y-%m-%d %H:%M:%S") for d in pd.to_datetime(df.index, format="%Y-%m-%d %H:%M:%S")]
# define dates with missing values
dt_breaks = [d for d in dt_all.strftime("%Y-%m-%d %H:%M:%S").tolist() if d not in dt_obs]
To fix the problem with intraday data, use the dvalue parameter of rangebreaks with the right value in milliseconds. dvalue sets the size of each item in values, and its default is one day.
For example, 1 hour = 3.6e6 ms, so for hourly data use dvalue=3.6e6.
Documentation here: https://plotly.com/python/reference/layout/xaxis/
fig.update_xaxes(rangebreaks=[dict(values=dt_breaks, dvalue=3.6e6)])
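Putting the two pieces together for 5-minute data might look like the sketch below. It assumes the bars are exactly 5 minutes apart and that `dvalue` should equal that spacing in milliseconds (5 min = 300,000 ms); `intraday_breaks` is a hypothetical helper, not part of Plotly.

```python
import pandas as pd


def intraday_breaks(index, step=pd.Timedelta(minutes=5)):
    """Return (missing timestamps as strings, step size in ms)."""
    observed = pd.DatetimeIndex(index)
    full = pd.date_range(start=observed[0], end=observed[-1], freq=step)
    missing = full.difference(observed)
    dvalue = int(step.total_seconds() * 1000)
    return missing.strftime('%Y-%m-%d %H:%M:%S').tolist(), dvalue


# Example: 5-minute bars with a gap between 10:05 and 10:25.
index = pd.to_datetime([
    '2021-01-04 10:00', '2021-01-04 10:05',
    '2021-01-04 10:25', '2021-01-04 10:30',
])
dt_breaks, dvalue = intraday_breaks(index)
# fig.update_xaxes(rangebreaks=[dict(values=dt_breaks, dvalue=dvalue)])
```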

How can I draw the histogram of date values group by month in each year in Python?

I wrote this code to draw the histogram of date values in each month. It shows the number of dates for each month in the whole dataset, but I want the histogram to be for each month in each year. That is, for example, I should have January through December for year 1, then January through December for year 2, and so on.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
pd.options.display.mpl_style = 'default'
sns.set_context("talk")
df = pd.read_csv("data.csv", names=['lender','loan','country','sector','amount','date'],header=None)
date=df['date']
df.date = date.astype("datetime64")
df.groupby(df.date.dt.month).count().plot(kind="bar")
According to the groupby docstring, the by parameter is:
list of column names. Called on each element of the object index to determine the groups. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups
So your code simply becomes:
df = pd.read_csv(...)
df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
# group by year first so the months are listed within each year
df.groupby(by=['year', 'month']).count().plot(kind="bar")
But I would write this as:
ax = (
    pd.read_csv(...)
    .assign(date=lambda df: pd.to_datetime(df['date']))
    .assign(year=lambda df: df['date'].dt.year)
    .assign(month=lambda df: df['date'].dt.month)
    .groupby(by=['year', 'month'])
    .count()
    .plot(kind="bar")
)
Now you have a matplotlib axes object that you can use to modify the tick labels (see, e.g., the question "matplotlib x-axis ticks dates formatting and locations").
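As a compact alternative sketch, the year and month can be combined into a single monthly period before counting, which gives one entry per year-month with labels such as 2014-01. The sample dates and the helper name `counts_per_year_month` are invented for illustration.

```python
import pandas as pd


def counts_per_year_month(dates):
    """Count how many dates fall into each (year, month)."""
    periods = pd.to_datetime(dates).dt.to_period('M')
    return periods.value_counts().sort_index()


dates = pd.Series(['2014-01-05', '2014-01-20', '2014-02-01', '2015-01-03'])
counts = counts_per_year_month(dates)
print(counts)
# counts.plot(kind='bar')  # needs matplotlib; returns an Axes object
```

Months with no dates at all do not appear in the result, so the bars are not evenly spaced in time unless the index is reindexed over the full range first.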
