How to overlay time series from each day on one plot - python

I am trying to plot data from a data frame with hourly frequency on a plot where each day is its own line and the x-axis is hour. The data frame is shown below and so is the resulting graph I get when simply setting the x-axis as 'hb' and y-axis as 'BaseCase'. The plot is close to what I want, but connects the end points to the starting point. How do I go about avoiding the straight lines across the plot?
scens = pd.read_csv(---)
scens['datetime'] = pd.to_datetime(scens['datetime'])
scens.drop(scens.tail(2).index,inplace=True)
source = ColumnDataSource(scens)
p = figure()
p.line(x='hb', y='BaseCase', source=source)
show(p)
The above code is how I get the plot at the bottom of the post

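If you are open to other packages, consider seaborn:
import seaborn as sns
# hue groups the rows by calendar day, so each day gets its own line
sns.lineplot(data=df, x='hb', y='BaseCase',
             hue=df['datetime'].dt.normalize())
Or with just pandas and matplotlib, drawing every day on the same axes:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
for date, d in df.groupby(df['datetime'].dt.normalize()):
    d.plot(x='hb', y='BaseCase', ax=ax, label=date.date())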
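Another way to get one line per day is to reshape the frame so that each day becomes its own column, indexed by hour. The sketch below assumes the column names from the question ('datetime', 'hb', 'BaseCase') and uses made-up sample values; the helper name days_as_columns is hypothetical.

```python
import numpy as np
import pandas as pd


def days_as_columns(frame, time_col="datetime", hour_col="hb", value_col="BaseCase"):
    """Return a table with one row per hour and one column per calendar day."""
    day = frame[time_col].dt.normalize().rename("day")
    return frame.assign(day=day).pivot(index=hour_col, columns="day", values=value_col)


# Made-up sample: three days of hourly values (not the question's data)
stamps = pd.to_datetime("2021-01-01") + pd.to_timedelta(np.arange(72), unit="h")
scens = pd.DataFrame({
    "datetime": stamps,
    "hb": stamps.hour,
    "BaseCase": np.arange(72, dtype=float),
})

wide = days_as_columns(scens)  # 24 rows (hours) x 3 columns (days)
```

Calling wide.plot() then draws each column, that is each day, as a separate line against the hour, so no segment joins the end of one day to the start of the next. This assumes each (day, hour) pair occurs only once; duplicates would make pivot raise an error.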

Related

How can I stack a plotly scatter plot on top of a time series

I am doing a time series project and I am trying to figure out how to overlay a scatter plot on top of a time series plot.
import plotly.express as px
# scatter plot
df = alaska
fig = px.scatter(df, x='week', y='travel_restrictions')
fig.show()
# line plot
df = alaska
fig = px.line(df, x='week', y='depression')
fig.show()
simple plot I would like to combine
I have tried a couple of different ways of adding traces, but I get value errors whenever I try to combine the two charts.
import matplotlib.pyplot as plt
model = ChangepointDetector()
res = model.find_trend_changepoints(
    df=combine.reset_index(),  # data df
    time_col="week",  # time column name
    value_col="Least Restrictions",  # value column name
    yearly_seasonality_order=10,  # yearly seasonality order, fit along with trend
    regularization_strength=0.3,  # between 0.0 and 1.0, greater values imply fewer changepoints, and 1.0 implies no changepoints
    resample_freq="7D",  # data aggregation frequency, eliminate small fluctuation/seasonality
    potential_changepoint_n=25,  # the number of potential changepoints
    yearly_seasonality_change_freq="365D",  # varying yearly seasonality for every year
    no_changepoint_distance_from_end="365D")  # length of the period at the end of the data where changepoints are not allowed
fig = model.plot(
    observation=True,
    trend_estimate=False,
    trend_change=True,
    yearly_seasonality_estimate=False,
    adaptive_lasso_estimate=True,
    plot=False,
)
trace2 = go.Scatter(
    x=df[cat],
    y=df['cumulative_perc'],
    name='Cumulative Percentage',
    yaxis='y2'
)
fig.update_layout(title='Alaska')
plotly.io.show(fig)
Also, I'm not sure whether it is possible, but originally I was trying to stack a scatter plot on top of this greykite plot, and I only managed to have the two plots obscure each other.
Ideally I'd stack my scatter plot on top of my changepoint plot, but I don't know if that's possible.

How to plot multiple lines in subplot using python and matplotlib

I've been following the solutions provided by Merge matplotlib subplots with shared x-axis. See solution 35. In each subplot, there is one line, but I would like to have multiple lines in each subplot. For example, the top plot has the price of IBM and a 30 day moving average. The bottom plot has a 180 day and 30 day variance.
To plot multiple lines in my other python programs I used (data).plot(figsize=(10, 7)) where data is a dataframe indexed by date, but in the author's solution he uses line0, = ax0.plot(x, y, color='r') to assign the data series (x,y) to the plot. In the case of multiple lines in solution 35, how does one assign a dataframe with multiple columns to the plot?
You'll need to use (data).plot(ax=ax0) to work with pandas plotting.
For the legend you can use:
handles0, labels0 = ax0.get_legend_handles_labels()
handles1, labels1 = ax1.get_legend_handles_labels()
ax0.legend(handles=handles0 + handles1, labels=labels0 + labels1)
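To make the ax= approach concrete, here is a minimal sketch with two stacked subplots that share the x-axis, each drawn from a multi-column DataFrame. The price series is synthetic (not real IBM data), and the column names are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily price series (made up, not real IBM data)
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=400, freq="D")
price = pd.Series(100 + rng.normal(size=400).cumsum(), index=dates)

# One DataFrame per subplot; every column becomes one line
top = pd.DataFrame({"price": price, "30d mean": price.rolling(30).mean()})
bottom = pd.DataFrame({"30d var": price.rolling(30).var(),
                       "180d var": price.rolling(180).var()})

fig, (ax0, ax1) = plt.subplots(2, 1, sharex=True, figsize=(10, 7))
top.plot(ax=ax0)      # draws both columns of `top` on the upper axes
bottom.plot(ax=ax1)   # draws both columns of `bottom` on the lower axes
```

Each call to DataFrame.plot adds its own legend to the axes it draws on; finish with plt.show() or fig.savefig(...) depending on where the figure should go.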

How to properly display date from csv in matplotlib plot?

I have a csv with the following columns: recorded, humidity and temperature. I want to display the recorded values (date and time) on the x axis and the humidity on the y axis. How can I properly display the dates (it is quite a big csv)? My current plot shows a black band instead of readable dates. My date format is like this: 2019-09-12T07:26:55, so the csv contains both the date and the time.
I have displayed the plot using this code:
from matplotlib import pyplot as plt
import pandas as pd
data = pd.read_csv('home_data.csv')
plt.plot(data.recorded, data.humidity)
plt.xlabel('date')
plt.ylabel('humidity')
plt.title('Visualizing date and humidity')
plt.show()
This is a print screen of the plot:
https://snipboard.io/d4hfS7.jpg
Actually, the plot is displaying every date in your dataset. There are so many that they look like a black blob. You can thin out the xticks to improve readability. Do something like this:
fig, ax = plt.subplots()
ax.plot(data.recorded, data.humidity)
# some axes labelling
# Now reduce the number of ticks printed in the figure
ax.set_xticks(ax.get_xticks()[::4])
ax.tick_params(axis='x', labelrotation=45)
In the line ax.set_xticks(ax.get_xticks()[::4]) you set the x-axis ticks by keeping one date in every 4, using slicing. This reduces the number of dates printed. You can increase the step as much as you want.
To improve readability further, you can rotate the tick labels, as in the line
ax.tick_params(axis='x', labelrotation=45).
Hope this helps.
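An alternative worth considering: the labels pile up because the recorded column is read as plain strings, so matplotlib treats every value as its own category. If the column is parsed into datetimes, matplotlib picks a manageable number of date ticks by itself. A minimal sketch, assuming the column names from the question and using made-up rows in place of home_data.csv:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for home_data.csv: made-up rows in the question's timestamp format
csv_text = "recorded,humidity,temperature\n" + "\n".join(
    f"2019-09-12T07:{m:02d}:55,{40 + m % 5},{20 + m % 3}" for m in range(60)
)

# parse_dates turns the 'recorded' strings into real datetimes
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["recorded"])

fig, ax = plt.subplots()
ax.plot(data["recorded"], data["humidity"])
ax.set_xlabel("date")
ax.set_ylabel("humidity")
fig.autofmt_xdate()  # rotate and right-align the date labels
```

With the real file, the same idea would be pd.read_csv('home_data.csv', parse_dates=['recorded']).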

Why doesn't Subplot using Pandas show x-axis

When I plot single plots with pandas DataFrames I have an x-axis.
However, when I make a subplot and try to make a shared x-axis the way I would when using numpy arrays without pandas, there are no number labels.
I only want the numbers and label to appear on the last plot, as all plots share the same x-axis.
The data loaded and the plot produced can be found here:
https://drive.google.com/open?id=1hTmTSkIcYl-usv_CCxLl8U6bAoO6tMRh
This is for combining and plotting the data logged from two different logging devices which represent the same time period.
import pandas as pd
import matplotlib.pyplot as plt
df1=pd.read_csv('data1.csv', sep=',',header=0)
df1.columns.values
cols1 = list(df1.columns.values)
df2=pd.read_csv('data2.dat', sep='\t',header=18)
df2.columns.values
cols2 = list(df2.columns.values)
start =10000
stop = 30000
fig, axes = plt.subplots(nrows=5, ncols=1, sharex=True, figsize=(10, 10))
df1.iloc[start:stop].plot(x=cols1[0], y=cols1[1], ax=axes[0])
df1.iloc[start:stop].plot(x=cols1[0], y=cols1[2], ax=axes[0])
df1.iloc[start:stop].plot(x=cols1[0], y=cols1[3], ax=axes[2])
df1.iloc[start:stop].plot(x=cols1[0], y=cols1[4], ax=axes[2])
df2.iloc[start:stop].plot(x=cols2[0], y=cols2[3], ax=axes[3])
axes[3].set_xlabel("Time [s]")
plt.show()
I expect there to be numbers and a label on the x-axis, but instead it only shows the pandas label "#timestamp".
UPDATE: I have found something that hints at the problem. I think it is caused by the two files not having identical time spacing. The first column of each file is time; both are sampled at roughly 1 sample per second, but not exactly. If I remove the x=cols[x] parts, numbers appear on the x-axis, but then the two plots are shifted in time relative to each other, because they are plotted against the DataFrame index rather than against time.
I am currently trying to interpolate the data so that they have the same x-axis, but I would not have expected that to be necessary.
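The interpolation step mentioned in the update could look like the sketch below, which resamples the second logger's signal at the first logger's timestamps. The column names and values are hypothetical, and the time columns are assumed to be numeric seconds sorted in increasing order.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the two loggers: about 1 sample/s, but at different instants
df1 = pd.DataFrame({"time": [0.0, 1.0, 2.0, 3.0, 4.0],
                    "a": [0.0, 10.0, 20.0, 30.0, 40.0]})
df2 = pd.DataFrame({"time": [0.5, 1.5, 2.5, 3.5],
                    "b": [5.0, 15.0, 25.0, 35.0]})

# Linearly interpolate df2's signal at df1's timestamps.
# Outside df2's time range the result is NaN instead of an extrapolated value.
b_on_t1 = np.interp(df1["time"], df2["time"], df2["b"], left=np.nan, right=np.nan)

# Both signals now share df1's time column and can be plotted against it
aligned = df1.assign(b=b_on_t1)
```

After this, every subplot can use the same x column, so a shared x-axis refers to the same time values for both devices.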

aligning xticks in matplotlib plot with lines and boxplot

I am unable to get the following plot to align properly along the x-axis. Specifically, I want to plot a horizontal line representing the last value in the dataframe on top of a boxplot which describes the full sample. Here is the code. Currently I have commented out the line which would plot the boxplot.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
index = pd.date_range('1/1/2018', '2/1/2018')
data = pd.DataFrame({'a':np.random.randn(32)}, index=index)
fig, ax = plt.subplots(figsize=(6,3))
ax.hlines(data.iloc[-1],xmin=pd.RangeIndex(stop=len(list(data.columns)))+.15,xmax=pd.RangeIndex(stop=len(list(data.columns)))+.85,
          **{'linewidth':1.5})
# ax.boxplot(data.values)
ax.set_xticks(pd.RangeIndex(stop=len(list(data.columns)))+0.5)
ax.set_xticklabels(list(data.columns), rotation=0)
ax.tick_params(axis='x',length=5, bottom=True)
Here is the output from the above (so far so good).
If I uncomment the boxplot line above, the code produces this, which is misaligned:
Any tips for how to get them to line up?
Apparently you have a very clear opinion that the boxplot should be positioned at x=0.5. But you forgot to tell the boxplot about that.
ax.boxplot(data.values, positions=[0.5])
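Put together, a complete version could look like the sketch below. It generalizes positions=[0.5] to one slot per column; the variable name centers is made up for illustration, and the random data stands in for the real sample.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

index = pd.date_range("1/1/2018", "2/1/2018")
data = pd.DataFrame({"a": np.random.randn(len(index))}, index=index)

# One slot per column, centred at 0.5, 1.5, ... to match the tick positions
centers = np.arange(len(data.columns)) + 0.5

fig, ax = plt.subplots(figsize=(6, 3))
# Tell the boxplot where to sit instead of relying on its default positions
box = ax.boxplot(data.values, positions=centers)
# Horizontal line at the last value of each column, spanning most of the slot
ax.hlines(data.iloc[-1].to_numpy(), xmin=centers - 0.35, xmax=centers + 0.35,
          linewidth=1.5)
ax.set_xticks(centers)
ax.set_xticklabels(list(data.columns), rotation=0)
```

Because the box, the horizontal line and the tick all use the same centers array, they stay aligned if more columns are added.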
