Plot dates in a time series on an x axis [duplicate] - python

This question already has answers here:
Plotting a time series?
(2 answers)
How to draw vertical lines on a given plot
(6 answers)
Closed 1 year ago.
I am trying to mark a set of dates (N=50) on a 5-year time series chart, and I can't work out how to iterate over them in a for loop. Below is an example of the chart I'm trying to plot the dates on.
Visual of what I'm plotting dates on
Currently, I am trying:
for date in dataframe_with_dates.DATE:
    plt.axvline(x = date, color = 'g')
plt.show()
and I'm receiving an error of:
Failed to convert value(s) to axis units: 'DATE'
I'm not sure if this has something to do with the dtype being datetime, or if I need to try another approach, but any advice/guidance is greatly appreciated!
Thank you!
This is what I am trying to accomplish: Example image
EDIT: Code to produce the plot
def plot_df(df_1, x, y, title = '', xlabel = 'DATE', ylabel = 'VALUE', dpi = 100):
    plt.figure(figsize = (25,5), dpi = dpi)
    plt.plot(x, y, color = 'tab:red')
    plt.gca().set(title = title, xlabel = xlabel, ylabel = ylabel)
    plt.show()
plot_df(df_VIX, x = df_VIX.DATE, y = df_VIX.AVG_VALUE, title = 'Daily VIX since 1990')
data_test = [['2016-01-04', 22.48, 23.36, 20.67, 20.70, 21.8025],
['2016-01-05', 20.75, 21.06, 19.25, 19.34, 20.1],
['2016-01-06', 21.67, 21.86, 19.8, 20.59, 20.98],
['2016-01-07', 23.22, 25.86, 22.4, 24.99, 24.1175],
['2016-01-08', 22.96, 25.86, 22.40, 24.99, 24.89]]
df_test = pd.DataFrame(data_test, columns = ['DATE','OPEN','HIGH','LOW','CLOSE', 'AVG_VALUE'])
df_test['DATE'] = pd.to_datetime(df_test['DATE'])
This will reproduce a sample of the exact data I'm using.

I think this is what you want:
df_test.plot(x='DATE', y='OPEN')
Or replace y='OPEN' with another column to plot. The x-axis will be formatted automatically by pandas to be similar to what you showed in the figure.
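To mark individual dates as vertical lines on top of the series, one option is to call axvline once per date on the same axes. The following is a minimal sketch, not the asker's actual code: `event_dates` is a made-up list of dates to mark, and only two columns of the sample data are used. The quoted 'DATE' in the error message might mean that a string reached axvline instead of a datetime value, but that is a guess, since the contents of dataframe_with_dates are not shown.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample data from the question (DATE and AVG_VALUE columns only)
df_test = pd.DataFrame({
    "DATE": pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-06",
                            "2016-01-07", "2016-01-08"]),
    "AVG_VALUE": [21.8025, 20.1, 20.98, 24.1175, 24.89],
})

# Hypothetical dates to mark; these must be datetime values, not strings
event_dates = pd.to_datetime(["2016-01-05", "2016-01-07"])

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df_test["DATE"], df_test["AVG_VALUE"], color="tab:red")

# One vertical line per date, drawn on the same axes as the series
for date in event_dates:
    ax.axvline(x=date, color="g")

ax.set(xlabel="DATE", ylabel="VALUE")
```

Call plt.show() (or fig.savefig) afterwards to display the figure. Plotting the series first lets the x axis take date units before the vertical lines are added.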

Related

Stacked Area Chart in Python

I'm working on an assignment from school, and have run into a snag when it comes to my stacked area chart.
The data is fairly simple: 4 columns that look similar to this:
Series id   Year   Period   Value
LNS140000   1948   M01      3.4
I'm trying to create a stacked area chart using Year as my x and Value as my y and breaking it up over Period.
#Stacked area chart still using unemployment data
x = d.Year
y = d.Value
plt.stackplot(x, y, labels = d['Period'])
plt.legend(d['Period'], loc = 'upper left')
plt.show()
However, when I do it like this it only picks up M01, although the data contain periods M01 through M12. Any thoughts on how I can make this work?
You need to preprocess your data a little before passing it to the stackplot function. I took a look at this link to work out an example that could be suitable for your case.
Since I've only seen one row of your data, I added some made-up values to the dataset.
import pandas as pd
import matplotlib.pyplot as plt
dd=[[1948,'M01',3.4],[1948,'M02',2.5],[1948,'M03',1.6],
    [1949,'M01',4.3],[1949,'M02',6.7],[1949,'M03',7.8]]
d=pd.DataFrame(dd,columns=['Year','Period','Value'])
years=d.Year.unique()
periods=d.Period.unique()
#Now group them per period, but in year sequence
d.sort_values(by='Year',inplace=True) # to ensure entire dataset is ordered
pds=[]
for p in periods:
    pds.append(d[d.Period==p]['Value'].values)
plt.stackplot(years,pds,labels=periods)
plt.legend(loc='upper left')
plt.show()
Is that what you want?
So I was able to use Seaborn to help out. First I did a pivot table
df = d.pivot(index = 'Year',
             columns = 'Period',
             values = 'Value')
df
Then I set up seaborn
plt.style.use('seaborn') # matplotlib 3.6+ renamed this style to 'seaborn-v0_8'
sns.set_style("white")
sns.set_theme(style = "ticks")
df.plot.area(figsize = (20,9))
plt.title("Unemployment by Year and Month\n", fontsize = 22, loc = 'left')
plt.ylabel("Values", fontsize = 22)
plt.xlabel("Year", fontsize = 22)
It seems to me that the problem you are having relates to the shape of the data. Look at how the values are arranged in this matplotlib example. I would try grouping the data by period, or pivoting it into the right shape, and then plotting again.
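As a sketch of that reshaping step, the long table can be pivoted so that each Period becomes its own column, and the columns are then passed to stackplot as separate layers. The values below are the made-up ones from the earlier answer, not real unemployment data, and the name `wide` is only illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Long format: one row per (Year, Period) pair, values are made up
d = pd.DataFrame(
    [[1948, "M01", 3.4], [1948, "M02", 2.5], [1948, "M03", 1.6],
     [1949, "M01", 4.3], [1949, "M02", 6.7], [1949, "M03", 7.8]],
    columns=["Year", "Period", "Value"],
)

# Wide format: one row per Year, one column per Period
wide = d.pivot(index="Year", columns="Period", values="Value")

fig, ax = plt.subplots()
# stackplot expects one row of y values per layer, hence the transpose
ax.stackplot(wide.index, wide.to_numpy().T, labels=list(wide.columns))
ax.legend(loc="upper left")
```

The original attempt passed a single y series, so only one layer was drawn. After pivoting, each period supplies its own layer.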

How do I change the amount of values shown on the x-axis of a pandas dataframe plot?

Once again I'm stuck in python: I can't find a nice way of representing my data.
I've got a bunch of discharges that I want to plot. However, when I do this the x-axis is too crowded with dates. How can I change the representation so that it shows only years (or just a few months)? My code is below, and the figure of my current result is this:
bargraph
Thanks in advance!
time = pd.Series(pd.period_range('1/1/1970',
                                 freq='M', periods=12*12))
y = np.zeros(int(len(discharge)/3))
for i in range(int(len(discharge)/3)):
    y[i] = discharge.sum(axis=1)[i*3]
qdi_bar = pd.DataFrame()
qdi_bar['sum_discharge'] = y
qdi_bar.index = time
qdi_bar.plot.bar()
Edit 1: You can use this code with matplotlib:
time = pd.Series(pd.period_range('1/1/1970',
                                 freq='M', periods=12*12))
# Change to timestamps instead of period ranges (the latter cause a TypeError when used with pyplot)
time = time.apply(lambda x : x.to_timestamp())
y = np.zeros(int(len(discharge)/3))
for i in range(int(len(discharge)/3)):
    y[i] = discharge.iloc[i*3].sum()
qdi_bar = pd.DataFrame()
qdi_bar['sum_discharge'] = y
qdi_bar.index = time
#Plot with matplotlib.pyplot instead of pandas
plt.bar(qdi_bar.index, qdi_bar["sum_discharge"], width = 15)
plt.show()
This works fine for me: I get a bar chart with only the years as ticks.
You can still apply the suggestion from the original answer and customize the result further with matplotlib.dates.
Original post: I don't know of any way to do this with pandas.DataFrame.plot(), but if you are willing to use matplotlib instead, then you can customize your x-axis as you like with matplotlib.dates.
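A minimal sketch of that kind of customization, assuming monthly timestamps on the x axis. The `values` array is placeholder data standing in for the discharge sums, and the two-year tick spacing is an arbitrary choice.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Placeholder data: 12 years of monthly values
time = pd.date_range("1970-01-01", periods=12 * 12, freq="MS")
values = np.arange(len(time), dtype=float)

fig, ax = plt.subplots(figsize=(10, 4))
ax.bar(time, values, width=15)  # width is in days on a date axis

# Put a major tick every second year and label it with the year only
ax.xaxis.set_major_locator(mdates.YearLocator(2))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
```

For ticks every few months instead, mdates.MonthLocator(interval=6) with a format such as "%b %Y" would be the analogous choice.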

How to edit x-axis length but also maintain plot dates?

Below I have my code to plot my graph.
#can change the 'iloc[x:y]' component to plot sections of chart
#ax = df['Data'].iloc[300:].plot(color = 'black', title = 'Past vs. Expected Future Path')
ax = df.plot('Date','Data',color = 'black', title = 'Past vs. Expected Future Path')
df.loc[df.index >= idx, 'up2SD'].plot(color = 'r', ax = ax)
df.loc[df.index >= idx, 'down2SD'].plot(color = 'r', ax = ax)
df.loc[df.index >= idx, 'Data'].plot(color = 'b', ax = ax)
plt.show()
#resize the plot
plt.rcParams["figure.figsize"] = [10,6]
plt.show()
Lines 2 (commented out) and 3 both plot all of the lines together as seen. However, I want dates on the x-axis and also to be able to plot sections of the graph (defined by the x-axis, i.e. date1 to date2).
Using line 3 I can plot with dates on the x-axis. However, using ".iloc[300:]" as in line 2 does not appear to work, because the 3 coloured lines disconnect from the main line, as seen below:
ax = df.iloc[300:].plot('Date','Data',color = 'black', title = 'Past vs. Expected Future Path')
Using line 2, I can change the x-axis range, but it doesn't show dates on the x-axis.
Does anyone have any advice on how to have dates on the x-axis and still be able to select the x-axis period?
For this to work as desired, you need to set the 'date' column as index of the dataframe. Otherwise, df.plot has no way to know what needs to be used as x-axis. With the date set as index, pandas accepts expressions such as df.loc[df.index >= '20180101', 'data2'] to select a time range and a specific column.
Here is some example code to demonstrate the concept.
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
dates = pd.date_range('20160101', '20191231', freq='D')
data1 = np.random.normal(-0.5, 0.2, len(dates))
data2 = np.random.normal(-0.7, 0.2, len(dates))
df = pd.DataFrame({'date': dates, 'data1':data1, 'data2':data2})
df.set_index('date', inplace=True)
df['data1'].iloc[300:].plot(color='crimson')
df.loc[df.index >= '20180101', 'data2'].plot(color='dodgerblue')
plt.tight_layout()
plt.show()
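With a datetime index in place, a date1-to-date2 window can also be selected with a label slice. This is a small sketch using random data; the bounds `date1` and `date2` are hypothetical values chosen for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
dates = pd.date_range("2016-01-01", "2019-12-31", freq="D")
df = pd.DataFrame({"data1": rng.normal(-0.5, 0.2, len(dates))}, index=dates)

# Hypothetical window bounds; label slices on a DatetimeIndex include both ends
date1, date2 = "2017-06-01", "2018-03-31"
window = df.loc[date1:date2]

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(window.index, window["data1"], color="crimson")
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

Because the selection is made by date rather than by row position, several columns sliced this way stay aligned on the same x axis.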

How to plot Date in X Axis, Time in Y axis with Pandas/Matplotlib and present time in HH:MM format as tick labels?

I have dates in one column and times in another, which I retrieved from a database through pandas read_sql. The dataframe looks like the one below (there are 30-40 rows in my dataframe). I want to plot them as a time series graph, and I would also like to be able to convert that to a histogram if needed.
COB CALV14
1 2019-10-04 07:04
2 2019-10-04 05:03
3 2019-10-03 16:03
4 2019-10-03 05:15
At first I got various errors, such as there being no numeric field to plot. After a lot of searching, the closest post I could find is: Matplotlib date on y axis
I followed it and got some result. However, the problems are:
I have to go through a number of steps (convert to str, then to a list, and then to Matplotlib's date format) before I can plot them. (Please refer to the code I am using.) There must be a smarter and more concise way to do this.
The time labels on the axis do not appear exactly as they do in the dataframe (e.g. it should show 07:03, 05:04, etc.).
I'm new to Python and would appreciate any help on this.
Code
ob_frame['COB'] = ob_frame.COB.astype(str)
ob_frame['CALV14'] = ob_frame.CALV14.astype(str)
date = ob_frame.COB.tolist()
time = ob_frame.CALV14.tolist()
y = mdates.datestr2num(date)
x = mdates.datestr2num(time)
fig, ax = plt.subplots(figsize=(9,9))
ax.plot(x, y)
ax.yaxis_date()
ax.xaxis_date()
fig.autofmt_xdate()
plt.show()
I found the answer. I did not need to convert the data retrieved from the DB to string type. The remaining problems came from not using the right formatting for the tick labels. Here is the complete code, posted in case it helps anyone.
In this code I have swapped the axes, i.e. I plotted dates on the x axis and times on the y axis, as it looked better.
###### Import all the libraries and modules needed ######
import IN_OUT_SQL as IS ## IN_OUT_SQL.py is the file where the SQL is stored
import cx_Oracle as co
import numpy as np
import Credential as cd # Credential.py is the file where you store the DB credentials
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib import dates as mdates
%matplotlib inline
###### Connect to DB, make the dataframe and prepare the x and y values to be plotted ######
def extract_data(query):
    '''
    This function takes the given query as input, connects to the database, executes the SQL and
    returns the result in a dataframe.
    '''
    cred = cd.POLN_CONSTR #POLN_CONSTR in the credential file stores the credential in '''USERNAME/PASSWORD#DB_NAME''' format
    conn = co.connect(cred)
    frame = pd.read_sql(query, con = conn)
    return frame
query = IS.OUT_SQL
ob_frame = extract_data(query)
ob_frame.dropna(inplace = True) # Drop the rows that contain NaN in any column
x = mdates.datestr2num(ob_frame['COB']) #COB is a date in "01-MAR-2020" format - convert it to Matplotlib date numbers
y = mdates.datestr2num(ob_frame['CALV14']) #CALV14 is a time in "21:04" format - convert it to Matplotlib date numbers
###### Make the Timeseries plot of delivery time in y axis vs delivery date in x axis ######
fig, ax = plt.subplots(figsize=(15,8))
ax.clear() # Clear the axes
ax.plot(x, y, 'o-', color = 'dodgerblue') #Plot the data (colour is set by the keyword only)
##Below two lines are to draw a horizontal line for 05 AM and 07 AM position
plt.axhline(y = mdates.date2num (pd.to_datetime('07:00')), color = 'red', linestyle = '--', linewidth = 0.75)
plt.axhline(y = mdates.date2num (pd.to_datetime('05:00')), color = 'green', linestyle = '--', linewidth = 0.75)
plt.xticks(x, rotation = 75) #rotation must be a number, not a string
ax.yaxis_date()
ax.xaxis_date()
#Below 6 lines are about setting the format with which I want my x or y ticks and their labels to be displayed
yfmt = mdates.DateFormatter('%H:%M')
xfmt = mdates.DateFormatter('%d-%b-%y')
ax.yaxis.set_major_formatter(yfmt)
ax.xaxis.set_major_formatter(xfmt)
ax.yaxis.set_major_locator(mdates.HourLocator(interval=1)) # Every 1 Hour
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1)) # Every 1 Day
####### Name the x,y labels, titles and beautify the plot #######
plt.style.use('bmh')
plt.xlabel('\nCOB Dates')
plt.ylabel('Time of Delivery (GMT/BST as applicable)\n')
plt.title(" Data readiness time against COBs (Last 3 months)\n")
plt.rcParams["font.size"] = "12" #Change the font
# plt.rcParams["font.family"] = "Times New Roman" # Set the font type if needed
plt.tick_params(left = False, bottom = False, labelsize = 10) #Remove ticks, make tick labelsize 10
plt.box(False)
plt.show()
Output:
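For readers without the database, the tick-formatting part can be tried on its own. The following is a sketch under stated assumptions rather than the code above: it uses the four sample rows from the question, parses the columns with pandas instead of datestr2num, and puts every time on the same dummy date (1970-01-01) so that only the time of day varies on the y axis.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# The four sample rows shown in the question
frame = pd.DataFrame({
    "COB": ["2019-10-04", "2019-10-04", "2019-10-03", "2019-10-03"],
    "CALV14": ["07:04", "05:03", "16:03", "05:15"],
})

x = pd.to_datetime(frame["COB"], format="%Y-%m-%d")
# Attach a dummy date so that the times become comparable datetimes
y = pd.to_datetime("1970-01-01 " + frame["CALV14"], format="%Y-%m-%d %H:%M")

fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(x, y, "o")

# Show only HH:MM on the y axis and day-month-year on the x axis
ax.yaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%b-%y"))
fig.autofmt_xdate()
```

The formatter controls only how the tick labels are printed. Where the ticks fall is still decided by the locator, so an HourLocator can be added as in the full code if fixed hourly ticks are wanted.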

Seaborn scatterplot - label data points [duplicate]

This question already has answers here:
Adding labels in x y scatter plot with seaborn
(6 answers)
Closed 4 years ago.
I have a Seaborn scatterplot using data from a dataframe. I would like to add data labels to the plot, using other values in the df associated with that observation (row). Please see below - is there a way to add at least one of the column values (A or B) to the plot? Even better, is there a way to add two labels (in this case, both the values in column A and B?)
I have tried to use a for loop using functions like the below per my searches, but have not had success with this scatterplot.
Thank you for your help.
df_so = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
scatter_so=sns.lmplot(x='C', y='D', data=df_so,
fit_reg=False,y_jitter=0, scatter_kws={'alpha':0.2})
fig, ax = plt.subplots() #stuff like this does not work
Use:
df_so = pd.DataFrame(np.random.randint(0,100,size=(20, 4)), columns=list('ABCD'))
scatter_so=sns.lmplot(x='C', y='D', data=df_so,
                      fit_reg=False,y_jitter=0, scatter_kws={'alpha':0.2})
def label_point(x, y, val, ax):
    a = pd.concat({'x': x, 'y': y, 'val': val}, axis=1)
    for i, point in a.iterrows():
        ax.text(point['x']+.02, point['y'], str(point['val']))
label_point(df_so['C'], df_so['D'], '('+df_so['A'].astype(str)+', '+df_so['B'].astype(str)+')', plt.gca())
Output:
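The same labelling idea can be sketched with Axes.annotate, which places each label at a fixed offset in points from its marker instead of shifting it in data units. This variant uses a plain matplotlib scatter on random data so that it runs without seaborn; that is an assumption made for the sketch, and with lmplot the axes would still come from plt.gca() as in the answer above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df_so = pd.DataFrame(rng.integers(0, 100, size=(20, 4)), columns=list("ABCD"))

fig, ax = plt.subplots()
ax.scatter(df_so["C"], df_so["D"], alpha=0.2)

# Label every point with its (A, B) values, offset 3 points up and right
for row in df_so.itertuples(index=False):
    ax.annotate(f"({row.A}, {row.B})", xy=(row.C, row.D),
                xytext=(3, 3), textcoords="offset points")
```

An offset in points keeps the gap between marker and label constant regardless of the axis scale.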
