I've been trying to plot my data on a line chart, and I expect it to show dates on the horizontal axis. I used index_col to set the date column as the index, but that returns an empty DataFrame. Can someone help, please?
data = pd.read_csv('good_btc_dataset.csv', warn_bad_lines= True,
index_col= ['date'])
data.dropna(inplace=True)
data.index = range(3169)
data.head()
I expect my chart to show dates on the horizontal axis but all it shows is numbers
thanks in advance
I recommend checking this script (it is copied from the matplotlib documentation). I think you just need to adapt it to your own data.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.cbook as cbook
years = mdates.YearLocator() # every year
months = mdates.MonthLocator() # every month
years_fmt = mdates.DateFormatter('%Y')
# Load a numpy structured array from yahoo csv data with fields date, open,
# close, volume, adj_close from the mpl-data/example directory. This array
# stores the date as an np.datetime64 with a day unit ('D') in the 'date'
# column.
with cbook.get_sample_data('goog.npz') as datafile:
    data = np.load(datafile)['price_data']
fig, ax = plt.subplots()
ax.plot('date', 'adj_close', data=data)
# format the ticks
ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(years_fmt)
ax.xaxis.set_minor_locator(months)
# round to nearest years.
datemin = np.datetime64(data['date'][0], 'Y')
datemax = np.datetime64(data['date'][-1], 'Y') + np.timedelta64(1, 'Y')
ax.set_xlim(datemin, datemax)
# format the coords message box
ax.format_xdata = mdates.DateFormatter('%Y-%m-%d')
ax.format_ydata = lambda x: '$%1.2f' % x # format the price.
ax.grid(True)
# rotates and right aligns the x labels, and moves the bottom of the
# axes up to make room for them
fig.autofmt_xdate()
plt.show()
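Applied to the CSV from the question, a minimal sketch might look like the following. It assumes the file has a `date` column that pandas can parse and a numeric column, here hypothetically called `close`; the inline sample rows only stand in for good_btc_dataset.csv. The two points it illustrates are passing `parse_dates` so that the index becomes a DatetimeIndex, and not overwriting the index with `range(...)` afterwards, since that replaces the dates with plain numbers.

```python
import io

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for good_btc_dataset.csv; the column names are assumptions.
SAMPLE_CSV = io.StringIO(
    "date,close\n"
    "2019-01-01,3843.5\n"
    "2019-02-01,3437.2\n"
    "2019-03-01,3859.6\n"
    "2019-04-01,4158.2\n"
)


def load_prices(source):
    """Read the CSV with the 'date' column parsed and used as the index."""
    data = pd.read_csv(source, index_col="date", parse_dates=["date"])
    # Do not reassign data.index afterwards: that would discard the dates.
    return data.dropna()


def plot_prices(data, column="close"):
    """Plot one column against the DatetimeIndex with one tick per month."""
    fig, ax = plt.subplots()
    ax.plot(data.index, data[column])
    ax.xaxis.set_major_locator(mdates.MonthLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
    fig.autofmt_xdate()  # rotate the date labels so they do not overlap
    return fig, ax


prices = load_prices(SAMPLE_CSV)
```

Calling `plot_prices(prices)` followed by `plt.show()` should then display month labels such as 2019-02 on the horizontal axis.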
Data - we import historical yields of the ten- and thirty-year Treasury and calculate the spread (difference) between the two (this block of code is good; feel free to skip):
#Import statements
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
#Constants
start_date = "2018-01-01"
end_date = "2023-01-01"
#Pull in data
tenYear_master = yf.download('^TNX', start_date, end_date)
thirtyYear_master = yf.download('^TYX', start_date, end_date)
#Trim DataFrames to only include the 'Adj Close' columns
tenYear = tenYear_master['Adj Close'].to_frame()
thirtyYear = thirtyYear_master['Adj Close'].to_frame()
#Rename columns
tenYear.rename(columns = {'Adj Close' : 'Adj Close - Ten Year'}, inplace= True)
thirtyYear.rename(columns = {'Adj Close' : 'Adj Close - Thirty Year'}, inplace= True)
#Join DataFrames
data = tenYear.join(thirtyYear)
#Add column for difference (spread)
data['Spread'] = data['Adj Close - Thirty Year'] - data['Adj Close - Ten Year']
data
This block is also good.
'''Plot data'''
#Delete top, left, and right borders from figure
plt.rcParams['axes.spines.top'] = False
plt.rcParams['axes.spines.left'] = False
plt.rcParams['axes.spines.right'] = False
fig, ax = plt.subplots(figsize = (15,10))
data.plot(ax = ax, secondary_y = ['Spread'], ylabel = 'Yield', legend = False);
'''Change left y-axis tick labels to percentage'''
left_yticks = ax.get_yticks().tolist()
ax.yaxis.set_major_locator(mticker.FixedLocator(left_yticks))
ax.set_yticklabels((("%.1f" % tick) + '%') for tick in left_yticks);
#Add legend
fig.legend(loc="upper center", ncol = 3, frameon = False)
fig.tight_layout()
plt.show()
I have questions concerning two features of the graph that I want to customize:
The x-axis currently has a tick and tick label for every year. How can I change this so that there is a tick and tick label for every 3 months in the form MMM-YY? (see picture below)
The spread was calculated as thirty year yield - ten year yield. Say I want to change the RIGHT y-axis tick labels so that their sign is flipped, but I want to leave both the original data and curves alone (for the sake of argument; bear with me, there is logic underlying this). In other words, the right y-axis tick labels currently go from -0.2 at the bottom to 0.8 at the top. How can I change them so that they go from 0.2 at the bottom to -0.8 at the top without changing anything about the data or curves? This is purely a cosmetic change of the right y-axis tick labels.
I tried doing the following:
'''Change right y-axis tick labels'''
right_yticks = (ax.right_ax).get_yticks().tolist()
#Loop through and multiply each right y-axis tick label by -1
for index, value in enumerate(right_yticks):
    right_yticks[index] = value*(-1)
(ax.right_ax).yaxis.set_major_locator(mticker.FixedLocator(right_yticks))
(ax.right_ax).set_yticklabels(right_yticks)
But I got this:
Note how the right y-axis is incomplete.
I'd appreciate any help. Thank you!
Let's create some data:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
days = np.array(["2022-01-01", "2022-07-01", "2023-02-15", "2023-11-15", "2024-03-03"],
dtype = "datetime64")
val = np.array([20, 20, -10, -10, 10])
For the dates on the x-axis, we import matplotlib.dates, which provides the month locator and the date formatter. The locator places a tick every 3 months, and the formatter controls how the labels are displayed (abbreviated month and two-digit year, e.g. Jan-22).
For the y-axis, you need to change the sign of the data (hence the negative sign in ax2.plot()), but you want the curve to stay in the same position, so afterwards you invert the axis. As a result, the curves in both plots are identical, but the y-axis values have opposite signs and directions.
fig, (ax1, ax2) = plt.subplots(figsize = (10,5), nrows = 2)
ax1.plot(days, val, marker = "x")
# set the locator to Jan, Apr, Jul, Oct
ax1.xaxis.set_major_locator(mdates.MonthLocator( bymonth = (1, 4, 7, 10) ))
# set the formatter to month-year; lowercase %y shows only two digits of the year
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%b-%y"))
# change the sign of the y data plotted
ax2.plot(days, -val, marker = "x")
#invert the y axis
ax2.invert_yaxis()
# set the locator to Jan, Apr, Jul, Oct
ax2.xaxis.set_major_locator(mdates.MonthLocator( bymonth = (1, 4, 7, 10) ))
# set the formatter to month-year; lowercase %y shows only two digits of the year
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%b-%y"))
plt.show()
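If the underlying data and curves must stay untouched, as the question asks, another option is to leave the ticks where they are and change only how they are printed. The sketch below does that with matplotlib.ticker.FuncFormatter on a twin axis. The sample values and the names flipped_sign_label and plot_with_flipped_right_labels are made up for illustration; with the pandas plot from the question, the same formatter would presumably be set on ax.right_ax.yaxis.

```python
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np


def flipped_sign_label(value, pos=None):
    """Return the tick value with its sign flipped, as text (hypothetical helper)."""
    flipped = -value + 0.0  # adding 0.0 turns -0.0 into 0.0
    return f"{flipped:g}"


def plot_with_flipped_right_labels(x, left, right):
    """Plot two curves on twin y-axes and flip only the right-hand labels."""
    fig, ax = plt.subplots()
    ax.plot(x, left, color="tab:blue")
    right_ax = ax.twinx()
    right_ax.plot(x, right, color="tab:orange")
    # Only the printed labels change; tick positions and the curve stay as they are.
    right_ax.yaxis.set_major_formatter(mticker.FuncFormatter(flipped_sign_label))
    return fig, ax, right_ax


x = np.arange(5)
left = np.array([1.5, 1.8, 2.1, 1.9, 2.4])     # made-up yields
spread = np.array([-0.2, 0.1, 0.4, 0.6, 0.8])  # made-up spread
```

Because no FixedLocator is involved, the right axis keeps all of its automatically chosen ticks, which should avoid the incomplete axis shown in the question.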
I am new to Python and learning data visualization using matplotlib.
I am trying to plot Date/Time vs Values using matplotlib from this CSV file:
https://drive.google.com/file/d/1ex2sElpsXhxfKXA4ZbFk30aBrmb6-Y3I/view?usp=sharing
Following is the code snippet which I have been playing around with:
import pandas as pd
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
plt.style.use('seaborn')
years = mdates.YearLocator()
months = mdates.MonthLocator()
days = mdates.DayLocator()
hours = mdates.HourLocator()
minutes = mdates.MinuteLocator()
years_fmt = mdates.DateFormatter('%H:%M')
data = pd.read_csv('datafile.csv')
data.sort_values('Date/Time', inplace=True)
fig, ax = plt.subplots()
ax.plot('Date/Time', 'Discharge', data=data)
# format the ticks
ax.xaxis.set_major_locator(minutes)
ax.xaxis.set_major_formatter(years_fmt)
ax.xaxis.set_minor_locator(hours)
datemin = min(data['Date/Time'])
datemax = max(data['Date/Time'])
ax.set_xlim(datemin, datemax)
ax.format_xdata = mdates.DateFormatter('%Y.%m.%d %H:%M')
ax.format_ydata = lambda x: '%1.2f' % x # format the price.
ax.grid(True)
fig.autofmt_xdate()
plt.show()
The code is plotting the graph but it is not labeling the X-Axis and also giving some unknown values (on mouse over) for x on the bottom right corner as shown in the below screenshot:
Screenshot of matplotlib figure window
Can someone please suggest what changes are needed to plot the x-axis dates and also make the correct values appear when I move the cursor over the graph?
Thanks
I haven't used matplotlib. Instead I used pandas plotting
import pandas as pd
data = pd.read_csv('datafile.csv')
# Convert first so that sorting is chronological rather than alphabetical
data["Date/Time"] = pd.to_datetime(data["Date/Time"], format="%d.%m.%Y %H:%M")
data.sort_values('Date/Time', inplace=True)
ax = data.plot.line(x='Date/Time', y='Discharge')
Here, you need to convert the Date/Time to pandas datetime type.
The main issue is that the Date/Time column is read as plain strings (your data uses '%d.%m.%Y %H:%M'), so matplotlib does not treat the values as dates, and this is why you saw 'rubbish' values in the x tick labels. The '%Y.%m.%d %H:%M' format you set only controls how the cursor position is displayed; it does not parse the data. Anyway, the number of lines in your code can be reduced heavily if you convert your Date/Time column to timestamps, i.e.:
import pandas as pd
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
plt.style.use('seaborn')  # renamed to 'seaborn-v0_8' in Matplotlib 3.6
data = pd.read_csv('datafile.csv')
# Convert to timestamps first, then sort chronologically
data["Date/Time"] = pd.to_datetime(data["Date/Time"], format="%d.%m.%Y %H:%M")
data.sort_values('Date/Time', inplace=True)
fig, ax = plt.subplots()
ax.plot('Date/Time', 'Discharge', data=data)
ax.format_xdata = mdates.DateFormatter('%Y.%m.%d %H:%M')
ax.tick_params(axis='x', rotation=45)
ax.grid(True)
fig.autofmt_xdate()
plt.show()
Note that the format of labels in the plot will depend on the zoom level, so you will need to enlarge a portion of the graph to see hours and minutes in the tick labels, but the cursor locator on the bottom bar of the window should be always displaying the detailed timestamp under the cursor.
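If the tick labels should show hours and minutes regardless of the zoom level, an explicit locator and formatter can be set instead of the defaults. A minimal sketch, assuming timestamps in the same '%d.%m.%Y %H:%M' layout as in the question; the sample rows, the function name plot_discharge and the six-hour tick spacing are arbitrary choices for illustration:

```python
import io

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for datafile.csv.
SAMPLE = io.StringIO(
    "Date/Time,Discharge\n"
    "01.03.2020 00:00,12.1\n"
    "01.03.2020 06:00,13.4\n"
    "01.03.2020 12:00,15.0\n"
    "01.03.2020 18:00,14.2\n"
    "02.03.2020 00:00,12.8\n"
)


def plot_discharge(source):
    """Plot Discharge against parsed timestamps with a tick every six hours."""
    data = pd.read_csv(source)
    data["Date/Time"] = pd.to_datetime(data["Date/Time"], format="%d.%m.%Y %H:%M")
    data = data.sort_values("Date/Time")

    fig, ax = plt.subplots()
    ax.plot(data["Date/Time"], data["Discharge"])
    # Fixed ticks at 00:00, 06:00, 12:00 and 18:00, labelled with day, month and time.
    ax.xaxis.set_major_locator(mdates.HourLocator(byhour=[0, 6, 12, 18]))
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%d.%m %H:%M"))
    ax.grid(True)
    fig.autofmt_xdate()
    return fig, ax
```

A fixed spacing like this suits short time spans; for data covering many days the labels would crowd, so the locator would need a coarser interval.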
I wonder if it's possible to change the measurement milestones for graphs created by pandas. In my code the X-axis stands for time and is measured by month, but the measurement milestones are all over the place.
In the image below, the milestones for the X-axis are 2012M01, 2012M06, 2012M11, 2013M04 and 2013M09.
Is there any way I can choose how long the distance should be between every milestone? For example, to make it so it shows every year or every half year?
This is the code I used for the function making the graph:
def graph(dataframe):
    graph = dataframe[["Profit"]].plot()
    graph.set_title('Statistics')
    graph.set_ylabel('Thousand $')
    graph.set_xlabel('Time')
    plt.grid(True)
    plt.show()
The actual dataframe is just an excel-file with a bunch of months and monetary values in it.
I think the most straightforward approach is to use matplotlib.dates to format the axis:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
def graph(dataframe):
    fig, ax = plt.subplots()
    xfmt = mdates.DateFormatter('%YM%m') #see https://strftime.org/
    major = mdates.MonthLocator([1,7]) #label only Jan and Jul
    graph = dataframe[["Profit"]].plot(ax=ax) #link plot to the existing axes
    graph.set_title('Statistics')
    graph.set_ylabel('Thousand $')
    graph.set_xlabel('Time')
    graph.xaxis.set_major_locator(major) #set major locator tick on x-axis
    graph.xaxis.set_major_formatter(xfmt) #format xtick label
    plt.grid(True)
    plt.show()
But a key point is that your dates need to be Python's built-in datetime.date (not datetime.datetime); thanks to this answer. If your dates are str or a different type of datetime, you will need to convert them, and there are many resources on SO and elsewhere for doing this:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
dr = [pd.to_datetime(date).date() for date in dr] #explicitly converting to datetime with .date()
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
type(df.index[0])
Out[0]:
datetime.date
Calling graph(df) using the example above gets this plot:
Just to expand on this, here's what happens when the index is pandas.Timestamp instead of datetime.date:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
# dr = [pd.to_datetime(date).date() for date in df.index] #skipping date conversion
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
graph(df)
Out[0]:
The x-axis is improperly formatted:
However, if you are willing to just create the plot directly through matplotlib, rather than pandas (pandas is using matplotlib anyway), this can handle more types of dates:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
# dr = [pd.to_datetime(date).date() for date in df.index] #skipping date conversion
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
def graph_2(dataframe):
    fig, ax = plt.subplots()
    xfmt = mdates.DateFormatter('%YM%m')
    major = mdates.MonthLocator([1,7])
    ax.plot(dataframe.index,dataframe['Profit'], label='Profit')
    ax.set_title('Statistics')
    ax.set_ylabel('Thousand $')
    ax.set_xlabel('Time')
    ax.xaxis.set_major_locator(major)
    ax.xaxis.set_major_formatter(xfmt)
    ax.legend() #legend needs to be added
    plt.grid(True)
    plt.show()
graph_2(df)
type(df.index[0])
Out[0]:
pandas._libs.tslibs.timestamps.Timestamp
And here is the working graph:
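Another option that may allow keeping the pandas plotting call is the x_compat=True argument of DataFrame.plot, which asks pandas not to apply its own date tick handling, so that matplotlib.dates locators and formatters behave as usual with a DatetimeIndex. A minimal sketch, reusing the made-up Profit data from above (graph_compat is a hypothetical name):

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def graph_compat(dataframe):
    """Plot through pandas, but leave date tick handling to matplotlib."""
    fig, ax = plt.subplots()
    # x_compat=True suppresses pandas' automatic tick adjustment for time series.
    dataframe[["Profit"]].plot(ax=ax, x_compat=True)
    ax.xaxis.set_major_locator(mdates.MonthLocator([1, 7]))      # Jan and Jul only
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%YM%m"))  # e.g. 2012M07
    ax.set_title("Statistics")
    ax.set_ylabel("Thousand $")
    ax.set_xlabel("Time")
    ax.grid(True)
    return fig, ax


# Random stand-in data with a pandas.Timestamp index, as in the examples above.
dr = pd.date_range("2012-01-01", "2014-01-01", freq="MS")
df = pd.DataFrame(index=dr, data={"Profit": np.random.rand(len(dr))})
```

Calling `graph_compat(df)` followed by `plt.show()` should give half-yearly labels without converting the index to datetime.date first.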
My question is if there is any way to use matplotlib date tick labels with a log xscale.
I find whenever I try to set_xscale('log') it just erases the labels and doesn't actually log the xscale...
Example code:
import datetime
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.cbook as cbook
years = mdates.YearLocator() # every year
months = mdates.MonthLocator() # every month
yearsFmt = mdates.DateFormatter('%Y')
# Load a numpy record array from yahoo csv data with fields date, open, close,
# volume, adj_close from the mpl-data/example directory. The record array
# stores the date as an np.datetime64 with a day unit ('D') in the date column.
with cbook.get_sample_data('goog.npz') as datafile:
    r = np.load(datafile)['price_data'].view(np.recarray)
# Matplotlib works better with datetime.datetime than np.datetime64, but the
# latter is more portable.
date = r.date.astype('O')
fig, ax = plt.subplots()
ax.plot(date, r.adj_close)
# format the ticks
ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)
ax.xaxis.set_minor_locator(months)
datemin = datetime.date(date.min().year, 1, 1)
datemax = datetime.date(date.max().year + 1, 1, 1)
ax.set_xlim(datemin, datemax)
# format the coords message box
def price(x):
    return '$%1.2f' % x
ax.format_xdata = mdates.DateFormatter('%Y-%m-%d')
ax.format_ydata = price
ax.grid(True)
# rotates and right aligns the x labels, and moves the bottom of the
# axes up to make room for them
fig.autofmt_xdate()
ax.set_xscale('log')
plt.show()
Try using ScalarFormatter:
from matplotlib.ticker import ScalarFormatter
ax.xaxis.set_major_formatter(ScalarFormatter())
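Two things may be worth keeping in mind here. First, set_xscale installs the scale's own default locator and formatter, so any date formatter set earlier is replaced; that would explain the labels disappearing. Second, dates are stored as large day counts, so a log scale applied to them directly barely changes the spacing. If what is wanted is a logarithmic axis in time elapsed since the first sample, one possible approach is to plot elapsed days on a log axis and translate the ticks back into dates with FuncFormatter. The sketch below works under that assumption; elapsed_days, make_date_label and plot_log_time are hypothetical helpers, not part of matplotlib.

```python
import datetime as dt

import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np


def elapsed_days(dates, origin):
    """Days since origin, shifted by 1 so the first sample sits at x = 1."""
    return np.array([(d - origin).days + 1 for d in dates], dtype=float)


def make_date_label(origin):
    """Build a tick formatter that maps elapsed days back to a date string."""
    def label(x, pos=None):
        return (origin + dt.timedelta(days=x - 1)).strftime("%Y-%m-%d")
    return label


def plot_log_time(dates, values):
    """Plot values against elapsed time on a log axis with date labels."""
    origin = min(dates)
    fig, ax = plt.subplots()
    ax.plot(elapsed_days(dates, origin), values)
    ax.set_xscale("log")
    # Set the formatter after set_xscale, which resets locators and formatters.
    ax.xaxis.set_major_formatter(mticker.FuncFormatter(make_date_label(origin)))
    fig.autofmt_xdate()
    return fig, ax
```

The shift by one day is a design choice to keep the first sample off x = 0, where a log axis is undefined.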
This code plots the data exactly as I want with the dates on the x-axis and the times on the y-axis. However I want the y-axis to show every hour on the hour (e.g., 00, 01, ... 23) and the x-axis to show the beginning of every month at an angle so there's no overlap (the actual data being used spans over a year) and only once, since this code repeats the same months. How is this accomplished?
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44', '2018-01-03 15:30:27', '2018-02-04 11:55:09']
f, ax = plt.subplots()
data = pd.to_datetime(data, yearfirst=True)
ax.plot(data.date, data.time, '.')
ax.set_ylim(["00:00:00", "23:59:59"])
days = mdates.DayLocator()
d_fmt = mdates.DateFormatter('%Y-%m')
ax.xaxis.set_major_locator(days)
ax.xaxis.set_major_formatter(d_fmt)
plt.show()
UPDATE: This fixes the x axis.
# Monthly intervals on x axis
months = mdates.MonthLocator()
d_fmt = mdates.DateFormatter('%Y-%m')
ax.xaxis.set_major_locator(months)
ax.xaxis.set_major_formatter(d_fmt)
However, this attempt to fix the y axis just makes it blank.
# Hourly intervals on y axis
hours = mdates.HourLocator()
t_fmt = mdates.DateFormatter('%H')
ax.yaxis.set_major_locator(hours)
ax.yaxis.set_major_formatter(t_fmt)
I'm reading these docs but not understanding my error: https://matplotlib.org/api/dates_api.html, https://matplotlib.org/api/ticker_api.html
Matplotlib cannot plot times without a corresponding date. This makes it necessary to add some arbitrary date (in the case below I took the 1st of January 2018) to the times. One may use datetime.datetime.combine for that purpose.
timetodatetime = lambda x:dt.datetime.combine(dt.date(2018, 1, 1), x)
time = list(map(timetodatetime, data.time))
ax.plot(data.date, time, '.')
Then the code from the question using HourLocator() would work fine. Finally, setting the limits on the axes also requires datetime objects:
ax.set_ylim([dt.datetime(2018,1,1,0), dt.datetime(2018,1,2,0)])
Complete example:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt
data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44', '2018-01-03 15:30:27',
'2018-02-04 11:55:09']
f, ax = plt.subplots()
data = pd.to_datetime(data, yearfirst=True)
timetodatetime = lambda x:dt.datetime.combine(dt.date(2018, 1, 1), x)
time = list(map(timetodatetime, data.time))
ax.plot(data.date, time, '.')
# Monthly intervals on x axis
months = mdates.MonthLocator()
d_fmt = mdates.DateFormatter('%Y-%m')
ax.xaxis.set_major_locator(months)
ax.xaxis.set_major_formatter(d_fmt)
## Hourly intervals on y axis
hours = mdates.HourLocator()
t_fmt = mdates.DateFormatter('%H')
ax.yaxis.set_major_locator(hours)
ax.yaxis.set_major_formatter(t_fmt)
ax.set_ylim([dt.datetime(2018,1,1,0), dt.datetime(2018,1,2,0)])
plt.show()
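An alternative that avoids the dummy date is to plot the time of day as a plain number of hours and place the ticks with a MultipleLocator. A sketch, assuming the same sample timestamps as above; hour_of_day and plot_time_of_day are hypothetical helpers, and the 45-degree rotation addresses the overlapping month labels mentioned in the question:

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import pandas as pd


def hour_of_day(timestamps):
    """Time of day as fractional hours, e.g. 09:30:00 -> 9.5."""
    idx = pd.to_datetime(timestamps)
    return (idx.hour + idx.minute / 60 + idx.second / 3600).to_numpy()


def plot_time_of_day(timestamps):
    """Plot date on the x-axis against hour of day on the y-axis."""
    idx = pd.to_datetime(timestamps)
    fig, ax = plt.subplots()
    ax.plot(idx.to_pydatetime(), hour_of_day(timestamps), ".")

    # One tick per month on the x-axis, rotated to avoid overlap.
    ax.xaxis.set_major_locator(mdates.MonthLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
    ax.tick_params(axis="x", rotation=45)

    # One tick per hour on the y-axis, printed with two digits.
    ax.set_ylim(0, 24)
    ax.yaxis.set_major_locator(mticker.MultipleLocator(1))
    ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda h, pos: f"{int(h):02d}"))
    return fig, ax


data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44', '2018-01-03 15:30:27',
        '2018-02-04 11:55:09']
```

Calling `plot_time_of_day(data)` followed by `plt.show()` displays the figure. The trade-off is that the y-axis is an ordinary numeric axis, so date formatters such as DateFormatter('%H') no longer apply to it.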