How to properly display date from csv in matplotlib plot? - python

I have a CSV with the following columns: recorded, humidity and temperature. I want to display the recorded values (date and time) on the x-axis and the humidity on the y-axis. How can I properly display the dates (it is quite a big CSV)? My current plot shows a solid black band instead of readable date labels. My date format is like this: 2019-09-12T07:26:55, so each value in the CSV has both the date and the time.
I have displayed the plot using this code:
from matplotlib import pyplot as plt
import pandas as pd
data = pd.read_csv('home_data.csv')
plt.plot(data.recorded, data.humidity)
plt.xlabel('date')
plt.ylabel('humidity')
plt.title('Visualizing date and humidity')
plt.show()
This is a print screen of the plot:
https://snipboard.io/d4hfS7.jpg

Actually, the plot is displaying every date in your dataset. There are so many of them that they look like a black blob. You can downsample the xticks to improve readability. Do something like this:
fig, ax = plt.subplots()
ax.plot(data.recorded, data.humidity)
# some axes labelling
# Reduce now the number of the ticks printed in the figure
ax.set_xticks(ax.get_xticks()[::4])
ax.tick_params(axis='x', labelrotation=45)
In the line ax.set_xticks(ax.get_xticks()[::4]) you are setting the ticks of the x-axis,
picking 1 date out of every 4 by slicing the list of tick positions. This reduces the number of dates printed. You can increase the step as much as you want.
To improve readability further, you can rotate the tick labels, as in the line
ax.tick_params(axis='x', labelrotation=45).
Hope this helps.
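A different approach, not part of the answer above, is to parse the recorded column into real datetimes so that Matplotlib can pick the tick positions itself. The following is only a minimal sketch: the inline CSV text is a hypothetical stand-in for home_data.csv, and AutoDateLocator with ConciseDateFormatter is one reasonable choice among several.

```python
import io

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for home_data.csv, with the same columns and timestamp format.
csv_text = """recorded,humidity,temperature
2019-09-12T07:26:55,45.1,21.3
2019-09-12T07:31:55,45.4,21.4
2019-09-13T08:00:00,47.0,20.9
2019-09-14T09:15:30,44.2,22.0
"""

# parse_dates converts the 'recorded' strings into datetime64 values on load.
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["recorded"])

fig, ax = plt.subplots()
ax.plot(data["recorded"], data["humidity"])

# Let Matplotlib choose a sensible number of ticks and compact labels.
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
ax.set_xlabel("date")
ax.set_ylabel("humidity")
```

With a real file, replace io.StringIO(csv_text) with the file path and call plt.show() at the end.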

Related

How to overlay time series from each day on one plot

I am trying to plot data from a data frame with hourly frequency on a plot where each day is its own line and the x-axis is hour. The data frame is shown below and so is the resulting graph I get when simply setting the x-axis as 'hb' and y-axis as 'BaseCase'. The plot is close to what I want, but connects the end points to the starting point. How do I go about avoiding the straight lines across the plot?
scens = pd.read_csv(---)
scens['datetime'] = pd.to_datetime(scens['datetime'])
scens.drop(scens.tail(2).index,inplace=True)
source = ColumnDataSource(scens)
p = figure()
p.line(x='hb', y='BaseCase', source=source)
show(p)
The above code is how I get the plot at the bottom of the post
If you are open to other packages, consider seaborn (df below is the question's scens DataFrame):
import seaborn as sns
sns.lineplot(data=df, x='hb', y='BaseCase',
             hue=df['datetime'].dt.normalize())
Or with just pandas:
ax = None
for date, d in df.groupby(df['datetime'].dt.normalize()):
    ax = d.plot(x='hb', y='BaseCase', label=str(date.date()), ax=ax)
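Another way to get one line per day, offered here only as a sketch, is to pivot the frame so that each day becomes its own column. Each column is then drawn as a separate line, so nothing connects the end of one day to the start of the next. The sample data and the helper column name day are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical hourly data covering two days; 'BaseCase' is the plotted value.
df = pd.DataFrame({
    "datetime": pd.date_range("2020-01-01", periods=48, freq=pd.Timedelta(hours=1)),
    "BaseCase": range(48),
})
df["hb"] = df["datetime"].dt.hour   # hour of day, used as the x-axis
df["day"] = df["datetime"].dt.date  # one column per calendar day after the pivot

# Rows are hours, columns are days; DataFrame.plot draws one line per column.
wide = df.pivot(index="hb", columns="day", values="BaseCase")
ax = wide.plot()
```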

matplotlib formatting x axis with timestamps from big data

I am trying to create a plot that has a lot of data on it. This one in particular has about 550 points on it, each with its own timestamp. When I plot this, there are so many timestamps that I just get a black bar. I know it is not reasonable to expect to be able to make all timestamps visible, but is there a way to format the ticks so that they represent the range of values?
Here is my code:
plt.figure(1)
plt.scatter(x_axis_input, y_axis_input, s=DOT_SIZE)
plt.xlabel('timestamp')
plt.ylabel('value')
plt.title('test')
plt.savefig('plot_test.png')
plt.close()
and here is the resulting plot:
Link to plot

change the name of ylabel in matplotlib

I want to plot electricity consumption measured every 30 minutes over 2 months.
My code works; my problem is with the x-axis labels. I don't want the raw range(1, 2, ..., 48*58).
Instead I want something like this: the first stretch, between 1 and 48*30, labelled January, the second stretch of 48*28 labelled February, and so on.
plt.xticks(rotation=70)
mask3 = (train['date'] >= '2008-01-01') & (train['date'] <= '2008-02-27')
week = train.loc[mask3]
plt.plot(range(48*58),week.LoadNette)
plt.ylabel("Electricity consumption")
plt.xlabel("Month")
plt.title('Electricity consumption / week')
plt.show()
By searching «python matplotlib use dates as xlabel» on a search engine, you can find an example of what you want in the Matplotlib documentation : https://matplotlib.org/examples/api/date_demo.html.
This example supposes your xdata is dates however, which is not the case right now. You would need to create a list of dates and use that instead of your range(48*58) list, like this :
import pandas
xdata = pandas.date_range(
    pandas.to_datetime("2008-01-01"),
    pandas.to_datetime("2008-02-27 23:30:00"),
    freq=pandas.to_timedelta(30, unit="m")).tolist()
This creates a list of datetimes from your start time to your end time at a frequency of 30 minutes.
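As a quick sanity check (a sketch, not part of the original answer), you can confirm that the generated dates have the same length as the range(48*58) they replace, since the x and y data passed to plot must match in length. The "30min" frequency string is an equivalent way to express the 30-minute step.

```python
import pandas as pd

# Same start, end and 30-minute step as above, written with a frequency string.
xdata = pd.date_range("2008-01-01", "2008-02-27 23:30:00", freq="30min")

# 58 days (31 in January + 27 in February) with 48 half-hour slots each.
print(len(xdata), 48 * 58)  # 2784 2784
```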
After that, you'll need to use the example in the link above. Here it is reproduced and tweaked a bit to your needs, but you'll need to play around with it to set it properly. You can find many more examples of using dates in matplotlib now that you'll be using a date list as input for your plot.
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# define locators for every month and every day
months = mdates.MonthLocator() # every month
days = mdates.DayLocator() # every day
monthsFmt = mdates.DateFormatter('%m')
# create the plot and plot your data
fig, ax = plt.subplots()
ax.plot(xdata, week.LoadNette)
# format the x ticks to have a major tick every month and a minor every day
ax.xaxis.set_major_locator(months)
ax.xaxis.set_major_formatter(monthsFmt)
ax.xaxis.set_minor_locator(days)
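# format the date shown in the interactive cursor readout (not the tick labels);
# '%m' is the month number, use '%B' in both formatters for full month names
ax.format_xdata = mdates.DateFormatter('%m')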
# rotates and right aligns the x labels, and moves the bottom of the
# axes up to make room for them
fig.autofmt_xdate()
plt.show()
Using dates in Matplotlib can be intimidating, but it's better in the long run than just hacking the labels you want this specific time.
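Since the question asks for month names rather than numbers, here is a minimal self-contained sketch using the '%B' format code. The random load values are a hypothetical stand-in for week.LoadNette, and the month names come out in the current locale.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

xdata = pd.date_range("2008-01-01", "2008-02-27 23:30:00", freq="30min")
rng = np.random.default_rng(0)
load = rng.random(len(xdata))  # hypothetical stand-in for week.LoadNette

fig, ax = plt.subplots()
ax.plot(xdata, load)

# One major tick at the start of each month, labelled with the full month name.
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%B"))
ax.set_xlabel("Month")
ax.set_ylabel("Electricity consumption")
```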

X labels matplotlib

I want to change the x-axis to years. The years are saved in the variable years.
I want to make a plot of my data that looks like this:
It should look like this image
However, I am not able to create an x-axis showing the years. My plot looks like the following image:
This is an example of the image produced by my code
My code looks as follows:
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("data1.csv")
demand = data["demand"]
years = data["year"]
plt.plot( demand, color='black')
plt.xlabel("Year")
plt.ylabel("Demand (GW)")
plt.show()
I am thankful for any advice.
The plot method in your example does not know the scaling of your data. So, for simplicity it treats the values of demand as being one unit apart from each other. If you want your x-axis to represent years, you have to tell matplotlib how many values of demand it should treat as "one year". If your data is a monthly demand, it is obviously 12 values per year. And here we go:
import numpy as np
import matplotlib.pyplot as plt
# setup a figure
fig, (ax1, ax2) = plt.subplots(2)
# generate some random data
data = np.random.rand(100)
# plot undesired way
ax1.plot(data)
# change the tick positions and labels ...
ax2.plot(data)
# ... to one label every 12th value
xticks = np.arange(0,100,12)
# ... start counting in the year 2000
xlabels = range(2000, 2000+len(xticks))
ax2.set_xticks(xticks)
ax2.set_xticklabels(xlabels)
plt.show()
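If the data already carries a year value for every row, as the years variable in the question suggests, the tick positions can be derived from the data instead of assuming 12 values per year. This is only a sketch: the helper name year_tick_positions and the sample years array are hypothetical, and it assumes a non-empty column sorted by year.

```python
import numpy as np


def year_tick_positions(years):
    """Return (positions, labels) marking the index where each new year starts."""
    years = np.asarray(years)
    # A year starts at index 0 and wherever the value differs from the previous row.
    starts = np.flatnonzero(np.r_[True, years[1:] != years[:-1]])
    return starts, years[starts]


years = np.repeat([2000, 2001, 2002], 12)  # hypothetical monthly data
positions, labels = year_tick_positions(years)
print(positions.tolist(), labels.tolist())  # [0, 12, 24] [2000, 2001, 2002]
```

The result can then be passed to ax.set_xticks(positions) and ax.set_xticklabels(labels), as in the answer above.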

How to highlight specific x-value ranges

I'm making a visualization of historical stock data for a project, and I'd like to highlight regions of drops. For instance, when the stock is experiencing significant drawdown, I would like to highlight it with a red region.
Can I do this automatically, or will I have to draw a rectangle or something?
Have a look at axvspan (and axhspan for highlighting a region of the y-axis).
import matplotlib.pyplot as plt
plt.plot(range(10))
plt.axvspan(3, 6, color='red', alpha=0.5)
plt.show()
If you're using dates, then you'll need to convert your min and max x values to matplotlib dates. Use matplotlib.dates.date2num for datetime objects or matplotlib.dates.datestr2num for various string timestamps.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt
t = mdates.drange(dt.datetime(2011, 10, 15), dt.datetime(2011, 11, 27),
                  dt.timedelta(hours=2))
y = np.sin(t)
fig, ax = plt.subplots()
ax.plot(t, y, 'b-')
ax.xaxis_date()  # t holds Matplotlib date numbers (replaces plot_date, deprecated in recent Matplotlib)
ax.axvspan(*mdates.datestr2num(['10/27/2011', '11/2/2011']), color='red', alpha=0.5)
fig.autofmt_xdate()
plt.show()
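For datetime objects rather than strings, date2num does the same job. A small sketch of the conversion on its own, with arbitrary example dates:

```python
import datetime as dt

import matplotlib.dates as mdates

# date2num returns Matplotlib date numbers, which are floats counted in days.
start = mdates.date2num(dt.datetime(2011, 10, 27))
end = mdates.date2num(dt.datetime(2011, 11, 2))
print(end - start)  # 6.0
```

The two numbers can then be used as the limits of the highlight, for example ax.axvspan(start, end, color='red', alpha=0.5).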
Here is a solution that uses axvspan to draw multiple highlights where the limits of each highlight are set by using the indices of the stock data corresponding to the peaks and troughs.
Stock data usually contain a discontinuous time variable where weekends and holidays are not included. Plotting them in matplotlib or pandas will produce gaps along the x-axis for weekends and holidays when dealing with daily stock prices. This may not be noticeable with long date ranges and/or small figures (like in this example), but it will become apparent if you zoom in and it may be something that you want to avoid.
This is why I share here a complete example that features:
1. A realistic sample dataset that includes a discontinuous DatetimeIndex based on the New York Stock Exchange trading calendar imported with the pandas_market_calendars as well as fake stock data that looks like the real thing.
2. A pandas plot created with use_index=False which removes the gaps for weekends and holidays by using instead a range of integers for the x-axis. The returned ax object is used in a way that avoids the need to import matplotlib.pyplot (unless you need plt.show).
3. An automatic detection of drawdowns over the entire date range by using the scipy.signal find_peaks function which returns the indices needed to plot the highlights with axvspan. Computing drawdowns in a more correct way would require a clear definition of what would count as a drawdown and would lead to more complicated code which is a topic for another question.
4. Properly formatted ticks created by looping through the timestamps of the DatetimeIndex seeing as all the convenient matplotlib.dates tick locators and formatters as well as DatetimeIndex properties like .is_month_start cannot be used in this case.
Create sample dataset
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import pandas_market_calendars as mcal # v 1.6.1
from scipy.signal import find_peaks # v 1.5.2
# Create datetime index with a 'trading day end' frequency based on the New York Stock
# Exchange trading hours (end date is inclusive)
nyse = mcal.get_calendar('NYSE')
nyse_schedule = nyse.schedule(start_date='2019-10-01', end_date='2021-02-01')
nyse_dti = mcal.date_range(nyse_schedule, frequency='1D').tz_convert(nyse.tz.zone)
# Create sample of random data for daily stock closing price
rng = np.random.default_rng(seed=1234) # random number generator
price = 100 + rng.normal(size=nyse_dti.size).cumsum()
df = pd.DataFrame(data=dict(price=price), index=nyse_dti)
df.head()
# price
# 2019-10-01 16:00:00-04:00 98.396163
# 2019-10-02 16:00:00-04:00 98.460263
# 2019-10-03 16:00:00-04:00 99.201154
# 2019-10-04 16:00:00-04:00 99.353774
# 2019-10-07 16:00:00-04:00 100.217517
Plot highlights for drawdowns with properly formatted ticks
# Plot stock price
ax = df['price'].plot(figsize=(10, 5), use_index=False, ylabel='Price')
ax.set_xlim(0, df.index.size-1)
ax.grid(axis='x', alpha=0.3)
# Highlight drawdowns using the indices of stock peaks and troughs: find peaks and
# troughs based on signal analysis rather than an algorithm for drawdowns to keep
# example simple. Width and prominence have been handpicked for this example to work.
peaks, _ = find_peaks(df['price'], width=7, prominence=4)
troughs, _ = find_peaks(-df['price'], width=7, prominence=4)
for peak, trough in zip(peaks, troughs):
    ax.axvspan(peak, trough, facecolor='red', alpha=.2)
# Create and format monthly ticks
ticks = [idx for idx, timestamp in enumerate(df.index)
         if (timestamp.month != df.index[idx-1].month) | (idx == 0)]
ax.set_xticks(ticks)
labels = [tick.strftime('%b\n%Y') if df.index[ticks[idx]].year
          != df.index[ticks[idx-1]].year else tick.strftime('%b')
          for idx, tick in enumerate(df.index[ticks])]
ax.set_xticklabels(labels)
ax.figure.autofmt_xdate(rotation=0, ha='center')
ax.set_title('Drawdowns are highlighted in red', pad=15, size=14);
For the sake of completeness, it is worth noting that you can achieve exactly the same result using the fill_between plotting function, though it takes a few more lines of code:
ax.set_ylim(*ax.get_ylim()) # remove top and bottom gaps with plot frame
drawdowns = np.repeat(False, df['price'].size)
for peak, trough in zip(peaks, troughs):
    drawdowns[np.arange(peak, trough+1)] = True
ax.fill_between(np.arange(df.index.size), *ax.get_ylim(), where=drawdowns,
                facecolor='red', alpha=.2)
Are you using matplotlib's interactive interface and want dynamic ticks when you zoom in? Then you will need to use locators and formatters from the matplotlib.ticker module. You could, for example, keep the major ticks fixed like in this example and add dynamic minor ticks to show days or weeks of the year when zooming in. You can find an example of how to do this at the end of this answer.
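The answer notes that a stricter drawdown computation needs a clear definition. As one possible definition, and only as a sketch, the function below flags every stretch where the price sits at least a given fraction below its running maximum and returns integer index ranges that can be passed to axvspan. The function name drawdown_spans, the 5% threshold and the sample prices are all assumptions for illustration.

```python
import numpy as np
import pandas as pd


def drawdown_spans(price, threshold=0.05):
    """Return (start, end) index pairs where price is at least `threshold`
    below its running maximum. This is one possible definition only."""
    price = pd.Series(np.asarray(price, dtype=float))
    drawdown = 1.0 - price / price.cummax()
    in_dd = (drawdown >= threshold).to_numpy()
    # Pad with False so every run of True has a detectable start and end.
    padded = np.r_[False, in_dd, False]
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    # Even-numbered edges open a run; odd-numbered edges sit one past its end.
    return list(zip(edges[::2].tolist(), (edges[1::2] - 1).tolist()))


price = [100, 102, 95, 94, 101, 103, 97, 104]
print(drawdown_spans(price))  # [(2, 3), (6, 6)]
```

Unlike the peak-to-trough highlights above, these spans cover the whole period spent below the threshold, so the shaded regions will generally differ. With the integer x-axis from use_index=False, each pair can be drawn with ax.axvspan(start, end, facecolor='red', alpha=.2).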
