I have code that creates a data frame from the data of a bunch of sensors, selects which type of sensor I wish to observe, and selects the period over which I would like to observe this sensor. The data is then plotted and saved to an Excel file. The following is a snippet of what my data frame looks like.
     Site Name     Sensor Display Name               Readings Datetime
342  100 Adelaide  EY_VAV_31_05 - Space Temperature  2021-01-01 00:06:00
343  100 Adelaide  EY_VAV_31_05 - Space Temperature  2021-01-01 00:21:00
344  100 Adelaide  EY_VAV_31_05 - Space Temperature  2021-01-01 00:36:00
345  100 Adelaide  EY_VAV_31_05 - Space Temperature  2021-01-01 00:52:00
346  100 Adelaide  EY_VAV_31_05 - Space Temperature  2021-01-01 01:06:00
My issue is that when I plot the data, I would like to get the dates of each day I observe to be the labels for my X-axis. Instead, I get the number of hours since the beginning of the data. I'm a little new to matplotlib, and don't fully understand how I can fix this. The following is the code used to plot everything.
chosen['Readings Value'].plot()
plt.xlabel('Hour')
plt.ylabel('Sensor Reading')
plt.show()
I don't want to use pd.groupby to reorganize the data by day, as I need to keep all of the data for every hour. Any help will be appreciated, thank you.
I am going to assume that "Readings Datetime" in your dataframe is a single column (that is, the column contains values like 2021-01-01 00:06:00).
The column should also be of type Pandas Timestamp. If it is not, you can convert it as follows:
df['Readings Datetime'] = pd.to_datetime(df['Readings Datetime'])
Matplotlib uses its own numeric representation for dates. You may already know this and have converted, but in case not, do the following:
import matplotlib.dates as mdates
xdates = mdates.date2num([dtm.to_pydatetime() for dtm in df['Readings Datetime']])
where xdates is the variable you will use to populate your x-axis data.
Then, to format the x-axis with only the dates:
formatter = mdates.DateFormatter('%Y-%b-%d') # or similar format string
ax.xaxis.set_major_formatter(formatter) # `ax` is your Axes object
I just now looked more closely at your code:
chosen['Readings Value'].plot()
plt.xlabel('Hour')
plt.ylabel('Sensor Reading')
plt.show()
Assuming chosen is your dataframe, you appear to be using a combination of Pandas (which uses matplotlib under the hood) and matplotlib directly (plt) to plot your data. This can result in some confusing situations, because Pandas makes certain assumptions when plotting, and takes some of the control away from you.
You can avoid this by either doing all of your plotting directly in matplotlib,
or continue using Pandas to plot, but before calling chosen[...].plot() call
fig, ax = plt.subplots()
to gain access to the Figure and Axes objects.
Then you can call ax.xaxis.set_major_formatter() as shown above. There is also a linked example that shows how to reformat the x-axis when using Pandas to plot with datetimes on the x-axis.
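Putting the pieces of this answer together, a minimal sketch might look like the following. The sample values are made up, the column names are taken from the question, and x_compat=True is a pandas plotting option that asks pandas to put ordinary Matplotlib date numbers on the axis so that the mdates locators and formatters apply. Treat it as an illustration under those assumptions rather than a drop-in fix.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Hypothetical stand-in for `chosen`: one reading every 15 minutes over 3 days
chosen = pd.DataFrame({
    "Readings Datetime": pd.date_range("2021-01-01 00:06", periods=288, freq="15min"),
    "Readings Value": range(288),
})

# Use the timestamps as the index so that pandas plots them on the x-axis
chosen = chosen.set_index("Readings Datetime")

# Create the Axes first, then let pandas draw on it
fig, ax = plt.subplots()
chosen["Readings Value"].plot(ax=ax, x_compat=True)

# One major tick per day, labelled with the date only
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
ax.set_xlabel("Date")
ax.set_ylabel("Sensor Reading")
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

Every reading is still plotted; only the tick positions and labels change. Call plt.show() afterwards to display the figure.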
Python beginner here :/!
The csv files can be found here (https://www.waterdatafortexas.org/groundwater/well/8739308)
I'm trying to subset my data and plot it by year or every 6 months, but I just can't make it work. This is my code so far:
data=pd.read_csv('Water well.csv')
data["datetime"]=pd.to_datetime(data["datetime"])
data["datetime"]
fig, ax = plt.subplots()
ax.plot(data["datetime"], data["water_level(ft below land surface)"])
ax.set_xticklabels(data["datetime"], rotation= 90)
and this is my data and the output. As you can see, it only plots 2021 by time
This is my data of water levels from 2016 to 2021 and the output of the code
data
When you run your script, you get the following warning:
UserWarning: FixedFormatter should only be used together with FixedLocator
ax.set_xticklabels(data["datetime"], rotation= 90)
Your example demonstrates why they included this warning.
Comment out your line
#ax.set_xticklabels(data["datetime"], rotation= 90)
and you have the following (correct) output:
Your original code takes the nine automatically generated x-axis ticks, removes the correct labels, and labels them instead with the first nine entries of the dataframe. Obviously, these labels are wrong, and this is the reason they provide you with the warning: either let matplotlib do the automatic labeling, or set both tick positions and labels yourself using FixedLocator and FixedFormatter so that they match.
For more information on Tick locators and formatters consult the matplotlib documentation.
P.S.: You also have to invert the y-axis because the data are in ft below land surface.
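As a sketch of the "let Matplotlib place the ticks" approach with a tick every six months, the following uses a date locator and formatter from matplotlib.dates and inverts the y-axis. The data here is synthetic (the real CSV is not reproduced), and the January/July tick choice is only one possible reading of "every 6 months".

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Synthetic stand-in for the well data (daily values, 2016 to 2021)
dates = pd.date_range("2016-01-01", "2021-12-31", freq="D")
data = pd.DataFrame({
    "datetime": dates,
    "water_level(ft below land surface)": 50 + 5 * np.sin(np.arange(dates.size) / 200),
})

fig, ax = plt.subplots()
ax.plot(data["datetime"], data["water_level(ft below land surface)"])

# Place a tick every January and July and label it with month and year,
# so tick positions and labels always agree
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=(1, 7)))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
ax.tick_params(axis="x", labelrotation=90)

# Depth below the surface: larger values should appear lower on the plot
ax.invert_yaxis()
```

For yearly ticks, mdates.YearLocator() could be used in place of the MonthLocator.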
The problem is that you have too much data, so you have to simplify it.
At first you can try to do something like this:
data["datetime"]=pd.to_datetime(data["datetime"])
date = data["datetime"][0::1000][0:10]
temp = data["water_level(ft below land surface)"][0::1000][0:10]
fig, ax = plt.subplots()
ax.plot(date, temp)
ax.tick_params(axis="x", labelrotation=90)  # rotate the automatic date labels
date = data["datetime"][0::1000][0:10]
This line means: take index 0, then 1000, then 2000, and so on.
So you get a new array, and from this new array you take just the first 10 entries.
It's a dirty solution.
The best solution, in my opinion, is to create a new dataset with the average water level for each day or each week, and then display the result.
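The averaging idea can be sketched with pandas resample. The hourly data below is synthetic and only stands in for the real CSV; the column names are taken from the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: 14 days of hourly readings
data = pd.DataFrame({
    "datetime": pd.date_range("2016-01-01", periods=14 * 24, freq="60min"),
    "water_level(ft below land surface)": np.arange(14 * 24, dtype=float),
})

# resample needs a datetime index
level = data.set_index("datetime")["water_level(ft below land surface)"]

daily = level.resample("D").mean()   # one averaged value per calendar day
weekly = level.resample("W").mean()  # one averaged value per week

print(daily.head())
```

The resulting series are much shorter than the raw data and can be plotted directly, for example with daily.plot().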
I have two datasets that contain temperature and light sensor readings. The measurements were done from 22:35:41 - 04:49:41.
The problem with these datasets is plotting the measurements against time when the measurements run from one day into the next (22:35:41 - 04:49:41). The plot function automatically starts from 00:00 and puts the data that was measured before 00:00 at the end of the plot.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Temperature = pd.read_excel("/kaggle/input/Temperature_measurement.xlsx")
Light = pd.read_excel("/kaggle/input/Light_measurement.xlsx")
sns.lineplot(x="Time",y="Light", data = Light)
sns.lineplot(y="Temperature", x="Time", data = Temperature)
plt.show()
This is a link to the dataset
Here is a link to the Jupyter Notebook
First you need to convert your times to a Pandas Timestamp. Pandas Timestamps don't really support a time on its own, they will attach a date to them, but that's fine since we'll hide that part later.
We also need to detect day changes, which we can do by looking at where the time wraps, which we can find by looking at a time that's smaller than its predecessor.
We can count the cumulative wraps and add that number of dates to our timestamps.
Let's define a function that takes the datetime.time objects, converts them to native Pandas Timestamps (using an arbitrary date of 1900-01-01, which is the default for Pandas) and adjusts the day according to the wraps (so we end up with our final times on 1900-01-02):
def normalize_time(series):
    series = pd.to_datetime(series, format="%H:%M:%S")
    series += pd.to_timedelta(series.lt(series.shift()).cumsum(), unit="D")
    return series
Let's now apply it to our DataFrames:
Light["Time"] = normalize_time(Light["Time"])
Temperature["Time"] = normalize_time(Temperature["Time"])
Plotting the data now will look correct, with the times being continuous. Except that the labels of the X ticks will try to display the dates, which are not really what we care about, so let's fix that part now.
We can use Matplotlib's set_major_formatter together with a DateFormatter to include times only:
import matplotlib.dates
ax = plt.subplot()
sns.lineplot(x="Time", y="Light", data=Light)
sns.lineplot(x="Time", y="Temperature", data=Temperature)
ax.xaxis.set_major_formatter(
    matplotlib.dates.DateFormatter("%H:%M")
)
plt.show()
This produces X ticks every hour, which seem to be a great fit for this data set.
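To see what normalize_time does at the day boundary, here is a small self-contained check. The four sample times are made up, and they are given as strings; if your column holds datetime.time objects, converting it with .astype(str) first is one option.

```python
import pandas as pd

def normalize_time(series):
    # Parse the times as timestamps on the default date 1900-01-01
    series = pd.to_datetime(series, format="%H:%M:%S")
    # A time that is smaller than its predecessor marks a wrap past midnight;
    # the cumulative count of wraps is the number of days to add
    series += pd.to_timedelta(series.lt(series.shift()).cumsum(), unit="D")
    return series

# Hypothetical measurements that run across midnight
times = pd.Series(["22:35:41", "23:35:41", "00:35:41", "04:49:41"])
result = normalize_time(times)
print(result)
```

The first two values stay on 1900-01-01 and the last two move to 1900-01-02, so the series is increasing and plots in the order it was measured.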
I have a csv with the following columns: recorded, humidity and temperature. I want to display the recorded values (date and time) on the x axis and the humidity on the y axis. How can I properly display the dates (it is quite a big csv)? My current plot shows a black band instead of readable dates. My date format is like this: 2019-09-12T07:26:55, with both the date and the time present in the csv.
I have displayed the plot using this code:
from matplotlib import pyplot as plt
import pandas as pd
data = pd.read_csv('home_data.csv')
plt.plot(data.recorded, data.humidity)
plt.xlabel('date')
plt.ylabel('humidity')
plt.title('Visualizing date and humidity')
plt.show()
This is a print screen of the plot:
https://snipboard.io/d4hfS7.jpg
Actually, the plot is displaying every date in your dataset. There are so many of them that they look like a black blob. You can downsample the xticks to improve readability. Do something like this:
fig, ax = plt.subplots()
ax.plot(data.recorded, data.humidity)
# some axes labelling
# Reduce now the number of the ticks printed in the figure
ax.set_xticks(ax.get_xticks()[::4])
ax.tick_params(axis='x', labelrotation=45)
In the line ax.set_xticks(ax.get_xticks()[::4]) you set the ticks of the x-axis,
keeping 1 tick out of every 4 by slicing the array of tick positions. This reduces the number of dates printed. You can increase the step as much as you want.
To increase readability further, you can rotate the tick labels, as in the line
ax.tick_params(axis='x', labelrotation=45).
Hope this helps.
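An alternative sketch, not part of the answer above: if the recorded column is parsed as datetimes when reading the CSV, Matplotlib chooses a manageable number of date ticks on its own. The inline CSV text is a made-up stand-in for home_data.csv.

```python
import io
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Small inline stand-in for home_data.csv
csv_text = """recorded,humidity,temperature
2019-09-12T07:26:55,45,21.5
2019-09-12T19:26:55,47,20.9
2019-09-13T07:26:55,50,20.1
2019-09-13T19:26:55,48,21.0
"""
# parse_dates turns the strings into real timestamps
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["recorded"])

fig, ax = plt.subplots()
ax.plot(data["recorded"], data["humidity"])

# With real datetimes, a date locator picks a sensible number of ticks
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
ax.set_xlabel("date")
ax.set_ylabel("humidity")
```

With the real file, the read_csv call would take 'home_data.csv' instead of the StringIO object.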
Below shows a plot of simulated data, which contains the xticks that I want to modify. By default, the pd.df.plot chooses dates that are approximately 3 months apart as ticks. But what I want is each month being a tick. What is the best way to do this? What about seasonal ticks? Thank you in advance.
First of all you have to convert pandas date objects to python date objects. This conversion is needed because of matplotlib internal date conversion functions. Then use functions from matplotlib.dates to set desired formatter and tick positions like here:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# convert date objects from pandas format to python datetime
index = pd.date_range(start = "2015-07-01", end = "2017-01-01", freq = "D")
index = [pd.to_datetime(date, format='%Y-%m-%d').date() for date in index]
data = np.random.randint(1,100, size=len(index))
df = pd.DataFrame(data=data,index=index, columns=['data'])
print (df.head())
ax = df.plot()
# set monthly locator
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
# set formatter
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y'))
# set font and rotation for date tick labels
plt.gcf().autofmt_xdate()
plt.show()
For season labels, you have to construct the labels yourself and then set them with the plt.setp function (for example, label month 02 as winter, month 04 as spring, and so on):
plt.setp(new_labels, rotation=90, fontsize=9).
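One possible way to build season labels is sketched below. It uses a FuncFormatter instead of setting label texts by hand, and the month-to-season mapping (northern-hemisphere meteorological seasons, ticked at their first month) is an assumption made for illustration; season_label and SEASONS are hypothetical names.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import FuncFormatter

# Assumed mapping: first month of each northern-hemisphere season
SEASONS = {3: "Spring", 6: "Summer", 9: "Autumn", 12: "Winter"}

def season_label(x, pos=None):
    """Turn a Matplotlib date number into a label such as 'Summer 2016'."""
    date = mdates.num2date(x)
    return f"{SEASONS.get(date.month, '')} {date.year}".strip()

index = pd.date_range(start="2015-07-01", end="2017-01-01", freq="D")
df = pd.DataFrame({"data": np.random.randint(1, 100, size=len(index))}, index=index)

# x_compat=True makes pandas use Matplotlib date numbers on the x-axis
ax = df.plot(x_compat=True)
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=tuple(SEASONS)))
ax.xaxis.set_major_formatter(FuncFormatter(season_label))

# Rotate and shrink the tick labels
ax.tick_params(axis="x", labelrotation=90, labelsize=9)
```

Changing the SEASONS dictionary is enough to move the ticks or rename the seasons.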
head of df:
data
2015-07-01 26
2015-07-02 33
2015-07-03 46
2015-07-04 69
2015-07-05 17
This answer is based on the one by Serenity as well as on this one by ImportanceOfBeingErnest.
The best way to customize time series tick labels is to use the tick locators and formatters from the matplotlib.dates module (mdates). Though it is worth noting that if you want a tick frequency based on the same unit as the time series you are plotting, it may be more convenient to create and format the tick labels using the dates as strings like in the answers to this question concerning pandas bar plots.
As described in the documentation, pandas uses matplotlib to create plots with its own custom tick formatters for time series:
pandas provides custom formatters for timeseries plots. These change the formatting of the axis labels for dates and times. By default, the custom formatters are applied only to plots created by pandas with DataFrame.plot() or Series.plot().
The ticks and labels of pandas time series plots are currently formatted like this by default:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.dates as mdates # v 3.3.2
# Create random dataset stored as a pandas DataFrame with a DatetimeIndex
rng = np.random.default_rng(seed=1) # random number generator
date_day = pd.date_range(start='2015-07-01', end='2016-12-31', freq='D')
traffic = rng.lognormal(sigma=2, size=date_day.size)
df_day = pd.DataFrame(dict(traffic=traffic), index=date_day)
# Create pandas plot with default settings except for figure size
df_day.plot(figsize=(10,5));
To be able to use the mdates tick locators and formatters and override the default tick formatting, the pandas dates must be correctly recognized by matplotlib. The problem is that pandas and matplotlib have different approaches to computing the date numbers that are used to locate the ticks on the time axis (the x-axis by default).
In pandas, time is measured in nanoseconds starting at zero on 1970-01-01 00:00:00 (the origin of the Unix epoch) and individual time points are stored as pandas timestamp objects. But when it comes to creating time scales for plots, pandas uses another numbering system which starts at the same origin but then increases by 1 for each period of the chosen frequency (in this example the frequency is in days).
Matplotlib uses the same default origin as pandas since version 3.3.0 released in July 2020 but the dates are always numbered in terms of days:
Matplotlib represents dates using floating point numbers specifying the number of days since a default epoch of 1970-01-01 UTC; for example, 1970-01-01, 06:00 is the floating point number 0.25.
You can check what numbers are being used for the scale by running ax.get_xticks(), with ax = df.plot() when using pandas.
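The two numbering schemes can be compared directly with a short sketch. It assumes the default Matplotlib epoch (1970-01-01, Matplotlib 3.3 or later); with an older version or a custom epoch the absolute date2num values differ, although the one-unit-per-day spacing stays the same.

```python
import datetime as dt
import pandas as pd
import matplotlib.dates as mdates

# Matplotlib: floating point days since the epoch
six_am = mdates.date2num(dt.datetime(1970, 1, 1, 6, 0))
one_day_later = mdates.date2num(dt.datetime(1970, 1, 2))

# pandas periods: integer count of periods of the given frequency since 1970
day_ordinal = pd.Period("1970-01-02", freq="D").ordinal
month_ordinal = pd.Period("2016-01", freq="M").ordinal

print(six_am, one_day_later)       # day-based numbers from Matplotlib
print(day_ordinal, month_ordinal)  # frequency-based numbers from pandas
```

With the default epoch, the daily numbers from both libraries coincide (one unit per day), while the monthly period ordinal counts months and so cannot be read as a Matplotlib date number.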
As you might have guessed, this means no date conversion is needed when the time series has a frequency in days, as illustrated here with a simple custom tick locator and formatter:
ax = df_day.plot(figsize=(10,5))
# Create custom ticks using matplotlib date tick locator and formatter
loc = mdates.MonthLocator(interval=2)
ax.xaxis.set_major_locator(loc)
fmt = mdates.DateFormatter('%b\n%Y')
ax.xaxis.set_major_formatter(fmt)
This particular case makes it convenient for keeping other pandas default settings for the x-axis limits and minor x ticks. But this is an exception to the general rule.
To be able to use mdates tick locators and formatters with a pandas plot of a time series of any type of frequency, you need to use the (long-existing yet absent-from-the-docstring and barely-documented) x_compat=True argument. The following example illustrates its use with the same dataset resampled to a monthly frequency. It may often be the case that you just want to slightly tweak the default pandas format, so in the following example, the default format is recreated from scratch to show what methods can be used to adjust it:
# Resample time series to monthly frequency and plot it using date
# numbers that are compatible with mdates
df_month = df_day.resample('MS').sum()
ax = df_month.plot(figsize=(10,5), x_compat=True)
# Set major and minor date tick locators
maj_loc = mdates.MonthLocator(bymonth=np.arange(1,12,2))
ax.xaxis.set_major_locator(maj_loc)
min_loc = mdates.MonthLocator()
ax.xaxis.set_minor_locator(min_loc)
# Set major date tick formatter
zfmts = ['', '%b\n%Y', '%b', '%b-%d', '%H:%M', '%H:%M']
maj_fmt = mdates.ConciseDateFormatter(maj_loc, zero_formats=zfmts, show_offset=False)
ax.xaxis.set_major_formatter(maj_fmt)
ax.figure.autofmt_xdate(rotation=0, ha='center')
ax.set_xlim(df_month.index.min(), df_month.index.max());
Documentation: pd.date_range, date format codes, mdates.ConciseDateFormatter, fig.autofmt_xdate
I had a hard time getting @Serenity's answer to work because I'm working directly with Matplotlib instead of plotting the Pandas dataset. So if you are in the same situation, my answer might help.
Plotting with Matplotlib.plot()
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Process dataset
bitcoin['Date'] = pd.to_datetime(bitcoin['Date'])
bitcoin['Open'] = pd.to_numeric(bitcoin['Open'])
# Plot
plt.figure()
plt.plot(bitcoin['Date'], bitcoin['Open'])
ax = plt.gca()
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=4))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y'))
plt.gcf().autofmt_xdate() # Rotation
plt.show()
bitcoin[['Date', 'Open']].head()
Date Open
0 2017-09-05 4228.29
1 2017-09-04 4591.63
2 2017-09-03 4585.27
3 2017-09-02 4901.42
4 2017-09-01 4701.76
I have a pandas dataframe with dates in column 0 and times in column 1. I wish to plot the data in columns 2,3,4...n as a function of the date and time. How do I go about formatting the tick labels in the code below so that I can display both the date and the time in the plot? Thanks in advance. I'm new to stackoverflow (and python for that matter), so sorry, but I don't have enough reputation to attach the image that I get from my code below.
df3 = pd.read_table('filename.txt',
                    sep=',',
                    skiprows=4,
                    na_values=r'N\A',
                    index_col=[0, 1])  # date and time are my indices
# .loc replaces .ix, which was removed in pandas 1.0
datedf = df3.loc[['01:07:2013'], ['AOT_1640', 'AOT_870']]
fig, axes = plt.subplots(nrows=2, ncols=1)
for i, c in enumerate(datedf.columns):
    print(i, c)
    datedf[c].plot(ax=axes[i], figsize=(12, 10), title=c)
plt.savefig('testing123.png', bbox_inches='tight')
You could combine columns 0 and 1 into a single date & time column and set that as your index; the pandas .plot method will then automatically use the index for the x-tick labels. It is hard to say how this will work with your data set, as I can't see it, but the main point is that Pandas uses the index for the x-tick labels unless you tell it not to. Be warned that this doesn't work well with hierarchical indexing (at least in my very limited experience).
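A minimal sketch of that suggestion follows. The frame is made up, the column names date and time are hypothetical, and the format string is a guess based on the '01:07:2013' value in the question (read here as day:month:year), so adjust it to the real file.

```python
import pandas as pd

# Hypothetical frame: date strings in column 0, time strings in column 1
df = pd.DataFrame({
    "date": ["01:07:2013", "01:07:2013", "02:07:2013"],
    "time": ["23:00:00", "23:30:00", "00:00:00"],
    "AOT_1640": [0.10, 0.12, 0.11],
    "AOT_870": [0.20, 0.22, 0.21],
})

# Join the two columns into one string and parse it as a timestamp
# (assumed format: day:month:year hour:minute:second)
stamp = pd.to_datetime(df["date"] + " " + df["time"], format="%d:%m:%Y %H:%M:%S")

# Replace the two original columns with a single datetime index
df = df.drop(columns=["date", "time"]).set_index(stamp)

# One subplot per remaining column; the datetime index drives the x-axis
axes = df.plot(subplots=True, figsize=(12, 10))
```

Because the index is now a single DatetimeIndex rather than a hierarchical one, the tick labels can show both date and time.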