I have a pandas dataframe with datetime values including microseconds:
                            column1     column2
time
1900-01-01 10:39:52.887916  19363.876   19362.7575
1900-01-01 10:39:53.257916  19363.876   19362.7575
1900-01-01 10:39:53.808007  19363.876   19362.7575
1900-01-01 10:39:53.827894  19363.876   19362.7575
1900-01-01 10:39:54.277931  19363.876   19362.7575
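I plot the dataframe as follows:
import matplotlib as mpl
import matplotlib.pyplot  # makes mpl.pyplot available

def plot(df):
    ax = df.plot(y='column1', figsize=(20, 8))
    df.plot(y='column2', ax=ax)
    # show plain y values instead of an offset
    ax.get_yaxis().get_major_formatter().set_useOffset(False)
    mpl.pyplot.show()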
In the resulting plot, the microseconds are displayed as a literal %f rather than their actual value.
That is, instead of 10:39:52.887916 the tick label reads 10:39:52.%f.
How can I display the actual microseconds in the tick labels (even if it's only a few significant digits)?
You should be able to set the major tick labels to the format you want with set_major_formatter:
import matplotlib as mpl
import matplotlib.dates
import pandas as pd

df = pd.DataFrame({'column1': [1, 2, 3, 4],
                   'column2': [2, 3, 4, 5]},
                  index=pd.to_datetime([1e8, 2e8, 3e8, 4e8]))

def plot(df):
    ax = df.plot(y='column1', figsize=(20, 8))
    df.plot(y='column2', ax=ax)
    ax.get_yaxis().get_major_formatter().set_useOffset(False)
    # %f renders the microseconds of each tick
    ax.get_xaxis().set_major_formatter(matplotlib.dates.DateFormatter('%H:%M:%S.%f'))
    #mpl.pyplot.show()
    return ax

ax = plot(df)
print(df)
                            column1  column2
1970-01-01 00:00:00.100000        1        2
1970-01-01 00:00:00.200000        2        3
1970-01-01 00:00:00.300000        3        4
1970-01-01 00:00:00.400000        4        5
If this makes the problem go away, then somewhere in your code the format string was probably specified incorrectly, e.g. as %%f instead of %f: %% produces a literal '%' character, so %%f is rendered as the text %f.
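A minimal sketch of the same idea using matplotlib directly, with hypothetical sample data modeled on the question (the timestamps, the 370 ms spacing and the helper name ms_label are assumptions). It also shows one way to keep only a few digits, as the question allows: a FuncFormatter that trims the six-digit %f output down to milliseconds. Plotting with ax.plot rather than df.plot is a deliberate choice here, so that matplotlib's own date conversion drives the axis instead of pandas' plotting machinery.

```python
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it to get a window
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Hypothetical sample data modeled on the question: microsecond timestamps in 1900
start = datetime(1900, 1, 1, 10, 39, 52, 887916)
times = [start + timedelta(milliseconds=370 * i) for i in range(5)]
values = [19363.876] * 5

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(times, values)
ax.yaxis.get_major_formatter().set_useOffset(False)

# Option 1: all six microsecond digits via the %f directive
full_fmt = mdates.DateFormatter("%H:%M:%S.%f")


# Option 2: only milliseconds, by dropping the last three digits of %f
def ms_label(x, pos=None):
    return mdates.num2date(x).strftime("%H:%M:%S.%f")[:-3]


ax.xaxis.set_major_formatter(FuncFormatter(ms_label))
fig.autofmt_xdate()  # rotate the long labels so they do not overlap
```

To show all six digits instead, pass full_fmt to ax.xaxis.set_major_formatter.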
I'm using matplotlib.pyplot to plot a time series of about 15000 observations. When I plot it without x-axis values, using this code:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(rc={'figure.figsize':(15,10)})
sns.set_palette("husl")
sns.set_style('whitegrid')
plt.figure(figsize=(20, 5), dpi=80)
plt.plot(df['INTC'])
plt.show()
I get this, which is the plot I expect.
The problem is that when I add the dates as the x-axis values:
plt.figure(figsize=(20, 5), dpi=80)
plt.plot(df['Date'],df['INTC'])
plt.show()
The same time series gets plotted in a weird manner:
The df looks like this:
index Date INTC
0 2022-02-04 09:30:00 47.77
1 2022-02-04 09:31:00 47.96
2 2022-02-04 09:32:00 47.81
3 2022-02-04 09:33:00 47.73
4 2022-02-04 09:34:00 47.57
...
Observations are 1 minute apart. What should I do to plot it properly with the dates on the x-axis? Thanks.
I am trying to convert HH:MM into datetime format. It converts, but it adds an unwanted year 1900, and I don't know why.
My data:
df['HH:MM'] =
datetime
2019-10-01 08:19:40 08:19:40
2019-10-01 08:20:15 08:20:15
2019-10-01 08:21:29 08:21:29
2019-10-01 08:22:39 08:22:39
2019-10-01 08:29:07 08:29:07
Name: HH:MM, Length: 5, dtype: object
My code:
df['HH:MM'] = pd.to_datetime(cdf['HH:MM'], format='%H:%M:%S', errors='ignore')
Current output:
df['HH:MM'] =
datetime
2019-10-01 08:19:40 1900-01-01 08:19:40
2019-10-01 08:20:15 1900-01-01 08:20:15
2019-10-01 08:21:29 1900-01-01 08:21:29
2019-10-01 08:22:39 1900-01-01 08:22:39
2019-10-01 08:29:07 1900-01-01 08:29:07
Name: HH:MM, Length: 5, dtype: datetime64[ns]
Why do I need this?
I am plotting HH:MM on the x-axis and a value on the y-axis. The x-axis tick labels are unreadable, even after I call plt.gcf().autofmt_xdate().
autofmt_xdate() will not help if the values are strings. Instead, convert them to datetime and control the x-axis through a Locator and a Formatter:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt, dates as mdates
np.random.seed(15)
dr = pd.date_range('2019-10-01 00:00:00', '2019-10-01 23:59', freq='1min')
df = pd.DataFrame({'HH:MM': dr.strftime('%H:%M'),
                   'y': np.random.random(len(dr)) * 10},
                  index=dr)
# an explicit format parses the 'HH:MM' strings consistently
df['HH:MM'] = pd.to_datetime(df['HH:MM'], format='%H:%M')
ax = df.plot(kind='scatter', x='HH:MM', y='y', rot=45)
ax.xaxis.set_major_locator(mdates.HourLocator(interval=2))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
plt.tight_layout()
plt.show()
Sample Data and imports:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt, dates as mdates
np.random.seed(15)
dr = pd.date_range('2019-10-01 00:00:00', '2019-10-01 23:59', freq='1min')
df = pd.DataFrame({'HH:MM': dr.strftime('%H:%M'),
                   'y': np.random.random(len(dr)) * 10},
                  index=dr)
df:
HH:MM y
2019-10-01 00:00:00 00:00 8.488177
2019-10-01 00:01:00 00:01 1.788959
2019-10-01 00:02:00 00:02 0.543632
2019-10-01 00:03:00 00:03 3.615384
2019-10-01 00:04:00 00:04 2.754009
Convert 'HH:MM' with to_datetime, then plot:
df['HH:MM'] = pd.to_datetime(df['HH:MM'], format='%H:%M')
ax = df.plot(kind='scatter', x='HH:MM', y='y', rot=45)
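The 1900-01-01 date in the question's output is expected behavior: when the format string contains only time fields, the missing date fields are filled with defaults (year 1900, January 1), the same defaults Python's datetime.strptime uses. A short check using times from the question; for plotting, the placeholder date does no harm as long as the tick formatter prints only the time, as with the DateFormatter('%H:%M') used in this answer.

```python
from datetime import datetime

import pandas as pd

# datetime.strptime fills the fields the format omits: year 1900, month 1, day 1
parsed = datetime.strptime("08:19:40", "%H:%M:%S")
print(parsed)  # 1900-01-01 08:19:40

# pandas produces the same placeholder date when only a time format is given
times = pd.to_datetime(pd.Series(["08:19:40", "08:20:15"]), format="%H:%M:%S")
print(times.dt.year.unique())

# A time-only format hides the placeholder date again
print(times.dt.strftime("%H:%M").tolist())  # ['08:19', '08:20']
```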
To adjust the number of ticks and show them in 'HH:MM' format, set a date Locator and a DateFormatter:
ax.xaxis.set_major_locator(mdates.HourLocator(interval=2))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
Adjust the type of locator or interval to increase or decrease the number of ticks.
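To see how the choice of locator changes the tick density, here is a small sketch. tick_labels is a hypothetical helper that pins the view to one day and returns the labels a given locator produces; byhour and byminute are used here so the ticks land on fixed clock times, which is a presentational choice rather than something the answer requires.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.dates as mdates
import matplotlib.pyplot as plt


def tick_labels(locator, fmt="%H:%M"):
    """Hypothetical helper: major tick labels a locator yields over one day."""
    fig, ax = plt.subplots()
    start, end = datetime(1900, 1, 1, 0, 0), datetime(1900, 1, 1, 23, 59)
    ax.plot([start, end], [0, 1])
    ax.set_xlim(start, end)  # pin the view to exactly one day
    ax.xaxis.set_major_locator(locator)
    formatter = mdates.DateFormatter(fmt)
    labels = [formatter(loc) for loc in ax.xaxis.get_majorticklocs()]
    plt.close(fig)
    return labels


# Fewer ticks: one every 4 hours, on hours divisible by 4
print(tick_labels(mdates.HourLocator(byhour=range(0, 24, 4))))
# More ticks: one every 30 minutes
print(tick_labels(mdates.MinuteLocator(byminute=[0, 30])))
```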
I have a DataFrame covering a long time range, with a datetime64[ns] column and an int column.
Data looks like this:
MIN_DEP DELAY
0 2018-01-01 05:09:00 0
1 2018-01-01 05:13:00 0
2 2018-01-01 05:39:00 0
3 2018-01-01 05:43:00 0
4 2018-01-01 06:12:00 34
... ... ...
77005 2020-09-30 23:42:00 0
77006 2020-09-30 23:43:00 0
77007 2020-09-30 23:43:00 43
77008 2020-10-01 00:18:00 0
77009 2020-10-01 00:59:00 0
[77010 rows x 2 columns]
MIN_DEP datetime64[ns]
DELAY int64
dtype: object
The goal is to plot all the data on a single 00:00 - 24:00 range on the x-axis, without the dates.
When I try to plot it, every tick on the time axis shows 00:00. How can I fix this?
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(pd_to_stat['MIN_DEP'],pd_to_stat['DELAY'])
xfmt = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(xfmt)
plt.show()
I tried converting the timestamps to dt.time first and then plotting them:
pd_to_stat['time'] = pd.to_datetime(pd_to_stat['MIN_DEP'], format='%H:%M').dt.time
fig, ax = plt.subplots()
ax.plot(pd_to_stat['time'],pd_to_stat['DELAY'])
plt.show()
Matplotlib does not allow that:
TypeError: float() argument must be a string or a number, not 'datetime.time'
Based on your requirements, I guess you need neither the dates nor the seconds field of your timestamps, so a little preprocessing is needed first.
Remove the seconds field using the code below:
dataset['MIN_DEP'] = dataset['MIN_DEP'].dt.floor('min')
Then you can remove the date from your timestamps in the following manner:
dataset['MIN_DEP'] = dataset['MIN_DEP'].dt.time
Note that matplotlib cannot plot datetime.time objects directly (that is the TypeError in the question), so before plotting either format the times as strings or place them all on one common dummy date.
This seems to work now. I hadn't noticed that the plot was still split up by date. As a workaround, I replaced all the dates with the same date and plotted the data, hiding the date with DateFormatter:
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
pd_to_stat['MIN_DEP'] = pd_to_stat['MIN_DEP'].map(lambda t: t.replace(year=2020, month=1, day=1))
fig, ax = plt.subplots()
ax.plot(pd_to_stat['MIN_DEP'],pd_to_stat['DELAY'])
xfmt = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(xfmt)
plt.show()
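The same-date workaround can also be written without a per-row lambda. In addition, ax.plot joins consecutive points with a line, so after all days are moved onto one date, each day's last point would be connected back to the next day's first point; a scatter plot avoids that. A minimal sketch, assuming a small stand-in for pd_to_stat built from four rows shown in the question; to_dummy_day is a hypothetical helper, and 2020-01-01 is just the same arbitrary placeholder date used above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for pd_to_stat, using four rows shown in the question
pd_to_stat = pd.DataFrame({
    "MIN_DEP": pd.to_datetime(["2018-01-01 05:09", "2018-01-01 06:12",
                               "2020-09-30 23:43", "2020-10-01 00:18"]),
    "DELAY": [0, 34, 43, 0],
})


def to_dummy_day(ts, day="2020-01-01"):
    """Hypothetical helper: keep each timestamp's time of day, move it onto one date."""
    return pd.Timestamp(day) + (ts - ts.dt.normalize())


pd_to_stat["TOD"] = to_dummy_day(pd_to_stat["MIN_DEP"])

fig, ax = plt.subplots()
# scatter draws no connecting lines, so points from different days stay separate
ax.scatter(pd_to_stat["TOD"], pd_to_stat["DELAY"])
ax.set_xlim(pd.Timestamp("2020-01-01"), pd.Timestamp("2020-01-02"))  # 00:00 to 24:00
ax.xaxis.set_major_locator(mdates.HourLocator(byhour=range(0, 24, 3)))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
```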
I have the following df:
date values
2020-08-06 08:00:00 5
2020-08-06 09:00:00 10
2020-08-06 10:00:00 0
2020-08-17 08:00:00 8
2020-08-17 09:00:00 15
To plot it, I do df.set_index('date')['values'].plot(kind='line'), but the x-axis shows all the dates between the 6th and the 17th.
How can I plot the graph with only the dates that are in my df?
I assume that the date column is of datetime type.
To draw only the dates that occur in the data, build a numeric index on the principle "24 * position of the day in the list of unique dates + hour".
Because the default x tick labels would then show these bare hour numbers, you have to define your own, e.g. one every 8 hours within each date drawn.
Start by converting your DataFrame as follows:
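import numpy as np
import pandas as pd

idx = df['date'].dt.normalize().unique()
dateMap = pd.Series(np.arange(idx.size) * 24, index=idx)
# map the normalized datetimes (not dt.date objects) so the keys match dateMap's index
df.set_index(df['date'].dt.normalize().map(dateMap) + df['date'].dt.hour, inplace=True)
df.index.rename('HourNo', inplace=True); df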
Now, for your data sample, it has the following content:
date values
HourNo
8 2020-08-06 08:00:00 5
9 2020-08-06 09:00:00 10
10 2020-08-06 10:00:00 0
32 2020-08-17 08:00:00 8
33 2020-08-17 09:00:00 15
Then generate your plot and x ticks positions and labels:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(tight_layout=True)
df.loc[:, 'values'].plot(style='o-', rot=30, ax=ax)
xLoc = np.arange(0, dateMap.index.size * 24, 8)
xLbl = pd.concat([ pd.Series(d + pd.timedelta_range(start=0, freq='8H',
periods=3)) for d in dateMap.index ]).dt.strftime('%Y-%m-%d\n%H:%M')
plt.xticks(ticks=xLoc, labels=xLbl, ha='right')
ax.set_xlabel('Date')
ax.set_ylabel('Value')
ax.set_title('Set the proper heading')
ax.grid()
plt.show()
I also added a grid.
The result is:
And a final remark: avoid column names that are the same as existing
pandas methods or attributes (e.g. values).
They are sometimes the cause of "stupid" errors (you intend to refer to
a column, but you actually refer to a method or attribute).
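A quick illustration of that pitfall with the sample data from the question: attribute access df.values returns the DataFrame.values property (the whole frame as a NumPy array), not the column named "values". Renaming the column to value is just one possible choice of a non-clashing name.

```python
import numpy as np
import pandas as pd

# Sample data from the question: one column is literally named "values"
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-08-06 08:00", "2020-08-06 09:00", "2020-08-06 10:00",
                            "2020-08-17 08:00", "2020-08-17 09:00"]),
    "values": [5, 10, 0, 8, 15],
})

# Attribute access hits the DataFrame.values property: the whole frame as an array
print(isinstance(df.values, np.ndarray), df.values.shape)  # True (5, 2)

# Bracket access always means the column
print(df["values"].tolist())  # [5, 10, 0, 8, 15]

# Renaming removes the ambiguity, so attribute access now reaches the column
df = df.rename(columns={"values": "value"})
print(df.value.tolist())  # [5, 10, 0, 8, 15]
```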
id timestamp energy
0 a 2012-03-18 10:00:00 0.034
1 b 2012-03-20 10:30:00 0.052
2 c 2013-05-29 11:00:00 0.055
3 d 2014-06-20 01:00:00 0.028
4 a 2015-02-10 12:00:00 0.069
I want to plot this data as shown below,
with just the time on the x-axis (not the date or the full datetime),
because I want to see the values for each hour.
https://i.stack.imgur.com/u73eJ.png
But this code plots it like this:
plt.plot(df['timestamp'], df['energy'])
https://i.stack.imgur.com/yd6NL.png
I tried some code, but it only formats the x data to hide the date part and still plots like the second graph.
Note: df['timestamp'] is of datetime type.
What should I do? Thanks.
You can convert your datetimes into times. If df["timestamp"] is already in datetime format, then:
df["time"] = df["timestamp"].dt.time
plt.plot(df['time'], df['energy'])
If df["timestamp"] is of type string, add one more line in front: df["timestamp"] = pd.to_datetime(df["timestamp"])
Update: it looks like matplotlib does not accept time objects, so just convert them to strings:
df["time"] = df["timestamp"].dt.strftime("%H:%M")
plt.scatter(df['time'], df['energy'])
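One caveat with the string approach: matplotlib treats strings as categories, so each distinct time string gets its own evenly spaced slot, in order of first appearance, regardless of how far apart the times actually are. A different technique (not the one in this answer) is to convert the time of day into a number of hours since midnight and plot that on an ordinary numeric axis. A minimal sketch using the sample data from the question; the hour column name and the 3-hour tick spacing are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": ["a", "b", "c", "d", "a"],
    "timestamp": pd.to_datetime(["2012-03-18 10:00:00", "2012-03-20 10:30:00",
                                 "2013-05-29 11:00:00", "2014-06-20 01:00:00",
                                 "2015-02-10 12:00:00"]),
    "energy": [0.034, 0.052, 0.055, 0.028, 0.069],
})

# Time of day as hours since midnight, e.g. 10:30 -> 10.5
df["hour"] = df["timestamp"].dt.hour + df["timestamp"].dt.minute / 60

fig, ax = plt.subplots()
ax.scatter(df["hour"], df["energy"])
ax.set_xlim(0, 24)  # one full day, independent of the dates
ax.set_xticks(range(0, 25, 3))
ax.set_xticklabels([f"{h:02d}:00" for h in range(0, 25, 3)])
ax.set_xlabel("time of day")
ax.set_ylabel("energy")
```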
First check whether df["timestamp"] is already of datetime type. If not, convert it:
import pandas as pd
time = pd.to_datetime(df["timestamp"])
print(time.dtype)
Then:
import matplotlib.pyplot as plt
values = df['energy']
# plot_date is discouraged in recent Matplotlib versions; plot(..., 'o') draws the same markers
plt.plot(time, values, 'o')
plt.xticks(rotation=45)
plt.show()