Python matplotlib.dates.date2num: converting numpy array to matplotlib datetimes

I am trying to plot a custom chart with datetime axis. My understanding is that matplotlib requires a float format which is days since epoch. So, I want to convert a numpy array to the float epoch as required by matplotlib.
The datetime values are stored in a numpy array called t:
In [235]: t
Out[235]: array(['2008-12-01T00:00:59.000000000-0800',
'2008-12-01T00:00:59.000000000-0800',
'2008-12-01T00:00:59.000000000-0800',
'2008-12-01T00:09:26.000000000-0800',
'2008-12-01T00:09:41.000000000-0800'], dtype='datetime64[ns]')
Apparently, matplotlib.dates.date2num only accepts a sequence of python datetimes as input (not numpy datetimes arrays):
import matplotlib.dates as dates
plt_dates = dates.date2num(t)
raises AttributeError: 'numpy.datetime64' object has no attribute 'toordinal'
How should I resolve this issue? I hope to have a solution that works for all types of numpy.datetime like object.
My best workaround (which I am not sure is correct) is not to use date2num at all. Instead, I use the following:
z = np.array([0]).astype(t.dtype)
plt_dates = (t - z)/ np.timedelta64(1,'D')
Even if this solution is correct, it would be nicer to use library functions instead of manual ad hoc workarounds.
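Whether the manual workaround is correct depends on which epoch matplotlib counts from. A minimal sketch, assuming the sample timestamps below (rewritten in UTC for illustration) and that matplotlib 3.3 and later default to a 1970-01-01 epoch while earlier releases counted days from 0001-01-01 plus one:

```python
import datetime
import numpy as np

# Illustrative timestamps (UTC); not the asker's full array.
t = np.array(['2008-12-01T08:00:59', '2008-12-01T08:09:26'],
             dtype='datetime64[ns]')

# Fractional days since the Unix epoch, which is what the workaround computes.
epoch = np.datetime64('1970-01-01T00:00:00', 'ns')
days_since_1970 = (t - epoch) / np.timedelta64(1, 'D')

# Older matplotlib releases used proleptic Gregorian ordinals instead,
# so the same instants are shifted by the ordinal of 1970-01-01.
LEGACY_OFFSET = datetime.date(1970, 1, 1).toordinal()
legacy_numbers = days_since_1970 + LEGACY_OFFSET
```

Under those assumptions, days_since_1970 matches the newer default epoch and legacy_numbers matches the older convention, so the workaround is only right for one of the two unless the offset is applied.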

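For a quick fix, use (note that to_pydatetime() is a pandas method, so this form applies when t is a pandas DatetimeIndex rather than a plain numpy array):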
import matplotlib.dates as dates
plt_dates = dates.date2num(t.to_pydatetime())
or:
import matplotlib.dates as dates
plt_dates = dates.date2num(list(t))
It seems that matplotlib 2.1.0 (the latest version at the time of writing) does not handle numpy datetime64 arrays. Edit: in my case, after checking the source code, the problem seems to be that matplotlib.cbook in that version cannot create an iterable from the numpy array and treats the array as a single number.
For similar but slightly more complex problems, see http://stackoverflow.com/questions/13703720/converting-between-datetime-timestamp-and-datetime64, possibly the question "Why do I get 'python int too large to convert to C long' errors when I use matplotlib's DateFormatter to format dates on the x axis?", and maybe "matplotlib plot_date AttributeError: 'numpy.datetime64' object has no attribute 'toordinal'" (if someone answers).
Edit: someone answered, and the code there using to_pydatetime() seems best. See also "pandas 0.21.0 Timestamp compatibility issue with matplotlib", though that did not work in my case (possibly because of Python 2).
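For a plain numpy array, one way to get Python datetimes without pandas is to cast to microsecond precision before converting. This is only a sketch: to_pydatetimes is a hypothetical helper name, and the sample values are illustrative.

```python
import datetime
import numpy as np

def to_pydatetimes(arr):
    """Convert a datetime64 array of any unit to a list of datetime.datetime."""
    # Casting to microseconds first matters: with nanosecond precision,
    # tolist() yields plain integers instead of datetime objects.
    return np.asarray(arr).astype('datetime64[us]').tolist()

t = np.array(['2008-12-01T08:00:59', '2008-12-01T08:09:26'],
             dtype='datetime64[ns]')
py_dates = to_pydatetimes(t)
```

The resulting list of datetime objects is the kind of input that matplotlib.dates.date2num accepts even in older releases.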

Related

how to convert datetime to numeric data type?

I have a dataset as
time MachineId
1530677359000000000 01081081
1530677363000000000 01081081
1530681023000000000 01081090
1530681053000000000 01081090
1530681531000000000 01081090
So my code goes like this:
import pandas as pd
from datetime import datetime
import time
import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdate
df = pd.read_csv('acn.csv')
df['time'] = pd.to_datetime(df['time'], unit='ns')  # convert epoch nanoseconds to datetime
print(df.head())
Output:
time MachineId
0 2018-07-04 04:09:19 1081081.0
1 2018-07-04 04:09:23 1081081.0
2 2018-07-04 05:10:23 1081090.0
3 2018-07-04 05:10:53 1081090.0
4 2018-07-04 05:18:51 1081090.0
Now I want to convert the time column to numeric values so that I can plot time against machine ID:
dates = plt.dates.date2num(df['time'])
df.plot(kind='scatter',x='dates',y='MachineId')
plt.show()
which throws an error:
AttributeError: 'module' object has no attribute 'dates'
How can I change datetime format to numeric so that a plot can be formed ?
You got the following error:
AttributeError: 'module' object has no attribute 'dates'
Your error message is telling you that matplotlib.pyplot.dates (plt.dates) doesn't exist: the pyplot module has no attribute named 'dates'.
So you need to fix that error before you worry about converting anything. Did you mean to call matplotlib.dates.date2num instead? In your code you have the following:
import matplotlib.dates as mdate
So maybe you meant to call mdate.date2num instead? That should eliminate the AttributeError.
If that doesn't work for you, you could try what one of the other commenters suggested in their link and use pandas' to_pydatetime. I'm not familiar with it, but in the linked example page it is accessed as Series.dt.to_pydatetime().
All of this conversion is only necessary because you are trying to use df.plot; consider calling matplotlib directly instead. For example, could you just use plt.plot_date? Pandas is excellent, but its plotting interface isn't as mature as the rest of it. As an example (I'm not saying this is the exact problem you are having), there is a known bug in pandas regarding plotting dates, and there is an older Stack Overflow thread where someone stubs out a plt.plot_date call for a similar case.
You can also plot dates directly. For example, if you want the dates on the x-axis, pass them to ax.plot(df.time, ids). I think this might be the closest thing to what you are looking for.
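A minimal sketch of plotting the datetime column directly, assuming an in-memory copy of the first few rows of the sample data instead of acn.csv and a non-interactive backend so it can run headless:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# Illustrative subset of the question's data.
df = pd.DataFrame({
    "time": [1530677359000000000, 1530677363000000000, 1530681023000000000],
    "MachineId": [1081081, 1081081, 1081090],
})
df["time"] = pd.to_datetime(df["time"], unit="ns")  # epoch nanoseconds to datetime

fig, ax = plt.subplots()
ax.scatter(df["time"], df["MachineId"])  # datetimes go straight on the x-axis
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
fig.autofmt_xdate()  # rotate the tick labels so they do not overlap
```

Here no explicit date2num call is needed, because matplotlib converts the datetime values itself when they are passed as the x data.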

Checking if a python variable is a date?

One thing that I'm finding hard with the pandas/numpy combo is dealing with dates. My dataframe time series indices are often DatetimeIndex objects containing Timestamps, but sometimes they seem to be something else (e.g. datetime.date or numpy.datetime64).
Is there a generic way to check if a particular object is a date, i.e. any of the known date variable types? Or is that a function I should look to create myself?
Thanks!
I use this function to convert a series to a consistent datetime object in pandas / numpy. It works with both scalars and series.
import pandas as pd
x = '2018-12-11'
pd.to_datetime(x) # Timestamp('2018-12-11 00:00:00')
import datetime
if isinstance(yourVariable, datetime.datetime):
    print("it's a date")
I would try converting the string representation of what I suspect to be a datetime into a datetime object, using the parse function from dateutil.parser.
https://chrisalbon.com/python/basics/strings_to_datetime/
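The isinstance approach can be widened to cover the types mentioned in the question. This is a sketch, with is_date_like as a hypothetical helper name. It relies on datetime.datetime being a subclass of datetime.date, and pandas Timestamp in turn subclasses datetime.datetime, so it passes the same check.

```python
import datetime
import numpy as np

def is_date_like(obj):
    """Return True for datetime.date, datetime.datetime and numpy.datetime64 scalars."""
    # datetime.datetime inherits from datetime.date, so one entry covers both.
    return isinstance(obj, (datetime.date, np.datetime64))

checks = [
    is_date_like(datetime.date(2018, 12, 11)),
    is_date_like(datetime.datetime(2018, 12, 11, 8, 30)),
    is_date_like(np.datetime64('2018-12-11')),
    is_date_like('2018-12-11'),  # a string is not a date object
]
```

Strings that merely look like dates are rejected on purpose; parsing them with pd.to_datetime or dateutil is a separate step.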

How to use matplotlib to plot line charts

I use pandas to read my csv file and turn two columns into arrays as independent/dependent variables respectively.
[Screenshot: data reading, conversion to arrays, and value assignment]
Then, when I try to plot the line chart with matplotlib.pyplot, I get an error saying that a 'numpy.ndarray' object has no attribute 'find'.
import numpy as np
import matplotlib.pyplot as plt
plt.plot(x,y)
The problem is probably with your dtypes. Assuming your data are in df, check df.dtypes. The columns you are trying to plot must be numeric (float, int, bool).
I guess that at least one of the columns you are plotting has object dtype. Try to find out why (maybe missing values were read as some sort of string, or everything is just treated as a string) and convert it to the correct type with astype, e.g.
df['float_col'] = df['float_col'].astype(np.float64)
Edit:
If you are trying to plot dates, make sure that the dtype is actually a date type, i.e. datetime64[ns], and use matplotlib's dedicated method plot_date.
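As a sketch of the dtype check and conversion, assuming a small made-up frame whose columns were read as strings (the column names x and y are illustrative):

```python
import pandas as pd

# Values read as text, including one placeholder for a missing value.
df = pd.DataFrame({"x": ["1", "2", "3"], "y": ["0.5", "n/a", "2.5"]})
print(df.dtypes)  # inspect the column types before plotting

# Convert to numbers; errors="coerce" turns unparseable entries into NaN
# instead of raising, which astype(np.float64) would do for "n/a".
df["x"] = pd.to_numeric(df["x"])
df["y"] = pd.to_numeric(df["y"], errors="coerce")
```

After the conversion both columns are numeric and can be passed to plt.plot; rows that became NaN simply leave a gap in the line.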

Using statsmodels.seasonal_decompose() without DatetimeIndex but with Known Frequency

I have a time-series signal I would like to decompose in Python, so I turned to statsmodels.seasonal_decompose(). My data has frequency of 48 (half-hourly). I was getting the same error as this questioner, where the solution was to change from an Int index to a DatetimeIndex. But I don't know the actual dates/times my data is from.
In this github thread, one of the statsmodels contributors says that
"In 0.8, you should be able to specify freq as keyword argument to
override the index."
But this seems not to be the case for me. Here is a minimal code example illustrating my issue:
import pandas as pd
import statsmodels.api as sm
dta = pd.Series([x%3 for x in range(100)])
decomposed = sm.tsa.seasonal_decompose(dta, freq=3)
AttributeError: 'RangeIndex' object has no attribute 'inferred_freq'
Version info:
import statsmodels
print(statsmodels.__version__)
0.8.0
Is there a way to decompose a time-series in statsmodels with a specified frequency but without a DatetimeIndex?
If not, is there a preferred alternative for doing this in Python? I checked out the Seasonal package, but its github lists 0 downloads/month, one contributor, and last commit 9 months ago, so I'm not sure I want to rely on that for my project.
Thanks to josef-pkt for answering this on github. There is a bug in statsmodels 0.8.0 where it always attempts to calculate an inferred frequency based on a DatetimeIndex, if passed a Pandas object.
The workaround when using Pandas series is to pass their values in a numpy array to seasonal_decompose(). For example:
import pandas as pd
import statsmodels.api as sm
my_pandas_series = pd.Series([x%3 for x in range(100)])
decomposed = sm.tsa.seasonal_decompose(my_pandas_series.values, freq=3)
(no errors)
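To show that the decomposition itself needs only the values and a period, not a DatetimeIndex, here is a numpy-only sketch of a naive additive decomposition. It is not the statsmodels implementation, naive_decompose is a hypothetical name, and it only handles odd periods. As far as I know, later statsmodels releases also renamed the freq keyword to period, so check the version you have.

```python
import numpy as np

def naive_decompose(values, period):
    """Naive additive decomposition: moving-average trend plus per-phase means."""
    if period % 2 == 0:
        raise ValueError("this sketch only supports odd periods")
    x = np.asarray(values, dtype=float)
    half = period // 2

    # Centered moving average over one full period; the edges stay NaN.
    trend = np.full(x.shape, np.nan)
    trend[half:len(x) - half] = np.convolve(x, np.ones(period) / period, mode="valid")

    # Average the detrended values at each position within the period.
    detrended = x - trend
    phase_means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    phase_means -= phase_means.mean()  # make the seasonal component sum to zero
    seasonal = np.tile(phase_means, len(x) // period + 1)[:len(x)]

    resid = x - trend - seasonal
    return trend, seasonal, resid

trend, seasonal, resid = naive_decompose([i % 3 for i in range(100)], period=3)
```

For the toy series from the question the trend is flat, the seasonal component repeats every three samples, and the residual is zero away from the edges.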

What is the maximum timestamp numpy.datetime64 can handle?

I'm trying to convert datetime to numpy.datetime64 but the following case fails:
>>> import numpy as np
>>> from datetime import datetime
>>> np.datetime64(datetime.max)
OSError: Failed to use 'localtime_s' to convert to a local time
I presume that datetime64 can't handle such far-dated timestamps.
So what is the maximum timestamp that datetime64 can handle?
Depends on what the specified unit of your np.datetime64 object is (according to the numpy docs). Since you have given a timestamp with microseconds the allowed range is [290301 BC, 294241 AD].
This answered your question but I think the unspoken other question is why it throws an Exception:
I'm facing the same error (using Windows), and I tried a = np.datetime64(datetime.max), which works. Therefore I suspect the problem is NOT the np.datetime64 span (because creating such a datetime works) but that the __repr__ relies on the OS in some way, and the OS probably limits it in your case. So check the maximum localtime of your OS; for every datetime after that, you can still work with the np.datetime64 objects but cannot print them on screen. :-)
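The dependence on the unit can be checked directly. A short sketch, assuming a reasonably recent numpy where printing no longer goes through the OS local time:

```python
import numpy as np
from datetime import datetime

# A Python datetime converts to microsecond precision, where the
# representable range is far wider than datetime.max.
d = np.datetime64(datetime.max)
unit_of_d = d.dtype

# With nanosecond precision the same 64-bit integer covers much less time:
# roughly the years 1678 to 2262 according to the numpy docs.
largest_ns = np.datetime64(np.iinfo(np.int64).max, 'ns')
```

So datetime.max fits comfortably in datetime64[us], while casting it to datetime64[ns] would fall outside the representable range.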
