converting timestamps into plot-able array of values - python

I have an array of time values in [hh:mm:ss] format with the seconds as decimals, like 13.80 seconds, 15.90 seconds and so on. What I am trying to do:
import time
for i in timestamp_array:
    new_time = time.strptime(i, "%H:%M:%S")
I get the error:
ValueError: unconverted data remains: .80
How do I deal with this?
Thank you!

Since you are going to plot the values, I suggest using matplotlib.dates; it can convert times to numbers and back again.
In [12]:
import matplotlib.dates as mpd
mpd.datestr2num('12:23:12')
Out[12]:
735420.5161111112
In [13]:
mpd.num2date(735420.5161111112)
Out[13]:
datetime.datetime(2014, 7, 6, 12, 23, 12, tzinfo=<matplotlib.dates._UTC object at 0x051FD9F0>)
A minimal example:
import matplotlib.pyplot as plt

# x values are matplotlib date numbers, y values are the data points
plt.plot([mpd.datestr2num('12:23:12.89'), mpd.datestr2num('12:23:13.89')],
         [1, 2], 'o')
ax = plt.gca()
ax.xaxis.set_major_locator(mpd.HourLocator())
ax.xaxis.set_major_formatter(mpd.DateFormatter('%H:%M:%S.%f'))
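As for the original ValueError: the fractional seconds are simply not covered by the %H:%M:%S format. Adding a %f directive (and parsing with datetime.strptime, so the fraction is kept as microseconds) handles it; a minimal sketch, assuming values like '00:00:13.80':

from datetime import datetime

value = "00:00:13.80"  # assumed hh:mm:ss.ff layout from the question
parsed = datetime.strptime(value, "%H:%M:%S.%f")
print(parsed.time())  # 00:00:13.800000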

Related

plot only the time from datetime objects in matplotlib

I have a list of date and time values with the format '2019-08-24 08:57:18.550', for example. I have successfully converted them into numbers that matplotlib understands using matplotlib.dates.date2num(points); however, I am having trouble getting matplotlib to plot only the time, not the associated date.
The graph it creates has tick marks with labels such as 08-24 12 which I assume has the format "month-date hour". I would like it to only plot the time, ideally with the format "hour:minute" or something along those lines. How do I get matplotlib to do this?
If I understood correctly, and it is the current date/time that you are looking for, then:
>>> from datetime import datetime
>>> current_time = datetime.now()
>>> current_time
datetime.datetime(2020, 5, 18, 22, 4, 41, 425538)
#################(year, month, day, hour, minute, second, microsecond)
You can then format it (this is the part of the question I didn't quite understand), but if you want hour:minute format:
from datetime import datetime
time = datetime.now()
hour = time.hour
minute = time.minute
print(f"{hour}:{minute}")
You should note that datetime.datetime(2020, 5, 18, 22, 4, 41, 425538) is not iterable.
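The axis-label part of the question can also be handled with a date formatter, as in the first answer above. A minimal sketch (plot_date and the names points_num and values are just placeholders, not from the question):

import matplotlib.pyplot as plt
import matplotlib.dates as mdates

fig, ax = plt.subplots()
ax.plot_date(points_num, values, '-')  # points_num: matplotlib date numbers, values: your data
# Show only hour:minute on the x axis instead of "month-date hour"
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
plt.show()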

Pandas datetime64 with longer range

I have a DataFrame with datetime values spanning from year 1 to far into the future. When I try to import the data into pandas the dtype gets set to object, although I would like it to be datetime64 so I can use the .dt accessor.
Consider this piece of code:
import pytz
from datetime import datetime
import pandas as pd
df = pd.DataFrame({'dates': [datetime(108, 7, 30, 9, 25, 27, tzinfo=pytz.utc),
                             datetime(2018, 3, 20, 9, 25, 27, tzinfo=pytz.utc),
                             datetime(2529, 7, 30, 9, 25, 27, tzinfo=pytz.utc)]})
In [5]: df.dates
Out[5]:
0    0108-07-30 09:25:27+00:00
1    2018-03-20 09:25:27+00:00
2    2529-07-30 09:25:27+00:00
Name: dates, dtype: object
How can I convert it to dtype datetime64[s]? I don't really care about nano/millisecond accuracy, but I would like the range.
Pandas can generally convert to and from datetime.datetime objects:
df.dates = pd.to_datetime(df.dates)
But in your case, you can't do this, for two reasons.
First, while Pandas can convert to and from datetime.datetime, it can't handle tz-aware datetimes, and you've imbued yours with a timezone. Fortunately, this one is easy to fix—you're explicitly using UTC, and you can do that without aware objects.
Second, 64-bit nanoseconds can't handle a date range as wide as you want:
>>> (1<<64) // 1000000000 / 3600 / 24 / 365.2425
584.5540492538555
And the Pandas documentation makes this clear:
Since pandas represents timestamps in nanosecond resolution, the time span that can be represented using a 64-bit integer is limited to approximately 584 years:
In [66]: pd.Timestamp.min
Out[66]: Timestamp('1677-09-21 00:12:43.145225')
In [67]: pd.Timestamp.max
Out[67]: Timestamp('2262-04-11 23:47:16.854775807')
(It looks like they put the 0 point at the Unix epoch, which makes sense.)
But notice that the documentation links to Representing Out-of-Bounds Spans: you can use Periods, which will be less efficient and convenient than int64s, but probably more so than objects. (I believe the internal storage ends up being YYYYMMDD-style strings, but they're stored as fixed-length strings directly in the array, instead of as references to Python objects on the heap.)
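A minimal sketch of that Period route, assuming the column holds tz-aware datetime.datetime objects as in the example above (the uniform UTC tzinfo is dropped first, since it carries no extra information here):

import pandas as pd

# Drop the tzinfo and build second-resolution Periods, which are not
# limited to the 1677-2262 range of nanosecond Timestamps.
periods = pd.PeriodIndex(
    [pd.Period(d.replace(tzinfo=None), freq='S') for d in df.dates],
    freq='S')
df['periods'] = periods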

ValueError: Setting void-array with object members using buffer. Plotting a time series of a numpy array

I have two numpy arrays (time and date) and a third with rain values. In the end I would like to plot all the info in an xy-plot with matplotlib!
This is what I have so far:
import os
import time
from datetime import datetime
import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

date = np.array(["01.06.2015", "01.06.2015", "01.06.2015"], dtype=object)
time = np.array(["12:23:00", "14:54:00", "14:56:00"], dtype=object)
# Rain
rain = np.array([2.544, 1.072, 1.735])
# Calculations to make one array of time and date,
# called timestamp
A = np.vstack((date, time))
A_transp = A.transpose()
A_transp.shape
A_transp.dtype
So in the end, as mentioned, I would like to have an (x, y) plot, with the timestamps (time and date combined, as an array of floating point numbers) on one axis and the rain on the other.
Thank you for your help
Markus
Thank you for your help, but I have not come to a conclusion yet! Further steps I did:
# Get a new .out file, to get a time tuple
# see strptime.
# Finally I would like to make a floating point number out of the
# timetuple, to plot the hole thing!
#
mydata = np.savetxt('A_transp.out', A_transp, fmt="%s")
# Dateconv
dateconv = lambda s: datetime.strptime(s, '%d.%m.%Y %H:%M:%S')
# ColNames
col_names = ["Timestamp"]
# DataTypes
dtypes = ["object"]
# Read in the new file
mydata_next = np.genfromtxt('A_transp.out', delimiter=None,
                            names=col_names, dtype=dtypes,
                            converters={"Timestamp": dateconv})
So after the np.genfromtxt call the following error message appears:
Traceback (most recent call last):
  File "parsivel.py", line 155, in <module>
    names=col_names, dtype=dtypes, converters={"Timestamp":dateconv})
  File "/home/unix/anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 1867, in genfromtxt
    output = np.array(data, dtype)
ValueError: Setting void-array with object members using buffer.
What I would try after that would be the following.
#B = mdates.strpdate2num(mydata_next) # fail
#B = time.mktime(mydata_next) # fail
#B = plt.dates.date2num(mydata_next) # fail
And finally I would like to plot the following
# Plot
# Fail
#plt.plot_date(mydata_next, rain)
#plt.show()
But at the moment all the plots fail, because I cannot make a time tuple out of A_transp! Maybe the strptime function is not right here, or is there another way than the detour via np.savetxt and rearranging A_transp?
Starting from your original date and time arrays, you can obtain a date-time string representation in a single array just by adding them:
In[61]: date_time = date + time
In[62]: date_time
Out[62]: array(['01.06.201512:23:00', '01.06.201514:54:00', '01.06.201514:56:00'], dtype=object)
Now you can convert the date-time strings into datetime format. For example:
In[63]: date_time2 = [datetime.strptime(d, '%d.%m.%Y%H:%M:%S') for d in date_time]
In[64]: date_time2
Out[64]:
[datetime.datetime(2015, 6, 1, 12, 23),
datetime.datetime(2015, 6, 1, 14, 54),
datetime.datetime(2015, 6, 1, 14, 56)]
And that's all you need to plot your data with:
plt.plot_date(date_time2, rain)
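If you specifically want the timestamps as an array of floating point numbers, matplotlib can convert the datetime objects for you; a small sketch using matplotlib.dates (the formatter string is just one possible choice):

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Convert the datetime objects to matplotlib's float date numbers
x = mdates.date2num(date_time2)

plt.plot_date(x, rain)
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%d.%m.%Y %H:%M'))
plt.show()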

Python Numpy Loadtxt - Convert unix timestamp

I have a text file with many rows of data; the first piece of data in each row is a unix timestamp such as 1436472000. I am using numpy.loadtxt, and in the converters parameter I want to specify that the timestamp should be converted into whatever numpy understands as a date/time. I know this needs to go after the 0: in the curly brackets, but I can't work out how to convert it. I know a converter from matplotlib.dates.strpdate2num can be used for normal dates, but this won't work for unix timestamps.
Code:
timestamp, closep, highp, lowp, openp, volume = np.loadtxt(fileName,delimiter=",",unpack=True,converters={ 0: })
Thanks for help in advance, please ask if you would like me to clarify what I mean.
While converters can be convenient, they are slow because they are called once for each row of data. It is faster to convert the data after the timestamps are loaded into a NumPy array of integers:
x = np.array([1436472000, 1436472001])
x = np.asarray(x, dtype='datetime64[s]')
yields an array of NumPy datetime64s:
array(['2015-07-09T16:00:00-0400', '2015-07-09T16:00:01-0400'],
dtype='datetime64[s]')
To obtain Python datetime.datetimes use tolist():
>>> x.tolist()
# [datetime.datetime(2015, 7, 9, 20, 0),
# datetime.datetime(2015, 7, 9, 20, 0, 1)]
As you know, matplotlib datenums count the number of days since 0001-01-01 00:00:00 UTC, plus one. These are not timestamps (which count seconds since the Epoch, 1970-01-01 00:00:00 UTC):
>>> matplotlib.dates.date2num(x.tolist())
# array([ 735788.83333333, 735788.83334491])
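Putting that together with np.loadtxt: load the first column as plain numbers and convert it afterwards. A sketch under the assumption that the file is comma-separated with the six columns named in the question (the file name here is hypothetical):

import numpy as np

# hypothetical file; columns: unix timestamp, close, high, low, open, volume
timestamp, closep, highp, lowp, openp, volume = np.loadtxt(
    'prices.csv', delimiter=',', unpack=True)

# convert the seconds-since-epoch values to datetime64[s]
dates = timestamp.astype('int64').astype('datetime64[s]')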

Python: datetime64 issues with range

I am trying to build a vector of seconds between two times:
import numpy as np
import pandas as pd
date="2011-01-10"
start=np.datetime64(date+'T09:30:00')
end=np.datetime64(date+'T16:00:00')
range = pd.date_range(start, end, freq='S')
For some reason when I print range I get:
[2011-01-10 17:30:00, ..., 2011-01-11 00:00:00]
So the length is 23401 which is what I want but definitely not the correct time interval. Why is that?
Also, if I have a DataFrame df with a column of datetime64 format that looks like:
Time
15:59:57.887529007
15:59:57.805383290
Once I have solved the problem above, will I be able to do the following?
data = df.reindex(df.Time + range)
data = data.ffill()
I need to do the exact steps proposed here except with datetime64 format. Is it possible?
It seems that pandas date_range is dropping the timezone (looks like a bug, I think it's already filed...). You can use Timestamp rather than datetime64 to work around this:
In [11]: start = pd.Timestamp(date+'T09:30:00')
In [12]: end = pd.Timestamp(date+'T16:00:00')
In [13]: pd.date_range(start, end, freq='S')
Out[13]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-10 09:30:00, ..., 2011-01-10 16:00:00]
Length: 23401, Freq: S, Timezone: None
Note: to see that it is a timezone issue: you are in UTC-8, and 16:00 + 8:00 == 00:00 (the next day).
This is because when you specify the datetime as a string, numpy assumes it is in local time and converts it to UTC.
Specifying the time offset gives the correct interval, though the interval is in UTC:
start = np.datetime64(date + 'T09:30:00+0000')
end = np.datetime64(date + 'T16:00:00+0000')
range = pd.date_range(start, end, freq='S')
Or use datetime.datetime objects as the start and end; again, the interval here is in UTC:
import datetime
start = datetime.datetime(2011, 1, 10, 9, 30, 0)
end = datetime.datetime(2011, 1, 10, 16, 0, 0)
range = pd.date_range(start, end, freq='S')
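As for the reindex/ffill step asked about at the end of the question, neither answer addresses it directly. A rough sketch of one way it could look, assuming df has a 'Time' column of full (date + time) timestamps sorted ascending; the names here are illustrative only:

import pandas as pd

# One-second grid for the trading window (matches the question's start/end)
grid = pd.date_range('2011-01-10 09:30:00', '2011-01-10 16:00:00', freq='S')

# Reindex onto the grid, carrying the last observation forward,
# which is the same idea as reindex followed by ffill
data = df.set_index('Time').reindex(grid, method='ffill')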
