Python Numpy Loadtxt - Convert unix timestamp

I have a text file with many rows of data; the first piece of data in each row is a unix timestamp such as 1436472000. I am using numpy.loadtxt, and in the converters parameter I want to specify that the timestamp should be converted into whatever numpy understands as a date/time. I know this goes after the 0: in the curly brackets, but I can't work out how to do the conversion. I know matplotlib.dates.strpdate2num can be used as a converter for normal dates, but I think this won't work for unix timestamps.
Code:
timestamp, closep, highp, lowp, openp, volume = np.loadtxt(fileName,delimiter=",",unpack=True,converters={ 0: })
Thanks in advance for your help; please ask if you would like me to clarify what I mean.

While converters can be convenient, they are slow because they are called once for each row of data. It is faster to convert the data after the timestamps are loaded into a NumPy array of integers:
x = np.array([1436472000, 1436472001])
x = np.asarray(x, dtype='datetime64[s]')
yields an array of NumPy datetime64s:
array(['2015-07-09T16:00:00-0400', '2015-07-09T16:00:01-0400'],
dtype='datetime64[s]')
To obtain Python datetime.datetimes use tolist():
>>> x.tolist()
# [datetime.datetime(2015, 7, 9, 20, 0),
# datetime.datetime(2015, 7, 9, 20, 0, 1)]
As you know, matplotlib datenums count the number of days since 0001-01-01
00:00:00 UTC, plus one. These are not timestamps (which count seconds since the
Epoch, 1970-01-01 00:00:00 UTC):
>>> matplotlib.dates.date2num(x.tolist())
# array([ 735788.83333333, 735788.83334491])
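Putting the pieces together for the original loadtxt question, a minimal sketch (the file name and column layout are taken from the question; loading as plain floats first avoids the per-row converter entirely):
import numpy as np
import matplotlib.dates as mdates

# fileName is the question's CSV, with unix seconds in column 0.
timestamp, closep, highp, lowp, openp, volume = np.loadtxt(
    fileName, delimiter=",", unpack=True)

# Convert the whole column at once: float seconds -> int64 -> datetime64[s].
dates = timestamp.astype('int64').astype('datetime64[s]')
datenums = mdates.date2num(dates.tolist())  # matplotlib datenums, ready to plot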

Related

convert timestamp to datetime in python

I used pandas read_excel to load some time data from Excel into Python and saved it in the variable times. For example, times[0] is 2020-12-30; in Excel it's just 2020/12/30.
The type of times[0] is now pandas._libs.tslibs.timestamps.Timestamp.
How can I convert it to a datetime? And, if possible, can I get it in nanoseconds?
In pandas, a Timestamp holds both the date and the time. If you want a datetime, use to_datetime; to interpret an integer as nanoseconds since the epoch, pass unit='ns':
pd.to_datetime(1490195805433502912, unit='ns')
Output
Timestamp('2017-03-22 15:16:45.433502912')
See the pandas datetime documentation for more detail, including Timedelta.
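For the Timestamp you already have from read_excel, a short sketch (the example value is assumed from the question): to_pydatetime() gives a plain datetime.datetime, and .value gives integer nanoseconds since the epoch:
import pandas as pd

ts = pd.Timestamp('2020-12-30')  # what times[0] looks like
ts.to_pydatetime()               # datetime.datetime(2020, 12, 30, 0, 0)
ts.value                         # 1609286400000000000, nanoseconds since the epoch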
If you have a series of times
>>> times = pd.Series(pd.date_range('20180310', periods=2))
>>> times
0 2018-03-10
1 2018-03-11
dtype: datetime64[ns]
You can convert the Timestamp entries to datetime using the dt.to_pydatetime() function like so
>>> times.dt.to_pydatetime()
array([datetime.datetime(2018, 3, 10, 0, 0),
datetime.datetime(2018, 3, 11, 0, 0)], dtype=object)
However, as the documentation notes:
Warning: Python’s datetime uses microsecond resolution, which is lower than pandas (nanosecond). The values are truncated.
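A small demonstration of that truncation, using an assumed example value:
import pandas as pd

ts = pd.Timestamp('2018-03-10 00:00:00.123456789')
ts.nanosecond        # 789
ts.to_pydatetime()   # datetime.datetime(2018, 3, 10, 0, 0, 0, 123456) -- the 789 ns are gone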

How to convert strings to Timestamps for comparison?

I have strings like:
first = '2018-09-16 15:00:00'
second = '1900-01-01 09:45:55.500597'
I want to compare them.
All the methods I found, like Convert string date to timestamp in Python, require knowing the format of the string.
I don't know the format of the strings (see the differences between first and second); all I know is that they can be converted to timestamps.
How can I convert them in order to compare them?
Edit:
The "largest" string that I can get is:
1900-01-01 09:45:55.500597
but I can also get:
1900-01-01
1900-01-01 09:45
1900-01-01 09:45:55
etc..
It's always YYYY-MM-DD HH:MM...
You can use pandas.to_datetime. It offers a lot of flexibility in the string timestamp format, and you can use it on single strings or list/series/tuples of timestamps.
>>> import pandas as pd
>>> day = pd.to_datetime('1900-01-01')
>>> minute = pd.to_datetime('1900-01-01 09:45')
>>> second = pd.to_datetime('1900-01-01 09:45:55')
>>> subsecond = pd.to_datetime('1900-01-01 09:45:55.500597')
>>> assert subsecond > second
>>> assert minute < second
>>> assert day < minute
You can use the dateutil module (pip install python-dateutil):
>>> from dateutil.parser import parse
>>> parse('2018-09-16 15:00:00')
datetime.datetime(2018, 9, 16, 15, 0)
>>> parse('1900-01-01 09:45:55.500597')
datetime.datetime(1900, 1, 1, 9, 45, 55, 500597)
From the list of its features:
Generic parsing of dates in almost any string format;
Once you have the datetime objects, you can compare them directly, there's no need to calculate the timestamps.
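A quick sketch with the question's own strings:
from dateutil.parser import parse

first = parse('2018-09-16 15:00:00')
second = parse('1900-01-01 09:45:55.500597')
first > second   # True -- datetime objects compare chronologically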

Pandas datetime64 with longer range

I have a DataFrame with datetime values spanning from year 1 to far into the future. When I try to import the data into pandas, the dtype gets set to object, although I would like it to be datetime64 so I can use the .dt accessor.
Consider this piece of code:
import pytz
from datetime import datetime
import pandas as pd
df = pd.DataFrame({'dates': [datetime(108, 7, 30, 9, 25, 27, tzinfo=pytz.utc),
                             datetime(2018, 3, 20, 9, 25, 27, tzinfo=pytz.utc),
                             datetime(2529, 7, 30, 9, 25, 27, tzinfo=pytz.utc)]})
In [5]: df.dates
Out[5]:
0 0108-07-30 09:25:27+00:00
1 2018-03-20 09:25:27+00:00
2 2529-07-30 09:25:27+00:00
Name: dates, dtype: object
How can I convert it to dtype datetime64[s]? I don't really care about nano/millisecond accuracy, but I would like the range.
Pandas can generally convert to and from datetime.datetime objects:
df.dates = pd.to_datetime(df.dates)
But in your case, you can't do this, for two reasons.
First, while Pandas can convert to and from datetime.datetime, it can't handle tz-aware datetimes here, and you've imbued yours with a timezone. Fortunately, this one is easy to fix: you're explicitly using UTC, and you can do that without aware objects.
Second, 64-bit nanoseconds can't handle a date range as wide as you want:
>>> (1<<64) / 1000000000 / 3600 / 24 / 365.2425
584.5540492538555
And the Pandas documentation makes this clear:
Since pandas represents timestamps in nanosecond resolution, the time span that can be represented using a 64-bit integer is limited to approximately 584 years:
In [66]: pd.Timestamp.min
Out[66]: Timestamp('1677-09-21 00:12:43.145225')
In [67]: pd.Timestamp.max
Out[67]: Timestamp('2262-04-11 23:47:16.854775807')
(It looks like they put the 0 point at the Unix epoch, which makes sense.)
But notice that the documentation links to Representing Out-of-Bounds Spans: you can use Periods, which will be less efficient and convenient than int64s, but probably more so than objects. (Internally, a PeriodIndex stores int64 ordinals plus a frequency, so the values live directly in the array instead of as references to Python objects on the heap.)
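A sketch of that approach, assuming UTC wall times and second-level accuracy are acceptable:
import pandas as pd
from datetime import datetime

# tzinfo dropped; the times are simply treated as UTC.
naive = [datetime(108, 7, 30, 9, 25, 27),
         datetime(2018, 3, 20, 9, 25, 27),
         datetime(2529, 7, 30, 9, 25, 27)]
df = pd.DataFrame({'dates': [pd.Period(d, freq='S') for d in naive]})
df.dates.dt.year   # the .dt accessor now works: 108, 2018, 2529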

converting timestamps into a plottable array of values

I have an array of time values in [hh:mm:ss] format, with seconds as decimals like 13.80 seconds, 15.90 seconds, and so on. What I am trying to do:
import time
for i in timestamp_array:
    new_time = time.strptime(i, "%H:%M:%S")
I get the error:
ValueError: unconverted data remains: .80
How do I deal with this?
Thank you!
Since you are going to plot the values, I'd suggest using matplotlib.dates: it can convert times to numbers and back again.
In [12]:
import matplotlib.dates as mpd
mpd.datestr2num('12:23:12')
Out[12]:
735420.5161111112
In [13]:
mpd.num2date(735420.5161111112)
Out[13]:
datetime.datetime(2014, 7, 6, 12, 23, 12, tzinfo=<matplotlib.dates._UTC object at 0x051FD9F0>)
A minimal example:
import matplotlib.pyplot as plt

plt.plot([mpd.datestr2num('12:23:12.89'), mpd.datestr2num('12:23:13.89')],
         [1, 2], 'o')
ax = plt.gca()
ax.xaxis.set_major_locator(mpd.HourLocator())
ax.xaxis.set_major_formatter(mpd.DateFormatter('%H:%M:%S.%f'))
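Alternatively, the original ValueError can be fixed directly: time.strptime has no directive for fractional seconds, but datetime.strptime understands %f. A sketch, with sample values assumed from the question:
from datetime import datetime

timestamp_array = ['12:23:13.80', '12:23:15.90']  # assumed sample data
parsed = [datetime.strptime(t, '%H:%M:%S.%f') for t in timestamp_array]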

Python: datetime64 issues with range

I am trying to build a vector of seconds between two times:
import numpy as np
import pandas as pd
date="2011-01-10"
start=np.datetime64(date+'T09:30:00')
end=np.datetime64(date+'T16:00:00')
range = pd.date_range(start, end, freq='S')
For some reason when I print range I get:
[2011-01-10 17:30:00, ..., 2011-01-11 00:00:00]
So the length is 23401, which is what I want, but this is definitely not the correct time interval. Why is that?
Also, if I have a DataFrame df with a column of datetime64 format that looks like:
Time
15:59:57.887529007
15:59:57.805383290
Once I solve the problem above, will I be able to do the following?
data = df.reindex(df.Time + range)
data = data.ffill()
I need to do the exact steps proposed here except with datetime64 format. Is it possible?
It seems that pandas date_range is dropping the timezone (looks like a bug, I think it's already filed...); you can use Timestamp rather than datetime64 to work around this:
In [11]: start = pd.Timestamp(date+'T09:30:00')
In [12]: end = pd.Timestamp(date+'T16:00:00')
In [13]: pd.date_range(start, end, freq='S')
Out[13]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-10 09:30:00, ..., 2011-01-10 16:00:00]
Length: 23401, Freq: S, Timezone: None
Note: to see that it's a timezone issue, you're in UTC-8, and 16:00 + 8:00 == 00:00 (the next day).
This is because, when you specify the datetime as a string, numpy assumes it is in local time and converts it to UTC.
Specifying the time offset gives the correct interval, though the interval is in UTC:
start=np.datetime64(date+'T09:30:00+0000')
end=np.datetime64(date+'T16:00:00+0000')
range=pd.date_range(start,end,freq='S')
Or use datetime.datetime objects as the start and end; again, the interval here is in UTC:
import datetime
start = datetime.datetime(2011, 1, 10, 9, 30, 0)
end = datetime.datetime(2011, 1, 10, 16, 0, 0)
range=pd.date_range(start,end,freq='S')
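A quick check (a sketch) that either workaround now spans the intended clock interval:
rng = pd.date_range(start, end, freq='S')
print(rng[0], rng[-1], len(rng))
# 2011-01-10 09:30:00 2011-01-10 16:00:00 23401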
