Convert timestamp to datetime in Python

I used pandas read_excel to load some time data from Excel into Python and saved it in the variable times. For example, times[0] is 2020-12-30; in Excel it's just 2020/12/30.
The type of times[0] is now pandas._libs.tslibs.timestamps.Timestamp.
How can I convert it into a datetime? And if possible, can I get its value in nanoseconds?

In pandas, a Timestamp holds both a date and a time by default. To build one, use to_datetime; to interpret an integer input as nanoseconds, set unit='ns':
pd.to_datetime(1490195805433502912, unit='ns')
Output
Timestamp('2017-03-22 15:16:45.433502912')
See the pandas datetime and timedelta reference documentation for more details.
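Since the question starts from a Timestamp returned by read_excel, here is a sketch of the reverse direction: converting a Timestamp to a plain Python datetime, and reading its raw nanosecond value (the sample date is the one from the question):

```python
import pandas as pd

# A Timestamp like the one read_excel returns
ts = pd.Timestamp('2020-12-30')

# Convert to a plain Python datetime
dt = ts.to_pydatetime()

# The raw value: nanoseconds since the Unix epoch
ns = ts.value

print(dt)  # 2020-12-30 00:00:00
print(ns)  # 1609286400000000000
```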

If you have a series of times
>>> times = pd.Series(pd.date_range('20180310', periods=2))
>>> times
0 2018-03-10
1 2018-03-11
dtype: datetime64[ns]
You can convert the Timestamp entries to datetime using the dt.to_pydatetime() method like so:
>>> times.dt.to_pydatetime()
array([datetime.datetime(2018, 3, 10, 0, 0),
       datetime.datetime(2018, 3, 11, 0, 0)], dtype=object)
However, as the documentation notes:
Warning: Python’s datetime uses microsecond resolution, which is lower than pandas (nanosecond). The values are truncated.
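A quick illustration of that truncation, using the nanosecond-precision timestamp from the first answer:

```python
import pandas as pd

# A nanosecond-precision Timestamp
ts = pd.Timestamp('2017-03-22 15:16:45.433502912')

# Converting to a Python datetime keeps only microsecond resolution;
# the trailing nanoseconds (912) are dropped
dt = ts.to_pydatetime(warn=False)
print(dt.microsecond)  # 433502
```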

Related

Date Time Format Unknown [duplicate]

I am trying to figure out a time format used by someone else. In addition to the date, I have time with an example being the following:
1641859200000
I can't seem to figure out what time or datetime format this is. It cannot be HHMMSS, because in this example the seconds value would be 85, which is not possible. Any idea what format this is, and how I can convert it using Python to HH:MM:SS?
Thank you :)
You have a timestamp in milliseconds since the Unix epoch (January 1, 1970, midnight UTC/GMT). To convert it to a datetime, use:
from datetime import datetime
print(datetime.fromtimestamp(1641859200000 / 1000))
# Output
2022-01-11 01:00:00
Note: you have to divide by 1000 because this timestamp is in milliseconds, while fromtimestamp expects seconds. (fromtimestamp also applies your local timezone; the value corresponds to 2022-01-11 00:00:00 UTC.)
This is a Unix-Timestamp.
You can convert it into a human-readable format like this:
from datetime import datetime
timestamp = 1641859200000/1000
dt = datetime.fromtimestamp(timestamp)
print(dt)
Edit: I didn't check the actual timestamp at first; as in the other answer, it has to be divided by 1000.
This is probably a Unix timestamp (https://en.wikipedia.org/wiki/Unix_time). The factor of 1000 stems from a millisecond representation, I think; it depends on where you got the stamp.
You can convert it using:
>>> import datetime
>>> datetime.datetime.fromtimestamp(1641859200000/1000)
datetime.datetime(2022, 1, 11, 1, 0)
Take a look at dateparser https://dateparser.readthedocs.io/.
It will help you figure out what the date and time is based on the range of input date strings:
pip install dateparser
>>> import dateparser
>>> dateparser.parse('1641859200000')
datetime.datetime(2022, 1, 11, 1, 0)
Your timestamp is milliseconds since the Unix epoch in this case, but if you ever run into a similar problem, dateparser could help you.
Regarding the second part of the question, conversion to HH:MM:SS format:
>>> import datetime
>>> dt = datetime.datetime(2022, 1, 11, 1, 0)
>>> dt.strftime("%H:%M:%S")
'01:00:00'
Additional info: Available Format Codes

How to convert strings to TimeStamps for compare?

I have strings like:
first = '2018-09-16 15:00:00'
second = '1900-01-01 09:45:55.500597'
I want to compare them.
All the methods I found, like Convert string date to timestamp in Python, require knowing the format of the string.
I don't know the format of the strings (note the differences between first and second); all I know is that they can be converted to timestamps.
How can I convert them in order to compare them?
Edit:
The "largest" string that I can get is:
1900-01-01 09:45:55.500597
but I can also get:
1900-01-01
1900-01-01 09:45
1900-01-01 09:45:55
etc..
It's always YYYY-MM-DD HH:MM..., truncated at some precision.
You can use pandas.to_datetime. It offers a lot of flexibility in the string timestamp format, and you can use it on single strings or list/series/tuples of timestamps.
>>> import pandas as pd
>>> day = pd.to_datetime('1900-01-01')
>>> minute = pd.to_datetime('1900-01-01 09:45')
>>> second = pd.to_datetime('1900-01-01 09:45:55')
>>> subsecond = pd.to_datetime('1900-01-01 09:45:55.500597')
>>> assert subsecond > second
>>> assert minute < second
>>> assert day < minute
You can use the dateutil module (pip install python-dateutil):
>>> from dateutil.parser import parse
>>> parse('2018-09-16 15:00:00')
datetime.datetime(2018, 9, 16, 15, 0)
>>> parse('1900-01-01 09:45:55.500597')
datetime.datetime(1900, 1, 1, 9, 45, 55, 500597)
From the list of its features:
Generic parsing of dates in almost any string format;
Once you have the datetime objects, you can compare them directly; there's no need to calculate the timestamps.
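For instance, with the two strings from the question, parsed via dateutil as above:

```python
from dateutil.parser import parse

first = parse('2018-09-16 15:00:00')
second = parse('1900-01-01 09:45:55.500597')

# datetime objects compare chronologically
print(first > second)  # True
```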

Pandas datetime64 with longer range

I have a DataFrame with datetime values spanning from year 1 to far into the future. When I try to import the data into pandas, the dtype gets set to object, although I would like it to be datetime64 so I can use the .dt accessor.
Consider this piece of code:
import pytz
from datetime import datetime
import pandas as pd
df = pd.DataFrame({'dates': [datetime(108, 7, 30, 9, 25, 27, tzinfo=pytz.utc),
                             datetime(2018, 3, 20, 9, 25, 27, tzinfo=pytz.utc),
                             datetime(2529, 7, 30, 9, 25, 27, tzinfo=pytz.utc)]})
In [5]: df.dates
Out[5]:
0 0108-07-30 09:25:27+00:00
1 2018-03-20 09:25:27+00:00
2 2529-07-30 09:25:27+00:00
Name: dates, dtype: object
How can I convert it to dtype datetime64[s]? I don't really care about nano/millisecond accuracy, but I would like the range.
Pandas can generally convert to and from datetime.datetime objects:
df.dates = pd.to_datetime(df.dates)
But in your case, you can't do this, for two reasons.
First, while Pandas can convert to and from datetime.datetime, it can't handle tz-aware datetimes, and you've imbued yours with a timezone. Fortunately, this one is easy to fix—you're explicitly using UTC, and you can do that without aware objects.
Second, 64-bit nanoseconds can't handle a date range as wide as you want:
>>> (1<<64) / 1000000000 / 3600 / 24 / 365.2425
584.5540492538555
And the Pandas documentation makes this clear:
Since pandas represents timestamps in nanosecond resolution, the time span that can be represented using a 64-bit integer is limited to approximately 584 years:
In [66]: pd.Timestamp.min
Out[66]: Timestamp('1677-09-21 00:12:43.145225')
In [67]: pd.Timestamp.max
Out[67]: Timestamp('2262-04-11 23:47:16.854775807')
(It looks like they put the 0 point at the Unix epoch, which makes sense.)
But notice that the documentation links to Representing Out-of-Bounds Spans: you can use Periods, which will be less efficient and convenient than int64s, but probably more so than objects. (I believe the internal storage ends up being YYYYMMDD-style strings, but they're stored as fixed-length strings directly in the array, instead of as references to Python objects on the heap.)
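A minimal sketch of that Period route, using the out-of-range dates from the question (Periods are not bound by the nanosecond Timestamp limits):

```python
import pandas as pd

# Periods cover a far wider range than nanosecond-resolution Timestamps,
# so both year 108 and year 2529 are representable
early = pd.Period('0108-07-30 09:25:27', freq='s')
late = pd.Period('2529-07-30 09:25:27', freq='s')

print(early.year, late.year)  # 108 2529
```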

Python Numpy Loadtxt - Convert unix timestamp

I have a text file with many rows of data; the first piece of data in each row is a Unix timestamp such as 1436472000. I am using numpy.loadtxt, and in the converters parameter I want to specify a converter for column 0 that turns the timestamp into whatever numpy understands as a datetime. I know this needs to go after the 0: in the curly brackets, but I can't work out how to write the conversion. I know matplotlib.dates.strpdate2num provides a converter for ordinary date strings, but that won't work for Unix timestamps.
Code:
timestamp, closep, highp, lowp, openp, volume = np.loadtxt(fileName,delimiter=",",unpack=True,converters={ 0: })
Thanks for help in advance, please ask if you would like me to clarify what I mean.
While converters can be convenient, they are slow because they are called once for each row of data. It is faster to convert the data after the timestamps are loaded into a NumPy array of integers:
x = np.array([1436472000, 1436472001])
x = np.asarray(x, dtype='datetime64[s]')
yields an array of NumPy datetime64s:
array(['2015-07-09T16:00:00-0400', '2015-07-09T16:00:01-0400'],
      dtype='datetime64[s]')
To obtain Python datetime.datetimes use tolist():
>>> x.tolist()
# [datetime.datetime(2015, 7, 9, 20, 0),
# datetime.datetime(2015, 7, 9, 20, 0, 1)]
As you know, matplotlib datenums count the number of days since 0001-01-01
00:00:00 UTC, plus one. These are not timestamps (which count seconds since the
Epoch, 1970-01-01 00:00:00 UTC):
>>> matplotlib.dates.date2num(x.tolist())
# array([ 735788.83333333, 735788.83334491])
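Putting the loadtxt step and the after-the-fact conversion together, as a sketch with two made-up CSV rows (the column names follow the question):

```python
import io
import numpy as np

# Hypothetical sample data in the question's format:
# timestamp, close, high, low, open, volume
text = io.StringIO(
    "1436472000,10.0,11.0,9.0,9.5,100\n"
    "1436472060,10.2,11.1,9.1,10.0,200\n"
)

# Load everything as floats first -- no per-row converter needed
timestamp, closep, highp, lowp, openp, volume = np.loadtxt(
    text, delimiter=",", unpack=True)

# Then convert the whole timestamp column at once
dates = timestamp.astype('datetime64[s]')
print(dates[0])  # 2015-07-09T20:00:00 (i.e. 16:00 UTC-4)
```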

Python: datetime64 issues with range

I am trying to have a vector of seconds between two time intervals:
import numpy as np
import pandas as pd
date="2011-01-10"
start=np.datetime64(date+'T09:30:00')
end=np.datetime64(date+'T16:00:00')
range = pd.date_range(start, end, freq='S')
For some reason when I print range I get:
[2011-01-10 17:30:00, ..., 2011-01-11 00:00:00]
So the length is 23401 which is what I want but definitely not the correct time interval. Why is that?
Also, if I have a DataFrame df with a column of datetime64 format that looks like:
Time
15:59:57.887529007
15:59:57.805383290
Once I solved the problem above, will I be able to do the following:
data = df.reindex(df.Time + range)
data = data.ffill()
I need to do the exact steps proposed here except with datetime64 format. Is it possible?
It seems that pandas date_range is dropping the timezone (looks like a bug, I think it's already filed...), you can use Timestamp rather than datetime64 to workaround this:
In [11]: start = pd.Timestamp(date+'T09:30:00')
In [12]: end = pd.Timestamp(date+'T16:00:00')
In [13]: pd.date_range(start, end, freq='S')
Out[13]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-10 09:30:00, ..., 2011-01-10 16:00:00]
Length: 23401, Freq: S, Timezone: None
Note: to see that it's a timezone effect: you're in UTC-8, and 16:00 + 8:00 == 00:00 (the next day).
It is because when you specify the datetime as a string, numpy assumes it is in local time and converts it to UTC.
Specifying the time offset gives the correct interval, though the interval is in UTC:
start=np.datetime64(date+'T09:30:00+0000')
end=np.datetime64(date+'T16:00:00+0000')
range=pd.date_range(start,end,freq='S')
Or use a datetime.datetime object as the start and end; again, the interval here is in UTC:
import datetime
start = datetime.datetime(2011, 1, 10, 9, 30, 0)
end = datetime.datetime(2011, 1, 10, 16, 0, 0)
range=pd.date_range(start,end,freq='S')
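The datetime.datetime variant checked end to end (using freq='s', the current lowercase spelling of the seconds alias, and avoiding shadowing the built-in range):

```python
import datetime
import pandas as pd

start = datetime.datetime(2011, 1, 10, 9, 30, 0)
end = datetime.datetime(2011, 1, 10, 16, 0, 0)
rng = pd.date_range(start, end, freq='s')

# 6.5 hours of seconds, endpoints included
print(len(rng))         # 23401
print(rng[0], rng[-1])  # 2011-01-10 09:30:00 2011-01-10 16:00:00
```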
