Pandas to_datetime unexpected UTC return - python

I am trying to set a time index on a pandas DataFrame using pandas.to_datetime, but when converting from seconds the resulting datetime comes out in UTC, even though I did not request that:
import pandas
import datetime,time
datetime1 = '2017-03-30T12-00-00'
d = datetime.datetime.strptime(datetime1, "%Y-%m-%dT%H-%M-%S")
s = time.mktime(d.timetuple())
print(pandas.to_datetime(datetime1, format="%Y-%m-%dT%H-%M-%S"))
print(pandas.to_datetime(s, unit='s'))
I get two different results, although the utc option of pandas.to_datetime is not used in either case.
Any ideas?

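time.mktime() interprets its argument as local time, not UTC, so s is the epoch value of 12:00 local time, and pandas.to_datetime(s, unit='s') then shows that instant in UTC. See the docs for time.mktime():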
Its argument is the struct_time or full 9-tuple [...] which expresses the time in local time, not UTC.
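If the goal is for both conversions to agree, one option (a sketch, not the only possible fix) is to compute the epoch seconds with calendar.timegm from the standard library, which interprets the struct_time as UTC rather than local time:

```python
import calendar
import datetime

import pandas as pd

datetime1 = '2017-03-30T12-00-00'
d = datetime.datetime.strptime(datetime1, "%Y-%m-%dT%H-%M-%S")

# calendar.timegm() is the inverse of time.gmtime(): it treats the
# struct_time as UTC, whereas time.mktime() treats it as local time.
s = calendar.timegm(d.timetuple())

from_string = pd.to_datetime(datetime1, format="%Y-%m-%dT%H-%M-%S")
from_seconds = pd.to_datetime(s, unit='s')
print(from_string)   # 2017-03-30 12:00:00
print(from_seconds)  # 2017-03-30 12:00:00
```

Because both sides now use UTC, the result no longer depends on the machine's time zone.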

Related

Python datetime to Excel serial date conversion

The following code converts a string into a timestamp. The timestamp comes out to: 1646810127.
However, if I use Excel to convert this date and time into a float I get: 44629,34.
I need the Excel's output from the Python script.
I have tried with a few different datetime strings to see if there is any pattern in between the two numbers, but cannot seem to find any.
Any thoughts on how I get the code to output 44629,34?
Much appreciated
import datetime
date_time_str = '2022-03-09 08:15:27'
date_time_obj = datetime.datetime.strptime(date_time_str, '%Y-%m-%d %H:%M:%S')
print('Date:', date_time_obj.date())
print('Time:', date_time_obj.time())
print('Date-time:', date_time_obj)
print(date_time_obj.timestamp())
>>output:
Date: 2022-03-09
Time: 08:15:27
Date-time: 2022-03-09 08:15:27
1646810127.0
Calculate the timedelta of your datetime object relative to Excel's "day zero", then divide the total_seconds of the timedelta by the number of seconds in a day to get the Excel serial date:
import datetime
date_time_str = '2022-03-09 08:15:27'
UTC = datetime.timezone.utc
dt_obj = datetime.datetime.fromisoformat(date_time_str).replace(tzinfo=UTC)
day_zero = datetime.datetime(1899,12,30, tzinfo=UTC)
excel_serial_date = (dt_obj-day_zero).total_seconds()/86400
print(excel_serial_date)
# 44629.3440625
Note: I'm setting time zone to UTC here to avoid any ambiguities - adjust as needed.
Since the question is tagged pandas, you'd do the same thing here, except that you don't need to set UTC: pandas treats naive datetimes like UTC and applies no local-time conversion, so subtracting two naive Timestamps is unambiguous:
import pandas as pd
ts = pd.Timestamp('2022-03-09 08:15:27')
excel_serial_date = (ts-pd.Timestamp('1899-12-30')).total_seconds()/86400
print(excel_serial_date)
# 44629.3440625
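The same idea also works on a whole Series at once. The sketch below (the example dates are made up) divides a Series of timedeltas by one day to get float serial dates, and reverses the operation with pd.to_timedelta; the round-trip is rounded to the second because the float serial date cannot represent every nanosecond exactly:

```python
import pandas as pd

day_zero = pd.Timestamp('1899-12-30')  # Excel's day zero (1900 date system)

s = pd.to_datetime(pd.Series(['2022-03-09 08:15:27', '2022-03-10 00:00:00']))

# Series of Timedeltas divided by one day -> Series of float serial dates
serial = (s - day_zero) / pd.Timedelta(days=1)
print(serial.tolist())

# Inverse: float days back to Timestamps; rounding to the second
# absorbs the floating-point error of the float representation.
back = (day_zero + pd.to_timedelta(serial, unit='D')).dt.round('s')
print(back.tolist())
```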
See also:
background: What is story behind December 30, 1899 as base date?
inverse operation: Convert Excel style date with pandas

Why does a conversion from np.datetime64 to float and back lead to a time difference?

With the following code, I get a two hour difference after converting back to np.datetime64.
How can I avoid this? (In case it matters: I am currently in Central Europe.)
import pandas as pd
import numpy as np
import datetime
a = np.datetime64('2018-04-01T15:30:00').astype("float")
a
b = np.datetime64(datetime.datetime.fromtimestamp(a))
b
Out[18]: numpy.datetime64('2018-04-01T17:30:00.000000')
The problem is not in the np.datetime64 conversion, but in datetime.datetime.fromtimestamp.
Since Numpy 1.11, np.datetime64 is timezone naive. It no longer assumes that input is in local time, nor does it print local times.
However, datetime.datetime.fromtimestamp does assume local time. From the docs:
Return the local date and time corresponding to the POSIX timestamp, such as is returned by time.time(). If optional argument tz is None or not specified, the timestamp is converted to the platform’s local date and time, and the returned datetime object is naive.
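You can use datetime.datetime.utcfromtimestamp instead. Note that utcfromtimestamp is deprecated as of Python 3.12; the recommended replacement is datetime.datetime.fromtimestamp(a, tz=datetime.timezone.utc), which returns an aware datetime: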
>>> a = np.datetime64('2018-04-01T15:30:00').astype("float")
>>> np.datetime64(datetime.datetime.utcfromtimestamp(a))
numpy.datetime64('2018-04-01T15:30:00.000000')
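A sketch of two ways to get the same result without utcfromtimestamp: interpret the float explicitly as UTC and drop the tzinfo before handing it to NumPy (datetime64 is timezone naive), or skip the datetime module and rebuild the datetime64 from the integer seconds, assuming the float was produced from a datetime64 with unit 's' as in the question:

```python
import datetime

import numpy as np

a = np.datetime64('2018-04-01T15:30:00').astype("float")  # seconds since the epoch

# Option 1: interpret the POSIX timestamp explicitly as UTC, then drop
# tzinfo so that NumPy receives a naive datetime.
dt_utc = datetime.datetime.fromtimestamp(a, tz=datetime.timezone.utc)
b = np.datetime64(dt_utc.replace(tzinfo=None))

# Option 2: rebuild the datetime64 directly, using the same unit ('s')
# that the float was produced from.
c = np.datetime64(int(a), 's')

print(b)  # 2018-04-01T15:30:00.000000
print(c)  # 2018-04-01T15:30:00
```

Neither option consults the local time zone, so the result is the same in Central Europe as anywhere else.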
https://github.com/numpy/numpy/issues/3290
As of 1.7, datetime64 attempts to handle timezones by:
Assuming all datetime64 objects are in UTC
Applying timezone offsets when parsing ISO 8601 strings
Applying the Locale timezone offset when the ISO string does not specify a TZ.
Applying the Locale timezone offset when printing, etc.
https://stackoverflow.com/a/18817656/7583612
classmethod datetime.fromtimestamp(timestamp, tz=None)
Return the local date and time corresponding to the POSIX timestamp, such as is returned by time.time(). If optional argument tz is None or not specified, the timestamp is converted to the platform’s local date and time, and the returned datetime object is naive. Else tz must be an instance of a class tzinfo subclass, and the timestamp is converted to tz’s time zone. In this case the result is equivalent to tz.fromutc(datetime.utcfromtimestamp(timestamp).replace(tzinfo=tz))
Referring back to some of my notes, I found the following:
import datetime
import numpy as np
dt64 = np.datetime64("2011-11-11 14:23:56")
# dt64 is internally a 64-bit integer count of time units (here: seconds) since 1970-01-01
# it has no year/month/day fields, and comparatively little support in numpy
dtdt = dt64.astype(datetime.datetime)  # <<<<<<<< use this!
dtdt.year
dtdt.month
dtdt.day
# to convert back:
dt64 = np.datetime64(dtdt)  # <<<<<<<< use this too!
dt64.item().strftime("%Y%b%d")  # '2011Nov11'
The modules datetime and time are normal python modules: they work reasonably well, have lots of fields, conversions, and support.
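datetime64 is a NumPy scalar type with far fewer conveniences than datetime.datetime: it stores a 64-bit integer count of time units (set by its unit, e.g. 's' or 'us') since 1970-01-01. datetime64 is something completely different from a datetime.datetime. If you convert a datetime64 to a float and back, you can lose precision (float64 has only a 53-bit mantissa), which causes errors for fine-grained units; the two-hour shift in the question, however, comes from the local-time conversion in fromtimestamp.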
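A small sketch of where float precision actually bites, assuming nanosecond resolution (the example value is made up): near 1.5e18 ns, adjacent float64 values are 256 ns apart, so the low-order nanoseconds do not survive a round-trip through float:

```python
import numpy as np

# Nanosecond-resolution datetime64; its int64 value is about 1.5e18.
t = np.datetime64('2018-04-01T15:30:00.123456789', 'ns')

# float64 has a 53-bit mantissa, so at this magnitude adjacent floats
# are 256 ns apart and the last few nanoseconds get rounded away.
f = t.astype('float64')
back = np.datetime64(int(f), 'ns')

print(t - back)  # a small, nonzero number of nanoseconds
```

At second resolution (as in the question) the values are small enough to be exact in float64, which is why the two-hour difference there cannot be a precision problem.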
The standard-library datetime module (not part of NumPy) can also do things like:
import datetime
# timedelta()
delta = datetime.timedelta(days=11, hours=10, minutes=9, seconds=8)
delta  # datetime.timedelta(days=11, seconds=36548)  (days, seconds)
delta.days
delta.seconds
delta.microseconds
delta.total_seconds() # 986948.0
# arithmetic: +-*/
# 2 timedelta's
# timedelta and datetime
now = datetime.datetime.now()
christmas = datetime.datetime(2019,12,25)
delta = christmas - now
So let NumPy store your date data as datetime64 where that is convenient, but I would recommend the standard-library datetime module for datetime arithmetic.

Converting to_datetime but keeping original time

I am trying to convert a string to a datetime, but the conversion adds 5 hours to the original time. How do I convert it but keep the time as is?
>>> import pandas as pd
>>> t = pd.to_datetime("2016-09-21 08:56:29-05:00", format='%Y-%m-%d %H:%M:%S')
>>> t
Timestamp('2016-09-21 13:56:29')
The conversion doesn't add 5 hours to the original time. Pandas just detects that your datetime is timezone-aware and converts it to naive UTC. But it's still the same datetime.
If you want a localized Timestamp instance, use Timestamp.tz_localize() to make t a timezone-aware UTC timestamp, and then use the Timestamp.tz_convert() method to convert to UTC-0500:
>>> import pandas as pd
>>> import pytz
>>> t = pd.to_datetime("2016-09-21 08:56:29-05:00", format='%Y-%m-%d %H:%M:%S')
>>> t
Timestamp('2016-09-21 13:56:29')
>>> t.tz_localize(pytz.utc).tz_convert(pytz.timezone('America/Chicago'))
Timestamp('2016-09-21 08:56:29-0500', tz='America/Chicago')
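Depending on the pandas version, to_datetime may handle the "-05:00" offset differently from the output shown above (for example, returning a timezone-aware Timestamp instead of naive UTC). A sketch that does not rely on that version-specific behaviour asks for UTC explicitly with utc=True, then converts to the zone used in the answer above (America/Chicago) and, if wanted, drops the zone to keep the original wall-clock time:

```python
import pandas as pd

raw = "2016-09-21 08:56:29-05:00"

# utc=True converts the offset-aware string to a tz-aware UTC Timestamp.
t_utc = pd.to_datetime(raw, utc=True)
print(t_utc)  # 2016-09-21 13:56:29+00:00

# Convert back to a zone that is at UTC-05:00 on that date.
t_local = t_utc.tz_convert('America/Chicago')
print(t_local)  # 2016-09-21 08:56:29-05:00

# For a naive value with the original wall-clock time, drop the zone.
naive = t_local.tz_localize(None)
print(naive)  # 2016-09-21 08:56:29
```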
To get what you want, you can remove the "-05:00" from the end of your time string "2016-09-21 08:56:29-05:00".
However, Erik Cederstrand is correct that pandas is not modifying the time; it is simply expressing the same instant in UTC.

Python - How to convert datetime data using toordinal considering the time

Let's assume that I have the following data:
25/01/2000 05:50
When I convert it using datetime.toordinal, it returns this value:
730144
That's nice, but this value just considers the date itself. I also want it to consider the hour and minutes (05:50). How can I do it using datetime?
EDIT:
I want to convert a whole Pandas Series.
An ordinal date by definition only considers the year and day of year, i.e. its resolution is 1 day.
You can get the seconds since the epoch (as a float) using
datetime.datetime.strptime('25/01/2000 05:50', '%d/%m/%Y %H:%M').timestamp()
For a pandas Series you can do:
import pandas as pd
s = pd.Series(['25/01/2000 05:50', '25/01/2000 05:50', '25/01/2000 05:50'])
s = pd.to_datetime(s, format='%d/%m/%Y %H:%M')  # make sure you're dealing with datetime instances
s.apply(lambda v: v.timestamp())
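If what you actually want is an ordinal that also reflects the time of day, one option is a fractional ordinal: toordinal() plus the elapsed fraction of the day. This is a sketch under that assumption; fractional_ordinal is a hypothetical helper, not a datetime or pandas API:

```python
import datetime

import pandas as pd


def fractional_ordinal(value):
    """Hypothetical helper: toordinal() plus the elapsed fraction of the day."""
    midnight = datetime.datetime.combine(value.date(), datetime.time())
    return value.toordinal() + (value - midnight).total_seconds() / 86400


d = datetime.datetime.strptime('25/01/2000 05:50', '%d/%m/%Y %H:%M')
print(fractional_ordinal(d))

# Vectorized version for a pandas Series
s = pd.to_datetime(pd.Series(['25/01/2000 05:50', '26/01/2000 12:00']),
                   format='%d/%m/%Y %H:%M')
# integer ordinal of the date + (time since midnight) / (one day)
ordinals = s.apply(lambda v: v.toordinal()) + (s - s.dt.normalize()) / pd.Timedelta(days=1)
print(ordinals.tolist())
```

The integer part is still the usual ordinal (730144 for 25/01/2000), so existing code that truncates the value keeps working.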
If you use Python 3.x, you can get the date and time as seconds since 1970-01-01 00:00 UTC:
from datetime import datetime
dt = datetime.today() # Get timezone naive now
seconds = dt.timestamp()
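One caveat when mixing the two approaches above: datetime.timestamp() treats a naive datetime as local time, while pandas treats a naive Timestamp as UTC, so the two can differ by your UTC offset. A small sketch comparing them, using the date from the question:

```python
import datetime

import pandas as pd

naive = datetime.datetime(2000, 1, 25, 5, 50)

# datetime.timestamp() treats a naive datetime as *local* time,
# so this value depends on the machine's time zone.
local_based = naive.timestamp()

# An explicitly UTC-aware datetime gives a machine-independent result.
utc_based = naive.replace(tzinfo=datetime.timezone.utc).timestamp()

# pandas treats a naive Timestamp as UTC.
pandas_based = pd.Timestamp(naive).timestamp()

print(local_based, utc_based, pandas_based)
```

Attaching an explicit tzinfo is the simplest way to make both libraries agree regardless of where the code runs.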

How to set a variable to be "Today's" date in Python/Pandas

I am trying to set a variable to equal today's date.
I looked this up and found a related article:
Set today date as default value in the model
However, this didn't particularly answer my question.
I used the suggested:
dt.date.today
But after
import datetime as dt
date = dt.date.today
print(date)
<built-in method today of type object at 0x000000001E2658B0>
Df['Date'] = date
I didn't get what I actually wanted, which was a clean format of today's date in Month/Day/Year.
How can I create a variable of today's day in order for me to input that variable in a DataFrame?
You mention you are using Pandas (in your title). If so, there is no need to use another library; you can just use to_datetime:
>>> pandas.to_datetime('today').normalize()
Timestamp('2015-10-14 00:00:00')
This will always return today's date at midnight, irrespective of the actual time, and can be used directly in pandas for comparisons etc. Pandas Timestamps always carry a time component, so a plain date is represented with 00:00:00.
Replacing today with now would give you the date in UTC instead of local time; note that in neither case is the tzinfo (timezone) added.
In pandas versions prior to 0.23.x, normalize may not have been necessary to remove the non-midnight timestamp.
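To put today's date into a DataFrame column, as the question asks, the scalar can be assigned directly and pandas broadcasts it to every row. A minimal sketch, where df is a made-up example frame standing in for the question's Df:

```python
import pandas as pd

df = pd.DataFrame({'value': [1, 2, 3]})  # hypothetical example frame

# The scalar is broadcast to every row; normalize() drops the time of day.
df['Date'] = pd.Timestamp('today').normalize()

# If a Month/Day/Year string column is preferred instead of datetime64:
df['DateStr'] = df['Date'].dt.strftime('%m/%d/%Y')
print(df)
```

Keeping the datetime64 column makes comparisons and date arithmetic easy; the string column is only for display.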
If you want a string mm/dd/yyyy instead of the datetime object, you can use strftime (string format time):
>>> dt.datetime.today().strftime("%m/%d/%Y")
# ^ note parentheses
'02/12/2014'
Using pandas: pd.Timestamp("today").strftime("%m/%d/%Y")
pd.Timestamp.now().strftime("%d/%m/%Y")
This will give output such as '11/02/2019'.
You can add the time if you want:
pd.Timestamp.now().strftime("%d/%m/%Y %I:%M:%S")
This will give output such as '11/02/2019 11:08:26'.
strftime formats
You can also look into pandas.Timestamp, which includes methods like .now and .today.
Unlike pandas.to_datetime('now'), pandas.Timestamp.now() won't default to UTC:
import pandas as pd
pd.Timestamp.now() # will return California time
# Timestamp('2018-12-19 09:17:07.693648')
pd.to_datetime('now') # will return UTC time
# Timestamp('2018-12-19 17:17:08')
I had the same problem and tried many things,
but in the end this was the solution:
import time
print (time.strftime("%d/%m/%Y"))
Simply use pd.Timestamp.now().
For example:
input: pd.Timestamp.now()
output: Timestamp('2022-01-12 14:43:05.521896')
If all you want is Timestamp('2022-01-12'), without anything after the date,
you can use replace to zero out the hour, minute, second and microsecond:
input: pd.Timestamp.now().replace(hour=0, minute=0, second=0, microsecond=0)
output: Timestamp('2022-01-12 00:00:00')
But that looks complicated; a simpler way is to use normalize:
input: pd.Timestamp.now().normalize()
output: Timestamp('2022-01-12 00:00:00')
Easy solution in Python3+:
import time
todaysdate = time.strftime("%d/%m/%Y")
# with '.' instead of '/'
todaysdate = time.strftime("%d.%m.%Y")
import datetime
import pandas as pd

def today_date():
    '''
    utils:
    get the datetime of today
    '''
    date = datetime.datetime.now().date()
    date = pd.to_datetime(date)
    return date

Df['Date'] = today_date()
This can be safely used in pandas DataFrames.
There are already quite a few good answers, but to answer the more general question about "any" period:
Use pandas' time period functionality (to_period). For day, use 'D'; for month, 'M'; etc.:
>pd.Timestamp.now().to_period('D')
Period('2021-03-26', 'D')
>p = pd.Timestamp.now().to_period('D')
>p.to_timestamp().strftime("%Y-%m-%d")
'2021-03-26'
note: If you need to consider UTC, you can use: pd.Timestamp.utcnow().tz_localize(None).to_period('D')...
Building on your own attempt, call today() with parentheses and convert the result:
import datetime as dt
import pandas as pd
date = dt.date.today()
pd.to_datetime(date)
