Difference between pandas datetime and datetime datetime - python

I have some dates in datetime.datetime format that I use to filter a pandas DataFrame that holds pandas Timestamps. I just tried the following and got a 2-hour offset:
from datetime import datetime
import pandas as pd
pd.to_datetime(datetime(2020, 5, 11, 0, 0, 0).timestamp()*1e9)
The output is:
->Timestamp('2020-05-10 22:00:00')
Can anybody explain why this gives a 2-hour offset? I am in Denmark, so it corresponds to the offset from GMT. Is this the reason? I can of course just add 2 hours, but I want to understand why, to make the script robust in the future.
Thanks for your help Jesper

pd.to_datetime accepts a datetime object so you could just do (pandas assumes UTC):
pd.to_datetime(datetime(2020, 5, 11))
You are getting a 2-hour offset when converting to a timestamp because, by default, Python's datetime is unaware of time zones and gives you a "naive" datetime object (docs are here: https://docs.python.org/3/library/datetime.html#aware-and-naive-objects). Calling .timestamp() on a naive datetime interprets it as local time, so the resulting POSIX timestamp marks midnight in your local zone, which is 22:00 UTC on the previous day. pandas then displays that value as UTC, hence the 2-hour offset.
You can pass in a tzinfo parameter to the datetime object specifying that the time should be treated as UTC:
from datetime import datetime
import pandas as pd
import pytz
pd.to_datetime(datetime(2020, 5, 11, 0, 0, 0, tzinfo=pytz.UTC).timestamp()*1e9)
Alternatively, you can generate a UTC timestamp using the calendar module:
from datetime import datetime
import pandas as pd
import calendar
timestamp = calendar.timegm(datetime(2020, 5, 11, 0, 0, 0).utctimetuple())
pd.to_datetime(timestamp*1e9)
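The two-hour shift can be reproduced on any machine by making the time zone explicit. The sketch below uses only the standard library and assumes a fixed UTC+2 offset (named `cest` here purely for illustration) as a stand-in for Danish summer time. A naive datetime's `.timestamp()` behaves like the `local_midnight` case when the system clock is set to such a zone.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+2 offset stands in for the asker's local zone (CEST).
cest = timezone(timedelta(hours=2))

local_midnight = datetime(2020, 5, 11, tzinfo=cest)        # midnight, local time
utc_midnight = datetime(2020, 5, 11, tzinfo=timezone.utc)  # midnight, UTC

# POSIX timestamps count seconds since 1970-01-01 00:00 UTC, so local
# midnight at UTC+2 is reached two hours before UTC midnight.
offset_seconds = utc_midnight.timestamp() - local_midnight.timestamp()
print(offset_seconds)  # 7200.0

# Reading the local-midnight timestamp back as UTC gives the shifted value
# that pandas displayed in the question.
as_utc = datetime.fromtimestamp(local_midnight.timestamp(), tz=timezone.utc)
print(as_utc.isoformat())  # 2020-05-10T22:00:00+00:00
```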

If your datetime objects actually represent local time (i.e. your OS setting), you can simply use:
from datetime import datetime
import pandas as pd
t = pd.to_datetime(datetime(2020, 5, 11).astimezone())
# e.g. I'm on CEST, so t is
# Timestamp('2020-05-11 00:00:00+0200', tz='Mitteleuropäische Sommerzeit')
see: How do I get a value of datetime.today() in Python that is “timezone aware”?
Just keep in mind that pandas will treat naive Python datetime objects as if they were UTC:
from datetime import timezone
t1 = pd.to_datetime(datetime(2020, 5, 11, tzinfo=timezone.utc))
t2 = pd.to_datetime(datetime(2020, 5, 11))
t1.timestamp() == t2.timestamp()
# True
see also: Python datetime and pandas give different timestamps for the same date

Related

Strange behavior with pandas timestamp to posix conversion

I do the following operations:
Convert string datetime in pandas dataframe to python datetime via apply(strptime)
Convert datetime to posix timestamp via .timestamp() method
If I convert the POSIX timestamp back to a datetime with .fromtimestamp(), I obtain a different datetime
It differs by 3 hours, which is my timezone offset (I'm at UTC+3 now), so I suppose it is some kind of timezone issue. I also understand that apply implicitly converts to pandas.Timestamp, but I don't understand what difference that makes in this case.
What is the reason for this strange behavior, and what should I do to avoid it? In my project I need to compare these pandas timestamps with correct POSIX timestamps, and at the moment the comparison gives wrong results.
Below is dummy reproducible example:
df = pd.DataFrame(['2018-03-03 14:30:00'], columns=['c'])
df['c'] = df['c'].apply(lambda x: datetime.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
dt = df['c'].iloc[0]
dt
>> Timestamp('2018-03-03 14:30:00')
datetime.datetime.fromtimestamp(dt.timestamp())
>> datetime.datetime(2018, 3, 3, 17, 30)
First, I suggest using the np.datetime64 dtype when working with pandas. In this case it makes the round trip simple.
pd.to_datetime('2018-03-03 14:30:00').value
#1520087400000000000
pd.to_datetime(pd.to_datetime('2018-03-03 14:30:00').value)
#Timestamp('2018-03-03 14:30:00')
The issue with the other methods is that POSIX timestamps use UTC as the origin, but fromtimestamp returns local time. If your system's time zone isn't UTC, the two disagree. The following methods remedy this:
from datetime import datetime
import pytz
dt
#Timestamp('2018-03-03 14:30:00')
# Seemingly problematic:
datetime.fromtimestamp(dt.timestamp())
#datetime.datetime(2018, 3, 3, 9, 30)
datetime.fromtimestamp(dt.timestamp(), tz=pytz.utc)
#datetime.datetime(2018, 3, 3, 14, 30, tzinfo=<UTC>)
datetime.combine(dt.date(), dt.timetz())
#datetime.datetime(2018, 3, 3, 14, 30)
mytz = pytz.timezone('US/Eastern') # Use your own local timezone
datetime.fromtimestamp(mytz.localize(dt).timestamp())
#datetime.datetime(2018, 3, 3, 14, 30)
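For naive datetimes that are meant as UTC (which is how pandas reads them), a round trip through POSIX seconds can be kept independent of the machine's time zone. This is a minimal standard-library sketch; the helper names `naive_utc_to_posix` and `posix_to_naive_utc` are made up for illustration.

```python
from datetime import datetime, timezone

def naive_utc_to_posix(dt: datetime) -> float:
    """Interpret a naive datetime as UTC and return POSIX seconds."""
    return dt.replace(tzinfo=timezone.utc).timestamp()

def posix_to_naive_utc(ts: float) -> datetime:
    """Convert POSIX seconds to a naive datetime expressed in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).replace(tzinfo=None)

original = datetime(2018, 3, 3, 14, 30)
ts = naive_utc_to_posix(original)
print(ts)                      # 1520087400.0
print(posix_to_naive_utc(ts))  # 2018-03-03 14:30:00
```

The value matches the seconds part of the `.value` output shown earlier in this answer (1520087400000000000 nanoseconds).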
An answer with the to_datetime function:
df = pd.DataFrame(['2018-03-03 14:30:00'], columns=['c'])
df['c'] = pd.to_datetime(df['c'].values, dayfirst=False).tz_localize('Your/Timezone')
When working with dates, you should always attach a timezone; it makes them easier to work with afterwards.
This does not explain the difference between a datetime in pandas and a plain Python datetime, though.

Conversion between Pandas Timestamp, DateTime, Unix Timestamp with Timezone / without Timezone Info

I created the following test case to understand the conversions from DateTime to Pandas Timestamp, to Unix Timestamp, and back to DateTime, with and without TimeZone info, using Python 3.6:
from unittest import TestCase
import datetime
from datetime import timezone
from pytz import timezone
import time
import pandas as pd
def test_date_with_timestamp_method(self):
    hkzone = timezone('Hongkong')
    dt_with_tz = datetime.datetime(2017, 9, 24, tzinfo=hkzone)
    dt_without_tz = datetime.datetime(2017, 9, 24)
    uts_with = dt_with_tz.timestamp()
    uts_without = dt_without_tz.timestamp()
    self.assertNotEqual(uts_without, uts_with)
    pd_with = pd.Timestamp(dt_with_tz)
    pd_without = pd.Timestamp(dt_without_tz)
    pd_unix_with_tz = pd_with.value // 10 ** 9
    pd_unix_without_tz = pd_without.value // 10 ** 9
    self.assertEqual(uts_with, pd_unix_with_tz)
    self.assertEqual(uts_without, pd_unix_without_tz)
I would like to ask why this assertion failed? The result of this is
AssertionError: 1506182400.0 != 1506211200
# convert back to datetime
pd_dt_with_tz = pd_with.to_pydatetime()
pd_dt_without_tz = pd_without.to_pydatetime()
self.assertEqual(pd_dt_with_tz, dt_with_tz)
self.assertEqual(pd_dt_without_tz, dt_without_tz)
And this line
self.assertEqual(pd_dt_without_tz, dt_without_tz)
will result in this error.
AssertionError: datet[16 chars]7, 9, 24, 0, 46, tzinfo=) != datet[16 chars]7, 9, 24, 0, 0, tzinfo=)
So can I say it is best practice to always put the timezone info back into the Datetime object before converting it to a Timestamp?
Is it possible to make these two assertions succeed without timezone info?

pandas to_datetime vs datetime fromtimestamp

I have the following code, which suggests that fromtimestamp and to_datetime work differently:
import pandas as pd
from datetime import datetime
pd.to_datetime(1488286965000,unit='ms')
>>> Timestamp('2017-02-28 13:02:45')
datetime.fromtimestamp(1488286965000/1e3)
>>> datetime.datetime(2017, 3, 1, 0, 2, 45)
When I go to https://www.epochconverter.com/ it seems that the pandas version is correct.
Is there a reason that the datetime version is giving a different answer?
Also is this possibly a bug?
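A likely explanation, in line with the answers to the questions above: `pd.to_datetime(..., unit='ms')` reports the instant in UTC, while `datetime.fromtimestamp` without a `tz` argument converts it to the machine's local time. The output in the question is consistent with a UTC+11 zone, which is an inference and not something the asker states. A small standard-library sketch that asks for UTC explicitly:

```python
from datetime import datetime, timezone

millis = 1488286965000

# Passing tz makes the result independent of the system time zone.
utc_dt = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
print(utc_dt)  # 2017-02-28 13:02:45+00:00

# Dropping tzinfo gives a naive value comparable to the pandas output.
naive_utc = utc_dt.replace(tzinfo=None)
print(naive_utc)  # 2017-02-28 13:02:45
```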

Getting today's date in YYYY-MM-DD in Python?

Is there a nicer way than the following to return today's date in the YYYY-MM-DD format?
str(datetime.datetime.today()).split()[0]
Use strftime:
>>> from datetime import datetime
>>> datetime.today().strftime('%Y-%m-%d')
'2021-01-26'
To also include a zero-padded Hour:Minute:Second at the end:
>>> datetime.today().strftime('%Y-%m-%d %H:%M:%S')
'2021-01-26 16:50:03'
To get the UTC date and time:
>>> datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S')
'2021-01-27 00:50:03'
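Note that `datetime.utcnow()` returns a naive object and is deprecated as of Python 3.12. A sketch of the timezone-aware alternative, `datetime.now(timezone.utc)`, which formats the same way:

```python
from datetime import datetime, timezone

now_utc = datetime.now(timezone.utc)  # aware datetime in UTC
stamp = now_utc.strftime('%Y-%m-%d %H:%M:%S')
print(stamp)
```

Because the object carries its UTC offset, it can later be converted to another zone with `astimezone()` without guessing what the clock time meant.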
You can use datetime.date.today() and convert the resulting datetime.date object to a string:
from datetime import date
today = str(date.today())
print(today) # '2017-12-26'
I always use the isoformat() method for this.
from datetime import date
today = date.today().isoformat()
print(today) # '2018-12-05'
Note that this also works on datetime objects if you need the time in the standard ISO 8601 format as well.
from datetime import datetime
now = datetime.today().isoformat()
print(now) # '2018-12-05T11:15:55.126382'
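As a small extension of the above, sketched with a fixed date so the output is reproducible: `isoformat()` on `datetime` objects accepts a `timespec` argument to trim the fractional seconds, and `date.fromisoformat()` (Python 3.7+) parses the string back into a date.

```python
from datetime import date, datetime

d = date(2018, 12, 5)
text = d.isoformat()
print(text)                  # 2018-12-05

# Round trip: parse the ISO string back into a date object.
parsed = date.fromisoformat(text)
print(parsed == d)           # True

dt = datetime(2018, 12, 5, 11, 15, 55, 126382)
trimmed = dt.isoformat(timespec='seconds')
print(trimmed)               # 2018-12-05T11:15:55
```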
Very late answer, but you can simply use:
import time
today = time.strftime("%Y-%m-%d")
# 2023-02-08
Datetime is just lovely if you like remembering funny codes. Wouldn't you prefer simplicity?
>>> import arrow
>>> arrow.now().format('YYYY-MM-DD')
'2017-02-17'
This module is clever enough to understand what you mean.
Just do pip install arrow.
Addendum: In answer to those who become exercised over this answer let me just say that arrow represents one of the alternative approaches to dealing with dates in Python. That's mostly what I meant to suggest.
Are you working with Pandas?
You can use pd.to_datetime from the pandas library. Here are various options, depending on what you want returned.
import pandas as pd
pd.to_datetime('today') # pd.to_datetime('now')
# Timestamp('2019-03-27 00:00:10.958567')
As a python datetime object,
pd.to_datetime('today').to_pydatetime()
# datetime.datetime(2019, 4, 18, 3, 50, 42, 587629)
As a formatted date string,
pd.to_datetime('today').isoformat()
# '2019-04-18T04:03:32.493337'
# Or, `strftime` for custom formats.
pd.to_datetime('today').strftime('%Y-%m-%d')
# '2019-03-27'
To get just the date from the timestamp, call Timestamp.date.
pd.to_datetime('today').date()
# datetime.date(2019, 3, 27)
Aside from to_datetime, you can directly instantiate a Timestamp object using,
pd.Timestamp('today') # pd.Timestamp('now')
# Timestamp('2019-04-18 03:43:33.233093')
pd.Timestamp('today').to_pydatetime()
# datetime.datetime(2019, 4, 18, 3, 53, 46, 220068)
If you want to make your Timestamp timezone aware, pass a timezone to the tz argument.
pd.Timestamp('now', tz='America/Los_Angeles')
# Timestamp('2019-04-18 03:59:02.647819-0700', tz='America/Los_Angeles')
Yet another date parser library: Pendulum
This one's good, I promise.
If you're working with pendulum, there are some interesting choices. You can get the current timestamp using now() or today's date using today().
import pendulum
pendulum.now()
# DateTime(2019, 3, 27, 0, 2, 41, 452264, tzinfo=Timezone('America/Los_Angeles'))
pendulum.today()
# DateTime(2019, 3, 27, 0, 0, 0, tzinfo=Timezone('America/Los_Angeles'))
Additionally, you can also get tomorrow() or yesterday()'s date directly without having to do any additional timedelta arithmetic.
pendulum.yesterday()
# DateTime(2019, 3, 26, 0, 0, 0, tzinfo=Timezone('America/Los_Angeles'))
pendulum.tomorrow()
# DateTime(2019, 3, 28, 0, 0, 0, tzinfo=Timezone('America/Los_Angeles'))
There are various formatting options available.
pendulum.now().to_date_string()
# '2019-03-27'
pendulum.now().to_formatted_date_string()
# 'Mar 27, 2019'
pendulum.now().to_day_datetime_string()
# 'Wed, Mar 27, 2019 12:04 AM'
Rationale for this answer
A lot of pandas users stumble upon this question because they believe it is a python question more than a pandas one. This answer aims to be useful to folks who are already using these libraries and would be interested to know that there are ways to achieve these results within the scope of the library itself.
If you are not working with pandas or pendulum already, I definitely do not recommend installing them just for the sake of running this code! These libraries are heavy and come with a lot of plumbing under the hood. It is not worth the trouble when you can use the standard library instead.
from datetime import datetime
date = datetime.today().date()
print(date)
Use f-strings, they are usually the best choice for any text-variable mix:
from datetime import date
print(f'{date.today():%Y-%m-%d}')
Taken from Python f-string formatting not working with strftime inline which has the official links as well.
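The format spec after the colon is handed to the object's `__format__` method, which for `date` and `datetime` behaves like `strftime`. A sketch with an arbitrarily chosen fixed date, so the result is reproducible:

```python
from datetime import date

d = date(2021, 1, 26)

via_fstring = f'{d:%Y-%m-%d}'            # format spec inside the f-string
via_format = format(d, '%Y-%m-%d')       # same spec through format()
via_strftime = d.strftime('%Y-%m-%d')    # explicit strftime call

print(via_fstring)  # 2021-01-26
```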
If you need e.g. US Pacific time (PST or PDT, depending on the date), you can do:
from datetime import datetime
import pytz
tz = pytz.timezone('US/Pacific')
datetime.now(tz).strftime('%Y-%m-%d %H:%M:%S')
# '2021-09-02 10:21:41'
My code is a little more involved, but I use it a lot:
import time
from time import strftime, localtime
strftime("%y_%m_%d", localtime(time.time()))
Reference: https://strftime.org/
You can consult the reference to build any format you want.
For YYYY-MM-DD, just change the code to:
strftime("%Y-%m-%d", localtime(time.time()))
This works:
from datetime import date
today = date.today()
Output at the time of writing: 2020-08-29
Additional:
this_year = date.today().year
this_month = date.today().month
this_day = date.today().day
print(today)
print(this_year)
print(this_month)
print(this_day)
To get the day number from a date column in pandas (assuming the column already holds datetime values):
For example, for an Order_Date of 19-12-2020 (dd-mm-yyyy)
we need 19 as output:
order['day'] = order['Order_Date'].apply(lambda x: x.day)
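If the values are still strings rather than datetime objects, the `.day` attribute is not available until the text is parsed. A standard-library sketch for a single value, assuming the dd-mm-yyyy layout from the example above:

```python
from datetime import datetime

order_date = '19-12-2020'  # dd-mm-yyyy

# %d-%m-%Y tells strptime that the day comes first.
parsed = datetime.strptime(order_date, '%d-%m-%Y')
print(parsed.day)  # 19
```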

Converting timezone-aware datetime to local time in Python

How do you convert a timezone-aware datetime object to the equivalent non-timezone-aware datetime for the local timezone?
My particular application uses Django (although, this is in reality a generic Python question):
import iso8601
....
date_str="2010-10-30T17:21:12Z"
....
d = iso8601.parse_date(date_str)
foo = app.models.FooModel(the_date=d)
foo.save()
This causes Django to throw an error:
raise ValueError("MySQL backend does not support timezone-aware datetimes.")
What I need is:
d = iso8601.parse_date(date_str)
local_d = SOME_FUNCTION(d)
foo = app.models.FooModel(the_date=local_d)
What would SOME_FUNCTION be?
In general, to convert an arbitrary timezone-aware datetime to a naive (local) datetime, I'd use the pytz module and astimezone to convert to local time, and replace to make the datetime naive:
In [76]: import pytz
In [77]: est=pytz.timezone('US/Eastern')
In [78]: d.astimezone(est)
Out[78]: datetime.datetime(2010, 10, 30, 13, 21, 12, tzinfo=<DstTzInfo 'US/Eastern' EDT-1 day, 20:00:00 DST>)
In [79]: d.astimezone(est).replace(tzinfo=None)
Out[79]: datetime.datetime(2010, 10, 30, 13, 21, 12)
But since your particular datetime seems to be in the UTC timezone, you could do this instead:
In [65]: d
Out[65]: datetime.datetime(2010, 10, 30, 17, 21, 12, tzinfo=tzutc())
In [66]: import datetime
In [67]: import calendar
In [68]: datetime.datetime.fromtimestamp(calendar.timegm(d.timetuple()))
Out[68]: datetime.datetime(2010, 10, 30, 13, 21, 12)
By the way, you might be better off storing the datetimes as naive UTC datetimes instead of naive local datetimes. That way, your data is local-time agnostic, and you only convert to local-time or any other timezone when necessary. Sort of analogous to working in unicode as much as possible, and encoding only when necessary.
So if you agree that storing the datetimes in naive UTC is the best way, then all you'd need to do is define:
local_d = d.replace(tzinfo=None)
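Keep in mind that `replace(tzinfo=None)` only drops the zone label; it yields naive UTC only when the value is already in UTC, as it is in the question. For input that may carry another offset, converting first is safer. A standard-library sketch, where the helper name `to_naive_utc` and the +02:00 input are made up for illustration:

```python
from datetime import datetime, timedelta, timezone

def to_naive_utc(aware: datetime) -> datetime:
    """Convert an aware datetime to UTC, then drop the tzinfo."""
    return aware.astimezone(timezone.utc).replace(tzinfo=None)

# The UTC value from the question.
d = datetime(2010, 10, 30, 17, 21, 12, tzinfo=timezone.utc)
print(to_naive_utc(d))        # 2010-10-30 17:21:12

# The same instant expressed with a +02:00 offset (hypothetical input).
plus2 = timezone(timedelta(hours=2))
d_plus2 = datetime(2010, 10, 30, 19, 21, 12, tzinfo=plus2)
print(to_naive_utc(d_plus2))  # 2010-10-30 17:21:12
```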
In recent versions of Django (at least 1.4.1):
from django.utils.timezone import localtime
result = localtime(some_time_object)
A portable robust solution should use the tz database. To get local timezone as pytz tzinfo object, use tzlocal module:
#!/usr/bin/env python
import iso8601
import tzlocal # $ pip install tzlocal
local_timezone = tzlocal.get_localzone()
aware_dt = iso8601.parse_date("2010-10-30T17:21:12Z") # some aware datetime object
naive_local_dt = aware_dt.astimezone(local_timezone).replace(tzinfo=None)
Note: it might be tempting to use something like:
#!/usr/bin/env python3
# ...
naive_local_dt = aware_dt.astimezone().replace(tzinfo=None)
but it may fail if the local timezone has a variable utc offset but python does not use a historical timezone database on a given platform.
Using python-dateutil you can parse the date in ISO 8601 format with dateutil.parser.parse(), which will give you an aware datetime in the UTC/Zulu timezone.
Using .astimezone() you can convert it to an aware datetime in another timezone.
Using .replace(tzinfo=None) will convert the aware datetime into a naive datetime.
from datetime import datetime
from dateutil import parser as datetime_parser
from dateutil.tz import tzutc,gettz
aware = datetime_parser.parse('2015-05-20T19:51:35.998931Z').astimezone(gettz("CET"))
naive = aware.replace(tzinfo=None)
In general the best idea is to convert all dates to UTC and store them that way, and convert them back to local time as needed. I use aware.astimezone(tzutc()).replace(tzinfo=None) to make sure the value is in UTC before converting it to naive.
I use this helper function all the time.
from datetime import datetime
import pytz
def tz_convert(t: datetime, tz=pytz.utc):
    '''
    Convert a timestamp to the target timezone.
    If the timestamp is naive, the timezone is set to the target timezone.
    '''
    if not t.tzinfo:
        tc = t.replace(tzinfo=tz)
    else:
        tc = t.astimezone(tz)
    return tc
Demo
# tz-aware timestamp
>>> t = datetime.now(tz=pytz.utc)
>>> t.isoformat()
'2022-09-15T08:24:38.093312+00:00'
>>> tc = tz_convert(t, pytz.timezone('est'))
>>> tc.isoformat()
'2022-09-15T03:24:38.093312-05:00'
# tz-naive timestamp
>>> t = datetime.now()
>>> t.isoformat()
'2022-09-15T10:22:41.464200'
>>> tc = tz_convert(t, pytz.timezone('est'))
>>> tc.isoformat()
'2022-09-15T10:22:41.464200-05:00'
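One caveat: with pytz, attaching a zone via `replace(tzinfo=...)` is only safe for UTC and fixed-offset zones. For zones with DST or historical offset changes, the pytz documentation recommends `tz.localize(t)` for naive datetimes. A sketch of the same helper using only the standard library's fixed-offset `timezone`, where `replace` does not have this pitfall (the name `tz_convert_std` and the UTC-5 offset are illustrative assumptions):

```python
from datetime import datetime, timedelta, timezone

def tz_convert_std(t: datetime, tz: timezone = timezone.utc) -> datetime:
    """Attach tz to a naive datetime, or convert an aware one to tz."""
    if t.tzinfo is None:
        return t.replace(tzinfo=tz)  # naive: label only, clock time unchanged
    return t.astimezone(tz)          # aware: same instant, new offset

est = timezone(timedelta(hours=-5), 'EST')  # fixed UTC-5, for illustration

aware = datetime(2022, 9, 15, 8, 24, 38, tzinfo=timezone.utc)
converted = tz_convert_std(aware, est)
print(converted.isoformat())  # 2022-09-15T03:24:38-05:00

naive = datetime(2022, 9, 15, 10, 22, 41)
labeled = tz_convert_std(naive, est)
print(labeled.isoformat())    # 2022-09-15T10:22:41-05:00
```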
