Strange timestamp conversion - python

I have these dates
2016-02-26 12:12:12
2016-02-friday 12:12:12
(Those two dates refer to the same day.)
If I convert the first one to a timestamp and then convert it back to a readable format, it works.
But if I try the same with the second one, it does not convert back to the right day!
Here's what I did:
import datetime
import time
sTimestamp = time.mktime(
    datetime.datetime.strptime(
        "2016-02-26 12:12:12",
        "%Y-%m-%d %H:%M:%S")
    .timetuple())
print("date from timestamp = " +
      datetime.datetime.fromtimestamp(int(sTimestamp))
      .strftime('%Y-%m-%d %H:%M:%S'))
sTimestamp = time.mktime(
    datetime.datetime.strptime(
        "2016-02-friday 12:12:12",
        "%Y-%m-%A %H:%M:%S")
    .timetuple())
print("date from timestamp = " +
      datetime.datetime.fromtimestamp(int(sTimestamp))
      .strftime('%Y-%m-%d %H:%M:%S'))
The output of those two print statements is:
date from timestamp = 2016-02-26 12:12:12
date from timestamp = 2016-02-01 12:12:12
As you can see, the first one comes back as 26, but the second one comes back as 01 for an unknown reason. And by the way, the 1st is a Monday...
For information, I am using Python 3.4 on Windows.

The first problem:
(Those two dates refer to the same day)
No, they don't. The first one refers to the last Friday of February in the year 2016; the second refers to, at best, a Friday in February in the year 2016.
Further, strptime needs numeric fields to pin down a date; a name like "Friday" does not identify a single day of the month. The Python docs say:
For time objects, the format codes for year, month, and day should not be used, as time objects have no such values. If they’re used anyway, 1900 is substituted for the year, and 1 for the month and day.
That passage is about time objects, but parsing appears to use the same fallback: the format never supplies a day of the month (%A only names a weekday), so the day defaults to 1.
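A minimal sketch of the behaviour described above, assuming an English locale so that "friday" is recognised by %A. The helper weekdays_in_month is a hypothetical name introduced for illustration, not a standard-library function; it lists the days the second string could have meant.

```python
import calendar
from datetime import datetime

# %A is accepted by the parser, but the format has no day-of-month field,
# so the resulting datetime falls back to day 1.
parsed = datetime.strptime("2016-02-friday 12:12:12", "%Y-%m-%A %H:%M:%S")
print(parsed)  # 2016-02-01 12:12:12


def weekdays_in_month(year, month, weekday):
    """Return every date in the given month that falls on `weekday` (Monday is 0)."""
    cal = calendar.Calendar()
    return [
        d for d in cal.itermonthdates(year, month)
        if d.month == month and d.weekday() == weekday
    ]


fridays = weekdays_in_month(2016, 2, calendar.FRIDAY)
print([d.day for d in fridays])  # [5, 12, 19, 26]
```

Four different days match "a Friday in February 2016", which is why the weekday name alone cannot be turned back into the 26th.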

Converting worded date format to datetime format in pandas

Today one of my scripts gave an error for an invalid datetime format in its input. The script expects the datetime input as '%m/%d/%Y', but it got it in an entirely different format. For example, the date should have been 5/2/2022 but it was May 2, 2022. To add a bit more information for clarity, the input comes from a Google Sheet and the entire date is in a single cell (rather than different cells for month, day and year).
Is there a way to convert this kind of worded format to the desired datetime format before the script starts any kind of processing?
If the string contains the full month name, try this:
>>> pd.to_datetime(df["Date"], format="%B %d, %Y")
0 2022-05-02
Name: Date, dtype: datetime64[ns]
According to the Python docs:
%B: "Month as locale’s full name".
%d: "Day of the month as a zero-padded decimal number". (Although parsing also seems to accept a day without the leading zero, as in this case.)
%Y: "Year with century as a decimal number."
Now, if you want to transform this date to the format you initially expected, just transform the series using .dt.strftime:
>>> pd.to_datetime(df["Date"], format="%B %d, %Y").dt.strftime("%m/%d/%Y")
0 05/02/2022
Name: Date, dtype: object
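If the column can contain both layouts, one option is to normalise each value before any processing. The following is a sketch using only the standard library; the name normalize_date and the list of candidate formats are assumptions for illustration, and %B assumes an English locale.

```python
from datetime import datetime

# Candidate input layouts, tried in order (an assumption for this sketch).
KNOWN_FORMATS = ("%m/%d/%Y", "%B %d, %Y")


def normalize_date(text, formats=KNOWN_FORMATS):
    """Return `text` as MM/DD/YYYY, or raise ValueError if no format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


print(normalize_date("May 2, 2022"))  # 05/02/2022
print(normalize_date("5/2/2022"))     # 05/02/2022
```

Raising on unknown input, rather than guessing, keeps a new and unexpected layout from slipping through silently.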

Parse datetime when it comes in two different formats - Python

Depending on whether the item is over a week old or not, the date comes in the following formats:
Less than a week old comprised of day of the week and time:
date1 = "Monday 21:14"
More than a week old, is just the date:
date2 = "5 Apr '21"
I read them in as a string and need to parse them both to a consistent timestamp.
One way I tried is:
from dateutil.parser import parse
parse(date1)
# output: datetime.datetime(2021, 4, 23, 22, 18)
parse(date2)
# output: datetime.datetime(2021, 4, 5, 0, 0)
Using the dateutil package I can easily parse date2, but for date1 it gives me the upcoming Friday, not the previous Friday.
How would you suggest I parse the two dates without knowing which will be received? And can I instruct the parser to take the most recent past weekday (i.e. the previous Friday)?
Many thanks
Setting a default for the parser might work. Since you want to take the last previous weekday in case the day of the month is not defined, you can use today's date one week ago.
Ex:
from datetime import datetime, timedelta
from dateutil import parser
date1 = "Monday 21:14"
date2 = "5 Apr '21"
# reference = date of today one week ago (as datetime object):
ref_date = datetime(*datetime.now().timetuple()[:3]) - timedelta(7)
for d in (date1, date2):
    print(parser.parse(d, default=ref_date))
# 2021-04-12 21:14:00
# 2021-04-05 00:00:00
Note that today is Monday 2021-4-19 but this code gives you the previous Monday, 2021-4-12.
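For comparison, here is a sketch that does the same thing with only the standard library and always returns the most recent past occurrence of the weekday. It is not dateutil's method; parse_item_date and WEEKDAYS are hypothetical names, the two input layouts are taken from the question, and %b assumes an English locale.

```python
from datetime import datetime, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def parse_item_date(text, now):
    """Parse either "Monday 21:14" or "5 Apr '21" relative to `now`."""
    first, _, rest = text.partition(" ")
    if first.lower() in WEEKDAYS:
        clock = datetime.strptime(rest, "%H:%M").time()
        days_back = (now.weekday() - WEEKDAYS.index(first.lower())) % 7
        candidate = datetime.combine(now.date() - timedelta(days=days_back), clock)
        # Same weekday as today but a later clock time: step back a full week.
        if candidate > now:
            candidate -= timedelta(days=7)
        return candidate
    # Otherwise assume the "day month 'yy" layout.
    return datetime.strptime(text, "%d %b '%y")


now = datetime(2021, 4, 19, 12, 0)  # a Monday, fixed so the output is reproducible
print(parse_item_date("Monday 21:14", now))  # 2021-04-12 21:14:00
print(parse_item_date("Friday 22:18", now))  # 2021-04-16 22:18:00
print(parse_item_date("5 Apr '21", now))     # 2021-04-05 00:00:00
```

Passing now explicitly makes the function easy to test; in real use it would typically be datetime.now().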

getting 2 different date input from user

I am trying to get user input for 2 different dates, which I will pass on to another function.
def twodifferentdates():
    print("Data between 2 different dates")
    start_date = datetime.strptime(input('Enter Start Date in m/d/y format'), '%m&d&Y')
    end_date = datetime.strptime(input('Enter end date in m/d/y format'), '%m&d&Y')
    print(start_date)
twodifferentdates()
I have tried a lot of different ways to enter the dates, but I keep getting
ValueError: time data '01/11/1987' does not match format '%m&d&Y'
I have used the same code which was discussed in:
how do I take input in the date time format?
Any help here would be appreciated.
Replace %m&d&Y with %m/%d/%Y as described in the referenced post.
datetime.strptime() requires you to specify the format, on a character-by-character basis, of the date you want to input. For the string '01/11/1987' you'd do
datetime.strptime(..., '%m/%d/%Y')
where %m is the month as a number, %d is the day of the month and %Y is the four-digit year (as opposed to the two-digit year %y). These values are separated by slashes, which must appear literally in the format string.
See also the datetime documentation which describes how to use strptime and strftime.
I'm not very experienced with the datetime module, but the error seems to come from the format string: every directive needs a % prefix (%d, not &d), and the separators must match what the user types. For slash-separated input, use:
start_date = datetime.strptime(input('Enter Start Date in m/d/y format'), '%m/%d/%Y')
or, for space-separated input:
start_date = datetime.strptime(input('Enter Start Date in m/d/y format'), '%m %d %Y')
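Since users can still type something that does not match, it may help to separate parsing from prompting. A small sketch, where parse_user_date and DATE_FORMAT are hypothetical names chosen for illustration:

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"


def parse_user_date(text, fmt=DATE_FORMAT):
    """Return a datetime for `text`, or None if it does not match `fmt`."""
    try:
        return datetime.strptime(text.strip(), fmt)
    except ValueError:
        return None


print(parse_user_date("01/11/1987"))  # 1987-01-11 00:00:00
print(parse_user_date("01-11-1987"))  # None
```

A caller could wrap this in a loop around input() and ask again whenever None comes back.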

ValueError: time data '10/11/2006 24:00' does not match format '%d/%m/%Y %H:%M'

I tried:
df["datetime_obj"] = df["datetime"].apply(lambda dt: datetime.strptime(dt, "%d/%m/%Y %H:%M"))
but got this error:
ValueError: time data '10/11/2006 24:00' does not match format
'%d/%m/%Y %H:%M'
How to solve it correctly?
This does not work because the %H directive only accepts values from 00 to 23 (inclusive). So 24:00 is, as the error says, not a valid time string.
We therefore have little choice but to convert the string to a valid format. We can do this by first replacing 24:00 with 00:00 and then incrementing the day for those timestamps.
Like:
from datetime import timedelta
import pandas as pd
df['datetime_zero'] = df['datetime'].str.replace('24:00', '0:00')
df['datetime_er'] = pd.to_datetime(df['datetime_zero'], format='%d/%m/%Y %H:%M')
selrow = df['datetime'].str.contains('24:00')
df['datetime_obj'] = df['datetime_er'] + selrow * timedelta(days=1)
The last line adds one day to the rows that contain 24:00, so that '10/11/2006 24:00' becomes '11/11/2006 00:00'. Note, however, that this is rather fragile, since whether it works depends on the format of the timestamp. For the format above it will (probably) work, since there is only one colon. But if, for example, the datetimes also have seconds, the filter could be triggered by 00:24:00, so it might require some extra work to get it working.
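One way to avoid the false match is to test only the end of the string. A sketch for a single value using the standard library; parse_allowing_24 is a hypothetical helper, and it assumes the time is always the last field and written exactly as " 24:00".

```python
from datetime import datetime, timedelta


def parse_allowing_24(text, fmt="%d/%m/%Y %H:%M"):
    """Parse `text`, treating a trailing " 24:00" as midnight of the next day."""
    suffix = " 24:00"
    if text.endswith(suffix):
        midnight = datetime.strptime(text[:-len(suffix)] + " 00:00", fmt)
        return midnight + timedelta(days=1)
    return datetime.strptime(text, fmt)


print(parse_allowing_24("10/11/2006 24:00"))  # 2006-11-11 00:00:00
print(parse_allowing_24("10/11/2006 00:24"))  # 2006-11-10 00:24:00
```

On a DataFrame this could be applied per row, for example with df['datetime'].apply(parse_allowing_24), at the cost of being slower than a vectorised approach.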
Your data doesn't follow the conventions used by Python / Pandas datetime objects. There should be only one way of storing a particular datetime, i.e. '10/11/2006 24:00' should be rewritten as '11/11/2006 00:00'.
Here's one way to approach the problem:
# find datetimes which have '24:00' and rewrite
twenty_fours = df['strings'].str[-5:] == '24:00'
df.loc[twenty_fours, 'strings'] = df['strings'].str[:-5] + '00:00'
# construct datetime series
df['datetime'] = pd.to_datetime(df['strings'], format='%d/%m/%Y %H:%M')
# add one day where applicable
df.loc[twenty_fours, 'datetime'] += pd.DateOffset(1)
Here's some data to test:
dateList = ['10/11/2006 24:00', '11/11/2006 00:00', '12/11/2006 15:00']
df = pd.DataFrame({'strings': dateList})
Result after transformations described above:
print(df['datetime'])
0 2006-11-11 00:00:00
1 2006-11-11 00:00:00
2 2006-11-12 15:00:00
Name: datetime, dtype: datetime64[ns]
As indicated in the documentation (https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior), hours go from 00 to 23, so 24:00 is an error.

Python string of numbers to date

I am trying to process data with a timestamp field. The timestamp looks like this:
'20151229180504511' (year, month, day, hour, minute, second, millisecond)
and is a python string. I am attempting to convert it to a python datetime object. Here is what I have tried (using pandas):
data['TIMESTAMP'] = data['TIMESTAMP'].apply(lambda x:datetime.strptime(x,"%Y%b%d%H%M%S"))
# returns error time data '20151229180504511' does not match format '%Y%b%d%H%M%S'
So I add milliseconds:
data['TIMESTAMP'] = data['TIMESTAMP'].apply(lambda x:datetime.strptime(x,"%Y%b%d%H%M%S%f"))
# also tried with .%f all result in a format error
So tried using the dateutil.parser:
data['TIMESTAMP'] = data['TIMESTAMP'].apply(lambda s: dateutil.parser.parse(s).strftime(DateFormat))
# results in OverflowError: 'signed integer is greater than maximum'
Also tried converting these entries using the pandas function:
data['TIMESTAMP'] = pd.to_datetime(data['TIMESTAMP'], unit='ms', errors='coerce')
# coerce does not show entries as NaT
I've made sure that whitespace is gone, and I've tried converting to strings, integers and floats. No luck so far - pretty stuck.
Any ideas?
p.s. Background info: The data is generated in an Android app as a java.util.Calendar object, then converted to a string in Java, written to a csv and then sent to the python server where I read it in using pandas read_csv.
Just try :
datetime.strptime(x,"%Y%m%d%H%M%S%f")
You mixed up these two directives:
%b : Month as locale’s abbreviated name.
%m : Month as a zero-padded decimal number.
%b is for locale-based month name abbreviations like Jan, Feb, etc.
Use %m for 2-digit months:
In [36]: df = pd.DataFrame({'Timestamp':['20151229180504511','20151229180504511']})
In [37]: df
Out[37]:
Timestamp
0 20151229180504511
1 20151229180504511
In [38]: pd.to_datetime(df['Timestamp'], format='%Y%m%d%H%M%S%f')
Out[38]:
0 2015-12-29 18:05:04.511
1 2015-12-29 18:05:04.511
Name: Timestamp, dtype: datetime64[ns]
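The same check can be made on a single string with the standard library alone. A minimal sketch, using the sample timestamp from the question:

```python
from datetime import datetime

raw = "20151229180504511"

# %f accepts one to six digits; "511" is read as 511000 microseconds.
stamp = datetime.strptime(raw, "%Y%m%d%H%M%S%f")
print(stamp)  # 2015-12-29 18:05:04.511000

# %b expects a month name such as "Dec", so it rejects the digits "12".
try:
    datetime.strptime(raw, "%Y%b%d%H%M%S%f")
except ValueError as exc:
    print("ValueError:", exc)
```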
