I am trying to process data with a timestamp field. The timestamp looks like this:
'20151229180504511' (year, month, day, hour, minute, second, millisecond)
and is a Python string. I am attempting to convert it to a Python datetime object. Here is what I have tried (using pandas):
data['TIMESTAMP'] = data['TIMESTAMP'].apply(lambda x:datetime.strptime(x,"%Y%b%d%H%M%S"))
# returns error time data '20151229180504511' does not match format '%Y%b%d%H%M%S'
So I add milliseconds:
data['TIMESTAMP'] = data['TIMESTAMP'].apply(lambda x:datetime.strptime(x,"%Y%b%d%H%M%S%f"))
# also tried with .%f all result in a format error
So I tried using dateutil.parser:
data['TIMESTAMP'] = data['TIMESTAMP'].apply(lambda s: dateutil.parser.parse(s).strftime(DateFormat))
# results in OverflowError: 'signed integer is greater than maximum'
Also tried converting these entries using the pandas function:
data['TIMESTAMP'] = pd.to_datetime(data['TIMESTAMP'], unit='ms', errors='coerce')
# coerce does not show entries as NaT
I've made sure that whitespace is gone, and I've tried converting the values to strings, integers, and floats. No luck so far; I'm pretty stuck.
Any ideas?
P.S. Background info: The data is generated in an Android app as a java.util.Calendar object, then converted to a string in Java, written to a CSV, and sent to the Python server, where I read it in using pandas read_csv.
Just try:
datetime.strptime(x,"%Y%m%d%H%M%S%f")
You used %b where you need %m:
%b : Month as locale’s abbreviated name.
%m : Month as a zero-padded decimal number.
%b is for locale-based month name abbreviations like Jan, Feb, etc.
Use %m for 2-digit months:
In [36]: df = pd.DataFrame({'Timestamp':['20151229180504511','20151229180504511']})
In [37]: df
Out[37]:
Timestamp
0 20151229180504511
1 20151229180504511
In [38]: pd.to_datetime(df['Timestamp'], format='%Y%m%d%H%M%S%f')
Out[38]:
0 2015-12-29 18:05:04.511
1 2015-12-29 18:05:04.511
Name: Timestamp, dtype: datetime64[ns]
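For comparison, here is a minimal sketch of the same fix using only the standard library. It assumes every value has the 17-digit layout from the question; the helper name parse_compact is made up for illustration.

```python
from datetime import datetime

# Layout from the question: year, month, day, hour, minute, second, milliseconds.
COMPACT_FORMAT = "%Y%m%d%H%M%S%f"

def parse_compact(text):
    """Parse a compact timestamp such as '20151229180504511' (hypothetical helper)."""
    # %m is the numeric month; %b would expect a name like 'Dec'.
    # %f accepts 1 to 6 digits and zero-pads on the right, so '511' -> 511000 microseconds.
    return datetime.strptime(text.strip(), COMPACT_FORMAT)

parsed = parse_compact("20151229180504511")
print(parsed.isoformat())
```

By contrast, unit='ms' in pd.to_datetime treats the value as a count of milliseconds since the Unix epoch, which is not what this string encodes.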
I have a df with a date column that I converted to datetime. The values are in YYYYDDMM format and I need them in YYYYMMDD. I tried the code below, but it does not change the format and still gives me YYYYDDMM. The end goal is to subtract 1 business day from the effective date, but the date needs to be read as YYYYMMDD for that to work; otherwise it subtracts 1 day from the month rather than the day. Can someone help?
filtered_df['Effective Date'] = pd.to_datetime(filtered_df['Effective Date'])
# Effective Date = 20220408 (4th Aug 2022 for clarity)
filtered_df['Effective Date new'] = filtered_df['Effective Date'].dt.strftime("%Y%m%d")
# Effective Date new = 20220408
# desired output: Effective Date new = 20220804
By default, pd.to_datetime reads a compact string such as 20220408 as YYYYMMDD, so formatting the result with %Y%m%d prints the same digits back. The dayfirst keyword argument only affects ambiguous delimited dates such as 10/11/12, so it does not help with this layout. Instead, spell out the actual layout with an explicit format (this assumes the column holds strings):
filtered_df['Effective Date'] = pd.to_datetime(filtered_df['Effective Date'], format='%Y%d%m')
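Since the stated end goal is to subtract one business day, here is a minimal sketch of the whole round trip. It assumes the column holds strings laid out as YYYYDDMM; the sample values and the "Previous business day" column name are made up for illustration. It passes an explicit format so the day and month cannot be swapped, and does the date arithmetic on real datetimes before formatting back to strings.

```python
import pandas as pd

# Hypothetical sample data: strings laid out as YYYYDDMM.
filtered_df = pd.DataFrame({"Effective Date": ["20220408", "20221512"]})

# Spell out the layout so the day and month are not swapped.
effective = pd.to_datetime(filtered_df["Effective Date"], format="%Y%d%m")

# Step back one business day on the datetimes, then format as YYYYMMDD.
previous_business_day = effective - pd.offsets.BDay(1)
filtered_df["Effective Date new"] = effective.dt.strftime("%Y%m%d")
filtered_df["Previous business day"] = previous_business_day.dt.strftime("%Y%m%d")
print(filtered_df)
```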
I like to use the datetime library for this purpose. You can use strptime to convert a string into the datetime object and strftime to convert your datetime object to the new string.
from datetime import datetime

def change_date(row):
    # Parse as year-day-month, then write back as year-month-day.
    row["Effective Date new"] = datetime.strptime(row["Effective Date"], "%Y%d%m").strftime("%Y%m%d")
    return row

df2 = df.apply(change_date, axis=1)
The output df2 will have Effective Date new as your new column.
Today one of my scripts gave an error for an invalid datetime format in its input. The script expects the datetime input as '%m/%d/%Y', but it got it in an entirely different format. For example, the date should have been 5/2/2022 but it was May 2, 2022. To add a bit more information for clarity, the input is coming from a Google Sheet and the entire date is in a single cell (rather than separate cells for month, day, and year).
Is there a way to convert this kind of worded format to the desired datetime format before the script starts any kind of processing?
If the string contains the full month name, try this:
>>> pd.to_datetime(df["Date"], format="%B %d, %Y")
0 2022-05-02
Name: Date, dtype: datetime64[ns]
According to the Python docs:
%B: "Month as locale’s full name".
%d: "Day of the month as a zero-padded decimal number". (When parsing, the leading zero is optional, which is why "2" works here.)
%Y: "Year with century as a decimal number."
Now, if you want to transform this date to the format you initially expected, just transform the series using .dt.strftime:
>>> pd.to_datetime(df["Date"], format="%B %d, %Y").dt.strftime("%m/%d/%Y")
0 05/02/2022
Name: Date, dtype: object
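If the sheet can contain both layouts, one option is to try each known format in turn before any other processing. This is only a sketch: the helper name normalize_date and the list of formats are assumptions, and %B relies on an English locale for names like "May".

```python
from datetime import datetime

# Formats the sheet is assumed to contain, tried in order.
KNOWN_FORMATS = ("%m/%d/%Y", "%B %d, %Y")

def normalize_date(text):
    """Return the date as MM/DD/YYYY, or raise ValueError if no format fits."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized date: {text!r}")

print(normalize_date("May 2, 2022"))
print(normalize_date("5/2/2022"))
```

Raising on unknown layouts, rather than guessing, keeps a new unexpected format from slipping through silently.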
I have a dataframe containing different recorded times as string objects, such as 1:02:45, 51:11, 54:24.
I can't convert them to time objects. This is the error I am getting:
"time data '49:49' does not match format '%H:%M:%S'"
This is the code I am using:
df_plot2 = df[['year', 'gun_time', 'chip_time']]
df_plot2['gun_time'] = pd.to_datetime(df_plot2['gun_time'], format = '%H:%M:%S')
df_plot2['chip_time'] = pd.to_datetime(df_plot2['chip_time'], format = '%H:%M:%S')
Thanks in advance for your help!
You can create a common format in the time Series by checking the string length and prepending zero hours ('00:') where there are only minutes and seconds. Then parse to datetime. Example:
import pandas as pd
s = pd.Series(["1:02:45", "51:11", "54:24"])
m = s.str.len() <= 5
s.loc[m] = '00:' + s.loc[m]
dts = pd.to_datetime(s)
print(dts)
0 2021-12-01 01:02:45
1 2021-12-01 00:51:11
2 2021-12-01 00:54:24
dtype: datetime64[ns]
Note that %H does accept a single-digit hour when parsing, so 1:02:45 on its own matches %H:%M:%S. I believe the values that fail are the ones with only two fields, such as 49:49 in your error message: they have no hours component, so they cannot match a three-field format.
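Because these values are elapsed times rather than clock times, a timedelta may be a better fit than a datetime, since it carries no placeholder date. A minimal sketch, assuming every value is either H:MM:SS or MM:SS; the helper name to_seconds is made up for illustration:

```python
import pandas as pd

def to_seconds(text):
    """Convert 'H:MM:SS' or 'MM:SS' to a number of seconds (hypothetical helper)."""
    parts = [int(p) for p in text.strip().split(":")]
    # Left-pad with zeros so 'MM:SS' is treated as 0 hours.
    parts = [0] * (3 - len(parts)) + parts
    hours, minutes, seconds = parts
    return hours * 3600 + minutes * 60 + seconds

s = pd.Series(["1:02:45", "51:11", "54:24"])
durations = pd.to_timedelta(s.map(to_seconds), unit="s")
print(durations)
```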
I have a column called 'created_at' in dataframe df, its value is like '2/3/15 2:00' in UTC. Now I want to convert it to unix time, how can I do that?
I tried the script like:
time.mktime(datetime.datetime.strptime(df['created_at'], "%m/%d/%Y, %H:%MM").timetuple())
It returns an error. I guess the tricky part is that the year is '15' instead of '2015'.
Is there an efficient way to deal with this?
Thanks!
Since you mention that you're working with a pandas DataFrame, you can simplify this to:
import pandas as pd
import numpy as np
df = pd.DataFrame({'times': ['2/3/15 2:00']})
# to datetime, format is inferred correctly
df['datetime'] = pd.to_datetime(df['times'])
# df['datetime']
# 0 2015-02-03 02:00:00
# Name: datetime, dtype: datetime64[ns]
# to Unix time / seconds since 1970-1-1 Z
# .astype(np.int64) on datetime Series gives you nanoseconds, so divide by 1e9 to get seconds
df['unix'] = df['datetime'].astype(np.int64) / 1e9
# df['unix']
# 0 1.422929e+09
# Name: unix, dtype: float64
%Y is for 4-digit years.
Since you have 2-digits years (assuming it's 20##), you can use %y specifier instead (notice the lower-case y).
You should use lowercase %y (year without century) rather than uppercase %Y (year with century).
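For a single value, the %y fix can be sketched with the standard library as follows. It assumes the strings are month-first, as in the question's format string, and that they are in UTC, as the question states; the helper name to_unix_seconds is made up for illustration.

```python
from datetime import datetime, timezone

def to_unix_seconds(text):
    """Convert a UTC string like '2/3/15 2:00' to Unix seconds (hypothetical helper)."""
    # %y reads the two-digit year ('15' -> 2015); the input has no comma and one 'M' field.
    naive = datetime.strptime(text, "%m/%d/%y %H:%M")
    # Attach UTC explicitly; time.mktime would interpret the fields as local time.
    return int(naive.replace(tzinfo=timezone.utc).timestamp())

unix_seconds = to_unix_seconds("2/3/15 2:00")
print(unix_seconds)
```

Attaching the zone before calling timestamp() keeps the result independent of the time zone of the machine running the script.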
I tried:
df["datetime_obj"] = df["datetime"].apply(lambda dt: datetime.strptime(dt, "%d/%m/%Y %H:%M"))
but got this error:
ValueError: time data '10/11/2006 24:00' does not match format '%d/%m/%Y %H:%M'
How can I solve this correctly?
This does not work because %H only accepts values from 00 to 23 (inclusive). So 24:00 is, as the error says, not a valid time string.
I don't think we have many options other than converting the string to a valid format. We can do this by first replacing 24:00 with midnight (0:00) and then incrementing the day for those timestamps.
Like:
from datetime import timedelta
import pandas as pd
df['datetime_zero'] = df['datetime'].str.replace('24:00', '0:00')
df['datetime_er'] = pd.to_datetime(df['datetime_zero'], format='%d/%m/%Y %H:%M')
selrow = df['datetime'].str.contains('24:00')
df['datetime_obj'] = df['datetime_er'] + selrow * timedelta(days=1)
The last line adds one day to the rows that contain 24:00, so that '10/11/2006 24:00' becomes '11/11/2006 00:00'. Note, however, that this approach is fragile because it depends on the exact format of the timestamp. Here it will (probably) work, since each value contains only one colon. But if the datetimes also had seconds, the filter would be triggered by 00:24:00 as well, so it would need some extra work.
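One way to sidestep that caveat is to match 24:00 only at the very end of the string. The sketch below does this per value with the standard library, assuming the dd/mm/YYYY HH:MM layout from the question; the helper name parse_allowing_24 is made up for illustration.

```python
from datetime import datetime, timedelta

FORMAT = "%d/%m/%Y %H:%M"
MIDNIGHT_SUFFIX = " 24:00"

def parse_allowing_24(text):
    """Parse 'dd/mm/YYYY HH:MM', treating a trailing 24:00 as midnight of the next day."""
    if text.endswith(MIDNIGHT_SUFFIX):
        # Only the trailing time field is rewritten, so a '24:00' elsewhere is untouched.
        base = datetime.strptime(text[: -len(MIDNIGHT_SUFFIX)] + " 00:00", FORMAT)
        return base + timedelta(days=1)
    return datetime.strptime(text, FORMAT)

print(parse_allowing_24("10/11/2006 24:00"))
print(parse_allowing_24("12/11/2006 15:00"))
```

It can be applied to a column with df["datetime"].apply(parse_allowing_24), at the cost of being slower than a vectorized pd.to_datetime call.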
Your data doesn't follow the conventions used by Python / Pandas datetime objects. There should be only one way of storing a particular datetime, i.e. '10/11/2006 24:00' should be rewritten as '11/11/2006 00:00'.
Here's one way to approach the problem:
# find datetimes which have '24:00' and rewrite
twenty_fours = df['strings'].str[-5:] == '24:00'
df.loc[twenty_fours, 'strings'] = df['strings'].str[:-5] + '00:00'
# construct datetime series
df['datetime'] = pd.to_datetime(df['strings'], format='%d/%m/%Y %H:%M')
# add one day where applicable
df.loc[twenty_fours, 'datetime'] += pd.DateOffset(1)
Here's some data to test:
dateList = ['10/11/2006 24:00', '11/11/2006 00:00', '12/11/2006 15:00']
df = pd.DataFrame({'strings': dateList})
Result after transformations described above:
print(df['datetime'])
0 2006-11-11 00:00:00
1 2006-11-11 00:00:00
2 2006-11-12 15:00:00
Name: datetime, dtype: datetime64[ns]
As indicated in the documentation (https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior), hours go from 00 to 23, so 24:00 is an error.