python dataframe converting multiple datetime formats

I have a pandas DataFrame like this (the 'col' column has two formats):
col val
'12/1/2013' value1
'1/22/2014 12:00:01 AM' value2
'12/10/2013' value3
'12/31/2013' value4
I want to convert them into datetime, and I am considering using:
test_df['col']= test_df['col'].map(lambda x: datetime.strptime(x, '%m/%d/%Y'))
test_df['col']= test_df['col'].map(lambda x: datetime.strptime(x, '%m/%d/%Y %I:%M:%S %p'))
Obviously neither of them works for the whole df. I tried wrapping them in try/except but had no luck. Any suggestions?

Just use pd.to_datetime; it can handle both of those formats:
In [4]:
df['col'] = pd.to_datetime(df['col'])
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 2 columns):
col 4 non-null datetime64[ns]
val 4 non-null object
dtypes: datetime64[ns](1), object(1)
memory usage: 96.0+ bytes
The df now looks like this:
In [5]:
df
Out[5]:
col val
0 2013-12-01 00:00:00 value1
1 2014-01-22 00:00:01 value2
2 2013-12-10 00:00:00 value3
3 2013-12-31 00:00:00 value4
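
The try/except idea from the question can also be made to work. The following is a minimal sketch rather than part of the answer above: the helper name parse_mixed and the FORMATS tuple are made up for illustration, and it assumes the column holds only the two shapes shown in the question. It is slower than a vectorized pd.to_datetime call, but it does not depend on how a given pandas version infers formats.

```python
from datetime import datetime

# Assumed formats, taken from the two sample shapes in the question.
FORMATS = ('%m/%d/%Y', '%m/%d/%Y %I:%M:%S %p')

def parse_mixed(text, formats=FORMATS):
    """Try each format in turn; raise ValueError if none of them matches."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not fit, try the next one
    raise ValueError('no known format matches %r' % text)

samples = ['12/1/2013', '1/22/2014 12:00:01 AM', '12/10/2013', '12/31/2013']
parsed = [parse_mixed(s) for s in samples]
```

On a DataFrame the same helper could be applied with test_df['col'].map(parse_mixed).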

I had two different date formats in the same column, Temps, similar to the OP. They look like this:
01.03.2017 00:00:00.000
01/03/2017 00:13
The timings for the two code snippets were as follows:
v['Timestamp1'] = pd.to_datetime(v.Temps)
Took 25.5408718585968 seconds
v['Timestamp'] = pd.to_datetime(v.Temps, format='%d/%m/%Y %H:%M', errors='coerce')
mask = v.Timestamp.isnull()
v.loc[mask, 'Timestamp'] = pd.to_datetime(v.loc[mask, 'Temps'], format='%d.%m.%Y %H:%M:%S.%f', errors='coerce')
Took 0.2923243045806885 seconds
In other words, if you have a small number of known formats for your datetimes, don't use to_datetime without a format!
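
The mask-and-retry pattern above can be wrapped in a loop when there are more than two known formats. This is only a sketch: parse_with_formats is a hypothetical helper name, and the sample values are the two shapes quoted in this answer plus one deliberately invalid string.

```python
import pandas as pd

def parse_with_formats(series, formats):
    """Parse with the first format, then retry only the unparsed rows."""
    result = pd.to_datetime(series, format=formats[0], errors='coerce')
    for fmt in formats[1:]:
        mask = result.isna()
        if not mask.any():
            break  # everything parsed, no need to try further formats
        result.loc[mask] = pd.to_datetime(series[mask], format=fmt, errors='coerce')
    return result

temps = pd.Series(['01.03.2017 00:00:00.000', '01/03/2017 00:13', 'garbage'])
out = parse_with_formats(temps, ['%d/%m/%Y %H:%M', '%d.%m.%Y %H:%M:%S.%f'])
```

Rows that match none of the formats stay NaT, so they can be inspected afterwards instead of raising midway through.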

You can create a new column:
test_df['col1'] = pd.to_datetime(test_df['col'])
and then drop col and rename col1 to col.

It works for me.
I had two formats in my column 'fecha_hechos'. The formats were:
2015/03/02
10/02/2010
what I did was:
carpetas_cdmx['Timestamp'] = pd.to_datetime(carpetas_cdmx.fecha_hechos, format='%Y/%m/%d', errors='coerce')
mask = carpetas_cdmx.Timestamp.isnull()
carpetas_cdmx.loc[mask, 'Timestamp'] = pd.to_datetime(carpetas_cdmx.loc[mask, 'fecha_hechos'], format='%d/%m/%Y', errors='coerce')
where carpetas_cdmx is my DataFrame and fecha_hechos is the column with the two formats.

Related

Convert Unix time in a DataFrame to Seconds [duplicate]

I have a dataframe with unix times and prices in it. I want to convert the index column so that it shows in human readable dates.
So for instance I have date as 1349633705 in the index column but I'd want it to show as 10/07/2012 (or at least 10/07/2012 18:15).
For some context, here is the code I'm working with and what I've tried already:
import json
from urllib.request import urlopen
from datetime import datetime
from pandas import DataFrame
response = urlopen('http://blockchain.info/charts/market-price?&format=json')
data = json.load(response)
df = DataFrame(data['values'])
df.columns = ["date","price"]
#convert dates
df.date = df.date.apply(lambda d: datetime.strptime(d, "%Y-%m-%d"))
df.index = df.date
As you can see I'm using
df.date = df.date.apply(lambda d: datetime.strptime(d, "%Y-%m-%d")) here which doesn't work since I'm working with integers, not strings. I think I need to use datetime.date.fromtimestamp but I'm not quite sure how to apply this to the whole of df.date.
Thanks.

These appear to be seconds since epoch.
In [20]: df = DataFrame(data['values'])
In [21]: df.columns = ["date","price"]
In [22]: df
Out[22]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 358 entries, 0 to 357
Data columns (total 2 columns):
date 358 non-null values
price 358 non-null values
dtypes: float64(1), int64(1)
In [23]: df.head()
Out[23]:
date price
0 1349720105 12.08
1 1349806505 12.35
2 1349892905 12.15
3 1349979305 12.19
4 1350065705 12.15
In [25]: df['date'] = pd.to_datetime(df['date'],unit='s')
In [26]: df.head()
Out[26]:
date price
0 2012-10-08 18:15:05 12.08
1 2012-10-09 18:15:05 12.35
2 2012-10-10 18:15:05 12.15
3 2012-10-11 18:15:05 12.19
4 2012-10-12 18:15:05 12.15
In [27]: df.dtypes
Out[27]:
date datetime64[ns]
price float64
dtype: object
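
To cover the rest of the question (using the dates as the index and showing them as 10/07/2012-style strings), here is a small sketch. It uses a hand-built frame with the first three values from the output above instead of the live JSON feed, and the variable name labels is only for illustration.

```python
import pandas as pd

# Hand-built stand-in for the downloaded data (first rows shown above).
df = pd.DataFrame({
    'date': [1349720105, 1349806505, 1349892905],
    'price': [12.08, 12.35, 12.15],
})

# Interpret the integers as seconds since the epoch, then index by them.
df['date'] = pd.to_datetime(df['date'], unit='s')
df = df.set_index('date')

# Display-only strings; the index itself stays datetime64.
labels = df.index.strftime('%m/%d/%Y')
```

Keeping the index as datetime64 and formatting only for display preserves sorting, slicing, and resampling by date.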

If you try using:
df[DATE_FIELD] = pd.to_datetime(df[DATE_FIELD], unit='s')
and receive an error:
"pandas.tslib.OutOfBoundsDatetime: cannot convert input with unit 's'"
this most likely means the DATE_FIELD values are not in seconds.
In my case, they were milliseconds since the epoch.
The conversion worked using the following:
df[DATE_FIELD] = pd.to_datetime(df[DATE_FIELD], unit='ms')
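
When it is not known in advance whether a column holds seconds or milliseconds, the magnitude of the values gives a rough hint. The helper below is a hypothetical heuristic, not a pandas feature: guess_epoch_unit and its thresholds are assumptions, and it will misjudge very old timestamps (for example, millisecond values from before 1973 look like seconds).

```python
import pandas as pd

def guess_epoch_unit(values):
    """Guess the epoch unit from the largest absolute value (heuristic only)."""
    biggest = pd.Series(values).abs().max()
    if biggest < 1e11:
        return 's'
    if biggest < 1e14:
        return 'ms'
    if biggest < 1e17:
        return 'us'
    return 'ns'

raw = pd.Series([1349720105000, 1349806505000])  # milliseconds since the epoch
unit = guess_epoch_unit(raw)
converted = pd.to_datetime(raw, unit=unit)
```

If the source system documents its unit, passing that unit explicitly is safer than guessing.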

Assuming we imported pandas as pd and df is our dataframe
pd.to_datetime(df['date'], unit='s')
works for me.

The pandas documentation gives this and other examples, which none of the previous answers included. Link:
https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html
Code
>>> pd.to_datetime(1490195805, unit='s')
Timestamp('2017-03-22 15:16:45')
>>> pd.to_datetime(1490195805433502912, unit='ns')
Timestamp('2017-03-22 15:16:45.433502912')

Alternatively, change one line of the code in the question:
# df.date = df.date.apply(lambda d: datetime.strptime(d, "%Y-%m-%d"))
df.date = df.date.apply(lambda d: datetime.fromtimestamp(int(d)).strftime('%Y-%m-%d'))
This should also work, although it produces strings in local time rather than datetime values.
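
One caveat with this route: datetime.fromtimestamp uses the machine's local time zone unless told otherwise, so near midnight the date can differ from what pd.to_datetime(..., unit='s') gives. A minimal standard-library sketch that pins the conversion to UTC follows; the function name epoch_to_date_string is made up for illustration.

```python
from datetime import datetime, timezone

def epoch_to_date_string(seconds, fmt='%Y-%m-%d'):
    """Format seconds since the epoch as a date string, interpreted in UTC."""
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc).strftime(fmt)

print(epoch_to_date_string(1349720105))              # 2012-10-08
print(epoch_to_date_string(1349720105, '%m/%d/%Y'))  # 10/08/2012
```

It could be applied per row with df.date.apply(epoch_to_date_string), at the cost of being much slower than the vectorized call on large frames.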

Pandas - Datetime format change to '%m/%d/%Y'

Hello Stack Overflow community,
I would like to change the datetime format of a column, but it doesn't work and I don't know what I'm doing wrong.
After executing the following code:
df6['beginn'] = pd.to_datetime(df6['beginn'], unit='s', errors='ignore')
I got this output, and that's fine, but I would like to drop the time so that only %m/%d/%Y is left.
ID DATE
91060 2017-11-10 00:00:00
91061 2022-05-01 00:00:00
91062 2022-04-01 00:00:00
Name: beginn, Length: 91063, dtype: object
I've tried this one and many others
df6['beginn'] = df6['beginn'].dt.strftime('%m/%d/%Y')
and get the following output:
AttributeError: Can only use .dt accessor with datetimelike values.
But I don't understand why. Haven't I already converted the data with pd.to_datetime?
Appreciate any hint you can give me! Thanks a lot!

The reason you have to use errors="ignore" is that not all the dates you are parsing are in the correct format. If you use errors="coerce", as @phi has mentioned, then any dates that cannot be converted will be set to NaT. The column's dtype will still be converted to datetime64, and you can then format it as you like and deal with the NaT values as you want.
Example
A dataframe with one item in Date not written as Year/Month/Day (25th Month is wrong):
>>> df = pd.DataFrame({'ID': [91060, 91061, 91062, 91063], 'Date': ['2017/11/10', '2022/05/01', '2022/04/01', '2055/25/25']})
>>> df
ID Date
0 91060 2017/11/10
1 91061 2022/05/01
2 91062 2022/04/01
3 91063 2055/25/25
>>> df.dtypes
ID int64
Date object
dtype: object
Using errors="ignore":
>>> df['Date'] = pd.to_datetime(df['Date'], errors='ignore')
>>> df
ID Date
0 91060 2017/11/10
1 91061 2022/05/01
2 91062 2022/04/01
3 91063 2055/25/25
>>> df.dtypes
ID int64
Date object
dtype: object
Column Date is still of type object because not all the values could be converted. Running df['Date'] = df['Date'].dt.strftime("%m/%d/%Y") will result in the AttributeError.
Using errors="coerce":
>>> df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
>>> df
ID Date
0 91060 2017-11-10
1 91061 2022-05-01
2 91062 2022-04-01
3 91063 NaT
>>> df.dtypes
ID int64
Date datetime64[ns]
dtype: object
Invalid dates are set to NaT, the column is now of type datetime64, and you can format it:
>>> df['Date'] = df['Date'].dt.strftime("%m/%d/%Y")
>>> df
ID Date
0 91060 11/10/2017
1 91061 05/01/2022
2 91062 04/01/2022
3 91063 NaN
Note: Formatting a datetime64 column converts it back to type object, so NaT values become NaN. The issue you are having is caused by some dirty data that is not in the correct format.
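
Before formatting, it can help to look at which rows actually failed to parse. The sketch below reuses the example frame from this answer. The explicit format='%Y/%m/%d' and the names parsed, bad_rows, and Date_fmt are additions for illustration, not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [91060, 91061, 91062, 91063],
    'Date': ['2017/11/10', '2022/05/01', '2022/04/01', '2055/25/25'],
})

# Coerce unparseable values to NaT instead of silently keeping strings.
parsed = pd.to_datetime(df['Date'], format='%Y/%m/%d', errors='coerce')

# Rows that had a value but could not be parsed: the dirty data.
bad_rows = df.loc[parsed.isna() & df['Date'].notna()]

# Write the formatted strings to a new column so the raw input is kept.
df['Date_fmt'] = parsed.dt.strftime('%m/%d/%Y')
```

Writing to a separate column keeps the original strings available for fixing the bad rows by hand.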
