pandas convert string to date (the string time is B.E year) - python

I have a DataFrame in which one column contains dates whose year is given in B.E. (Buddhist Era) format:
date
28-01-2562
29-01-2562
30-01-2562
31-01-2562
I tried using pd.to_datetime, but it gives me an error:
pd.to_datetime(df['date'])
This is the error I got:
Out of bounds nanosecond timestamp: 2562-01-30 00:00:00

You can convert the values to daily periods, which are not limited to the nanosecond Timestamp range:
df['date'] = df['date'].apply(pd.Period)
print(df)
date
0 2562-01-28
1 2562-01-29
2 2562-01-30
3 2562-01-31
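Periods keep the B.E. year as written. If Gregorian timestamps are needed instead, one option is to shift the year before parsing. This is a minimal sketch, assuming the values are Thai Buddhist Era dates (B.E. = C.E. + 543) in day-month-year order; the BE_OFFSET constant and the date_ce column name are illustrative and not from the original post:

```python
import pandas as pd

df = pd.DataFrame({"date": ["28-01-2562", "29-01-2562", "30-01-2562", "31-01-2562"]})

# Assumption: Thai Buddhist Era years, which run 543 years ahead of C.E.
BE_OFFSET = 543

# Split "dd-mm-yyyy" into integer day / month / year columns.
parts = df["date"].str.split("-", expand=True).astype(int)
parts.columns = ["day", "month", "year"]

# Shift the year into the Gregorian calendar, then assemble timestamps
# from the day / month / year columns.
parts["year"] -= BE_OFFSET
df["date_ce"] = pd.to_datetime(parts)

print(df)
```

After the shift the years fall inside the range that datetime64[ns] can represent, so the usual .dt accessors work on date_ce.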

Related

Pandas Date Formatting (With Optional Milliseconds)

I'm getting data from an API and putting it into a Pandas DataFrame. The date column needs formatting into date/time, which I am doing. However the API sometimes returns dates without milliseconds which doesn't match the format pattern. This results in an error:
time data '2020-07-30T15:57:37Z' does not match format '%Y-%m-%dT%H:%M:%S.%fZ' (match)
In this example, how can I format the date column to date/time, so all dates are formatted with milliseconds?
import pandas as pd
dates = {
'date': ['2020-07-30T15:57:37Z', '2020-07-30T15:57:37.1Z']
}
df = pd.DataFrame(dates)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%dT%H:%M:%S.%fZ')
print(df)
Parse the column twice: once with a format that includes milliseconds and once with a format that does not. Use errors='coerce' so that values that do not match become NaT instead of raising a ValueError.
with_miliseconds = pd.to_datetime(df['date'], format='%Y-%m-%dT%H:%M:%S.%fZ',errors='coerce')
without_miliseconds = pd.to_datetime(df['date'], format='%Y-%m-%dT%H:%M:%SZ',errors='coerce')
The results would look something like this:
with milliseconds:
0 NaT
1 2020-07-30 15:57:37.100
Name: date, dtype: datetime64[ns]
without milliseconds:
0 2020-07-30 15:57:37
1 NaT
Name: date, dtype: datetime64[ns]
Then you can fill the NaTs of one Series with the values of the other, because they complement each other.
with_miliseconds.fillna(without_miliseconds)
0 2020-07-30 15:57:37.000
1 2020-07-30 15:57:37.100
Name: date, dtype: datetime64[ns]
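The two-pass idea generalises to any number of candidate formats. The following is a rough sketch rather than a definitive implementation; the helper name parse_with_formats is hypothetical and not part of pandas:

```python
import pandas as pd


def parse_with_formats(values, formats):
    """Try each format in turn; later formats only fill values still missing."""
    result = pd.to_datetime(values, format=formats[0], errors="coerce")
    for fmt in formats[1:]:
        # Values that did not match an earlier format are NaT here,
        # so fillna only touches those positions.
        result = result.fillna(pd.to_datetime(values, format=fmt, errors="coerce"))
    return result


df = pd.DataFrame({"date": ["2020-07-30T15:57:37Z", "2020-07-30T15:57:37.1Z"]})
df["date"] = parse_with_formats(
    df["date"],
    ["%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"],
)
print(df)
```

Any value that matches none of the formats stays NaT, which makes the unparsed rows easy to find with df['date'].isna().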
To get a consistent format in your output DataFrame, you could run a regex replacement on every value that has no milliseconds before building the DataFrame.
import re
dates = {'date': [re.sub(r'Z$', '.0Z', date) if '.' not in date else date for date in dates['date']]}
Only the dates that contain a . have milliseconds, so the replacement runs on the others.
After that, everything else is the same as in your code.
Output:
date
0 2020-07-30 15:57:37.000
1 2020-07-30 15:57:37.100
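Since your date strings look like standard ISO 8601, you can simply omit the format parameter; the parser treats the milliseconds as optional. (On pandas 2.0 and later, where the format is inferred from the first value, passing format='ISO8601' handles strings that differ only in such optional parts.)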
import pandas as pd
dates = {
'date': ['2020-07-30T15:57:37Z', '2020-07-30T15:57:37.1Z']
}
df = pd.DataFrame(dates)
df['date'] = pd.to_datetime(df['date'])
print(df)
date
0 2020-07-30 15:57:37+00:00
1 2020-07-30 15:57:37.100000+00:00

How to remove hours, minutes, seconds and UTC offset from pandas date column? I'm running with streamlit and pandas

How can I remove T00:00:00+05:30 after the year, month and day values in pandas? I tried converting the column to datetime, but it still shows the same result. I'm using pandas in Streamlit. I tried the code below:
df['Date'] = pd.to_datetime(df['Date'])
The output is still the same:
Date
2019-07-01T00:00:00+05:30
2019-07-01T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-03T00:00:00+05:30
2019-07-03T00:00:00+05:30
2019-07-04T00:00:00+05:30
2019-07-04T00:00:00+05:30
2019-07-05T00:00:00+05:30
Can anyone help me remove T00:00:00+05:30 from the rows above?
If I understand correctly, you want to keep only the date part.
Convert date strings to datetime
df = pd.DataFrame(
    columns=['date'],
    data=["2019-07-01T02:00:00+05:30", "2019-07-02T01:00:00+05:30"]
)
date
0 2019-07-01T02:00:00+05:30
1 2019-07-02T01:00:00+05:30
df['date'] = pd.to_datetime(df['date'])
date
0 2019-07-01 02:00:00+05:30
1 2019-07-02 01:00:00+05:30
Remove the timezone
df['date'] = df['date'].dt.tz_localize(None)
date
0 2019-07-01 02:00:00
1 2019-07-02 01:00:00
Keep the date only
df['date'] = df['date'].dt.date
0 2019-07-01
1 2019-07-02
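Don't bother with apply to Python dates or with string changes. The former leaves you with an object dtype column and the latter is slow. Just round to the day frequency using the library function. Note that round goes to the nearest day, so a time after noon moves to the next date, as in the example; dt.floor('D') truncates instead.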
>>> pd.Series([pd.Timestamp('2000-01-05 12:01')]).dt.round('D')
0 2000-01-06
dtype: datetime64[ns]
If you have a timezone aware timestamp, convert to UTC with no time zone then round:
>>> pd.Series([pd.Timestamp('2019-07-01T00:00:00+05:30')]).dt.tz_convert(None) \
.dt.round('D')
0 2019-07-01
dtype: datetime64[ns]
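If the goal is to drop the time of day rather than round it, truncating is usually what is wanted. This is a minimal sketch on made-up sample values, assuming the local wall-clock date (not the UTC date) should be kept; it contrasts dt.normalize() with dt.round('D'):

```python
import pandas as pd

s = pd.to_datetime(
    pd.Series(["2019-07-01T00:00:00+05:30", "2019-07-02T13:45:00+05:30"])
)

# Drop the UTC offset but keep the local wall-clock time.
local_naive = s.dt.tz_localize(None)

# normalize() sets the time to midnight; the dtype stays datetime64.
dates = local_naive.dt.normalize()

# For display (for example in a Streamlit table), plain strings also work.
date_strings = s.dt.strftime("%Y-%m-%d")

# round("D") goes to the nearest day instead, so 13:45 lands on the next date.
rounded = local_naive.dt.round("D")

print(dates)
print(date_strings)
print(rounded)
```

The normalized column keeps its datetime dtype, so date arithmetic and resampling still work; the string version is only meant for display.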
Pandas has no dedicated dtype for datetime.date, but you can use .apply to get date objects (stored in an object column) instead of strings:
import pandas as pd
import datetime
df = pd.DataFrame(
{"date": [
"2019-07-01T00:00:00+05:30",
"2019-07-01T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-03T00:00:00+05:30",
"2019-07-03T00:00:00+05:30",
"2019-07-04T00:00:00+05:30",
"2019-07-04T00:00:00+05:30",
"2019-07-05T00:00:00+05:30"]})
df["date"] = df["date"].apply(lambda x: datetime.datetime.fromisoformat(x).date())
print(df)

How do I convert a Pandas column with different date formatting from object to Datetime type? [duplicate]

I have a DataFrame from Pandas:
import pandas as pd
inp = [{'date':'31/03/2021 11:50:12 PM', 'value':100},
{'date':'1/4/2021 0:53','value':110},
{'date':'2/04/2021 9:40:12 AM', 'value':200}]
df = pd.DataFrame(inp)
print(f'{df}\n')
print(df.dtypes)
output:
date value
0 31/03/2021 11:50:12 PM 100
1 1/4/2021 0:53 110
2 2/04/2021 9:40:12 AM 200
date object
value int64
dtype: object
Now I want to convert the 'date' column from object to Datetime type so that the output will be as follow:
date value
0 2021-03-31 23:50:12 100
1 2021-04-01 00:53:00 110
2 2021-04-02 09:40:12 200
date datetime64[ns]
value int64
dtype: object
I have tried to run this code:
df['date'] = pd.to_datetime(df['date'])
print(f'{df}\n')
print(df.dtypes)
But the output was as follow:
date value
0 2021-03-31 23:50:12 100
1 2021-01-04 00:53:00 110
2 2021-02-04 09:40:12 200
date datetime64[ns]
value int64
dtype: object
As you can see, the pandas to_datetime function read the first number of the values in dd/mm/yyyy hh:mm format as the month. I would like to know how to convert the column to a datetime type while also accounting for the different placement of the month and day numbers.
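Hi, try passing dayfirst=True so that ambiguous values are read as day/month (pandas 2.0 and later may also need format='mixed' when the layout differs between rows):
df['date'] = pd.to_datetime(df['date'], dayfirst=True)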
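from datetime import datetime
def function_to_format(string):
    # Layouts assumed from the sample data: AM/PM values are 12-hour with seconds, the rest 24-hour without
    fmt = '%d/%m/%Y %I:%M:%S %p' if string.endswith(('AM', 'PM')) else '%d/%m/%Y %H:%M'
    return datetime.strptime(string, fmt)
df['date'] = df['date'].apply(function_to_format)
Use the datetime library to convert each string to a datetime with strptime(); if you then need a particular text layout, format the new datetime object with strftime().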

Dataframe datetime switching month into days

I am trying to convert a day/month/Year Hours:Minutes column into just day and month. When I run my code, the conversion switches the months into days and the days into months.
You can find a copy of my dataframe with the one column I want to switch to Day/Month here
https://file.io/JkWl7fsBN0vl
Below is the code I am using to convert:
import pandas as pd
df = pd.read_csv('Example.csv')
df['DateTime'] = pd.to_datetime(df['DateTime'])
df.to_csv("output.csv", index=False)
Without knowing the exact DateTime format you are using (the link to the dataframe is broken), I'm going to use an example of
day/month/Year Hours:Minutes
05/09/2014 12:30
You can determine the exact format date code using this site
Essentially, to_datetime() has a format argument where you can pass the specific format when it cannot be inferred reliably. This lets you state explicitly which field is the day and which is the month, so they are no longer swapped.
>>> df = pd.DataFrame(['05/09/2014 12:30'],columns=['DateTime'])
DateTime
0 05/09/2014 12:30
>>> df['DateTime'] = pd.to_datetime(df['DateTime'], format='%d/%m/%Y %H:%M')
DateTime
0 2014-09-05 12:30:00
>>> df['day'] = df['DateTime'].dt.day
>>> df['month'] = df['DateTime'].dt.month
DateTime day month
0 2014-09-05 12:30:00 5 9
>>> df['DD/MM'] = df['DateTime'].dt.strftime('%d/%m')
DateTime day month DD/MM
0 2014-09-05 12:30:00 5 9 05/09
I'm unsure about the exact format you want the day and month available in (separate columns, combined), but I provided a few examples, so you can remove the DateTime column when you're done with it and use the one you need.
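For the full read, convert and write round trip from the question, the day/month layout can also be applied when the file is written. This is a sketch under stated assumptions: the CSV text below is a made-up stand-in for Example.csv (the original file is not available), and an in-memory buffer replaces output.csv so the snippet is self-contained:

```python
import io

import pandas as pd

# Hypothetical stand-in for Example.csv.
csv_text = "DateTime\n05/09/2014 12:30\n25/12/2014 08:00\n"

df = pd.read_csv(io.StringIO(csv_text))

# An explicit format tells pandas that the day comes first.
df["DateTime"] = pd.to_datetime(df["DateTime"], format="%d/%m/%Y %H:%M")

# date_format controls how datetime columns are written out.
buffer = io.StringIO()
df.to_csv(buffer, index=False, date_format="%d/%m")

print(buffer.getvalue())
```

Keeping the column as datetime in memory and formatting only on output means sorting and filtering by date still work before the file is written.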

Add days to date in pandas

I have a data frame that contains 2 columns, one is Date and other is float number.
I would like to add those 2 to get the following:
Index Date Days NewDate
0 20-04-2016 5 25-04-2016
1 16-03-2015 3.7 20-03-2015
As you can see, if the number of days has a decimal part, it is rounded up to the next integer, e.g. 3.1 --> 4 (days).
I have some weird questions so I appreciate any help.
Thank you !
First, ensure that the Date column has a datetime dtype (the dates are day-first, so say so explicitly):
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
Then, we can round the Days column up with ceil and convert each value to a pandas Timedelta:
import numpy as np
temp = df['Days'].apply(np.ceil).apply(lambda x: pd.Timedelta(x, unit='D'))
Datetime objects and timedeltas can be added:
df['NewDate'] = df['Date'] + temp
You can convert the Days column to timedelta and add it to Date column:
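import numpy as np
import pandas as pd
# pd.np is no longer available in current pandas, so call numpy directly
df['NewDate'] = pd.to_datetime(df.Date, dayfirst=True) + pd.to_timedelta(np.ceil(df.Days), unit="D")
df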
Use combine for calculations across the two columns and pd.DateOffset to add the days:
df['NewDate'] = df['Date'].combine(df['Days'], lambda x,y: x + pd.DateOffset(days=int(np.ceil(y))))
output:
Date Days NewDate
0 2016-04-20 5.0 2016-04-25
1 2015-03-16 3.7 2015-03-20
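The approach can be checked end to end on the question's sample data. This is a minimal sketch, assuming the dates are strings in day-month-year order and that fractional days should always round up:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Date": ["20-04-2016", "16-03-2015"], "Days": [5, 3.7]})

# Parse the day-first strings with an explicit format.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# Round the day counts up (3.7 -> 4), then add them as whole-day timedeltas.
df["NewDate"] = df["Date"] + pd.to_timedelta(np.ceil(df["Days"]), unit="D")

print(df)
```

Both operations are vectorised, so no apply or lambda is needed even for large frames.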
