Date1: 20061201
Date2: 01/12/2006
How can I use pandas in Python to convert Date1 into the Date2 (day/month/year) format? Date1 and Date2 are two columns in CSV files. Thanks!
Data:
In [151]: df
Out[151]:
Date
0 20061201
1 20170530
Option 1:
In [152]: pd.to_datetime(df.Date, format='%Y%m%d').dt.strftime('%d/%m/%Y')
Out[152]:
0 01/12/2006
1 30/05/2017
Name: Date, dtype: object
Option 2:
In [153]: df.Date.astype(str).str.replace(r'(\d{4})(\d{2})(\d{2})', r'\3/\2/\1', regex=True)
Out[153]:
0 01/12/2006
1 30/05/2017
Name: Date, dtype: object
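Since the question mentions CSV files, Option 1 applied end to end might look like the minimal sketch below. The in-memory CSV text and the column name Date1 are assumptions for illustration, not the asker's real file; reading the column as str keeps pandas from guessing a numeric type.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file.
csv_text = "Date1\n20061201\n20170530\n"

# dtype=str keeps the values as text instead of integers.
df = pd.read_csv(io.StringIO(csv_text), dtype={"Date1": str})

# Parse with an explicit format, then render as day/month/year strings.
df["Date2"] = pd.to_datetime(df["Date1"], format="%Y%m%d").dt.strftime("%d/%m/%Y")
print(df)
```

The Date2 column holds strings here; keep the intermediate datetime column instead if you need to sort or do date arithmetic.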
If you're using pandas and want a timestamp object back
pd.to_datetime('20061201')
Timestamp('2006-12-01 00:00:00')
If you want a string back
str(pd.to_datetime('20061201').date())
'2006-12-01'
Assuming you have a dataframe df
df = pd.DataFrame(dict(Date1=['20161201']))
Then you can use the same techniques in vectorized form.
as timestamps
df.assign(Date2=pd.to_datetime(df.Date1))
Date1 Date2
0 20161201 2016-12-01
as strings
df.assign(Date2=pd.to_datetime(df.Date1).dt.date.astype(str))
Date1 Date2
0 20161201 2016-12-01
import datetime
A=datetime.datetime.strptime('20061201','%Y%m%d')
A.strftime('%d/%m/%Y')
You may use apply and lambda function here.
Suppose you have a dataset named df as below:
id date1
0 20061201
2 20061202
You can use code like the following (astype(str) makes the slicing work even if date1 was read as integers):
df['date2'] = df['date1'].astype(str).apply(lambda x: x[6:] + '/' + x[4:6] + '/' + x[:4])
The result will be:
id date1 date2
0 20061201 01/12/2006
2 20061202 02/12/2006
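The same slicing can be done without apply by using the vectorized .str accessor. This is a minimal sketch that assumes every value is exactly eight characters in YYYYMMDD order; the sample frame is made up for illustration.

```python
import pandas as pd

# Illustrative data; the real column may come from a CSV file.
df = pd.DataFrame({"date1": ["20061201", "20061202"]})

# Cast to str first so integer columns are handled as well.
s = df["date1"].astype(str)

# Slice day, month and year out of each string and join them with slashes.
df["date2"] = s.str[6:] + "/" + s.str[4:6] + "/" + s.str[:4]
print(df)
```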
The simplest way is probably using the date parsing provided by datetime:
from datetime import datetime
datetime.strptime(str(20061201), "%Y%m%d")
You can apply this transformation to all rows in your pandas dataframe/series using the following:
from datetime import datetime
def convert_date(d):
    return datetime.strptime(str(d), "%Y%m%d")
df['Date2'] = df.Date1.apply(convert_date)
This will add a Date2 column to your dataframe df, which is the datetime representation of the Date1 column.
You can then serialize the date again by using strftime:
def serialize_date(d):
    return d.strftime("%d/%m/%Y")
df['Date2'] = df.Date2.apply(serialize_date)
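Alternatively, if Date1 holds integers, you can do it all with integer arithmetic and string formatting:
def reformat_date(d):
    year = d // 10000
    month = d % 10000 // 100
    day = d % 100
    # Zero-pad day and month so 1 December 2006 becomes 01/12/2006
    return "{day:02d}/{month:02d}/{year}".format(day=day, month=month, year=year)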
df['Date2'] = df.Date1.apply(reformat_date)
This is quite a bit faster than using the parsing machinery provided by strptime.
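The speed claim is easy to check on your own machine. The sketch below is only an assumption-laden benchmark: the helper names via_strptime and via_arithmetic and the synthetic 10,000-row series are invented for illustration, and the timings will vary by hardware and library version.

```python
import timeit
from datetime import datetime

import pandas as pd


def via_strptime(d):
    # Parse the integer as a date, then format it as day/month/year.
    return datetime.strptime(str(d), "%Y%m%d").strftime("%d/%m/%Y")


def via_arithmetic(d):
    # Split the integer into day, month and year with integer arithmetic.
    return "{:02d}/{:02d}/{}".format(d % 100, d % 10000 // 100, d // 10000)


# Synthetic data: 10,000 integer dates.
dates = pd.Series([20061201, 20170530] * 5000)

# Both approaches should produce identical strings.
same = dates.apply(via_strptime).equals(dates.apply(via_arithmetic))

t_parse = timeit.timeit(lambda: dates.apply(via_strptime), number=3)
t_arith = timeit.timeit(lambda: dates.apply(via_arithmetic), number=3)
print(same, round(t_parse, 3), round(t_arith, 3))
```

Note that the arithmetic version does no validation, so an impossible date such as 20061340 passes through silently, whereas strptime raises ValueError.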
Related
I'm trying to convert all data in a column from the below to dates.
Event Date
2020-07-16 00:00:00
31/03/2022, 26/11/2018, 31/01/2028
This is just a small section of the data - there are more columns/rows.
I've tried to split out the cells with multiple values using the below:
df["Event Date"] = df["Event Date"].str.replace(' ', '')
df["Event Date"] = df["Event Date"].str.split(",")
df= df.explode("Event Date")
The issue with this is it sets any cell without a ',' e.g. '2020-07-16 00:00:00' to NaN.
Is there any way to separate the values with a ',' and set the entire column to date types?
You can combine split and explode to separate the dates, and then let to_datetime parse the mixed formats. On pandas 2.0 and later, infer_datetime_format is deprecated, so pass format='mixed' instead, with dayfirst=True for the day-first strings.
df = df.assign(dates=df['dates'].str.split(',')).explode('dates')
df
Out[18]:
dates
0 2020-07-16 00:00:00
1 31/03/2022
1 26/11/2018
1 31/01/2028
df.dates = pd.to_datetime(df.dates.str.strip(), format='mixed', dayfirst=True)
df.dates
Out[20]:
0 2020-07-16
1 2022-03-31
1 2018-11-26
1 2028-01-31
Name: dates, dtype: datetime64[ns]
Here is a suggestion using pandas.Series.str.split and pandas.Series.explode:
s_dates = (
df["Event Date"]
.str.split(",")
.explode(ignore_index=True)
.apply(pd.to_datetime, dayfirst=True)
)
Output :
0 2020-07-16
1 2022-03-31
2 2018-11-26
3 2028-01-31
Name: Event Date, dtype: datetime64[ns]
Your example table mixes date formats within one column. The idea is to try one parsing technique and fall back to another if it fails. Explicit loops and such wide variation in formats are red flags in a script's design. I recommend using datetime and dateutil to handle the dates.
from datetime import datetime
from dateutil import parser
date_strings = ["2020-07-16 00:00:00", "31/03/2022, 26/11/2018, 31/01/2028"]  # Get these from your table.
parsed_dates = []
for date_string in date_strings:
    try:
        # First attempt: a single ISO-style timestamp
        date_object = datetime.strptime(date_string, "%Y-%m-%d %H:%M:%S")
        parsed_dates.append(date_object)
    except ValueError:
        # Fallback: split on commas and parse each day-first date
        for date_str in date_string.split(","):
            date_object = parser.parse(date_str.strip(), dayfirst=True)
            parsed_dates.append(date_object)
print(parsed_dates)
Try the code on Trinket: https://trinket.io/python3/95c0d14271
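The try-one-format-then-fall-back idea can also be written without an explicit loop. The following is a minimal sketch, assuming the column contains only the two formats shown in the question; the sample frame is rebuilt here for illustration.

```python
import pandas as pd

# Sample data mirroring the question's two rows.
df = pd.DataFrame({"Event Date": ["2020-07-16 00:00:00",
                                  "31/03/2022, 26/11/2018, 31/01/2028"]})

# One date string per row, with surrounding spaces removed.
parts = df["Event Date"].str.split(",").explode(ignore_index=True).str.strip()

# First pass: ISO-style timestamps; anything else becomes NaT.
iso = pd.to_datetime(parts, format="%Y-%m-%d %H:%M:%S", errors="coerce")

# Second pass: day-first dates; anything else becomes NaT.
dmy = pd.to_datetime(parts, format="%d/%m/%Y", errors="coerce")

# Keep the first pass where it succeeded, otherwise use the second.
result = iso.fillna(dmy)
print(result)
```

If the real column holds actual datetime objects alongside strings (one possible reason the .str calls in the question produced NaN), casting with .astype(str) before splitting may help.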
How can I remove T00:00:00+05:30 after the year, month and day values in pandas? I tried converting the column to datetime, but it still shows the same result. I'm using pandas in Streamlit. I tried the code below:
df['Date'] = pd.to_datetime(df['Date'])
The output is same as below :
Date
2019-07-01T00:00:00+05:30
2019-07-01T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-03T00:00:00+05:30
2019-07-03T00:00:00+05:30
2019-07-04T00:00:00+05:30
2019-07-04T00:00:00+05:30
2019-07-05T00:00:00+05:30
Can anyone help me how to remove T00:00:00+05:30 from the above rows?
If I understand correctly, you want to keep only the date part.
Convert date strings to datetime
df = pd.DataFrame(
    columns=['date'],
    data=["2019-07-01T02:00:00+05:30", "2019-07-02T01:00:00+05:30"]
)
date
0 2019-07-01T02:00:00+05:30
1 2019-07-02T01:00:00+05:30
df['date'] = pd.to_datetime(df['date'])
date
0 2019-07-01 02:00:00+05:30
1 2019-07-02 01:00:00+05:30
Remove the timezone
df['date'] = df['date'].dt.tz_localize(None)
date
0 2019-07-01 02:00:00
1 2019-07-02 01:00:00
Keep the date only
df['date'] = df['date'].dt.date
0 2019-07-01
1 2019-07-02
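Put together, the three steps above can be sketched as one short script. The sample values are taken from the question; formatting the result as strings at the end is an assumption about what is wanted for display, not something the asker confirmed.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2019-07-01T00:00:00+05:30",
                            "2019-07-02T00:00:00+05:30"]})

# Step 1: parse the ISO strings; the result is timezone-aware (+05:30).
parsed = pd.to_datetime(df["Date"])

# Step 2: drop the offset while keeping the local wall-clock time.
local = parsed.dt.tz_localize(None)

# Step 3: keep only the date part, here as plain YYYY-MM-DD strings.
df["Date"] = local.dt.strftime("%Y-%m-%d")
print(df)
```

Use local.dt.date instead of strftime if you prefer datetime.date objects over strings.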
Don't bother with apply to Python dates or string changes. The former will leave you with an object type column and the latter is slow. Just floor to the day frequency using the library function (round would push times after noon to the next day).
>>> pd.Series([pd.Timestamp('2000-01-05 12:01')]).dt.floor('D')
0 2000-01-05
dtype: datetime64[ns]
If you have a timezone-aware timestamp, drop the time zone while keeping the local wall time, then floor:
>>> pd.Series([pd.Timestamp('2019-07-01T00:00:00+05:30')]).dt.tz_localize(None) \
.dt.floor('D')
0 2019-07-01
dtype: datetime64[ns]
Pandas has no dedicated dtype for datetime.date (the .dt.date accessor returns an object column), but you can use .apply to get date objects instead of strings directly from the ISO text:
import pandas as pd
import datetime
df = pd.DataFrame(
{"date": [
"2019-07-01T00:00:00+05:30",
"2019-07-01T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-03T00:00:00+05:30",
"2019-07-03T00:00:00+05:30",
"2019-07-04T00:00:00+05:30",
"2019-07-04T00:00:00+05:30",
"2019-07-05T00:00:00+05:30"]})
df["date"] = df["date"].apply(lambda x: datetime.datetime.fromisoformat(x).date())
print(df)
What's the best way to do this? I thought about extracting the two separately then combining them? This doesn't seem like it should be the most efficient way?
df['date'] = df['datetime'].dt.date
df['hour'] = df['datetime'].dt.hour
df['dateAndHour'] = df['datetime'].dt.date.astype(str) + ' ' + df['datetime'].dt.hour.astype(str)
You can use strftime and it depends on the format your date is in and how you want to combine them
from datetime import datetime
import pandas as pd
df = pd.DataFrame({'date':[datetime.now()]})
df['date-hour'] = df.date.dt.strftime('%Y-%m-%d %H')
df
date date-hour
0 2020-11-18 11:03:38.390393 2020-11-18 11
Depends what you want to do with it, but one way to do this would be to use strftime to format the datetime column to %Y-%m-%d %H or similar:
>>> df
datetime
0 2020-01-01 12:15:00
1 2020-10-22 11:11:11
>>> df.datetime.dt.strftime("%Y-%m-%d %H")
0 2020-01-01 12
1 2020-10-22 11
Name: datetime, dtype: object
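If the combined date and hour should stay a datetime column rather than become strings, flooring to the hour is one option. This is a minimal sketch using the sample values above; the column name dateAndHour is borrowed from the question.

```python
import pandas as pd

df = pd.DataFrame({"datetime": pd.to_datetime(["2020-01-01 12:15:00",
                                               "2020-10-22 11:11:11"])})

# Floor each timestamp to the start of its hour; the dtype stays datetime64.
df["dateAndHour"] = df["datetime"].dt.floor("h")
print(df)
```

Keeping the datetime dtype means the column still sorts chronologically and works with groupby and resampling.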
I have a date column with dtype object and values formatted like 31-Mar-20. I tried to convert it with datetime.strptime to datetime64[D] in the format 2020-03-31, but nothing I have tried works, including a few methods from other answers. Some of them do turn the column into datetime64, but with a time component that I don't want. I need a datetime without the time part, formatted as 2020-03-31. This is my code:
dates = [datetime.datetime.strptime(ts, '%d-%b-%y').strftime('%Y-%m-%d')
         for ts in df['date']]
df['date']= pd.DataFrame({'date': dates})
df = df.sort_values(by=['date'])
This approach might work -
import pandas as pd
df = pd.DataFrame({'dates': ['20-Mar-2020', '21-Mar-2020', '22-Mar-2020']})
df
dates
0 20-Mar-2020
1 21-Mar-2020
2 22-Mar-2020
df['dates'] = pd.to_datetime(df['dates'], format='%d-%b-%Y').dt.date
df
dates
0 2020-03-20
1 2020-03-21
2 2020-03-22
df['date'] = pd.to_datetime(df['date'], format="%d-%b-%y")
This converts the column to datetime. When you look at df, it displays values as 2020-03-31 like you want; however, these are all datetime objects, so if you extract one value with df['date'][0] you see Timestamp('2020-03-31 00:00:00').
If you want to convert them into dates, you can do:
df['date'] = [df_datetime.date() for df_datetime in df['date']]
There is probably a better way of doing this step.
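One candidate for that step is the .dt accessor, which avoids the Python-level loop. The sketch below assumes an English locale for the %b month abbreviations; the column names as_date and as_midnight are hypothetical, chosen only to contrast the two results.

```python
import datetime

import pandas as pd

df = pd.DataFrame({"date": ["31-Mar-20", "01-Apr-20"]})

# Parse the two-digit-year strings with an explicit format.
parsed = pd.to_datetime(df["date"], format="%d-%b-%y")

# Option A: datetime.date objects (the column dtype becomes object).
df["as_date"] = parsed.dt.date

# Option B: stay in datetime64 with the time set to midnight.
df["as_midnight"] = parsed.dt.normalize()
print(df.dtypes)
```

Option B keeps the fast datetime64 dtype and sorts correctly; pandas simply hides the 00:00:00 part when every value in the column is at midnight.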
I have a data frame that contains 2 columns, one is Date and other is float number.
I would like to add those 2 to get the following:
Index Date Days NewDate
0 20-04-2016 5 25-04-2016
1 16-03-2015 3.7 20-03-2015
As you can see, a decimal number of days should be rounded up to the next integer, e.g. 3.1 --> 4 (days).
I have some weird questions so I appreciate any help.
Thank you !
First, ensure that the Date column is a datetime column (the sample dates are day-first):
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
Then, we can round the Days column up with np.ceil (this needs import numpy as np) and convert it to pandas Timedelta values:
temp = df['Days'].apply(np.ceil).apply(lambda x: pd.Timedelta(x, unit='D'))
Datetime objects and timedeltas can be added:
df['NewDate'] = df['Date'] + temp
You can convert the Days column to timedelta and add it to Date column:
import numpy as np
import pandas as pd
df['NewDate'] = pd.to_datetime(df.Date) + pd.to_timedelta(np.ceil(df.Days), unit="D")
df
Using combine for the two-column calculation and pd.DateOffset for adding days (Date must already be a datetime column):
df['NewDate'] = df['Date'].combine(df['Days'], lambda x,y: x + pd.DateOffset(days=int(np.ceil(y))))
output:
Date Days NewDate
0 2016-04-20 5.0 2016-04-25
1 2015-03-16 3.7 2015-03-20
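For reference, here is a self-contained sketch of the whole calculation using the two rows from the question. The explicit %d-%m-%Y format is an assumption based on the sample values, and formatting NewDate back to a day-first string is optional.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Date": ["20-04-2016", "16-03-2015"], "Days": [5, 3.7]})

# Parse the day-first strings with an explicit format.
dates = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# Round fractional days up, then convert to timedeltas.
offsets = pd.to_timedelta(np.ceil(df["Days"]), unit="D")

# Add and render in the same day-first layout as the input.
df["NewDate"] = (dates + offsets).dt.strftime("%d-%m-%Y")
print(df)
```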