I have a date column in the format YYYY-MM-DD. I want to extract only the year and month from it, without the "-", because I later have to convert the result to an integer to feed into my linear regression model.
Its current datatype is "object".
Dataframe :-
date open close high low
0 2019-10-08 56.46 56.10 57.02 56.08
1 2019-10-09 56.76 56.76 56.95 56.41
2 2019-10-10 56.98 57.52 57.61 56.83
3 2019-10-11 58.24 59.05 59.41 58.08
4 2019-10-14 58.73 58.97 59.53 58.67
You can use pd.to_datetime to convert the date column to datetime, then use pd.Series.dt.strftime.
s = pd.to_datetime(df['date'])
df['date'] = s.dt.strftime("%Y%m") # gives '201910' for 2019-10-08
# or
# df['date'] = s.dt.strftime("%y%m") # gives '1910' for 2019-10-08
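Since the question asks for an integer, note that strftime returns strings. A minimal sketch of two ways to get an integer YYYYMM value follows; the column names `yyyymm` and `yyyymm_alt` and the three sample dates are made up for illustration.

```python
import pandas as pd

# Small illustrative frame; the dates are sample values, not the asker's full data.
df = pd.DataFrame({"date": ["2019-10-08", "2019-10-09", "2019-11-14"]})

s = pd.to_datetime(df["date"])

# Option 1: format as a 'YYYYMM' string, then cast to int.
df["yyyymm"] = s.dt.strftime("%Y%m").astype(int)

# Option 2: pure arithmetic on the year and month components, no string step.
df["yyyymm_alt"] = s.dt.year * 100 + s.dt.month

print(df)
```

Both columns hold the same integers; the arithmetic version avoids the intermediate strings.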
date --> your date column
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.strftime('%Y%m')  # '%Y-%m' would keep the "-"
I am trying to slice the data based on the date.
If I knew the date, I would know how to do the slicing. In my case I will NOT know the date stamp in advance.
So I want to slice by date, whatever the dates are, and then do further operations on each slice.
Please refer to the example data. The date column can contain any day. I want to slice the data by day.
The first slice will be for date: 20211201
The second slice will be for date: 20211202
I am able to convert the column to datetime format as below
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
df = df.set_index('date')
Please help with this.
Here is what you need to do:
df = df[df['time'].between('9:10','9:20')].groupby('date')['Open'].max()
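One caveat with the line above: `time` holds strings, so `between('9:10','9:20')` compares text, not clock times. That happens to work for this window, but a window such as '9:10' to '10:20' would match nothing, because '10:20' sorts before '9:10' as text. A sketch of the same filter on real time values follows; the sample rows, the wider window, and the names `clock`, `in_window` and `result` are assumptions for illustration.

```python
import datetime

import pandas as pd

# Illustrative rows only; the window below is wider than the one in the answer.
df = pd.DataFrame({
    "date": [20211201, 20211201, 20211201, 20211202, 20211202],
    "time": ["9:08", "9:16", "10:05", "9:12", "13:08"],
    "Open": [17104.4, 17105.05, 17175.2, 17311.95, 17316.5],
})

# Parse 'H:MM' strings into datetime.time objects so comparisons are chronological.
clock = pd.to_datetime(df["time"], format="%H:%M").dt.time

# Keep rows whose time falls inside the window (both ends inclusive).
in_window = clock.between(datetime.time(9, 10), datetime.time(10, 20))

# Highest 'Open' per date within the window.
result = df[in_window].groupby("date")["Open"].max()
print(result)
```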
Input data
The data you used is:
import pandas as pd
df = pd.DataFrame({"date":[20211201,20211201,20211201,20211201,20211201,20211202,20211202,20211202,20211202],
"time":["9:08","9:16","9:17","9:18","9:19","13:08","13:09","13:10","13:11"],
"Open":[17104.4,17105.05,171587.75,17175.2,17168.6,17311.95,17316.5,17322.55,17325.9]})
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
df = df.set_index('date')
Solution
You can slice the DataFrame as follows:
import datetime
df1 = df[df.index==datetime.datetime(2021,12,1)]
df2 = df[df.index==datetime.datetime(2021,12,2)]
Output
Then the outputs you would obtain are:
>>> df1
time Open
date
2021-12-01 9:08 17104.40
2021-12-01 9:16 17105.05
2021-12-01 9:17 171587.75
2021-12-01 9:18 17175.20
2021-12-01 9:19 17168.60
>>> df2
time Open
date
2021-12-02 13:08 17311.95
2021-12-02 13:09 17316.50
2021-12-02 13:10 17322.55
2021-12-02 13:11 17325.90
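The slices above hard-code the two dates. Since the question says the dates are not known in advance, a possible variation is to let groupby discover them. This is a minimal sketch on a shortened copy of the data; the dictionary name `slices` is made up for illustration.

```python
import pandas as pd

# Shortened copy of the example data.
df = pd.DataFrame({
    "date": [20211201, 20211201, 20211202, 20211202],
    "time": ["9:08", "9:16", "13:08", "13:09"],
    "Open": [17104.4, 17105.05, 17311.95, 17316.5],
})
df["date"] = pd.to_datetime(df["date"], format="%Y%m%d")
df = df.set_index("date")

# Group on the 'date' index level; each group is one day's slice.
slices = {day: part for day, part in df.groupby(level="date")}

for day, part in slices.items():
    # 'day' is a pandas Timestamp, 'part' is the sub-DataFrame for that day.
    print(day.date(), len(part))
```

Each value in `slices` is an ordinary DataFrame, so further per-day operations can run inside the loop.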
How do I remove T00:00:00+05:30 after the year, month and day values in pandas? I tried converting the column to datetime, but it still shows the same result. I'm using pandas in Streamlit. I tried the code below:
df['Date'] = pd.to_datetime(df['Date'])
The output is still the same, as shown below:
Date
2019-07-01T00:00:00+05:30
2019-07-01T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-03T00:00:00+05:30
2019-07-03T00:00:00+05:30
2019-07-04T00:00:00+05:30
2019-07-04T00:00:00+05:30
2019-07-05T00:00:00+05:30
Can anyone help me remove T00:00:00+05:30 from the rows above?
If I understand correctly, you want to keep only the date part.
Convert date strings to datetime
df = pd.DataFrame(
columns=['date'],
data=["2019-07-01T02:00:00+05:30", "2019-07-02T01:00:00+05:30"]
)
date
0 2019-07-01T02:00:00+05:30
1 2019-07-02T01:00:00+05:30
df['date'] = pd.to_datetime(df['date'])
date
0 2019-07-01 02:00:00+05:30
1 2019-07-02 01:00:00+05:30
Remove the timezone
df['date'] = df['date'].dt.tz_localize(None)
date
0 2019-07-01 02:00:00
1 2019-07-02 01:00:00
Keep the date only
df['date'] = df['date'].dt.date
0 2019-07-01
1 2019-07-02
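Don't bother with apply to Python dates or string changes. The former leaves you with an object-dtype column and the latter is slow. Use the library's own rounding at day frequency instead. Note that dt.round('D') rounds to the nearest day, so a time after noon rolls forward to the next day, as in the example below; dt.floor('D') truncates to the start of the same day.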
>>> pd.Series([pd.Timestamp('2000-01-05 12:01')]).dt.round('D')
0 2000-01-06
dtype: datetime64[ns]
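If you have a timezone-aware timestamp, convert to UTC with no time zone, then round. Here 00:00 at +05:30 becomes 18:30 UTC on the previous day, and rounding brings it back to 2019-07-01: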
>>> pd.Series([pd.Timestamp('2019-07-01T00:00:00+05:30')]).dt.tz_convert(None) \
.dt.round('D')
0 2019-07-01
dtype: datetime64[ns]
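If the goal is to keep the local calendar date rather than the nearest UTC day, one possible variation is to drop the offset with tz_localize(None), which keeps the local wall-clock time, and then truncate with dt.normalize(). This is a sketch with made-up sample timestamps, not the answer's own method; the names `local_days` and `rounded` are for illustration.

```python
import pandas as pd

# Two offset-aware sample timestamps; the second one falls after noon local time.
s = pd.Series(pd.to_datetime([
    "2019-07-01T00:00:00+05:30",
    "2019-07-02T18:45:00+05:30",
]))

# Drop the offset but keep the local wall-clock time, then truncate to midnight.
local_days = s.dt.tz_localize(None).dt.normalize()

# For comparison: rounding to the nearest day pushes 18:45 to the next day.
rounded = s.dt.tz_localize(None).dt.round("D")

print(local_days)
print(rounded)
```

The result stays a datetime64 column, so the `.dt` accessor and date arithmetic keep working.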
If you want datetime.date objects instead of strings, you can also parse each value yourself with .apply (on a column that is already datetime, the .dt.date accessor gives the same result):
import pandas as pd
import datetime
df = pd.DataFrame(
{"date": [
"2019-07-01T00:00:00+05:30",
"2019-07-01T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-03T00:00:00+05:30",
"2019-07-03T00:00:00+05:30",
"2019-07-04T00:00:00+05:30",
"2019-07-04T00:00:00+05:30",
"2019-07-05T00:00:00+05:30"]})
df["date"] = df["date"].apply(lambda x: datetime.datetime.fromisoformat(x).date())
print(df)
I have a column of birth dates stored as object. The problem is that when I try to convert it to datetime, I always get the following error:
time data '27126' does not match format '%d/%m/%Y' (match)
date
0 05/06/1980
1 31/07/1947
2 07/01/1963
3 26/03/1973
4 30/01/1991
5 12/12/1991
6 13/08/1987
7 10/01/1944
8 23/06/1965
9 08/10/1995
So far I've tried the following:
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')
df['date'] = df['date'].apply(lambda x: datetime.datetime.strptime(x, "%d/%m/%Y").strftime("%Y-%m-%d"))
df['date'] = pd.to_datetime(df['date'].str.strip(), format='%d/%m/%Y')
Add the parameter errors='coerce' to convert values that don't match the format to missing values (NaT for datetimes):
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y', errors='coerce')
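Coercing silently turns bad values into NaT, so it can help to look at which rows failed before overwriting the column. A minimal sketch follows; the sample frame includes a '27126' entry like the one in the error message, and the names `parsed` and `bad_rows` are made up for illustration.

```python
import pandas as pd

# Sample data with one value that does not match the day/month/year format.
df = pd.DataFrame({"date": ["05/06/1980", "31/07/1947", "27126", " 07/01/1963 "]})

# Strip stray whitespace, then parse; non-matching values become NaT.
parsed = pd.to_datetime(df["date"].str.strip(), format="%d/%m/%Y", errors="coerce")

# Original strings that failed to parse, for manual inspection.
bad_rows = df.loc[parsed.isna(), "date"]
print(bad_rows)

# Overwrite the column once the failures are understood.
df["date"] = parsed
```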
For a date column I have data like this: 19.01.01, which means 2019-01-01. Is there a method to change the format from the former to the latter?
My idea is to add 20 to the start of date and replace . with -. Are there better ways to do that?
Thanks.
If format is YY.DD.MM use %y.%d.%m, if format is YY.MM.DD use %y.%m.%d in to_datetime:
df = pd.DataFrame({'date':['19.01.01','19.01.02']})
#YY.DD.MM
df['date'] = pd.to_datetime(df['date'], format='%y.%d.%m')
print (df)
date
0 2019-01-01
1 2019-02-01
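#YY.MM.DD (re-create the string column first; the call above already converted it)
df = pd.DataFrame({'date':['19.01.01','19.01.02']})
df['date'] = pd.to_datetime(df['date'], format='%y.%m.%d')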
print (df)
date
0 2019-01-01
1 2019-01-02
I have a Dataframe that has dates stored in different formats in the same column as shown below:
date
1-10-2018
2-10-2018
3-Oct-2018
4-10-2018
Is there any way I could make all of them have the same format?
Use to_datetime once per format, with errors='coerce' so that values not matching that format become NaT. Then use combine_first to fill the missing values in date1 from the date2 Series.
date1 = pd.to_datetime(df['date'], format='%d-%m-%Y', errors='coerce')
date2 = pd.to_datetime(df['date'], format='%d-%b-%Y', errors='coerce')
df['date'] = date1.combine_first(date2)
print (df)
date
0 2018-10-01
1 2018-10-02
2 2018-10-03
3 2018-10-04
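With more than two formats, the same idea can be written as a loop. This is a sketch under the assumption that every value matches at least one of the listed formats; the `formats` list and the name `candidates` are made up for illustration, and `%b` relies on English month abbreviations in the active locale.

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame({"date": ["1-10-2018", "2-10-2018", "3-Oct-2018", "4-10-2018"]})

# Formats to try, in priority order (earlier formats win when several match).
formats = ["%d-%m-%Y", "%d-%b-%Y"]

# One parsed Series per format; non-matching values are NaT.
candidates = [pd.to_datetime(df["date"], format=f, errors="coerce") for f in formats]

# Fill the gaps of each Series from the next one.
df["date"] = reduce(lambda a, b: a.combine_first(b), candidates)
print(df)
```

Any value that matches none of the formats stays NaT, so checking `df['date'].isna()` afterwards shows what was left unparsed.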