How to extract multiple parts of values of a single column?

I have a date column in the format YYYY-MM-DD. I want to extract only the year and month from it, without the "-", because I later have to convert it to an integer to feed into my linear regression model.
Its current dtype is "object".
DataFrame:
date open close high low
0 2019-10-08 56.46 56.10 57.02 56.08
1 2019-10-09 56.76 56.76 56.95 56.41
2 2019-10-10 56.98 57.52 57.61 56.83
3 2019-10-11 58.24 59.05 59.41 58.08
4 2019-10-14 58.73 58.97 59.53 58.67

You can use pd.to_datetime to convert date column to datetime then use pd.Series.dt.strftime.
s = pd.to_datetime(df['date'])
df['date'] = s.dt.strftime("%Y%m") # gives the string '201910' for 2019-10-08
# or
# df['date'] = s.dt.strftime("%y%m") # gives the string '1910'

date --> your date column
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.strftime('%Y%m') # use '%Y-%m' instead if you want to keep the dash
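
Since the question ultimately needs an integer for the regression model, the string step can be skipped. The following is a minimal sketch, assuming the same 'date' column as in the question; the column name year_month is made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question; 'date' starts as strings (object dtype).
df = pd.DataFrame({
    "date": ["2019-10-08", "2019-10-09", "2019-10-10"],
    "close": [56.10, 56.76, 57.52],
})

s = pd.to_datetime(df["date"])

# Build the YYYYMM integer arithmetically, avoiding a string round-trip.
# 'year_month' is a hypothetical column name chosen for this example.
df["year_month"] = s.dt.year * 100 + s.dt.month

print(df["year_month"].tolist())  # [201910, 201910, 201910]
```

The same result could be reached with s.dt.strftime("%Y%m").astype(int); the arithmetic form just avoids formatting and re-parsing every value.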

Related

I want to slice the data in pandas based on date time

I am trying to slice the data based on the date.
If I know the date, I know how to do the slicing. In my case I will NOT know the date stamp in advance.
So I want to slice by date and then do further operations on each slice.
Please refer to the example data. The date column can contain any date. I want to slice the data:
First slice will be for date : 20211201
Second slice will be for date : 20211202
I am able to convert the column to datetime format as below
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
df = df.set_index('date')
Please help with this.

Here is what you need to do:
df = df[df['time'].between('9:10','9:20')].groupby('date')['Open'].max()

Input data
The data you used is:
import pandas as pd
df = pd.DataFrame({"date":[20211201,20211201,20211201,20211201,20211201,20211202,20211202,20211202,20211202],\
"time":["9:08","9:16","9:17","9:18","9:19","13:08","13:09","13:10","13:11"],\
"Open":[17104.4,17105.05,171587.75,17175.2,17168.6,17311.95,17316.5,17322.55,17325.9]})
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
df = df.set_index('date')
Solution
You can slice the DataFrame as follows:
import datetime
df1 = df[df.index==datetime.datetime(2021,12,1)]
df2 = df[df.index==datetime.datetime(2021,12,2)]
Output
Then the outputs you would obtain are:
>>> df1
time Open
date
2021-12-01 9:08 17104.40
2021-12-01 9:16 17105.05
2021-12-01 9:17 171587.75
2021-12-01 9:18 17175.20
2021-12-01 9:19 17168.60
>>> df2
time Open
date
2021-12-02 13:08 17311.95
2021-12-02 13:09 17316.50
2021-12-02 13:10 17322.55
2021-12-02 13:11 17325.90
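
The slicing above hard-codes the two dates. Because the question says the dates are not known in advance, one possible approach is to let groupby discover them. This is a sketch on a shortened copy of the sample data; the name slices is made up for illustration.

```python
import pandas as pd

# Shortened copy of the sample data from the answer above.
df = pd.DataFrame({
    "date": [20211201, 20211201, 20211202, 20211202],
    "time": ["9:08", "9:16", "13:08", "13:09"],
    "Open": [17104.4, 17105.05, 17311.95, 17316.5],
})
df["date"] = pd.to_datetime(df["date"], format="%Y%m%d")
df = df.set_index("date")

# Group on the index level; each group is the slice for one calendar date.
# 'slices' is a hypothetical name: a dict mapping each date to its sub-frame.
slices = {day: group for day, group in df.groupby(level="date")}

for day, group in slices.items():
    print(day.date(), len(group))
```

Each value in slices is an ordinary DataFrame, so further per-day operations can be applied to it without knowing the dates beforehand.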

How to remove hours, minutes, seconds and UTC offset from pandas date column? I'm running with streamlit and pandas

How do I remove T00:00:00+05:30 after the year, month and day values in pandas? I tried converting the column to datetime, but it still shows the same result. I'm using pandas in Streamlit. I tried the code below:
df['Date'] = pd.to_datetime(df['Date'])
The output is the same, as shown below:
Date
2019-07-01T00:00:00+05:30
2019-07-01T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-03T00:00:00+05:30
2019-07-03T00:00:00+05:30
2019-07-04T00:00:00+05:30
2019-07-04T00:00:00+05:30
2019-07-05T00:00:00+05:30
Can anyone help me remove T00:00:00+05:30 from the rows above?

If I understand correctly, you want to keep only the date part.
Convert date strings to datetime
df = pd.DataFrame(
columns=['date'],
data=["2019-07-01T02:00:00+05:30", "2019-07-02T01:00:00+05:30"]
)
date
0 2019-07-01T02:00:00+05:30
1 2019-07-02T01:00:00+05:30
df['date'] = pd.to_datetime(df['date'])
date
0 2019-07-01 02:00:00+05:30
1 2019-07-02 01:00:00+05:30
Remove the timezone
df['date'] = df['date'].dt.tz_localize(None)
date
0 2019-07-01 02:00:00
1 2019-07-02 01:00:00
Keep the date only
df['date'] = df['date'].dt.date
0 2019-07-01
1 2019-07-02

Don't bother with apply to Python dates or string changes. The former will leave you with an object type column and the latter is slow. Just truncate to the day frequency using the library function. Use floor rather than round here, because round goes to the nearest midnight and can move afternoon timestamps to the next day.
>>> pd.Series([pd.Timestamp('2000-01-05 12:01')]).dt.floor('D')
0 2000-01-05
dtype: datetime64[ns]
If you have a timezone aware timestamp, drop the time zone while keeping the local wall time, then floor:
>>> pd.Series([pd.Timestamp('2019-07-01T00:00:00+05:30')]).dt.tz_localize(None) \
.dt.floor('D')
0 2019-07-01
dtype: datetime64[ns]
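
To make the difference between rounding and truncating to the day concrete, here is a small sketch comparing dt.round, dt.floor and dt.normalize on two made-up timestamps on either side of noon. All three keep the datetime64 dtype.

```python
import pandas as pd

# Two hypothetical timestamps on the same day, one after noon and one before.
s = pd.Series(pd.to_datetime(["2000-01-05 12:01", "2000-01-05 11:59"]))

rounded = s.dt.round("D")      # nearest midnight; 12:01 moves to the next day
floored = s.dt.floor("D")      # always the midnight that starts the same day
normalized = s.dt.normalize()  # same result as floor("D") for these values

print(rounded.dt.strftime("%Y-%m-%d").tolist())  # ['2000-01-06', '2000-01-05']
print(floored.dt.strftime("%Y-%m-%d").tolist())  # ['2000-01-05', '2000-01-05']
```

If the goal is only to discard the time of day, floor or normalize is usually the safer choice, since round can change the calendar date.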

If you want datetime.date objects instead of strings, you can build them straight from the ISO strings with .apply (for a column that is already datetime, .dt.date does the same thing). Note that the resulting column has object dtype:
import pandas as pd
import datetime
df = pd.DataFrame(
{"date": [
"2019-07-01T00:00:00+05:30",
"2019-07-01T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-03T00:00:00+05:30",
"2019-07-03T00:00:00+05:30",
"2019-07-04T00:00:00+05:30",
"2019-07-04T00:00:00+05:30",
"2019-07-05T00:00:00+05:30"]})
df["date"] = df["date"].apply(lambda x: datetime.datetime.fromisoformat(x).date())
print(df)

Problem converting column with date info as object to datetime

I have a column with birth dates stored as object. The problem is that when I try to convert it to datetime, it always raises the following error:
time data '27126' does not match format '%d/%m/%Y' (match)
date
0 05/06/1980
1 31/07/1947
2 07/01/1963
3 26/03/1973
4 30/01/1991
5 12/12/1991
6 13/08/1987
7 10/01/1944
8 23/06/1965
9 08/10/1995
So far I have tried the following:
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')
df['date'] = df['date'].apply(lambda x: datetime.datetime.strptime(x, "%d/%m/%Y").strftime("%Y-%m-%d"))
df['date'] = pd.to_datetime(df['date'].str.strip(), format='%d/%m/%Y')

Add the parameter errors='coerce' to convert values that do not match the format to missing values, here NaT:
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y', errors='coerce')
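
Coercing silently hides the offending values, so it can help to list them before overwriting the column. The sketch below assumes a small made-up sample in which '27126' stands in for the stray value from the error message; bad_rows is a hypothetical name.

```python
import pandas as pd

# Hypothetical sample: '27126' stands in for the stray value from the error message.
df = pd.DataFrame({"date": ["05/06/1980", "27126", "31/07/1947"]})

parsed = pd.to_datetime(df["date"], format="%d/%m/%Y", errors="coerce")

# Rows that were present in the raw data but failed to parse.
bad_rows = df.loc[parsed.isna() & df["date"].notna(), "date"]
print(bad_rows.tolist())  # ['27126']

# Only now replace the column with the parsed values (NaT where parsing failed).
df["date"] = parsed
```

Inspecting bad_rows first makes it easier to decide whether those values should be dropped, corrected, or parsed with a different rule.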

Convert one column to standard date format in Python

For a date column I have data like this: 19.01.01, which means 2019-01-01. Is there a method to change the format from the former to the latter?
My idea is to add 20 to the start of date and replace . with -. Are there better ways to do that?
Thanks.

If the format is YY.DD.MM use %y.%d.%m; if the format is YY.MM.DD use %y.%m.%d in to_datetime:
df = pd.DataFrame({'date':['19.01.01','19.01.02']})
#YY.DD.MM
df['date'] = pd.to_datetime(df['date'], format='%y.%d.%m')
print (df)
date
0 2019-01-01
1 2019-02-01
#YY.MM.DD (starting again from the original strings)
df = pd.DataFrame({'date':['19.01.01','19.01.02']})
df['date'] = pd.to_datetime(df['date'], format='%y.%m.%d')
print (df)
date
0 2019-01-01
1 2019-01-02

Pandas - Different time formats in the same column

I have a Dataframe that has dates stored in different formats in the same column as shown below:
date
1-10-2018
2-10-2018
3-Oct-2018
4-10-2018
Is there any way I could make all of them have the same format?

Use to_datetime once per format with errors='coerce' to replace non-matching values with NaT. Then use combine_first to fill the missing values in date1 from the date2 Series.
date1 = pd.to_datetime(df['date'], format='%d-%m-%Y', errors='coerce')
date2 = pd.to_datetime(df['date'], format='%d-%b-%Y', errors='coerce')
df['date'] = date1.combine_first(date2)
print (df)
date
0 2018-10-01
1 2018-10-02
2 2018-10-03
3 2018-10-04
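
The two-format approach above generalizes to any number of candidate formats. The following is a sketch only; parse_with_formats is a hypothetical helper written for this example, not a pandas function.

```python
import pandas as pd

def parse_with_formats(values, formats):
    """Hypothetical helper: try each format in order, keep the first successful parse per row."""
    result = None
    for fmt in formats:
        attempt = pd.to_datetime(values, format=fmt, errors="coerce")
        # Earlier formats take priority; later ones only fill remaining NaT rows.
        result = attempt if result is None else result.combine_first(attempt)
    return result

# Sample data copied from the question.
df = pd.DataFrame({"date": ["1-10-2018", "2-10-2018", "3-Oct-2018", "4-10-2018"]})
df["date"] = parse_with_formats(df["date"], ["%d-%m-%Y", "%d-%b-%Y"])
print(df)
```

Rows that match none of the listed formats stay NaT, so they can be inspected afterwards. The order of the formats matters when a string could match more than one of them.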
