I want to slice the data in pandas based on date time - python

I am trying to slice the data based on the date.
If I know the date, I know how to do the slicing. In my case I will NOT know the date stamp in advance.
So I want to slice by whatever dates are present, to do further operations on the data.
Please refer to the example data below. Here the date column can have a date of any day. I want to slice the data.
The first slice will be for date: 20211201
The second slice will be for date: 20211202
I am able to convert the column into datetime format as below
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
df = df.set_index('date')
Please help with this.

Here is what you need to do:
df = df[df['time'].between('9:10','9:20')].groupby('date')['Open'].max()
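The one-liner above compares the times as text, so a zero-padded value such as '09:15' would fall outside the range '9:10' to '9:20'. A minimal sketch that compares real durations instead, assuming every time is an H:MM string (the sample rows are a subset of the question's data, and the names t, mask and result are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "date": [20211201, 20211201, 20211201, 20211202],
    "time": ["9:08", "9:16", "9:19", "13:08"],
    "Open": [17104.4, 17105.05, 17168.6, 17311.95],
})

# Turn "H:MM" into "H:MM:00" and parse it as an offset from midnight
t = pd.to_timedelta(df["time"] + ":00")

# Keep rows whose time lies between 9:10 and 9:20 (both ends inclusive)
mask = t.between(pd.Timedelta(hours=9, minutes=10),
                 pd.Timedelta(hours=9, minutes=20))

# Highest Open per date within that window
result = df[mask].groupby("date")["Open"].max()
print(result)
```

Dates with no rows inside the window simply do not appear in the result.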

Input data
The data you used is:
import pandas as pd
df = pd.DataFrame({
    "date": [20211201, 20211201, 20211201, 20211201, 20211201, 20211202, 20211202, 20211202, 20211202],
    "time": ["9:08", "9:16", "9:17", "9:18", "9:19", "13:08", "13:09", "13:10", "13:11"],
    "Open": [17104.4, 17105.05, 171587.75, 17175.2, 17168.6, 17311.95, 17316.5, 17322.55, 17325.9],
})
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
df = df.set_index('date')
Solution
You can slice the DataFrame as follows:
import datetime
df1 = df[df.index==datetime.datetime(2021,12,1)]
df2 = df[df.index==datetime.datetime(2021,12,2)]
Output
Then the outputs you would obtain are:
>>> df1
            time       Open
date
2021-12-01  9:08   17104.40
2021-12-01  9:16   17105.05
2021-12-01  9:17  171587.75
2021-12-01  9:18   17175.20
2021-12-01  9:19   17168.60
>>> df2
             time      Open
date
2021-12-02  13:08  17311.95
2021-12-02  13:09  17316.50
2021-12-02  13:10  17322.55
2021-12-02  13:11  17325.90
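Since the question says the dates are not known in advance, the hard-coded comparisons can be swapped for a groupby on the index. This is only a sketch, using a shortened copy of the sample data; the dictionary name slices is my own choice:

```python
import pandas as pd

df = pd.DataFrame({
    "date": [20211201, 20211201, 20211202, 20211202],
    "time": ["9:08", "9:16", "13:08", "13:09"],
    "Open": [17104.4, 17105.05, 17311.95, 17316.5],
})
df["date"] = pd.to_datetime(df["date"], format="%Y%m%d")
df = df.set_index("date")

# One sub-DataFrame per distinct date found in the index, keyed by that date
slices = {day: part for day, part in df.groupby(level="date")}

for day, part in slices.items():
    # Each part can be processed on its own from here
    print(day.date(), len(part))
```

Each value in slices is an ordinary DataFrame, so slices[pd.Timestamp("2021-12-01")] plays the role of df1 above without the date being typed in by hand.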

Related

How to remove hours, minutes, seconds and UTC offset from pandas date column? I'm running with streamlit and pandas

How do I remove T00:00:00+05:30 after the year, month and date values in pandas? I tried converting the column to datetime, but it still shows the same result. I'm using pandas in Streamlit. I tried the code below:
df['Date'] = pd.to_datetime(df['Date'])
The output is the same, as below:
Date
2019-07-01T00:00:00+05:30
2019-07-01T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-02T00:00:00+05:30
2019-07-03T00:00:00+05:30
2019-07-03T00:00:00+05:30
2019-07-04T00:00:00+05:30
2019-07-04T00:00:00+05:30
2019-07-05T00:00:00+05:30
Can anyone help me remove T00:00:00+05:30 from the rows above?
If I understand correctly, you want to keep only the date part.
Convert date strings to datetime
df = pd.DataFrame(
    columns=['date'],
    data=["2019-07-01T02:00:00+05:30", "2019-07-02T01:00:00+05:30"]
)
date
0 2019-07-01T02:00:00+05:30
1 2019-07-02T01:00:00+05:30
df['date'] = pd.to_datetime(df['date'])
date
0 2019-07-01 02:00:00+05:30
1 2019-07-02 01:00:00+05:30
Remove the timezone
df['date'] = df['date'].dt.tz_localize(None)
date
0 2019-07-01 02:00:00
1 2019-07-02 01:00:00
Keep the date only
df['date'] = df['date'].dt.date
0 2019-07-01
1 2019-07-02
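Don't bother with apply to Python dates or string changes. The former leaves you with an object-dtype column and the latter is slow. Use the library's day-frequency functions instead. Note that round goes to the nearest day, so a time after noon moves to the next day, as below; .dt.floor('D') or .dt.normalize() truncates to the same day.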
>>> pd.Series([pd.Timestamp('2000-01-05 12:01')]).dt.round('D')
0 2000-01-06
dtype: datetime64[ns]
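If you have a timezone-aware timestamp, convert it to UTC with no time zone, then round. The conversion shifts the wall-clock time (here to 2019-06-30 18:30), so the result matches the local date only because rounding moves it forward again: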
>>> pd.Series([pd.Timestamp('2019-07-01T00:00:00+05:30')]).dt.tz_convert(None) \
.dt.round('D')
0 2019-07-01
dtype: datetime64[ns]
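To make the difference between rounding and truncating concrete, here is a small sketch. The sample timestamps are made up for illustration, and tz_localize(None) is used on the aware value on the assumption that you want the local calendar day rather than the UTC one:

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2000-01-05 12:01", "2000-01-05 23:59"]))

rounded = s.dt.round("D")      # nearest midnight: both land on 2000-01-06
floored = s.dt.floor("D")      # previous midnight: both stay on 2000-01-05
normalized = s.dt.normalize()  # sets the time to 00:00:00, same day

# Timezone-aware input: drop the offset but keep the local wall-clock time,
# then truncate to midnight
aware = pd.Series(pd.to_datetime(["2019-07-01T00:00:00+05:30"]))
local_day = aware.dt.tz_localize(None).dt.normalize()

print(rounded.tolist(), floored.tolist(), local_day.tolist())
```

All four results keep the datetime64 dtype, so sorting and date arithmetic still work afterwards.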
Pandas offers .dt.date on datetime columns, but you could also use .apply directly on the ISO strings if you want date objects instead of strings:
import pandas as pd
import datetime
df = pd.DataFrame(
{"date": [
"2019-07-01T00:00:00+05:30",
"2019-07-01T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-02T00:00:00+05:30",
"2019-07-03T00:00:00+05:30",
"2019-07-03T00:00:00+05:30",
"2019-07-04T00:00:00+05:30",
"2019-07-04T00:00:00+05:30",
"2019-07-05T00:00:00+05:30"]})
df["date"] = df["date"].apply(lambda x: datetime.datetime.fromisoformat(x).date())
print(df)

Dates go crazy when applying pd.to_datetime

I have this situation in which I have a DataFrame with a string column with some values with this format:
DD/MM/YYYY
and some with this other one:
DD/MM/YYYY HH:Mi:SS
If I try to convert everything to datetime like this
df['COLUMN'] = pd.to_datetime(df['COLUMN'])
the rows without the HH:Mi:SS go crazy and the months are interpreted as days (and vice versa).
How could I avoid this and have a column with just the date format?
Example of column which goes crazy:
Before conversion:
DateTime
--------
02/07/2021
15/07/2021 18:16:00
After conversion:
DateTime
2021-02-07 (This is February!!)
2021-07-15 18:16:00
Pandas to_datetime has a built-in parameter, dayfirst, to specify that the day comes first.
You can use it as:
df['COLUMN'] = pd.to_datetime(df['COLUMN'], dayfirst=True)
Check out the documentation for more info.
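A quick way to see what dayfirst changes is to parse the ambiguous value from the question on its own. This sketch uses single strings rather than a column, and the variable names are just for illustration:

```python
import pandas as pd

# Default parsing reads an ambiguous value month first
default = pd.to_datetime("02/07/2021")

# dayfirst=True reads the same text day first
day_first = pd.to_datetime("02/07/2021", dayfirst=True)

print(default.date(), day_first.date())
```

The first call lands in February and the second in July, which is the mix-up described in the question.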
I believe the following achieves the desired output (may not be the fastest way)
import pandas as pd
df = pd.DataFrame({'date': ['15/07/2021 18:16:00', '02/07/2021']})
# Try the date-only format first, then fill the failures using the date-and-time format
date_only = pd.to_datetime(df['date'], format='%d/%m/%Y', errors='coerce')
date_time = pd.to_datetime(df['date'], format='%d/%m/%Y %H:%M:%S', errors='coerce')
df['date'] = date_only.fillna(date_time)
print(df.head())
for date in df['date']:
    print(type(date))
Output:
date
0 2021-07-15 18:16:00
1 2021-07-02 00:00:00
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
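Since the question asks for a column with just the date, another option is to cut each string down to its date part before parsing. This is a sketch that assumes every value starts with a zero-padded DD/MM/YYYY; the names raw and dates_only are illustrative:

```python
import pandas as pd

raw = pd.Series(["15/07/2021 18:16:00", "02/07/2021"])

# The first 10 characters are always DD/MM/YYYY, so one explicit format fits all rows
dates_only = pd.to_datetime(raw.str[:10], format="%d/%m/%Y")

print(dates_only)
```

With an explicit format there is no guessing about day and month order, and the time of day is dropped before it can cause trouble.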

Python, how to replace a column of dataframe with isoformat date with simple date?

I have a dataframe "df" in Python where one column is the date represented in isoformat "2017-01-01T12:30:59.000000".
df['date']
Out[1]:
0 2020-02-24T18:00:00
1 2020-02-24T18:00:00
2 2020-02-24T18:00:00
Is there a single command to replace the entire column with a simple date, like this?
df['date']
Out[1]:
0 2020-02-24
1 2020-02-24
2 2020-02-24
Cast your string date to an actual datetime, then take only the date through the .dt accessor:
import pandas as pd
df = pd.DataFrame({"date": ["2020-02-24T18:00:00","2020-02-24T18:00:00","2020-02-24T18:00:00"]})
df["date"] = pd.to_datetime(df["date"]).dt.date

How to extract multiple parts of values of a single column?

I have a date column in the format YYYY-MM-DD. I want to slice out only the year and month from it. But I don't want the "-", as I have to convert it into an integer later to feed into my linear regression model.
Its current datatype is "object".
Dataframe:
date open close high low
0 2019-10-08 56.46 56.10 57.02 56.08
1 2019-10-09 56.76 56.76 56.95 56.41
2 2019-10-10 56.98 57.52 57.61 56.83
3 2019-10-11 58.24 59.05 59.41 58.08
4 2019-10-14 58.73 58.97 59.53 58.67
You can use pd.to_datetime to convert the date column to datetime, then use pd.Series.dt.strftime.
s = pd.to_datetime(df['date'])
df['date'] = s.dt.strftime("%Y%m") # gives '201910' for the sample data
# or
# df['date'] = s.dt.strftime("%y%m") # gives '1910'
Here date is your date column:
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].apply(lambda x: x.strftime('%Y%m')) # '%Y-%m' would keep the hyphen
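Because the goal is an integer for the regression model, the string step can be skipped altogether. A minimal sketch, using two dates from the question's table and a hypothetical new column name yyyymm:

```python
import pandas as pd

df = pd.DataFrame({"date": ["2019-10-08", "2019-10-14"]})
s = pd.to_datetime(df["date"])

# Build the YYYYMM number arithmetically: 2019 * 100 + 10 == 201910
df["yyyymm"] = s.dt.year * 100 + s.dt.month

print(df)
```

The new column is already an integer dtype, so no astype(int) is needed afterwards.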

Cannot remove timestamp in datetime

I have a date column with dtype object in the format 31-Mar-20. I tried to turn it into datetime64[D] in the format 2020-03-31 with datetime.strptime, but whatever I tried did not work; I tried some methods from this and this. It does turn my column into datetime64, but with a timestamp in it, which I don't want. I need it to be a datetime without a timestamp, in the format 2020-03-31. This is my code:
dates = [datetime.datetime.strptime(ts,'%d-%b-%y').strftime('%Y-%m-%d')
for ts in df['date']]
df['date']= pd.DataFrame({'date': dates})
df = df.sort_values(by=['date'])
This approach might work:
import pandas as pd
df = pd.DataFrame({'dates': ['20-Mar-2020', '21-Mar-2020', '22-Mar-2020']})
df
dates
0 20-Mar-2020
1 21-Mar-2020
2 22-Mar-2020
df['dates'] = pd.to_datetime(df['dates'], format='%d-%b-%Y').dt.date
df
dates
0 2020-03-20
1 2020-03-21
2 2020-03-22
df['date'] = pd.to_datetime(df['date'], format="%d-%b-%y")
This converts it to a datetime. When you look at df, it displays values as 2020-03-31 like you want; however, these are all datetime objects, so if you extract one value with df['date'][0] you see Timestamp('2020-03-31 00:00:00').
If you want to convert them into dates you can do
df['date'] = [df_datetime.date() for df_datetime in df['date']]
The vectorized equivalent is df['date'].dt.date.
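If the column mainly needs to sort correctly and print as 2020-03-31, one possibility is to keep it as datetime64 and format it only for display. This sketch uses made-up sample values in the question's 31-Mar-20 style and assumes an English locale for the month abbreviations:

```python
import pandas as pd

df = pd.DataFrame({"date": ["31-Mar-20", "01-Feb-20"]})

# Parse once; the column becomes datetime64 with midnight as the time part
df["date"] = pd.to_datetime(df["date"], format="%d-%b-%y")

# Sorting is chronological because the values are real datetimes
df = df.sort_values("date")

# Format as text only where a plain YYYY-MM-DD string is needed
as_text = df["date"].dt.strftime("%Y-%m-%d")

print(df["date"].dtype)
print(as_text.tolist())
```

The midnight time part is still stored, but pandas hides it when printing a column in which every value is at 00:00:00.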
