Filter datetime index by date - python

Can you please explain why this code doesn't work and how to make it work:
index = pd.date_range('1/1/2000', periods=9, freq='T')
dates = index.date
index.loc(dates[0])
I tried other solutions like:
index = pd.date_range('1/1/2000', periods=9, freq='T')
dates = pd.to_datetime(index.date)
index.loc(dates[0])
As you can see, I want to extract one date from a datetime object.

When you create index, it is a DatetimeIndex object, which does not have a loc attribute: .loc belongs to Series and DataFrame objects, and it is used with square brackets, not called with parentheses. A DatetimeIndex is itself used as an index, so you access its elements with square brackets, just as with lists.
It is not clear what exactly you want to do.
You can use index[0] to access the first element, or convert the index to a list, a NumPy array, or a DataFrame with .to_list(), .to_numpy(), or .to_frame() for easier manipulation.
To extract the date of an element, index[0].date() is enough.
Also note that all entries of dates are the same date, because the elements of index differ from each other only by minutes.
First, create a date_range with a 5-hour frequency (so there are several points in each day). The last line returns all entries of index that fall on a specific date. You can pick any date and write index.date == your_date to filter by it.
index = pd.date_range('1/1/2000', periods=100, freq='5H')
dates = index.date
index[index.date == dates[0]]
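A minimal sketch comparing a few ways to select one day's entries, assuming a recent pandas. The target date 2000-01-01, the 6-hour spacing and the Series s are illustrative assumptions. The index.date mask is the answer's approach; normalize() keeps everything as datetime64; partial string indexing with .loc only works once the DatetimeIndex labels a Series or DataFrame.

```python
import datetime
import pandas as pd

# Eight timestamps six hours apart: four on 2000-01-01, four on 2000-01-02
index = pd.date_range("2000-01-01", periods=8, freq=pd.offsets.Hour(6))
target = datetime.date(2000, 1, 1)

# 1) Boolean mask built from the Python date of each timestamp (object array)
by_date = index[index.date == target]

# 2) Vectorized: truncate each timestamp to midnight and compare to a Timestamp
by_normalize = index[index.normalize() == pd.Timestamp("2000-01-01")]

# 3) Partial string indexing: .loc with a date string selects the whole day,
#    but only on a Series/DataFrame that uses the DatetimeIndex as its index
s = pd.Series(range(len(index)), index=index)
day_rows = s.loc["2000-01-01"]
```

Options 1 and 2 return the matching timestamps themselves; option 3 returns the matching rows of s.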

Related

Python pandas.datetimeindex piecewise dataframe slicing

I have a dataframe with a pandas DatetimeIndex. I need to take many slices from it (for plotting a piecewise graph with matplotlib). In other words, I need a new DF which is a subset of the first one.
More precisely, I need to take all rows that are between 9 and 16 o'clock, but only if they are within a date range. Fortunately, I have only one date range and one time range to apply.
What would be a clean way to do that? Thanks.
The first step is to set the index of the dataframe to the column where you store time. Once the index is based on time, you can subset the dataframe easily.
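df['time'] = pd.to_datetime(df['time'], format='%H:%M:%S.%f')  # assuming you have a column called 'time'
df['time'] = df['time'].dt.strftime('%H:%M:%S.%f')  # zero-padded strings sort in time order
df.set_index('time', inplace=True)
df.sort_index(inplace=True)  # slicing with labels that are not in the index requires a sorted index
new_df = df.loc[startTime:endTime]  # startTime and endTime are strings in the same format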
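If the index is kept as a DatetimeIndex rather than converted to strings, the date range and the time-of-day range can be applied separately. This is a sketch using DataFrame.between_time, a different technique from the answer above; the 30-minute sample data and the chosen dates are hypothetical.

```python
import pandas as pd

# Hypothetical data: one reading every 30 minutes over three days
idx = pd.date_range("2021-03-01", "2021-03-03 23:30", freq=pd.offsets.Minute(30))
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Keep only the date range of interest (date-string bounds cover whole days)
in_dates = df.loc["2021-03-01":"2021-03-02"]

# Then keep rows between 09:00 and 16:00 (both ends inclusive by default)
subset = in_dates.between_time("09:00", "16:00")
```

Because the index stays datetime-typed, subset can be passed straight to matplotlib with correct time axes.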

How to change date format in dataframe

I'm trying to calculate the beta of a stock, but when I load the data, the dates in the DataFrame include a time. How can I drop it?
If you want to turn each datetime into a date object, you can get the dates with .date on the index and then reassign them:
Ford_df.index = Ford_df.index.date
If you instead want the index to be a string in a custom format (%Y-%m in this example), do:
Ford_df.index = Ford_df.index.strftime("%Y-%m")
Both solutions presume your index is a DatetimeIndex. If it is not you can transform it with:
Ford_df.index = pd.to_datetime(Ford_df.index)
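A small sketch of what each option produces, using hypothetical prices for Ford_df. The normalize() variant is an extra option not mentioned in the answer: it sets the time to midnight but, unlike .date or strftime, keeps a DatetimeIndex.

```python
import datetime
import pandas as pd

# Hypothetical price data whose index carries a time component
idx = pd.to_datetime(["2020-01-02 16:00", "2020-01-03 16:00", "2020-02-03 16:00"])
Ford_df = pd.DataFrame({"Close": [11.0, 10.8, 9.9]}, index=idx)

as_dates = Ford_df.index.date                 # NumPy object array of datetime.date
as_strings = Ford_df.index.strftime("%Y-%m")  # Index of plain strings
as_midnight = Ford_df.index.normalize()       # still a DatetimeIndex, times set to 00:00
```

Keeping a DatetimeIndex (the normalize() route) preserves date-aware operations such as partial string slicing and resample, which the object and string indexes lose.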

How to check if pandas DateTimeIndex dates belong to a list?

I have a pandas datetime index idx with minute frequency, and I also have a list (or set) of dates, list_of_dates. I would like to return a boolean array of the same size as idx that is True where the date of the timestamp belongs to list_of_dates. Is it possible to do it in a vectorized way (i.e. without a for loop)?
Since you're trying to compare only the dates, you could remove the times and compare like so:
>>> df.index.normalize().isin(list_of_dates)
Or:
>>> df.index.floor('D').isin(list_of_dates)
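A self-contained version of the normalize() answer, assuming list_of_dates holds datetime.date objects. Converting the list with pd.to_datetime first is an added precaution so both sides of isin are datetime64 values; the three-day index is hypothetical data.

```python
import datetime
import pandas as pd

# Minute-frequency index spanning three days (hypothetical example data)
idx = pd.date_range("2021-01-01", "2021-01-03 23:59", freq=pd.offsets.Minute(1))
list_of_dates = [datetime.date(2021, 1, 1), datetime.date(2021, 1, 3)]

# Convert the wanted dates to midnight Timestamps so both sides share a dtype
wanted = pd.to_datetime(list_of_dates)

# Vectorized membership test: True where the timestamp's day is in the list
mask = idx.normalize().isin(wanted)
```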

Pandas - New Row for Each Day in Date Range

I have a Pandas df with one column (Reservation_Dt_Start) representing the start of a date range and another (Reservation_Dt_End) representing the end of a date range.
Rather than each row having a date range, I'd like to expand each row to have as many records as there are dates in the date range, with each new row representing one of those dates.
See the two pics below for an example input and the desired output.
The code snippet below works!! However, for every 250 rows in the input table, it takes 1 second to run. Given my input table is 120,000,000 rows in size, this code will take about one week to run.
pd.concat([pd.DataFrame({'Book_Dt': row.Book_Dt,
                         'Day_Of_Reservation': pd.date_range(row.Reservation_Dt_Start, row.Reservation_Dt_End),
                         'Pickup': row.Pickup,
                         'Dropoff': row.Dropoff,
                         'Price': row.Price},
                        columns=['Book_Dt', 'Day_Of_Reservation', 'Pickup', 'Dropoff', 'Price'])
           for i, row in df.iterrows()], ignore_index=True)
There has to be a faster way to do this. Any ideas? Thanks!
Building a separate DataFrame for every input row (while iterating with iterrows) gets very slow on a large dataset: you are constructing 120 million small DataFrames before they are concatenated. I would work with the data as a simple list of tuples instead and convert it to a dataframe only at the end.
For example:
Start with an empty list, rows = [].
For each row in the dataframe, get the days in its range, e.g. with pd.date_range(row.Reservation_Dt_Start, row.Reservation_Dt_End).
For each of those days, append a tuple: rows.append((row.Book_Dt, day, row.Pickup, row.Dropoff, row.Price)).
Finally, convert the list of tuples to a dataframe:
df = pd.DataFrame(rows, columns=['Book_Dt', 'Day_Of_Reservation', 'Pickup', 'Dropoff', 'Price'])
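A runnable sketch of the list-of-tuples approach, using itertuples (generally faster than iterrows) and a list named rows to avoid shadowing the built-in list. The two-row input is hypothetical. The second half is a fully vectorized alternative that is not from the answer: it repeats each row once per day with Index.repeat and adds 0, 1, 2, ... days using groupby().cumcount().

```python
import pandas as pd

# Hypothetical input with the column names from the question
df = pd.DataFrame({
    "Book_Dt": pd.to_datetime(["2020-01-01", "2020-01-05"]),
    "Reservation_Dt_Start": pd.to_datetime(["2020-02-01", "2020-03-10"]),
    "Reservation_Dt_End": pd.to_datetime(["2020-02-03", "2020-03-11"]),
    "Pickup": ["A", "B"],
    "Dropoff": ["C", "D"],
    "Price": [100.0, 80.0],
})
cols = ["Book_Dt", "Day_Of_Reservation", "Pickup", "Dropoff", "Price"]

# Answer's approach: collect plain tuples, build one DataFrame at the end
rows = []
for row in df.itertuples(index=False):
    for day in pd.date_range(row.Reservation_Dt_Start, row.Reservation_Dt_End):
        rows.append((row.Book_Dt, day, row.Pickup, row.Dropoff, row.Price))
expanded = pd.DataFrame(rows, columns=cols)

# Vectorized alternative: repeat each row once per day in its range ...
n_days = (df["Reservation_Dt_End"] - df["Reservation_Dt_Start"]).dt.days + 1
rep = df.loc[df.index.repeat(n_days.to_numpy())]
# ... then add 0, 1, 2, ... days within each original row
offset = pd.to_timedelta(rep.groupby(level=0).cumcount(), unit="D").to_numpy()
rep = rep.assign(Day_Of_Reservation=rep["Reservation_Dt_Start"] + offset)
expanded_vec = rep[cols].reset_index(drop=True)
```

The vectorized version avoids a Python-level loop over rows entirely, which is what matters at 120 million rows; memory use of the expanded result is the remaining constraint.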

Dropping rows from a Dataframe based on Date

How can I drop rows from DataFrame df if the dates in df['maturity_dt'] are earlier than today's date?
I am currently doing the following:
import datetime
todays_date = datetime.date.today()
datenow = datetime.datetime.combine(todays_date, datetime.datetime.min.time())  # converting to datetime
for i, row in df.iterrows():
    if datetime.datetime.strptime(row['maturity_dt'], '%Y-%m-%d %H:%M:%S.%f') < datenow:
        df.drop(df.index[i])
However, it's taking too long, and I was hoping to do something like df = df[datetime.datetime.strptime(df['maturity_dt'], '%Y-%m-%d %H:%M:%S.%f') < datenow], but this results in the error TypeError: must be str, not Series
Thank You
I haven't tried it, but pandas' native vectorized functions should be much faster than iterating row by row. Something like:
df['dt'] = pd.to_datetime(df['maturity_dt'], format='%Y-%m-%d %H:%M:%S.%f')
newdf = df.loc[df['dt'] >= datenow].copy()  # keep rows maturing today or later
Instead of parsing the date in each row, you could format your comparison date in the same format as these dates are stored and then just do a string comparison; zero-padded '%Y-%m-%d %H:%M:%S.%f' strings sort in chronological order.
Also, since df.drop accepts a list of index labels, you could use your loop just to gather the labels of the rows to be dropped and then drop them all in a single call.
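A sketch of both vectorized ideas from the answers, on hypothetical data in the question's format. "Today" is fixed to 2024-01-01 so the result is reproducible; in practice it would be something like pd.Timestamp.today().normalize().

```python
import pandas as pd

# Hypothetical data in the question's '%Y-%m-%d %H:%M:%S.%f' format
df = pd.DataFrame({"maturity_dt": [
    "2023-12-31 23:59:59.000000",
    "2024-01-01 00:00:00.000000",
    "2024-06-30 12:00:00.500000",
]})
# Fixed "today" (midnight) so the example is reproducible
today = pd.Timestamp("2024-01-01")

# 1) Parse the whole column at once, then keep rows maturing today or later
parsed = pd.to_datetime(df["maturity_dt"], format="%Y-%m-%d %H:%M:%S.%f")
kept = df[parsed >= today]

# 2) String comparison: format "today" the same way the column is stored
today_str = today.strftime("%Y-%m-%d %H:%M:%S.%f")
kept_str = df[df["maturity_dt"] >= today_str]
```

The string comparison skips parsing, but it is only correct as long as every value uses exactly the same zero-padded format.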
