Pandas slice rolling by day - python

I'm scratching my head over a pandas slicing problem.
I have a dataframe with a datetime column and a datetime index (one row every 15 minutes). I want to create a new column with the same frequency (15 min) that contains the max value of another column over the previous day (so it will be the same value for every row within the same day).
My dataframe is called klines. I know that I can get the date of each row with
klines['date']=klines['timedate'].dt.date
I know I can create a timedelta with the timedelta function, but I can't figure out how to use it with the timedate object.
I was hoping that something like this would work:
klines['max_prev_day']=klines[klines['Close time'].dt.date+dt.timedelta(days = -1):klines['Close time'].dt.date]['value_to_look'].max()
But I'm getting a loc error:
raise InvalidIndexError(key)
Any clever input is welcome!
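One possible approach, sketched here rather than taken from an answer in the thread: compute one maximum per calendar day with groupby, shift those daily maxima forward by one day, and map them back onto the 15-minute rows. The sample data is made up, and the sketch assumes 'Close time' is a datetime64 column and 'value_to_look' is numeric.

```python
import numpy as np
import pandas as pd

# Made-up sample: three days of 15-minute rows (96 rows per day).
idx = pd.date_range("2021-01-01", periods=3 * 96, freq="15min")
klines = pd.DataFrame({
    "Close time": idx,
    "value_to_look": np.arange(len(idx), dtype=float),
})

# Midnight timestamp of each row's calendar day.
day = klines["Close time"].dt.normalize()

# One maximum per calendar day, indexed by that day.
daily_max = klines.groupby(day)["value_to_look"].max()

# Move each label forward one day, so a day's label now carries
# the maximum of the day before it.
prev_day_max = daily_max.shift(1, freq="D")

# Broadcast back to the 15-minute rows; the first day has no
# previous day and therefore gets NaN.
klines["max_prev_day"] = day.map(prev_day_max)
```

Because the lookup is keyed on the calendar date rather than on a row offset, a gap in the 15-minute data does not shift the result onto the wrong day.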

Related

pandas create a column to compare a value with one week ago

I have a pandas dataframe and the index column is time with hourly precision. I want to create a new column that compares the value of the column "Sales number" at each hour with the same exact time one week ago.
I know that it can be written using the shift function:
df['compare'] = df['Sales'] - df['Sales'].shift(7*24)
But I wonder how I can take advantage of the datetime format of the index. I mean, are there any alternatives to using shift(7*24) when the index is in datetime format?
Try something like:
df['Sales'].shift(7,freq='D')
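A minimal sketch of how that suggestion could be used for the comparison column, assuming the index is an hourly DatetimeIndex (the sample data here is invented). With freq set, shift moves the index labels by seven days instead of moving the values by a fixed number of rows, and the subtraction then aligns on timestamps.

```python
import numpy as np
import pandas as pd

# Invented sample: ten days of hourly sales.
idx = pd.date_range("2021-01-01", periods=24 * 10, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"Sales": np.arange(len(idx), dtype=float)}, index=idx)

# Relabel every value with the timestamp seven days later.
week_ago = df["Sales"].shift(7, freq="D")

# Subtraction aligns on the index; assigning back to df keeps only
# df's own timestamps. The first week has nothing to compare with.
df["compare"] = df["Sales"] - week_ago
```

Unlike shift(7*24), this does not assume that every hour is present: a missing row yields NaN for the matching timestamp a week later instead of silently comparing against the wrong hour.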

Select hourly data based on days

I have a time series hourly_df, containing some hourly data:
import pandas as pd
import numpy as np
hourly_index = pd.date_range(start='2018-01-01', end='2018-01-07', freq='H')
hourly_data = np.random.rand(hourly_index.shape[0])
hourly_df = pd.DataFrame(hourly_data, index=hourly_index)
and I have a DatetimeIndex, containing some dates (as many days as I wish), e.g.
daily_index = pd.to_datetime(['2018-01-01', '2018-01-05', '2018-01-06'])
I want to select every row of hourly_df whose index date is in daily_index, so in my case all hourly data from the 1st, 5th and 6th of January. What is the best way to do this?
If I naively use hourly_df.loc[daily_index], I only get the rows at 0:00:00 for each of the three days. What I want is the hourly data for the whole day for each of the days in daily_index.
One possibility to solve this is to create a filter that takes the date of each element in the index of hourly_df and checks whether or not this date is in daily_index.
day_filter = [hour.date() in daily_index.date for hour in hourly_df.index]
hourly_df[day_filter]
This produces the desired output, but it seems the usage of the filter is avoidable and can be done in an expression similar to hourly_df.loc[daily_index.date].
Save the daily_index as a dataframe, then
merge on the index using hourly_df.merge(daily_index, how = 'inner', ...)
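As an alternative to the list-comprehension filter in the question, the same selection can be sketched with a vectorized mask: normalize each hourly timestamp to midnight and test membership in daily_index. This assumes the entries of daily_index are themselves at midnight, as they are in the question.

```python
import numpy as np
import pandas as pd

# Same setup as in the question; the hourly frequency is passed as a
# Timedelta to avoid depending on a frequency alias spelling.
hourly_index = pd.date_range(start="2018-01-01", end="2018-01-07",
                             freq=pd.Timedelta(hours=1))
hourly_df = pd.DataFrame(np.random.rand(len(hourly_index)), index=hourly_index)
daily_index = pd.to_datetime(["2018-01-01", "2018-01-05", "2018-01-06"])

# Drop the time of day from every hourly timestamp, then keep the rows
# whose day appears in daily_index.
day_mask = hourly_df.index.normalize().isin(daily_index)
selected = hourly_df[day_mask]
```

Here selected contains the full 24 hours of each of the three chosen days.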

Change date to day + 1 in a pandas dataframe where time = 00:00:00

If you see in the image of my dataframe, I have time points where midnight is a day behind what it should be, which affects my time series graphs.
I tried df.replace() where I passed in lists a and b:
df.replace(to_replace=a,value=b,inplace=True)
This just replaced all values in a with just the same one value in b instead of all the values in the list.
I also tried passing in a dictionary but received:
ValueError: Replacement not allowed with overlapping keys and values
Is there any way I can change the dates in either the date column or the date_time column to day+1 for instances where the time is 00:00:00?
Maybe using pandas map() method with strftime format?
Maybe you can do something along these lines:
import datetime
df.loc[df['time'] == datetime.time(0, 0), 'date'] += datetime.timedelta(days=1)
It selects the rows where the time is 00:00. Only on those rows is the date column increased by one day.
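The same idea can be sketched on a single combined column. This assumes date_time is a datetime64 column (the sample values are invented); a row is at midnight exactly when its timestamp equals its own normalized value.

```python
import pandas as pd

# Invented sample in which the midnight row carries the previous day's date.
df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2020-03-01 22:00:00",
        "2020-03-01 23:00:00",
        "2020-03-01 00:00:00",  # should be 2020-03-02 00:00:00
        "2020-03-02 01:00:00",
    ])
})

# True where the time of day is exactly 00:00:00.
is_midnight = df["date_time"] == df["date_time"].dt.normalize()

# Push only those rows forward by one day.
df.loc[is_midnight, "date_time"] += pd.Timedelta(days=1)
```

Note that this moves every midnight row, so it is only appropriate when all of them are known to be a day behind.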

What is pandas syntax for lookup based on existing columns + row values?

I'm trying to recreate a bit of a convoluted scenario, but I will do my best to explain it:
Create a pandas df1 with two columns: 'Date' and 'Price' - done
I add two new columns, 'rollmax' and 'rollmin', where 'rollmax' is an 8-day rolling maximum and 'rollmin' is a rolling minimum - done
Now I need to create another column, 'rollmax_date', that gets populated through a lookup rule: for row n, go to the column 'Price', scan the values for the last 8 days and find the maximum, then get the value of the corresponding column 'Date' and put this value in the column 'rollmax_date'.
The same logic applies to 'rollmin_date', but instead of the date of the rolling maximum, we look for the date of the rolling minimum.
Now I need to find the previous 8 days' max and min for the same rolling window of 8 days that I have already found.
I did the first two and tried the third one, but I'm getting wrong results.
The code below gives me only dates where on the same row df["Price"] is the same as df['rollmax'], but it doesn't bring all the corresponding dates from 'Date' to 'rollmax_date'
df['rollmax_date'] = df.loc[(df["Price"] == df.rollmax), 'Date']
This is an image with steps for recreating the lookup
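One way the lookup rule could be sketched, assuming one row per day so that an 8-day window is 8 rows (the sample prices are invented): for each window, find the offset of the maximum with np.argmax, convert that offset to a row position, and read the 'Date' at that position.

```python
import numpy as np
import pandas as pd

# Invented sample: ten daily prices.
df = pd.DataFrame({
    "Date": pd.date_range("2021-01-01", periods=10, freq="D"),
    "Price": [1.0, 5.0, 2.0, 3.0, 4.0, 0.0, 1.0, 2.0, 9.0, 1.0],
})
window = 8  # assumption: 8 rows correspond to 8 days

df["rollmax"] = df["Price"].rolling(window).max()

# Offset (0..window-1) of the maximum inside each full window;
# NaN for the first window-1 rows.
offset = df["Price"].rolling(window).apply(np.argmax, raw=True).to_numpy()

# Convert the in-window offset to an absolute row position.
rows = np.arange(len(df)) - (window - 1) + offset
valid = ~np.isnan(rows)

# Look up the date at each position; rows without a full window stay NaT.
dates = df["Date"].to_numpy()
rollmax_date = np.full(len(df), np.datetime64("NaT"), dtype=dates.dtype)
rollmax_date[valid] = dates[rows[valid].astype(int)]
df["rollmax_date"] = rollmax_date
```

Swapping np.argmax for np.argmin would give the corresponding 'rollmin_date'. When the maximum occurs more than once in a window, np.argmax picks the earliest occurrence.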

Getting complete datetime index of data frame in milliseconds using groupby method

I have a dataframe (df) with the index as datetime index in "%Y-%m-%d %H:%M:%S.%f" format i.e 2012-06-16 15:53:42.457000.
I am trying to create groups of 1 second using groupby method i.e
# pd.TimeGrouper has been removed from pandas; pd.Grouper(freq=...) replaces it
x = df.groupby(pd.Grouper(freq='1S'))
time = x.first().index[1]
print(time)
The problem is that with the groupby method I only get the timestamp to the second, i.e. "2012-06-16 15:53:42"; the milliseconds are dropped. Is there a way to get the complete timestamp?
thank you
I think this is a problem of formatting.
> df.index[0].strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]
'2016-01-01 00:00:00.000'
After spending a few hours, I solved it. Functions such as groupby, rolling, etc. were giving the same problem. These functions use a generic datetime index for grouping, whose values are multiples of the defined frequency. You may get a KeyError if that index is used to look up data in the dataframe.
To get the complete datetime index (with milliseconds), access the datetime index of the members of each individual group created by the groupby method, i.e.
time= x2.first()['timeindex'][0]
where timeindex is the datetime index in my dataframe. [0] selects the datetime index of the first member of the group; it can be incremented to get the datetime index of the second, third, and subsequent members of each group.
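A small sketch of the distinction described above, using invented timestamps: the index of the grouped result holds the bin labels (whole seconds), while aggregating the index values themselves returns the actual first timestamp in each bin, milliseconds included. Passing the frequency as a Timedelta is a choice made here to avoid depending on a frequency alias spelling.

```python
import pandas as pd

# Invented sample with millisecond timestamps.
idx = pd.to_datetime([
    "2012-06-16 15:53:42.457",
    "2012-06-16 15:53:42.900",
    "2012-06-16 15:53:43.120",
])
df = pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=idx)

grouper = pd.Grouper(freq=pd.Timedelta(seconds=1))

# Turn the index into a Series of its own values, then take the first
# real timestamp that falls into each one-second bin.
first_ts = df.index.to_series().groupby(grouper).first()

# first_ts.index holds the bin labels (truncated to whole seconds);
# first_ts.values holds the complete timestamps.
```

This avoids needing a separate 'timeindex' column, since the timestamps are taken straight from the index.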
