I have a simple dataframe with typical OHLC values. I want to calculate a daily 52-week high/low (or one for any other time range) from it and put the result into the dataframe, so that I can track the daily movement of the running high/low.
For example, if the time range were just 3 days, the 3-day high/low would be:
(3-Day High: Maximum 'High' value in the last 3 days)
Out[21]:
Open High Low Close Volume 3-Day-High 3-Day-Low
Date
2015-07-01 273.6 273.6 273.6 273.6 0 273.6 273.6
2015-07-02 276.0 276.0 267.0 268.6 15808300 276.0 267.0
2015-07-03 268.8 269.0 256.6 259.8 20255200 276.0 256.6
2015-07-06 261.0 261.8 223.0 235.0 53285100 276.0 223.0
2015-07-07 237.2 237.8 218.4 222.0 38001700 269.0 218.4
2015-07-08 207.0 219.4 196.0 203.4 48558100 261.8 196.0
2015-07-09 207.4 233.8 204.2 233.6 37835900 237.8 196.0
2015-07-10 235.4 244.8 233.8 239.2 23299900 244.8 196.0
Is there a simple way to do this, and how? Thanks guys!
You can use rolling_max and rolling_min:
>>> df["3-Day-High"] = pd.rolling_max(df.High, window=3, min_periods=1)
>>> df["3-Day-Low"] = pd.rolling_min(df.Low, window=3, min_periods=1)
>>> df
Open High Low Close Volume 3-Day-High 3-Day-Low
Date
2015-07-01 273.6 273.6 273.6 273.6 0 273.6 273.6
2015-07-02 276.0 276.0 267.0 268.6 15808300 276.0 267.0
2015-07-03 268.8 269.0 256.6 259.8 20255200 276.0 256.6
2015-07-06 261.0 261.8 223.0 235.0 53285100 276.0 223.0
2015-07-07 237.2 237.8 218.4 222.0 38001700 269.0 218.4
2015-07-08 207.0 219.4 196.0 203.4 48558100 261.8 196.0
2015-07-09 207.4 233.8 204.2 233.6 37835900 237.8 196.0
2015-07-10 235.4 244.8 233.8 239.2 23299900 244.8 196.0
Note that in agreement with your example, this uses the last three recorded days, regardless of the size of any gap between those rows (such as between 07-03 and 07-06).
pd.rolling_max and pd.rolling_min have been deprecated and removed in recent versions of pandas.
Use the Series.rolling method instead:
Series.rolling(min_periods=1, window=252, center=False).max()
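Applied to the 3-day example above, and to the original 52-week request (roughly 252 trading days), the current syntax would look like the sketch below; the '3D' variant is an extra assumption that the index is a DatetimeIndex and that calendar days rather than trading rows are wanted:
df["3-Day-High"] = df["High"].rolling(window=3, min_periods=1).max()
df["3-Day-Low"] = df["Low"].rolling(window=3, min_periods=1).min()

# ~52 weeks of trading days
df["52-Week-High"] = df["High"].rolling(window=252, min_periods=1).max()
df["52-Week-Low"] = df["Low"].rolling(window=252, min_periods=1).min()

# time-based window: 3 calendar days instead of 3 rows (requires a DatetimeIndex)
df["3-Day-High-Calendar"] = df["High"].rolling("3D").max()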
You can also try this, though note it only gives the high/low of the most recent three rows rather than a full rolling column:
three_days = df.index[-3:]
maxHigh = max(df['High'][three_days])
minLow = min(df['Low'][three_days])
Related
I have the following dataframe, named 'ORDdataM', with a DateTimeIndex column 'date', and a price point column 'ORDprice'. The date column has no timezone associated with it (and is naive) but is actually in 'Australia/ACT'. I want to convert it into 'America/New_York' time.
ORDprice
date
2021-02-23 18:09:00 24.01
2021-02-23 18:14:00 23.91
2021-02-23 18:19:00 23.98
2021-02-23 18:24:00 24.00
2021-02-23 18:29:00 24.04
... ...
2021-02-25 23:44:00 23.92
2021-02-25 23:49:00 23.88
2021-02-25 23:54:00 23.92
2021-02-25 23:59:00 23.91
2021-02-26 00:09:00 23.82
The line below is one I have played around with quite a bit, but I cannot figure out what is wrong with it. The only error message is:
KeyError: 'date'
ORDdataM['date'] = ORDdataM['date'].dt.tz_localize('Australia/ACT').dt.tz_convert('America/New_York')
I have also tried
ORDdataM.date = ORDdataM.date.dt.tz_localize('Australia/ACT').dt.tz_convert('America/New_York')
What is the issue here?
Your date is the index, not a column; try:
df.index = df.index.tz_localize('Australia/ACT').tz_convert('America/New_York')
df
# ORDprice
#date
#2021-02-23 02:09:00-05:00 24.01
#2021-02-23 02:14:00-05:00 23.91
#2021-02-23 02:19:00-05:00 23.98
#2021-02-23 02:24:00-05:00 24.00
#2021-02-23 02:29:00-05:00 24.04
# ... ...
#2021-02-25 07:44:00-05:00 23.92
#2021-02-25 07:49:00-05:00 23.88
#2021-02-25 07:54:00-05:00 23.92
#2021-02-25 07:59:00-05:00 23.91
#2021-02-25 08:09:00-05:00 23.82
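If you would rather keep working with date as a real column, a hedged alternative is to move the index into a column first and then use your original .dt calls:
ORDdataM = ORDdataM.reset_index()   # turns the 'date' index into a normal column
ORDdataM['date'] = (ORDdataM['date']
                    .dt.tz_localize('Australia/ACT')
                    .dt.tz_convert('America/New_York'))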
I'm still a newbie to matplotlib. Currently, I have below dataset for plotting:
Date Open High Low Close
Trade_Date
2018-01-02 736696.0 42.45 42.45 41.45 41.45
2018-01-03 736697.0 41.60 41.70 40.70 40.95
2018-01-04 736698.0 40.90 41.05 40.20 40.25
2018-01-05 736699.0 40.35 41.60 40.35 41.50
2018-01-08 736702.0 40.20 40.20 37.95 38.00
2018-01-09 736703.0 37.15 39.00 37.15 38.00
2018-01-10 736704.0 38.70 38.70 37.15 37.25
2018-01-11 736705.0 37.50 37.50 36.55 36.70
2018-01-12 736706.0 37.00 37.40 36.90 37.20
2018-01-15 736709.0 37.50 37.70 37.15 37.70
2018-01-16 736710.0 37.80 38.25 37.45 37.95
2018-01-17 736711.0 38.00 38.05 37.65 37.75
2018-01-18 736712.0 38.00 38.20 37.70 37.75
2018-01-19 736713.0 36.70 37.10 35.30 36.45
2018-01-22 736716.0 36.25 36.25 35.50 36.10
2018-01-23 736717.0 36.20 36.30 35.65 36.00
2018-01-24 736718.0 35.80 36.00 35.60 36.00
2018-01-25 736719.0 36.10 36.10 35.45 35.45
2018-01-26 736720.0 35.50 35.75 35.00 35.00
2018-01-29 736723.0 34.80 35.00 33.65 33.70
2018-01-30 736724.0 33.70 34.45 33.65 33.90
I've converted the date values to numbers using mdates.date2num.
After that, I've tried to plot a candlestick chart with the code below:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from mplfinance.original_flavor import candlestick_ohlc  # on older installs: from mpl_finance import candlestick_ohlc

f1, ax = plt.subplots(figsize=(10, 5))
candlestick_ohlc(ax, ohlc.values, width=.6, colorup='red', colordown='green')
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
plt.show()
However, I'm still getting a chart with gaps for the missing (non-trading) days.
I've tried the possible solution from How do I plot only weekdays using Python's matplotlib candlestick?
However, I was not able to solve my problem with it.
Can anyone kindly help me with this issue?
Thanks!
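A common workaround for those gaps (and essentially what the linked weekday question does) is to plot against consecutive integer positions instead of real dates and then relabel the ticks. The sketch below assumes the full frame is called df, that ohlc holds its Date, Open, High, Low, Close columns in that order, and that the Trade_Date index is a DatetimeIndex:
import matplotlib.pyplot as plt
from mplfinance.original_flavor import candlestick_ohlc

ohlc_pos = ohlc.copy()
ohlc_pos['Date'] = range(len(ohlc_pos))   # 0, 1, 2, ... instead of date2num values

f1, ax = plt.subplots(figsize=(10, 5))
candlestick_ohlc(ax, ohlc_pos.values, width=.6, colorup='red', colordown='green')

# put the real trade dates back on the x axis
step = 5
ax.set_xticks(range(0, len(ohlc_pos), step))
ax.set_xticklabels(df.index[::step].strftime('%Y-%m-%d'), rotation=45)
plt.show()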
I'm new to Python and my English is not so good, so I'll try to explain my problem with the example below.
In :ds # is my dataframe
Out :DateStarted DateCompleted DayStarted DayCompleted \
1460 2017-06-12 14:03:32 2017-06-12 14:04:07 2017-06-12 2017-06-12
14445 2017-06-13 13:39:16 2017-06-13 13:40:32 2017-06-13 2017-06-13
14109 2017-06-21 10:25:36 2017-06-21 10:32:17 2017-06-21 2017-06-21
16652 2017-06-27 15:44:28 2017-06-27 15:44:41 2017-06-27 2017-06-27
30062 2017-07-05 09:49:01 2017-07-05 10:04:00 2017-07-05 2017-07-05
22357 2017-08-31 09:06:00 2017-08-31 09:10:31 2017-08-31 2017-08-31
39117 2017-09-08 08:43:07 2017-09-08 08:44:51 2017-09-08 2017-09-08
41903 2017-09-15 12:54:40 2017-09-15 14:00:06 2017-09-15 2017-09-15
74633 2017-09-27 12:41:09 2017-09-27 13:16:04 2017-09-27 2017-09-27
69315 2017-10-23 08:25:28 2017-10-23 08:26:09 2017-10-23 2017-10-23
87508 2017-10-30 12:19:19 2017-10-30 12:19:45 2017-10-30 2017-10-30
86828 2017-11-03 12:20:09 2017-11-03 12:24:56 2017-11-03 2017-11-03
89877 2017-11-06 13:52:05 2017-11-06 13:52:50 2017-11-06 2017-11-06
94970 2017-11-07 08:09:53 2017-11-07 08:10:15 2017-11-07 2017-11-07
94866 2017-11-28 14:38:14 2017-11-30 07:51:04 2017-11-28 2017-11-30
DailyTotalActiveTime diff
1460 NaN 35.0
14445 NaN 76.0
14109 NaN 401.0
16652 NaN 13.0
30062 NaN 899.0
22357 NaN 271.0
39117 NaN 104.0
41903 NaN 3926.0
74633 NaN 2095.0
69315 NaN 41.0
87508 NaN 26.0
86828 NaN 287.0
89877 NaN 45.0
94970 NaN 22.0
94866 NaN 148370.0
In the DailyTotalActiveTime column, I want to calculate how much active time each specific day has in total. The diff column is in seconds.
I tried this, but it had no effect:
for i in ds['diff']:
if i <= 86400:
ds['DailyTotalActiveTime']==i
else:
ds['DailyTotalActiveTime']==86400
ds['DailyTotalActiveTime']+1 == i-86400
What can I do? Again, sorry for the explanation.
You should try with = (assignment) instead of == (comparison).
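Once that is fixed, a vectorized sketch of the capping your loop seems to attempt (ignoring any carry-over of the excess into the following day) could be:
# cap each row's diff at one day's worth of seconds
ds['DailyTotalActiveTime'] = ds['diff'].clip(upper=86400)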
To get you halfway there, you could do something like the following (I am sure there must be a simpler way, but I can't see it right now):
df['datestarted'] = pd.to_datetime(df['datestarted'])
df['datecompleted'] = pd.to_datetime(df['datecompleted'])
df['daystarted'] = df['datestarted'].dt.date
df['daycompleted'] = df['datecompleted'].dt.date
df['Date'] = df['daystarted'] # This is the unique date per row.
for row in df.itertuples():
    if (row.daycompleted - row.daystarted) > pd.Timedelta(days=0):
        for i in range(1, (row.daycompleted - row.daystarted).days + 1):
            df2 = pd.DataFrame([row]).drop('Index', axis=1)
            df2['Date'] = df2['Date'] + pd.Timedelta(days=i)
            df = pd.concat([df, df2])  # df.append was removed in pandas 2.0
EDIT: Just when I gave up, I found the answer:
import numpy as np

rmlag = lambda xs: np.argmax(xs[::-1])
df['Open'].rolling(window=5).apply(func=rmlag)
I'm wrestling with the following issue: how can I add a column to a DataFrame that, for each row, calculates the number of days (periods) since an n-period high was reached?
Below is a sample DataFrame I'm working with. I've calculated the rolling 5-day high as
df['Rolling 5 Day High'] = df['Open'].rolling(5).max()
How can I calculate, for each row, the number of days since the respective 5-day high was reached? For example, the "Number of Days Since" for the row indexed at 2012-03-16 should be 4 since this row's corresponding rolling 5-day high of 14.88 was reached on 2012-03-12. For the next row at index 2012-03-19, the value should be 3 given this row's rolling 5-day high of 14.79 was reached on 2012-03-14.
Open Rolling 5 Day High
Date
2012-03-12 14.88 NaN
2012-03-13 14.65 NaN
2012-03-14 14.79 NaN
2012-03-15 14.41 NaN
2012-03-16 14.59 14.88
2012-03-19 14.68 14.79
2012-03-20 14.56 14.79
2012-03-21 14.40 14.68
2012-03-22 14.35 14.68
2012-03-23 14.40 14.68
2012-03-26 14.69 14.69
2012-03-27 14.78 14.78
2012-03-28 15.01 15.01
2012-03-29 15.14 15.14
2012-03-30 15.36 15.36
2012-04-02 15.36 15.36
2012-04-03 15.44 15.44
2012-04-04 14.85 15.44
2012-04-05 14.67 15.44
2012-04-09 14.40 15.44
2012-04-10 14.38 15.44
2012-04-11 14.35 14.85
2012-04-12 14.36 14.67
2012-04-13 14.55 14.55
2012-04-16 14.26 14.55
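Attaching the result from the EDIT above as a column (the column name here is just illustrative, and raw=True simply hands plain NumPy arrays to the lambda) reproduces the values reasoned out earlier, 4 for 2012-03-16 and 3 for 2012-03-19:
df['Days Since 5-Day High'] = df['Open'].rolling(window=5).apply(rmlag, raw=True)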
I have this code:
close1 = 'Close'
start = '12/18/2015 00:00:00'
end = '3/1/2016 00:00:00'
freq = '1d0h00min'
datefilter = pd.date_range(start=start, end=end, freq=freq).values
close[close['Datetime'].isin(datefilter)]  # only dates in the range
But, strangely, some columns come back with NaN:
Datetime ENTA KITE BSTC SAGE AGEN MGNX ESPR FPRX
2015-12-18 31.73 63.38 16.34 56.88 12.24 NaN NaN 38.72
2015-12-21 32.04 63.60 16.26 56.75 12.18 NaN NaN 42.52
Just wondering about the reason, and how can we remedy it?
Original:
Datetime ENTA KITE BSTC SAGE AGEN MGNX ESPR FPRX
0 2013-03-21 17.18 29.0 20.75 30.1 11.52 11.52 38.72
1 2013-03-22 16.81 30.53 21.25 30.0 11.64 11.52 39.42
2 2013-03-25 16.83 32.15 20.8 27.59 11.7 11.52 42.52
3 2013-03-26 17.09 29.55 20.6 27.5 11.76 11.52 11.52
EDIT:
It seems related to the datetime hh:mm:ss filtering.
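If the hh:mm:ss mismatch is indeed the cause, a sketch that compares calendar dates only (this assumes close['Datetime'] is a datetime64 column) would be:
# normalize drops the time-of-day, so intraday timestamps cannot cause mismatches
wanted_days = pd.date_range(start=start, end=end, freq='D')
close[close['Datetime'].dt.normalize().isin(wanted_days)]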