Aggregate prices and volume hourly in pandas (Python)

I need quick help: how can I resample the data in this dataframe from 1-minute candles to 1-hour candles? I don't want to sum the price (maybe take the highest or the lowest); only the volume column should be summed.
The dataframe is in this CSV: https://drive.google.com/file/d/1yTd0TB6Pp9obg4iyCWzeFIg3lin9tYVc/view?usp=sharing

Try this: parse the Date column, set it as the index, and resample with a per-column aggregation:
import pandas as pd

df = pd.read_csv('BTC.csv')
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
# hourly low of Close and hourly sum of Volume; use 'max' for the highest
# (pandas >= 2.2 prefers the lowercase alias '1h' over '1H')
df = df.resample('1H').agg({'Close': 'min', 'Volume': 'sum'})
print(df)
Close Volume
Date
2020-06-06 15:00:00 9650.39 201.0
2020-06-06 16:00:00 9593.09 1616.0
2020-06-06 17:00:00 9595.00 1140.0
2020-06-06 18:00:00 9606.57 642.0
2020-06-06 19:00:00 9614.44 1015.0
2020-06-06 20:00:00 9647.68 1293.0
2020-06-06 21:00:00 9678.52 1293.0
2020-06-06 22:00:00 9635.49 1021.0
2020-06-06 23:00:00 9644.18 1118.0
2020-06-07 00:00:00 9629.88 801.0
2020-06-07 01:00:00 9647.38 541.0
2020-06-07 02:00:00 9654.82 1034.0
2020-06-07 03:00:00 9671.70 710.0
2020-06-07 04:00:00 9677.98 1264.0
2020-06-07 05:00:00 9659.31 798.0
2020-06-07 06:00:00 9656.76 886.0
2020-06-07 07:00:00 9639.48 1769.0
2020-06-07 08:00:00 9599.25 3190.0
2020-06-07 09:00:00 9623.41 1332.0
2020-06-07 10:00:00 9610.64 1018.0
2020-06-07 11:00:00 9575.59 1812.0
2020-06-07 12:00:00 9499.99 5431.0
2020-06-07 13:00:00 9446.98 4372.0
2020-06-07 14:00:00 9426.07 5999.0
2020-06-07 15:00:00 9463.05 1097.0
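If the CSV also has Open/High/Low columns, as 1-minute candle data usually does, the same agg pattern builds full hourly OHLCV bars. A sketch, assuming those column names exist in BTC.csv:
df = df.resample('1H').agg({
    'Open': 'first',   # first 1-minute open of the hour
    'High': 'max',     # highest high within the hour
    'Low': 'min',      # lowest low within the hour
    'Close': 'last',   # last 1-minute close
    'Volume': 'sum',   # total traded volume
})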

Related

Add hours to year-month-day data in pandas data frame

I have the following data frame with hourly resolution:
day_ahead_DK1
Out[27]:
DateStamp DK1
0 2017-01-01 20.96
1 2017-01-01 20.90
2 2017-01-01 18.13
3 2017-01-01 16.03
4 2017-01-01 16.43
... ...
8756 2017-12-31 25.56
8757 2017-12-31 11.02
8758 2017-12-31 7.32
8759 2017-12-31 1.86
type(day_ahead_DK1)
Out[28]: pandas.core.frame.DataFrame
But the DateStamp column is missing the hours. How can I add them, so that 2017-01-01 at index 0 becomes 2017-01-01 00:00:00, 2017-01-01 at index 1 becomes 2017-01-01 01:00:00, and so on, with every day getting hours 0 to 23? Thank you!
The expected output:
day_ahead_DK1
Out[27]:
DateStamp DK1
0 2017-01-01 00:00:00 20.96
1 2017-01-01 01:00:00 20.90
2 2017-01-01 02:00:00 18.13
3 2017-01-01 03:00:00 16.03
4 2017-01-01 04:00:00 16.43
... ...
8756 2017-12-31 20:00:00 25.56
8757 2017-12-31 21:00:00 11.02
8758 2017-12-31 22:00:00 7.32
8759 2017-12-31 23:00:00 1.86
Use GroupBy.cumcount to build an hour counter within each day, convert it to hours with to_timedelta, and add it to the DateStamp column. (Run on just the truncated sample above, the tail rows get hours 0-3; on the full 8,760-row frame each day's counter runs 0-23, matching the expected output.)
df['DateStamp'] = pd.to_datetime(df['DateStamp'])
# cumcount numbers the rows within each date 0, 1, 2, ...;
# converted to hours, that is exactly the offset each row needs
df['DateStamp'] += pd.to_timedelta(df.groupby('DateStamp').cumcount(), unit='H')
print(df)
DateStamp DK1
0 2017-01-01 00:00:00 20.96
1 2017-01-01 01:00:00 20.90
2 2017-01-01 02:00:00 18.13
3 2017-01-01 03:00:00 16.03
4 2017-01-01 04:00:00 16.43
8756 2017-12-31 00:00:00 25.56
8757 2017-12-31 01:00:00 11.02
8758 2017-12-31 02:00:00 7.32
8759 2017-12-31 03:00:00 1.86
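An alternative sketch, assuming the rows are already sorted and every day is complete (8,760 rows for 2017, no gaps), is to overwrite the column with a ready-made hourly range:
df['DateStamp'] = pd.date_range('2017-01-01', periods=len(df), freq='H')  # 'h' in pandas >= 2.2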

How to extract rows between 2 times with Pandas?

I want to make sub-dataframes out of one dataframe, using its datetime index. For example, I want to extract the rows between 07:00 and 06:00 the next day and make new dataframes:
import pandas as pd
int_rows = 24
str_freq = '180min'
i = pd.date_range('2018-04-09', periods=int_rows, freq=str_freq)
df = pd.DataFrame({'A': [i for i in range(int_rows)]}, index=i)
>>> df
A
2018-04-09 00:00:00 0
2018-04-09 03:00:00 1
2018-04-09 06:00:00 2
2018-04-09 09:00:00 3
2018-04-09 12:00:00 4
2018-04-09 15:00:00 5
2018-04-09 18:00:00 6
2018-04-09 21:00:00 7
2018-04-10 00:00:00 8
2018-04-10 03:00:00 9
2018-04-10 06:00:00 10
2018-04-10 09:00:00 11
2018-04-10 12:00:00 12
2018-04-10 15:00:00 13
2018-04-10 18:00:00 14
2018-04-10 21:00:00 15
2018-04-11 00:00:00 16
2018-04-11 03:00:00 17
2018-04-11 06:00:00 18
2018-04-11 09:00:00 19
2018-04-11 12:00:00 20
2018-04-11 15:00:00 21
2018-04-11 18:00:00 22
2018-04-11 21:00:00 23
# new dataframes that I want
A
2018-04-09 00:00:00 0
2018-04-09 03:00:00 1
A
2018-04-09 06:00:00 2
2018-04-09 09:00:00 3
2018-04-09 12:00:00 4
2018-04-09 15:00:00 5
2018-04-09 18:00:00 6
2018-04-09 21:00:00 7
2018-04-10 00:00:00 8
2018-04-10 03:00:00 9
A
2018-04-10 06:00:00 10
2018-04-10 09:00:00 11
2018-04-10 12:00:00 12
2018-04-10 15:00:00 13
2018-04-10 18:00:00 14
2018-04-10 21:00:00 15
2018-04-11 00:00:00 16
2018-04-11 03:00:00 17
A
2018-04-11 06:00:00 18
2018-04-11 09:00:00 19
2018-04-11 12:00:00 20
2018-04-11 15:00:00 21
2018-04-11 18:00:00 22
2018-04-11 21:00:00 23
I found the between_time method, but it doesn't care about dates. I could iterate over the original dataframe and check each date and time, but I think that would be inefficient. Is there a simple way to do this?
You can shift the timestamps back by 6 hours and group by the resulting day:
for k, d in df.groupby((df.index - pd.to_timedelta('6:00:00')).normalize()):
    print(d); print()
Output:
A
2018-04-09 00:00:00 0
2018-04-09 03:00:00 1
A
2018-04-09 06:00:00 2
2018-04-09 09:00:00 3
2018-04-09 12:00:00 4
2018-04-09 15:00:00 5
2018-04-09 18:00:00 6
2018-04-09 21:00:00 7
2018-04-10 00:00:00 8
2018-04-10 03:00:00 9
A
2018-04-10 06:00:00 10
2018-04-10 09:00:00 11
2018-04-10 12:00:00 12
2018-04-10 15:00:00 13
2018-04-10 18:00:00 14
2018-04-10 21:00:00 15
2018-04-11 00:00:00 16
2018-04-11 03:00:00 17
A
2018-04-11 06:00:00 18
2018-04-11 09:00:00 19
2018-04-11 12:00:00 20
2018-04-11 15:00:00 21
2018-04-11 18:00:00 22
2018-04-11 21:00:00 23
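If you need the sub-dataframes as objects rather than printed output, a small usage sketch collects the groups into a dict keyed by the normalized anchor day:
frames = {k: d for k, d in df.groupby(
    (df.index - pd.to_timedelta('6:00:00')).normalize())}
# the block that starts at 06:00 on 2018-04-09:
print(frames[pd.Timestamp('2018-04-09')])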

TTR::MACD gives a different result than I get in Python and on the chart

I have the following stock price data in hand:
2017-06-15 10:00:00 958.4334
2017-06-15 11:00:00 955.7800
2017-06-15 12:00:00 958.2800
2017-06-15 13:00:00 959.2200
2017-06-15 14:00:00 962.4900
2017-06-15 15:00:00 964.0000
2017-06-15 15:59:00 963.3500
2017-06-16 09:00:00 997.3500
2017-06-16 10:00:00 995.0000
2017-06-16 11:00:00 992.7600
2017-06-16 12:00:00 990.7200
2017-06-16 13:00:00 994.6800
2017-06-16 14:00:00 996.0500
2017-06-16 15:00:00 987.6100
2017-06-16 15:59:00 987.5000
2017-06-19 09:00:00 999.1700
2017-06-19 10:00:00 1001.2700
2017-06-19 11:00:00 995.5200
2017-06-19 12:00:00 994.3350
2017-06-19 13:00:00 995.2199
2017-06-19 14:00:00 990.9221
2017-06-19 15:00:00 995.1300
2017-06-19 15:59:00 994.3400
2017-06-20 09:00:00 995.5200
2017-06-20 10:00:00 1003.5100
2017-06-20 11:00:00 998.8129
2017-06-20 12:00:00 996.2800
2017-06-20 13:00:00 997.2100
2017-06-20 14:00:00 998.0000
2017-06-20 15:00:00 992.5800
2017-06-20 15:59:00 992.8000
2017-06-21 09:00:00 993.9500
2017-06-21 10:00:00 995.2700
2017-06-21 11:00:00 996.4000
2017-06-21 12:00:00 994.2800
2017-06-21 13:00:00 996.1000
2017-06-21 14:00:00 998.7450
2017-06-21 15:00:00 1001.7900
2017-06-21 15:59:00 1002.9800
2017-06-22 09:00:00 1001.4100
2017-06-22 10:00:00 1004.0700
2017-06-22 11:00:00 1003.1500
2017-06-22 12:00:00 1003.4800
2017-06-22 13:00:00 1003.1600
2017-06-22 14:00:00 1003.1800
2017-06-22 15:00:00 1001.3900
2017-06-22 15:59:00 1001.5600
2017-06-23 09:00:00 999.8699
2017-06-23 10:00:00 1001.5800
2017-06-23 11:00:00 1001.0700
2017-06-23 12:00:00 1002.9800
2017-06-23 13:00:00 1003.2400
2017-06-23 14:00:00 1002.4300
2017-06-23 15:00:00 1003.7400
2017-06-23 15:59:00 1003.0500
2017-06-26 09:00:00 1006.2000
2017-06-26 10:00:00 997.3500
2017-06-26 11:00:00 999.3300
2017-06-26 12:00:00 999.1000
2017-06-26 13:00:00 997.0600
2017-06-26 14:00:00 995.8336
2017-06-26 15:00:00 993.9900
2017-06-26 15:59:00 993.5500
2017-06-27 09:00:00 992.7550
2017-06-27 10:00:00 993.7600
2017-06-27 11:00:00 990.6700
2017-06-27 12:00:00 986.5500
2017-06-27 13:00:00 981.1099
2017-06-27 14:00:00 982.5499
2017-06-27 15:00:00 977.4100
2017-06-27 15:59:00 976.7800
2017-06-28 09:00:00 971.4600
2017-06-28 10:00:00 982.5200
2017-06-28 11:00:00 980.9100
2017-06-28 12:00:00 986.4372
2017-06-28 13:00:00 987.6710
2017-06-28 14:00:00 986.7977
2017-06-28 15:00:00 990.0300
2017-06-28 15:59:00 991.0000
2017-06-29 09:00:00 982.5200
2017-06-29 10:00:00 977.7710
2017-06-29 11:00:00 972.6600
2017-06-29 12:00:00 970.3100
2017-06-29 13:00:00 969.1600
2017-06-29 14:00:00 973.4720
2017-06-29 15:00:00 975.9100
2017-06-29 15:59:00 975.3100
2017-06-30 09:00:00 977.5800
2017-06-30 10:00:00 978.6400
2017-06-30 11:00:00 978.7299
2017-06-30 12:00:00 974.9700
2017-06-30 13:00:00 975.7700
2017-06-30 14:00:00 975.7000
2017-06-30 15:00:00 968.0000
2017-06-30 15:59:00 969.0000
I was trying to calculate the MACD using TTR::MACD as follows (the above dataframe is called amz.xts):
macd <- MACD(amz.xts, nFast = 20, nSlow = 40, nSig = 10, maType = 'EMA')
The result was a series of decimal numbers mostly between 0.0 and 1.5.
When I used the Python wrapper for TA-Lib to do the same thing, the result was between 0.0 and 25.0, with the vast majority between 10.0 and 20.0,
and this matches the data shown in my charting software for trading.
python code:
import talib as ta
# m is macd
# s is signal
# h is histogram
m,s,h = ta.MACD(data, fastperiod=20, slowperiod=40, signalperiod=10)
I doubt the trading software is wrong, and since Python gives the same result, I'd rather say TTR::MACD is doing something different. The Python and charting results also make sense given how high the price is (above $900 per share).
Am I doing something wrong, or do they just use different algorithms (which I highly doubt)?
I haven't checked the Python function, but TTR::MACD() is definitely correct. The difference is the percent argument: TTR::MACD defaults to percent=TRUE, which returns the fast/slow difference as a percentage of the slow moving average (hence values around 0.0 ~ 1.5 on a ~$1000 price), while TA-Lib returns the plain difference in price units (hence 10 ~ 20). Pass percent=FALSE to get TA-Lib-style values; the demo below shows MACD() with percent=FALSE matches a hand-rolled EMA difference:
library(TTR)
# impulse series: a 1 followed by 49 zeros, repeated 4 times
xx <- rep(c(1, rep(0, 49)), 4)
fast <- 20
slow <- 40
sig <- 10
# TTR's MACD with percent=FALSE ...
macd <- MACD(xx, fast, slow, sig, maType="EMA", percent=FALSE)
# ... against the same thing computed by hand
macd2 <- EMA(xx, fast) - EMA(xx, slow)
macd2 <- cbind(macd2, EMA(macd2, sig))
# overlay: the dotted hand-rolled lines sit exactly on the solid MACD lines
par(mar=c(2, 2, 1, 1))
matplot(macd[-1:-40, ], type="l", lty=1, lwd=1.5)
matlines(macd2[-1:-40, ], type="l", lty=3, lwd=3, col=c("green", "blue"))
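For a cross-check in Python, here is a minimal pandas sketch of both conventions. It is an approximation rather than TA-Lib's exact implementation (TA-Lib seeds its EMAs with an SMA, so the earliest values differ slightly), and close is assumed to be a Series holding the prices above:
import pandas as pd

def macd(close, fast=20, slow=40, sig=10, percent=False):
    # adjust=False gives the recursive EMA definition both libraries use
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    if percent:
        # TTR's default (percent=TRUE): percentage of the slow average
        line = 100 * (ema_fast / ema_slow - 1)
    else:
        # TA-Lib's convention: plain difference in price units
        line = ema_fast - ema_slow
    signal = line.ewm(span=sig, adjust=False).mean()
    return line, signal, line - signal
With prices near $1000, a 10-20 point EMA spread is only about 1-2 percent, which is exactly the 0.0 ~ 1.5 range TTR reported.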

iterate over days (pandas)

How can I iterate over days in the dataframe in pandas?
Example:
My dataframe:
time consumption
time
2016-10-17 09:00:00 2016-10-17 09:00:00 2754.483333
2016-10-17 10:00:00 2016-10-17 10:00:00 2135.966666
2016-10-17 11:00:00 2016-10-17 11:00:00 1497.716666
2016-10-17 12:00:00 2016-10-17 12:00:00 448.100000
2016-10-24 09:00:00 2016-10-24 09:00:00 1527.716666
2016-10-24 10:00:00 2016-10-24 10:00:00 1219.833333
2016-10-24 11:00:00 2016-10-24 11:00:00 1284.350000
2016-10-24 12:00:00 2016-10-24 12:00:00 14195.633333
2016-10-31 09:00:00 2016-10-31 09:00:00 2120.933333
2016-10-31 10:00:00 2016-10-31 10:00:00 1630.700000
2016-10-31 11:00:00 2016-10-31 11:00:00 1241.866666
2016-10-31 12:00:00 2016-10-31 12:00:00 1156.266666
Pseudocode:
for day in df:
print day
First iteration return:
time consumption
time
2016-10-17 09:00:00 2016-10-17 09:00:00 2754.483333
2016-10-17 10:00:00 2016-10-17 10:00:00 2135.966666
2016-10-17 11:00:00 2016-10-17 11:00:00 1497.716666
2016-10-17 12:00:00 2016-10-17 12:00:00 448.100000
Second iteration return:
2016-10-24 09:00:00 2016-10-24 09:00:00 1527.716666
2016-10-24 10:00:00 2016-10-24 10:00:00 1219.833333
2016-10-24 11:00:00 2016-10-24 11:00:00 1284.350000
2016-10-24 12:00:00 2016-10-24 12:00:00 14195.633333
Third iteration return :
2016-10-31 09:00:00 2016-10-31 09:00:00 2120.933333
2016-10-31 10:00:00 2016-10-31 10:00:00 1630.700000
2016-10-31 11:00:00 2016-10-31 11:00:00 1241.866666
2016-10-31 12:00:00 2016-10-31 12:00:00 1156.266666
Use groupby by date, which is a bit different from grouping by day:
#groupby by index date
for idx, day in df.groupby(df.index.date):
    print(day)
time consumption
time
2016-10-17 09:00:00 2016-10-17 09:00:00 2754.483333
2016-10-17 10:00:00 2016-10-17 10:00:00 2135.966666
2016-10-17 11:00:00 2016-10-17 11:00:00 1497.716666
2016-10-17 12:00:00 2016-10-17 12:00:00 448.100000
time consumption
time
2016-10-24 09:00:00 2016-10-24 09:00:00 1527.716666
2016-10-24 10:00:00 2016-10-24 10:00:00 1219.833333
2016-10-24 11:00:00 2016-10-24 11:00:00 1284.350000
2016-10-24 12:00:00 2016-10-24 12:00:00 14195.633333
time consumption
time
2016-10-31 09:00:00 2016-10-31 09:00:00 2120.933333
2016-10-31 10:00:00 2016-10-31 10:00:00 1630.700000
2016-10-31 11:00:00 2016-10-31 11:00:00 1241.866666
2016-10-31 12:00:00 2016-10-31 12:00:00 1156.266666
Or:
#groupby by column time
for idx, day in df.groupby(df.time.dt.date):
    print(day)
time consumption
time
2016-10-17 09:00:00 2016-10-17 09:00:00 2754.483333
2016-10-17 10:00:00 2016-10-17 10:00:00 2135.966666
2016-10-17 11:00:00 2016-10-17 11:00:00 1497.716666
2016-10-17 12:00:00 2016-10-17 12:00:00 448.100000
time consumption
time
2016-10-24 09:00:00 2016-10-24 09:00:00 1527.716666
2016-10-24 10:00:00 2016-10-24 10:00:00 1219.833333
2016-10-24 11:00:00 2016-10-24 11:00:00 1284.350000
2016-10-24 12:00:00 2016-10-24 12:00:00 14195.633333
time consumption
time
2016-10-31 09:00:00 2016-10-31 09:00:00 2120.933333
2016-10-31 10:00:00 2016-10-31 10:00:00 1630.700000
2016-10-31 11:00:00 2016-10-31 11:00:00 1241.866666
2016-10-31 12:00:00 2016-10-31 12:00:00 1156.266666
The difference shows up when the first two rows are moved to a different month: grouping by df.index.day puts 2016-09-17 together with 2016-10-17 (both have day number 17), while grouping by df.index.date keeps them apart:
for idx, day in df.groupby(df.index.day):
    print(day)
time consumption
time
2016-09-17 09:00:00 2016-10-17 09:00:00 2754.483333
2016-09-17 10:00:00 2016-10-17 10:00:00 2135.966666
2016-10-17 11:00:00 2016-10-17 11:00:00 1497.716666
2016-10-17 12:00:00 2016-10-17 12:00:00 448.100000
time consumption
time
2016-10-24 09:00:00 2016-10-24 09:00:00 1527.716666
2016-10-24 10:00:00 2016-10-24 10:00:00 1219.833333
2016-10-24 11:00:00 2016-10-24 11:00:00 1284.350000
2016-10-24 12:00:00 2016-10-24 12:00:00 14195.633333
time consumption
time
2016-10-31 09:00:00 2016-10-31 09:00:00 2120.933333
2016-10-31 10:00:00 2016-10-31 10:00:00 1630.700000
2016-10-31 11:00:00 2016-10-31 11:00:00 1241.866666
2016-10-31 12:00:00 2016-10-31 12:00:00 1156.266666
for idx, day in df.groupby(df.index.date):
    print(day)
time consumption
time
2016-09-17 09:00:00 2016-10-17 09:00:00 2754.483333
2016-09-17 10:00:00 2016-10-17 10:00:00 2135.966666
time consumption
time
2016-10-17 11:00:00 2016-10-17 11:00:00 1497.716666
2016-10-17 12:00:00 2016-10-17 12:00:00 448.100000
time consumption
time
2016-10-24 09:00:00 2016-10-24 09:00:00 1527.716666
2016-10-24 10:00:00 2016-10-24 10:00:00 1219.833333
2016-10-24 11:00:00 2016-10-24 11:00:00 1284.350000
2016-10-24 12:00:00 2016-10-24 12:00:00 14195.633333
time consumption
time
2016-10-31 09:00:00 2016-10-31 09:00:00 2120.933333
2016-10-31 10:00:00 2016-10-31 10:00:00 1630.700000
2016-10-31 11:00:00 2016-10-31 11:00:00 1241.866666
2016-10-31 12:00:00 2016-10-31 12:00:00 1156.266666
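If the index is a DatetimeIndex, pd.Grouper is another option; a sketch to show the difference: unlike grouping by df.index.date, it emits a group for every calendar day in the span, including the empty days between the Mondays above, so empty groups need skipping:
for idx, day in df.groupby(pd.Grouper(freq='D')):
    if not day.empty:  # Grouper yields empty frames for the missing days
        print(day)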

Pandas filtering values in dataframe

I have this dataframe. The columns hold the daily highs and lows of the EURUSD price:
df.low df.high
2013-01-17 16:00:00 1.33394 2013-01-17 20:00:00 1.33874
2013-01-18 18:00:00 1.32805 2013-01-18 09:00:00 1.33983
2013-01-21 00:00:00 1.32962 2013-01-21 09:00:00 1.33321
2013-01-22 11:00:00 1.32667 2013-01-22 09:00:00 1.33715
2013-01-23 17:00:00 1.32645 2013-01-23 14:00:00 1.33545
2013-01-24 10:00:00 1.32860 2013-01-24 18:00:00 1.33926
2013-01-25 04:00:00 1.33497 2013-01-25 17:00:00 1.34783
2013-01-28 10:00:00 1.34246 2013-01-28 16:00:00 1.34771
2013-01-29 13:00:00 1.34143 2013-01-29 21:00:00 1.34972
2013-01-30 08:00:00 1.34820 2013-01-30 21:00:00 1.35873
2013-01-31 13:00:00 1.35411 2013-01-31 17:00:00 1.35944
I combined them into a single third column (df.extremes), ordered by timestamp.
df.extremes
2013-01-17 16:00:00 1.33394
2013-01-17 20:00:00 1.33874
2013-01-18 18:00:00 1.32805
2013-01-18 09:00:00 1.33983
2013-01-21 00:00:00 1.32962
2013-01-21 09:00:00 1.33321
2013-01-22 09:00:00 1.33715
2013-01-22 11:00:00 1.32667
2013-01-23 14:00:00 1.33545
2013-01-23 17:00:00 1.32645
2013-01-24 10:00:00 1.32860
2013-01-24 18:00:00 1.33926
2013-01-25 04:00:00 1.33497
2013-01-25 17:00:00 1.34783
2013-01-28 10:00:00 1.34246
2013-01-28 16:00:00 1.34771
2013-01-29 13:00:00 1.34143
2013-01-29 21:00:00 1.34972
2013-01-30 08:00:00 1.34820
2013-01-30 21:00:00 1.35873
2013-01-31 13:00:00 1.35411
2013-01-31 17:00:00 1.35944
But now I want to filter some values from df.extremes.
To explain what to filter, here is some pseudocode:
IF following the index we move from: previous df.low --> df.low --> df.high:
IF df.low > previous df.low: delete df.low
IF df.low < previous df.low: delete previous df.low
When I try to work this out with a for loop, it gives me a KeyError: 1.3339399999999999 (iterating over df.extremes yields the values rather than the index labels, so is_day_min[i] looks up by price):
day = df.groupby(pd.TimeGrouper('D'))
is_day_min = day.extremes.apply(lambda x: x == x.min())
for i in df.extremes:
if is_day_min[i] == True and is_day_min[i+1] == True:
if df.extremes[i] > df.extremes[i+1]:
del df.extremes[i]
for i in df.extremes:
if is_day_min[i] == True and is_day_min[i+1] == True:
if df.extremes[i] < df.extremes[i+1]:
del df.extremes[i+1]
How can I filter/delete the values as explained in the pseudocode?
I am struggling with indexing and booleans but can't solve this. I strongly suspect I need to use a lambda function, but I don't know how to apply it. I've been at this for far too long, so please have mercy. I hope I've been clear enough.
All you're really missing is a way of saying "previous low" in a vectorized fashion. That's spelled df['low'].shift(1) (shift(-1) gives the next row instead). Your pseudocode deletes the higher of each consecutive pair of lows, so a low survives only if it is no higher than either neighbour; comparisons with the NaN at each edge are False, so the first and last rows are judged only against their single neighbour:
prev_low = df.low.shift(1)    # previous row's low
next_low = df.low.shift(-1)   # next row's low
filtered_df = df[~((df.low > prev_low) | (df.low > next_low))]
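A quick check on a toy series (values invented for illustration):
import pandas as pd

lows = pd.Series([1.5, 1.3, 1.4, 1.2, 1.6],
                 index=pd.date_range('2013-01-17', periods=5))
keep = ~((lows > lows.shift(1)) | (lows > lows.shift(-1)))
print(lows[keep])  # keeps 1.3 and 1.2, the lows not higher than a neighbour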
