I have the following stock price data in hand:
2017-06-15 10:00:00 958.4334
2017-06-15 11:00:00 955.7800
2017-06-15 12:00:00 958.2800
2017-06-15 13:00:00 959.2200
2017-06-15 14:00:00 962.4900
2017-06-15 15:00:00 964.0000
2017-06-15 15:59:00 963.3500
2017-06-16 09:00:00 997.3500
2017-06-16 10:00:00 995.0000
2017-06-16 11:00:00 992.7600
2017-06-16 12:00:00 990.7200
2017-06-16 13:00:00 994.6800
2017-06-16 14:00:00 996.0500
2017-06-16 15:00:00 987.6100
2017-06-16 15:59:00 987.5000
2017-06-19 09:00:00 999.1700
2017-06-19 10:00:00 1001.2700
2017-06-19 11:00:00 995.5200
2017-06-19 12:00:00 994.3350
2017-06-19 13:00:00 995.2199
2017-06-19 14:00:00 990.9221
2017-06-19 15:00:00 995.1300
2017-06-19 15:59:00 994.3400
2017-06-20 09:00:00 995.5200
2017-06-20 10:00:00 1003.5100
2017-06-20 11:00:00 998.8129
2017-06-20 12:00:00 996.2800
2017-06-20 13:00:00 997.2100
2017-06-20 14:00:00 998.0000
2017-06-20 15:00:00 992.5800
2017-06-20 15:59:00 992.8000
2017-06-21 09:00:00 993.9500
2017-06-21 10:00:00 995.2700
2017-06-21 11:00:00 996.4000
2017-06-21 12:00:00 994.2800
2017-06-21 13:00:00 996.1000
2017-06-21 14:00:00 998.7450
2017-06-21 15:00:00 1001.7900
2017-06-21 15:59:00 1002.9800
2017-06-22 09:00:00 1001.4100
2017-06-22 10:00:00 1004.0700
2017-06-22 11:00:00 1003.1500
2017-06-22 12:00:00 1003.4800
2017-06-22 13:00:00 1003.1600
2017-06-22 14:00:00 1003.1800
2017-06-22 15:00:00 1001.3900
2017-06-22 15:59:00 1001.5600
2017-06-23 09:00:00 999.8699
2017-06-23 10:00:00 1001.5800
2017-06-23 11:00:00 1001.0700
2017-06-23 12:00:00 1002.9800
2017-06-23 13:00:00 1003.2400
2017-06-23 14:00:00 1002.4300
2017-06-23 15:00:00 1003.7400
2017-06-23 15:59:00 1003.0500
2017-06-26 09:00:00 1006.2000
2017-06-26 10:00:00 997.3500
2017-06-26 11:00:00 999.3300
2017-06-26 12:00:00 999.1000
2017-06-26 13:00:00 997.0600
2017-06-26 14:00:00 995.8336
2017-06-26 15:00:00 993.9900
2017-06-26 15:59:00 993.5500
2017-06-27 09:00:00 992.7550
2017-06-27 10:00:00 993.7600
2017-06-27 11:00:00 990.6700
2017-06-27 12:00:00 986.5500
2017-06-27 13:00:00 981.1099
2017-06-27 14:00:00 982.5499
2017-06-27 15:00:00 977.4100
2017-06-27 15:59:00 976.7800
2017-06-28 09:00:00 971.4600
2017-06-28 10:00:00 982.5200
2017-06-28 11:00:00 980.9100
2017-06-28 12:00:00 986.4372
2017-06-28 13:00:00 987.6710
2017-06-28 14:00:00 986.7977
2017-06-28 15:00:00 990.0300
2017-06-28 15:59:00 991.0000
2017-06-29 09:00:00 982.5200
2017-06-29 10:00:00 977.7710
2017-06-29 11:00:00 972.6600
2017-06-29 12:00:00 970.3100
2017-06-29 13:00:00 969.1600
2017-06-29 14:00:00 973.4720
2017-06-29 15:00:00 975.9100
2017-06-29 15:59:00 975.3100
2017-06-30 09:00:00 977.5800
2017-06-30 10:00:00 978.6400
2017-06-30 11:00:00 978.7299
2017-06-30 12:00:00 974.9700
2017-06-30 13:00:00 975.7700
2017-06-30 14:00:00 975.7000
2017-06-30 15:00:00 968.0000
2017-06-30 15:59:00 969.0000
I was trying to calculate the MACD using TTR::MACD as follows (the above data frame is called amz.xts):
macd <- MACD(amz.xts, nFast = 20, nSlow = 40, nSig = 10, maType = 'EMA')
The result was a series of decimal numbers mostly between 0.0 and 1.5. Whereas when I used the Python wrapper of TA-Lib to do the same thing, the result was between 0.0 and 25.0, with the vast majority between 10.0 and 20.0,
and this also matched the data shown in my charting software for trading.
python code:
import talib as ta
# m is macd
# s is signal
# h is histogram
m,s,h = ta.MACD(data, fastperiod=20, slowperiod=40, signalperiod=10)
I don't doubt the trading software, and since Python gave the same result, I prefer to say TTR::MACD is doing something different. I also think the result from Python and the trading software makes sense because the price is really high (above $900 per share).
Am I doing something wrong, or do they just use different algorithms? (Which I highly doubt.)
I haven't checked the Python function, but TTR::MACD() is definitely correct. The difference is most likely the percent argument: TTR::MACD() defaults to percent = TRUE, which returns the MACD as a percentage of the slow moving average, while TA-Lib returns the raw difference in price units. Pass percent = FALSE to get matching values:
library(TTR)
# impulse series: a 1 followed by 49 zeros, repeated 4 times
xx <- rep(c(1, rep(0, 49)), 4)
fast <- 20
slow <- 40
sig <- 10
# raw (non-percent) MACD, i.e. the price-unit difference
macd <- MACD(xx, fast, slow, sig, maType="EMA", percent=FALSE)
# the same quantity computed by hand
macd2 <- EMA(xx, fast) - EMA(xx, slow)
macd2 <- cbind(macd2, EMA(macd2, sig))
par(mar=c(2, 2, 1, 1))
matplot(macd[-1:-40, ], type="l", lty=1, lwd=1.5)
matlines(macd2[-1:-40, ], type="l", lty=3, lwd=3, col=c("green", "blue"))
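The scale mismatch can also be reproduced with plain pandas, without TA-Lib or TTR installed (a sketch on a made-up price series near $1000, the same scale as the question's data; the EMA seeding differs slightly from TTR's, so values are illustrative only):

```python
import numpy as np
import pandas as pd

# Synthetic prices around $1000, mirroring the question's data scale.
prices = pd.Series(1000 + np.cumsum(np.random.RandomState(0).randn(200)))

fast = prices.ewm(span=20, adjust=False).mean()
slow = prices.ewm(span=40, adjust=False).mean()

macd_raw = fast - slow                 # price units, like TA-Lib
macd_pct = 100 * (fast - slow) / slow  # percent form, TTR's default

# With prices near 1000, the percent form is roughly raw / 10,
# which explains the ~0-1.5 vs ~0-25 ranges in the question.
```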
I need quick help: how can I resample the data in this data frame from 1-minute candles to 1-hour candles?
I don't want to sum the price; maybe choose the highest or lowest. Only the volume column should be summed.
This is the dataframe in CSV: https://drive.google.com/file/d/1yTd0TB6Pp9obg4iyCWzeFIg3lin9tYVc/view?usp=sharing
Try this:
df = pd.read_csv('BTC.csv')
df['Date'] = pd.to_datetime(df['Date'])  # parse the date strings
df.set_index('Date', inplace=True)       # resample needs a datetime index
# hourly low of Close; Volume is the only column that gets summed
df = df.resample('1H').agg({'Close': 'min', 'Volume': 'sum'})
print(df)
Close Volume
Date
2020-06-06 15:00:00 9650.39 201.0
2020-06-06 16:00:00 9593.09 1616.0
2020-06-06 17:00:00 9595.00 1140.0
2020-06-06 18:00:00 9606.57 642.0
2020-06-06 19:00:00 9614.44 1015.0
2020-06-06 20:00:00 9647.68 1293.0
2020-06-06 21:00:00 9678.52 1293.0
2020-06-06 22:00:00 9635.49 1021.0
2020-06-06 23:00:00 9644.18 1118.0
2020-06-07 00:00:00 9629.88 801.0
2020-06-07 01:00:00 9647.38 541.0
2020-06-07 02:00:00 9654.82 1034.0
2020-06-07 03:00:00 9671.70 710.0
2020-06-07 04:00:00 9677.98 1264.0
2020-06-07 05:00:00 9659.31 798.0
2020-06-07 06:00:00 9656.76 886.0
2020-06-07 07:00:00 9639.48 1769.0
2020-06-07 08:00:00 9599.25 3190.0
2020-06-07 09:00:00 9623.41 1332.0
2020-06-07 10:00:00 9610.64 1018.0
2020-06-07 11:00:00 9575.59 1812.0
2020-06-07 12:00:00 9499.99 5431.0
2020-06-07 13:00:00 9446.98 4372.0
2020-06-07 14:00:00 9426.07 5999.0
2020-06-07 15:00:00 9463.05 1097.0
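If full OHLC bars are wanted rather than just the hourly minimum, the same resample/agg pattern extends by passing a list of aggregations per column (a sketch on toy data, since the linked CSV isn't reproduced here; the column names are assumed):

```python
import numpy as np
import pandas as pd

# Stand-in for the 1-minute BTC data: 2 hours of candles.
idx = pd.date_range('2020-06-06 15:00', periods=120, freq='1min')
df = pd.DataFrame({'Close': np.linspace(9650.0, 9590.0, 120),
                   'Volume': np.full(120, 10.0)}, index=idx)

# open/high/low/close of Close, plus summed Volume, per hour
hourly = df.resample('1H').agg({'Close': ['first', 'max', 'min', 'last'],
                                'Volume': 'sum'})
```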
I have the following strings:
start = "07:00:00"
end = "17:00:00"
How can I generate a list of 5 minute interval between those times, ie
["07:00:00","07:05:00",...,"16:55:00","17:00:00"]
This works for me; I'm sure you can figure out how to put the results in a list instead of printing them:
>>> import datetime
>>> start = "07:00:00"
>>> end = "17:00:00"
>>> delta = datetime.timedelta(minutes=5)
>>> start = datetime.datetime.strptime(start, '%H:%M:%S')
>>> end = datetime.datetime.strptime(end, '%H:%M:%S')
>>> t = start
>>> while t <= end:
...     print(t.strftime('%H:%M:%S'))
...     t += delta
...
07:00:00
07:05:00
07:10:00
07:15:00
07:20:00
07:25:00
07:30:00
07:35:00
07:40:00
07:45:00
07:50:00
07:55:00
08:00:00
08:05:00
08:10:00
08:15:00
08:20:00
08:25:00
08:30:00
08:35:00
08:40:00
08:45:00
08:50:00
08:55:00
09:00:00
09:05:00
09:10:00
09:15:00
09:20:00
09:25:00
09:30:00
09:35:00
09:40:00
09:45:00
09:50:00
09:55:00
10:00:00
10:05:00
10:10:00
10:15:00
10:20:00
10:25:00
10:30:00
10:35:00
10:40:00
10:45:00
10:50:00
10:55:00
11:00:00
11:05:00
11:10:00
11:15:00
11:20:00
11:25:00
11:30:00
11:35:00
11:40:00
11:45:00
11:50:00
11:55:00
12:00:00
12:05:00
12:10:00
12:15:00
12:20:00
12:25:00
12:30:00
12:35:00
12:40:00
12:45:00
12:50:00
12:55:00
13:00:00
13:05:00
13:10:00
13:15:00
13:20:00
13:25:00
13:30:00
13:35:00
13:40:00
13:45:00
13:50:00
13:55:00
14:00:00
14:05:00
14:10:00
14:15:00
14:20:00
14:25:00
14:30:00
14:35:00
14:40:00
14:45:00
14:50:00
14:55:00
15:00:00
15:05:00
15:10:00
15:15:00
15:20:00
15:25:00
15:30:00
15:35:00
15:40:00
15:45:00
15:50:00
15:55:00
16:00:00
16:05:00
16:10:00
16:15:00
16:20:00
16:25:00
16:30:00
16:35:00
16:40:00
16:45:00
16:50:00
16:55:00
17:00:00
Try:
# import modules
from datetime import datetime, timedelta
# Create starting and end datetime object from string
start = datetime.strptime("07:00:00", "%H:%M:%S")
end = datetime.strptime("17:00:00", "%H:%M:%S")
# min_gap
min_gap = 5
# compute datetime interval
arr = [(start + timedelta(minutes=min_gap * i)).strftime("%H:%M:%S")
       for i in range(int((end - start).total_seconds() / 60.0 / min_gap) + 1)]
print(arr)
# ['07:00:00', '07:05:00', '07:10:00', '07:15:00', ..., '16:55:00', '17:00:00']
Explanations:
First, you need to convert the string dates to datetime objects. strptime does it!
Then, we find the number of minutes between the starting and ending datetimes. This discussion solved it! We can do it like this:
(end - start).total_seconds() / 60.0
However, in our case we only want one element every n minutes, so we divide by n.
Also, since we iterate over this count, we convert it to an int for the for loop, and add 1 so the end time itself is included. That gives:
int((end - start).total_seconds() / 60.0 / min_gap) + 1
Then, on each iteration of the loop, we add the elapsed minutes to the initial datetime. The timedelta function is designed for this; as a parameter, we specify the number of minutes we want to add: min_gap * i.
Finally, we convert each datetime object back to a string using strftime.
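If pandas is already a dependency, the same list can be built in one line with pd.date_range (the bare time strings are parsed onto today's date, which is harmless here since only the time is formatted back out):

```python
import pandas as pd

times = pd.date_range('07:00:00', '17:00:00', freq='5min').strftime('%H:%M:%S').tolist()
# Inclusive of both endpoints: 121 entries from '07:00:00' to '17:00:00'.
```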
I have a dataframe with a datetime column called Start Time, which is set to a default of 12:00:00 AM. I would like to reset this column so that the first row is 00:01:00 and the second row is 00:02:00, i.e. one-minute intervals.
This is the original table.
ID State Time End Time
A001 12:00:00 12:00:00
A002 12:00:00 12:00:00
A003 12:00:00 12:00:00
A004 12:00:00 12:00:00
A005 12:00:00 12:00:00
A006 12:00:00 12:00:00
A007 12:00:00 12:00:00
I want to reset the start time column so that my output is this:
ID State Time End Time
A001 0:00:00 12:00:00
A002 0:00:01 12:00:00
A003 0:00:02 12:00:00
A004 0:00:03 12:00:00
A005 0:00:04 12:00:00
A006 0:00:05 12:00:00
A007 0:00:06 12:00:00
How do I go about this?
You could use pd.date_range:
df['Start Time'] = pd.date_range('00:00', periods=df['Start Time'].shape[0], freq='1min')
gives you
df
Out[23]:
Start Time
0 2019-09-30 00:00:00
1 2019-09-30 00:01:00
2 2019-09-30 00:02:00
3 2019-09-30 00:03:00
4 2019-09-30 00:04:00
5 2019-09-30 00:05:00
6 2019-09-30 00:06:00
7 2019-09-30 00:07:00
8 2019-09-30 00:08:00
9 2019-09-30 00:09:00
Supply a full date/time string to get a different starting date.
First we convert your State Time column to datetime type. Then we use pd.date_range with the first time as the starting point and a frequency of 1 minute.
df['State Time'] = pd.to_datetime(df['State Time'])
df['State Time'] = pd.date_range(start=df['State Time'].min(),
periods=len(df),
freq='min').time
Output
ID State Time End Time
0 A001 12:00:00 12:00:00
1 A002 12:01:00 12:00:00
2 A003 12:02:00 12:00:00
3 A004 12:03:00 12:00:00
4 A005 12:04:00 12:00:00
5 A006 12:05:00 12:00:00
6 A007 12:06:00 12:00:00
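A timedelta-based variant of the same idea, in case you want plain datetime.time values at a chosen step (a sketch; the small frame below is a stand-in for the real one):

```python
import pandas as pd

df = pd.DataFrame({'ID': ['A001', 'A002', 'A003', 'A004']})

# One-minute steps counted from midnight, one per row.
start = pd.Timestamp('00:00:00')
df['Start Time'] = (start + pd.to_timedelta(range(len(df)), unit='min')).time
```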
How can I iterate over days in the dataframe in pandas?
Example:
My dataframe:
time consumption
time
2016-10-17 09:00:00 2016-10-17 09:00:00 2754.483333
2016-10-17 10:00:00 2016-10-17 10:00:00 2135.966666
2016-10-17 11:00:00 2016-10-17 11:00:00 1497.716666
2016-10-17 12:00:00 2016-10-17 12:00:00 448.100000
2016-10-24 09:00:00 2016-10-24 09:00:00 1527.716666
2016-10-24 10:00:00 2016-10-24 10:00:00 1219.833333
2016-10-24 11:00:00 2016-10-24 11:00:00 1284.350000
2016-10-24 12:00:00 2016-10-24 12:00:00 14195.633333
2016-10-31 09:00:00 2016-10-31 09:00:00 2120.933333
2016-10-31 10:00:00 2016-10-31 10:00:00 1630.700000
2016-10-31 11:00:00 2016-10-31 11:00:00 1241.866666
2016-10-31 12:00:00 2016-10-31 12:00:00 1156.266666
Pseudocode:
for day in df:
print day
First iteration return:
time consumption
time
2016-10-17 09:00:00 2016-10-17 09:00:00 2754.483333
2016-10-17 10:00:00 2016-10-17 10:00:00 2135.966666
2016-10-17 11:00:00 2016-10-17 11:00:00 1497.716666
2016-10-17 12:00:00 2016-10-17 12:00:00 448.100000
Second iteration return:
2016-10-24 09:00:00 2016-10-24 09:00:00 1527.716666
2016-10-24 10:00:00 2016-10-24 10:00:00 1219.833333
2016-10-24 11:00:00 2016-10-24 11:00:00 1284.350000
2016-10-24 12:00:00 2016-10-24 12:00:00 14195.633333
Third iteration return :
2016-10-31 09:00:00 2016-10-31 09:00:00 2120.933333
2016-10-31 10:00:00 2016-10-31 10:00:00 1630.700000
2016-10-31 11:00:00 2016-10-31 11:00:00 1241.866666
2016-10-31 12:00:00 2016-10-31 12:00:00 1156.266666
Use groupby by date, which is a bit different from grouping by day:
#groupby by index date
for idx, day in df.groupby(df.index.date):
print (day)
time consumption
time
2016-10-17 09:00:00 2016-10-17 09:00:00 2754.483333
2016-10-17 10:00:00 2016-10-17 10:00:00 2135.966666
2016-10-17 11:00:00 2016-10-17 11:00:00 1497.716666
2016-10-17 12:00:00 2016-10-17 12:00:00 448.100000
time consumption
time
2016-10-24 09:00:00 2016-10-24 09:00:00 1527.716666
2016-10-24 10:00:00 2016-10-24 10:00:00 1219.833333
2016-10-24 11:00:00 2016-10-24 11:00:00 1284.350000
2016-10-24 12:00:00 2016-10-24 12:00:00 14195.633333
time consumption
time
2016-10-31 09:00:00 2016-10-31 09:00:00 2120.933333
2016-10-31 10:00:00 2016-10-31 10:00:00 1630.700000
2016-10-31 11:00:00 2016-10-31 11:00:00 1241.866666
2016-10-31 12:00:00 2016-10-31 12:00:00 1156.266666
Or:
#groupby by column time
for idx, day in df.groupby(df.time.dt.date):
print (day)
time consumption
time
2016-10-17 09:00:00 2016-10-17 09:00:00 2754.483333
2016-10-17 10:00:00 2016-10-17 10:00:00 2135.966666
2016-10-17 11:00:00 2016-10-17 11:00:00 1497.716666
2016-10-17 12:00:00 2016-10-17 12:00:00 448.100000
time consumption
time
2016-10-24 09:00:00 2016-10-24 09:00:00 1527.716666
2016-10-24 10:00:00 2016-10-24 10:00:00 1219.833333
2016-10-24 11:00:00 2016-10-24 11:00:00 1284.350000
2016-10-24 12:00:00 2016-10-24 12:00:00 14195.633333
time consumption
time
2016-10-31 09:00:00 2016-10-31 09:00:00 2120.933333
2016-10-31 10:00:00 2016-10-31 10:00:00 1630.700000
2016-10-31 11:00:00 2016-10-31 11:00:00 1241.866666
2016-10-31 12:00:00 2016-10-31 12:00:00 1156.266666
The difference shows if the first 2 rows are changed to a different month:
for idx, day in df.groupby(df.index.day):
print (day)
time consumption
time
2016-09-17 09:00:00 2016-10-17 09:00:00 2754.483333
2016-09-17 10:00:00 2016-10-17 10:00:00 2135.966666
2016-10-17 11:00:00 2016-10-17 11:00:00 1497.716666
2016-10-17 12:00:00 2016-10-17 12:00:00 448.100000
time consumption
time
2016-10-24 09:00:00 2016-10-24 09:00:00 1527.716666
2016-10-24 10:00:00 2016-10-24 10:00:00 1219.833333
2016-10-24 11:00:00 2016-10-24 11:00:00 1284.350000
2016-10-24 12:00:00 2016-10-24 12:00:00 14195.633333
time consumption
time
2016-10-31 09:00:00 2016-10-31 09:00:00 2120.933333
2016-10-31 10:00:00 2016-10-31 10:00:00 1630.700000
2016-10-31 11:00:00 2016-10-31 11:00:00 1241.866666
2016-10-31 12:00:00 2016-10-31 12:00:00 1156.266666
for idx, day in df.groupby(df.index.date):
print (day)
time consumption
time
2016-09-17 09:00:00 2016-10-17 09:00:00 2754.483333
2016-09-17 10:00:00 2016-10-17 10:00:00 2135.966666
time consumption
time
2016-10-17 11:00:00 2016-10-17 11:00:00 1497.716666
2016-10-17 12:00:00 2016-10-17 12:00:00 448.100000
time consumption
time
2016-10-24 09:00:00 2016-10-24 09:00:00 1527.716666
2016-10-24 10:00:00 2016-10-24 10:00:00 1219.833333
2016-10-24 11:00:00 2016-10-24 11:00:00 1284.350000
2016-10-24 12:00:00 2016-10-24 12:00:00 14195.633333
time consumption
time
2016-10-31 09:00:00 2016-10-31 09:00:00 2120.933333
2016-10-31 10:00:00 2016-10-31 10:00:00 1630.700000
2016-10-31 11:00:00 2016-10-31 11:00:00 1241.866666
2016-10-31 12:00:00 2016-10-31 12:00:00 1156.266666
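pd.Grouper gives the same calendar-day grouping while keeping proper timestamps as group keys (a sketch on a trimmed-down frame; empty days between observations are filtered out explicitly, since a frequency grouper can emit them):

```python
import pandas as pd

idx = pd.to_datetime(['2016-10-17 09:00', '2016-10-17 10:00',
                      '2016-10-24 09:00', '2016-10-24 12:00'])
df = pd.DataFrame({'consumption': [2754.48, 2135.97, 1527.72, 14195.63]},
                  index=idx)

# One (timestamp, sub-frame) pair per calendar day with data.
days = [(ts, day) for ts, day in df.groupby(pd.Grouper(freq='D'))
        if not day.empty]
```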
I have this dataframe. The columns represent the highs and the lows of the daily EURUSD price:
df.low df.high
2013-01-17 16:00:00 1.33394 2013-01-17 20:00:00 1.33874
2013-01-18 18:00:00 1.32805 2013-01-18 09:00:00 1.33983
2013-01-21 00:00:00 1.32962 2013-01-21 09:00:00 1.33321
2013-01-22 11:00:00 1.32667 2013-01-22 09:00:00 1.33715
2013-01-23 17:00:00 1.32645 2013-01-23 14:00:00 1.33545
2013-01-24 10:00:00 1.32860 2013-01-24 18:00:00 1.33926
2013-01-25 04:00:00 1.33497 2013-01-25 17:00:00 1.34783
2013-01-28 10:00:00 1.34246 2013-01-28 16:00:00 1.34771
2013-01-29 13:00:00 1.34143 2013-01-29 21:00:00 1.34972
2013-01-30 08:00:00 1.34820 2013-01-30 21:00:00 1.35873
2013-01-31 13:00:00 1.35411 2013-01-31 17:00:00 1.35944
I combined them into a third column (df.extremes).
df.extremes
2013-01-17 16:00:00 1.33394
2013-01-17 20:00:00 1.33874
2013-01-18 18:00:00 1.32805
2013-01-18 09:00:00 1.33983
2013-01-21 00:00:00 1.32962
2013-01-21 09:00:00 1.33321
2013-01-22 09:00:00 1.33715
2013-01-22 11:00:00 1.32667
2013-01-23 14:00:00 1.33545
2013-01-23 17:00:00 1.32645
2013-01-24 10:00:00 1.32860
2013-01-24 18:00:00 1.33926
2013-01-25 04:00:00 1.33497
2013-01-25 17:00:00 1.34783
2013-01-28 10:00:00 1.34246
2013-01-28 16:00:00 1.34771
2013-01-29 13:00:00 1.34143
2013-01-29 21:00:00 1.34972
2013-01-30 08:00:00 1.34820
2013-01-30 21:00:00 1.35873
2013-01-31 13:00:00 1.35411
2013-01-31 17:00:00 1.35944
But now I want to filter some values from df.extremes.
To explain what to filter, I'll try this "pseudocode":
IF, following the index, we move from: previous df.low --> df.low --> df.high:
IF df.low > previous df.low: delete df.low
IF df.low < previous df.low: delete previous df.low
If I try to work this out with a for loop, it gives me a KeyError: 1.3339399999999999.
day = df.groupby(pd.TimeGrouper('D'))
is_day_min = day.extremes.apply(lambda x: x == x.min())
for i in df.extremes:
if is_day_min[i] == True and is_day_min[i+1] == True:
if df.extremes[i] > df.extremes[i+1]:
del df.extremes[i]
for i in df.extremes:
if is_day_min[i] == True and is_day_min[i+1] == True:
if df.extremes[i] < df.extremes[i+1]:
del df.extremes[i+1]
How do I filter/delete the values as explained in the pseudocode?
I am struggling with indexing and bools and can't solve this. I strongly suspect I need to use a lambda function, but I don't know how to apply it. Please have mercy; I've been trying this for too long. I hope I've been clear enough.
All you're really missing is a way of saying "previous low" in a vectorized fashion. That's spelled df['low'].shift(1) (shift(-1) gives the next row instead). Once you have that, the pseudocode translates to:
prev = df.low.shift(1)
nxt = df.low.shift(-1)
# drop a low above its previous low ("delete df.low"),
# or above its next low (the pseudocode's "delete previous df.low")
filtered_df = df[~((df.low > prev) | (df.low > nxt))]
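A concrete way to express the pseudocode with shifts, on a toy series loosely based on the question's lows (a sketch; shift(1) is the previous row, shift(-1) the next):

```python
import pandas as pd

lows = pd.Series([1.3339, 1.3387, 1.3280, 1.3398, 1.3296])

prev_low = lows.shift(1)   # previous row's low (NaN on the first row)
next_low = lows.shift(-1)  # next row's low (NaN on the last row)

# Drop a low that is above its previous low ("delete df.low"),
# or above its next low ("delete previous df.low").
kept = lows[~((lows > prev_low) | (lows > next_low))]
# Rows 1 and 3 (1.3387 and 1.3398) are dropped; rows 0, 2, 4 survive.
```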