I would like to compare two consecutive rows in a pandas dataframe, but I always get this error: AttributeError: 'float' object has no attribute 'MACD'.
This is the df:
                  time     open     high      low    close  tick_volume  spread  real_volume  EMA_LONG  EMA_SHORT      MACD    SIGNAL       HIST    200EMA
0  2018-01-05 03:00:00  1.20775  1.20794  1.20700  1.20724         2887       1            0  1.206134   1.206803  0.000669  0.000669   0.000000  1.207240
1  2018-01-05 04:00:00  1.20723  1.20743  1.20680  1.20710         2349       1            0  1.206216   1.206849  0.000633  0.000649  -0.000016  1.207170
2  2018-01-05 05:00:00  1.20709  1.20755  1.20709  1.20744         1869       1            0  1.206318   1.206941  0.000622  0.000638  -0.000016  1.207261
Now I want to count how many times it would buy and sell based on values in the rows, so I'm trying to iterate through it like this:
buy = 0
sell = 0

for i, row in df.iterrows():
    if i == 0:
        continue
    if row.MACD > row.SIGNAL and row[i - 1].MACD < row[i - 1].SIGNAL:
        if row.HIST < 0 and row.MACD > row['200EMA'] and row.SIGNAL > row['200EMA']:
            buy += 1
    elif row.MACD < row.SIGNAL and row[i - 1].MACD > row[i - 1].SIGNAL:
        if row.HIST > 0 and row.MACD < row['200EMA'] and row.SIGNAL < row['200EMA']:
            sell += 1

print("BUY: " + buy + "SELL: " + sell)
I am getting the following Error:
AttributeError: 'float' object has no attribute 'MACD'
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-75-ff4a2b3629bc> in <module>
8 if row.HIST < 0 and row.MACD > row['200EMA'] and row.SIGNAL > row['200EMA']:
9 buy += 1
---> 10 elif row.MACD < row.SIGNAL and row[i - 1].MACD > row[i - 1].SIGNAL:
11 if row.HIST > 0 and row.MACD < row['200EMA'] and row.SIGNAL < row['200EMA']:
12 sell += 1
AttributeError: 'float' object has no attribute 'MACD'
I know this error has been asked about here before, but the solutions there didn't help me.
Thank you already!
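Your problem is here: row[i - 1].MACD
Inside the loop, row is a single row of the dataframe (a Series), so row[i - 1] looks up one value inside the current row, not the previous row. That value is a plain scalar (a float in your traceback), which has no MACD attribute.
To read the previous row, index the dataframe instead: df.iloc[i - 1].MACD (this assumes the default 0, 1, 2, ... index, so that the label i from iterrows() is also the row position).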
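A loop-free way to make the same comparison is to line each row up with its predecessor using shift(1). The following is a minimal sketch rather than the asker's code: the function name count_signals and the small demo frame are made up for illustration, and the buy/sell conditions are copied from the question as written. Printing with an f-string also avoids the TypeError that "BUY: " + buy would raise once the loop itself works, because buy is an integer.

```python
import pandas as pd


def count_signals(df):
    """Count buy and sell signals by comparing each row with the previous one."""
    prev = df.shift(1)  # row i of prev holds row i - 1 of df; the first row is NaN

    # MACD crossing above / below SIGNAL between the previous and current row.
    # Comparisons against NaN are False, so the first row never counts.
    cross_up = (df["MACD"] > df["SIGNAL"]) & (prev["MACD"] < prev["SIGNAL"])
    cross_down = (df["MACD"] < df["SIGNAL"]) & (prev["MACD"] > prev["SIGNAL"])

    # Extra filters, copied from the question.
    buy = cross_up & (df["HIST"] < 0) & (df["MACD"] > df["200EMA"]) & (df["SIGNAL"] > df["200EMA"])
    sell = cross_down & (df["HIST"] > 0) & (df["MACD"] < df["200EMA"]) & (df["SIGNAL"] < df["200EMA"])
    return int(buy.sum()), int(sell.sum())


# Made-up demo values, chosen only so that one buy and one sell occur.
demo = pd.DataFrame({
    "MACD":   [0.0, 2.0, 2.0, 0.0],
    "SIGNAL": [1.0, 1.0, 1.0, 1.0],
    "HIST":   [0.0, -1.0, 0.0, 1.0],
    "200EMA": [0.0, 0.0, 0.0, 2.0],
})
buy, sell = count_signals(demo)
print(f"BUY: {buy} SELL: {sell}")
```

Because shift works by position, this version does not depend on the index being 0, 1, 2, ...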
Related
I am trying to fill up a column in a dataframe with 1, 0 or -1 depending on some factors by doing it like this:
def set_order_signal(row):
    if (row.MACD > row.SIGNAL) and (df.iloc[i-1].MACD < df.iloc[i-1].SIGNAL):
        if (row.MACD < 0 and row.SIGNAL < 0) and (row.close > row['200EMA']):
            return 1
    elif (row.MACD < row.SIGNAL) and (df.iloc[i-1].MACD > df.iloc[i-1].SIGNAL):
        if (row.MACD > 0 and row.SIGNAL > 0) and (row.close < row['200EMA']):
            return -1
    else:
        return 0
Sometimes it works but in other rows it returns "NaN". I can't find a reason or solution for this.
The dataframe I work with looks like this:
                  time     open     high      low    close  tick_volume  spread  real_volume  EMA_LONG  EMA_SHORT       MACD     SIGNAL           HIST    200EMA  OrderSignal
0  2018-01-09 05:00:00  1.19726  1.19751  1.19675  1.19717         1773       1            0  1.197605   1.197152  -0.000453  -0.000453   0.000000e+00  1.197170          0.0
1  2018-01-09 06:00:00  1.19717  1.19724  1.19659  1.19681         1477       1            0  1.197538   1.197099  -0.000439  -0.000445   6.258599e-06  1.196989          0.0
2  2018-01-09 07:00:00  1.19681  1.19718  1.19642  1.19651         1622       1            0  1.197452   1.197008  -0.000444  -0.000445   5.327180e-07  1.196828          0.0
3  2018-01-09 08:00:00  1.19650  1.19650  1.19518  1.19560         3543       1            0  1.197298   1.196789  -0.000509  -0.000466  -4.237181e-05  1.196516          NaN
I'm trying to apply it to the df with this:
df['OrderSignal'] = df.apply(set_order_signal, axis=1)
Is it a format problem?
Thank you already!
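If you are looking for the index of the row that is passed to the function, you need to use row.name, not i.
The NaN values come from the branches where the outer condition is true but the inner one is not: the function then falls through without a return statement, returns None, and pandas stores that as NaN. Ending the function with an unconditional return 0 avoids this.
Try this and see what you get. I can't tell if the logic is correct in all cases, but for the four rows shown it returns 0 each time:
def set_order_signal(row):
    # Note: for the first row, row.name - 1 is -1, so iloc wraps around to the last row.
    if (row.MACD > row.SIGNAL) and (df.iloc[row.name-1].MACD < df.iloc[row.name-1].SIGNAL):
        if (row.MACD < 0 and row.SIGNAL < 0) and (row.close > row['200EMA']):
            return 1
    elif (row.MACD < row.SIGNAL) and (df.iloc[row.name-1].MACD > df.iloc[row.name-1].SIGNAL):
        if (row.MACD > 0 and row.SIGNAL > 0) and (row.close < row['200EMA']):
            return -1
    # Every other case, including a crossover that fails its inner check.
    return 0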
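The same column can also be built without apply by comparing each row with the previous one through shift(1) and picking the value with np.select. This is a sketch under assumptions: the function name order_signal and the demo values are invented for illustration, and the conditions are taken from the question unchanged. Since np.select always falls back to its default, no row can end up as NaN.

```python
import numpy as np
import pandas as pd


def order_signal(df):
    """Return 1 (buy), -1 (sell) or 0 for every row, with no NaN gaps."""
    prev_macd = df["MACD"].shift(1)      # previous row's MACD, NaN for the first row
    prev_signal = df["SIGNAL"].shift(1)  # previous row's SIGNAL

    cross_up = (df["MACD"] > df["SIGNAL"]) & (prev_macd < prev_signal)
    cross_down = (df["MACD"] < df["SIGNAL"]) & (prev_macd > prev_signal)

    buy = cross_up & (df["MACD"] < 0) & (df["SIGNAL"] < 0) & (df["close"] > df["200EMA"])
    sell = cross_down & (df["MACD"] > 0) & (df["SIGNAL"] > 0) & (df["close"] < df["200EMA"])

    # Rows matching neither condition get the default 0.
    return pd.Series(np.select([buy, sell], [1, -1], default=0), index=df.index)


# Made-up demo values: one upward and one downward crossover.
demo = pd.DataFrame({
    "MACD":   [-2.0, -0.5, 2.0, 0.5, -1.0],
    "SIGNAL": [-1.0, -1.0, 1.0, 1.0, 1.0],
    "close":  [1.0, 2.0, 2.0, 0.0, 0.0],
    "200EMA": [1.0, 1.0, 1.0, 1.0, 1.0],
})
demo["OrderSignal"] = order_signal(demo)
print(demo)
```

Unlike df.iloc[row.name-1], the shifted comparison leaves the first row without a predecessor instead of wrapping around to the last row.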
My code below checks whether a certain signal shows itself in some data.
I am trying to implement the solution from:
https://stackoverflow.com/a/50302942/11739577
especially these lines:
data['cor_price'] = data['close'].where((data['signal'] == 1) & (data['positions'] == 1), np.nan)
data['cor_price'] = data['cor_price'].ffill().astype(data['close'].dtype)
data['diff_perc'] = (data['close'] - data['cor_price']) / data['cor_price']
data['positions2'] = np.where(data['diff_perc'] <= -0.05, 1, 0)
This is what the data looks like:
Index DATE NAME PRICE ORDER MACD MACD_SIGNAL MACD_HIST
33 2020-02-18 Close 39.450001 buy -0.473582 -0.775 0.301418
34 2020-02-19 Close 40.610001 buy -0.314391 -0.682879 0.368487
35 2020-02-20 Close 41.18 buy -0.140616 -0.574426 0.43381
36 2020-02-21 Close 39.959999 buy -0.100187 -0.479578 0.379391
37 2020-02-24 Close 38.669998 buy -0.170276 -0.417718 0.247441
38 2020-02-25 Close 38.41 buy -0.24399 -0.382972 0.138982
39 2020-02-26 Close 37.950001 buy -0.335657 -0.373509 0.037852
40 2020-02-27 Close 36.16 buy -0.546443 -0.408096 -0.138347
41 2020-02-28 Close 34.490002 buy -0.838581 -0.494193 -0.344388
42 2020-03-02 Close 33.98 buy -1.098591 -0.615073 -0.483518
43 2020-03-03 Close 34.169998 buy -1.274626 -0.746983 -0.527643
44 2020-03-04 Close 35.060001 buy -1.327023 -0.862991 -0.464032
45 2020-03-05 Close 34.110001 buy -1.428735 -0.97614 -0.452595
46 2020-03-06 Close 32.82 buy -1.595048 -1.099922 -0.495127
47 2020-03-09 Close 29.040001 buy -2.008712 -1.28168 -0.727032
48 2020-03-10 Close 29.200001 buy -2.297153 -1.484774 -0.812378
49 2020-03-11 Close 29.74 buy -2.453883 -1.678596 -0.775287
To give some context:
When the signal is triggered (the first two if statements in the code), a buy takes place: buy += 1.
If buy == 1, I add the buying price to cor_price and use ffill() to fill the cor_price "column" so that diff_perc can be calculated. The price, index[column][i], changes as each day goes by, and diff_perc is the relative difference between the buying price and the price on each day after buying.
When diff_perc is <= -0.05, the stop loss is triggered: stop_loss += 1. This means we sell (sell += 1) and no longer hold a buy (buy -= 1).
How can I implement the stop loss?
I can't seem to attach index[column][i] to cor_price and use ffill()
buy = 0
sell = 0
cor_price=0
diff_perc=0
stop_loss=0
temp = 0
stop_loss = 0

if (df["macd_signal"][i+1] < df["macd"][i+1]) != (df["macd_signal"][i] < df["macd"][i]):
    if ((df["macd_signal"][i+1] < df["macd"][i+1]),(df["macd_signal"][i] < df["macd"][i])) == (True,False):
        buy += 1 #we buy as we have spotted a signal
        if sell == 1: #If we previously have sold but make a buy order, sells should be 0
            sell -= 1
        if stop_loss == 1: #If we previously sold on stop loss, but we buy again, stop loss should be 0
            stop_loss -= 1
        if buy == 1: #If we have bought we check if the price is negative, and hits our stop loss meaning we also sell.
            cor_price += index[column][i]
            cor_price = cor_price.ffill().astype(index[column][i].dtype)
            diff_perc += (index[column][i] - cor_price) / cor_price
            stop_loss += np.where(diff_perc <= -0.05, 1, 0)
            buy -= 1
            sell += 1
    else:
        sell += 1 #we sell as we have observed a sell signal
        buy -= 1 #as we now have sold, we have no buy orders - until we buy again.

#append to a list to create a pd.DataFrame later
BUY_SIGNAL.append(buy)
SELL_SIGNAL.append(sell)
DF_CORR.append(cor_price)
DIFF_PER.append(diff_perc)
STOP_LOSS.append(stop_loss)
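One possible way to express the stop loss is to keep the entry price in an ordinary variable while walking through the bars, instead of calling ffill() on a scalar (ffill() is a Series/DataFrame method, so it cannot be applied to a single number). The sketch below is only an illustration under assumptions: the function name simulate_stop_loss, the event labels and the demo numbers are all made up, only one position is open at a time, and the stop loss is checked before any crossover on the same bar.

```python
import pandas as pd


def simulate_stop_loss(close, macd, macd_signal, stop=-0.05):
    """Walk through the bars once, tracking a single open position."""
    holding = False     # True while a buy is open
    entry = None        # price paid at the most recent buy (the question's cor_price)
    prev_above = None   # was MACD above its signal line on the previous bar?
    rows = []

    for price, m, s in zip(close, macd, macd_signal):
        above = m > s
        crossed = prev_above is not None and above != prev_above
        # Relative change since the buy; 0.0 while nothing is held.
        diff_perc = (price - entry) / entry if holding else 0.0
        event = "hold" if holding else "flat"

        if holding and diff_perc <= stop:
            holding, event = False, "stop_loss"         # loss limit hit: sell
        elif crossed and above and not holding:
            holding, entry, event = True, price, "buy"  # MACD crossed above its signal
        elif crossed and not above and holding:
            holding, event = False, "sell"              # ordinary sell signal

        rows.append({"close": price, "cor_price": entry,
                     "diff_perc": diff_perc, "event": event})
        prev_above = above

    return pd.DataFrame(rows)


# Made-up prices: buy at 101, then a drop of more than 5 % triggers the stop.
result = simulate_stop_loss(
    close=[100.0, 101.0, 99.0, 95.0, 96.0, 97.0],
    macd=[-1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    macd_signal=[0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
)
print(result)
```

With real data the three arguments could be the corresponding dataframe columns; the returned frame then plays the role of the lists that the question appends to.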
I have this single line of code that subtracts 1 from month wherever day is less than 6.
data.loc[data.day<6, 'month'] -= 1
The above code works fine for the entire dataframe, but I only want to apply it to rows where the key column equals salary.
data
         amount  day  month           key
   0  111627.94    1      6        salary
 474  131794.61   31     10        salary
 590  131794.61   29     11        salary
1003  102497.94   11      7  other_income
1245   98597.94    1      8  other_income
2446    5000.00    2      7  other_income
2447   10000.00    2      7  other_income
Expected output:
         amount  day  month           key
   0  111627.94    1      5        salary
 474  131794.61   31     10        salary
 590  131794.61   29     11        salary
1003  102497.94   11      7  other_income
1245   98597.94    1      8  other_income
2446    5000.00    2      7  other_income
2447   10000.00    2      7  other_income
I have tried using this filter query:
data[[data.key == 'salary'].day<13, 'month'] -= 1
which resulted in the error below:
AttributeError Traceback (most recent call last)
<ipython-input-773-81b5a31a7b9f> in <module>
----> 1 test_df[[test_df.key == 'salary'].day<13, 'month'] -= 1
AttributeError: 'list' object has no attribute 'day'
I tried this as well:
new = data.loc[data.key == 'salary']
new.loc[new.day<6, 'month'] -= 1
This worked, but I want to do it in a single line rather than assigning the filtered frame to a variable new.
You can combine multiple conditions into one Boolean index by using logical operators and surrounding each condition with parentheses:
data.loc[(data.day < 6) & (data.key == "salary"), "month"] -= 1
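As a quick check, the combined mask can be run against the sample frame from the question. This is a small sketch that rebuilds that data by hand; the intermediate name mask is introduced only for readability.

```python
import pandas as pd

# The sample data from the question, including its original index labels.
data = pd.DataFrame(
    {
        "amount": [111627.94, 131794.61, 131794.61, 102497.94, 98597.94, 5000.00, 10000.00],
        "day": [1, 31, 29, 11, 1, 2, 2],
        "month": [6, 10, 11, 7, 8, 7, 7],
        "key": ["salary", "salary", "salary", "other_income",
                "other_income", "other_income", "other_income"],
    },
    index=[0, 474, 590, 1003, 1245, 2446, 2447],
)

# Each condition needs its own parentheses because & binds tighter than < and ==.
mask = (data["day"] < 6) & (data["key"] == "salary")
data.loc[mask, "month"] -= 1

print(data)
```

Only the first row is both a salary row and has day below 6, so only its month changes from 6 to 5.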
My current dataframe looks like this:
   midprice     ema12     ema26    difference
0  0.002990  0.002990  0.002990  0.000000e+00
1  0.002990  0.002990  0.002990  4.227920e-08
2  0.003018  0.002994  0.002992  2.295777e-06
3  0.003025  0.002999  0.002994  4.579221e-06
4  0.003067  0.003009  0.003000  9.708765e-06
5  0.003112  0.003025  0.003008  1.718520e-05
What I tried is the following:
df.loc[:, 'action'] = np.select(condlist=[df.difference[0] < df.difference[-1] < df.difference[-2], df.ema12 < df.ema26 ], choicelist=['buy', 'sell'], default='do nothing')
So I want to set the action column to buy if, three times in a row, the value in the difference column is smaller than its previous value. Any idea on how to proceed? Thanks!
I think you need:
m1 = df['difference'] < df['difference'].shift(-1)
m2 = df['difference'] < df['difference'].shift(-2)
m3 = df['difference'] < df['difference'].shift(-3)

df['action'] = np.select(condlist=[m1 | m2 | m3, df.ema12 < df.ema26],
                         choicelist=['buy', 'sell'],
                         default='do nothing')
print(df)
   midprice     ema12     ema26    difference      action
0  0.002990  0.002990  0.002990  0.000000e+00         buy
1  0.002990  0.002990  0.002990  4.227920e-08         buy
2  0.003018  0.002994  0.002992  2.295777e-06         buy
3  0.003025  0.002999  0.002994  4.579221e-06         buy
4  0.003067  0.003009  0.003000  9.708765e-06         buy
5  0.003112  0.003025  0.003008  1.718520e-05  do nothing
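If the requirement is read strictly, as "the value fell below its previous value on three consecutive rows", a different mask may be closer to the intent. The sketch below shows that alternative reading only; it is not the answer's method, and the function name label_actions, the parameter n and the demo numbers are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def label_actions(df, n=3):
    """Label a row 'buy' when 'difference' has fallen on n consecutive rows."""
    falling = df["difference"].diff() < 0               # True where value < previous value
    streak = falling.astype(int).rolling(n).sum() == n  # n falls in a row, ending at this row
    action = np.select(
        [streak, df["ema12"] < df["ema26"]],
        ["buy", "sell"],
        default="do nothing",
    )
    return pd.Series(action, index=df.index)


# Made-up demo values: 'difference' falls three times in a row up to index 3.
demo = pd.DataFrame({
    "difference": [5.0, 4.0, 3.0, 2.0, 3.0, 1.0],
    "ema12": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "ema26": [0.0, 0.0, 0.0, 0.0, 2.0, 0.0],
})
demo["action"] = label_actions(demo)
print(demo)
```

Here diff() compares with the previous row (the same as shift(1)), whereas shift(-1) in the answer compares with the following row, so the two versions answer slightly different questions.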
I am using the pandas library and am having performance problems when using .iloc.
The main idea of the program is to go through each row of the dataframe and, if a condition is met, update a specific column of that row with a new value.
Some lines of the code follow:
for cont, val in enumerate(id_truck_list):
    print(cont)
    for index, row in all_travel.iterrows():
        id_tr = int(all_travel.iloc[index, 0])
        begin = all_travel.iloc[index, 5]
        end = all_travel.iloc[index, 11]
        if int(val) == id_tr:
            #print "test1"
            #print id_tr
            #print begin_list[cont]
            #print begin
            #print end_list[cont]
            #print end
            if begin_list[cont] >= begin:
                if end_list[cont] <= begin:
                    pass
                else:
                    #print 'h1'
                    all_travel.iloc[index, 18] = all_travel.iloc[index, 18] + 3
            else:
                if begin < end_list[cont]:
                    if end <= end_list[cont]:
                        #print 'h2'
                        #print(all_travel.iloc[index, 18])
                        all_travel.iloc[index, 18] = all_travel.iloc[index, 18] + 5
                        #print(all_travel.iloc[index, 18])
                        #print str(index)
                    else:
                        #print 'h3'
                        all_travel.iloc[index, 18] = all_travel.iloc[index, 18] + 7
                else:
                    pass
This approach is very slow (roughly 10 rows per minute). Do you have any ideas for speeding it up with pandas?
Below is the output of all_travel.head():
truck_id id_farm gatec_dist gps_go_dist gps_ret_dist t1gatec \
0 2010028.0 76.0 11 11.8617 0.211655 2016-03-09 00:24:00
1 2010028.0 1.0 16.2 9.86 0.0637544 2016-03-13 23:57:00
2 2010028.0 75.0 18 10.78 9.65 2016-03-18 09:17:00
3 2010028.0 62.0 6 8.51291 3.99291 2016-03-19 20:16:00
4 2010028.0 62.0 6 2.91 0.0428008 2016-03-21 03:00:00
t1gps t2gatec t2gps t3gatec \
0 03/09/2016 00:09:58 0 03/09/2016 00:43:46 0
1 03/13/2016 23:46:00 0 03/14/2016 00:53:10 0
2 03/18/2016 09:13:15 0 03/18/2016 10:17:14 0
3 03/19/2016 20:29:59 0 03/19/2016 21:22:40 0
4 03/21/2016 02:49:34 0 03/21/2016 03:38:59 0
t3gps t4gatec t4gps wait_mill \
0 03/09/2016 07:00:15 2016-03-09 02:14:55 03/09/2016 02:14:55 154.500000
1 03/14/2016 13:54:30 2016-03-14 01:12:58 03/14/2016 01:12:58 124.733333
2 03/18/2016 12:07:00 2016-03-18 12:37:41 03/18/2016 12:44:01 408.316667
3 03/19/2016 23:57:22 2016-03-19 22:00:08 03/19/2016 22:00:08 256.083333
4 03/22/2016 00:09:56 2016-03-21 04:01:20 03/21/2016 04:01:20 47.333333
go_field wait_field ret_mill tot_trav maintenance_level
0 33.800000 376.483333 -285.333333 124.950000 1
1 67.166667 781.333333 -761.533333 86.966667 1
2 63.983333 109.766667 37.016667 210.766667 1
3 52.683333 154.700000 -117.233333 90.150000 1
4 49.416667 1230.950000 -1208.600000 71.766667 1
I have come up with another solution that improved the speed a lot.
I converted the relevant parts of the dataframe to lists, because plain lists perform much better here than accessing the dataframe element by element.
As a result, I now wait two minutes for the answer instead of 3 days.
The modified code follows:
for cont, val in enumerate(id_truck_list):
    for cont2, val2 in enumerate(id_truck_list2):
        id_tr = int(id_truck_list2[cont2])
        begin = begin_list2[cont2]
        end = end_list2[cont2]
        if int(id_truck_list[cont]) == id_tr:
            if begin_list[cont] >= begin:
                if begin_list[cont] >= end:
                    pass
                else:
                    maintenance_list[cont2] = maintenance_list[cont2] + 3
            else:
                if begin < end_list[cont]:
                    if end <= end_list[cont]:
                        #print 'h2'
                        # Increment assumed to be 5, matching the 'h2' branch in the question.
                        maintenance_list[cont2] = maintenance_list[cont2] + 5
                        #print str(index)
                    else:
                        #print 'h3'
                        # Increment assumed to be 7, matching the 'h3' branch in the question.
                        maintenance_list[cont2] = maintenance_list[cont2] + 7
                else:
                    pass

print('list size ' + str(len(maintenance_list)))

# Write the updated values back to the dataframe.
for cont3, val3 in enumerate(maintenance_list):
    print('list update ' + str(cont3))
    all_travel.iloc[cont3, 18] = maintenance_list[cont3]
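The inner loop over rows can probably be replaced by NumPy array comparisons, leaving only one Python loop over the intervals in the lists. The sketch below follows the branch logic of the list version above but is otherwise an assumption-laden illustration: the function name add_maintenance, the column names begin and end (standing in for columns 5 and 11) and the numeric demo values are invented, and the increments 5 and 7 are taken from the question's code.

```python
import numpy as np
import pandas as pd


def add_maintenance(travel, maintenance):
    """Add 3, 5 or 7 to maintenance_level for every overlapping interval."""
    truck = travel["truck_id"].to_numpy()
    begin = travel["begin"].to_numpy()
    end = travel["end"].to_numpy()
    level = travel["maintenance_level"].to_numpy().copy()

    # One pass per interval; all travel rows are compared at once.
    for m_truck, m_begin, m_end in maintenance:
        starts_inside = (m_begin >= begin) & (m_begin < end)  # the +3 branch
        starts_before = (m_begin < begin) & (begin < m_end)   # the +5 / +7 branches
        increment = np.select(
            [starts_inside, starts_before & (end <= m_end), starts_before & (end > m_end)],
            [3, 5, 7],
            default=0,
        )
        # Only rows belonging to the same truck are updated.
        level += np.where(truck == m_truck, increment, 0)

    out = travel.copy()
    out["maintenance_level"] = level
    return out


# Made-up demo data: plain numbers stand in for the timestamps.
travel = pd.DataFrame({
    "truck_id": [1, 1, 1, 2],
    "begin": [0, 10, 20, 0],
    "end": [5, 15, 30, 5],
    "maintenance_level": [1, 1, 1, 1],
})
maintenance = [(1, 12, 25), (1, -1, 40)]  # (truck id, begin, end)
updated = add_maintenance(travel, maintenance)
print(updated)
```

The input frame is left untouched and a modified copy is returned, so the result can be compared with the loop version before replacing it.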