Finding MACD Divergence - python

I want to create a loop to automate finding MACD divergence under a specific scenario/criterion, but I am finding it difficult to implement, although it's very easy to spot when looking at a chart by eye. Note: this is readily available as an off-the-shelf scanner, but I want to improve my Python knowledge, and I hope someone will be able to help me with this mission.
My main issue is how to make the code reference 40 rows back and then test forward; I couldn't get my head around the logic itself.
The rules are as follows. Let's say we have the table below:
Date        Price  MACD Hist
04/08/2021  30      1
05/08/2021  29      0.7
06/08/2021  28      0.4
07/08/2021  27      0.1
08/08/2021  26     -0.15
09/08/2021  25     -0.70
10/08/2021  26     -0.1
11/08/2021  27      0.2
12/08/2021  28      0.4
13/08/2021  29      0.5
14/08/2021  30      0.55
15/08/2021  31      0.6
16/08/2021  30      0.55
17/08/2021  29      0.5
18/08/2021  28      0.4225
19/08/2021  27      0.4
20/08/2021  26      0.35
21/08/2021  25      0.3
22/08/2021  24      0.25
23/08/2021  23      0.2
24/08/2021  22      0.15
25/08/2021  21      0.1
26/08/2021  20      0.05
27/08/2021  19      0
28/08/2021  18     -0.05
29/08/2021  17     -0.1
30/08/2021  16     -0.25
I want the code to:

1. Look back 40 days from today and, within those 40 days, find the lowest point reached in MACDHist and the Price corresponding to it (i.e. price 25$ on 09/08/2021 in this example, with MACDHist -0.7).
2. Compare it with today's Price and MACDHist and report divergence or not, based on the 3 rules below:
- Today's price < the price recorded in step 1 (16$ < 25$ in this example), AND
- Today's MACDHist > the MACDHist recorded in step 1, i.e. lower in absolute terms (ABS(-0.25) < ABS(-0.7)), AND
- During the period between the recorded low and today (between 09/08/2021 and today), MACDHist was positive at least once.
I am sorry if my explanation isn't very clear; the picture below might help illustrate the scenario I am after:
A. The lowest MACDHist in the specified period
B. Within the same period, MACDHist was positive at least once
C. Price is lower than at point A, and MACDHist is higher than at point A (i.e. lower in ABS terms)

In a similar case I have used backtrader. It's a feature-rich Python framework for backtesting and trading, and you can also use it to generate lots of predefined indicators. In addition, the framework lets you develop your own custom indicator, as shown here. It's very easy to use and it supports lots of data formats, including pandas DataFrames. Please take a look!

I found the answer in this great post. It's not a direct implementation, but the logic is the same, and by replacing the RSI data with MACDHist you reach the same conclusion:
How to implement RSI Divergence in Python
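For completeness, the look-back logic described in the question (find the lowest MACDHist in the window, then test the three rules against today's bar) can also be sketched directly in pandas. The frame below just reproduces the sample table; the column names and the `bullish_divergence` helper are assumptions, not an established API:

```python
import pandas as pd

# Price and MACD histogram columns from the sample table in the question
df = pd.DataFrame({
    "Price":    [30, 29, 28, 27, 26, 25, 26, 27, 28, 29, 30, 31, 30, 29,
                 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16],
    "MACDHist": [1, 0.7, 0.4, 0.1, -0.15, -0.7, -0.1, 0.2, 0.4, 0.5, 0.55,
                 0.6, 0.55, 0.5, 0.4225, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15,
                 0.1, 0.05, 0, -0.05, -0.1, -0.25],
})

def bullish_divergence(df, lookback=40):
    window = df.tail(lookback)                     # last `lookback` rows (all 27 here)
    low_idx = window["MACDHist"].idxmin()          # row with the lowest histogram value
    low_price = window.loc[low_idx, "Price"]
    low_hist = window.loc[low_idx, "MACDHist"]
    today = window.iloc[-1]
    since_low = window.loc[low_idx:, "MACDHist"]   # histogram from that low up to today
    return bool(
        today["Price"] < low_price                   # rule 1: price made a lower low
        and abs(today["MACDHist"]) < abs(low_hist)   # rule 2: shallower histogram low
        and (since_low > 0).any()                    # rule 3: histogram turned positive in between
    )

print(bullish_divergence(df))  # True for the sample table
```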

Related

Determining average values over irregular number of rows in a csv file

I have a CSV file with days of the year in one column and temperature in another. The days are split into sections and I want to find the average temperature over each day, e.g. day 0, 1, 2, 3, etc.
The temperature measurements have been taken irregularly, meaning there are different numbers of measurements at certain times for each day.
Typically I would use df.groupby(np.arange(len(df)) // n).mean(), but n, the number of rows, varies in this case.
I have an example of what the data is like.
Days  Temp
0.75  19
0.8   18
1.2   18
1.25  18
1.75  19
3.05  18
3.55  21
3.60  21
3.9   18
4.5   20
You could convert Days to an integer and use that to group.
>>> df.groupby(df["Days"].astype(int)).mean()
Days Temp
Days
0 0.775 18.500000
1 1.400 18.333333
3 3.525 19.500000
4 4.500 20.000000
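As a self-contained check, the suggestion above reproduces the averages shown, with the frame built from the sample data in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Days": [0.75, 0.8, 1.2, 1.25, 1.75, 3.05, 3.55, 3.60, 3.9, 4.5],
    "Temp": [19, 18, 18, 18, 19, 18, 21, 21, 18, 20],
})

# Truncate the fractional day to an integer day label and average within each day
daily = df.groupby(df["Days"].astype(int)).mean()
print(daily)
```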

Pandas query after groupby and agg function

I have the following piece of code already. Basically, I'm grouping by week and then by tenant, then manipulating the data by calculating the p95 and p90. Now I only want to print violators, i.e. rows where p90 > 10 or p95 > 50. I tried using query after manipulating the DataFrame, but it fails with a KeyError. Is there any way I can run my computations inside the agg function itself? Below is the code I have come up with:
total_logs_percentiles = data.groupby(['version_week', 'tenant']).agg(
    {'time_in_queue': {'p90': p90_agg, 'p95': p95_agg},
     'duration': {'p90': p90_agg, 'p95': p95_agg}})
This is an example data output from the above expression:
                        time_in_queue      duration
                          p90   p95       p90   p95
version_week tenant
1            google       0.9    22        44   0.5
             yahoo       12      21         4   0.5
             bing         0.5    22         5   0.5
             duckduckgo   0.7    23         4   0.5
             IE          25      24        46   0.5
             Edge        60      25        47   0.5
Then I'm running the query below to filter the violators, but it doesn't work and raises a KeyError:
total_access_logs_percentiles.query('time_in_queue.p90 > 10', engine='python')
How do I fix this, and can I also optimize it so that the filtering happens inside the agg function itself? p90_agg and p95_agg are functions I already have.
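One way to get past the KeyError, sketched under the assumption that flat column names are acceptable: pandas named aggregation produces single-level columns, which query can then reference directly. The sample values and the percentile stand-ins below are made up for illustration:

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the p90_agg / p95_agg functions mentioned in the question
def p90_agg(s):
    return np.percentile(s, 90)

def p95_agg(s):
    return np.percentile(s, 95)

data = pd.DataFrame({
    "version_week": [1] * 6,
    "tenant": ["google", "yahoo", "bing", "duckduckgo", "IE", "Edge"],
    "time_in_queue": [0.9, 12, 0.5, 0.7, 25, 60],
    "duration": [44, 4, 5, 4, 46, 47],
})

# Named aggregation gives flat column names instead of a MultiIndex
pct = data.groupby(["version_week", "tenant"]).agg(
    time_in_queue_p90=("time_in_queue", p90_agg),
    time_in_queue_p95=("time_in_queue", p95_agg),
    duration_p90=("duration", p90_agg),
    duration_p95=("duration", p95_agg),
)

# With flat names, query works without the KeyError
violators = pct.query("time_in_queue_p90 > 10 or time_in_queue_p95 > 50")
print(violators)
```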

Summarising features with multiple values in Python for Machine Learning model

I have a data file containing different foetal ultrasound measurements. The measurements are collected at different points during pregnancy, like so:
PregnancyID MotherID gestationalAgeInWeeks abdomCirc
0 0 14 150
0 0 21 200
1 1 20 294
1 1 25 315
1 1 30 350
2 2 8 170
2 2 9 180
2 2 18 NaN
As you can see from the table above, I have multiple measurements per pregnancy (between 1 and 26 observations each).
I want to summarise the ultrasound measurements somehow such that I can replace the multiple measurements with a fixed amount of features per pregnancy. So I thought of creating 3 new features, one for each trimester of pregnancy that would hold the maximum measurement recorded during that trimester:
abdomCirc1st: this feature would hold the maximum value of all abdominal circumference measurements measured between 0 to 13 Weeks
abdomCirc2nd: this feature would hold the maximum value of all abdominal circumference measurements measured between 14 to 26 Weeks
abdomCirc3rd: this feature would hold the maximum value of all abdominal circumference measurements measured between 27 to 40 Weeks
So my final dataset would look like this:
PregnancyID MotherID abdomCirc1st abdomCirc2nd abdomCirc3rd
0 0 NaN 200 NaN
1 1 NaN 315 350
2 2 180 NaN NaN
The reason for using the maximum here is that a larger abdominal circumference is associated with the adverse outcome I am trying to predict.
But I am quite confused about how to go about this. I have used the groupby function previously to derive certain statistical features from the multiple measurements, however this is a more complex task.
What I want to do is the following:

1. Group all abdominal circumference measurements that belong to the same pregnancy into 3 trimesters, based on the gestationalAgeInWeeks value.
2. Compute the maximum value of all abdominal circumference measurements within each trimester, and assign this value to the relevant feature: abdomCirc1st, abdomCirc2nd or abdomCirc3rd.
I think I have to do something along the lines of:
df["abdomCirc1st"] = df.groupby(['MotherID', 'PregnancyID', 'gestationalAgeInWeeks'])["abdomCirc"].transform('max')
But this code does not check what trimester the measurement was taken in (gestationalAgeInWeeks). I would appreciate some help with this task.
You can try this. A bit of a complicated query, but it seems to work:
(df.groupby(['MotherID', 'PregnancyID'])
   .apply(lambda d: d.assign(tm=(d['gestationalAgeInWeeks'] + 13 - 1) // 13)
                     .groupby('tm')['abdomCirc']
                     .apply(max))
   .unstack()
)
produces
tm                        1      2      3
MotherID PregnancyID
0        0              NaN  200.0    NaN
1        1              NaN  315.0  350.0
2        2            180.0    NaN    NaN
Let's unpick this a bit. First we groupby on MotherID, PregnancyID. Then we apply a function to each grouped dataframe (d).
For each d, we create a 'trimester' column 'tm' via assign (I assume I got the math right here, but correct it if it is wrong!), then we groupby by 'tm' and apply max. For each sub-dataframe d we then obtain a Series mapping tm to max(abdomCirc).
Then unstack() moves tm into the column names.
You may want to rename these columns later, but I did not bother.
Solution 2
Come to think of it, you can simplify the above a bit:
(df.assign(tm=(df['gestationalAgeInWeeks'] + 13 - 1) // 13)
   .drop(columns='gestationalAgeInWeeks')
   .groupby(['MotherID', 'PregnancyID', 'tm'])
   .agg('max')
   .unstack()
)
Similar idea, same output.
There is a magic command called query. This should do your work for now:
abdomCirc1st = df.query('MotherID == 0 and PregnancyID == 0 and gestationalAgeInWeeks <= 13')['abdomCirc'].max()
abdomCirc2nd = df.query('MotherID == 0 and PregnancyID == 0 and gestationalAgeInWeeks >= 14 and gestationalAgeInWeeks <= 26')['abdomCirc'].max()
abdomCirc3rd = df.query('MotherID == 0 and PregnancyID == 0 and gestationalAgeInWeeks >= 27 and gestationalAgeInWeeks <= 40')['abdomCirc'].max()
If you want something more automatic (instead of manually changing the values of your IDs, MotherID and PregnancyID, for each different group of rows), you have to combine it with groupby (as you did on your own).
Check this as well: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html
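Combining the ideas from both answers into one runnable sketch: bin the weeks into trimesters (here with pd.cut, an alternative to the integer-division trick), then group and take the maximum. The frame reproduces the sample data from the question; the final column renaming is just cosmetic:

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    "MotherID":              [0, 0, 1, 1, 1, 2, 2, 2],
    "PregnancyID":           [0, 0, 1, 1, 1, 2, 2, 2],
    "gestationalAgeInWeeks": [14, 21, 20, 25, 30, 8, 9, 18],
    "abdomCirc":             [150, 200, 294, 315, 350, 170, 180, None],
})

# Bin weeks into trimesters: (0, 13], (13, 26], (26, 40]
df["tm"] = pd.cut(df["gestationalAgeInWeeks"], bins=[0, 13, 26, 40], labels=[1, 2, 3])

# Per-trimester maximum, pivoted so each trimester becomes a column
wide = (df.groupby(["MotherID", "PregnancyID", "tm"], observed=True)["abdomCirc"]
          .max()
          .unstack("tm"))
wide.columns = ["abdomCirc1st", "abdomCirc2nd", "abdomCirc3rd"]
print(wide)
```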

Using a for loop is there a way to control which # loop appends values to a list?

I'm currently working with 3 data frames named doctorate, high_school and bachelor that look a bit like this:
ID age education marital_status occupation annual_income Age_25 Age_30 Age_35 Age_40 Age_45 Age_50
1 2 50 doctorate married professional mid 25 and over 30 and over 35 and over 40 and over 45 and over 50 and over
7 8 40 doctorate married professional high 25 and over 30 and over 35 and over 40 and over under 45 under 50
11 12 45 doctorate married professional mid 25 and over 30 and over 35 and over 40 and over 45 and over under 50
16 17 44 doctorate divorced transport mid 25 and over 30 and over 35 and over 40 and over under 45 under 50
I'm trying to create probabilities based on the annual_income column using the following for loop:
income_levels = ['low','mid','high']
education_levels = [bachelor,doctorate,high_school]
for inc_level in income_levels:
    for ed_level in education_levels:
        print(inc_level, len(ed_level[ed_level['annual_income'] == inc_level]) / len(ed_level))
It produces this, which is what I want:
low 0.125
low 0.0
low 0.25
mid 0.625
mid 0.75
mid 0.5
high 0.25
high 0.25
high 0.25
However, I want to be able to append these values to a list depending on the income category, the lists would be low_income,mid_income,high_income. I'm sure there's a way that I can modify my for loop to be able to do this, but I can't bridge the gap to getting there. Could anyone help me?
In this case, you're trying to find a list via a key/string. Why not just use a dict of lists?
income_levels = ['low', 'mid', 'high']
education_levels = [bachelor, doctorate, high_school]
# initial dictionary: one empty list per income level
inc_level_rates = {il: list() for il in income_levels}
for inc_level in income_levels:
    for ed_level in education_levels:
        rate = len(ed_level[ed_level['annual_income'] == inc_level]) / len(ed_level)
        inc_level_rates[inc_level].append(rate)
        print(inc_level, rate)
print(inc_level_rates)
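A self-contained check of the dict-of-lists approach, with small made-up frames standing in for the bachelor, doctorate and high_school DataFrames:

```python
import pandas as pd

# Made-up stand-ins for the three education-level frames
bachelor    = pd.DataFrame({"annual_income": ["low", "mid", "mid", "high"]})
doctorate   = pd.DataFrame({"annual_income": ["mid", "mid", "high", "high"]})
high_school = pd.DataFrame({"annual_income": ["low", "low", "mid", "high"]})

income_levels = ["low", "mid", "high"]
education_levels = [bachelor, doctorate, high_school]
inc_level_rates = {il: [] for il in income_levels}

for inc_level in income_levels:
    for ed_level in education_levels:
        rate = len(ed_level[ed_level["annual_income"] == inc_level]) / len(ed_level)
        inc_level_rates[inc_level].append(rate)

print(inc_level_rates["low"])  # one rate per education level
```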

Pandas timeseries bins and indexing

I have some experimental data collected from a number of samples at set time intervals, in a dataframe organised like so:
Studynumber Time Concentration
1 20 80
1 40 60
1 60 40
2 15 95
2 44 70
2 65 30
Although the time intervals are supposed to be fixed, there is some variation in the data based on when they were actually collected. I want to create bins of the Time column, calculate an 'average' concentration, and then compare the difference between actual concentration and average concentration for each studynumber, at each time.
To do this, I created a column called 'roundtime', then used a groupby to calculate the mean:
data['roundtime']=data['Time'].round(decimals=-1)
meanconc = data.groupby('roundtime')['Concentration'].mean()
This gives a pandas series of the mean concentrations, with roundtime as the index. Then I want to get this back into the main frame to calculate the difference between each actual concentration and the mean concentration:
data['meanconcentration']=meanconc.loc[data['roundtime']].reset_index()['Concentration']
This works for the first 60 or so values, but then returns NaN for each entry, I think because the index of data is longer than the index of meanconcentration.
On the one hand, this looks like an indexing issue - equally, it could be that I'm just approaching this the wrong way. So my question is: a) can this method work? and b) is there another/better way of doing it? All advice welcome!
Use transform to add a column from a groupby aggregation; this creates a Series with its index aligned to the original df, so you can assign it back correctly:
In [4]:
df['meanconcentration'] = df.groupby('roundtime')['Concentration'].transform('mean')
df
Out[4]:
Studynumber Time Concentration roundtime meanconcentration
0 1 20 80 20 87.5
1 1 40 60 40 65.0
2 1 60 40 60 35.0
3 2 15 95 20 87.5
4 2 44 70 40 65.0
5 2 65 30 60 35.0
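The same approach end-to-end, using the sample frame from the question, with the actual-minus-average difference added as a final column (the `delta` name is an assumption):

```python
import pandas as pd

df = pd.DataFrame({
    "Studynumber":   [1, 1, 1, 2, 2, 2],
    "Time":          [20, 40, 60, 15, 44, 65],
    "Concentration": [80, 60, 40, 95, 70, 30],
})

# Round times to the nearest 10 to form bins, then broadcast each bin's mean back
df["roundtime"] = df["Time"].round(decimals=-1)
df["meanconcentration"] = df.groupby("roundtime")["Concentration"].transform("mean")

# Difference between each actual value and its bin average
df["delta"] = df["Concentration"] - df["meanconcentration"]
print(df)
```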
