Lookup dictionary instead of function [duplicate]


This question already has answers here:
Python: Efficient lookup by interval
(1 answer)
Find value within a range in lookup table
(4 answers)
Closed 8 months ago.
I am pretty new to Python and use it only for data analysis.
I have this function to look up specific parameters based on a category and a condition. I suspect this can be done more elegantly, probably with a dictionary, but I couldn't figure out how to make the x < x1 comparison work.
What would be the best way to do this?
def stab_class_z(x,cl):
    # Get the sigma_z parameters for each stability class and the distance
    if cl == "A":
        if x < 0.1:
            a = 122.8
            b = 0.9447
        elif x < 0.16:
            a = 158.08
            b = 1.0542
        elif x < 0.21:
            a = 170.22
            b = 1.0932
        elif x < 0.26:
            a = 179.52
            b = 1.1262
        elif x < 0.31:
            a = 217.41
            b = 1.2644
        elif x < 0.41:
            a = 258.89
            b = 1.4094
        elif x < 0.51:
            a = 346.75
            b = 1.7283
        elif x <= 3.11:
            a = 453.85
            b = 2.11660
        else:
            print("not defined________")
    if cl == "B":
        if x < 0.2:
            a = 90.673
            b = 0.93198
        elif x <= 0.4:
            a = 98.483
            b = 0.98332
        else:
            a = 109.3
            b = 1.0971
    # And so it continues for cl: 'C', 'D', 'E', 'F'
    return a,b

Yes, a dictionary can certainly be used here.
First, the keys of the outer dictionary can be your cl parameter. Then you can use a tuple of (lower, upper) bounds as the key of each inner dictionary.
You could do something like the following (maybe not optimal):
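stab_class_dict = {'A': {(0, 0.1): (122.8, 0.9447), (0.1, 0.16): (158.08, 1.0542)}}
def stab_class_z(x: float, cl: str):
    # each key is a (lower, upper) pair; return the parameters of the first interval containing x
    for limit in stab_class_dict[cl]:
        if limit[0] <= x < limit[1]:
            return stab_class_dict[cl][limit]
    return None  # nothing matched; return your own default (a, b) pair here instead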
Of course be careful that cl is indeed a key of stab_class_dict.
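Since the intervals in the question are contiguous, another option is to store only the sorted breakpoints and let the standard-library bisect module find the right slot. The following is a minimal sketch for class "A" only, using the values from the question. The names A_BOUNDS, A_PARAMS, A_MAX and sigma_z_params_a are made up for illustration, and raising ValueError for an out-of-range x is an assumption about what "not defined" should mean.

```python
from bisect import bisect_right

# Exclusive upper bounds of the first seven intervals for class "A" (from the question).
A_BOUNDS = [0.1, 0.16, 0.21, 0.26, 0.31, 0.41, 0.51]
# One (a, b) pair per interval; the last pair covers 0.51 <= x <= 3.11.
A_PARAMS = [
    (122.8, 0.9447), (158.08, 1.0542), (170.22, 1.0932), (179.52, 1.1262),
    (217.41, 1.2644), (258.89, 1.4094), (346.75, 1.7283), (453.85, 2.11660),
]
A_MAX = 3.11  # the last interval is closed on the right: x <= 3.11


def sigma_z_params_a(x):
    """Return (a, b) for stability class "A" at distance x."""
    if x > A_MAX:
        # Assumption: treat "not defined" as an error instead of printing.
        raise ValueError(f"class A is not defined for x = {x}")
    # bisect_right counts how many bounds are <= x, which is exactly the
    # index of the first interval whose test "x < bound" succeeds.
    return A_PARAMS[bisect_right(A_BOUNDS, x)]


print(sigma_z_params_a(0.05))
print(sigma_z_params_a(0.1))
```

A value that sits exactly on a breakpoint, such as x = 0.1, lands in the interval above it, which matches the strict x < 0.1 test in the original if/elif chain.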

Related

how do I identify sequence equation Python

I am able to identify the type of sequence, but not its formula.
Here is the whole code:
def analyse_sequence_type(y:list[int]):
    if len(y) >= 5:
        res = {"linear":[],"quadratic":[],"exponential":[],"cubic":[]}
        for i in reversed(range(len(y))):
            if i-2>=0 and (y[i] + y[i-2] == 2*y[i-1]): res["linear"].append(True)
            elif i-3>=0 and (y[i] - 2*y[i-1] + y[i-2] == y[i-1] - 2*y[i-2] + y[i-3]): res["quadratic"].append(True)
        for k, v in res.items():
            if v:
                if k == "linear" and len(v)+2 == len(y): return k
                elif k == "quadratic" and len(v)+3 == len(y): return k
        return
    print(f"A relation cannot be made with just {len(y)} values.\nPlease enter a minimum of 5 values!")
    return
I can identify linear and quadratic sequences, but how do I write a function that returns the equation?

So, firstly we will need to create two functions for linear and quadratic (formulae attached below).
def linear(y):
    """
    Returns equation in format (str)
    y = mx + c
    """
    d = y[1] - y[0]  # common difference, i.e. the slope m
    c = y[0] - d     # intercept, because f(1) = y[0]
    if d == 0:       # constant sequence: there is no x term
        return f"f(x) = {c} ; f(1) = {y[0]}"
    mx = {1: "x", -1: "-x"}.get(d, f"{d}x")  # drop a coefficient of 1
    tail = f" {c:+}" if c != 0 else ""       # drop a zero intercept
    return f"f(x) = {mx}{tail} ; f(1) = {y[0]}"
We apply a similar logic for quadratic:
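def quadratic(y):
    """
    Returns equation in format (str)
    y = ax² + bx + c
    """
    a = logic_round((y[2] - 2*y[1] + y[0])/2)  # half the second difference
    b = logic_round(y[1] - y[0] - 3*a)         # from f(2) - f(1) = 3a + b
    c = logic_round(y[0]-a-b)                  # from f(1) = a + b + c
    ax2 = {1: "x²", -1: "-x²"}.get(a, f"{a}x²")                       # drop a coefficient of 1
    bx = "" if b == 0 else {1: " +x", -1: " -x"}.get(b, f" {b:+}x")  # drop a zero x term
    tail = "" if c == 0 else f" {c:+}"                                # drop a zero constant
    return f"f(x) = {ax2}{bx}{tail} ; f(1) = {y[0]}"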
If a coefficient comes out as a whole-number float such as 5.0, the equation will read 5.0x +4 (for example). To avoid that, try:
def logic_round(num):
    # return an int for whole-number values such as 5.0, otherwise a float
    if float(num).is_integer(): return int(num)
    return float(num)
The above functions assume that y is a list holding f(1), f(2), f(3), ... in order, i.e. the domain is the integers in [1, ∞).
Hope this helps :) Also give cubic a try.
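For the cubic case it may help to see that the linear and quadratic checks are both instances of one test: values of a degree-k polynomial at equally spaced x have constant k-th differences. The sketch below is not part of the answer above; difference_order is a made-up name, and it returns None when the differences never settle (an exponential sequence, for instance) or when too few values are left to tell.

```python
def difference_order(y):
    """Return the smallest k for which the k-th differences of y are constant.

    0 means a constant sequence, 1 linear, 2 quadratic, 3 cubic.
    Returns None if no level with at least two entries is constant.
    """
    diffs = list(y)
    order = 0
    while len(diffs) > 1:
        if len(set(diffs)) == 1:  # every entry at this level is equal
            return order
        # Replace the list with the differences of its neighbouring entries.
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
        order += 1
    return None


print(difference_order([3, 5, 7, 9, 11]))     # linear input
print(difference_order([1, 8, 27, 64, 125]))  # cubic input
```

With only five values a cubic is confirmed by just two equal third differences, so longer inputs give a more trustworthy answer.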

Make This Input Function Faster

I'm practicing some exam questions and I've encountered a time limit issue that I can't figure out. I think it's to do with how I'm iterating through the inputs.
It's the famous Titanic dataset, so I won't bother printing a sample of the df as I'm sure everyone is familiar with it.
The function computes the similarity between two passengers, which are provided as input. I am also mapping the Sex column to integers so that it can be compared between passengers, as you'll see below.
I was also thinking it could be how I'm indexing and locating the values for each passenger, but again I'm not sure.
The function is as follows. The time limit is 1 second, but when no_of_queries == 100 the function takes 1.091 s.
import pandas as pd
df = pd.read_csv("titanic.csv")
mappings = {'male': 0, 'female':1}
df['Sex'] = df['Sex'].map(mappings)
def function_similarity(no_of_queries):
    for num in range(int(no_of_queries)):
        x = input()
        passenger_a, passenger_b = x.split()
        passenger_a, passenger_b = int(passenger_a), int(passenger_b)
        result = 0
        if int(df[df['PassengerId'] == passenger_a]['Pclass']) == int(df[df['PassengerId'] == passenger_b]['Pclass']):
            result += 1
        if int(df[df['PassengerId'] ==passenger_a]['Sex']) == int(df[df['PassengerId'] ==passenger_b]['Sex']):
            result += 3
        if int(df[df['PassengerId'] ==passenger_a]['SibSp']) == int(df[df['PassengerId'] ==passenger_b]['SibSp']):
            result += 1
        if int(df[df['PassengerId'] == passenger_a]['Parch']) == int(df[df['PassengerId'] == passenger_b]['Parch']):
            result += 1
        result += max(0, 2 - abs(float(df[df['PassengerId'] ==passenger_a]['Age']) - float(df[df['PassengerId'] ==passenger_b]['Age'])) / 10.0)
        result += max(0, 2 - abs(float(df[df['PassengerId'] ==passenger_a]['Fare']) - float(df[df['PassengerId'] ==passenger_b]['Fare'])) / 5.0)
        print(result / 10.0)
function_similarity(input())

Look up each passenger's row by id only once per query, instead of filtering the whole DataFrame again for every column:
import pandas as pd
df = pd.read_csv("titanic.csv")
mappings = {'male': 0, 'female':1}
df['Sex'] = df['Sex'].map(mappings)
def function_similarity(no_of_queries):
    for num in range(int(no_of_queries)):
        x = input()
        passenger_a, passenger_b = x.split()
        # filter the DataFrame once per passenger and reuse the resulting row
        passenger_a, passenger_b = df[df['PassengerId'] == int(passenger_a)], df[df['PassengerId'] == int(passenger_b)]
        result = 0
        if int(passenger_a['Pclass']) == int(passenger_b['Pclass']):
            result += 1
        if int(passenger_a['Sex']) == int(passenger_b['Sex']):
            result += 3
        if int(passenger_a['SibSp']) == int(passenger_b['SibSp']):
            result += 1
        if int(passenger_a['Parch']) == int(passenger_b['Parch']):
            result += 1
        result += max(0, 2 - abs(float(passenger_a['Age']) - float(passenger_b['Age'])) / 10.0)
        result += max(0, 2 - abs(float(passenger_a['Fare']) - float(passenger_b['Fare'])) / 5.0)
        print(result / 10.0)
function_similarity(input())
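If that is still too slow, one further step worth trying is to leave pandas out of the query loop entirely: convert the table once into a plain dictionary keyed by PassengerId, so that each query costs two dictionary lookups. The sketch below is an untimed illustration, not a measured fix. The similarity function and the passengers dictionary are hypothetical; the two records are invented values standing in for what df.set_index("PassengerId").to_dict("index") would produce.

```python
def similarity(a, b):
    """Score two passenger records (plain dicts) with the weights from the question."""
    result = 0.0
    if a["Pclass"] == b["Pclass"]:
        result += 1
    if a["Sex"] == b["Sex"]:
        result += 3
    if a["SibSp"] == b["SibSp"]:
        result += 1
    if a["Parch"] == b["Parch"]:
        result += 1
    result += max(0, 2 - abs(a["Age"] - b["Age"]) / 10.0)
    result += max(0, 2 - abs(a["Fare"] - b["Fare"]) / 5.0)
    return result / 10.0


# Invented records for illustration only. With the real data this would be
# built once, before the loop: df.set_index("PassengerId").to_dict("index")
passengers = {
    1: {"Pclass": 3, "Sex": 0, "SibSp": 1, "Parch": 0, "Age": 20.0, "Fare": 10.0},
    2: {"Pclass": 1, "Sex": 1, "SibSp": 1, "Parch": 0, "Age": 30.0, "Fare": 12.0},
}

print(similarity(passengers[1], passengers[2]))
```

Inside the query loop the call would then be similarity(passengers[int(id_a)], passengers[int(id_b)]), where id_a and id_b are the two tokens read from input.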

Need to Compare DataFrames in Pandas using < and > operators for specific data in a column

I am trying to compare the following dataframes:
I have a pair of Z Scores with a specific ENST number here :
Z_SCORE_Raw
   ENST00000547849  ENST00000587894
0       -1.3099506      21.56600492
I have to compare each of these numbers to their corresponding ENST code in this dataFrame:
df_new
ENST00000547849High_Avg   ENST00000587894 High_Avg   ENST00000547849 Low_Avg   ENST00000587894 Low_Avg
0.0026421609368421000     -0.0457525087368421        -0.040015074588235300     -0.04140853107142860
I am given the following formula:
if Z_Score[given ENSTCode] > Avg_High[ENSTCode]:
    return 1
elif Z_Score[given ENSTCode] < Avg_Low[ENSTCode]:
    return 0
elif Avg_High > Z_Score > Avg_Low:
    return 0.5
I currently have the following code to gather the correct ENST code and compare that ZScore to the corresponding High and Low average of each ENST Code:
for x in Z_score_raw:
    if Z_score_raw[x].any() > df_new[x + ' High_Avg'].any():
        print('1')
    elif Z_score_raw[x].any() < df_new[x + ' Low_Avg'].any():
        print('0')
    elif df_new[x + ' High_Avg'].any() > Z_score_raw[x].any() > df_new[x + ' Low_Avg']:
        print('0.5')
The expected output would be:
ENST00000547849: 0 (as -1.3099506 < -0.0400150745882353)
ENST00000587894: 1 (as 21.56600492 > -0.0457525087368421)
My current code prints nothing and falls through all of the checks. How can I get this to work properly?

The problem is that you are iterating correctly, but .any() returns a single boolean, and you are then comparing those booleans with > and <.
What is True > False, or True < True?
Those comparisons say nothing about the underlying values.
If you only have one value per column, just use [0] to select the value at index 0.
Also, make sure your column naming pattern is consistent (e.g. no stray spaces).
Your Example:
ENST00000547849High_Avg ENST00000587894 High_Avg
My Correction (no Space):
ENST00000547849High_Avg ENST00000587894High_Avg
This will provide your desired result:
import pandas as pd
d = {"ENST00000547849": [-1.3099506], "ENST00000587894": [21.56600492]}
d_2 = {"ENST00000547849High_Avg": [0.0026421609368421000], "ENST00000587894High_Avg" : [-0.0457525087368421], "ENST00000547849Low_Avg" : [-0.040015074588235300], "ENST00000587894Low_Avg": [-0.04140853107142860]}
Z_score_raw = pd.DataFrame(data = d)
df_new = pd.DataFrame(data = d_2)
for x in Z_score_raw:
    if Z_score_raw[x][0] > df_new[x + 'High_Avg'][0]:
        print(f"{x}: 1")
    elif Z_score_raw[x][0] < df_new[x + 'Low_Avg'][0]:
        print(f"{x}: 0")
    elif df_new[x + 'High_Avg'][0] > Z_score_raw[x][0] > df_new[x + 'Low_Avg'][0]:
        print(f"{x}: 0.5")
Output:
ENST00000547849: 0
ENST00000587894: 1
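The three-way rule itself can be pulled out into a small helper that works on plain numbers, which makes it easy to check separately from the DataFrame handling. This is only a sketch: classify is a hypothetical name, and scoring a value that exactly equals one of the averages as 0.5 is an assumption, since the formula in the question does not cover that case.

```python
def classify(z, low_avg, high_avg):
    """Return 1 above the high average, 0 below the low average, else 0.5."""
    if z > high_avg:
        return 1
    if z < low_avg:
        return 0
    # Assumption: anything else, including equality with a bound, counts as 0.5.
    return 0.5


# The two columns from the question, as (z score, Low_Avg, High_Avg).
samples = {
    "ENST00000547849": (-1.3099506, -0.040015074588235300, 0.0026421609368421000),
    "ENST00000587894": (21.56600492, -0.04140853107142860, -0.0457525087368421),
}
for name, (z, low, high) in samples.items():
    print(f"{name}: {classify(z, low, high)}")
```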

Creating a dictionary with keys being two separate inputs

I would like to be able to create a dictionary that combines two inputs as the key and stores the outputs of the if and elif branches as the values.
My code so far looks like:
dict = {}
r = int(input("what row: "))
c = int(input("what column: "))
a = 0.0
b = 0.0
def weightOn(r,c):
    global a
    global b
    if r < 0:
        print('Not valid')
    elif r == 0 and c == 0:
        return a
    elif r > 0 and c == 0:
        a += 200 / (2 ** r)
        return weightOn(r - 1, 0)
    elif r > 0 and c == r:
        a += 200 / (2 ** r)
        return weightOn(r - 1, 0)
weightOn(r,c)
if r > c > 0:
    print(b)
else:
    print(a)
I would like variables r & c to be the keys, so if I input r as 2 and c as 1 it would save the value as 100. So hopefully my dictionary could look something like: dict = {2.1 :100} and so on.

You can use a tuple as key:
d = {(r,c): weightOn(r,c)}
Depending on the use-case, hierarchical dictionaries might also be useful:
d = {r : {c: weightOn(r,c)}}

To answer the question you asked (I don't see you actually using the dict anywhere in your code snippet): to use several values together as one dictionary key in Python, you have to combine them into a single hashable object. You can use a tuple, which is written with () instead of [] and behaves like a list except that it is immutable. A list itself will not work as a key, because lists are mutable and therefore unhashable. You can also convert the values into a string. Something like:
key = str(r) + str(c)
Presumably if you were to do something like:
key = str(r) + " " + str(c)
dict[key] = weightOn(r, c)
You would get dict = {"2 1" :100}, so long as your function weightOn works as you intended it to.
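To make the difference between the key styles concrete, here is a small sketch that fills a dictionary keyed by (r, c) tuples. weight_on is a hypothetical stand-in that returns a made-up number, not the asker's weightOn, and the dictionary is called weights so that the built-in name dict is not shadowed.

```python
def weight_on(r, c):
    # Hypothetical stand-in for the real calculation: returns a made-up value.
    return 100.0 * r + c


weights = {}
for r in range(3):
    for c in range(r + 1):
        weights[(r, c)] = weight_on(r, c)  # the tuple (r, c) is the key

print(weights[(2, 1)])

# A list cannot be used as a key because it is not hashable.
try:
    weights[[2, 1]] = 0.0
except TypeError as exc:
    print("list key rejected:", exc)
```

Tuple keys also avoid an ambiguity of the plain str(r) + str(c) form, where r = 1, c = 12 and r = 11, c = 2 both produce "112"; the version with a separator between the two numbers does not have that problem.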

How do I get the rows to columns to be the row and vice versa [duplicate]

This question already has answers here:
Transpose list of lists
(14 answers)
Closed 5 years ago.
Here is some code I wrote
import math
def binomial_coefficient(x,y):
    if y == x:
        div = 1
    elif y == 1:
        div = x
    elif y > x:
        div = 0
    else:
        a = math.factorial(x)
        b = math.factorial(y)
        c = math.factorial(x-y)
        div = a // (b * c)
    return(div)
def problem_9():
    for k in range(6):
        empty = '\t'
        for zed in range(1,6):
            X_sub = (10*zed,(1/5)*zed)
            n = X_sub[0]
            P = X_sub[1]
            formula = binomial_coefficient(n,k)*(P**k)*(1-P)**(n-k)
            empty = empty + str(formula) + '\t'
        print(empty)
problem_9()
I have the code giving me the correct mathematical values but I need the first column to switch places with the first row. I would like the same thing to happen for each subsequent iteration of the loops. Can anyone help?

Just swap the two loops, so that zed drives the rows and k the columns:
for zed in range(1,6):
    empty = '\t'
    for k in range(6):
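An alternative to swapping the loops is to build the table as a list of rows first and transpose it afterwards with zip(*rows). The sketch below assumes Python 3.8 or later for math.comb; binomial_pmf, rows and transposed are names made up for this example, while n = 10*zed and P = zed/5 follow the question.

```python
import math


def binomial_pmf(n, k, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Original layout: one row per k, one column per zed.
rows = [
    [binomial_pmf(10 * zed, k, zed / 5) for zed in range(1, 6)]
    for k in range(6)
]

# zip(*rows) pairs up the i-th entries of every row, i.e. it yields the columns.
transposed = [list(column) for column in zip(*rows)]

for line in transposed:
    print("\t".join(str(value) for value in line))
```

Keeping the values in lists rather than printing inside the loops means both layouts are available from a single computation.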
