Conditional operations in dataframe (if else)

I have a DataFrame column called Install_Date. I want to assign values to another column, Asset_Age, under two conditions: if the value in Install_Date is null, then age = current year - plant construction year; if the value is not null, then age = current year - INPUT_Asset["Install_Date"].
This is the code I have. The first condition works fine, but the second condition still gives 0 as values:
import numpy as np
import pandas as pd

Plant_Construct_Year = 1975
this_year = 2020
for i in INPUT_Asset["Install_Date"]:
    # each branch assigns the whole column, so the last value of i decides the result
    if i != 0.0:
        INPUT_Asset["Asset_Age"] = this_year - INPUT_Asset["Install_Date"]
    else:
        INPUT_Asset["Asset_Age"] = this_year - Plant_Construct_Year

INPUT_Asset["Install_Date"] = pd.to_numeric(INPUT_Asset["Install_Date"], errors='coerce').fillna(0)
INPUT_Asset["Asset_Age"] = np.where(INPUT_Asset["Install_Date"] == 0.0, this_year - Plant_Construct_Year, INPUT_Asset["Asset_Age"])
INPUT_Asset["Asset_Age"] = np.where(INPUT_Asset["Install_Date"] != 0.0, this_year - INPUT_Asset["Install_Date"], INPUT_Asset["Asset_Age"])
print(INPUT_Asset["Asset_Age"])
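A minimal vectorized sketch, assuming Install_Date holds install years (numbers or numeric strings) and that missing or unparseable entries should count as null. The sample data is made up for illustration. np.where evaluates the condition for every row at once, so no Python loop is needed, and testing isna() directly removes the need for the fillna(0) placeholder.

```python
import numpy as np
import pandas as pd

Plant_Construct_Year = 1975
this_year = 2020

# Hypothetical sample data: install years, with some missing or unusable entries
INPUT_Asset = pd.DataFrame({"Install_Date": [1990, None, "2005", "unknown"]})

# Convert to numbers; anything missing or unparseable becomes NaN
install = pd.to_numeric(INPUT_Asset["Install_Date"], errors="coerce")

# Row-wise if/else: null install date -> plant age, otherwise asset age
INPUT_Asset["Asset_Age"] = np.where(
    install.isna(),
    this_year - Plant_Construct_Year,
    this_year - install,
)
print(INPUT_Asset["Asset_Age"].tolist())  # [30.0, 45.0, 15.0, 45.0]
```

An equivalent one-liner is this_year - install.fillna(Plant_Construct_Year), since a null install date then falls back to the plant construction year.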

Related

Assign Variables Based on Dataframe Values

I have a dataframe in which I am trying to define variables according to the values of particular cells in the dataframe in order to populate a (currently) empty final column based on the relationship between the price targets and current prices of the companies. Currently, the dataframe I’m working with looks like this, with the index being the companies:
Company    Current Price  High     Median   Low      Suggest
Company 1  $296.12        $410.00  $398.00  $365.43
Company 2  $143.18        $212.05  $200.34  $155.12
Company 3  $184.23        $214.09  $192.88  $123.63
How would I assign variables (for example: target_high(company) = the value in the "ticker, target_high" cell)? I don't think I can hard-code it, because the list of companies will be constantly changing. So far I've tried the following, but it doesn't seem to work:
for ticker in Company_List:
    target_high(company) = str(Target_Frame.loc[ticker, "High"])
    target_mid(company) = str(Target_Frame.loc[ticker, "Median"])
    target_low(company) = str(Target_Frame.loc[ticker, "Low"])
    current_price = str(Target_Frame.loc[ticker, "Price"])
    if current_price(ticker) > target_high(ticker):
        Target_Frame.loc[[ticker], ['Suggest']] = "Sell"
    elif current_price(ticker) < target_low(ticker):
        Target_Frame.loc[[ticker], ['Suggest']] = "Buy"
    elif target_mid(ticker) < current_price(ticker) < target_high(ticker):
        Target_Frame.loc[[ticker], ['Suggest']] = "Hold"
    elif target_low(ticker) < current_price(ticker) < target_mid(ticker):
        Target_Frame.loc[[ticker], ['Suggest']] = "Consider"
Thank you!

You could use np.where (or .map) with the conditions inside, rather than creating separate variables. So maybe something like this:
import numpy as np

# assumes the price columns hold numbers, not "$..." strings
Target_Frame["Suggest"] = np.where(
    Target_Frame["Price"] > Target_Frame["High"], "Sell",  # if above high then sell
    np.where(Target_Frame["Price"] < Target_Frame["Low"], "Buy",  # if below low then buy
        np.where(Target_Frame["Price"].between(
            Target_Frame["Median"], Target_Frame["High"]), "Hold",  # if between median and high then hold
            "Consider")))  # else consider
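A runnable sketch of the same idea using np.select, which takes a list of conditions and a matching list of choices and, for each row, picks the first condition that holds (falling back to default). It is swapped in for the nested np.where purely for readability. It assumes the prices may arrive as "$123.45" strings, so they are cleaned and converted to floats first, and it names the current-price column "Price" as the code above does (the question's table calls it "Current Price").

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's table; "Price" matches the code above
Target_Frame = pd.DataFrame(
    {
        "Price": ["$296.12", "$143.18", "$184.23"],
        "High": ["$410.00", "$212.05", "$214.09"],
        "Median": ["$398.00", "$200.34", "$192.88"],
        "Low": ["$365.43", "$155.12", "$123.63"],
    },
    index=["Company 1", "Company 2", "Company 3"],
)

# Assumption: values are "$..." strings; strip "$" and "," then convert to float
cols = ["Price", "High", "Median", "Low"]
Target_Frame[cols] = Target_Frame[cols].replace(r"[$,]", "", regex=True).astype(float)

price = Target_Frame["Price"]
conditions = [
    price > Target_Frame["High"],                                 # above high
    price < Target_Frame["Low"],                                  # below low
    price.between(Target_Frame["Median"], Target_Frame["High"]),  # median..high
]
choices = ["Sell", "Buy", "Hold"]

# First matching condition wins for each row; anything else is "Consider"
Target_Frame["Suggest"] = np.select(conditions, choices, default="Consider")
print(Target_Frame["Suggest"].tolist())  # ['Buy', 'Buy', 'Consider']
```

With the question's three rows, Companies 1 and 2 are below their Low target, and Company 3 lies between Low and Median, so it falls through to "Consider".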

Match 3 column elements between 2 different dataframes

I am trying to solve a problem where I have two dataframes, df1 and df2. Both dataframes have the same columns. I want to check whether df1['column1'] == df2['column1'] and df1['column2'] == df2['column2'] and df1['column3'] == df2['column3'], and if this is true, get the index in both dataframes where the condition matched. I tried the following, but it takes a long time because my dataframe has around 250,000 rows. Can anyone suggest an efficient way to do this?
Tried solution:
from datetime import datetime

MS_counter = 0
matched_ws_index = []
start = datetime.now()
for MS_id in Mastersheet_df["Index"]:
    WS_counter = 0
    for WS_id in Weekly_sheet_df["Index"]:
        if (Weekly_sheet_df.loc[WS_counter, "Trial ID"] == Mastersheet_df.loc[MS_counter, "Trial ID"]) and (Mastersheet_df.loc[MS_counter, "Biomarker Type"] == Weekly_sheet_df.loc[WS_counter, "Biomarker Type"]) and (WS_id == MS_id):  # match trial id
            print("Trial id, index and biomarker type are matched")
            print(WS_counter)
            print(MS_counter)
            matched_ws_index.append(WS_counter)
        WS_counter += 1
    MS_counter += 1
end = datetime.now()
print("The time of execution of above program is :",
      str(end - start)[5:])
Expected output: if all three conditions are true, it should give the index positions in both dataframes, like this:
Matched df1 index is = 170
Matched df2 index is = 658
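An alternative that avoids the nested loop entirely is an inner merge on the three key columns. This sketch uses the column names from the question's code ("Index", "Trial ID", "Biomarker Type") with small made-up frames; reset_index turns each frame's row labels into ordinary columns (hypothetically named ms_row and ws_row) so the original index positions survive the join.

```python
import pandas as pd

# Hypothetical small frames using the column names from the question's code
Mastersheet_df = pd.DataFrame({
    "Index": [1, 2, 3],
    "Trial ID": ["T1", "T2", "T3"],
    "Biomarker Type": ["A", "B", "C"],
})
Weekly_sheet_df = pd.DataFrame({
    "Index": [3, 1, 9],
    "Trial ID": ["T3", "T1", "T9"],
    "Biomarker Type": ["C", "A", "Z"],
})

keys = ["Index", "Trial ID", "Biomarker Type"]

# Keep each frame's row labels (positions, for a default RangeIndex) as columns
ms = Mastersheet_df[keys].reset_index().rename(columns={"index": "ms_row"})
ws = Weekly_sheet_df[keys].reset_index().rename(columns={"index": "ws_row"})

# One inner join on all three columns finds every matching pair
matches = ms.merge(ws, on=keys, how="inner")
for ms_row, ws_row in zip(matches["ms_row"], matches["ws_row"]):
    print("Matched df1 index is =", ms_row)
    print("Matched df2 index is =", ws_row)
```

A merge is a hash-based join, so its cost grows roughly with the number of rows in the two frames (plus the number of matches) rather than with their product, which is what makes the nested loop slow at 250,000 rows.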

CSV: How to find name of the value from the list

I have code that reads a CSV file with 3 columns: Zone, Number, and ARPU, and I am trying to write a recommendation system that finds the best match for each ARPU value from the list provided in the code (creating the column "Suggested plan"). It also finds the next greater value (column "Potential updated plan") and the next lower value (column "Potential downgrade plan"):
import pandas as pd

tp_usp15 = 1500
tp_usp23 = 2300
tp_usp27 = 2700
list_usp = [tp_usp15, tp_usp23, tp_usp27]
tp_bsnspls_s = 600
tp_bsnspls_steel = 1300
tp_bsnspls_chrome = 1800
list_bsnspls = [tp_bsnspls_s, tp_bsnspls_steel, tp_bsnspls_chrome]
tp_bsnsrshn10 = 1000
tp_bsnsrshn15 = 1500
tp_bsnsrshn20 = 2000
list_bsnsrshn = [tp_bsnsrshn10, tp_bsnsrshn15, tp_bsnsrshn20]
# Common list, sorted so that list neighbours are the next lower/greater plans
common_list = sorted(list_usp + list_bsnspls + list_bsnsrshn)

def get_plans(p):
    best = min(common_list, key=lambda x: abs(x - p['ARPU']))  # plan closest to ARPU
    best_index = common_list.index(best)  # get location of best in common_list
    if best_index < len(common_list) - 1:
        next_greater = common_list[best_index + 1]
    else:
        next_greater = best  # already highest
    if best_index > 0:
        next_lower = common_list[best_index - 1]
    else:
        next_lower = best  # already lowest
    return best, next_greater, next_lower

df = pd.read_csv('root/test.csv')
df[['Suggested plan', 'Potential updated plan', 'Potential downgraded plan']] = df.apply(get_plans, axis=1, result_type="expand")
df.to_csv('Recommendation System.csv')
It creates 3 additional columns and does the corresponding task (best match or closest value, next greater value, and next smaller value). The code works perfectly, but as you can see, each numeric value has its own name (the variable it is assigned to, such as tp_usp15).
How can I change the code to create additional columns with the name next to the new columns with numeric values?
For example, right now the code produces:
Zone, Number, ARPU, Suggested plan, Potential Updated Plan, and Potential downgrade plan
!BUT! I need to create:
Zone, Number, ARPU, Suggested plan (numeric), Suggested plan (name), Potential Updated Plan (numeric), Potential Updated Plan (name), Potential downgrade plan (numeric), Potential downgrade plan (name)
The columns with (name) should show the name corresponding to the value used in the (numeric) columns. Thanks in advance, guys!
Photo examples: the starting CSV file; the output after executing the code; and the desired output, with the additional columns holding the corresponding variable names shown in yellow.
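One way to get the names, sketched under assumptions: store the plans in a dict of name to price (the names are the question's variable names) and sort (price, name) pairs instead of bare prices, so each neighbour lookup returns both. This also sidesteps an ambiguity in any reverse lookup from price to name, since tp_usp15 and tp_bsnsrshn15 are both 1500. The two sample rows stand in for root/test.csv and are made up.

```python
import pandas as pd

# Plan names mapped to prices (names taken from the question's variables)
plans = {
    "tp_usp15": 1500, "tp_usp23": 2300, "tp_usp27": 2700,
    "tp_bsnspls_s": 600, "tp_bsnspls_steel": 1300, "tp_bsnspls_chrome": 1800,
    "tp_bsnsrshn10": 1000, "tp_bsnsrshn15": 1500, "tp_bsnsrshn20": 2000,
}
# (price, name) pairs sorted by price, so neighbours are next lower/higher plans
common_list = sorted((price, name) for name, price in plans.items())

def get_plans(p):
    # position of the plan whose price is closest to this row's ARPU
    best_index = min(range(len(common_list)),
                     key=lambda i: abs(common_list[i][0] - p["ARPU"]))
    best = common_list[best_index]
    greater = common_list[min(best_index + 1, len(common_list) - 1)]  # stays at highest
    lower = common_list[max(best_index - 1, 0)]                       # stays at lowest
    return best[0], best[1], greater[0], greater[1], lower[0], lower[1]

# Hypothetical sample rows standing in for root/test.csv
df = pd.DataFrame({"Zone": ["A", "B"], "Number": [1, 2], "ARPU": [1250, 2650]})
new_cols = [
    "Suggested plan (numeric)", "Suggested plan (name)",
    "Potential updated plan (numeric)", "Potential updated plan (name)",
    "Potential downgrade plan (numeric)", "Potential downgrade plan (name)",
]
df[new_cols] = df.apply(get_plans, axis=1, result_type="expand")
print(df)
```

Here ARPU 1250 maps to tp_bsnspls_steel (1300), with tp_bsnsrshn15 (1500) as the upgrade and tp_bsnsrshn10 (1000) as the downgrade; ARPU 2650 maps to tp_usp27 (2700), which is already the highest plan, so it is also its own upgrade.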

pandas get the min/max value of a row in a dataframe of only those rows that contain a certain string in another column

I feel really stupid now, this should be easy.
I got good help here how-to-keep-the-index-of-my-pandas-dataframe-after-normalazation-json
I need to get the min/max value in the column 'price' only where the value in the column 'type' is buy/sell. Ultimately I want to get back the 'id' also for that specific order.
So first I need the price value, and second I need to get back the corresponding value of 'id'.
You can find the dataframe that I'm working with in the link.
What I can do is find the min/max value of the whole column 'price' like so:
x = df['price'].max() # = max price
and I can sort out all the "buy" type like so:
d = df[['type', 'price']].value_counts(ascending=True).loc['buy']
but I still can't do both at the same time.

You have to use the .loc method on the dataframe in order to filter by type.
import pandas as pd
data = {"type":["buy","other","sell","buy"], "price":[15,222,11,25]}
df = pd.DataFrame(data)
buy_and_sell = df.loc[df['type'].isin(["sell","buy"])]
min_value = buy_and_sell['price'].min()
max_value = buy_and_sell['price'].max()
min_rows = buy_and_sell.loc[buy_and_sell['price']==min_value]
max_rows = buy_and_sell.loc[buy_and_sell['price']==max_value]
min_rows and max_rows can contain multiple rows, because it is possible that the same min (or max) price is repeated.
To extract the index, just use .index.
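If a single row per type is enough, groupby with idxmin/idxmax returns the row label of the lowest and highest price within each type, and .loc then pulls the price and id from that same row. This is a sketch on made-up data, with 'id' standing in for the question's order-id column. Unlike the == comparison above, idxmin/idxmax keep only the first row when a price is tied.

```python
import pandas as pd

# Hypothetical orders; 'id' stands in for the order-id column
df = pd.DataFrame({
    "id": ["a1", "b2", "c3", "d4", "e5"],
    "type": ["buy", "other", "sell", "buy", "sell"],
    "price": [15, 222, 11, 25, 30],
})

buy_and_sell = df.loc[df["type"].isin(["buy", "sell"])]
grouped = buy_and_sell.groupby("type")["price"]

# Row labels of the extreme prices per type, then the full rows (price and id together)
lowest = df.loc[grouped.idxmin(), ["type", "price", "id"]]
highest = df.loc[grouped.idxmax(), ["type", "price", "id"]]
print(lowest)
print(highest)
```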

hbid = df.loc[df.type == 'buy'].min()[['price', 'txid']]
gives me the lowest value of price and the lowest value of txid, not the id that belongs to the order with the lowest price. Any help or tips would be greatly appreciated!
0 OMG4EA-Z2WUP-AQJ2XU None ... buy 0.00200000 XBTEUR # limit 14600.0
1 OBTJMX-WTQSU-DNEOES None ... buy 0.00100000 XBTEUR # limit 14700.0
2 OAULXQ-3B5WJ-LMLSUC None ... buy 0.00100000 XBTEUR # limit 14800.0
[3 rows x 23 columns]
highest buy order =
14800.0
here the id and price . . txid =
price 14600.0
txid OAULXQ-3B5WJ-LMLSUC

I'm still not sure how your isin line works; buy_and_sell is not specified ;)
How I did it:
I first found the highest price, then found the 'txid' for that price, then I had to remove the index from the returned Series. Finally, I had to remove a whitespace before my string; no idea how it got there.
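import subprocess

def get_highest_sell_txid():
    # highest price among the sell orders (df is the orders DataFrame)
    hs = df.loc[df.type == 'sell', 'price'].max()
    # sell order(s) with that price
    hsid = df.loc[(df.type == 'sell') & (df.price == hs), :]
    xd = hsid['txid']
    # drop the index when turning the Series into text
    return xd.to_string(index=False)

xd = get_highest_sell_txid()
sd = xd.strip()  # to_string() can pad the value with whitespace
# pass the command as a list of arguments; on POSIX a single string would need shell=True
cancel_order = ['python', '-m', 'krakenapi', 'CancelOrder', 'txid=' + sd]
subprocess.run(cancel_order)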
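A shorter sketch of the same helper, assuming df has 'type', 'price' and 'txid' columns as in the output above: idxmax returns the label of the highest-priced sell row, and df.at reads that single txid cell as a plain string, so neither to_string(index=False) nor strip() is needed. The txids are placeholders, and the command list is only built here, not run.

```python
import pandas as pd

def get_highest_sell_txid(df):
    """Return the txid of the sell order with the highest price (assumed columns: type, price, txid)."""
    sells = df[df["type"] == "sell"]
    # label of the highest sell price, then read one cell as a plain value
    return df.at[sells["price"].idxmax(), "txid"]

# Hypothetical orders with placeholder txids
orders = pd.DataFrame({
    "txid": ["OAAAAA", "OBBBBB", "OCCCCC"],
    "type": ["sell", "buy", "sell"],
    "price": [14600.0, 14900.0, 14800.0],
})
txid = get_highest_sell_txid(orders)
# Argument list for subprocess.run(cmd) in real use
cmd = ["python", "-m", "krakenapi", "CancelOrder", "txid=" + txid]
print(cmd)
```

The buy order at 14900 is ignored because only sell rows are searched, so the result is the 14800 sell order.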

How to create a function for a dataframe with pandas

I have this data frame of client purchases, and I would like to create a function that gives me the total purchases for a given month and year.
I have a dataframe (df) with lots of columns, but I'm going to use only 3 ("year", "month", "value").
This is what I'm trying, but it's not working:
def total_purchases():
    y = input('Which year do you want to consult?')
    m = int(input('Which month do you want to consult?')
    sum = []
    if df[df['year']== y] & df[df['month']== m]:
        for i in df:
            sum = sum + df[df['value']]
    return sum

You're close; you need to ditch the if statement and the for loop.
Additionally, when combining multiple logical conditions in pandas, you need to use parentheses to separate the conditions.
def total_purchases(df):
    y = input('Which year do you want to consult? ')
    m = int(input('Which month do you want to consult? '))
    return df[(df['year'].eq(y)) & (df['month'].eq(m))]['value'].sum()
Setup
df_p = pd.DataFrame({'year' : ['2011','2011','2012','2013'],
                     'month' : [1,2,1,2],
                     'value' : [200,500,700,900]})
Test
total_purchases(df_p)
Which year do you want to consult? 2011
Which month do you want to consult? 2
500
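A variant that takes the year and month as parameters instead of calling input(), which makes the function easier to reuse and test. It is a sketch built on the setup frame above; note that year is stored as strings there, which is why the answer converts only the month to int (input() already returns a string).

```python
import pandas as pd

def total_purchases(df, year, month):
    # Each condition in parentheses, combined element-wise with &
    mask = (df["year"] == year) & (df["month"] == month)
    return df.loc[mask, "value"].sum()

df_p = pd.DataFrame({"year": ["2011", "2011", "2012", "2013"],
                     "month": [1, 2, 1, 2],
                     "value": [200, 500, 700, 900]})
print(total_purchases(df_p, "2011", 2))  # 500
```

If no row matches, the sum of the empty selection is 0.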
