I need to find top 10 customers for each city in terms of their repayment amount by different products and by different time periods i.e. year or month. The user should be able to specify the product (Gold/Silver/Platinum) and time period (yearly or monthly) and the function should automatically take these inputs while identifying the top 10 customers.
So what I did:
create another dataset:
Cust_table_repayment=customer_repayment[['Customer','Age','City','Product','Limit','Company','Segment','Month','Amount']]
converted the month column to pd.to_datetime
Cust_table_repayment['Month']=pd.to_datetime(Cust_table_repayment['Month'])
created another variable in new dataset : monthly,yearly
Cust_table_repayment['monthly']=Cust_table_repayment['Month'].apply(lambda x:x.month)
Cust_table_repayment['yearly']=Cust_table_repayment['Month'].apply(lambda x:x.year)
Then created the function and this is the part where I'm stuck and facing problem:
def top10Customers(prod_cat,time_period):
    return Cust_table_repayment.loc[(Cust_table_repayment['Product']==prod_cat)&((Cust_table_repayment.monthly==time_period)|(Cust_table_repayment.yearly==time_period))].groupby(['Customer','Product','City','Month']).Amount.sum().reset_index().sort_values('Amount',ascending=False).head(10)
then I declared the input:
prod_cat=str(input("Please Enter Product either in Gold/Silver/Platinum: "))
time_period=input("Please Enter Time Period and time period should be in yearly/monthly: ")
then I stored that function in new dataset
Top_10_customer=top10Customers(prod_cat,time_period)
and called that dataset but getting no output
Top_10_customer
My expected output: when I search yearly or monthly then it should display either monthly or yearly like this:
Please help!
Please check the data type of the input and the data type of each column in the DataFrame.
The columns "monthly" and "yearly" have data type "int".
The value returned by input() has data type "str".
So the comparison in the function top10Customers never matches, and the result is empty.
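The mismatch can be reproduced on a tiny Series. This is only a sketch: the `yearly` values below are made up and stand in for the real column.

```python
import pandas as pd

# Hypothetical sample standing in for the "yearly" column (int dtype)
yearly = pd.Series([2004, 2005, 2006])

user_input = "2005"  # input() always returns a str

# Comparing an int column with a str matches nothing
str_matches = int((yearly == user_input).sum())

# Converting the input to int first makes the comparison meaningful
int_matches = int((yearly == int(user_input)).sum())

print(str_matches, int_matches)
```

Converting the input once, right after reading it, keeps the filter inside the function unchanged.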
Cust_table_repayment = customer_repayment[['Customer','Age','City'
,'Product','Limit','Company'
,'Segment','Month','Amount']].copy()
# check the date format, if it is "YYYY-MM-DD", put yearfirst=True
Cust_table_repayment['Month'] = pd.to_datetime(Cust_table_repayment['Month'])
# use Cust_table_repayment["monthly"].dtypes to check the type, which is "int"
Cust_table_repayment['monthly'] = Cust_table_repayment['Month'].apply(lambda x : x.month)
# use Cust_table_repayment["yearly"].dtypes to check the type, which is "int"
Cust_table_repayment['yearly'] = Cust_table_repayment['Month'].apply(lambda x:x.year)
#### Solution
# if you want to convert the monthly and yearly columns to string
Cust_table_repayment["monthly"] = Cust_table_repayment["Month"].dt.month.astype(str)
Cust_table_repayment["yearly"] = Cust_table_repayment["Month"].dt.year.astype(str)
# else, convert input to appropriate type
time_period = int(input("Please Enter Time Period and time period should be in yearly/monthly: "))
# and do a little preprocessing for prod_cat
prod_cat = str(input("Please Enter Product either in Gold/Silver/Platinum:")).lower().strip()
def top10Customers(prod_cat,time_period):
    return Cust_table_repayment.loc[(Cust_table_repayment['Product'].str.lower()==prod_cat)
                                    &(
                                        (Cust_table_repayment.monthly==time_period)
                                        |(Cust_table_repayment.yearly==time_period)
                                    )].groupby(['Customer', 'Product', 'City', 'Month']).Amount.sum().reset_index().sort_values('Amount', ascending=False).head(10)
# another way
def top10Customers2(prod_cat, time_period):
    selected_df = Cust_table_repayment.loc[(Cust_table_repayment['Product'].str.lower()==prod_cat)
                                           & ((Cust_table_repayment.monthly==time_period)
                                              |(Cust_table_repayment.yearly==time_period)
                                           )].copy()
    selected_df = selected_df.groupby(['Customer', 'Product', 'City', 'Month']).Amount.sum().reset_index()
    return selected_df.nlargest(10, 'Amount')
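The question also asks for the top customers for each city, with the user choosing "yearly" or "monthly". One possible sketch of that reading is below. The function name `top_n_per_city`, its parameters, the helper columns `Year` and `MonthNo`, and the sample rows are all hypothetical; it assumes `Month` is already a datetime column.

```python
import pandas as pd

def top_n_per_city(df, prod_cat, period, n=10):
    """Top n customers per city for one product, grouped per year or per month."""
    data = df[df["Product"].str.lower() == prod_cat.lower().strip()].copy()
    data["Year"] = data["Month"].dt.year
    keys = ["City", "Year"]
    if period == "monthly":
        data["MonthNo"] = data["Month"].dt.month
        keys.append("MonthNo")
    elif period != "yearly":
        raise ValueError("period must be 'yearly' or 'monthly'")
    # Total repayment per customer within each city and period
    totals = data.groupby(keys + ["Customer"], as_index=False)["Amount"].sum()
    # Largest amounts first inside each group, then keep the first n rows per group
    totals = totals.sort_values(keys + ["Amount"],
                                ascending=[True] * len(keys) + [False])
    return totals.groupby(keys).head(n).reset_index(drop=True)

# Made-up rows, for illustration only
sample = pd.DataFrame({
    "Customer": ["A1", "A2", "A1", "A3"],
    "City": ["X", "X", "X", "Y"],
    "Product": ["Gold", "Gold", "Gold", "Silver"],
    "Month": pd.to_datetime(["2004-01-12", "2004-01-03", "2004-02-15", "2004-01-05"]),
    "Amount": [100.0, 300.0, 250.0, 50.0],
})
yearly_top = top_n_per_city(sample, "gold", "yearly", n=1)
print(yearly_top)
```

Here the time period is a grouping choice rather than a value to match, so no int/str conversion of the period is needed.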
I am learning python and would like to practice some basic financial analysis. How can I sum the values of the PEratio variables that come from a list built from input?
import yfinance as yf
#user input
list = input("Enter ticker(s): ")
#building a list
ticker_list = list.split(", ")
all_symbols = " ".join(list)
all_symbols = " ".join(ticker_list)
tickers = yf.Tickers(all_symbols)
#calling data
for ticker in ticker_list:
    price = tickers.tickers[ticker].info["currentPrice"]
    market_cap = tickers.tickers[ticker].info["marketCap"]
    PEratio = tickers.tickers[ticker].info["trailingPE"]
    FWPEratio = tickers.tickers[ticker].info["forwardPE"]
    print(ticker,"\nMarket cap:", market_cap,"\nShare Price:", price, "\nTrailingPE:", PEratio, "\nForward PE:", FWPEratio)
    #analysis: I would like to sum all the values for the PEratios and divide them by the list size to compute the average
    print(sum(PEratio["trailingPE"])/float(len(ticker_list))) #This isnt correct but is my thought process
EDIT: To clarify, the error I am receiving is the final line of code:
Traceback (most recent call last):
File "C:\Users\Sean\OneDrive\Programming\yfinance\multiples\test.py", line 24, in <module>
print(sum(PEratio["trailingPE"])/float(len(ticker_list))) #This isnt correct but is my thought process
TypeError: 'float' object is not subscriptable
Additionally, it tries to print the average underneath each ticker individually, and not as a sum but as the individual PE ratio divided by the list size
The error occurs because PEratio is already a float, and a float cannot be subscripted.
You apply ["trailingPE"] twice: first when you set PEratio, and then a second time inside the sum() call. Removing the second subscript gets rid of this particular error:
print(PEratio/float(len(ticker_list)))
Note that PEratio only holds the ratio for the current ticker, not the full list, and sum() needs an iterable, so sum(PEratio) would raise a different TypeError.
However, you want to sum() the PEratios for the entire group of stock tickers, which this statement does not do. To average all of the PEratios, collect them in a list inside the loop and compute the average after it, something like this:
PEratios = [] # to store the PEratios for each company
for ticker in ticker_list:
    price = tickers.tickers[ticker].info["currentPrice"]
    market_cap = tickers.tickers[ticker].info["marketCap"]
    PEratio = tickers.tickers[ticker].info["trailingPE"]
    PEratios.append(PEratio) # adding the PEratios to the list
    FWPEratio = tickers.tickers[ticker].info["forwardPE"]
    print(ticker,"\nMarket cap:", market_cap,"\nShare Price:", price, "\nTrailingPE:", PEratio, "\nForward PE:", FWPEratio)
print(sum(PEratios)/float(len(ticker_list))) # using the same formula with the list object `PEratios` just outside of the for loop
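If a ticker's info dict happens to lack the "trailingPE" key, indexing it raises a KeyError. The sketch below shows one way to skip such entries before averaging. The `infos` dict and its values are made up and stand in for `tickers.tickers[ticker].info`, so nothing here calls yfinance.

```python
from statistics import mean

# Hypothetical stand-in for tickers.tickers[ticker].info; values are made up
infos = {
    "AAA": {"trailingPE": 20.0, "forwardPE": 18.0},
    "BBB": {"trailingPE": 30.0, "forwardPE": 25.0},
    "CCC": {"forwardPE": 12.0},  # no trailing P/E in this entry
}

# dict.get() returns None instead of raising KeyError when the key is absent
pe_ratios = [info.get("trailingPE") for info in infos.values()]
valid = [pe for pe in pe_ratios if pe is not None]

# Average only over the tickers that actually have a value
average_pe = mean(valid) if valid else None
print(average_pe)
```

Dividing by the number of valid values rather than by len(ticker_list) avoids understating the average when some ratios are missing.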
The Information class uses the marks from the Employee class and the date of joining from the Joiningdetail class to calculate the top 3 employees based on their ratings, and then displays, using readData, all the details of these employees in ascending order of their date of joining.
I am unable to retrieve the top 3 employees based on ratings. Does the sorted() function work here, or is there another method I should use?
from datetime import date
class Employee():
    num_emp=input("Enter the number of employees: ")
    Gender=""
    Salary=0
    PerformanceRating=0
    def __init__(self,Gender,Salary,PerformanceRating):
        self.EmployeeID =input("Enter employeeid: ")
        self.Gender = Gender
        self.Salary = Salary
        self.PerformanceRating = PerformanceRating
    def get(self):
        print("EmployeeID\t:", self.EmployeeID, "Employee Gender\t:", self.Gender, "Employee Salary\t:", self.Salary, "Employee PerformanceRating:", self.PerformanceRating)
class Joiningdetail():
    DateOfJoining= date(year=int(input("year: ")), month=int(input("month:")), day=int(input("day:")))
    def __init__ (self,DateOfJoining):
        self.DateOfJoining=DateOfJoining
    def getDoJ(self):
        print("Employee DOJ is:", self.DateOfJoining)
class Information(Employee,Joiningdetail):
    def __init__(self,Gender,Salary,PerformanceRating):
        super().__init__(self,Salary,PerformanceRating)
    def readData(self,PerformanceRating):
        #self.PerformanceRating.sort()
        sorted(PerformanceRating())
    def displayData(self,DateOfJoining):
        print(self.getDoJ)
emp1=Employee("Female",34343,2)
emp1.get()
doj_emp1=Joiningdetail((2004, 3, 4))
doj_emp1.getDoJ()
emp2=Employee("Female",34579,4)
emp2.get()
doj_emp2=Joiningdetail((2000, 5, 7))
doj_emp2.getDoJ()
emp3=Employee("Male",34982,4)
emp3.get()
doj_emp3=Joiningdetail((2001, 9, 10))
doj_emp3.getDoJ()
emp4=Employee("Male",34579,4)
emp4.get()
doj_emp4=Joiningdetail((2020, 5, 6))
doj_emp4.getDoJ()
top3_rating= Information("Male",34000,4,)
top3_rating.displayData(5)
print (top3_rating.readData(3))
You can use .sort() to sort a list.
Be advised that top3_rating= Information("Male",34000,4,) creates a single object, and readData(3) is passed an integer and not a list, and you can't sort an int :)
Send the list of users you want to run your code on.
I'm still trying to figure out what you want to do with your code.
It looks like you do not have a data structure with the Employee data in it. When you do your .get() call for every employee created, try to append it to a list (or a tuple if you do not want to perform changes in the data).
It looks like you want to have a top3 object:
top3_rating = Information("Male", 34000, 4, )
top3_rating.displayData(5)
But you are referring to a class object that is just another kind of employee, instead of a list or some kind of data manager.
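As a rough sketch of the list-based approach, the employees can be kept in a plain list and ranked with sorted(). The `EmployeeRecord` dataclass, its field names, and the IDs "E1" to "E4" are hypothetical; the other values are taken from the question.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class EmployeeRecord:  # hypothetical class holding both sets of details
    employee_id: str
    gender: str
    salary: int
    rating: int
    date_of_joining: date

employees = [
    EmployeeRecord("E1", "Female", 34343, 2, date(2004, 3, 4)),
    EmployeeRecord("E2", "Female", 34579, 4, date(2000, 5, 7)),
    EmployeeRecord("E3", "Male", 34982, 4, date(2001, 9, 10)),
    EmployeeRecord("E4", "Male", 34579, 4, date(2020, 5, 6)),
]

# Top 3 by rating, highest first; sorted() returns a new list
top3 = sorted(employees, key=lambda e: e.rating, reverse=True)[:3]

# Then order those three by date of joining, earliest first
top3_by_doj = sorted(top3, key=lambda e: e.date_of_joining)

for e in top3_by_doj:
    print(e.employee_id, e.rating, e.date_of_joining)
```

sorted() takes a key function, so the same list can be ranked by rating and then reordered by date without changing the class.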
The program includes a class called employee, which contains three private members: name of type string, age of type integer, and salary of type float. It has a function set_data() to read the class elements, a function get_data() to print the above information based on salary, and a function year_salary() to print the yearly salary for the employee. An increment of 30% of the employee's salary is given if the employee's age is between 45 and 74. Note: enter information about three employees (use an array).
I have the following table:
user  id  time                     event
a     1   2021.12.12 10:08:39:399  viewed
a     2   2021.12.12 10:08:39:402  clicked
a     3   2021.12.23 3:43:19:397   viewed
a     4   2021.12.23 3:47:11:131   viewed
a     5   2021.12.30 19:20:31:493  viewed
How would I go about finding the conversion rate for each user? By this I mean the percentage of views that are followed by a click within a certain timeframe (let's say 30s). In this case user a has viewed four times and clicked once, with the click falling within the allotted timeframe, giving a conversion rate of 1/4 = 25%.
I tried doing this by splitting the frame by event and then using pd.merge_asof(), which works for most cases, but sometimes user ids are replaced by nulls and sometimes not all viewed events are carried over into the new table. Would appreciate the help!
Try this:
# Convert the `time` column to Timestamp, which makes time calculations a lot easier
df["time"] = pd.to_datetime(df["time"], format="%Y.%m.%d %H:%M:%S:%f")
# Sort the dataframe
df = df.sort_values(["user", "time"])
# Definition of success
success = (
    df["user"].eq(df["user"].shift()) # same user as previous row
    & df["event"].eq("clicked") # current row is "clicked"
    & df["event"].shift().eq("viewed") # previous row is "viewed"
    & df["time"].diff().dt.total_seconds().lt(30) # time difference is less than 30 secs
)
# Assemble the result
result = (
    df.assign(is_view=lambda x: x["event"].eq("viewed"), success=success)
    .groupby("user").agg(
        views=("is_view", "sum"), # count number of views
        success=("success", "sum") # count number of successes
    ).assign(
        rate=lambda x: x["success"] / x["views"]
    )
)
You could map clicked to 1 and viewed to 0 and then do a groupby with sum and count aggregations. Afterwards you divide the sum column by the count column to get your result.
df # your data
df["success"] = df["event"].apply(lambda x: 1 if x == "clicked" else 0)
# aggregate the single column so the result has flat "sum" and "count" columns
# note: count includes both views and clicks
results = df.groupby("user")["success"].agg(["sum", "count"])
results["conversion"] = results["sum"] / results["count"]
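Since the question mentions pd.merge_asof(), here is a hedged sketch of that route. It keeps the views as the left frame so that every view survives the merge, and it looks forward for a click by the same user within 30 seconds. The sample frame mirrors the table in the question with the times already parsed. The column names `click_id` and `converted` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "user": ["a"] * 5,
    "id": [1, 2, 3, 4, 5],
    "time": pd.to_datetime([
        "2021-12-12 10:08:39.399", "2021-12-12 10:08:39.402",
        "2021-12-23 03:43:19.397", "2021-12-23 03:47:11.131",
        "2021-12-30 19:20:31.493",
    ]),
    "event": ["viewed", "clicked", "viewed", "viewed", "viewed"],
})

# merge_asof needs both frames sorted by the "on" key
views = df[df["event"] == "viewed"].sort_values("time")
clicks = (
    df[df["event"] == "clicked"]
    .sort_values("time")[["user", "time", "id"]]
    .rename(columns={"id": "click_id"})
)

# Left join: each view gets the next click by the same user, if within 30 s
matched = pd.merge_asof(
    views, clicks, on="time", by="user",
    direction="forward", tolerance=pd.Timedelta(seconds=30),
)
matched["converted"] = matched["click_id"].notna()
rate = matched.groupby("user")["converted"].mean()
print(rate)
```

One caveat of this sketch: two views shortly before the same click would both count as converted, which may or may not be the definition you want.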
I have a column called Install_Date in a data frame. I want to assign values to another column called Asset_Age under two conditions: if the value in Install_Date is null, then age = current year - plant construction year; if the value is not null, then age = current year - INPUT_Asset["Install_Date"].
This is the code I have. The first condition works fine, but the second condition still gives 0 as values:
Plant_Construct_Year = 1975
this_year= 2020
for i in INPUT_Asset["Install_Date"]:
    if i != 0.0:
        INPUT_Asset["Asset_Age"] = this_year- INPUT_Asset["Install_Date"]
    else:
        INPUT_Asset["Asset_Age"] = this_year- Plant_Construct_Year
INPUT_Asset["Install_Date"] = pd.to_numeric(INPUT_Asset["Install_Date"], errors='coerce').fillna(0)
INPUT_Asset["Asset_Age"] = np.where(INPUT_Asset["Install_Date"] ==0.0, this_year- Plant_Construct_Year,INPUT_Asset["Asset_Age"])
INPUT_Asset["Asset_Age"] = np.where(INPUT_Asset["Install_Date"] !=0.0, this_year- INPUT_Asset["Install_Date"],INPUT_Asset["Asset_Age"])
print(INPUT_Asset["Asset_Age"])
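The two conditions can also be handled in a single vectorized np.where call, without the loop. This is a minimal sketch on made-up sample values, assuming that both null and 0 mean "no install year".

```python
import numpy as np
import pandas as pd

Plant_Construct_Year = 1975
this_year = 2020

# Hypothetical sample; None and 0 both stand for a missing install year
INPUT_Asset = pd.DataFrame({"Install_Date": [1990, None, 0, 2010]})

# Coerce to numbers and turn nulls into 0 so one test covers both cases
install = pd.to_numeric(INPUT_Asset["Install_Date"], errors="coerce").fillna(0)

INPUT_Asset["Asset_Age"] = np.where(
    install == 0,
    this_year - Plant_Construct_Year,  # fall back to the plant construction year
    this_year - install,               # otherwise use the install year
)
print(INPUT_Asset["Asset_Age"].tolist())
```

Because np.where evaluates the condition per row, each row gets its own age instead of the whole column being overwritten on every loop iteration.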
I am new to Python and working on a project for school. I need to find a similar user profile based on the dataset and user inputs. Essentially, a user inputs his/her information, and I would like to assign the grade and interest rate of a similar existing applicant in the dataframe. However, I am failing miserably. Could someone please help?
# loading data
df = pd.read_csv("LoanStats_2017Q1.csv")
df = df[["verification_status","loan_amnt", "term","grade","int_rate","dti","delinq_amnt","annual_inc", "emp_length" ]]
loan = int(input("What loan amount are you looking to obtain? "))
inc = int(input("What is your annual income? "))
dti = int(input("What is your current Debt-to-Equity Ratio (DTI)? "))
lst=[loan,inc,dti]
# similar user is someone within 1% of potential applicant
def simUser(a,b):
    return (abs(a-b)/b) <=0.01
if dti > 35:
    print('\n'+"Your Debt-to-Income Ratio is too High. Please lower it before proceeding.")
else:
    print('\n'+"analyzing..."+'\n')
    #create dataframe for analysis
    columns = ["loan_amnt","annual_inc","dti"]
    lst1 = list(zip(lst,columns))
    #go over the loan grades and interest rates
    for rowNum in range(len(df)):
        lamnt = df.iloc[rowNum]['loan_amnt']
        ainc = df.iloc[rowNum]['annual_inc']
        dtiu = df.iloc[rowNum]['dti']
        #scan data for a similar loan profile
        for user_input,col in lst1:
            lst2 = df[simUser(df.iloc[rowNum][col],user_input) == True]
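The row-by-row scan above could probably be replaced by one boolean mask over the whole frame. The sketch below is only an illustration: `similar_rows` and its `tol` parameter are hypothetical names, and the sample rows are made up rather than taken from LoanStats_2017Q1.csv. It applies the same 1% rule to all three columns at once.

```python
import pandas as pd

def similar_rows(df, loan, inc, dti, tol=0.01):
    """Grade and rate of rows whose three values are all within tol of the inputs."""
    mask = (
        ((df["loan_amnt"] - loan).abs() <= tol * abs(loan))
        & ((df["annual_inc"] - inc).abs() <= tol * abs(inc))
        & ((df["dti"] - dti).abs() <= tol * abs(dti))
    )
    return df.loc[mask, ["grade", "int_rate"]]

# Made-up rows, for illustration only
sample = pd.DataFrame({
    "loan_amnt": [10000, 10050, 20000],
    "annual_inc": [50000, 50200, 50000],
    "dti": [20.0, 20.1, 20.0],
    "grade": ["A", "B", "C"],
    "int_rate": [7.0, 9.0, 12.0],
})
matches = similar_rows(sample, loan=10000, inc=50000, dti=20)
print(matches)
```

Writing the test as a difference compared against `tol * abs(value)` avoids dividing by the user's input, which would fail if that input were 0.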