I'm trying to write a conditional statement for fields from an imported CSV (data_dict), but my current code using np.where does not seem to work. I want to get the ages (data_dict['Age']) of people depending on whether they are male or female (data_dict['Gender']). How would I approach this? Please see my code below. Many thanks.
Sample Data
Index,Age,Year,Crash_Month,Crash_Day,Crash_Time,Road_User,Gender,Crash_Type,Injury_Severity,Crash_LGA,Crash_Area_Type
1,37,2000,1,1,4:30:59,PEDESTRIAN,MALE,UNKNOWN,1,MARIBYRNONG,MELBOURNE
2,22,2000,1,1,0:07:35,DRIVER,MALE,ADJACENT DIRECTION,1,YARRA,MELBOURNE
3,47,2000,1,1,4:51:37,DRIVER,FEMALE,ADJACENT DIRECTION,0,YARRA,MELBOURNE
4,70,2000,1,1,4:27:56,DRIVER,MALE,ADJACENT DIRECTION,1,BANYULE,MELBOURNE
Expected Result
Age of Males: [37,22,70,...]
Age of Females: [47,...]
Current Result
Age of Males: []
Age of Females: []
gender1 = np.array(data_dict['Gender'])
age1 = np.array(data_dict['Age'])
age_females = age1[np.where(gender1 == 'Female')]
age_males = age1[np.where(gender1 == 'Male')]
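One likely cause, judging from the sample data: the CSV stores the values as MALE and FEMALE, while the code compares against 'Male' and 'Female', and string comparison is case-sensitive. A minimal sketch, with a small hand-built data_dict standing in for the imported CSV (its exact shape is an assumption):

```python
import numpy as np

# Hypothetical stand-in for the imported CSV, built from the four sample rows.
data_dict = {
    "Age": [37, 22, 47, 70],
    "Gender": ["MALE", "MALE", "FEMALE", "MALE"],
}

gender1 = np.array(data_dict["Gender"])
age1 = np.array(data_dict["Age"])

# Normalise case before comparing so 'MALE', 'Male' and 'male' all match.
gender_upper = np.char.upper(gender1)

# A boolean mask can index the array directly; np.where is not required.
age_males = age1[gender_upper == "MALE"]
age_females = age1[gender_upper == "FEMALE"]

print("Age of Males:", age_males.tolist())
print("Age of Females:", age_females.tolist())
```

If the real column also contains stray whitespace, stripping it first (for example with np.char.strip) may be needed as well.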
Related
I have this dataframe of drug info and would like to filter the data based on user input.
If I explicitly state the search as follows, the results are OK:
df = pd.read_excel("safety.xlsx")
search = ['Lisinopril', 'Perindopril']
print(df.query('`Drug Name` in @search'))
But if I try to take the search from user input, I can only enter a single drug name without error:
while True:
    search = input("Enter drug name...")
    print(df.query('`Drug Name` in @search'))
    if search == "exit":
        break
So I would like for the user to be able to enter a list of drugs, not one at a time. If I enter Lisinopril, Perindopril the result is 'Empty DataFrame'
Terminal:
Enter drug name...Lisinopril
Drug Name U&E C
0 Lisinopril Before commencing, at 1-2 weeks after starting... NaN
Enter drug name...Lisinopril, Perindopril
Empty DataFrame
Columns: [Drug Name, U&E, C]
Index: []
Thanks for any help!
If you would like to query multiple values in the dataframe, you should append the user input to a list.
search = []
user_input = input("Drug name: ")
search.append(user_input)
print(df.query('`Drug Name` in @search'))
If you would like to query a single drug at a time, you should use the == notation in the query.
user_input = input("Drug name: ")
subframe = df.query('`Drug Name` == @user_input')
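To let the user type several names at once, the raw input string has to be split into a list before it is used for filtering. A minimal sketch, assuming the names are comma-separated; the small DataFrame and the parse_drug_names helper are hypothetical stand-ins, not part of the original code:

```python
import pandas as pd

# Hypothetical stand-in for the contents of safety.xlsx.
df = pd.DataFrame({
    "Drug Name": ["Lisinopril", "Perindopril", "Ramipril"],
    "U&E": ["Before commencing", "Before commencing", "Before commencing"],
})

def parse_drug_names(raw):
    """Split 'A, B' into ['A', 'B'], dropping blanks and surrounding spaces."""
    return [name.strip() for name in raw.split(",") if name.strip()]

# In the real loop this string would come from input("Enter drug name...").
search = parse_drug_names("Lisinopril, Perindopril")

# isin() keeps the rows whose 'Drug Name' is any of the entered names.
result = df[df["Drug Name"].isin(search)]
print(result)
```

The match is still exact and case-sensitive, so "lisinopril" would not match "Lisinopril" unless both sides are lowercased first.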
I recently came across this website, which uses an API to look up a company and display company numbers as output. I used it as a basis and am now trying to start from one specific company number and add 1 to it each time the code loops. How can I do this? At the moment the output is random numbers. Any help would be great, thank you.
For example, the code currently runs and displays a series of random numbers. I want it to display numbers starting from 09628955 and then adding 1 each time: 09628956, 09628957, etc.
ComapnySICS.py file
df = pd.read_csv(company_numbers_file)
ch_api = CompaniesHouseService("API KEY")
tic = datetime.datetime.now()
for index, row in df.iterrows():
    company_number = row["Company Number"]
    ch_profile = ch_api.get_company_profile(company_number)
    df.at[index, "Company Name"] = ch_profile.get("company_name", None)
    sics = ch_profile.get("sic_codes", [None])
    for i in range(0,len(sics)):
        df.at[index, f"SIC {i+1}"] = sics[i]
    print(f"Number: {row['Company Number']} | "\
          f"Name: {df.at[index,'Company Name']}")
#End timer
toc = datetime.datetime.now()
avg_time = ((toc-tic).total_seconds())/(len(df.index)-1)
print(f"Average time between API calls: {avg_time:0.2f} seconds")
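One way to get consecutive numbers, rather than whatever the CSV happens to contain, is to generate them from the starting value. Because a company number such as 09628955 carries a leading zero, it has to stay a string, zero-padded back to its original width after the addition. A sketch, where sequential_company_numbers is a hypothetical helper and the eight-digit width is assumed from the example:

```python
def sequential_company_numbers(start, count, width=8):
    """Return `count` consecutive numbers from `start` as zero-padded strings."""
    first = int(start)  # int() drops the leading zero: "09628955" -> 9628955
    # zfill restores the leading zeros so every number has `width` digits.
    return [str(first + offset).zfill(width) for offset in range(count)]

numbers = sequential_company_numbers("09628955", 3)
print(numbers)
```

Each generated string could then be passed to ch_api.get_company_profile in place of row["Company Number"]. Not every consecutive number necessarily belongs to a registered company, so the loop may need to handle empty or failed lookups.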
I need to find top 10 customers for each city in terms of their repayment amount by different products and by different time periods i.e. year or month. The user should be able to specify the product (Gold/Silver/Platinum) and time period (yearly or monthly) and the function should automatically take these inputs while identifying the top 10 customers.
So what I did:
create another dataset:
Cust_table_repayment=customer_repayment[['Customer','Age','City','Product','Limit','Company','Segment','Month','Amount']]
converted the month column to pd.to_datetime
Cust_table_repayment['Month']=pd.to_datetime(Cust_table_repayment['Month'])
created another variable in new dataset : monthly,yearly
Cust_table_repayment['monthly']=Cust_table_repayment['Month'].apply(lambda x:x.month)
Cust_table_repayment['yearly']=Cust_table_repayment['Month'].apply(lambda x:x.year)
Then created the function and this is the part where I'm stuck and facing problem:
def top10Customers(prod_cat,time_period):
return Cust_table_repayment.loc[(Cust_table_repayment['Product']==prod_cat)&((Cust_table_repayment.monthly==time_period)|(Cust_table_repayment.yearly==time_period))].groupby(['Customer','Product','City','Month']).Amount.sum().reset_index().sort_values('Amount',ascending=False).head(10)
then I declared the input:
prod_cat=str(input("Please Enter Product either in Gold/Silver/Platinum: "))
time_period=input("Please Enter Time Period and time period should be in yearly/monthly: ")
then I stored that function in new dataset
Top_10_customer=top10Customers(prod_cat,time_period)
and called that dataset but getting no output
Top_10_customer
My expected output: when I search yearly or monthly then it should display either monthly or yearly like this:
Please help!
Please check the data type of the input and the data type of each column in the dataframe.
The columns "monthly" and "yearly" have data type int, while the input value has data type str.
Comparing an int column with a str value never matches, so the filter in the function top10Customers returns no rows.
Cust_table_repayment = customer_repayment[['Customer','Age','City'
,'Product','Limit','Company'
,'Segment','Month','Amount']].copy()
# check the date format, if it is "YYYY-MM-DD", put yearfirst=True
Cust_table_repayment['Month'] = pd.to_datetime(Cust_table_repayment['Month'])
# use Cust_table_repayment["monthly"].dtypes to check the type, which is "int"
Cust_table_repayment['monthly'] = Cust_table_repayment['Month'].apply(lambda x : x.month)
# use Cust_table_repayment["yearly"].dtypes to check the type, which is "int"
Cust_table_repayment['yearly'] = Cust_table_repayment['Month'].apply(lambda x:x.year)
#### Solution
# Option 1: convert the "monthly" column (and likewise "yearly") to string so it matches the raw input
# Cust_table_repayment["monthly"] = Cust_table_repayment["Month"].dt.month.astype(str)
# Option 2 (used here): convert the input to int so it matches the int columns
time_period = int(input("Please Enter Time Period and time period should be in yearly/monthly: "))
# and do a little preprocessing for prod_cat
prod_cat = str(input("Please Enter Product either in Gold/Silver/Platinum:")).lower().strip()
def top10Customers(prod_cat, time_period):
    return Cust_table_repayment.loc[
        (Cust_table_repayment['Product'].str.lower() == prod_cat)
        & (
            (Cust_table_repayment.monthly == time_period)
            | (Cust_table_repayment.yearly == time_period)
        )
    ].groupby(['Customer', 'Product', 'City', 'Month']).Amount.sum().reset_index().sort_values('Amount', ascending=False).head(10)
# another way
def top10Customers2(prod_cat, time_period):
    selected_df = Cust_table_repayment.loc[
        (Cust_table_repayment['Product'].str.lower() == prod_cat)
        & (
            (Cust_table_repayment.monthly == time_period)
            | (Cust_table_repayment.yearly == time_period)
        )
    ].copy()
    selected_df = selected_df.groupby(['Customer', 'Product', 'City', 'Month']).Amount.sum().reset_index()
    return selected_df.nlargest(10, 'Amount')
I am currently building a fake dataset to play with. I have one dataset, called patient_data that has the patient's info:
patient_data = pd.DataFrame(np.random.randn(100,5),columns='id name dob sex state'.split())
This gives me a sample of 100 observations, with variables like name, birthday, etc.
Clearly, some of these (like name, sex and state) are categorical variables, and it makes no sense to have random numbers attached to them.
So for "sex" column, I created a function that will turn every random number <0 to read "male" and everything else to read "female." I would like to create a new variable called "gender" and store this inside this variable:
def malefemale(x):
    if x < 0:
        print('male')
    else:
        print('female')
Then I wrote code to apply this function to the data frame and create the new variable "gender":
patient_data.assign(gender = patient_data['sex'].apply(malefemale))
But when I type "patient_data" in the Jupyter notebook, I do not see the data frame updated to include this new variable. It seems like nothing was done.
Does anybody know what I can do to permanently add this new gender variable into my patient_data dataframe, with the function properly working?
I think you need to assign the result back, and for the new values use numpy.where:
patient_data = patient_data.assign(gender=np.where(patient_data['sex']<0, 'male', 'female'))
print(patient_data.head(10))
         id      name       dob       sex     state  gender
0  0.588686  1.333191  2.559850  0.034903  0.232650  female
1  1.606597  0.168722  0.275342 -0.630618 -1.394375    male
2  0.912688 -1.273570  1.140656 -0.788166  0.265234    male
3 -0.372272  1.174600  0.300846  1.959095 -1.083678  female
4  0.413863  0.047342  0.279944  1.595921  0.585318  female
5 -1.147525  0.533511 -0.415619 -0.473355  1.045857    male
6 -0.602340 -0.379730  0.032407  0.946186  0.581590  female
7 -0.234415 -0.272176 -1.160130 -0.759835 -0.654381    male
8 -0.149291  1.986763 -0.675469 -0.295829 -2.052398    male
9  0.600571 -1.577449 -0.906590  1.042335 -2.104928  female
You need to change your custom function so that it returns the value instead of printing it:
def malefemale(x):
    if x < 0:
        return "male"
    else:
        return "female"
Then simply apply the custom function and assign the result to a new column:
patient_data['gender'] = patient_data['sex'].apply(malefemale)
I am new to Python and working on a project for school. I need to find a similar user profile based on the dataset and user inputs. Essentially, a user enters his or her information, and I would like to assign the grade and interest rate of a similar existing applicant in the dataframe. However, I am failing miserably. Could someone please help?
loading data
df = pd.read_csv("LoanStats_2017Q1.csv")
df = df[["verification_status","loan_amnt", "term","grade","int_rate","dti","delinq_amnt","annual_inc", "emp_length" ]]
loan = int(input("What loan amount are you looking to obtain? "))
inc = int(input("What is your annual income? "))
dti = int(input("What is your current Debt-to-Income Ratio (DTI)? "))
lst=[loan,inc,dti]
similar user is someone within 1% of potential applicant
def simUser(a,b):
    return (abs(a-b)/b) <=0.01
if dti > 35:
    print('\n'+"Your Debt-to-Income Ratio is too High. Please lower it before proceeding.")
else:
    print('\n'+"analyzing..."+'\n')
    #create dataframe for analysis
    columns = ["loan_amnt","annual_inc","dti"]
    lst1 = list(zip(lst,columns))
    #go over the loan grades and interest rates
    for rowNum in range(len(df)):
        lamnt = df.iloc[rowNum]['loan_amnt']
        ainc = df.iloc[rowNum]['annual_inc']
        dtiu = df.iloc[rowNum]['dti']
        #scan data for a similar loan profile
        for user_input,col in lst1:
            lst2 = df[simUser(df.iloc[rowNum][col],user_input) == True]
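The row-by-row loop can be replaced by a vectorized filter: compute the relative difference for each column over the whole DataFrame at once and keep only the rows where every column is within the 1% tolerance. A sketch under assumptions: the small DataFrame stands in for LoanStats_2017Q1.csv, find_similar is a hypothetical helper, and the three compared columns are taken to be numeric.

```python
import pandas as pd

# Hypothetical stand-in for the loan data, with made-up rows.
df = pd.DataFrame({
    "loan_amnt": [10000, 10050, 20000],
    "annual_inc": [50000, 50200, 50000],
    "dti": [20.0, 20.1, 20.0],
    "grade": ["B", "C", "A"],
    "int_rate": [10.5, 13.2, 7.1],
})

def find_similar(frame, applicant, tolerance=0.01):
    """Keep rows whose values are within `tolerance` of every applicant value."""
    mask = pd.Series(True, index=frame.index)
    for col, value in applicant.items():
        # Same test as simUser: relative difference measured against the input.
        mask &= (frame[col] - value).abs() / value <= tolerance
    return frame[mask]

# In the real script these values would come from the input() prompts.
applicant = {"loan_amnt": 10000, "annual_inc": 50000, "dti": 20}
similar = find_similar(df, applicant)
print(similar[["grade", "int_rate"]])
```

As with simUser, an applicant value of 0 would cause a division by zero, so such inputs need separate handling. If the real columns are stored as text (for example percentages with a trailing % sign), they would have to be converted to numbers first.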