Pass a list to a pandas query via user input - python

I have this dataframe of drug info and would like to filter the data based on user input.
If I explicitly state the search as follows, the results are OK:
import pandas as pd

df = pd.read_excel("safety.xlsx")
search = ['Lisinopril', 'Perindopril']
print(df.query('`Drug Name` in @search'))
But if I take the search term from user input, I can only enter a single drug name without error:
while True:
    search = input("Enter drug name...")
    print(df.query('`Drug Name` in @search'))
    if search == "exit":
        break
So I would like the user to be able to enter a list of drugs, not one at a time. If I enter Lisinopril, Perindopril the result is 'Empty DataFrame'.
Terminal:
Enter drug name...Lisinopril
Drug Name U&E C
0 Lisinopril Before commencing, at 1-2 weeks after starting... NaN
Enter drug name...Lisinopril, Perindopril
Empty DataFrame
Columns: [Drug Name, U&E, C]
Index: []
Thanks for any help!

If you would like to query for multiple values in the dataframe, you should append the user input to a list.
search = []
name = input("Drug name: ")  # "in" is a reserved word, so it cannot be used as a variable name
search.append(name)
print(df.query('`Drug Name` in @search'))
If you would like to query for a single drug at a time, you should use the == notation in the query.
user_input = input("Drug name: ")
subframe = df.query('`Drug Name` == @user_input')
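To accept several drugs in one go, the input string has to be split into a list before filtering, because "Lisinopril, Perindopril" is otherwise treated as one single name. A minimal sketch, assuming comma-separated input; the helper names parse_drug_names and filter_drugs and the sample dataframe are made up for illustration:

```python
import pandas as pd

def parse_drug_names(raw):
    """Split comma-separated user input into a list of clean names."""
    return [name.strip() for name in raw.split(",") if name.strip()]

def filter_drugs(df, raw):
    """Return the rows whose 'Drug Name' is one of the names in raw."""
    names = parse_drug_names(raw)
    return df[df["Drug Name"].isin(names)]

# Hypothetical sample data standing in for safety.xlsx
sample = pd.DataFrame({
    "Drug Name": ["Lisinopril", "Perindopril", "Ramipril"],
    "U&E": ["yes", "yes", "no"],
})
result = filter_drugs(sample, "Lisinopril, Perindopril")
print(result)
```

isin is used here so the helper does not rely on query's variable lookup; df.query('`Drug Name` in @names') should select the same rows. The match is case-sensitive, so "lisinopril" would not be found unless both sides are normalised first.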

Related

How can I display five rows of data based on user in Python?

df = pd.read_csv(CITY_DATA[city])
def user_stats(df,city):
    """Displays statistics of users."""
    print('\nCalculating User Stats...\n')
    start_time = time.time()
    print('User Type Stats:')
    print(df['User Type'].value_counts())
    if city != 'washington':
        print('Gender Stats:')
        print(df['Gender'].value_counts())
        print('Birth Year Stats:')
        most_common_year = df['Birth Year'].mode()[0]
        print('Most Common Year:',most_common_year)
        most_recent_year = df['Birth Year'].max()
        print('Most Recent Year:',most_recent_year)
        earliest_year = df['Birth Year'].min()
        print('Earliest Year:',earliest_year)
    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)
I want to ask the user first: "Do you want to see the first 5 rows of data?". If they type yes, it shows the first 5 rows, then asks again: "Do you want to see the next 5 rows of data?". If they say yes again, it shows the next 5 rows. I need to keep asking until they say no.
Hints:
- The data is shown by position: the first 5 rows on the first attempt, then the next 5 rows on the second "yes", so we need to keep track of the position. How? (I used a start_loc variable.)
- Check the iloc function. It returns rows of the dataframe by position. For example, df.iloc[0:5] returns the first 5 rows of data.
Can I do it using the following code?
view_data = input('\nWould you like to view 5 rows of individual trip data? Enter yes or no\n')
start_loc = 0
while (?????):
    print(df.iloc[????:????])
    start_loc += 5
    view_display = input("Do you wish to continue?: ").lower()
Something like this?
Use your start_loc variable inside iloc.
Give your while loop a boolean condition and set it to False when the user types "no".
view_data = input('\nWould you like to view 5 rows of individual trip data? Enter yes or no\n')
start_loc = 0
# Only start showing rows if the first answer was "yes"
keep_asking = view_data.lower() == 'yes'
while keep_asking:
    print(df.iloc[start_loc:start_loc + 5])
    start_loc += 5
    view_display = input("Do you wish to continue?: ").lower()
    if view_display == "no":
        keep_asking = False
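The same loop can be sketched so that it also stops once the dataframe runs out of rows. This is only an illustration under assumptions: iter_row_chunks and show_rows are hypothetical names, and the ask parameter exists so that the prompt function can be swapped out for a scripted one.

```python
import pandas as pd

def iter_row_chunks(df, chunk_size=5):
    """Yield consecutive slices of df, chunk_size rows at a time."""
    for start_loc in range(0, len(df), chunk_size):
        yield df.iloc[start_loc:start_loc + chunk_size]

def show_rows(df, ask=input, chunk_size=5):
    """Print chunks while the user answers 'yes'; return the number of rows shown."""
    shown = 0
    prompt = "Would you like to view 5 rows of individual trip data? Enter yes or no\n"
    for chunk in iter_row_chunks(df, chunk_size):
        if ask(prompt).strip().lower() != "yes":
            break
        print(chunk)
        shown += len(chunk)
        prompt = "Do you wish to continue? Enter yes or no\n"
    return shown

# Demo with scripted answers instead of real keyboard input
demo = pd.DataFrame({"trip": range(12)})
answers = iter(["yes", "yes", "no"])
rows_shown = show_rows(demo, ask=lambda prompt: next(answers))
```

Calling show_rows(df) with the default ask=input gives the interactive behaviour; the loop ends on any answer other than "yes" or when there are no rows left.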

Search for column in pandas

How do you check whether a value exists in a specific column?
For example, I have a file that contains the following:
ID Name
1 Mark
2 John
3 Mary
If the user inputs 1, it will
print("the value already exist.")
But if the user inputs 4, it will add a new row containing 4 and
name = input('Name')
and update the file like this:
ID Name
1 Mark
2 John
3 Mary
4 (userinput)
An easy approach would be:
import pandas as pd
# Assumes df, input_str and name are already defined
bool_val = False
for i in range(0, df.shape[0]):
    if str(df.iloc[i]['ID']) == str(input_str):
        bool_val = False
        break
    else:
        print("there")
        bool_val = True
if bool_val:
    # Note: DataFrame.append was removed in pandas 2.0; on newer versions use pd.concat instead
    df = df.append(pd.Series([input_str, name], index = ['ID', 'Name']), ignore_index=True)
Remember to pass ignore_index=True to avoid a TypeError. The boolean flag prevents the row from being appended more than once.
searchid=20 # use sys.argv[1] to pass it as a program argument, or read it with input()
if str(searchid) in df.index.astype(str):
    print("ID found")
else:
    name=input("ID not found. Specify the name for this ID to update the data:") # use raw_input() on Python 2
    df.loc[searchid]=[str(name)]
If ID is not the index:
if str(searchid) in df.ID.values.astype(str):
    print("ID found")
else:
    name=input("ID not found. Specify the name for this ID to update the data:") # use raw_input() on Python 2
    df.loc[searchid]=[str(searchid),str(name)]
Specifying the column headers during the update helps avoid mismatch errors:
df.loc[searchid]={'ID': str(searchid), 'Name': str(name)}
This should help.
Also see https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html, which notes that append and concat copy the full dataframe.
df.loc['ID'] returns the row whose index label is 'ID', assuming the IDs are the index values of the dataframe you are referring to.
If you have a list of IDs and want to look them all up at once, then, assuming
listofids=['ID1','ID2','ID3']
df.loc[listofids]
will return the rows with those IDs.
If the IDs are not in the index, and assuming df['ids'] contains the IDs:
'searchedID' in df.ids.values
will return True or False depending on whether the ID is present.
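The check-then-add step from the answers above can be combined into one helper that avoids the row-by-row loop. A sketch under assumptions: ID is an ordinary column rather than the index, and add_if_missing is a made-up name. It uses pd.concat, which also works on pandas versions where DataFrame.append is no longer available.

```python
import pandas as pd

def add_if_missing(df, new_id, name):
    """Return (dataframe, added): append a row only when new_id is not already present."""
    # Compare as strings so that 4 and "4" count as the same ID
    if (df["ID"].astype(str) == str(new_id)).any():
        return df, False
    new_row = pd.DataFrame([{"ID": new_id, "Name": name}])
    return pd.concat([df, new_row], ignore_index=True), True

# Sample data from the question
people = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Mark", "John", "Mary"]})

same, added_existing = add_if_missing(people, 1, "Someone")
bigger, added_new = add_if_missing(people, 4, "Ann")
```

Writing the result back with something like bigger.to_csv(...) would then update the file; the file name and format are not given in the question, so that step is left out here.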

Select array entries by criteria

I'm trying to write a conditional statement for fields from an imported CSV (data_dict), however, my current code using np.where does not seem to work. I am trying to find the age from (data_dict['Age']) of people depending on whether they are male or female, from (data_dict['Gender']). How would I approach solving this? Please see my code below. Many thanks.
Sample Data
Index,Age,Year,Crash_Month,Crash_Day,Crash_Time,Road_User,Gender,Crash_Type,Injury_Severity,Crash_LGA,Crash_Area_Type
1,37,2000,1,1,4:30:59,PEDESTRIAN,MALE,UNKNOWN,1,MARIBYRNONG,MELBOURNE
2,22,2000,1,1,0:07:35,DRIVER,MALE,ADJACENT DIRECTION,1,YARRA,MELBOURNE
3,47,2000,1,1,4:51:37,DRIVER,FEMALE,ADJACENT DIRECTION,0,YARRA,MELBOURNE
4,70,2000,1,1,4:27:56,DRIVER,MALE,ADJACENT DIRECTION,1,BANYULE,MELBOURNE
Expected Result
Age of Males: [37,22,70,...]
Age of Females: [47,...]
Current Result
Age of Males: []
Age of Females: []
gender1 = np.array(data_dict['Gender'])
age1 = np.array(data_dict['Age'])
age_females = age1[np.where(gender1 == 'Female')]
age_males = age1[np.where(gender1 == 'Male')]
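One thing to check in the code above: the sample data stores the gender in upper case (MALE, FEMALE), while the comparison uses 'Male' and 'Female'. String comparison is case-sensitive, so both masks come out empty. A minimal sketch of a case-insensitive version, assuming data_dict maps column names to lists as the question suggests (the values below are copied from the four sample rows):

```python
import numpy as np

# Stand-in for the imported CSV, using the sample rows from the question
data_dict = {
    "Age": [37, 22, 47, 70],
    "Gender": ["MALE", "MALE", "FEMALE", "MALE"],
}

# Normalise the case once so that 'Male', 'MALE' and 'male' all match
gender = np.array([g.strip().upper() for g in data_dict["Gender"]])
age = np.array(data_dict["Age"])

# A boolean mask can index the array directly; np.where is not needed here
age_males = age[gender == "MALE"]
age_females = age[gender == "FEMALE"]

print("Age of Males:", age_males.tolist())
print("Age of Females:", age_females.tolist())
```

If the ages were read from the CSV as strings, they would also need converting, for example with age.astype(int), before doing any arithmetic on them.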

I want to find top 10 customer using def in python

I need to find top 10 customers for each city in terms of their repayment amount by different products and by different time periods i.e. year or month. The user should be able to specify the product (Gold/Silver/Platinum) and time period (yearly or monthly) and the function should automatically take these inputs while identifying the top 10 customers.
So what I did:
create another dataset:
Cust_table_repayment=customer_repayment[['Customer','Age','City','Product','Limit','Company','Segment','Month','Amount']]
converted the month column to pd.to_datetime
Cust_table_repayment['Month']=pd.to_datetime(Cust_table_repayment['Month'])
created another variable in new dataset : monthly,yearly
Cust_table_repayment['monthly']=Cust_table_repayment['Month'].apply(lambda x:x.month)
Cust_table_repayment['yearly']=Cust_table_repayment['Month'].apply(lambda x:x.year)
Then created the function and this is the part where I'm stuck and facing problem:
def top10Customers(prod_cat,time_period):
    return Cust_table_repayment.loc[(Cust_table_repayment['Product']==prod_cat)&((Cust_table_repayment.monthly==time_period)|(Cust_table_repayment.yearly==time_period))].groupby(['Customer','Product','City','Month']).Amount.sum().reset_index().sort_values('Amount',ascending=False).head(10)
then I declared the input:
prod_cat=str(input("Please Enter Product either in Gold/Silver/Platinum: "))
time_period=input("Please Enter Time Period and time period should be in yearly/monthly: ")
then I stored that function in new dataset
Top_10_customer=top10Customers(prod_cat,time_period)
and called that dataset but getting no output
Top_10_customer
My expected output: when I search yearly or monthly then it should display either monthly or yearly like this:
Please help!
Please check the data type of the input and of each column in the dataframe.
The "monthly" and "yearly" columns have data type int.
The input value has data type str.
So the comparison in the function top10Customers never matches.
Cust_table_repayment = customer_repayment[['Customer','Age','City'
                                           ,'Product','Limit','Company'
                                           ,'Segment','Month','Amount']].copy()
# check the date format, if it is "YYYY-MM-DD", put yearfirst=True
Cust_table_repayment['Month'] = pd.to_datetime(Cust_table_repayment['Month'])
# use Cust_table_repayment["monthly"].dtypes to check the type, which is "int"
Cust_table_repayment['monthly'] = Cust_table_repayment['Month'].apply(lambda x : x.month)
# use Cust_table_repayment["yearly"].dtypes to check the type, which is "int"
Cust_table_repayment['yearly'] = Cust_table_repayment['Month'].apply(lambda x:x.year)
#### Solution
# Option 1: convert the monthly and yearly columns to strings
# (left commented out, because it must not be combined with option 2)
# Cust_table_repayment["monthly"] = Cust_table_repayment["Month"].dt.month.astype(str)
# Cust_table_repayment["yearly"] = Cust_table_repayment["Month"].dt.year.astype(str)
# Option 2: convert the input to the appropriate type
time_period = int(input("Please Enter Time Period and time period should be in yearly/monthly: "))
# and do a little preprocessing for prod_cat
prod_cat = str(input("Please Enter Product either in Gold/Silver/Platinum:")).lower().strip()
def top10Customers(prod_cat,time_period):
    return Cust_table_repayment.loc[(Cust_table_repayment['Product'].str.lower()==prod_cat)
                                    &(
                                        (Cust_table_repayment.monthly==time_period)
                                        |(Cust_table_repayment.yearly==time_period)
                                    )].groupby(['Customer', 'Product', 'City', 'Month']).Amount.sum().reset_index().sort_values('Amount', ascending=False).head(10)
# another way
def top10Customers2(prod_cat, time_period):
    selected_df = Cust_table_repayment.loc[(Cust_table_repayment['Product'].str.lower()==prod_cat)
                                           & ((Cust_table_repayment.monthly==time_period)
                                              |(Cust_table_repayment.yearly==time_period)
                                           )].copy()
    selected_df = selected_df.groupby(['Customer', 'Product', 'City', 'Month']).Amount.sum().reset_index()
    return selected_df.nlargest(10, 'Amount')
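The type mismatch described above is easy to reproduce in isolation. The following sketch uses made-up dates purely for illustration: comparing an integer column with the string returned by input() is False for every row, while the same comparison after int() matches as expected.

```python
import pandas as pd

# Hypothetical dates standing in for the 'Month' column
months = pd.Series(pd.to_datetime(["2004-01-12", "2004-02-03", "2005-01-15"])).dt.month

time_period = "1"  # what input() returns: always a string

# Integer column compared with a string: no row matches
mismatched = months == time_period

# Same comparison after converting the input to int
matched = months == int(time_period)

print(mismatched.tolist())
print(matched.tolist())
```

This is why the original function returned an empty result without raising any error.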

How to correctly replace data in text files based on input

I have the following problem. I'm trying to replace the name based on the gender entered by the user. If anyone could help improve my code, it would be really appreciated.
The text file (duedate.txt):
User: Tommy
Gender: Male
Date due: 2020-02-18
The code I have so far is:
with open f = ('duedate.txt).read()
z = input("Please select gender to change)
zz = input("Please select new name")
if z == 'female'
line.startswith('User'):
field, value = line.split(:)
value = zz
print (zz)
I know the code isn't 100% right but the output, if Jessica was chosen as the name, should be:
User: Jessica
Gender: Female
Date due: 2020-02-18
This should work. The code is explained in the comments:
import pandas as pd
import numpy as np
# Read the text file into a one-column dataframe, one row per line
# (newer pandas versions may reject sep="\n" in read_csv, so the lines are read directly)
with open('duedate.txt') as f:
    df = pd.DataFrame(f.read().splitlines())
# Split each line into a field name and its value
df[['Variable','Value']] = df[0].str.split(':', n=1, expand=True)
df['Value'] = df['Value'].str.strip()
del df[0]
# Collect inputs from user:
z = input("Please select gender to change")
zz = input("Please select new name")
# Modify dataframe based on user inputs
df.loc[0,"Value"]=zz
df.loc[1,"Value"]=z
# Construct output column (np.savetxt adds the newline itself)
df["Output"] = df["Variable"] + ": " + df["Value"]
# Save the file back to disk
np.savetxt(r'duedate.txt', df["Output"].values,fmt='%s')
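For a file this small, the same edit can also be sketched without pandas. This is an illustrative alternative rather than the approach above; the function name update_record is made up, and it assumes every line has the form "Field: value" as in the duedate.txt sample.

```python
def update_record(text, new_name, new_gender):
    """Return text with the User and Gender fields replaced; other lines are kept."""
    replacements = {"User": new_name, "Gender": new_gender.strip().capitalize()}
    updated = []
    for line in text.splitlines():
        field, sep, _ = line.partition(":")
        # Only rewrite lines that contain a colon and name a field we want to change
        if sep and field.strip() in replacements:
            line = f"{field}: {replacements[field.strip()]}"
        updated.append(line)
    return "\n".join(updated) + "\n"

# Contents of duedate.txt from the question
original = "User: Tommy\nGender: Male\nDate due: 2020-02-18\n"
new_text = update_record(original, "Jessica", "female")
print(new_text)
```

To apply it to the real file, read the contents with open('duedate.txt').read(), pass them through update_record, and write the returned string back.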
