I'm working through my first project in pandas and trying to build a simple program that retrieves IDs and their information from a large CSV.
I can get it working when I print the result. However, when I try to put it in a conditional statement, it doesn't work. Is this because of the way I set the index?
The intent is to input an ID and list all the rows in the CSV that have this ID.
import numpy as np
import pandas as pd
file = "PartIdsWithVehicleData.csv"
excel = pd.read_csv(file, index_col = "ID", dtype={'ID': "str"})
UserInput = input("Enter a part number: ")
result = (excel.loc[UserInput])
#If I print this.... it will work
print(result)
#however when I try to make it a conditional statement it runs my else clause.
if UserInput in excel:
    print(result)
else:
    print("Part number is not in file.")
#one thing I did notice is that if I find the boolean value of the part number (index) it says false..
print(UserInput in excel)
Is this what you are trying to accomplish? I had to build my own table to help visualize the process, but you should be able to accomplish what you asked with a little tweaking to fit your own data.
df = pd.DataFrame({
    'Part_Number': [1, 2, 3, 4, 5],
    'Part_Name': ['Name', 'Another Name', 'Third Name', 'Almost last name', 'last name']
})
user_input = int(input('Please enter part number: '))
if user_input in df['Part_Number'].tolist():
    print(df.loc[df['Part_Number'] == user_input])
else:
    print('part number does not exist')
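As for why the original check printed False: `in` applied to a DataFrame tests the column labels, not the index, so with the ID set as the index the lookup has to go through `excel.index`. Here is a minimal sketch, assuming a small made-up CSV in place of PartIdsWithVehicleData.csv (the `Vehicle` column, the sample IDs, and the `lookup` helper are invented for illustration):

```python
import pandas as pd
from io import StringIO

# Hypothetical stand-in for PartIdsWithVehicleData.csv
csv_text = """ID,Vehicle
A100,Sedan
A100,Coupe
B200,Truck
"""
excel = pd.read_csv(StringIO(csv_text), index_col="ID", dtype={"ID": "str"})

def lookup(part_id):
    """Return all rows whose index equals part_id, or None if it is absent."""
    if part_id in excel.index:        # membership test against the index
        return excel.loc[[part_id]]   # a list selector always returns a DataFrame
    return None

in_columns = "A100" in excel          # checks the column labels only
in_index = "A100" in excel.index      # checks the ID index
print(in_columns, in_index)
print(lookup("A100"))
```

Checking membership before calling `.loc` also avoids the KeyError that `excel.loc[UserInput]` raises for an unknown ID.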
I have two Excel files. I am trying to take the second-column values from Excel1 and place them in the second column of Excel2, joined with underscores.
Excel1:
Word          Variable
identifier    id
user          us
phone         ph
number        num
phone number  pn
Excel2:
Word               Variable Should be
user identifier    us_id
user phone number  us_pn
identifier number  id_num
I am not getting the whole row while looping.
import pandas as pd
import os
file1= 'C:/Users/madhu/Desktop/Excel1.xlsx'
file2= 'C:/Users/madhu/Desktop/Book1.xlsx'
df1 = pd.read_excel(file1)
df2 = pd.read_excel(file2)
#df1.to_dict()
#df2.to_dict()
var=[]
print(df1)
print(df2)
for row,col in range(len(df1)):
    for row1,col in range(len(df2)):
        if row.isspace() == True:
            var.append(df1[row])
            return '_'.join(var)
        elif row == row1:
            var.append(df1[row])
            return '_'.join(var)
        else:
            pass
Can anyone please help me? Thanks.
I am assuming that for "user phone number" you need "us_pn" as a variable. I am also assuming that the code does not need to return any values.
import pandas as pd
file1 = 'C:/Users/madhu/Desktop/Excel1.xlsx'
file2 = 'C:/Users/madhu/Desktop/Book1.xlsx'
df1 = pd.read_excel(file1)
df2 = pd.read_excel(file2)
# Uncomment the following line if the Excel files contain NaN.
# df2.fillna('0', inplace=True)
print(df2)
variables = []
for row2 in df2.values:
    word_list = row2[0].split(' ')
    # This is to handle the special case of 'user phone number'
    # with output of 'us_pn'.
    # If the desired output is instead
    # 'us_ph_num', then this piece of code is not needed.
    if 'phone number' in row2[0]:
        word_list[word_list.index('phone')] = 'phone number'
        word_list[word_list.index('number')] = ''
    var_list = []
    for word in word_list:
        for row1 in df1.values:
            if word == row1[0]:
                var_list.append(row1[1])
    variables.append("_".join(var_list))
# Assign through the DataFrame: writing into df2.values may only change a copy.
df2['Variable Should be'] = variables
If there is anything wrong with my assumptions, then do let me know and I will fix the code accordingly.
If I understand correctly, you can create a dict out of df1 and map it onto the split items (S1 and S2) from df2. Refer to the code below:
df1 = pd.read_excel(file1)
df2 = pd.read_excel(file2)
Map = dict(zip(df1.Word, df1.Variable))
pat='('+'|'.join(Map.keys())+')'
df2['S1']= df2['Word'].str.extract(pat=pat,expand=False).fillna('')
df2['S2'] = df2.apply(lambda x: x.Word.replace(x['S1'],''), axis =1)
df2['S2'] = df2['S2'].apply(lambda x: x.strip())
cols = ['S1', 'S2']
for col in cols:
    df2[col] = df2[col].replace(Map)
df2['Variable Should be'] = df2['S1'] +'_'+ df2['S2']
df2.drop(columns = ['S1', 'S2'], inplace = True)
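The S1/S2 split above handles phrases made of exactly two known pieces. As a rough alternative for phrases with any number of pieces, the same dict can drive a single regular expression. This is only a sketch under a few assumptions: the two sheets are rebuilt in memory instead of being read from the Excel files, and `to_variable` is a made-up helper name.

```python
import re
import pandas as pd

# Hypothetical in-memory copies of the two sheets
df1 = pd.DataFrame({
    "Word": ["identifier", "user", "phone", "number", "phone number"],
    "Variable": ["id", "us", "ph", "num", "pn"],
})
df2 = pd.DataFrame({"Word": ["user identifier", "user phone number", "identifier number"]})

mapping = dict(zip(df1["Word"], df1["Variable"]))
# Try longer keys first so "phone number" wins over "phone" and "number"
keys = sorted(mapping, key=len, reverse=True)
pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")

def to_variable(phrase):
    """Join the abbreviations of every known word or phrase found in phrase."""
    return "_".join(mapping[m] for m in pattern.findall(phrase))

df2["Variable Should be"] = df2["Word"].apply(to_variable)
print(df2)
```

Words that are not in the mapping are silently skipped here, which may or may not be the behaviour you want.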
Let's say I have the following data of a match in a CSV file:
name,match1,match2,match3
Alice,2,4,3
Bob,2,3,4
Charlie,1,0,4
I'm writing a Python program. Somewhere in my program I have the scores collected for a match stored in a list, say x = [1,0,4]. Using pandas, I have found whether these scores exist in the data, and I can print "found" or "not found". However, I want my code to print out which name these scores correspond to. In this case the program should output "Charlie", since Charlie has all these values [1,0,4]. How can I do that?
I will have a large set of data so I must be able to tell which name corresponds to the numbers I pass to the program.
Yes, here's how to compare entire rows in a dataframe:
df[(df == x).all(axis=1)].index # where x is the pd.Series we're comparing to
Also, it makes life easiest if you directly set name as the index column when you read in the CSV.
import pandas as pd
from io import StringIO
df = """\
name,match1,match2,match3
Alice,2,4,3
Bob,2,3,4
Charlie,1,0,4"""
df = pd.read_csv(StringIO(df), index_col='name')
x = pd.Series({'match1':1, 'match2':0, 'match3':4})
Now you can see that doing df == x, or equivalently df.eq(x), is not quite what you want because it does element-wise compare and returns a row of True/False. So you need to aggregate those rows with .all(axis=1) which finds rows where all comparison results were True...
df.eq(x).all(axis=1)
df[ (df == x).all(axis=1) ]
#          match1  match2  match3
# name
# Charlie       1       0       4
...and then finally since you only want the name of such rows:
df[ (df == x).all(axis=1) ].index
# Index(['Charlie'], dtype='object', name='name')
df[ (df == x).all(axis=1) ].index.tolist()
# ['Charlie']
which is what you wanted. (I only added the spaces inside the expression for clarity).
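Since the question starts from a plain list rather than a Series, one way to bridge the two is to label the list with the DataFrame's columns first. A small sketch of that idea (the helper name `names_with_scores` is made up, and it assumes the list is in the same order as the match columns):

```python
import pandas as pd
from io import StringIO

csv_text = """name,match1,match2,match3
Alice,2,4,3
Bob,2,3,4
Charlie,1,0,4"""
df = pd.read_csv(StringIO(csv_text), index_col="name")

def names_with_scores(frame, scores):
    """Return the index labels of rows whose values equal scores, in column order."""
    target = pd.Series(scores, index=frame.columns)  # align list positions with columns
    return frame[frame.eq(target).all(axis=1)].index.tolist()

print(names_with_scores(df, [1, 0, 4]))
print(names_with_scores(df, [9, 9, 9]))
```

An empty list means no row matched, which maps onto the "not found" case in the question.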
You need to use DataFrame.loc which would work like this:
print(df.loc[(df.match1 == 1) & (df.match2 == 0) & (df.match3 == 4), 'name'])
Maybe try something like this:
import pandas as pd
import numpy as np
# Makes sample data
match1 = np.array([2,2,1])
match2 = np.array([4,4,0])
match3 = np.array([3,3,4])
name = np.array(['Alice','Bob','Charlie'])
df = pd.DataFrame({'name': name, 'match1': match1, 'match2': match2, 'match3': match3})
df
# example of the list you want to get the data from
x=[1,0,4]
#x=[2,4,3]
# should return the name Charlie as well as the index (based on the values in the list x)
df['name'].loc[(df['match1'] == x[0]) & (df['match2'] == x[1]) & (df['match3'] ==x[2])]
# Makes a new dataframe out of the above
mydf = pd.DataFrame(df['name'].loc[(df['match1'] == x[0]) & (df['match2'] == x[1]) & (df['match3'] ==x[2])])
# Loop that prints out the name based on the index of mydf
# (if there is more than one name, it prints all of them; if there is only one, it prints only that)
for i in range(0, len(mydf)):
    print(mydf['name'].iloc[i])
You can use this. Here data is your DataFrame (change the name to match your own), and the values [1, 0, 4] are assumed to be of int type:
data = data[(data['match1'] == 1) & (data['match2'] == 0) & (data['match3'] == 4)].index
# data[0] is the index label of the first matching row (the name, if name is the index)
print(data[0])
If the columns are of object (string) type, use this instead:
data = data[(data['match1'] == "1") & (data['match2'] == "0") & (data['match3'] == "4")].index
print(data[0])
Problem:
I need to write code that reads data from an online data set with pandas pd.read_html, then looks up a temperature in that table and displays that row of data for a few parameters.
The trouble is the final part: when the user input is out of range or not equal to one of the documented temperatures, the program should loop, showing a message such as "invalid input" and asking the user to retry.
What I have tried:
I tried running it through a while loop, try/except blocks, and if/elif statements, but I'm sure I did it wrong, because it almost always hangs Spyder, so I have to close it and try again.
Any recommendations or solutions would be super helpful, because I'm past the point where vague hints lead me to an answer; they just leave me more confused.
My code:
def get_t_data(t):
    t_table = pd.read_html('https://thermo.pressbooks.com/chapter/saturation-properties-temperature-table/', header=0)
    t_df = t_table[0]
    data_df = t_df.loc[t_df['Temp'] == t]
    df_result = data_df[['Pressure', 'Volume ()', 'Energy (kJ/kg)', 'Enthalpy (kJ/kg)', 'Entropy (kJ/kg.K)']]
    df_final = df_result.to_string(index=False)
    return df_final

user_t = input('Please enter the temp you will like to research: ')
print('\n')
data = get_t_data(user_t)
print('For temperature {}°C your outputs are \n'.format(user_t))
print(data)
Something like this:
import pandas as pd

def get_t_data(t):
    t_table = pd.read_html('https://thermo.pressbooks.com/chapter/saturation-properties-temperature-table/', header=0)
    t_df = t_table[0]
    t_df = t_df.iloc[1:, :]  # skip the additional header line
    temps = t_df['Temp'].astype(float)  # as float, since not every temperature is an integer (0.01 and 373.95)
    try:
        t = float(t)
    except ValueError:  # the input is not a number at all
        return {'exist': False, 'result': 'invalid input'}
    if t not in temps.tolist():  # check whether 't' is one of the documented temperatures
        return {'exist': False, 'result': 'no such temp'}
    data_df = t_df.loc[temps == t]  # compare as floats so '100' and '100.0' both match
    df_result = data_df[['Pressure', 'Volume ()', 'Energy (kJ/kg)', 'Enthalpy (kJ/kg)', 'Entropy (kJ/kg.K)']]
    df_final = df_result.to_string(index=False)
    return {'exist': True, 'result': df_final}

# data format for get_t_data response
data = {'exist': False, 'result': ''}
while not data['exist']:
    user_t = input('Please enter the temp you will like to research: ')
    print('\n')
    data = get_t_data(user_t)
    print('For temperature {}°C your outputs are \n'.format(user_t))
    print(data['result'])
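The retry logic can also be kept separate from the table download, which makes it easier to try out without hitting the website each time. The following is only a sketch: the `temps` set is a made-up subset of the table's temperatures, and `parse_temperature`, `ask_until_valid` and the `read` parameter are hypothetical names introduced here so that the prompt can be simulated.

```python
def parse_temperature(text, valid_temps):
    """Return text as a float if it is a documented temperature, else None."""
    try:
        value = float(text)
    except ValueError:          # not a number at all
        return None
    return value if value in valid_temps else None

def ask_until_valid(valid_temps, read=input):
    """Prompt repeatedly until the user enters a documented temperature."""
    while True:
        value = parse_temperature(read("Please enter the temp: "), valid_temps)
        if value is not None:
            return value
        print("Invalid input, please try again.")

# Hypothetical subset of the table's temperatures
temps = {0.01, 5.0, 10.0, 373.95}
# Simulated user typing "abc", then "7", then "10"
answers = iter(["abc", "7", "10"])
chosen = ask_until_valid(temps, read=lambda prompt: next(answers))
print(chosen)
```

In a real run you would call `ask_until_valid(temps)` with no `read` argument, so that it falls back to `input`.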
I'm a beginner. I want to add a loop so that each time the input is yes, 5 is added to the number of rows passed to df.head().
while True:
    check = input(' Do You Wish to continue yes/no')
    if check != 'yes':
        break
    else:
        print(df.head(5))
df.head(5) shows the first 5 rows of the dataframe.
Calling it in a loop won't add any rows; you need to use a variable for the row count.
I think you mean this program to work in the following manner:
import pandas as pd
df = pd.read_csv("train.csv")
i = 5
# df.shape[0] gives the number of rows
while i < df.shape[0]:
    check = input(' Do You Wish to continue yes/no: ')
    if check == 'yes':
        print(df.head(i))
        i += 5  # increment by 5
    else:
        # if input is not 'yes', end the loop
        break
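One thing to watch with a loop condition like `i < df.shape[0]` is that it stops before the last few rows are shown when the row count is not a multiple of 5. A possible variation, sketched under the assumption of a made-up 12-row frame in place of train.csv (`head_sizes`, `browse` and the `read` parameter are hypothetical names), computes the row counts up front and caps the last one:

```python
import pandas as pd

def head_sizes(n_rows, step=5):
    """Yield growing row counts: step, 2*step, ... capped at n_rows."""
    shown = 0
    while shown < n_rows:
        shown = min(shown + step, n_rows)
        yield shown

def browse(frame, read=input):
    """Show 5 more rows each round until the user declines or the rows run out."""
    for n in head_sizes(len(frame)):
        print(frame.head(n))
        if n == len(frame) or read("Do you wish to continue? yes/no: ") != "yes":
            break

# Hypothetical 12-row frame standing in for train.csv
df = pd.DataFrame({"value": range(12)})
sizes = list(head_sizes(len(df)))
print(sizes)
```

Unlike the loop above, `browse(df)` shows the first 5 rows before asking the first question.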
I'm trying to create a python script using pandas where it prompts the user for a value from column 'Name'(or column 0) and then prints the value in column 'Location'(or column 9).
So far I have the following but it prints all columns in the row. How can I make it print a specific column?
import pandas as pd
df = pd.read_csv("Servers.csv")
user_input = input("Enter server name: ")
for index, row in df.iterrows():
    if row[0] == user_input:
        print(row)
I would like it to print only the 9th column of the row, labeled 'Location', when I enter a value from the first column, labeled 'Name'.
Currently it prints all columns in the row when I enter a value from the first column, Name.
Don't use a loop here, construct a series and then query the series via at. This assumes you do not have duplicate names.
df = pd.read_csv("Servers.csv")
series_map = df.set_index('Name')['Location']
user_input = input("Enter server name: ")
print(series_map.at[user_input])
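Note that `.at` raises a KeyError when the name is not in the file. If you would rather get a fallback value, `Series.get` is one option. A small sketch, assuming a made-up two-row table in place of Servers.csv (the server names and the `find_location` helper are invented for illustration) and, as above, no duplicate names:

```python
import pandas as pd

# Hypothetical stand-in for Servers.csv
df = pd.DataFrame({
    "Name": ["web01", "db01"],
    "Location": ["Houston", "Hawaii"],
})
series_map = df.set_index("Name")["Location"]

def find_location(name):
    """Return the location for name, or None when the name is unknown."""
    return series_map.get(name)   # Series.get returns None instead of raising KeyError

print(find_location("web01"))
print(find_location("missing"))
```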
The problem with your loop method is that you don't index row. You can just use:
print(row['Location'])  # selects by label the column the question refers to as column 9
Adding the line print(df[df['Name'] == user_input].loc[:,'Location'].values[0]) should do the trick.
Here's a simple example with a dataframe containing 3 rows and columns:
d = {'Name': ['John', 'Laura', 'Sam'],
'Food': ['Sushi', 'Spaghetti', 'Sandwich'],
'Location': ['Houston', 'San Francisco', 'Hawaii']}
df = pd.DataFrame(data = d)
    Name       Food       Location
0   John      Sushi        Houston
1  Laura  Spaghetti  San Francisco
2    Sam   Sandwich         Hawaii
If user_input = 'John', here's how we print out his location:
print(df[df['Name'] == user_input].loc[:,'Location'].values[0])
Which will output the string Houston.
This approach avoids loops and should be faster than using .iterrows().