OK, my frustration has hit epic proportions. I am new to pandas and trying to use it on an Excel database I have, but I cannot figure out what should be a VERY simple action.
I have a dataframe as such:
ID UID NAME STATE
1 123 Bob NY
1 123 Bob PA
2 124 Jim NY
2 124 Jim PA
3 125 Sue NY
All I need is to locate and print the ID of a record by the unique combination of UID and STATE.
The closest I can come up with is this:
temp_db = fd_db.loc[(fd_db['UID'] == "1") & (fd_db['STATE'] == "NY")]
but this still grabs every row for that UID, not ONLY the one with the matching STATE.
Then, when I try to print the result
temp_db.ID.values
prints this:
['1', '1']
I need just the data and not the structure.
My end result needs to be just to print to the screen : 1
Any help is much appreciated.
I think it's because your UID condition is wrong: the UID column is an integer and you are comparing it to a string.
For example, when I run this:
df.loc[(df['UID'] == "123") & (df['STATE'] == 'NY')]
The output is:
Empty DataFrame
Columns: [ID, UID, NAME, STATE]
Index: []
But when I treat UID as an integer:
df.loc[(df['UID'] == 123) & (df['STATE'] == 'NY')]
The output is:
ID UID NAME STATE
0 1 123 Bob NY
I hope that helps!
fd_db.loc[(fd_db['UID'] == 123) & (fd_db['STATE'] == 'NY')]['ID'].iloc[0]
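The dtype mismatch described above can be sketched as follows. The frame is rebuilt by hand from the question's sample and assumes UID and ID are stored as integers; if your Excel import produced strings instead, the comparison value has to be a string too. `Series.item()` is used here only because the UID/STATE combination is stated to be unique; it raises if there is not exactly one match.

```python
import pandas as pd

# Hand-built copy of the sample table (integer UID/ID is an assumption).
fd_db = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3],
    "UID": [123, 123, 124, 124, 125],
    "NAME": ["Bob", "Bob", "Jim", "Jim", "Sue"],
    "STATE": ["NY", "PA", "NY", "PA", "NY"],
})

# Check the column type first so the comparison value can match it.
print(fd_db["UID"].dtype)

# Comparing an integer column with a string matches nothing.
wrong = fd_db.loc[(fd_db["UID"] == "123") & (fd_db["STATE"] == "NY")]

# Comparing like with like selects exactly one row.
matches = fd_db.loc[(fd_db["UID"] == 123) & (fd_db["STATE"] == "NY"), "ID"]

# item() unwraps a one-element Series into a plain scalar.
record_id = matches.item()
print(record_id)
```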
Related
Below is a sample dataframe from a much larger set of data. I need to create a new column 'Is a Manager?' that contains the boolean results True or False. The condition: is the 'Employee ID' listed anywhere within the 'Manager ID' column of the dataset?
df = pd.DataFrame({'Worker': ['Sam','Tom','Justin','Jake'], 'Employee ID':[12345,12121,67891,99991], 'Manager ID': [97483, 29601,85863, 19739]})
df
Worker Employee ID Manager ID
0 Sam 12345 97483
1 Tom 12121 29601
2 Justin 67891 85863
3 Jake 99991 19739
and so on....
I have tried to use the .isin function.
The column was added successfully, but all values are False, when I know some should be True.
For example, Sam's Employee ID 12345 is listed on line 245 as person X's manager ('Manager ID' = 12345).
Any idea where I've gone wrong? My code is:
df3 = df.loc[:, ['Worker', 'Employee ID', 'Manager ID']]
df3.insert(1, 'Is a Manager?', df3['Employee ID'].isin(['Manager ID']))
df3
Worker Is a Manager? Employee ID Manager ID
0 A False 221113 1210236
1 B False 221359 86082653
2 C False 295142 1718020
3 D False 775199 1910236
The problem is in this line:
df3.insert(1, 'Is a Manager?', df3['Employee ID'].isin(['Manager ID']))
You are checking whether the Employee ID is in a list containing the string "Manager ID".
The line should be:
df3.insert(1, 'Is a Manager?', df3['Employee ID'].isin(df3['Manager ID']))
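A small sketch of the difference, using the question's workers but with hypothetical Manager ID values chosen so that Sam (12345) actually appears as someone's manager:

```python
import pandas as pd

# Manager IDs here are made up for illustration: Sam manages Tom and Jake.
df = pd.DataFrame({
    "Worker": ["Sam", "Tom", "Justin", "Jake"],
    "Employee ID": [12345, 12121, 67891, 99991],
    "Manager ID": [97483, 12345, 85863, 12345],
})

# Buggy version: the list holds the literal string "Manager ID",
# so no numeric Employee ID can ever be found in it.
wrong = df["Employee ID"].isin(["Manager ID"])

# Fixed version: pass the column itself, so each Employee ID is
# looked up among the actual Manager ID values.
right = df["Employee ID"].isin(df["Manager ID"])

df.insert(1, "Is a Manager?", right)
print(df)
```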
I'm just wondering how one might overcome the below error.
AttributeError: 'list' object has no attribute 'str'
What I am trying to do is create a new column "PrivilegedAccess" and in this column I want to write "True" if any of the names in the first_names column match the ones outlined in the "Search_for_These_values" list and "False" if they don't
Code
## Create list of Privileged accounts
Search_for_These_values = ['Privileged','Diagnostics','SYS','service account'] #creating list
pattern = '|'.join(Search_for_These_values) # joining list for comparision
PrivilegedAccounts_DF['PrivilegedAccess'] = PrivilegedAccounts_DF.columns=[['first_name']].str.contains(pattern)
PrivilegedAccounts_DF['PrivilegedAccess'] = PrivilegedAccounts_DF['PrivilegedAccess'].map({True: 'True', False: 'False'})
SAMPLE DATA:
uid last_name first_name language role email_address department
0 121 Chad Diagnostics English Team Lead Michael.chad#gmail.com Data Scientist
1 253 Montegu Paulo Spanish CIO Paulo.Montegu#gmail.com Marketing
2 545 Mitchel Susan English Team Lead Susan.Mitchel#gmail.com Data Scientist
3 555 Vuvko Matia Polish Marketing Lead Matia.Vuvko#gmail.com Marketing
4 568 Sisk Ivan English Supply Chain Lead Ivan.Sisk#gmail.com Supply Chain
5 475 Andrea Patrice Spanish Sales Graduate Patrice.Andrea#gmail.com Sales
6 365 Akkinapalli Cherifa French Supply Chain Assistance Cherifa.Akkinapalli#gmail.com Supply Chain
Note that the dtype of the first_name column is "object" and the dataframe is multi index (not sure how to change from multi index)
Many thanks
It seems you need to select a single column for str.contains, and then use map or astype to convert the booleans to strings:
Search_for_These_values = ['Privileged','Diagnostics','SYS','service account'] #creating list
pattern = '|'.join(Search_for_These_values)
PrivilegedAccounts_DF = pd.DataFrame({'first_name':['Privileged 111',
'aaa SYS',
'sss']})
print (PrivilegedAccounts_DF.columns)
Index(['first_name'], dtype='object')
print (PrivilegedAccounts_DF.loc[0, 'first_name'])
Privileged 111
print (type(PrivilegedAccounts_DF.loc[0, 'first_name']))
<class 'str'>
PrivilegedAccounts_DF['PrivilegedAccess'] = PrivilegedAccounts_DF['first_name'].str.contains(pattern).astype(str)
print (PrivilegedAccounts_DF)
first_name PrivilegedAccess
0 Privileged 111 True
1 aaa SYS True
2 sss False
EDIT:
The problem is a one-level MultiIndex in the columns. To reproduce it:
PrivilegedAccounts_DF = pd.DataFrame({'first_name':['Privileged 111',
'aaa SYS',
'sss']})
#simulate problem
PrivilegedAccounts_DF.columns = [PrivilegedAccounts_DF.columns.tolist()]
print (PrivilegedAccounts_DF)
first_name
0 Privileged 111
1 aaa SYS
2 sss
#check columns
print (PrivilegedAccounts_DF.columns)
MultiIndex([('first_name',)],
)
The solution is to join the values of each column tuple, e.g. with an empty string:
PrivilegedAccounts_DF.columns = PrivilegedAccounts_DF.columns.map(''.join)
Now the column names are correct:
print (PrivilegedAccounts_DF.columns)
Index(['first_name'], dtype='object')
PrivilegedAccounts_DF['PrivilegedAccess'] = PrivilegedAccounts_DF['first_name'].str.contains(pattern).astype(str)
print (PrivilegedAccounts_DF)
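There might be a more elegant solution, but this should work (without using patterns). Note that isin matches whole values only, so it flags a first_name only when it equals one of the listed values exactly: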
PrivilegedAccounts_DF.loc[PrivilegedAccounts_DF['first_name'].isin(Search_for_These_values), "PrivilegedAccess"]=True
PrivilegedAccounts_DF.loc[~PrivilegedAccounts_DF['first_name'].isin(Search_for_These_values), "PrivilegedAccess"]=False
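The two answers above do not behave the same way, which a short comparison may make clearer. This is a sketch on a hypothetical Series of names; `re.escape` and `na=False` are additions not present in the original answers, included so that special characters in the search terms and missing names do not cause surprises.

```python
import re

import pandas as pd

search_values = ["Privileged", "Diagnostics", "SYS", "service account"]

# Hypothetical names: two contain a search term, one equals a term exactly.
names = pd.Series(["Privileged 111", "aaa SYS", "sss", "Diagnostics"])

# isin: True only when the whole value equals one of the terms.
exact = names.isin(search_values)

# str.contains: True when any term occurs anywhere inside the value.
pattern = "|".join(re.escape(value) for value in search_values)
partial = names.str.contains(pattern, na=False)

print(pd.DataFrame({"name": names, "exact": exact, "partial": partial}))
```

If the goal is to flag names that merely contain one of the terms, the str.contains form is the one that fits; isin suits a list of complete account names.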
I have a data-frame (df)
which looks like:
first_name surname location identifier
0 Fred Smith London FredSmith
1 Jane Jones Bristol JaneJones
I am trying to query a particular field and return it to a variable value using:
value = df.loc[df['identifier'] == query_identifier ,'location']
so when query_identifier is FredSmith, value is:
0 London
How can I remove the 0 so I just have:
London
Try this statement:
value = df.loc[df['identifier'] == "FredSmith" ,'location'].values[0]
This returns the first matching value as a plain scalar.
If the same identifier can match several rows, keep the whole array and loop over it:
value = df.loc[df['identifier'] == "FredSmith" ,'location'].values
for df_values in value:
    print(df_values)
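One thing `.values[0]` does not cover is an identifier with no match at all, where it raises an IndexError. A minimal sketch of a guarded lookup follows; `lookup_location` is a hypothetical helper name, and returning None for a miss is an assumption about what the caller wants.

```python
import pandas as pd

df = pd.DataFrame({
    "first_name": ["Fred", "Jane"],
    "surname": ["Smith", "Jones"],
    "location": ["London", "Bristol"],
    "identifier": ["FredSmith", "JaneJones"],
})


def lookup_location(frame, query_identifier):
    """Return the first matching location, or None when nothing matches."""
    matches = frame.loc[frame["identifier"] == query_identifier, "location"]
    if matches.empty:
        return None
    # iloc[0] is positional, so it works whatever the row label is.
    return matches.iloc[0]


print(lookup_location(df, "FredSmith"))
print(lookup_location(df, "Nobody"))
```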
Is there a way to check with pandas, row by row, whether a value already exists?
This is how my Data Frame looks:
Log ID User ID Name Phone Number
1 001 Jack 123456789
2 002 Jackie 123456780
3 003 Jacky 123456700
4 004 Ben 123456000
The data I want to check is (Jacky, 123456700): is it in the Data Frame or not?
If it exists, I ignore it; otherwise I insert it into the Data Frame.
Compare Name and Phone Number, combine the two conditions with logical AND (&), and reduce the resulting boolean Series with pandas.Series.any:
>>> name, phno = ('Jacky', 123456700)
>>> ((df['Name'] == name) & (df['Phone Number'] == phno)).any()
True
If the result is True you can ignore the data; if not, insert it into the df.
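Putting the check and the insert together might look like the sketch below. `add_if_missing` is a hypothetical helper, the rule for the next Log ID (current maximum plus one) is an assumption, and the new row (Amy) is made-up data. `pd.concat` is used for the insert because it returns a new frame rather than modifying the original.

```python
import pandas as pd

df = pd.DataFrame({
    "Log ID": [1, 2, 3, 4],
    "User ID": ["001", "002", "003", "004"],
    "Name": ["Jack", "Jackie", "Jacky", "Ben"],
    "Phone Number": [123456789, 123456780, 123456700, 123456000],
})


def add_if_missing(frame, user_id, name, phone):
    """Return frame unchanged if (name, phone) exists, else with a new row."""
    exists = ((frame["Name"] == name) & (frame["Phone Number"] == phone)).any()
    if exists:
        return frame
    new_row = pd.DataFrame([{
        "Log ID": frame["Log ID"].max() + 1,  # assumed numbering scheme
        "User ID": user_id,
        "Name": name,
        "Phone Number": phone,
    }])
    return pd.concat([frame, new_row], ignore_index=True)


same = add_if_missing(df, "003", "Jacky", 123456700)   # already present
grown = add_if_missing(df, "005", "Amy", 123450000)    # hypothetical new entry
print(grown)
```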
I'm trying to write a script that will check if manager numbers match with employee number. It will continue down the column until all numbers are checked.
When finished it will print a list of how many matched or didn't match.
[In:]
import pandas as pd
#reading in csv file to Data Frame
employeeData = pd.read_csv("C:/Users/Desktop/EmployeeList.csv")
#creating a Data Frame
dataF = pd.DataFrame(employeeData)
#empty list where instances of T/F will be stored
booleans = []
#256 manager numbers + 1896 empty rows
managers = pd.Series(employeeData['Manager ID Number'])
#Edit: forgot to include this line
condition = managers.equals(merge['Employee ID'])
#check each row of employee data. 2153 rows of Employee Numbers
for index, row in employeeData.iterrows():
    #Check every single Manager number for a match
    for index, row in managers.iteritems():
        if condition:
            booleans.append(True)
            print("Something matched!")
        else:
            print("Didn't match!")
            booleans.append(False)
#A length of all booleans is printed.
print(len(booleans))
[Out:] Actual
"Didn't match!" x 2153 times. (number of employees in list)
[Out:] Desired:
"Something matched!"
"Didn't match!"
"Something matched!"
"Something matched!"
"Didn't match!"
"Something matched!".... to line 2153
My problem is that the index count doesn't seem to move down. It only outputs that the first number didn't match, hundreds of times. I want to move the row position down so that all the employee numbers are checked against the manager list. Some managers have more employees than others, so I have to check every single one (256)! I'm embarrassed to say I've been stuck on this problem for quite a while. I'm new to Python, so any tips would be greatly appreciated.
IIUC you need to use pandas merge():
df_emp_mng = pd.merge(df_Emp, df_Mang, left_on='EMP ID', right_on='Manager ID')
print(df_emp_mng)
print('Number of managers in Employee', len(df_emp_mng))
print('Number of managers not in Employee', len(df_Emp) - len(df_emp_mng))
Input - Employee Data
EMP ID name MID
0 123 E3 1
1 124 E1 1
2 125 E2 2
3 4 X4 5
Input - Manager Data
Manager ID Manager name Dep
0 1 X1 C
1 2 X2 D
2 3 X3 E
3 4 X4 F
4 5 X5 F
Output
EMP ID name MID Manager ID Manager name Dep
0 4 X4 5 4 X4 F
Number of managers in Employee 1
Number of managers not in Employee 3
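As a variation on the merge above, a left merge with `indicator=True` keeps every employee row and labels each one as matched or not, which gives both counts in one step. This is a sketch on the answer's sample tables; it assumes Manager ID values are unique, since duplicates on the right side would repeat employee rows and inflate the counts.

```python
import pandas as pd

df_Emp = pd.DataFrame({
    "EMP ID": [123, 124, 125, 4],
    "name": ["E3", "E1", "E2", "X4"],
    "MID": [1, 1, 2, 5],
})
df_Mang = pd.DataFrame({
    "Manager ID": [1, 2, 3, 4, 5],
    "Manager name": ["X1", "X2", "X3", "X4", "X5"],
    "Dep": ["C", "D", "E", "F", "F"],
})

# how="left" keeps all employees; the _merge column records whether
# each EMP ID was found among the Manager ID values.
merged = df_Emp.merge(
    df_Mang, how="left", left_on="EMP ID", right_on="Manager ID", indicator=True
)

matched = int((merged["_merge"] == "both").sum())
unmatched = int((merged["_merge"] == "left_only").sum())

print("Number of managers in Employee", matched)
print("Number of managers not in Employee", unmatched)
```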