Creating manager levels in Python - python

I am trying to create levels of managers within a dataset I have. It looks similar to this:
EID ReportingManager ManagerLevel1 ManagerLevel2 ManagerLevel3
123 201 101 201 301
124 101 101 204 306
125 401 101 206 304
The "EID" is the employee ID, "ReportingManager" is the ID of the person they report to, and the manager levels run from 1 (the highest-level manager) to 3 (a lower-level manager). What I want is to create another column that ranks the level of the reporting manager's ID.
Something like this:
EID ReportingManager ManagerLevel1 ManagerLevel2 ManagerLevel3 ReportingManagerLevel
123 201 101 201 301 1
124 101 101 204 306 0
125 401 101 206 304 3
The idea is to see how far the reporting manager is from the top level. If the reporting manager is the top manager, the value is 0, and everyone who reports to him would be a 1. If the EID reports to the level 2 manager, that manager is 1 away from the top manager, and all of their EIDs would then be 2 away from the top. So far I have just been working on getting the managers' levels figured out, but I run into the issue that all managers end up with a manager level of 3.
My code looks like this:
manager_level = []
num = list(df['ID'])
for num in df['ReportingManager']:
    if num is df['ManagerLevel1']:
        manager_level.append('0')
    elif num is df['ManagerLevel2']:
        manager_level.append('1')
    elif num is df['ManagerLevel3']:
        manager_level.append('2')
    else:
        manager_level.append('3')
df['Manager_Level'] = manager_level
Note: df['postitonNum'] contains the IDs of all the managers and employees.

I reproduced your df with this:
import pandas as pd
data = {
    "EID": [123, 124, 125],
    "ReportingManager": [201, 101, 401],
    "ManagerLevel1": [101, 101, 101],
    "ManagerLevel2": [201, 204, 206],
    "ManagerLevel3": [301, 306, 304],
}
df = pd.DataFrame(data=data)
I suggest leveraging the ID numbers themselves: 101 = 0, 201 = 1 and so on. Assuming you are using pandas (going by the df variable and the dataframe tag), you can use the apply method like this:
import math
df["ReportingManagerLevel"] = df["ReportingManager"].apply(lambda x: math.floor(x/100)) -1
This takes each Reporting Manager ID, keeps its hundreds digit, and subtracts 1. Note that a manager with the ID 502 would get the value 4. If this is something you would like to avoid, you could always use the modulo operator.
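The same idea can be written without apply. This is only a sketch, assuming three-digit IDs whose first digit encodes the level, as in the sample data. The cap at 3 via clip is my own assumption for handling IDs such as 502; it is not the modulo approach mentioned above.

```python
import pandas as pd

df = pd.DataFrame({"EID": [123, 124, 125],
                   "ReportingManager": [201, 101, 401]})

# Hundreds digit of the ID minus one: 101 -> 0, 201 -> 1, 401 -> 3
level = df["ReportingManager"] // 100 - 1

# Assumption: anything deeper than level 3 is reported as 3
df["ReportingManagerLevel"] = level.clip(upper=3)
print(df)
```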

Instead of is, you need to use the equality operator == to compare the values in the columns.
You can try this code:
manager_level = []
for i, row in df.iterrows():
    if row['ReportingManager'] == row['ManagerLevel1']:
        manager_level.append(0)
    elif row['ReportingManager'] == row['ManagerLevel2']:
        manager_level.append(1)
    elif row['ReportingManager'] == row['ManagerLevel3']:
        manager_level.append(2)
    else:
        manager_level.append(3)
df['ReportingManagerLevel'] = manager_level
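If the frame is large, the same comparisons can be done column-wise instead of row by row. A minimal sketch using numpy.select, assuming the column names from the sample data and that unmatched managers should get 3:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "EID": [123, 124, 125],
    "ReportingManager": [201, 101, 401],
    "ManagerLevel1": [101, 101, 101],
    "ManagerLevel2": [201, 204, 206],
    "ManagerLevel3": [301, 306, 304],
})

# One boolean Series per level; the first condition that is True wins
conditions = [
    df["ReportingManager"] == df["ManagerLevel1"],
    df["ReportingManager"] == df["ManagerLevel2"],
    df["ReportingManager"] == df["ManagerLevel3"],
]
df["ReportingManagerLevel"] = np.select(conditions, [0, 1, 2], default=3)
print(df)
```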

Related

Trying to Create a Manager Hierarchy

I am trying to create manager levels and I am stuck on the proper approach. I am using a csv file and have imported pandas and numpy. I want to take "Manager 1" as the start and then show how many levels away the rest of the managers are from them. Below is an example of what I mean.
Employee_ID Manager_1 Manager_2 Reporting_Manager
101 111 112 112
102 111 102 111
103 111 118 300
So the goal is to test the reporting manager, and if the reporting manager is not on the list then they fall to the lowest manager level (manager level 3). Something like this:
Employee_ID Manager_1 Manager_2 Reporting_Manager Level_of_Reporting_MGR
101 111 112 112 2
102 111 102 111 1
103 111 118 300 3
I have tried using a for loop and iterating through the reporting managers but I am not sure if that is the right approach or not. I am new to coding so this may be simple but I am not sure.
Current code looks like this:
Level_of_Reporting_MGR = []
for num in df['Manager_']:
    if num in df['Manager_1']:
        Level_of_Reporting_MGR.append(1)
    elif num in df['Manager_2']:
        Level_of_Reporting_MGR.append(2)
    else:
        Level_of_Reporting_MGR.append(3)
df['Level_of_Reporting_MGR'] = Level_of_Reporting_MGR
Not had a chance to try this out properly, but here's an outline of how I might approach the job.
def manager_score(series):
    sweep_list = ["Manager_1", "Manager_2"]
    for e, m in enumerate(sweep_list):
        if series['Reporting_Manager'] == series[m]:
            return e + 1
    # No match: one level below the last listed manager (3 here)
    return len(sweep_list) + 1
df['distance'] = df.apply(manager_score, axis=1)
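For reference, here is a self-contained run of this idea on the sample table from the question. It is a sketch under two assumptions: the function name reporting_level is made up for the illustration, and an unmatched reporting manager gets the level after the last listed column (3 here), as the question asks.

```python
import pandas as pd

df = pd.DataFrame({
    "Employee_ID": [101, 102, 103],
    "Manager_1": [111, 111, 111],
    "Manager_2": [112, 102, 118],
    "Reporting_Manager": [112, 111, 300],
})

LEVEL_COLUMNS = ["Manager_1", "Manager_2"]

def reporting_level(row):
    # Levels are 1-based: Manager_1 -> 1, Manager_2 -> 2
    for level, column in enumerate(LEVEL_COLUMNS, start=1):
        if row["Reporting_Manager"] == row[column]:
            return level
    # Not found in any manager column: lowest level
    return len(LEVEL_COLUMNS) + 1

df["Level_of_Reporting_MGR"] = df.apply(reporting_level, axis=1)
print(df)
```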

Find out emps reporting to same level hierarchy or lower-level hierarchy or superior-level hierarchy - python

I have a table with emp_id, emp_desg, and mgr_id. I'm trying to find and print the employees who report to a lower-level, same-level, or superior-level hierarchy.
I have one mapping for the hierarchy levels and a second mapping for opposing-role reporting. If a case in the 2nd mapping matches a row in the table, that row should be printed.
1st MAPPING (Hierarchy Levels)
2nd MAPPING (Opposing role) - These records need to be printed.
I need to iterate through each employee and their managers. If the levels of emp and mgr matches with the 2nd mapping, I need to print it. Please help me to solve this, thanks in advance.
emp_id emp_desg mgr_id
111 ASM 112
112 ASM 116
113 BSM 114
114 CSM 115
115 ASM 116
116 DSM 117
Expected output:
df['emp_role'] = df['emp_desg'].map(hrchy_levels)
df['mgr_role'] = df['mgr_desg'].map(hrchy_levels)
Is there a way to compare 'emp_role' and 'mgr_role' with ranks_subords and just print the emp_id and mgr_id? I do not want to change anything in df, so after printing I'll remove the added columns emp_role and mgr_role. Thanks!
We start with defining the needed mappings for hierarchy and subordination.
hrchy_levels = {'ASM':'L1', 'BSM':'L2', 'CSM':'L3', 'DSM':'L4'}
ranks_subords = [('L1' , 'L1'),('L1' , 'L4'),('L2' , 'L1'),('L2' , 'L2'),('L3' , 'L3'),('L3' , 'L1'),('L3' , 'L2'),('L4' , 'L1'),('L4' , 'L2'),('L4' , 'L3')]
Then map each manager ID to that manager's designation:
df['mgr_desg'] = df['mgr_id'].map(dict(df[['emp_id', 'emp_desg']].values))
Replace the designations with their levels in another df, then filter by the rank pairs:
df2 = df.replace({'emp_desg': hrchy_levels, 'mgr_desg': hrchy_levels})
df2[df2.apply(lambda x: (x['emp_desg'], x['mgr_desg']) in ranks_subords, axis=1)]
emp_id emp_desg mgr_id mgr_desg
0 111 L1 112 L1
1 112 L1 116 L4
3 114 L3 115 L1
4 115 L1 116 L4
Now, it's easy to iterate over the rows and print a formatted output.
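A minimal sketch of that last step, assuming the sample table and mappings from this thread. It works on temporary Series instead of new columns, so df itself is left unchanged; the output wording is just an illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "emp_id": [111, 112, 113, 114, 115, 116],
    "emp_desg": ["ASM", "ASM", "BSM", "CSM", "ASM", "DSM"],
    "mgr_id": [112, 116, 114, 115, 116, 117],
})
hrchy_levels = {"ASM": "L1", "BSM": "L2", "CSM": "L3", "DSM": "L4"}
ranks_subords = {("L1", "L1"), ("L1", "L4"), ("L2", "L1"), ("L2", "L2"),
                 ("L3", "L3"), ("L3", "L1"), ("L3", "L2"), ("L4", "L1"),
                 ("L4", "L2"), ("L4", "L3")}

# Look up each manager's designation through the employee table
desg_by_id = dict(zip(df["emp_id"], df["emp_desg"]))
emp_level = df["emp_desg"].map(hrchy_levels)
mgr_level = df["mgr_id"].map(desg_by_id).map(hrchy_levels)  # NaN if unknown

lines = []
for emp_id, mgr_id, pair in zip(df["emp_id"], df["mgr_id"],
                                zip(emp_level, mgr_level)):
    if pair in ranks_subords:
        lines.append(f"emp_id {emp_id} reports to mgr_id {mgr_id}")
print("\n".join(lines))
```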

Pandas - getting an index of a row in a pandas apply function

Here is my pandas code:
def calcObj(row):
    d = dict(calc1 = iferror(row.Hours1.sum(), row.Hours2.sum(), 0, '+'))
    if row.Process == 'A': # this doesn't work
        d['ProcessKey'] = 700
    else:
        d['ProcessKey'] = 500
    return pd.Series(d)
df.groupby(['MainProcess']).apply(calcObj)
I am trying to check if a process name is A and if it is return a different value.
Unfortunately it doesn't work and i get the following error:
AttributeError: 'DataFrame' object has no attribute 'Process '
I assume it's because I am grouping only by MainProcess, not by Process.
Is there any way to get access to this item within the apply function? Any other workaround would also be very helpful.
Here is my example dataframe. Bg, MainProcess, CoreProcess and Process are index levels; Hours1 and Hours2 are columns:
Bg         MainProcess    CoreProcess    Process      Hours1  Hours2
Building1  MainProcess-1  CoreProcess-1  S-Process-1  150     250
                                         S-Process-2  150     250
                          CoreProcess-2  S-Process-3  150     250
                                         S-Process-1  150     250
                                         S-Process-2  150     250
Building2  MainProcess-2  CoreProcess-3  S-Process-1  150     250
                                         S-Process-2  150     250
           MainProcess-3  CoreProcess-4  S-Process-1  150     250
                                         S-Process-2  150     250
                                         S-Process-3  150     250
Beware, the columns in the index are not columns of the DataFrame!
In your example, as Process is in the (multi-)index, df['Process'] will raise a KeyError independently of the groupby. You must move that level out of the index before you can use it as a column. For example, you could reset it before the groupby:
df.reset_index(level='Process').groupby(['MainProcess']).apply(calcObj)
But beware: calcObj will not receive rows here, but the sub-DataFrames that share the same value of MainProcess...
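To make that concrete, here is a small sketch on toy data. The frame, the process names 'A' and 'B', and the function name calc_group are all assumptions made for the illustration. Because the function receives a whole sub-DataFrame, the test on Process has to be reduced to a single value, for example with any():

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("MainProcess-1", "A"), ("MainProcess-1", "B"), ("MainProcess-2", "B")],
    names=["MainProcess", "Process"],
)
df = pd.DataFrame({"Hours1": [150, 150, 150],
                   "Hours2": [250, 250, 250]}, index=index)

def calc_group(group):
    # group is a DataFrame holding every row of one MainProcess
    has_a = (group["Process"] == "A").any()
    return pd.Series({
        "calc1": group["Hours1"].sum() + group["Hours2"].sum(),
        "ProcessKey": 700 if has_a else 500,
    })

# Move Process from the index into a regular column, then group
flat = df.reset_index(level="Process")
result = flat.groupby(level="MainProcess").apply(calc_group)
print(result)
```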

Create categories based on Partial Values Python

Hi I have a data frame as below:
response ticket
so service reset performed 123
reboot done 343
restart performed 223
no value 444
ticket created 765
Im trying something like this:
import pandas as pd
df = pd.read_excel (r'C:\Users\Downloads\response.xlsx')
print (df)
count_other = 0
othersvocab = ['Service reset' , 'Reboot' , 'restart']
if df.response = othersvocab
{
count_other = count_other + 1
}
What I'm trying to do is get the count of how many have either of 'othersvocab' and how many don't.
I'm really new to Python, and I'm not sure how to do this.
Expected Output:
other ticketed
3 2
Can you help me figure it out, hopefully with what's happening in your code?
I am doing this on my lunch break. I don't like the for other in others loop I have, and there are better ways to do this with pandas DataFrame methods, but it will have to do.
import pandas as pd
df = pd.DataFrame({"response": ["so service reset performed", "reboot done",
                                "restart performed"],
                   "ticket": [123, 343, 223]})
others = ['service reset', 'reboot', 'restart']
count_other = 0
for row in df["response"].values:
    for other in others:
        if other in row:
            count_other += 1
            break  # count each response at most once
First, if you want to do it the way I have, you will have to lowercase both the response column and the others list. That's not very hard (look up pandas apply and the string method .lower).
What I have done here is loop first over the values in the response column.
Then, within this loop, I loop over the items in the others list.
Finally, I check whether any of them appears in the response text.
I hope my rushed response helps.
Consider below df:
In [744]: df = pd.DataFrame({'response':['so service reset performed', 'reboot done', 'restart performed', 'no value', 'ticket created'], 'ticket':[123, 343, 223, 444, 765]})
In [745]: df
Out[745]:
response ticket
0 so service reset performed 123
1 reboot done 343
2 restart performed 223
3 no value 444
4 ticket created 765
Below is your othersvocab:
In [727]: othersvocab = ['Service reset' , 'Reboot' , 'restart']
# Converting all elements to lowercase
In [729]: othersvocab = [i.lower() for i in othersvocab]
Use Series.str.contains:
# Converting response column to lowercase
In [733]: df.response = df.response.str.lower()
In [740]: count_in_vocab = len(df[df.response.str.contains('|'.join(othersvocab))])
In [742]: count_others = len(df) - count_in_vocab
In [752]: res = pd.DataFrame({'other': [count_in_vocab], 'ticketed': [count_others]})
In [753]: res
Out[753]:
other ticketed
0 3 2

Python Pandas: Unstack nested Excel Pivot report in compact form to get machine readable data

Hey guys I'm trying to turn this legacy data from an excel file into a machine readable form for further use.
Data looks like this (This is the result of an excel pivot in compact form, which was copy pasted and the original data source is lost):
Hierarchical nested data. (csv at the end of this text).
The expected result should be like this:
Machine readable form.
The length of the number before the category name indicates the hierarchy level of the category. E.g. 1cat is the sum of 11cat, 12cat, 13cat and so on.
What I tried to do was to shift the cells depending on the length of the cat number, create a MultiIndex, and then transpose, melt or unpivot. But I am really struggling with that one and I hope someone can help me.
I will share some code snippets but they are incomplete and are mere attempts:
df = pd.read_excel(...)
mycolumns = df.columns.tolist()
df["cat"] = df[mycolumns[0]].str.split(' ', n=1).str[0]
df["catlen"] = [len(nr) for nr in df[("cat",)] ]
df["fourdigit"] = df[df[("catlen",)] == 4]
digitsmap = DataFrame(df[mycolumns[0]].str.split(' ', n=1).str[0])
digitsmap[index_name] = df[mycolumns[0]].str.split(' ', n=1).str[-1]
mycolumns = digitsmap.columns.tolist()
digitsmap["catlen"] = [len(nr) for nr in digitsmap[mycolumns[0]] ]
onedigit = digitsmap[digitsmap[("catlen",)] == 1]
twodigit = digitsmap[digitsmap[("catlen",)] == 2]
threedigit = digitsmap[digitsmap[("catlen",)] == 3]
fourdigit = digitsmap[digitsmap[("catlen",)] == 4]
CSV
Category;Value
1 Main Category;25,27
11 Subcategory;10,16
111 Subcategory a;4,34
112 Subcategory b;1,90
113 Subcategory c;
119 Subcategory i;3,92
12 Subcategory2;15,11
121 Subcategory2 a;9,84
1211 Subcategory2 aa;8,24
1212 Subcategory2 ab;0,11
1219 Subcategory2 ai;1,49
122 Subcategory2 b;5,27
Thanks guys!
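The rule described above (the length of the leading number gives the level) can be sketched as follows. This is only an illustration on part of the CSV: the column names code, name, level and parent are made up, and since the expected layout is not shown here, the sketch stops at one row per category with its level and parent code.

```python
import io
import pandas as pd

csv_text = """Category;Value
1 Main Category;25,27
11 Subcategory;10,16
111 Subcategory a;4,34
113 Subcategory c;
12 Subcategory2;15,11
121 Subcategory2 a;9,84
1211 Subcategory2 aa;8,24
"""

# Semicolon-separated, with a decimal comma in the Value column
df = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",")

# Split "1211 Subcategory2 aa" into the numeric code and the name
parts = df["Category"].str.split(" ", n=1, expand=True)
df["code"] = parts[0]
df["name"] = parts[1]

# Level is the number of digits; the parent is the code minus its last digit
df["level"] = df["code"].str.len()
df["parent"] = df["code"].str[:-1]
print(df[["code", "name", "level", "parent", "Value"]])
```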
