I am trying to create manager levels and I am getting stuck on the proper approach. I am using a csv file and have imported pandas and numpy. I want to take "Manager_1" as the start and then show how many levels away the rest of the managers are from them. Below is an example of what I mean.
Employee_ID Manager_1 Manager_2 Reporting_Managers
101 111 112 112
102 111 102 111
103 111 118 300
So the goal is to have the Reporting Manager be the tested one, and if the reporting manager is not in the list then they fall to the lowest manager level (manager level 3). Something like this:
Employee_ID Manager_1 Manager_2 Reporting_Manager Level_of_Reporting_MGR
101 111 112 112 2
102 111 102 111 1
103 111 118 300 3
I have tried using a for loop and iterating through the reporting managers, but I am not sure if that is the right approach. I am new to coding, so this may be simple.
Current code looks like this:
Level_of_Reporting_MGR = []
for num in df['Manager_']:
    if num in df['Manager_1']:
        Level_of_Reporting_MGR.append(1)
    elif num in df['Manager_2']:
        Level_of_Reporting_MGR.append(2)
    else:
        Level_of_Reporting_MGR.append(3)
df['Level_of_Reporting_MGR'] = Level_of_Reporting_MGR
I haven't had a chance to try this out properly, but here's an outline of how I might approach the job.
def manager_score(series):
    # compare the reporting manager against each manager column in order
    sweep_list = ["Manager_1", "Manager_2"]
    for e, m in enumerate(sweep_list):
        if series['Reporting_Manager'] == series[m]:
            return e + 1
    # not found in any manager column: fall to the lowest level (3)
    return len(sweep_list) + 1

df['distance'] = df.apply(manager_score, axis=1)
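Since you already have numpy imported, a vectorized alternative is numpy.select; this is a sketch assuming the column names from your first sample (Reporting_Managers, Manager_1, Manager_2):
import numpy as np

# level 1 if the reporting manager matches Manager_1, level 2 for Manager_2,
# otherwise fall through to the default lowest level 3
conditions = [df['Reporting_Managers'] == df['Manager_1'],
              df['Reporting_Managers'] == df['Manager_2']]
df['Level_of_Reporting_MGR'] = np.select(conditions, [1, 2], default=3)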
I am trying to create levels of managers within a dataset I have. It looks similar to this:
EID ReportingManager ManagerLevel1 ManagerLevel2 ManagerLevel3
123 201 101 201 301
124 101 101 204 306
125 401 101 206 304
The "EID" is the employee the Reporting manager the is ID of who they report to and the Mangers Levels starting at 1 is the highest level manager to 3 being a lower level manager. What I want is to be able to create another column that ranks the level of the manager's ID.
Something like this:
EID ReportingManager ManagerLevel1 ManagerLevel2 ManagerLevel3 ReportingManagerLevel
123 201 101 201 301 1
124 101 101 204 306 0
125 401 101 206 304 3
The idea is to see how far the reporting manager is away from the top level. If the reporting manager is the top, then 0, and everyone that reports to him would be a 1. If the EID is reporting to the level 2 manager, then that manager is 1 away from the top manager and all those EIDs would be 2 away from the top. So far I have been working on getting the managers' levels figured out, but I run into an issue where every manager ends up with a manager level of 3.
My code looks like this:
manager_level = []
num = list(df['ID'])
for num in df['ReportingManager']:
    if num is df['ManagerLevel1']:
        manager_level.append('0')
    elif num is df['ManagerLevel2']:
        manager_level.append('1')
    elif num is df['ManagerLevel3']:
        manager_level.append('2')
    else:
        manager_level.append('3')
df['Manager_Level'] = manager_level
Note: df['postitonNum'] contains the IDs of all the managers and employees.
I reproduced your df with this:
import pandas as pd
data = {
    "EID": [123, 124, 125],
    "ReportingManager": [201, 101, 401],
    "ManagerLevel1": [101, 101, 101],
    "ManagerLevel2": [201, 204, 206],
    "ManagerLevel3": [301, 306, 304],
}
df = pd.DataFrame(data=data)
I suggest leveraging the ID numbers themselves: 101 = 0, 201 = 1, and so on. Assuming you use pandas (based on the df variable and the dataframe tag), you can use the apply method like so:
import math
df["ReportingManagerLevel"] = df["ReportingManager"].apply(lambda x: math.floor(x/100)) -1
This will take each ReportingManager value, extract its leading (hundreds) digit, then take away 1. This would mean that a manager with the ID 502 would get the value 4. If this is something you would like to avoid, you could always use the modulo operator.
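Running this on the reproduced df should give the levels from your expected output:
   EID  ReportingManager  ManagerLevel1  ManagerLevel2  ManagerLevel3  ReportingManagerLevel
0  123               201            101           201            301                      1
1  124               101            101           204            306                      0
2  125               401            101           206            304                      3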
Instead of using `is`, you need to use the equality operator == to compare the values in the columns.
You can try with this code :
manager_level = []
for i, row in df.iterrows():
    if row['ReportingManager'] == row['ManagerLevel1']:
        manager_level.append(0)
    elif row['ReportingManager'] == row['ManagerLevel2']:
        manager_level.append(1)
    elif row['ReportingManager'] == row['ManagerLevel3']:
        manager_level.append(2)
    else:
        manager_level.append(3)
df['ReportingManagerLevel'] = manager_level
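If you would rather avoid the explicit loop, the same logic can be vectorized with numpy.select; this is a sketch assuming numpy is available as np:
import numpy as np

# check the manager columns in order; the first match decides the level,
# and anything unmatched falls through to the default of 3
conditions = [df['ReportingManager'] == df[c]
              for c in ['ManagerLevel1', 'ManagerLevel2', 'ManagerLevel3']]
df['ReportingManagerLevel'] = np.select(conditions, [0, 1, 2], default=3)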
I have a table which has emp_id, emp_desg, and mgr_id. I'm trying to find and print the employees who report to a lower-level, same-level, or superior-level manager in the hierarchy.
I have a mapping for the hierarchy levels and a mapping to find opposing-role reporting; if a case from the 2nd mapping matches in the table, then it should be printed.
1st MAPPING (Hierarchy Levels)
2nd MAPPING (Opposing role) - These records need to be printed.
I need to iterate through each employee and their manager. If the levels of the emp and mgr match the 2nd mapping, I need to print it. Please help me solve this; thanks in advance.
emp_id  emp_desg  mgr_id
111     ASM       112
112     ASM       116
113     BSM       114
114     CSM       115
115     ASM       116
116     DSM       117
Expected output: the emp_id and mgr_id of every record whose levels appear in the 2nd mapping. What I have so far:
df['emp_role'] = df['emp_desg'].map(hrchy_levels)
df['mgr_role'] = df['mgr_desg'].map(hrchy_levels)
Is there a way to compare 'emp_role' and 'mgr_role' with ranks_subords and just print the emp_id and mgr_id? I don't want to change anything in df, so after printing I'll remove the added columns emp_role and mgr_role. Thanks!
We start by defining the needed mappings for hierarchy and subordination.
hrchy_levels = {'ASM':'L1', 'BSM':'L2', 'CSM':'L3', 'DSM':'L4'}
ranks_subords = [('L1', 'L1'), ('L1', 'L4'), ('L2', 'L1'), ('L2', 'L2'), ('L3', 'L3'), ('L3', 'L1'), ('L3', 'L2'), ('L4', 'L1'), ('L4', 'L2'), ('L4', 'L3')]
Then map each mgr_id to its designation using the employee records:
df['mgr_desg'] = df['mgr_id'].map(dict(df[['emp_id', 'emp_desg']].values))
Then replace the designations with level codes in a second df and filter by the rank relations:
df2 = df.replace({'emp_desg': hrchy_levels, 'mgr_desg': hrchy_levels})
df2[df2.apply(lambda x: (x['emp_desg'], x['mgr_desg']) in ranks_subords, axis=1)]
emp_id emp_desg mgr_id mgr_desg
0 111 L1 112 L1
1 112 L1 116 L4
3 114 L3 115 L1
4 115 L1 116 L4
Now, it's easy to iterate over the rows and print a formatted output.
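For instance, a minimal sketch of that printing step (leaving the original df untouched, as requested):
matches = df2[df2.apply(lambda x: (x['emp_desg'], x['mgr_desg']) in ranks_subords, axis=1)]
for _, row in matches.iterrows():
    # print only the ids; df2 is a modified copy, so df itself is unchanged
    print(row['emp_id'], row['mgr_id'])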
What I want is to see how I should group my CDs so that I have a similar count in each 'bin', e.g. A+B, C+D, and E+F+G+H. It's more of an exercise than a need, but I don't have enough space to have a pile for each letter of the alphabet, so I'd rather have, say, 10 piles; the question is how to split them up.
So I have the following, obtained from my DataFrame, showing the cumulative sum of entries across numbers (#) and the alphabet:
In [135]: csum
Out[135]:
key
# 9
A 25
B 43
C 63
D 76
E 82
F 98
G 105
H 116
I 120
J 125
K 130
L 139
M 154
N 160
O 164
P 186
R 221
S 234
T 298
U 302
V 319
W 325
Y 326
Name: count, dtype: int64
I've written a function 'distribution' to get the result I wanted... i.e. 10 separate groups, showing which alphabetic clusters to use.
dist = distribution(byvar, various=True)
dist
Out[138]:
quants
(8.999, 49.0] #AB
(49.0, 79.6] CD
(79.6, 104.3] EF
(104.3, 121.0] GHI
(121.0, 134.5] JK
(134.5, 158.8] LM
(158.8, 189.5] NOP
(189.5, 259.6] RS
(259.6, 313.9] TU
(313.9, 326.0] VWY
dtype: object
The code is here:
import pandas as pd
import numpy as np
def distribution(df, various=False):
    '''
    Parameters
    ----------
    df : dataframe
    various : boolean, optional
        Select if Various df
    Returns
    -------
    df
        Shows how to distribute groupings to get similar size bunches.
    '''
    global gar, csum
    if various:
        df['AZ'] = df['album'].apply(lambda x: '#' if x[0] in map(str, range(10)) else x[0].upper())
    else:
        df['AZ'] = df['artist'].apply(lambda x: '#' if x[0] in map(str, range(10)) else x[0].upper())
    gar = df.groupby('AZ')
    csum = gar.size().cumsum()  # csum becomes a Series obj
    sdf = pd.DataFrame(csum.iteritems(), columns=['key', 'count'])
    sdf['quants'] = pd.qcut(sdf['count'], q=np.array(range(11)) * 0.1)
    gsdf = sdf.groupby('quants')
    return gsdf.apply(lambda x: x['key'].sum())
So my question arises from the fact that I couldn't see how to achieve this without converting my Series object (csum) back into a DataFrame before using pd.qcut to split it up.
Can anyone see a more concise approach that bypasses creating the intermediate 'sdf' DataFrame?
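For what it's worth, pd.qcut accepts a Series directly, so something along these lines might work (an untested sketch, assuming csum is the cumulative-sum Series built above):
# bin the Series itself, then group it by those bins; each group's index
# still carries the letter labels, so they can be joined directly
quants = pd.qcut(csum, q=10)
grouped = csum.groupby(quants).apply(lambda s: ''.join(s.index))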
I am trying to analyze a legacy menu system and trace the path of the menu options. The menu system has a main menu followed by sub menus. I am trying to get the details from bottom to top. Here are the records I extracted from the csv for the 'Pay' screen.
If we look at it, the Pay menu is called from 3 sub menus, for example Rules and Dispatch. Rules is in turn called from the Test Menu.
So for the 3 instances where Pay is called, I want to extract:
2-10
18-2-10
98-13-4-4
How is this possible?
MOKEY#MO MOMNU#MO MOMNUOPT MOMNUSEQ MOOPTDES MOOPTCMD
111 0 2 20 Dispatch Menu
131 111 10 120 Pay CALL AS650G
283 0 98 980 Utilities Menu
985 3,028 2 30 Rules CALL IS216G PARM(' ')
1,131 985 10 120 Pay CALL AS650G
2,391 283 13 300 Key Performance Indicator Menu
2,434 2,445 4 380 Pay CALL AS650G
2,445 2,391 4 40 Quick Look Weekly Menu
3,028 0 18 190 Test Menu
Below is something I have been doing, and I just have a very basic knowledge of pandas. How can I combine all these statements and get the output?
import pandas as pd
statDir = 'C:/Users/jerry/Documents/STAT_TABLES/'
csvFile = statDir + 'menu' + '.csv'
dd = pd.read_csv(csvFile, low_memory=False)
fd1 = dd[dd['MOOPTCMD'].str.contains('AS650G')][['MOKEY#MO','MOMNU#MO','MOMNUOPT']]
print(fd1)
print('==============')
fd2 = dd[dd['MOKEY#MO'].isin(fd1['MOMNU#MO'])][['MOKEY#MO','MOMNU#MO','MOMNUOPT']]
print(fd2)
print('==============')
fd3 = dd[dd['MOKEY#MO'].isin(fd2['MOMNU#MO'])][['MOKEY#MO','MOMNU#MO','MOMNUOPT']]
print(fd3)
print('==============')
fd4 = dd[dd['MOKEY#MO'].isin(fd3['MOMNU#MO'])][['MOKEY#MO','MOMNU#MO','MOMNUOPT']]
print(fd4)
print('==============')
fd5 = dd[dd['MOKEY#MO'].isin(fd4['MOMNU#MO'])]
print(fd5)
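One way to combine those repeated lookups is to build parent/option dictionaries and walk each Pay entry up to the root. This is a sketch under a few assumptions: keys in MOKEY#MO are unique, a parent of 0 means a top-level menu, and the comma-formatted numbers (e.g. 3,028) are handled with read_csv's thousands option:
import pandas as pd

dd = pd.read_csv(csvFile, low_memory=False, thousands=',')

# lookups keyed by menu key: who calls this entry, and via which option number
parent = dict(zip(dd['MOKEY#MO'], dd['MOMNU#MO']))
option = dict(zip(dd['MOKEY#MO'], dd['MOMNUOPT']))

def menu_path(key):
    # walk up the parent chain collecting option numbers, then reverse
    parts = []
    while key in parent and key != 0:
        parts.append(str(option[key]))
        key = parent[key]
    return '-'.join(reversed(parts))

for key in dd.loc[dd['MOOPTCMD'].str.contains('AS650G', na=False), 'MOKEY#MO']:
    print(menu_path(key))
# for the sample rows this should print 2-10, 18-2-10 and 98-13-4-4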
I'm reading a csv file which has data in the form:
person,1,125,321,123,532
person,1,123,521,123,632
person,10,324,345,12,456
chair,7000,123,45,12,643
I can read it with my_data = np.genfromtxt(filename,delimiter=",",dtype=None)
and then I have an ndarray.
I'd like to reorder the rows based on their second column value.
The output should be an ndarray in the form
[
[[person,1,125,321,123,532],[person,1,123,521,123,632]]
[person,10,324,345,12,456]
[chair,7000,123,45,12,643]
]
My way is to
my_data = np.genfromtxt(filename,delimiter=",",dtype=None)
tem = []
for x in range(0, 8000, 22):
    fake_array = [a_value for a_value in my_data if a_value[1] == x]
    if len(fake_array) > 0:
        tem.append(fake_array)
This gives me the right result, but I feel it's a very bad way to do this.
Can anyone suggest an optimized way to do this?
Especially because I iterate from 0 to 8000 even though there might be only 10 distinct values, and for each one I iterate through all rows of the array.
I think the function numpy.unique (https://docs.scipy.org/doc/numpy/reference/generated/numpy.unique.html) could be used as well, but I'm not sure how to implement it.
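A sketch of the numpy.unique idea (assuming my_data came from genfromtxt with dtype=None, so it is a structured array whose second field is named 'f1'):
import numpy as np

# return_inverse maps every row to the index of its unique second-field value,
# so rows can be grouped without scanning the whole array once per value
vals, inverse = np.unique(my_data['f1'], return_inverse=True)
groups = [my_data[inverse == i] for i in range(len(vals))]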
You should use the Python library pandas here instead of trying to do everything in numpy (it's too low level for what you're doing). With pandas you can do what you want via:
import pandas as pd
df = pd.read_csv(file_name, header=None)  # the sample data has no header row
df_sorted = df.sort_values(by=1)          # column 1 is the second column
Just do this:
my_data = my_data[my_data['f1'].argsort()]  # 'f1' is the second field of the structured array
Try using the code below:
df = pd.read_csv("D:/path/test.csv",header=None,sep=',')
df=df.rename(columns={0:"Name",1:"Value1",2:"Value2",3:"Value3",4:"Value4",5:"Value5"})
df=df.sort_values(by="Value1")
You will get the output below:
Name Value1 Value2 Value3 Value4 Value5
0 person 1 125 321 123 532
1 person 1 123 521 123 632
2 person 10 324 345 12 456
3 chair 7000 123 45 12 643