I am trying to analyze a legacy menu system and trace the path of each menu option. The system has a main menu followed by sub-menus, and I am trying to get the details from the bottom up. Here are the records I extracted from the CSV for the 'Pay' screen.
If you look at it, the Pay menu is called from 3 sub-menus, for example Rules and Dispatch; Rules is in turn called from the Test Menu.
So for the 3 instances where Pay is called, I want to extract the paths as:
2-10
18-2-10
98-13-4-4
How is this possible?
MOKEY#MO  MOMNU#MO  MOMNUOPT  MOMNUSEQ  MOOPTDES                        MOOPTCMD
111       0         2         20        Dispatch Menu
131       111       10        120       Pay                             CALL AS650G
283       0         98        980       Utilities Menu
985       3,028     2         30        Rules                           CALL IS216G PARM(' ')
1,131     985       10        120       Pay                             CALL AS650G
2,391     283       13        300       Key Performance Indicator Menu
2,434     2,445     4         380       Pay                             CALL AS650G
2,445     2,391     4         40        Quick Look Weekly Menu
3,028     0         18        190       Test Menu
Below is what I have been doing so far; I only have very basic pandas knowledge. How can I combine all these statements to get that output?
import pandas as pd

statDir = 'C:/Users/jerry/Documents/STAT_TABLES/'
csvFile = statDir + 'menu' + '.csv'
dd = pd.read_csv(csvFile, low_memory=False)
fd1 = dd[dd['MOOPTCMD'].str.contains('AS650G')][['MOKEY#MO','MOMNU#MO','MOMNUOPT']]
print(fd1)
print('==============')
fd2 = dd[dd['MOKEY#MO'].isin(fd1['MOMNU#MO'])][['MOKEY#MO','MOMNU#MO','MOMNUOPT']]
print(fd2)
print('==============')
fd3 = dd[dd['MOKEY#MO'].isin(fd2['MOMNU#MO'])][['MOKEY#MO','MOMNU#MO','MOMNUOPT']]
print(fd3)
print('==============')
fd4 = dd[dd['MOKEY#MO'].isin(fd3['MOMNU#MO'])][['MOKEY#MO','MOMNU#MO','MOMNUOPT']]
print(fd4)
print('==============')
fd5 = dd[dd['MOKEY#MO'].isin(fd4['MOMNU#MO'])]
print(fd5)
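Rather than chaining fixed numbers of isin steps, one way to combine them is to walk each Pay row's MOMNU#MO link upward until the parent key is 0, collecting MOMNUOPT at every level. Below is a minimal sketch; it reconstructs the table above as an in-memory frame for illustration, whereas the real code would use the frame read from your CSV:

```python
import pandas as pd

# Hypothetical reconstruction of the CSV rows shown above.
df = pd.DataFrame({
    'MOKEY#MO': [111, 131, 283, 985, 1131, 2391, 2434, 2445, 3028],
    'MOMNU#MO': [0, 111, 0, 3028, 985, 283, 2445, 2391, 0],
    'MOMNUOPT': [2, 10, 98, 2, 10, 13, 4, 4, 18],
    'MOOPTDES': ['Dispatch Menu', 'Pay', 'Utilities Menu', 'Rules', 'Pay',
                 'Key Performance Indicator Menu', 'Pay',
                 'Quick Look Weekly Menu', 'Test Menu'],
    'MOOPTCMD': ['', 'CALL AS650G', '', "CALL IS216G PARM(' ')",
                 'CALL AS650G', '', 'CALL AS650G', '', ''],
})

# Index by key so each parent lookup is a single .loc.
by_key = df.set_index('MOKEY#MO')

def path_to_top(key):
    """Walk the MOMNU#MO links upward, collecting MOMNUOPT at each level."""
    opts = []
    while key != 0:
        row = by_key.loc[key]
        opts.append(str(row['MOMNUOPT']))
        key = row['MOMNU#MO']
    return '-'.join(reversed(opts))  # top-level option first

pay_keys = df.loc[df['MOOPTCMD'].str.contains('AS650G', na=False), 'MOKEY#MO']
paths = [path_to_top(k) for k in pay_keys]
print(paths)  # ['2-10', '18-2-10', '98-13-4-4']
```

The loop assumes every chain ends at a MOMNU#MO of 0, which is what the extracted rows show for the top-level menus.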
I am trying to create levels of managers within a dataset I have. It looks similar to this:
EID  ReportingManager  ManagerLevel1  ManagerLevel2  ManagerLevel3
123  201               101            201            301
124  101               101            204            306
125  401               101            206            304
The "EID" is the employee, the ReportingManager is the ID of who they report to, and the manager levels run from 1 (the highest-level manager) down to 3 (a lower-level manager). What I want is to create another column that ranks the level of the reporting manager's ID.
Something like this:
EID  ReportingManager  ManagerLevel1  ManagerLevel2  ManagerLevel3  ReportingManagerLevel
123  201               101            201            301            1
124  101               101            204            306            0
125  401               101            206            304            3
The idea is to see how far the reporting manager is from the top level. If the reporting manager is the top, the value is 0, and everyone who reports to him would be a 1. If the EID reports to the level-2 manager, that manager is 1 away from the top, and all those EIDs would then be 2 away. So far I have been working on getting the managers' levels figured out, but I run into an issue where every manager ends up with a level of 3.
My code looks like this:
manager_level = []
num = list(df['ID'])
for num in df['ReportingManager']:
    if num is df['ManagerLevel1']:
        manager_level.append('0')
    elif num is df['ManagerLevel2']:
        manager_level.append('1')
    elif num is df['ManagerLevel3']:
        manager_level.append('2')
    else:
        manager_level.append('3')
df['Manager_Level'] = manager_level
Note: df['postitonNum'] contains the IDs of all the managers and employees.
I reproduced your df with this:
import pandas as pd

data = {
    "EID": [123, 124, 125],
    "ReportingManager": [201, 101, 401],
    "ManagerLevel1": [101, 101, 101],
    "ManagerLevel2": [201, 204, 206],
    "ManagerLevel3": [301, 306, 304],
}
df = pd.DataFrame(data=data)
I suggest leveraging the manager IDs themselves: 101 = 0, 201 = 1, and so on. Assuming you are using pandas (based on the df variable and the dataframe tag), you can use the apply method like this:
import math
df["ReportingManagerLevel"] = df["ReportingManager"].apply(lambda x: math.floor(x/100)) -1
This takes the ReportingManager ID, finds its leading digit, and then subtracts 1. That means a manager with the ID 502 would get the value 4. If that is something you would like to avoid, you could always use the modulo operator.
Instead of is, you need to use the equality operator == to compare the values in the columns.
You can try this code:
manager_level = []
for i, row in df.iterrows():
    if row['ReportingManager'] == row['ManagerLevel1']:
        manager_level.append(0)
    elif row['ReportingManager'] == row['ManagerLevel2']:
        manager_level.append(1)
    elif row['ReportingManager'] == row['ManagerLevel3']:
        manager_level.append(2)
    else:
        manager_level.append(3)
df['ReportingManagerLevel'] = manager_level
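As a vectorized alternative to the row loop, numpy's np.select can express the same cascade of comparisons in one shot. A sketch, assuming the corrected ManagerLevel column names from the example data:

```python
import numpy as np
import pandas as pd

# Sample frame matching the question's example data.
df = pd.DataFrame({
    'EID': [123, 124, 125],
    'ReportingManager': [201, 101, 401],
    'ManagerLevel1': [101, 101, 101],
    'ManagerLevel2': [201, 204, 206],
    'ManagerLevel3': [301, 306, 304],
})

# Conditions are checked in order; the first match wins, like an if/elif chain.
conditions = [
    df['ReportingManager'] == df['ManagerLevel1'],
    df['ReportingManager'] == df['ManagerLevel2'],
    df['ReportingManager'] == df['ManagerLevel3'],
]
df['ReportingManagerLevel'] = np.select(conditions, [0, 1, 2], default=3)
print(df['ReportingManagerLevel'].tolist())  # [1, 0, 3]
```

This avoids iterrows entirely, which matters once the frame has more than a few thousand rows.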
I am trying to create manager levels and am getting stuck on the proper approach. I am using a csv file and have imported pandas and numpy. I want to take "Manager_1" as the start and then show how many levels away the rest of the managers are from them. Below is an example of what I mean.
Employee_ID  Manager_1  Manager_2  Reporting_Manager
101          111        112        112
102          111        102        111
103          111        118        300
So the goal is for the reporting manager to be the one tested, and if the reporting manager is not on the list, they fall to the lowest manager level (level 3). Something like this:
Employee_ID  Manager_1  Manager_2  Reporting_Manager  Level_of_Reporting_MGR
101          111        112        112                2
102          111        102        111                1
103          111        118        300                3
I have tried using a for loop and iterating through the reporting managers, but I am not sure whether that is the right approach. I am new to coding, so this may be simple.
Current code looks like this:
Level_of_Reporting_MGR = []
for num in df['Manager_']:
    if num in df['Manager_1']:
        Level_of_Reporting_MGR.append(1)
    elif num in df['Manager_2']:
        Level_of_Reporting_MGR.append(2)
    else:
        Level_of_Reporting_MGR.append(3)
df['Level_of_Reporting_MGR'] = Level_of_Reporting_MGR
I haven't had a chance to try this out properly, but here's an outline of how I might approach the job.
def manager_score(series):
    sweep_list = ["Manager_1", "Manager_2"]
    for e, m in enumerate(sweep_list):
        if series['Reporting_Manager'] == series[m]:
            return e + 1
    return len(sweep_list) + 1  # not found in any manager column: lowest level

df['distance'] = df.apply(manager_score, axis=1)
I would like to turn a table into a pandas.DataFrame.
URL = 'https://ladieseuropeantour.com/reports-page/?tourn=1202&tclass=rnk&report=tmscores~season=2015~params=P*4ESC04~#/profile'
The element in question is
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get(URL)
ranking = driver.find_element(By.XPATH, ".//*[@id='maintablelive']")
I tried the following:
import pandas as pd
pd.read_html(ranking.get_attribute('outerHTML'))[0]
I am also using the dropdown menu to select multiple rounds. When a different round is selected, driver.current_url doesn't change, so I think it's not possible to load these new tables with requests or anything like that.
Please advise!
Instead of using selenium, you want to access the URL's API endpoint.
Finding the API endpoint
You can trace it as follows:
1. Open the URL in Chrome.
2. Use Ctrl + Shift + J to open DevTools, navigate to Network, select Fetch/XHR from the sub navbar, and refresh the URL.
3. This reloads the network connections; when you click on one of the lines that appears, select Response from the second sub navbar to see whether it returns data.
Going through them, we can locate the connection that is responsible for returning the data for the table, namely https://ladieseuropeantour.com/api/let/cache/let/2015/2015-1202-scores-P*4ESC04.json (you can leave out the query part ?randomadd=1673086921319 at the end).
Now, with this knowledge, we can simply load all the data with requests, investigate the type of information that is contained in the JSON and dump the required sub part in a df.
For example, let's recreate the table from your URL.
Code
import requests
import pandas as pd
url = 'https://ladieseuropeantour.com/api/let/cache/let/2015/2015-1202-scores-P*4ESC04.json'
data = requests.get(url).json()
df = pd.DataFrame(data['scores']['scores_entry'])
cols = ['pos', 'name', 'nationality', 'vspar',
        'score_R1', 'score_R2', 'score_R3', 'score_R4', 'score']
df_table = df.loc[:, cols]
df_table.head()
pos name nationality vspar score_R1 score_R2 score_R3 \
0 1 Lydia Ko (NZL) NZL -9 70 70 72
1 2 Amy Yang (KOR) KOR -7 73 70 70
2 3 Ariya Jutanugarn (THA) THA -4 69 71 72
3 4 Jenny Shin (KOR) KOR -2 76 71 74
4 4 Ilhee Lee (KOR) KOR -2 68 82 69
score_R4 score
0 71 283
1 72 285
2 76 288
3 69 290
4 71 290
If you check print(df.columns), you'll see that the main df contains all the data that lies behind all of the tables you can select on the URL. So, have a go at that to select whatever you are looking for.
Following up on my previous question: I have a list of records taken from the table below.
itemImage    name    nameFontSize  nameW  nameH  conutry  countryFont  countryW  countryH  code  codeFontSize  codeW  codeH
sample.jpg   Apple   142           1200   200    US       132          1200      400       1564  82            1300   600
sample2.jpg  Orange  142           1200   200    UK       132          1200      400       1562  82            1300   600
sample3.jpg  Lemon   142           1200   200    FR       132          1200      400       1563  82            1300   600
Right now, I have one function, setText, which takes all the elements of a row from this table.
I only have name, country, and code for now, but I will be adding other fields in the future.
I want to make this code more future-proof and dynamic. For example, if I added four new columns to my data following the same pattern, how do I make Python adjust to that automatically, instead of declaring new variables in my code every time?
Basically, I want to send each group of four columns, starting from name, to a function and continue until no columns are left. Once that's done, go to the next row and continue the loop.
Thanks to @Samwise, who helped me clean up the code a bit.
import os
from PIL import Image, ImageFont, ImageDraw, features
import pandas as pd

path = './'
files = []
for (dirpath, dirnames, filenames) in os.walk(path):
    files.extend(filenames)

df = pd.read_excel(r'./data.xlsx')
records = list(df.to_records(index=False))

def setText(itemImage, name, nameFontSize, nameW, nameH,
            conutry, countryFontSize, countryW, countryH,
            code, codeFontSize, codeW, codeH):
    font1 = ImageFont.truetype(r'./font.ttf', nameFontSize)
    font2 = ImageFont.truetype(r'./font.ttf', countryFontSize)
    font3 = ImageFont.truetype(r'./font.ttf', codeFontSize)
    file = Image.open(f"./{itemImage}")
    draw = ImageDraw.Draw(file)
    draw.text((nameW, nameH), name, font=font1, fill='#ff0000',
              align="right", anchor="rm")
    draw.text((countryW, countryH), conutry, font=font2, fill='#ff0000',
              align="right", anchor="rm")
    draw.text((codeW, codeH), str(code), font=font3, fill='#ff0000',
              align="right", anchor="rm")
    file.save(f'done {itemImage}')

for i in records:
    setText(*i)
Sounds like df.columns might help. It returns an Index of the column labels, which you can iterate through for whatever columns are present:
for col in df.columns:
The answers in this thread should help dial you in:
How to iterate over columns of pandas dataframe to run regression
It sounds like you also want row-wise results, so you could nest within df.iterrows or vice versa, though going cell by cell is generally not desirable and could end up being quite slow as your df grows.
So perhaps think about how you could use your function with df.apply().
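Building on that, one way to make the row handling future-proof is to treat everything after itemImage as repeating groups of four values and slice each row into chunks. Below is a rough sketch under that assumption; draw_field is a hypothetical placeholder standing in for the real Pillow drawing calls:

```python
import pandas as pd

# Hypothetical frame with the same repeating layout: one image column,
# then groups of four columns (text, fontSize, W, H) per field.
df = pd.DataFrame({
    'itemImage': ['sample.jpg'],
    'name': ['Apple'], 'nameFontSize': [142], 'nameW': [1200], 'nameH': [200],
    'conutry': ['US'], 'countryFontSize': [132], 'countryW': [1200], 'countryH': [400],
    'code': [1564], 'codeFontSize': [82], 'codeW': [1300], 'codeH': [600],
})

def draw_field(image_name, text, font_size, w, h):
    # Placeholder for the real ImageDraw.text call; here we just record it.
    return (image_name, str(text), font_size, w, h)

calls = []
for row in df.itertuples(index=False):
    values = list(row)
    image_name, fields = values[0], values[1:]
    # Slice the remaining columns into chunks of 4, however many there are.
    for i in range(0, len(fields), 4):
        text, font_size, w, h = fields[i:i + 4]
        calls.append(draw_field(image_name, text, font_size, w, h))

print(len(calls))  # 3 field groups for the single sample row
```

Adding four more columns with the same text/fontSize/W/H pattern would then need no code changes, since the chunking loop adapts to however many columns the frame has.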
I have Python code where I am trying to convert a text file containing variant information in the rows into a variant call format (VCF) file for my downstream analysis.
I am getting everything almost right, but when I run the code I miss the first two rows. The code is below; the line that is not reading the entire file is the filter call. I would like some expert advice.
I just started coding in Python, so I am not well versed in it yet.
##fileformat=VCFv4.0
##fileDate=20140901
##source=dbSNP
##dbSNP_BUILD_ID=137
##reference=hg19
#CHROM POS ID REF ALT QUAL FILTER INFO
import sys

text = open(sys.argv[1]).readlines()
print text
print "First print"
text = filter(lambda x: x.split('\t')[31].strip() == 'KEEP', text[2:])
print text
print "################################################"
text = map(lambda x: x.split('\t')[0]+'\t'+x.split('\t')[1]+'\t.\t'+x.split('\t')[2]+'\t'+x.split('\t')[3]+'\t.\tPASS\t.\n', text)
print text
file = open(sys.argv[1].replace('.txt', '.vcf'), 'w')
file.write('##fileformat=VCFv4.0\n')
file.write('##source=dbSNP')
file.write('##dbSNP_BUILD_ID=137')
file.write('##reference=hg19\n')
file.write('#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n')
for i in text:
    file.write(i)
    file.close()
INPUT:
chrM 152 T C T_S7998 N_S8980 0 DBSNP COVERED 1 1 1 282 36 0 163.60287 0.214008 0.02 11.666081 202 55 7221 1953 0 0 TT 14.748595 49 0 1786 0 KEEP
chr9 311 T C T_S7998 N_S8980 0 NOVEL COVERED 0.993882 0.999919 0.993962 299 0 0 207.697923 1 0.02 1.854431 0 56 0 1810 1 116 CC -44.649001 0 12 0 390 KEEP
chr13 440 C T T_S7998 N_S8980 0 NOVEL COVERED 1 1 1 503 7 0 4.130339 0.006696 0.02 4.124606 445 3 16048 135 0 0 CC 12.942762 40 0 1684 0 KEEP
OUTPUT desired:
##fileformat=VCFv4.0
##source=dbSNP##dbSNP_BUILD_ID=137##reference=hg19
#CHROM POS ID REF ALT QUAL FILTER INFO
chrM 152 . T C . PASS .
chr9 311 . T C . PASS .
chr13 440 . C T . PASS .
OUTPUT obtained:
##fileformat=VCFv4.0
##source=dbSNP##dbSNP_BUILD_ID=137##reference=hg19
#CHROM POS ID REF ALT QUAL FILTER INFO
chr13 440 . C T . PASS .
I would like to have some help regarding how this error can be rectified.
There are a couple of issues with your code:
In the filter function you are passing text[2:], which skips the first two rows. You want to pass text to get all the rows.
In the last loop, where you write to the .vcf file, you are closing the file inside the loop. You should first write all the values and then close the file outside the loop.
So your code will look like this (I removed all the prints):
import sys

text = open(sys.argv[1]).readlines()
text = filter(lambda x: x.split('\t')[31].strip() == 'KEEP', text)  # pass text, not text[2:]
text = map(lambda x: x.split('\t')[0]+'\t'+x.split('\t')[1]+'\t.\t'+x.split('\t')[2]+'\t'+x.split('\t')[3]+'\t.\tPASS\t.\n', text)
file = open(sys.argv[1].replace('.txt', '.vcf'), 'w')
file.write('##fileformat=VCFv4.0\n')
file.write('##source=dbSNP')
file.write('##dbSNP_BUILD_ID=137')
file.write('##reference=hg19\n')
file.write('#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n')
for i in text:
    file.write(i)
file.close()  # close after writing all the values, at the end
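If you later move to Python 3, note that print becomes a function and filter/map return lazy iterators, so the same transformation is often clearer as a list comprehension. A sketch with inline sample rows (the 'x' padding fields are hypothetical filler; only positions 0-3 and 31 matter, matching the indices your code uses):

```python
# Build minimal tab-separated sample rows: fields 0-3 are CHROM, POS, REF, ALT,
# and field 31 is the KEEP flag, as in the original file layout.
def make_row(chrom, pos, ref, alt, flag):
    fields = [chrom, str(pos), ref, alt] + ['x'] * 27 + [flag]
    return '\t'.join(fields) + '\n'

lines = [
    make_row('chrM', 152, 'T', 'C', 'KEEP'),
    make_row('chr9', 311, 'T', 'C', 'KEEP'),
    make_row('chr13', 440, 'C', 'T', 'REJECT'),
]

# Split each line once, keep only the KEEP rows, and build the VCF body lines.
fields_per_line = [ln.split('\t') for ln in lines]
vcf_rows = [
    f"{f[0]}\t{f[1]}\t.\t{f[2]}\t{f[3]}\t.\tPASS\t.\n"
    for f in fields_per_line
    if f[31].strip() == 'KEEP'
]
print(vcf_rows)
```

Splitting each line once up front also avoids the repeated x.split('\t') calls the lambda version performs on every field access.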