Python - How to iterate over nested data

I've seen a lot of information on how to iterate over nested data, but none seem to be applicable to my problem - perhaps my method of nesting is not ideal...
I'm working on a web scraping problem to improve my workflow (breaking out and presenting data in a better format than the website provides). I have a class called ContractorData that holds information about a contractor. Its subs attribute is a list of further ContractorData objects, and this nesting can continue: each contractor can have subcontractors, those subcontractors can have subcontractors of their own, and so on.
I have an efficient way to build this hierarchy, but I'm now struggling to find a way to iterate over every contractor down the hierarchy.
from bs4 import BeautifulSoup
class ContractorData():
    def __init__(self, soup: BeautifulSoup, parentId=None):
        self.contractorName = soup.find('a', id=True).getText()
        self.id = int(soup.find('a', id=True).get('href').split('&')[-1].split('=')[1])
        status_options = ['Enrolled', 'Excluded', 'Pending']
        self.status = 'Unknown'
        # The first status column whose icon is not grayed out is the active status
        for i in range(3):
            cls = f'contractorStatusCol{i+1}'
            if 'gray' not in soup.find('div', class_=cls).find('img').get('src'):
                self.status = status_options[i]
                break
        self.date_enrolled = soup.find('div', class_='contractorStatusCol4').get_text().replace(u'\xa0', '')
        self.has_subs = soup.find('a', class_="expand") is not None
        self.subs = []
        self.parent = parentId
Building the hierarchy (general_contractor is the top of the hierarchy):
def load_subs(pid, cid, level):
    sub_data = {'mode': 'loadEnrollmentCRUD', 'projectId': pid, 'contractorId': cid, 'level': level}
    sub_resp = session.post('https://my-site.com/ajax-contractor.html', data=sub_data)
    sub_soup = BeautifulSoup(sub_resp.text, 'html.parser')
    return [ContractorData(x, cid) for x in sub_soup.find_all('div', class_='contractorStatusCRUD')]
level = 0
has_sub_list = [general_contractor]
while has_sub_list:
    new_sub_list = []
    level = level + 1
    for x in has_sub_list:
        x.subs = load_subs(PROJECT_ID, x.id, level)
        new_sub_list.extend([y for y in x.subs if y.has_subs])
    has_sub_list = new_sub_list
I'm thinking of using another while loop like the one I used to build the data, but I can't help thinking my data architecture isn't optimal for this type of problem.
Edit based on comments:
The goal is to traverse through all contractors under the general_contractor and check their status to see if I need to take action on them.
I did just learn that a function can call itself recursively, which may work for this.
def walk(sub):
    if sub.status != 'Enrolled':
        # handle this case
        pass
    # subs is a list, so recurse into each subcontractor in turn
    for child in sub.subs:
        walk(child)
Thanks!

Recursive functions to the rescue! I didn't know this was possible, but it's very crisp and clean IMO.
flags = []
def check_subs(contractor: ContractorData):
    if contractor.status == 'Pending':
        flags.append(contractor)
    for x in contractor.subs:
        check_subs(x)
check_subs(general_contractor)
for c in flags:
    print(f'{c.contractorName}, {c.status}')
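If the same traversal is needed for more than one check, one option is to separate walking the hierarchy from acting on it by using a recursive generator. The following is only a sketch: the Contractor dataclass is a hypothetical stand-in for ContractorData (so it runs without any scraping), and iter_contractors is a made-up name.

```python
from dataclasses import dataclass, field
from typing import Iterator, List

@dataclass
class Contractor:
    """Hypothetical stand-in for ContractorData (no scraping involved)."""
    name: str
    status: str = "Enrolled"
    subs: List["Contractor"] = field(default_factory=list)

def iter_contractors(root: Contractor) -> Iterator[Contractor]:
    """Yield root and every contractor below it, depth first."""
    yield root
    for sub in root.subs:
        yield from iter_contractors(sub)

# Small made-up hierarchy: GC -> (A -> A1), B
gc = Contractor("GC", subs=[
    Contractor("A", status="Pending", subs=[Contractor("A1", status="Excluded")]),
    Contractor("B"),
])

# The caller decides what to do with each contractor
not_enrolled = [c.name for c in iter_contractors(gc) if c.status != "Enrolled"]
print(not_enrolled)
```

With this shape there is no need for a module-level flags list; any filter or action can be written as an ordinary loop or comprehension over iter_contractors(general_contractor).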

Related

What is the best way to return a variable or call a function to maximize code reuse?

I was hoping to get some input from seasoned Python experts. I have a couple of questions.
I am extracting data from an API request and calculating the total vulnerabilities.
What is the best way to return this data so that I can use it in another function?
How can I add up all the vulnerabilities? Right now it only sums them 500 at a time; I'd like the sum of every vulnerability.
def _request():
    third_party_patching_filer = {
        "asset": "asset.agentKey IS NOT NULL",
        "vulnerability" : "vulnerability.categories NOT IN ['microsoft patch']"}
    headers = _headers()
    print(headers)
    url1 = f"https://us.api.insight.rapid7.com/vm/v4/integration/assets"
    resp = requests.post(url=url1, headers=headers, json=third_party_patching_filer, verify=False).json()
    jsonData = resp
    #print(jsonData)
    has_next_cursor = False
    nextKey = ""
    if "cursor" in jsonData["metadata"]:
        has_next_cursor = True
        nextKey = jsonData["metadata"]["cursor"]
    while has_next_cursor:
        url2 = f"https://us.api.insight.rapid7.com/vm/v4/integration/assets?&size=500&cursor={nextKey}"
        resp2 = requests.post(url=url2, headers=headers, json=third_party_patching_filer, verify=False).json()
        cursor = resp2["metadata"]
        print(cursor)
        if "cursor" in cursor:
            nextKey = cursor["cursor"]
            print(f"next key {nextKey}")
            #print(desktop_support)
            for data in resp2["data"]:
                for tags in data['tags']:
                    total_critical_vul_osswin = []
                    total_severe_vul_osswin = []
                    total_modoer_vuln_osswin = []
                    if tags["name"] == 'OSSWIN':
                        print("OSSWIN")
                        critical_vuln_osswin = data['critical_vulnerabilities']
                        severe_vuln_osswin = data['severe_vulnerabilities']
                        modoer_vuln_osswin = data['moderate_vulnerabilities']
                        total_critical_vul_osswin.append(critical_vuln_osswin)
                        total_severe_vul_osswin.append(severe_vuln_osswin)
                        total_modoer_vuln_osswin.append(modoer_vuln_osswin)
                        print(sum(total_critical_vul_osswin))
                        print(sum(total_severe_vul_osswin))
                        print(sum(total_modoer_vuln_osswin))
                    if tags["name"] == 'DESKTOP_SUPPORT':
                        print("Desktop")
                        total_critical_vul_desktop = []
                        total_severe_vul_desktop = []
                        total_modorate_vuln_desktop = []
                        critical_vuln_desktop = data['critical_vulnerabilities']
                        severe_vuln_desktop = data['severe_vulnerabilities']
                        moderate_vuln_desktop = data['moderate_vulnerabilities']
                        total_critical_vul_desktop.append(critical_vuln_desktop)
                        total_severe_vul_desktop.append(severe_vuln_desktop)
                        total_modorate_vuln_desktop.append(moderate_vuln_desktop)
                        print(sum(total_critical_vul_desktop))
                        print(sum(total_severe_vul_desktop))
                        print(sum(total_modorate_vuln_desktop))
                    else:
                        pass
        else:
            has_next_cursor = False

If you have a lot of values to pass around, consider combining them in a dict. Then you can return the dict and pass it along to the next function that needs that data. Another approach is to create a class and either access its attributes directly or go through helper methods. The class is the cleaner solution: with a dict you have to quote every key name, whereas a class lets you add functionality beyond being a container for a bunch of instance variables.
If you want the total across all the data, you should put these initializations:
total_critical_vul_osswin = []
total_severe_vul_osswin = []
total_modoer_vuln_osswin = []
before the while has_next_cursor loop (and similarly for the desktop totals). As your code stands, the lists are re-created inside the loop, so the sums can never span more than one cursor page (i.e., 500 assets at a time, based on the URL).
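Both suggestions (return a dict, and initialize the totals before the loops) can be combined. The sketch below is illustrative only: total_vulnerabilities is a hypothetical helper, and the pages list is made-up data shaped like the response in the question, so no request is sent.

```python
from collections import defaultdict

def total_vulnerabilities(pages, tag_names=("OSSWIN", "DESKTOP_SUPPORT")):
    """Sum vulnerability counts per tag across every page of results."""
    # Created once, before any loop, so the totals span all pages
    totals = {tag: defaultdict(int) for tag in tag_names}
    for page in pages:
        for asset in page["data"]:
            for tag in asset["tags"]:
                if tag["name"] in totals:
                    for severity in ("critical", "severe", "moderate"):
                        totals[tag["name"]][severity] += asset[f"{severity}_vulnerabilities"]
    # Plain dicts are easier to hand to other functions
    return {tag: dict(counts) for tag, counts in totals.items()}

# Two made-up pages standing in for successive cursor responses
pages = [
    {"data": [{"tags": [{"name": "OSSWIN"}], "critical_vulnerabilities": 2,
               "severe_vulnerabilities": 1, "moderate_vulnerabilities": 0}]},
    {"data": [{"tags": [{"name": "OSSWIN"}, {"name": "DESKTOP_SUPPORT"}],
               "critical_vulnerabilities": 3, "severe_vulnerabilities": 4,
               "moderate_vulnerabilities": 5}]},
]
totals = total_vulnerabilities(pages)
print(totals["OSSWIN"]["critical"])
```

In the real script, the paging loop would collect each response (or feed it to the same accumulation step) and return the resulting dict, which any other function can then take as an argument.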

Confused regarding the usage of -init_ & self in a class

I first wrote the necessary code to get the information I wanted from the internet, and it works. But now I'm trying to make the code look a bit nicer, therefore I want to put it into functions that are in a class. But I'm a bit confused when it comes to the usages of self and _init_. Currently, the code isn't working as I want, meaning it isn't adding the information to my dictionary.
As I have understood, you have to add self as a parameter in every function you create in a class. But I don't think I'm using the _init_ in a correct way.
from bs4 import BeautifulSoup
import requests
# Importing data from Nasdaq
page_link = "https://www.nasdaq.com/symbol/aapl/financials?query=balance-sheet"
page_response = requests.get(page_link, timeout=1000)
page_content = BeautifulSoup(page_response.content, "lxml")
# Creating class that gather essential stock information
class CompanySheet:
    # creating dictionary to store stock information
    def __init__(self):
        self.stockInfo = {
            "ticker": "",
            "sharePrice": "",
            "assets": "",
            "liabilities": "",
            "shareholderEquity": ""
        }
    def ticker(self):
        # Finding ticker
        self.tickerSymbol = page_content.find("div", attrs={"class":"qbreadcrumb"})
        self.a_TickerList = self.tickerSymbol.findAll("a")
        self.a_TickerList = (self.a_TickerList[2].text)
        # Adding ticker to dictionary
        self.stockInfo["ticker"] = self.a_TickerList
        print(self.a_TickerList)
    def share(self):
        # Finding share price
        self.sharePrice = page_content.findAll("div", attrs={"id":"qwidget_lastsale"})
        self.aSharePrice = (self.sharePrice[0].text)
        # Transforming share price to desired format
        self.aSharePrice = str(self.aSharePrice[1:]).replace( ',' , '' )
        self.aSharePrice = float(self.aSharePrice)
        # Adding results to dictionary
        self.stockInfo["sharePrice"] = self.aSharePrice
    """
    def assets(self):
        # Finding total assets
        totalAssets = page_content.findAll("tr", attrs={"class":"net"})[1]
        td_assetList = totalAssets.findAll("td")
        tdAssets = (td_assetList[22].text)
        # Transforming share price to desired format
        tdAssets = str(tdAssets[1:]).replace( ',' , '' )
        tdAssets = float(tdAssets)
        # Adding results to dictionary
        self.stockInfo["assets"] = tdAssets
    def liabilites(self):
        # Finding total liabilities
        totalLiabilities = page_content.findAll("tr", attrs={"class":"net"})[3]
        td_liabilityList = totalLiabilities.findAll("td")
        tdLiabilities = (td_liabilityList[24].text)
        # Transforming share price to desired format
        tdLiabilities = str(tdLiabilities[1:]).replace( ',' , '' )
        tdLiabilities = float(tdLiabilities)
        # Adding results to dictionary
        self.stockInfo["liabilities"] = tdLiabilities
    def equity(self):
        # Finding shareholder equity
        netEquity = page_content.findAll("tr", attrs={"class":"net"})[4]
        td_equityList = netEquity.findAll("td")
        tdShareholderEquity = (td_equityList[24].text)
        # Transforming shareholder equity to desired format
        tdShareholderEquity = str(tdShareholderEquity[1:]).replace( ',' , '' )
        tdShareholderEquity = float(tdShareholderEquity)
        # Adding results to dictionary
        self.stockInfo["shareholderEquity"] = tdShareholderEquity
    """
companySheet = CompanySheet()
print(companySheet.stockInfo)
All I want is for each function to write its information to my dictionary, which I then want to access from outside the class. Can someone help clarify how I can use _init_ in this scenario, or whether I even have to use it?

__init__ is the constructor (strictly speaking, the initializer): it is called automatically when an object of the class is created. self refers to the instance itself and is used to access the methods and attributes of that object.
In your code, first make sure the constructor is spelled with two underscores on each side, i.e. change:
_init_(self) to __init__(self)
Then, in the methods:
def share(self):
    # Finding share price (a temporary value, so a local variable is enough)
    sharePrice = page_content.findAll("div", attrs={"id":"qwidget_lastsale"})
    self.aSharePrice = (sharePrice[0].text)
    # Transforming share price to desired format
    self.aSharePrice = str(self.aSharePrice[1:]).replace( ',' , '' )
    self.aSharePrice = float(self.aSharePrice)
    # Adding results to dictionary
    self.stockInfo["sharePrice"] = self.aSharePrice
Similarly, in all the remaining methods, access the instance's attributes through self.
Now, you also need to call the methods which are updating your dictionary.
So, after you have created the object, call the methods through the object and then print the dictionary, like this:
companySheet = CompanySheet()
companySheet.share()
print(companySheet.stockInfo)
That should work.
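To see the mechanics in isolation, here is a minimal sketch without any scraping. It is a simplified, hypothetical variant of the class in the question: the method names set_ticker and set_share_price and the raw strings passed to them are assumptions made for illustration.

```python
class CompanySheet:
    """Collects stock figures in a dictionary owned by each instance."""

    def __init__(self):
        # Runs once, when CompanySheet() is called
        self.stock_info = {"ticker": "", "sharePrice": ""}

    def set_ticker(self, raw_text):
        self.stock_info["ticker"] = raw_text.strip().upper()

    def set_share_price(self, raw_text):
        # Drop the leading currency symbol and the thousands separators
        self.stock_info["sharePrice"] = float(raw_text[1:].replace(",", ""))

sheet = CompanySheet()
print(sheet.stock_info)  # values are still empty: no method has been called yet
sheet.set_ticker(" aapl ")
sheet.set_share_price("$1,234.50")
print(sheet.stock_info)  # now filled in by the two method calls
```

The first print shows why the original script printed an unchanged dictionary: creating the object only runs __init__, and the other methods do nothing until they are called.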

How do I make it so I only need my api key referenced once?

I am teaching myself how to use Python and Django to access the Google Places API and run nearby searches for different types of gyms.
I was only taught how to use Python and Django with databases you build locally.
I wrote out a full GET request for each of the four different searches I am doing. I looked up examples, but none seem to work for me.
allgyms = requests.get('https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=38.9208,-77.036&radius=2500&type=gym&key=AIzaSyDOwVK7bGap6b5Mpct1cjKMp7swFGi3uGg')
all_text = allgyms.text
alljson = json.loads(all_text)
healthclubs = requests.get('https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=38.9208,-77.036&radius=2500&type=gym&keyword=healthclub&key=AIzaSyDOwVK7bGap6b5Mpct1cjKMp7swFGi3uGg')
health_text = healthclubs.text
healthjson = json.loads(health_text)
crossfit = requests.get('https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=38.9208,-77.036&radius=2500&type=gym&keyword=crossfit&key=AIzaSyDOwVK7bGap6b5Mpct1cjKMp7swFGi3uGg')
cross_text = crossfit.text
crossjson = json.loads(cross_text)
I really would like to be pointed in the right direction on how to have the api key referenced only one time while changing the keywords.

Try this for better readability and better reusability
BASE_URL = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json?'
LOCATION = '38.9208,-77.036'
RADIUS = '2500'
TYPE = 'gym'
API_KEY = 'AIzaSyDOwVK7bGap6b5Mpct1cjKMp7swFGi3uGg'
KEYWORDS = ''
allgyms = requests.get(BASE_URL+'location='+LOCATION+'&radius='+RADIUS+'&type='+TYPE+'&key='+API_KEY)
all_text = allgyms.text
alljson = json.loads(all_text)
KEYWORDS = 'healthclub'
healthclubs = requests.get(BASE_URL+'location='+LOCATION+'&radius='+RADIUS+'&type='+TYPE+'&keyword='+KEYWORDS+'&key='+API_KEY)
health_text = healthclubs.text
healthjson = json.loads(health_text)
KEYWORDS = 'crossfit'
crossfit = requests.get(BASE_URL+'location='+LOCATION+'&radius='+RADIUS+'&type='+TYPE+'&keyword='+KEYWORDS+'&key='+API_KEY)
cross_text = crossfit.text
crossjson = json.loads(cross_text)
As V-R suggested in a comment, you can go further and define a function, which lets you reuse the request logic in other places in your application.
Function implementation
def makeRequest(location, radius, type, keywords):
    BASE_URL = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json?'
    API_KEY = 'AIzaSyDOwVK7bGap6b5Mpct1cjKMp7swFGi3uGg'
    result = requests.get(BASE_URL+'location='+location+'&radius='+radius+'&type='+type+'&keyword='+keywords+'&key='+API_KEY)
    # requests.get returns a Response object, so parse its text rather than the object
    jsonResult = json.loads(result.text)
    return jsonResult
Function invocation
# Avoid naming the result "json": that would shadow the json module used inside makeRequest
alljson = makeRequest('38.9208,-77.036', '2500', 'gym', '')
Let me know if there is an issue
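A further variation, sketched here under some assumptions, is to build the query string from a dict so the key appears in exactly one place and the keyword is simply omitted when it is empty. The function name build_search_url and the environment variable PLACES_API_KEY are hypothetical; the example only builds URLs and sends no request.

```python
import os
from urllib.parse import urlencode

BASE_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"
# Hypothetical environment variable; the key is read in one place only
API_KEY = os.environ.get("PLACES_API_KEY", "YOUR_API_KEY")

def build_search_url(keyword=None, location="38.9208,-77.036", radius=2500, place_type="gym"):
    """Return the nearby-search URL, adding keyword only when one is given."""
    params = {"location": location, "radius": radius, "type": place_type}
    if keyword:
        params["keyword"] = keyword
    params["key"] = API_KEY
    # urlencode percent-encodes values (the comma in location becomes %2C)
    return f"{BASE_URL}?{urlencode(params)}"

urls = {name: build_search_url(name) for name in ("healthclub", "crossfit")}
urls["all"] = build_search_url()
for name, url in urls.items():
    print(name, url)
```

If you are using requests anyway, the same params dict can be passed as requests.get(BASE_URL, params=params), which does the encoding for you. Reading the key from the environment also keeps it out of source control.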

create pull down from folders in nuke, then evaluate first pull down to populate second pull down in python

I am trying to create a panel that opens on Nuke startup and sets a few parameters.
What I want is a series of pulldowns on the same panel, with the items in each pulldown taken from folders.
The problem I am having is this: I would like to set the first pulldown and have the second pulldown's menu items reflect that choice, and so on for each pulldown. Basically I am digging down a folder structure, with each pulldown's result used as a variable.
I have not got very far, but here is what I have:
import os
import nuke
import nukescripts
## define panel
pm = nuke.Panel("project Manager")
## create pulldown menus
jobPath = pm.addEnumerationPulldown( 'project', os.walk('/Volumes/Production_02/000_jobs/projects').next()[1])
seqPath = pm.addEnumerationPulldown('sequence', os.walk('/Volumes/Production_02/000_jobs/projects').next()[1])
shotPath = pm.addEnumerationPulldown('shot', os.walk('/Volumes/Production_02/000_jobs/projects').next()[1])
print jobPath
print seqPath
print shotPath
#pm.addKnob(job)
#pm.addKnob(seq)
#pm.addKnob(shot)
pm.show()
Also, the strings that appear in the pulldowns are surrounded by [' ' and so on?
cheers
-adam

You probably want to use a PythonPanel, rather than the old-style Panel, which is basically a TCL wrapper. That way, you can get callbacks when the knobs in the panel are changed.
Here's a basic example:
import os
import nuke
import nukescripts.panels
class ProjectManager(nukescripts.panels.PythonPanel):
    def __init__(self, rootDir='/Volumes/Production_02/000_jobs/projects'):
        super(ProjectManager, self).__init__('ProjectManager', 'id.ProjectManager')
        self.rootDir = rootDir
        self.project = self.sequence = self.shot = None
        projectDirs = [x for x in os.listdir(rootDir)
                       if os.path.isdir(os.path.join(rootDir, x))]
        self.project = projectDirs[0]
        self.projectEnum = nuke.Enumeration_Knob('project', 'project', projectDirs)
        self.addKnob(self.projectEnum)
        self.seqEnum = nuke.Enumeration_Knob('sequence', 'sequence', [])
        self.addKnob(self.seqEnum)
        self.shotEnum = nuke.Enumeration_Knob('shot', 'shot', [])
        self.addKnob(self.shotEnum)
        # Populate the sequence and shot pulldowns for the initial project
        self._projectChanged()
    def _projectChanged(self):
        self.project = self.projectEnum.value()
        projectDir = os.path.join(self.rootDir, self.project)
        projectSeqDirs = [x for x in os.listdir(projectDir)
                          if os.path.isdir(os.path.join(projectDir, x))]
        self.seqEnum.setValues(projectSeqDirs)
        self._sequenceChanged()
    def _sequenceChanged(self):
        s = self.seqEnum.value()
        if s:
            self.sequence = s
            seqDir = os.path.join(self.rootDir, self.project, s)
            seqShotDirs = [x for x in os.listdir(seqDir)
                           if os.path.isdir(os.path.join(seqDir, x))]
        else:
            self.sequence = None
            seqShotDirs = []
        self.shotEnum.setValues(seqShotDirs)
        self._shotChanged()
    def _shotChanged(self):
        # Record the currently selected shot (None when the list is empty)
        self.shot = self.shotEnum.value() or None
    def knobChanged(self, knob):
        # Called by Nuke whenever a knob on the panel changes
        if knob is self.projectEnum:
            self._projectChanged()
        elif knob is self.seqEnum:
            self._sequenceChanged()
        elif knob is self.shotEnum:
            self._shotChanged()
p = ProjectManager()
if p.showModalDialog():
    print p.project, p.sequence, p.shot
Note that this example is only to demonstrate the basic design of a PythonPanel subclass. It has a few small logic issues (in the context of Nuke), and is written to be as clear as possible, rather than as efficient or idiomatic as possible.
Anyway, hopefully this gives you an idea of how to go about building what you're after.
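The directory listing that the panel repeats at each level can be factored into one helper and tried outside Nuke. This is only a sketch: list_subdirs is a hypothetical name, the folder tree is a throwaway one created in a temporary directory, and it is written for Python 3 (os.scandir), so an older Python 2 based Nuke would need the os.listdir form used above.

```python
import os
import tempfile

def list_subdirs(path):
    """Return the sorted names of the directories directly inside path."""
    return sorted(entry.name for entry in os.scandir(path) if entry.is_dir())

# Throwaway project/sequence/shot tree standing in for the real jobs folder
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "projA", "seq010", "shot001"))
    os.makedirs(os.path.join(root, "projA", "seq020"))
    open(os.path.join(root, "notes.txt"), "w").close()  # plain files are ignored

    # Each level is listed from the choice made at the level above
    projects = list_subdirs(root)
    sequences = list_subdirs(os.path.join(root, projects[0]))
    shots = list_subdirs(os.path.join(root, projects[0], sequences[0]))

print(projects, sequences, shots)
```

Each list contains plain strings, which is what a pulldown expects; passing the string representation of a list instead is one way to end up with [' and ' characters in the menu items.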

XML Parsing in Python using document builder factory

I am working with STAF and STAX, where Python is used for scripting. I am new to Python.
My task is to parse an XML file in Python using DocumentBuilderFactory.
The XML file I am trying to parse is:
<?xml version="1.0" encoding="utf-8"?>
<operating_system>
  <unix_80sp1>
    <tests type="quick_sanity_test">
      <prerequisitescript>preparequicksanityscript</prerequisitescript>
      <acbuildpath>acbuildpath</acbuildpath>
      <testsuitscript>test quick sanity script</testsuitscript>
      <testdir>quick sanity dir</testdir>
    </tests>
    <machine_name>u80sp1_L004</machine_name>
    <machine_name>u80sp1_L005</machine_name>
    <machine_name>xyz.pxy.dxe.cde</machine_name>
    <vmware id="155.35.3.55">144.35.3.90</vmware>
    <vmware id="155.35.3.56">144.35.3.91</vmware>
  </unix_80sp1>
</operating_system>
I need to read all the tags.
The machine_name tags should be read into a list,
say a list called machname.
So machname should be [u80sp1_L004,u80sp1_L005,xyz.pxy.dxe.cde] after reading the tags.
I also need all the vmware tags:
all attributes should be vmware_attr = [155.35.3.55,155.35.3.56]
all vmware values should be vmware_value = [144.35.3.90,144.35.3.91]
I am able to read all tags properly except the vmware and machine_name tags.
I am using the following code (I am new to XML and VMware). Help required.
The code below needs to be modified.
factory = DocumentBuilderFactory.newInstance();
factory.setValidating(1)
factory.setIgnoringElementContentWhitespace(0)
builder = factory.newDocumentBuilder()
document = builder.parse(xmlFileName)
vmware_value = None
vmware_attr = None
machname = None
# Get the text value for the element with tag name "vmware"
nodeList = document.getElementsByTagName("vmware")
for i in range(nodeList.getLength()):
    node = nodeList.item(i)
    if node.getNodeType() == Node.ELEMENT_NODE:
        children = node.getChildNodes()
        for j in range(children.getLength()):
            thisChild = children.item(j)
            if (thisChild.getNodeType() == Node.TEXT_NODE):
                vmware_value = thisChild.getNodeValue()
                # vmware_attr = ???  (what method to use?)
# Get the text value for the element with tag name "machine_name"
nodeList = document.getElementsByTagName("machine_name")
for i in range(nodeList.getLength()):
    node = nodeList.item(i)
    if node.getNodeType() == Node.ELEMENT_NODE:
        children = node.getChildNodes()
        for j in range(children.getLength()):
            thisChild = children.item(j)
            if (thisChild.getNodeType() == Node.TEXT_NODE):
                machname = thisChild.getNodeValue()
Also, how can I check whether a tag exists at all? I need to code the parsing properly.

You need to initialize vmware_value, vmware_attr and machname as lists rather than as single values, so instead of this:
vmware_value = None
vmware_attr = None
machname = None
do this:
vmware_value = []
vmware_attr = []
machname = []
Then, to add items to the list, use the append method on your lists. E.g.:
factory = DocumentBuilderFactory.newInstance();
factory.setValidating(1)
factory.setIgnoringElementContentWhitespace(0)
builder = factory.newDocumentBuilder()
document = builder.parse(xmlFileName)
vmware_value = []
vmware_attr = []
machname = []
# Get the text value for the element with tag name "vmware"
nodeList = document.getElementsByTagName("vmware")
for i in range(nodeList.getLength()):
    node = nodeList.item(i)
    vmware_attr.append(node.attributes["id"].value)
    if node.getNodeType() == Node.ELEMENT_NODE:
        children = node.getChildNodes()
        for j in range(children.getLength()):
            thisChild = children.item(j)
            if (thisChild.getNodeType() == Node.TEXT_NODE):
                vmware_value.append(thisChild.getNodeValue())
I've also edited the code so that it should append the correct values to vmware_attr and vmware_value.
I had to assume that STAX uses xml.dom syntax, so if that isn't the case, you will have to adapt my suggestion. In particular, if node is a Java DOM Element (which DocumentBuilderFactory suggests), the equivalent call for the attribute is node.getAttribute("id").
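For comparison, here is how the same extraction could look in plain Python using the standard library's xml.etree.ElementTree. This is a different API from the DocumentBuilderFactory used above, so it is only a sketch of the logic (lists built from repeated tags, an attribute read per element, and an existence check), and the XML is a trimmed copy of the file in the question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question (the tests element is left out)
XML = """<operating_system>
  <unix_80sp1>
    <machine_name>u80sp1_L004</machine_name>
    <machine_name>u80sp1_L005</machine_name>
    <machine_name>xyz.pxy.dxe.cde</machine_name>
    <vmware id="155.35.3.55">144.35.3.90</vmware>
    <vmware id="155.35.3.56">144.35.3.91</vmware>
  </unix_80sp1>
</operating_system>"""

root = ET.fromstring(XML)

# iter() walks the whole tree, so the nesting depth of the tags does not matter
machname = [el.text for el in root.iter("machine_name")]
vmware_attr = [el.get("id") for el in root.iter("vmware")]
vmware_value = [el.text for el in root.iter("vmware")]

# find() returns None when no matching element exists
has_tests = root.find(".//tests") is not None
has_vmware = root.find(".//vmware") is not None

print(machname, vmware_attr, vmware_value, has_tests, has_vmware)
```

With the DOM API the existence check has the same shape: getElementsByTagName returns a node list, and a length of zero means the tag is absent.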
