Confused regarding the usage of __init__ & self in a class - python

I first wrote the code to get the information I wanted from the internet, and it works. Now I'm trying to tidy it up by moving it into functions inside a class, but I'm a bit confused about how to use self and _init_. Currently the code isn't working as I want: it isn't adding the information to my dictionary.
As I understand it, you have to add self as a parameter to every function you create in a class. But I don't think I'm using _init_ correctly.
from bs4 import BeautifulSoup
import requests
# Importing data from Nasdaq
page_link = "https://www.nasdaq.com/symbol/aapl/financials?query=balance-sheet"
page_response = requests.get(page_link, timeout=1000)
page_content = BeautifulSoup(page_response.content, "lxml")
# Creating class that gather essential stock information
class CompanySheet:
    # creating dictionary to store stock information
    def __init__(self):
        self.stockInfo = {
            "ticker": "",
            "sharePrice": "",
            "assets": "",
            "liabilities": "",
            "shareholderEquity": ""
        }
    def ticker(self):
        # Finding ticker
        self.tickerSymbol = page_content.find("div", attrs={"class":"qbreadcrumb"})
        self.a_TickerList = self.tickerSymbol.findAll("a")
        self.a_TickerList = (self.a_TickerList[2].text)
        # Adding ticker to dictionary
        self.stockInfo["ticker"] = self.a_TickerList
        print(self.a_TickerList)
    def share(self):
        # Finding share price
        self.sharePrice = page_content.findAll("div", attrs={"id":"qwidget_lastsale"})
        self.aSharePrice = (self.sharePrice[0].text)
        # Transforming share price to desired format
        self.aSharePrice = str(self.aSharePrice[1:]).replace( ',' , '' )
        self.aSharePrice = float(self.aSharePrice)
        # Adding results to dictionary
        self.stockInfo["sharePrice"] = self.aSharePrice
    """
    def assets(self):
        # Finding total assets
        totalAssets = page_content.findAll("tr", attrs={"class":"net"})[1]
        td_assetList = totalAssets.findAll("td")
        tdAssets = (td_assetList[22].text)
        # Transforming share price to desired format
        tdAssets = str(tdAssets[1:]).replace( ',' , '' )
        tdAssets = float(tdAssets)
        # Adding results to dictionary
        self.stockInfo["assets"] = tdAssets
    def liabilites(self):
        # Finding total liabilities
        totalLiabilities = page_content.findAll("tr", attrs={"class":"net"})[3]
        td_liabilityList = totalLiabilities.findAll("td")
        tdLiabilities = (td_liabilityList[24].text)
        # Transforming share price to desired format
        tdLiabilities = str(tdLiabilities[1:]).replace( ',' , '' )
        tdLiabilities = float(tdLiabilities)
        # Adding results to dictionary
        self.stockInfo["liabilities"] = tdLiabilities
    def equity(self):
        # Finding shareholder equity
        netEquity = page_content.findAll("tr", attrs={"class":"net"})[4]
        td_equityList = netEquity.findAll("td")
        tdShareholderEquity = (td_equityList[24].text)
        # Transforming shareholder equity to desired format
        tdShareholderEquity = str(tdShareholderEquity[1:]).replace( ',' , '' )
        tdShareholderEquity = float(tdShareholderEquity)
        # Adding results to dictionary
        self.stockInfo["shareholderEquity"] = tdShareholderEquity
    """
companySheet = CompanySheet()
print(companySheet.stockInfo)
All I want is for each function to add its information to my dictionary. I then want to access it outside of the class. Can someone help to clarify how I can use _init_ in this scenario, or whether I even have to use it?

__init__ is the constructor (strictly speaking, the initializer); Python calls it automatically when an instance of the class is created. self refers to that instance and is used to access its methods and attributes.
In your code, first make sure the method name has two underscores on each side:
_init_(self) to __init__(self)
Then, in the methods:
def share(self):
    # Finding share price
    sharePrice = page_content.findAll("div", attrs={"id":"qwidget_lastsale"})
    aSharePrice = sharePrice[0].text
    # Transforming share price to desired format (drop the leading currency sign and the commas)
    aSharePrice = aSharePrice[1:].replace(',', '')
    aSharePrice = float(aSharePrice)
    # Adding results to dictionary
    self.stockInfo["sharePrice"] = aSharePrice
Similarly, in the remaining methods, keep intermediate values in local variables and go through self only for what must persist on the instance, such as self.stockInfo.
Now, you also need to call the methods which are updating your dictionary.
So, after you have created the object, call the methods through the object and then print the dictionary, like this:
companySheet = CompanySheet()
companySheet.share()
print(companySheet.stockInfo)
With these changes, the dictionary should be populated.
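To see the two pieces working together without the scraping, here is a minimal sketch. The class name StockSheet, its methods, and the hard-coded strings are made up for illustration and stand in for the parsed page content.

```python
class StockSheet:
    def __init__(self):
        # Runs once per instance, right when StockSheet() is called.
        self.stock_info = {"ticker": "", "sharePrice": ""}

    def set_ticker(self, raw_text):
        # A local variable is enough for intermediate values.
        cleaned = raw_text.strip().upper()
        # Only the result that must outlive the call goes on self.
        self.stock_info["ticker"] = cleaned

    def set_share_price(self, raw_text):
        # Drop the leading currency sign and the thousands separators.
        self.stock_info["sharePrice"] = float(raw_text[1:].replace(",", ""))


sheet = StockSheet()
before = dict(sheet.stock_info)   # snapshot taken before any method has run
sheet.set_ticker(" aapl ")
sheet.set_share_price("$1,234.50")
print(before)
print(sheet.stock_info)
```

The snapshot still holds empty strings, which is the same symptom as in the question: creating the object only runs __init__, and the dictionary is filled in only once the other methods are called.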

Related

Python - How to iterate over nested data

I've seen a lot of information on how to iterate over nested data, but none seem to be applicable to my problem - perhaps my method of nesting is not ideal...
I'm working on a web scraping problem to increase the efficiency of my workflow (breaking out and presenting data in a better format than the website provides). I have a class called ContractorData that contains information about a contractor. It has a property subs, which is a list that contains more ContractorData objects, and this nesting can continue (each contractor can have sub contractors, and those sub contractors can have sub contractors ...).
I have an efficient way to build this hierarchy, but I'm now struggling to find a way to iterate over every contractor down the hierarchy.
class ContractorData():
    def __init__(self, soup:BeautifulSoup, parentId=None):
        self.contractorName = soup.find('a', id=True).getText()
        self.id = int(soup.find('a', id=True).get('href').split('&')[-1].split('=')[1])
        status_options = ['Enrolled', 'Excluded', 'Pending']
        self.status = 'Unknown'
        for i in range(3):
            cls = f'contractorStatusCol{i+1}'
            if 'gray' not in soup.find('div', class_=cls).find('img').get('src'):
                self.status = status_options[i]
                break
        self.date_enrolled = soup.find('div', class_ = 'contractorStatusCol4').get_text().replace(u'\xa0', '')
        self.has_subs = soup.find('a', class_="expand") is not None
        self.subs = []
        self.parent = parentId
Building the hierarchy (general_contractor is the top of the hierarchy):
def load_subs(pid, cid, level):
    sub_data = {'mode': 'loadEnrollmentCRUD', 'projectId': pid, 'contractorId': cid, 'level':level}
    sub_resp = session.post('https://my-site.com/ajax-contractor.html', data=sub_data)
    sub_soup = BeautifulSoup(sub_resp.text, 'html.parser')
    return [ContractorData(x, cid) for x in sub_soup.find_all('div', class_='contractorStatusCRUD')]
level = 0
has_sub_list = [general_contractor]
while has_sub_list:
    new_sub_list = []
    level = level + 1
    for x in has_sub_list:
        x.subs = load_subs(PROJECT_ID, x.id, level)
        new_sub_list.extend([y for y in x.subs if y.has_subs])
    has_sub_list = new_sub_list
I'm thinking to use another while loop like the one I used to build the data, but I can't help but think my data architecture isn't optimal for this type of problem.
Edit based on comments:
The goal is to traverse through all contractors under the general_contractor and check their status to see if I need to take action on them.
I did just learn about a recursive function that can call itself which may work for this.
def walk(sub):
    if sub.status != 'Enrolled':
        # handle this case
        pass
    # sub.subs is a list, so recurse into each sub contractor in turn
    for child in sub.subs:
        walk(child)
Thanks!
Recursive functions to the rescue! I didn't know this was possible, but it's very crisp and clean IMO.
flags = []
def check_subs(contractor:ContractorData):
    if contractor.status == 'Pending': flags.append(contractor)
    for x in contractor.subs:
        check_subs(x)
check_subs(general_contractor)
for c in flags:
    print(f'{c.contractorName}, {c.status}')
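If several different checks need to run over the same hierarchy, one option is to separate the traversal from the check by writing the recursion as a generator. The sketch below uses a hypothetical Node dataclass as a stand-in for ContractorData (only name, status and subs), with invented sample data.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    # Hypothetical stand-in for ContractorData: only the fields the walk needs.
    name: str
    status: str
    subs: List["Node"] = field(default_factory=list)


def iter_tree(node: Node) -> Iterator[Node]:
    """Yield node, then every descendant, depth-first."""
    yield node
    for child in node.subs:
        yield from iter_tree(child)


general = Node("GC", "Enrolled", [
    Node("Sub A", "Pending", [Node("Sub A1", "Excluded")]),
    Node("Sub B", "Enrolled"),
])

# The same traversal serves different filters without a global list.
pending = [n.name for n in iter_tree(general) if n.status == "Pending"]
not_enrolled = [n.name for n in iter_tree(general) if n.status != "Enrolled"]
print(pending, not_enrolled)
```

This keeps the list-of-children structure from the question; the nesting itself is fine for this kind of problem, it just needs a traversal that visits one node at a time.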

What is the best way to return a variable or call a function to maximize code reuse?

I was wondering if I could get some input from some seasoned Python experts. I have a couple of questions.
I am extracting data from an API request and calculating the total vulnerabilities.
What is the best way to return this data so that I can use it in another function?
How can I add up all the vulnerabilities? Right now it only adds them up 500 at a time; I'd like the sum of every vulnerability.
def _request():
    third_party_patching_filer = {
        "asset": "asset.agentKey IS NOT NULL",
        "vulnerability" : "vulnerability.categories NOT IN ['microsoft patch']"}
    headers = _headers()
    print(headers)
    url1 = f"https://us.api.insight.rapid7.com/vm/v4/integration/assets"
    resp = requests.post(url=url1, headers=headers, json=third_party_patching_filer, verify=False).json()
    jsonData = resp
    #print(jsonData)
    has_next_cursor = False
    nextKey = ""
    if "cursor" in jsonData["metadata"]:
        has_next_cursor = True
        nextKey = jsonData["metadata"]["cursor"]
    while has_next_cursor:
        url2 = f"https://us.api.insight.rapid7.com/vm/v4/integration/assets?&size=500&cursor={nextKey}"
        resp2 = requests.post(url=url2, headers=headers, json=third_party_patching_filer, verify=False).json()
        cursor = resp2["metadata"]
        print(cursor)
        if "cursor" in cursor:
            nextKey = cursor["cursor"]
            print(f"next key {nextKey}")
            #print(desktop_support)
            for data in resp2["data"]:
                for tags in data['tags']:
                    total_critical_vul_osswin = []
                    total_severe_vul_osswin = []
                    total_modoer_vuln_osswin = []
                    if tags["name"] == 'OSSWIN':
                        print("OSSWIN")
                        critical_vuln_osswin = data['critical_vulnerabilities']
                        severe_vuln_osswin = data['severe_vulnerabilities']
                        modoer_vuln_osswin = data['moderate_vulnerabilities']
                        total_critical_vul_osswin.append(critical_vuln_osswin)
                        total_severe_vul_osswin.append(severe_vuln_osswin)
                        total_modoer_vuln_osswin.append(modoer_vuln_osswin)
                        print(sum(total_critical_vul_osswin))
                        print(sum(total_severe_vul_osswin))
                        print(sum(total_modoer_vuln_osswin))
                    if tags["name"] == 'DESKTOP_SUPPORT':
                        print("Desktop")
                        total_critical_vul_desktop = []
                        total_severe_vul_desktop = []
                        total_modorate_vuln_desktop = []
                        critical_vuln_desktop = data['critical_vulnerabilities']
                        severe_vuln_desktop = data['severe_vulnerabilities']
                        moderate_vuln_desktop = data['moderate_vulnerabilities']
                        total_critical_vul_desktop.append(critical_vuln_desktop)
                        total_severe_vul_desktop.append(severe_vuln_desktop)
                        total_modorate_vuln_desktop.append(moderate_vuln_desktop)
                        print(sum(total_critical_vul_desktop))
                        print(sum(total_severe_vul_desktop))
                        print(sum(total_modorate_vuln_desktop))
                    else:
                        pass
        else:
            has_next_cursor = False
If you have a lot of values to pass around, consider combining them in a dict. Then you can just return the dict and pass it along to the next function that needs that data. Another approach would be to create a class and either access its attributes directly or have helper methods that do so. The latter is a cleaner solution than a dict, since with a dict you have to quote every key, and with a class you can easily add functionality beyond just being a container for a bunch of instance variables.
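As a rough sketch of the class-based option: the VulnTotals name, its fields, and the summarize and report helpers are assumptions for illustration, not part of the Rapid7 API. Only the three dictionary keys are taken from the question's code.

```python
from dataclasses import dataclass


@dataclass
class VulnTotals:
    # Hypothetical container; the keys read in add() mirror those in the question.
    critical: int = 0
    severe: int = 0
    moderate: int = 0

    def add(self, asset: dict) -> None:
        # Fold one asset record into the running totals.
        self.critical += asset["critical_vulnerabilities"]
        self.severe += asset["severe_vulnerabilities"]
        self.moderate += asset["moderate_vulnerabilities"]


def summarize(assets):
    totals = VulnTotals()
    for asset in assets:
        totals.add(asset)
    return totals  # the caller gets one object instead of three loose values


def report(totals: VulnTotals) -> str:
    # A second function that consumes what the first one returned.
    return f"critical={totals.critical} severe={totals.severe} moderate={totals.moderate}"


sample = [
    {"critical_vulnerabilities": 2, "severe_vulnerabilities": 5, "moderate_vulnerabilities": 1},
    {"critical_vulnerabilities": 0, "severe_vulnerabilities": 3, "moderate_vulnerabilities": 4},
]
print(report(summarize(sample)))
```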
If you want the total across all the data, you should put these initializations:
total_critical_vul_osswin = []
total_severe_vul_osswin = []
total_modoer_vuln_osswin = []
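before the while has_next_cursor loop (and similarly for the desktop totals). As your code stands, the lists are re-created inside the loops, so each sum only covers the records seen since the last reset rather than every page of results. Compute and print the sums once, after the loop has finished.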
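Here is a minimal sketch of that idea with the network calls replaced by canned pages. PAGES and fetch_page are hypothetical stand-ins; the response shape (metadata/cursor, data/tags) follows the question's code and the numbers are invented.

```python
from collections import defaultdict

# Fake paged responses standing in for the API.
PAGES = {
    "": {"metadata": {"cursor": "p2"}, "data": [
        {"tags": [{"name": "OSSWIN"}], "critical_vulnerabilities": 1},
        {"tags": [{"name": "DESKTOP_SUPPORT"}], "critical_vulnerabilities": 4},
    ]},
    "p2": {"metadata": {}, "data": [
        {"tags": [{"name": "OSSWIN"}], "critical_vulnerabilities": 2},
    ]},
}


def fetch_page(cursor):
    # Hypothetical helper; a real version would call requests.post(...).json().
    return PAGES[cursor]


def total_critical_by_tag(wanted=("OSSWIN", "DESKTOP_SUPPORT")):
    totals = defaultdict(int)      # created once, before any page is read
    cursor = ""
    while True:
        page = fetch_page(cursor)
        for asset in page["data"]:
            for tag in asset["tags"]:
                if tag["name"] in wanted:
                    totals[tag["name"]] += asset["critical_vulnerabilities"]
        cursor = page["metadata"].get("cursor")
        if not cursor:             # no cursor means this was the last page
            break
    return dict(totals)            # returned, so another function can use it


print(total_critical_by_tag())
```

Because the accumulator lives outside the paging loop and is returned at the end, the totals cover every page, and the caller decides what to do with them.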

extract dictionary values from non-subscriptable object-type in python

i'm a python novice, trying to learn and be useful at work at the same time
we use DespatchBay to send parcels. they have a SOAP API which i don't entirely understand, and am using an SDK they released.
before booking a collection and producing a label my code queries the api to get available services, and available dates, and returns what i think are custom object-types containing the info i need. i want to extract and then print information from these objects so that i can confirm the correct details have been used.
postcode = "NW1 4RY"
street_num = 1
recipient_address = client.find_address(postcode, street_num)
print (recipient_address)
yields:
(AddressType){
CompanyName = "London Zoo"
Street = "Regents Park"
Locality = None
TownCity = "London"
County = None
PostalCode = "NW1 4RY"
CountryCode = "GB"
}
i can see there's a dictionary there, and i want to drill down into it to extract details, but i don't understand the "(AddressType)" before the dictionary - how do i get past it and call values from the dictionary?
AddressType has some ref here but it doesn't shine much light for me
thanks for any help you can offer!
full code: sdk ref
import os
from despatchbay.despatchbay_sdk import DespatchBaySDK
from pprint import pprint
api_user = os.getenv('DESPATCH_API_USER')
api_key = os.getenv('DESPATCH_API_KEY')
client = DespatchBaySDK(api_user=api_user, api_key=api_key)
sender_id = '5536'
# inputs
postcode = "NW1 4RY"
street_num = 1
customer = "Testy Mctestson"
phone = "07666666666"
email = "testicles#tested.com"
num_boxes = 2
collection_date = '2022-09-11'
recipient_address = client.find_address(postcode, street_num)
recipient = client.recipient(
    name=customer,
    telephone=phone,
    email=email,
    recipient_address=recipient_address
)
print (recipient_address)
parcels = []
parcel_names = []
for x in range(num_boxes):
    parcelname = "my_parcel_" + str(x + 1)
    parcel_names.append(parcelname)
for my_parcel in parcel_names:
    go_parcel = client.parcel(
        contents="Radios",
        value=500,
        weight=6,
        length=60,
        width=40,
        height=40,
    )
    parcels.append(go_parcel)
sender = client.sender(
    address_id=sender_id
)
shipment_request = client.shipment_request(
    parcels=parcels,
    client_reference=customer,
    collection_date=collection_date,
    sender_address=sender,
    recipient_address=recipient,
    follow_shipment='true'
)
services = client.get_available_services(shipment_request)
shipment_request.service_id = services[0].service_id
dates = client.get_available_collection_dates(sender, services[0].courier.courier_id)
print(customer + "'s shipment of",num_boxes, "parcels will be collected from: ",recipient['RecipientAddress'], "on", dates[0])
shipment_request.collection_date = dates[0]
added_shipment = client.add_shipment(shipment_request)
client.book_shipments([added_shipment])
shipment_return = client.get_shipment(added_shipment)
label_pdf = client.get_labels(shipment_return.shipment_document_id)
label_pdf.download('./' + customer + '.pdf')
You do not know exactly how they store the data inside their object, even if its printed form looks like a dictionary. You could run print(dir(recipient_address)) to see which attributes and methods the object exposes, then inspect the ones that look like they hold the data you want. Of course, you should always follow the published contract for interacting with these objects, as the implementation details can change. I've examined the source code of this object, published here:
https://github.com/despatchbay/despatchbay-python-sdk/blob/master/despatchbay/despatchbay_entities.py#L169
It turns out that it doesn't use a dictionary. It looks like you are meant to access the data via attributes, as follows:
recipient_address.company_name
recipient_address.street
recipient_address.locality
recipient_address.town_city
recipient_address.county
recipient_address.postal_code
recipient_address.country_code
I agree, their Python SDK could use better documentation.
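The general inspection technique can be tried on any object. The sketch below uses a made-up AddressLike class, not the real DespatchBay entity, to show how something can print like a dictionary while only supporting attribute access.

```python
class AddressLike:
    # Hypothetical stand-in for an SDK entity; not the real DespatchBay class.
    def __init__(self, company_name, town_city, postal_code):
        self.company_name = company_name
        self.town_city = town_city
        self.postal_code = postal_code

    def __repr__(self):
        # A custom repr can look like a dictionary without being one.
        return f"(AddressType){{ CompanyName = {self.company_name!r} }}"


addr = AddressLike("London Zoo", "London", "NW1 4RY")

# Public attribute names, skipping the dunder entries that dir() also lists.
public = [name for name in dir(addr) if not name.startswith("_")]
print(public)

print(addr.town_city)                  # plain attribute access
print(getattr(addr, "postal_code"))    # same thing, with the name given as a string
print(vars(addr))                      # instance attributes as a real dict

try:
    addr["town_city"]                  # subscripting fails: no __getitem__ defined
except TypeError as exc:
    print("not subscriptable:", exc)
```

Note that vars() only works for objects that keep their attributes in an instance __dict__, so treat it as a debugging aid rather than something to rely on with a third-party SDK.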

Python Flask and SQLAlchemy, selecting all data from a column

I am attempting to query all rows for a column called show_id. I would then like to compare each potential item to be added to the DB with the results. Now the simplest way I can think of doing that is by checking if each show is in the results. If so pass etc. However the results from the below snippet are returned as objects. So this check fails.
Is there a better way to create the query to achieve this?
shows_inDB = Show.query.filter(Show.show_id).all()
print(shows_inDB)
Results:
<app.models.user.Show object at 0x10c2c5fd0>,
<app.models.user.Show object at 0x10c2da080>,
<app.models.user.Show object at 0x10c2da0f0>
Code for the entire function:
def save_changes_show(show_details):
    """
    Save the changes to the database
    """
    try:
        shows_inDB = Show.query.filter(Show.show_id).all()
        print(shows_inDB)
        for show in show_details:
            #Check the show isnt already in the DB
            if show['id'] in shows_inDB:
                print(str(show['id']) + ' Already Present')
            else:
                #Add show to DB
                tv_show = Show(
                    show_id = show['id'],
                    seriesName = str(show['seriesName']).encode(),
                    aliases = str(show['aliases']).encode(),
                    banner = str(show['banner']).encode(),
                    seriesId = str(show['seriesId']).encode(),
                    status = str(show['status']).encode(),
                    firstAired = str(show['firstAired']).encode(),
                    network = str(show['network']).encode(),
                    networkId = str(show['networkId']).encode(),
                    runtime = str(show['runtime']).encode(),
                    genre = str(show['genre']).encode(),
                    overview = str(show['overview']).encode(),
                    lastUpdated = str(show['lastUpdated']).encode(),
                    airsDayOfWeek = str(show['airsDayOfWeek']).encode(),
                    airsTime = str(show['airsTime']).encode(),
                    rating = str(show['rating']).encode(),
                    imdbId = str(show['imdbId']).encode(),
                    zap2itId = str(show['zap2itId']).encode(),
                    added = str(show['added']).encode(),
                    addedBy = str(show['addedBy']).encode(),
                    siteRating = str(show['siteRating']).encode(),
                    siteRatingCount = str(show['siteRatingCount']).encode(),
                    slug = str(show['slug']).encode()
                )
                db.session.add(tv_show)
                db.session.commit()
    except Exception:
        print(traceback.print_exc())
I have decided to use the method above and extract the data I wanted into a list, comparing each show to the list.
show_compare = []
shows_inDB = Show.query.filter().all()
for item in shows_inDB:
    show_compare.append(item.show_id)
for show in show_details:
    #Check the show isnt already in the DB
    if show['id'] in show_compare:
        print(str(show['id']) + ' Already Present')
    else:
        #Add show to DB
For querying a specific column value, have a look at this question: Flask SQLAlchemy query, specify column names. This is the example code given in the top answer there:
result = SomeModel.query.with_entities(SomeModel.col1, SomeModel.col2)
The crux of your problem is that you want to create a new Show instance if that show doesn't already exist in the database.
Querying the database for all shows and looping through the result for each potential new show might become very inefficient if you end up with a lot of shows in the database, and finding an object by identity is what an RDBMS does best!
This function will check to see if an object exists, and create it if not. Inspired by this answer:
def add_if_not_exists(model, **kwargs):
    if not model.query.filter_by(**kwargs).first():
        instance = model(**kwargs)
        db.session.add(instance)
So your example would look like:
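def add_if_not_exists(model, **kwargs):
    # Only create the row when no existing row matches the given column values
    if not model.query.filter_by(**kwargs).first():
        instance = model(**kwargs)
        db.session.add(instance)
for show in show_details:
    # The model's column is show_id; the incoming dict uses the key 'id'
    add_if_not_exists(Show, show_id=show['id'])
# Persist the newly added rows
db.session.commit()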
If you really want to query all shows up front, you could collect the IDs into a set instead of a list, which will speed up your membership test.
E.g:
show_compare = {item.show_id for item in Show.query.all()}
for show in show_details:
    # ... same as your code
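To illustrate why the original membership test never matched and how the set version behaves, here is a database-free sketch. The Show class below is a plain-Python stand-in for the model (an assumption for illustration), and the IDs are invented.

```python
class Show:
    # Hypothetical stand-in for the SQLAlchemy model; only show_id matters here.
    def __init__(self, show_id):
        self.show_id = show_id


rows = [Show(101), Show(102), Show(103)]            # what Show.query.all() might return
incoming = [{"id": 102}, {"id": 200}, {"id": 200}]  # candidate shows, one repeated

# An int is never equal to a Show object, so this test is always False.
naive_hits = [s["id"] for s in incoming if s["id"] in rows]

known_ids = {row.show_id for row in rows}           # set of plain ints
to_add = []
for show in incoming:
    if show["id"] in known_ids:
        continue                                     # already stored
    to_add.append(show["id"])
    known_ids.add(show["id"])                        # guards against repeats in the batch

print(naive_hits, to_add)
```

Adding each new ID to the set as you go also protects against the same show appearing twice in show_details, which the list-based version in the question would not catch.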

python print() doesnt output what I expect

I made a small web-crawler in one function, upso_final.
If I print(upso_final()), I get 15 lists that include title, address, phone #. However, I want to print out only title, so I made variable title a global string. When I print it, I get only 1 title, the last one in the run. I want to get all 15 titles.
from __future__ import unicode_literals
import requests
from scrapy.selector import Selector
import scrapy
import pymysql
def upso_final(page=1):
    def upso_from_page(url):
        html = fetch_page(url)
        sel = Selector(text=html)
        global title,address,phone
        title = sel.css('h1::text').extract()
        address = sel.css('address::text').extract()
        phone = sel.css('.mt1::text').extract()
        return {
            'title' : title,
            'address' : address,
            'phone' : phone
        }
    def upso_list_from_listpage(url):
        html = fetch_page(url)
        sel = Selector(text=html)
        upso_list = sel.css('.title_list::attr(href)').extract()
        return upso_list
    def fetch_page(url):
        r = requests.get(url)
        return r.text
    list_url = "http://yp.koreadaily.com/list/list.asp?page={0}&bra_code=LA&cat_code=L020502&strChar=&searchField=&txtAddr=&txtState=&txtZip=&txtSearch=&sort=N".format(page)
    upso_lists = upso_list_from_listpage(list_url)
    upsos = [upso_from_page(url) for url in upso_lists]
    return upsos
upso_final()
print (title,address,phone)
The basic problem is that you're confused about passing values back from a function.
upso_from_page finds each of the 15 records in turn, placing the desired information in the global variables (generally a bad design). However, the only time you print any results is after you've found all 15. Since your logic has each record overwriting the previous one, you print only the last one you found.
It appears that upso_final accumulates the list and returns it, but you ignore that return value. Instead, try this in your main program:
upso_list = upso_final()
for upso in upso_list:
    print (upso)
This should give you a 3-item dictionary for each upso record; from there, you can pick out the fields you want (for example upso['title']) and format them to your taste.
An alternative solution is to print each record as you find it, from within upso_from_page, but your overall design suggests that's not what you want.
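Since the goal was to print only the titles, a minimal sketch of using the return value instead of globals could look like this. scrape_all is a hypothetical stand-in for upso_final() that returns canned records in the same shape, so the example runs without any network access.

```python
def scrape_all():
    # Hypothetical stand-in for upso_final(): returns one dict per record.
    return [
        {"title": ["Cafe A"], "address": ["1 Main St"], "phone": ["555-0001"]},
        {"title": ["Cafe B"], "address": ["2 Main St"], "phone": ["555-0002"]},
    ]


records = scrape_all()   # keep the return value instead of discarding it

# extract() in the question returns a list of matches, so take the first
# match of each record and skip records where nothing matched.
titles = [rec["title"][0] for rec in records if rec["title"]]
for t in titles:
    print(t)
```

With this approach the global statement inside upso_from_page is no longer needed, because every record travels back to the caller through the returned list.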
