BeautifulSoup checking if an element has a specific class

for containerElement in container:
    brandingElement = containerElement.find("div", class_="item-branding")
    titleElement = containerElement.find("a", class_="item-title")
    rating = brandingElement.find("i", {"class":"rating"})["aria-label"]
    priceElement = containerElement.find("li", class_="price-current")
So this for loop collects the price, rating, and name of each item on a website. It works. However, some items have no reviews, in which case it fails. How do I fix this? I was thinking of an if statement to check whether the containerElement (the container that holds the item and all its information) has a rating, but I'm not exactly sure how to do that.

for containerElement in container:
    brandingElement = containerElement.find("div", class_="item-branding")
    titleElement = containerElement.find("a", class_="item-title")
    rating = brandingElement.find("i", {"class":"rating"})["aria-label"] if brandingElement.find("i", {"class":"rating"}) else ""
    priceElement = containerElement.find("li", class_="price-current")
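
A slightly more defensive variant, offered as a sketch rather than as the answer's own method: look the tag up once and fall back to a default when it is missing. The helper name attr_or_default is hypothetical. It relies on find() returning None when nothing matches and on Tag.get, which bs4 provides for attribute lookup. The demo lines use a plain dict and None as stand-ins so the snippet runs without bs4 or a live page.

```python
def attr_or_default(tag, attr, default=""):
    """Return the attribute value, or default if the tag or attribute is missing.

    `tag` is expected to be the result of a BeautifulSoup find() call, which
    is None when nothing matched. Hypothetical helper, not part of bs4.
    """
    if tag is None:
        return default
    # Tag.get(name, default) behaves like dict.get for HTML attributes
    return tag.get(attr, default)


# Stand-ins for illustration only: a dict plays the role of a found <i> tag,
# None plays the role of an item that has no reviews.
found = {"class": ["rating"], "aria-label": "rated 4 out of 5"}
missing = None

rating_found = attr_or_default(found, "aria-label")
rating_missing = attr_or_default(missing, "aria-label")
```

Inside the loop this would read rating = attr_or_default(brandingElement.find("i", {"class": "rating"}), "aria-label"), which calls find only once per item. If brandingElement itself can be None, it needs its own check before calling find on it.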

Related

I want to extract menu_header from bubble_list and then need to extract all titles from menu_list with condition type='menu' only

I want to extract menu_header from bubble_list, and then extract every title from menu_list, but only from the entry whose type is 'menu'.
response3 = [{"thumbs_id":56071,"disable_text":"yes","thumbs_display":"no","recipient_id":"12698","bubble_list":[{"class":"bubble-top","delay":0,"logo":"no","text":"You may only add\/change your preferred first name and legal last name in PIMS.","type":"text"},{"class":"bubble-menu","delay":3000,"logo":"yes","menu_header":"Which name would you like to update?","menu_list":[{"payload":"\/update_name{\"name_type\":\"preferred_first_name\"}","title":"Update Preferred First Name"},{"payload":"\/update_name{\"name_type\": \"legal_last_name\"}","title":"Update Legal Last Name"}],"menu_status":"always_active","type":"menu"}],"button_list":[{"payload":"\/inactivity_timeout","title":"End Chat"},{"payload":"\/bot_help{\"bot_help_value\":\"start_over\"}","title":"Start Fresh again"}],"related_question":[]}]

A bit long but it works:
# Keep only the 'bubble_list' entry of the first response item
response = {key: value for key, value in response3[0].items() if key == 'bubble_list'}
# Renamed from `filter` so the builtin filter() is not shadowed
menus = [my_dict for my_dict in response['bubble_list'] if 'menu_header' in my_dict and my_dict.get('type') == 'menu']
menu_header = menus[0]['menu_header']
menu_list = [title.get('title') for title in menus[0]['menu_list']]
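
The same idea can be sketched as a small function that also copes with a response containing no menu bubble. This is only an illustration: extract_menu is a hypothetical name, and the sample below is a trimmed, made-up stand-in with the same shape as response3.

```python
# Trimmed, hypothetical sample in the same shape as response3
sample_response = [{
    "bubble_list": [
        {"class": "bubble-top", "text": "Some notice", "type": "text"},
        {
            "class": "bubble-menu",
            "menu_header": "Which name would you like to update?",
            "menu_list": [
                {"payload": "/a", "title": "Update Preferred First Name"},
                {"payload": "/b", "title": "Update Legal Last Name"},
            ],
            "type": "menu",
        },
    ],
    "button_list": [{"payload": "/c", "title": "End Chat"}],
}]


def extract_menu(response):
    """Return (menu_header, titles) for the first bubble of type 'menu'.

    Returns (None, []) when no such bubble exists, instead of raising.
    """
    for item in response:
        for bubble in item.get("bubble_list", []):
            if bubble.get("type") == "menu" and "menu_header" in bubble:
                titles = [entry.get("title") for entry in bubble.get("menu_list", [])]
                return bubble["menu_header"], titles
    return None, []


menu_header, menu_list = extract_menu(sample_response)
```

Titles under button_list are ignored because only bubble_list entries with type 'menu' are inspected.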

How to print specific value from specific key from JSON in Python

I wrote two functions so I can get a champion's ID from its name, but now I want the reverse: get the champion's name from its ID. I cannot figure out how to extract the name because of how the data is structured.
"data":{"Aatrox":{"version":"8.23.1","id":"Aatrox","key":"266","name":"Aatrox"
So in my code I wrote ['data']['championName' (in this case Aatrox)]['key'] to get the champion ID/key. But how can I reverse it if, for example, I know the champion's ID but not its name? How can I get the name when, after ['data'], I have to supply the champion name before I can go deeper and get the champion's info such as ID, title, etc.?
link: http://ddragon.leagueoflegends.com/cdn/8.23.1/data/en_US/champion.json
Code:
import requests

def requestChampionData(championName):
    # Normalise to title case (e.g. "aatrox" -> "Aatrox") to match the file name
    name = championName.lower()
    name = name.title()
    URL = "http://ddragon.leagueoflegends.com/cdn/8.23.1/data/en_US/champion/" + name + ".json"
    response = requests.get(URL)
    return response.json()

def championID(championName):
    championData = requestChampionData(championName)
    championID = str(championData['data'][championName]['key'])
    return championID

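Since Python dict values are references to objects, you can build a second dict whose keys are the champion IDs and whose values are the very same objects held by the original dict. That way you don't duplicate much data. But be careful: if you change the data through one dict, the change is visible through the other one too. Note that in this data "id" holds the name ("Aatrox") and "key" holds the numeric ID ("266"), so the new dict has to be keyed on "key":
def new_dict(d):
    return {val["key"]: val for val in d.values()}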
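
As a rough illustration of that reverse index, here is a self-contained sketch. The data dict is a trimmed, hypothetical sample in the shape of the "data" object from champion.json, and index_by_key is an illustrative name, not something from the original post.

```python
# Hypothetical, trimmed sample shaped like champion.json's "data" object
data = {
    "Aatrox": {"id": "Aatrox", "key": "266", "name": "Aatrox"},
    "Ahri": {"id": "Ahri", "key": "103", "name": "Ahri"},
}


def index_by_key(champions):
    """Build a dict mapping each champion's "key" string to its info dict."""
    return {info["key"]: info for info in champions.values()}


by_key = index_by_key(data)

# Look up a name from a key; keys are strings in the JSON, not ints
name = by_key["266"]["name"]

# The inner dicts are shared, not copied: both lookups give the same object
shared = by_key["103"] is data["Ahri"]
```

Building the index once and reusing it avoids scanning every champion on each lookup.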

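I solved my problem with this code:
def championNameByID(id):
    # Assumes requestChampionData() was changed to take no argument and to
    # fetch the full champion.json linked above, not a single champion's file.
    championData = requestChampionData()
    allChampions = championData['data']
    for champion in allChampions:
        # 'key' is a string in the JSON, so id must be a string such as "266"
        if id == allChampions[champion]['key']:
            return allChampions[champion]['name']
    return None  # no champion has this key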

Python: not every web page have a certain element

When I use URLs to scrape web pages, I find that some elements exist only on some pages and not on others. Take this code for example:
Code:
for urls in article_url_set:
    re=requests.get(urls)
    soup=BeautifulSoup(re.text.encode('utf-8'), "html.parser")
    title_tag = soup.select_one('.page_article_title')
    if title_tag=True:
        print(title_tag.text)
    else:
        #do something
If title_tag exists, I want to print it; if it doesn't, just skip it.
Another thing is that I need to save the other elements and title_tag.text in data.
data={
    "Title":title_tag.text,
    "Registration":fruit_tag.text,
    "Keywords":list2
}
This raises an error because not every article has a title: 'NoneType' object has no attribute 'text'. What should I do to skip those when I try to save?
Edit: I decided not to skip them and to keep them as Null or None.

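Your code is wrong:
for urls in article_url_set:
    re=requests.get(urls)
    soup=BeautifulSoup(re.text.encode('utf-8'), "html.parser")
    title_tag = soup.select_one('.page_article_title')
    if title_tag=True: # wrong
        print(title_tag.text)
    else:
        #do something
Your code has if title_tag=True, which is an assignment and therefore a syntax error inside an if condition. A comparison uses ==, as in title_tag == True.
Some people recommend writing such comparisons with the constant first:
title_tag == True => True == title_tag
That way the mistake produces an error as soon as you make it: if the code says True = title_tag, an error occurs.
Be aware, though, that comparing a tag with True is not a useful existence check: select_one returns either a Tag or None, and a Tag does not compare equal to True, so the print branch would never run. Test the tag itself instead, with if title_tag: or if title_tag is not None:.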

You can simply use a truth test to check whether the tag exists, and otherwise assign a value such as None. Then you can insert it into the data container:
title_tag = soup.select_one('.page_article_title')
if title_tag:
    print(title_tag.text)
    title = title_tag.text
else:
    title = None
Or in one line:
title = title_tag.text if title_tag else None
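
When several tags may be missing, the same check can be wrapped in a tiny helper. The sketch below is only one way to do it: text_or_none is a hypothetical name, and SimpleNamespace objects stand in for the tags that select_one would return, so the snippet runs without bs4 or network access.

```python
from types import SimpleNamespace


def text_or_none(tag):
    """Return tag.text, or None when the lookup found nothing (tag is None)."""
    # getattr with a default avoids the AttributeError on NoneType
    return getattr(tag, "text", None)


# Stand-ins for illustration: None mimics a page without the title element,
# SimpleNamespace mimics a found tag exposing a .text attribute.
title_tag = None
fruit_tag = SimpleNamespace(text="REG-001")
list2 = ["sample", "keywords"]

data = {
    "Title": text_or_none(title_tag),
    "Registration": text_or_none(fruit_tag),
    "Keywords": list2,
}
```

Every entry is always present in data, with None marking the pages that lacked the element.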

Filtering through a Django Countries list

I'm trying to retrieve a list of tags that share the same name as a Django country (I will be feeding it into my autocomplete search). What I have isn't working:
View:
from django_countries.countries import COUNTRIES
...
#login_required
def country_tags(request):
    result = {}
    tags = Tags.objects.all()
    countries = list(COUNTRIES)
    for tag in tags:
        for country in countries:
            if country.name == tag.name:
                result[tag.name] = tag.name.title()
    return HttpResponse(json.dumps(result))
Can't quite figure out why this isn't working. Am I wrong to reference country.name?

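Here is a version that should work. COUNTRIES is a tuple of 2-tuples, with the country name as the second element.
countries_only = [x[1] for x in COUNTRIES]
# The lookup keyword is name__in (tag.name__in is not valid syntax);
# Tags is the model name used in the question
tags = Tags.objects.filter(name__in=countries_only)
results = {}
for t in tags:
    results[t.name] = t.name.title()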

COUNTRIES is just a sequence of 2-element tuples, so there is no name property. You should do something like country[1] == tag.name.
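
To make the tuple layout concrete, here is a plain-Python sketch with no Django involved. The COUNTRIES value and the tag names below are hypothetical samples, assuming each tuple is (code, name) as the answers above describe.

```python
# Hypothetical sample in the (code, name) layout described above
COUNTRIES = (
    ("FR", "France"),
    ("JP", "Japan"),
    ("PE", "Peru"),
)

# Made-up tag names standing in for the Tags queryset
tag_names = ["France", "python", "Peru"]

# Unpack each tuple instead of using attribute access; a set gives fast lookups
country_names = {name for _code, name in COUNTRIES}

result = {tag: tag.title() for tag in tag_names if tag in country_names}
```

The set lookup replaces the nested loop over tags and countries from the question.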

Python search list of objects that contain objects, partial matches

I'm trying to build a simple search engine for a small website. My initial thought is to avoid using larger packages such as Solr, Haystack, etc. because of the simplistic nature of my search needs.
My hope is that with some guidance I can make my code more pythonic, efficient, and most importantly function properly.
Intended functionality: return product results based on full or partial matches of item_number, product name, or category name (currently no implementation of category matching)
Some code:
import pymssql
import utils #My utilities

class Product(object):
    def __init__(self, item_number, name, description, category, msds):
        self.item_number = str(item_number).strip()
        self.name = name
        self.description = description
        self.category = category
        self.msds = str(msds).strip()

class Category(object):
    def __init__(self, name, categories):
        self.name = name
        self.categories = categories
        self.slug = utils.slugify(name)
        self.products = []

categories = (
    Category('Food', ('123', '12A')),
    Category('Tables', ('354', '35A', '310', '31G')),
    Category('Chemicals', ('845', '85A', '404', '325'))
)
products = []
conn = pymssql.connect(...)
curr = conn.cursor()
# Loop variable is lowercase so it does not shadow the Category class
for category in categories:
    for c in category.categories:
        curr.execute('SELECT item_number, name, CAST(description as text), category, msds from tblProducts WHERE category=%s', c)
        for row in curr:
            product = Product(row[0], row[1], row[2], row[3], row[4])
            products.append(product)
            category.products.append(product)
conn.close()

def product_search(*params):
    results = []
    for product in products:
        for param in params:
            name = str(product.name)
            if (name.find(param.capitalize())) != -1:
                results.append(product)
            item_number = str(product.item_number)
            # was item.number, which is an undefined name
            if (item_number.find(param.upper())) != -1:
                results.append(product)
    print(results)

product_search('something')
MS SQL database with tables and fields I cannot change.
At most I will pull in about 200 products.
Some things that jump out at me. Nested for loops. Two different if statements in the product search which could result in duplicate products being added to the results.
My thought was that if I had the products in memory (the products will rarely change) I could cache them, reducing database dependence and possibly providing an efficient search.
...posting for now... will come back and add more thoughts
Edit:
The reason I have a Category object holding a list of Products is that I want to show HTML pages of Products organized by Category. Also, the actual category numbers may change in the future, and holding a tuple seemed like a simple, painless solution. That, and I have read-only access to the database.
The reason for a separate list of products was somewhat of a cheat. I have a page that shows all products with the ability to view MSDS (safety sheets). Also it provided one less level to traverse while searching.
Edit 2:
def product_search(*params):
    results = []
    lowerParams = [ param.lower() for param in params ]
    for product in products:
        item_number = (str(product.item_number)).lower()
        name = (str(product.name)).lower()
        for param in lowerParams:
            if param in item_number or param in name:
                results.append(product)
    print(results)

Prepare all variables outside of the loops and use in instead of .find if you don't need the position of the substring:
def product_search(*params):
    results = []
    upperParams = [ param.upper() for param in params ]
    for product in products:
        name = str(product.name).upper()
        item_number = str(product.item_number).upper()
        for upperParam in upperParams:
            if upperParam in name or upperParam in item_number:
                results.append(product)
    print(results)
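
With more than one search parameter, a product can still be appended once per matching parameter. One way to avoid that, sketched here under assumptions, is to test all parameters with any() and append at most once. The Product namedtuple and the sample products are simplified, hypothetical stand-ins for the classes and data in the question.

```python
from collections import namedtuple

# Simplified stand-in for the question's Product class
Product = namedtuple("Product", "item_number name")

# Made-up sample data
products = [
    Product("12A-100", "Table Salt"),
    Product("354-200", "Folding Table"),
    Product("845-300", "Bleach"),
]


def product_search(*params):
    """Return each product at most once if any param matches its name or number."""
    needles = [param.lower() for param in params]
    results = []
    for product in products:
        fields = (str(product.item_number).lower(), str(product.name).lower())
        # any() stops at the first hit, so a product is appended only once
        if any(needle in field for needle in needles for field in fields):
            results.append(product)
    return results


matches = product_search("table", "354")
```

Returning the list instead of printing it also makes the function easier to reuse from a web view.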

If both the name and the number match the search parameters, the product will appear twice in the result list.
Since the products count is a small number, I recommend constructing a SELECT query like:
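def search(*args):
    import operator
    from functools import reduce  # reduce is not a builtin in Python 3
    # Flatten every Category's tuple of category codes into one list
    cats = reduce(operator.add, [list(c.categories) for c in categories], [])
    # AND was missing after the IN (...) clause; the OR terms are grouped in parentheses
    query = "SELECT * FROM tblProducts WHERE category IN (" + ','.join('?' * len(cats)) + ") AND (name LIKE '%?%' or CAST(item_number AS TEXT) LIKE '%?%' ...)"
    curr.execute(query, cats + list(args)) # Not actual code
    return list(curr)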
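
For a version of this idea that actually runs, the sketch below uses the standard library's sqlite3 as a stand-in for the MS SQL database, which is an assumption made purely for illustration. The table contents are invented, and this search takes a connection, a list of category codes, and a single term. The point it shows is that the % wildcards belong in the bound parameter, because a placeholder inside quotes is not substituted. Note that sqlite3 uses ? placeholders, whereas the pymssql call in the question uses %s.

```python
import sqlite3

# In-memory database with made-up rows, standing in for tblProducts
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblProducts (item_number TEXT, name TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO tblProducts VALUES (?, ?, ?)",
    [
        ("12A-100", "Table Salt", "12A"),
        ("354-200", "Folding Table", "354"),
        ("999-000", "Unlisted Item", "999"),
    ],
)


def search(conn, categories, term):
    """Return (item_number, name) rows in the given categories matching term."""
    # One placeholder per category code
    placeholders = ",".join("?" * len(categories))
    query = (
        "SELECT item_number, name FROM tblProducts "
        "WHERE category IN (" + placeholders + ") "
        "AND (name LIKE ? OR item_number LIKE ?) "
        "ORDER BY item_number"
    )
    # Wildcards go into the parameter value, not into the SQL text
    pattern = "%" + term + "%"
    return conn.execute(query, list(categories) + [pattern, pattern]).fetchall()


rows = search(conn, ["12A", "354"], "table")
```

Because each product is a single row, a row that matches on both name and item number is still returned only once.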
