beautiful soup bug?

I have the following code:
for table in soup.findAll("table","tableData"):
    for row in table.findAll("tr"):
        data = row.findAll("td")
        url = data[0].a
        print type(url)
I get the following output:
<class 'bs4.element.Tag'>
That means that url is an object of class Tag, so I should be able to read attributes from it.
But if I replace print type(url) with print url['href'], I get the following traceback:
Traceback (most recent call last):
File "baseCreator.py", line 57, in <module>
createStoresTable()
File "baseCreator.py", line 46, in createStoresTable
print url['href']
TypeError: 'NoneType' object has no attribute '__getitem__'
What is wrong? And how can I get the value of the href attribute?

I do like BeautifulSoup but I personally prefer lxml.html (for not too wacky HTML) because of the ability to utilise XPath.
import lxml.html
page = lxml.html.parse('http://somesite.tld')
print page.xpath('//tr/td/a/@href')
You might need to use XPath axes, though, depending on the structure.
You can also use elementsoup as a parser - details at http://lxml.de/elementsoup.html
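A likely explanation for the traceback in the question, assuming the real page has at least one row whose first cell contains no link: data[0].a is a Tag for some rows and None for others, so the type check prints Tag for the rows seen first and url['href'] fails on a later row. The sketch below (Python 3) guards against that case; the sample markup and the collect_hrefs name are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second row has no <a>, the third has no <td> at all.
SAMPLE_HTML = """
<table class="tableData">
  <tr><td><a href="/store/1">Store 1</a></td><td>Open</td></tr>
  <tr><td>No link here</td><td>Closed</td></tr>
  <tr><th>Header only</th></tr>
</table>
"""

def collect_hrefs(html):
    """Return the href of the first-cell link of every row that has one."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    for table in soup.find_all("table", "tableData"):
        for row in table.find_all("tr"):
            cells = row.find_all("td")
            if not cells:
                continue  # header rows have no <td>, so cells[0] would fail
            link = cells[0].a
            if link is None:
                continue  # first cell has no <a>; link['href'] would raise TypeError
            href = link.get("href")  # .get returns None instead of raising KeyError
            if href is not None:
                hrefs.append(href)
    return hrefs

print(collect_hrefs(SAMPLE_HTML))
```

Using link.get("href") rather than link["href"] also covers anchors that have no href attribute.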

Related

Can't print only text using Beautiful soup

I am struggling with one of my first projects in Python 3. When I use the following code:
def scrape_offers():
    r = requests.get("https://www.olx.bg/elektronika/kompyutrni-aksesoari-chasti/aksesoari-chasti/q-1070/?search%5Border%5D=filter_float_price%3Aasc", cookies=all_cookies)
    soup = BeautifulSoup(r.text,"html.parser")
    offers = soup.find_all("div",{'class':'offer-wrapper'})
    for offer in offers:
        offer_name = offer.findChildren("a", {'class':'marginright5 link linkWithHash detailsLink'})
        print(offer_name.text.strip())
I get the following error:
Traceback (most recent call last):
File "scrape_products.py", line 45, in <module>
scrape_offers()
File "scrape_products.py", line 40, in scrape_offers
print(offer_name.text.strip())
File "/usr/local/lib/python3.7/site-packages/bs4/element.py", line 2128, in __getattr__
"ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key
AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
I've read many similar cases on Stack Overflow but I still can't solve it myself. If someone has any ideas, please help :)
P.S.: If I run the code without .text, it shows the entire <a class=...> ... </a>

findChildren returns a list (a ResultSet). Sometimes you get an empty list, sometimes you get a list with one element.
You should add an if statement to check that the returned list contains at least one element, and then print the text of the first element.
import requests
from bs4 import BeautifulSoup

def scrape_offers():
    r = requests.get("https://www.olx.bg/elektronika/kompyutrni-aksesoari-chasti/aksesoari-chasti/q-1070/?search%5Border%5D=filter_float_price%3Aasc")
    soup = BeautifulSoup(r.text,"html.parser")
    offers = soup.find_all("div",{'class':'offer-wrapper'})
    for offer in offers:
        offer_name = offer.findChildren("a", {'class':'marginright5 link linkWithHash detailsLink'})
        if (len(offer_name) >= 1):
            print(offer_name[0].text.strip())

scrape_offers()
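An alternative to checking the list length is the one the error message itself hints at: call find() instead, which returns a single Tag or None. This is only a sketch against made-up markup (the real page structure is an assumption), and offer_names is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the class names used in the question.
SAMPLE_HTML = """
<div class="offer-wrapper">
  <a class="marginright5 link linkWithHash detailsLink" href="#"> Keyboard </a>
</div>
<div class="offer-wrapper">
  <span>No link in this offer</span>
</div>
"""

def offer_names(html):
    """Return the link text of every offer that contains a details link."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for offer in soup.find_all("div", class_="offer-wrapper"):
        # find() returns one Tag or None, never a list
        link = offer.find("a", class_="detailsLink")
        if link is not None:
            names.append(link.get_text(strip=True))
    return names

print(offer_names(SAMPLE_HTML))
```

Matching on the single class detailsLink is a design choice: class_ matches an element if any one of its classes equals the given string, so the search does not depend on the order of the other classes.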

AttributeError: 'NoneType' object has no attribute 'findAll' in a web scraper

I am making a program for web scraping but this is my first time. The tutorial that I am using is built for python 2.7, but I am using 3.8.2. I have mostly edited my code to fit it to python 3, but one error pops up and I can't fix it.
import requests
import csv
from bs4 import BeautifulSoup

url = 'http://www.showmeboone.com/sheriff/JailResidents/JailResidents.asp'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(features="html.parser")
results_table = soup.find('table', attrs={'class': 'resultsTable'})
output = []
for row in results_table.findAll('tr'):
    output_rows = []
    for cell in tr.findAll('td'):
        output_rows.append(cell.text.replace(' ', ''))
    output.append(output_rows)
print(output)
handle = open('out-using-requests.csv', 'a')
outfile = csv.writer(handle)
outfile.writerows(output)
The error I get is:
Traceback (most recent call last):
File "C:\Code\scrape.py", line 17, in <module>
for row in results_table.findAll('tr'):
AttributeError: 'NoneType' object has no attribute 'findAll'
The tutorial I am using is https://first-web-scraper.readthedocs.io/en/latest/
I tried some other questions, but they didn't help.
Please help!!!
Edit: Never mind, I got a good answer.

find returns None if it doesn't find a match. You need to check for that before attempting to find any sub-elements in it:
results_table = soup.find('table', attrs={'class': 'resultsTable'})
output = []
if results_table is not None:
    for row in results_table.findAll('tr'):
        output_rows = []
        # iterate over the cells of the current row (the question used the undefined name tr here)
        for cell in row.findAll('td'):
            output_rows.append(cell.text.replace(' ', ''))
        output.append(output_rows)
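The None check stops the crash, but in the question's code find() can never succeed: BeautifulSoup(features="html.parser") is called without any markup, so the downloaded html is never parsed. A minimal sketch of the corrected flow follows, using made-up sample markup in place of the live page; table_rows is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for response.content from the question.
SAMPLE_HTML = """
<table class="resultsTable">
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Jane Doe</td><td>34</td></tr>
  <tr><td>John Roe</td><td>41</td></tr>
</table>
"""

def table_rows(html):
    """Return the <td> texts of each row of the resultsTable, or [] if absent."""
    soup = BeautifulSoup(html, "html.parser")  # pass the markup as the first argument
    table = soup.find("table", attrs={"class": "resultsTable"})
    if table is None:
        return []
    rows = []
    for row in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if cells:  # skip header rows, which contain only <th>
            rows.append(cells)
    return rows

# Reproduces the question's situation: no markup given, so nothing can be found.
empty_soup = BeautifulSoup(features="html.parser")
print(empty_soup.find("table"))
print(table_rows(SAMPLE_HTML))
```

The resulting list of lists can be handed to csv.writer(...).writerows(...) exactly as in the question.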

The error allows the following conclusion:
results_table = None
Therefore, you cannot access the findAll() method because None.findAll() does not exist.
It is best to step through your program with a debugger to see how the variables change line by line and why the line in question returns None. The line that matters most is:
results_table = soup.find('table', attrs={'class': 'resultsTable'})
This is the line where results_table is initialized, so this is where find() returns None and that value is assigned to results_table.

"AttributeError: 'list' object has no attribute 'findAll'"

I am currently writing a web-scraping script in Python to take play-by-play soccer commentary from fixtures and put it into an Excel sheet. I keep getting this error when I try to run it:
Traceback (most recent call last):
File "/Users/noahhollander/Desktop/Web_Scraping/play_by_play.py", line 9, in <module>
tbody = soup('table',{"class":"content"})[0:].findAll('tr')
AttributeError: 'list' object has no attribute 'findAll'
[Finished in 6.207s]
I've read that this probably has something to do with the table being in text format, but I have added .text at the end and still get the same result.
Here is a picture of my code so far.

You might have to write something like this.
soup.find_all('table',{"class":"content"})

tbody = []
tclass = soup('table', {"class":"content"})[0:]
for temp in tclass:
    for t_temp in temp.find_all('tr'):
        tbody.append(t_temp)
Is this your desired result?

div = soup.find('div', {"class": "content"})
tbody = div.find('table').findAll('tr')
You will get your desired result
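The reason behind the error: calling soup(...) is shorthand for soup.find_all(...), which returns a list-like ResultSet, and slicing it with [0:] still gives a list, which has no findAll method. Either index one table or loop over all of them. The following is a sketch against made-up markup, not the asker's real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two tables of class "content".
SAMPLE_HTML = """
<table class="content">
  <tr><td>12'</td><td>Goal</td></tr>
  <tr><td>45'</td><td>Half time</td></tr>
</table>
<table class="content">
  <tr><td>90'</td><td>Full time</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
tables = soup("table", {"class": "content"})  # same as soup.find_all(...)

# Option 1: rows of the first matching table only (guarding against no match).
first_table_rows = tables[0].find_all("tr") if tables else []

# Option 2: rows of every matching table, flattened into one list.
all_rows = [row for table in tables for row in table.find_all("tr")]

print(len(first_table_rows), len(all_rows))
```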

Parse activity unstable, getting a few random results

Here's the code:
# -*- coding: utf-8 -*-
import urllib2
from bs4 import BeautifulSoup

with open('/users/Rachael/Desktop/CheckTitle.csv', 'r') as readcsv:
    for row in readcsv.readlines():
        try:
            openitem = urllib2.urlopen(row).read()
            soup = BeautifulSoup(openitem, 'lxml')
            print soup.head.find('title').get_text()
        except urllib2.URLError:
            print 'passed'
            pass
I'm getting the following results:
(a):
passed
贝贝网京外裁员10%：团队要保持狼性和危机感_新浪财经_新浪网
垂直电商贝贝网被曝裁员 回应称只是10%人员优化_新浪财经_新浪网
(b):
passed
Traceback (most recent call last):
File "C:/Users/Rachael/PycharmProjects/untitled1/GetTitle.py", line 10, in <module>
print soup.head.find('title').get_text()
AttributeError: 'NoneType' object has no attribute 'find'
(c):
passed
贝贝网京外裁员10%：团队要保持狼性和危机感_新浪财经_新浪网
Traceback (most recent call last):
File "C:/Users/Rachael/PycharmProjects/untitled1/GetTitle.py", line 10, in <module>
print soup.head.find('title').get_text()
AttributeError: 'NoneType' object has no attribute 'find'
I'm getting these three types of results randomly.
If I do soup.title OR soup.title.text OR soup.title.string instead, it will return the same/similar error.
Please help!
I found this very hard to describe, so if this is a duplicate in any way, please give me a link to similar posts.
Thanks!!

"'NoneType' object has no attribute ..." is the error you get when a lookup found nothing and returned None. Here the message names 'find', so soup.head itself is None for that page. Try printing soup.head and soup.title on their own, without .get_text(); for the failing page they should print None.
Answer: Either there is no actual title tag, or there is bot protection of some kind on one of the sites you have in that file.
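A defensive way to read the title is to test for None before calling get_text(). This is a minimal sketch in Python 3 (the question uses Python 2) with made-up HTML strings standing in for the downloaded pages; page_title is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

def page_title(html):
    """Return the stripped <title> text, or None if the page has no <title>."""
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.title  # None when no <title> exists anywhere in the document
    if title_tag is None:
        return None
    return title_tag.get_text(strip=True)

# Hypothetical responses: a normal page and a page without any title.
print(page_title("<html><head><title> Hello </title></head></html>"))
print(page_title("<html><body><p>Access denied</p></body></html>"))
```

Printing the URL together with a None result makes it easier to see which site in the CSV file returns a page without a title.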

Write data scraped to text file with python script

I am a newbie to data scraping. This is the first program I am writing in Python to scrape data and store it in a text file. I have written the following code to scrape the data.
from bs4 import BeautifulSoup
import urllib2

text_file = open("scrape.txt","w")
url = urllib2.urlopen("http://ga.healthinspections.us/georgia/search.cfm?1=1&f=s&r=name&s=&inspectionType=&sd=04/24/2016&ed=05/24/2016&useDate=NO&county=Appling&")
content = url.read()
soup = BeautifulSoup(content, "html.parser")
type = soup.find('span',attrs={"style":"display:inline-block; font- size:10pt;"}).findAll()
for found in type:
    text_file.write(found)
However, when I run this program from the command prompt, it shows the following error.
c:\PyProj\Scraping>python sample1.py
Traceback (most recent call last):
File "sample1.py", line 9, in <module>
text_file.write(found)
TypeError: expected a string or other character buffer object
What am I missing here, or is there anything I haven't added? Thanks.

You need to check whether soup.find returned None, i.e. whether it actually found what you searched for.
Also, don't use the name type; it shadows a builtin.
find returns a single Tag object and find_all returns a list of Tag objects. If you call print on a Tag you see its string representation, but that conversion isn't invoked automatically by file.write. You have to decide which attribute of found you want to write.
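One way to apply these three points, sketched in Python 3 against made-up markup: check the find() result, avoid the name type, and write each child's text rather than the Tag object. The write_child_text helper and the sample HTML are assumptions for illustration, and an in-memory buffer stands in for the output file.

```python
import io
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the inspection results page.
SAMPLE_HTML = """
<span style="display:inline-block; font-size:10pt;">
  <b>Acme Diner</b>
  <i>Score: 95</i>
</span>
"""

def write_child_text(html, out_file):
    """Write the text of each tag inside the styled <span>, one per line."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(
        "span", attrs={"style": "display:inline-block; font-size:10pt;"}
    )
    if container is None:
        return 0  # nothing matched; avoid calling find_all on None
    written = 0
    for found in container.find_all():
        # write() needs a string, so extract the text instead of passing the Tag
        out_file.write(found.get_text(strip=True) + "\n")
        written += 1
    return written

buffer = io.StringIO()
count = write_child_text(SAMPLE_HTML, buffer)
print(count, repr(buffer.getvalue()))
```

With a real file, the buffer can be replaced by a handle from open("scrape.txt", "w", encoding="utf-8") inside a with block, which also closes the file when done.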
