After cruising through about a dozen questions of the same variety, and consulting a coworker, I have determined I need some expert insight:
with open("c:\source\list.csv") as f:
for row in csv.reader(f):
for url in row:
r = requests.get(url)
soup = BeautifulSoup(r.content, 'lxml')
tables = soup.find('table', attrs={"class": "hpui-standardHrGrid-table"}).append
for rows in table.find_all('tr', {'releasetype': 'Current_Releases'}):
item = [].append
for val in row.find_all('td'):
item.append(val.text.encode('utf8').strip())
rows.append(item)
headers = [header.text for header in tables.find_all('th')].append
rows = [].append
print (headers)
So what I have here is: a CSV file that has 30 URLs in it. I first dump each page into Soup to get all of its contents, then bind the specific HTML element (the table) to the tables variable. After this, I am trying to pull specific rows and headers from those tables.
According to the logical thinking in my brain it should work, but instead I get this:
Traceback (most recent call last):
File "<stdin>", line 7, in <module>
AttributeError: 'function' object has no attribute 'find_all'
Line 7 is:
    for rows in table.find_all('tr', {'releasetype': 'Current_Releases'}):
What are we missing here?
You have some strange misconceptions about Python syntax. Four times in your code you refer to <something>.append; I'm not sure what you think this does, but append is a method: it must not only be called, with (), it also needs a parameter, the thing you are appending.
So, for example, this line:
    item = [].append
makes no sense at all; what are you expecting item to be? What are you hoping to append? Surely you just mean item = [].
In the specific case, the error is because of the superfluous append on the end of the previous line:
    tables = soup.find('table', attrs={"class": "hpui-standardHrGrid-table"}).append
Again, just remove the append.
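For completeness, here is a minimal sketch of what the corrected loop might look like (assuming, on my part, that table and tables were meant to be the same variable and that you want the header row once per page):

import csv
import requests
from bs4 import BeautifulSoup

all_rows = []
with open(r"c:\source\list.csv") as f:   # raw string so \s and \l are not treated as escapes
    for csv_row in csv.reader(f):
        for url in csv_row:
            r = requests.get(url)
            soup = BeautifulSoup(r.content, 'lxml')
            table = soup.find('table', attrs={"class": "hpui-standardHrGrid-table"})
            if table is None:
                continue  # skip pages without a matching table
            headers = [header.text for header in table.find_all('th')]
            print(headers)
            for tr in table.find_all('tr', {'releasetype': 'Current_Releases'}):
                item = [td.text.strip() for td in tr.find_all('td')]
                all_rows.append(item)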
I sent a request to a website and then parsed it. Got a list but have issues making it a csv file table. When I try this:
from bs4 import BeautifulSoup
import requests
import csv

website = requests.get("https://www.tradingview.com/markets/cryptocurrencies/prices-all/")
soup = BeautifulSoup(website.text, 'lxml')
name = soup.find_all('a', class_="tv-screener__symbol")

with open('newparser.csv', 'w') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerow(name)
It prints out:
"<a class=""tv-screener__symbol"" href=""/markets/cryptocurrencies/prices-cryptoxtvcbtc/"" target=""_blank"">Bitcoin</a>","<a class=""tv-screener__symbol"" href=""/markets/cryptocurrencies/prices-cryptoxtvceth/"" target=""_blank"">Ethereum</a>","<a class=""tv-screener__symbol"" href=""/markets/cryptocurrencies/prices-cryptoxtvcbnb/"" target=""_blank"">Binance Coin</a>","<a class=""tv-screener__symbol"" href=""/markets/cryptocurrencies/prices-cryptoxtvcada/"" target=""_blank"">Cardano</a>","<a class=""tv-screener__symbol"" href=""/markets/cryptocurrencies/prices-cryptoxtvcusdt/"
Not going to post the whole thing but you get the point.
When I change:
wr.writerow(name) --> wr.writerow(name.text)
It prints out:
ResultSet object has no attribute 'text'. You're probably treating a
list of elements like a single element. Did you call find_all() when
you meant to call find()?
And when I make a loop:
for i in name:
    wr.writerow(i.text)
It creates a file like this:
"B","i","t","c","o","i","n"
"E","t","h","e","r","e","u","m"
"B","i","n","a","n","c","e"," ","C","o","i","n"
"C","a","r","d","a","n","o"
"T","e","t","h","e","r"
"H","E","X"
"X","R","P"
"S","o","l","a","n","a"
"D","o","g","e","c","o","i","n"
"U","S","D"," ","C","o","i","n"
"P","o","l","k","a","d","o","t"
"T","e","r","r","a"
"U","n","i","s","w","a","p"
How do I fix this?
writerow takes an iterable of columns.
Strings are iterable too, so each character ends up in its own column.
If you want to write a single value per row, put a list around the value:

for i in name:
    wr.writerow([i.text])
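Putting it all together, a minimal sketch of the fixed script might look like this (note the newline='' argument, which the csv module recommends when opening the output file):

import csv
import requests
from bs4 import BeautifulSoup

website = requests.get("https://www.tradingview.com/markets/cryptocurrencies/prices-all/")
soup = BeautifulSoup(website.text, 'lxml')
names = soup.find_all('a', class_="tv-screener__symbol")

with open('newparser.csv', 'w', newline='') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    for tag in names:
        wr.writerow([tag.text])  # one-element list -> one coin name per row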
I am struggling while creating one of my first projects in Python 3. When I use the following code:
def scrape_offers():
    r = requests.get("https://www.olx.bg/elektronika/kompyutrni-aksesoari-chasti/aksesoari-chasti/q-1070/?search%5Border%5D=filter_float_price%3Aasc", cookies=all_cookies)
    soup = BeautifulSoup(r.text, "html.parser")
    offers = soup.find_all("div", {'class': 'offer-wrapper'})
    for offer in offers:
        offer_name = offer.findChildren("a", {'class': 'marginright5 link linkWithHash detailsLink'})
        print(offer_name.text.strip())
I get the following error:
Traceback (most recent call last):
  File "scrape_products.py", line 45, in <module>
    scrape_offers()
  File "scrape_products.py", line 40, in scrape_offers
    print(offer_name.text.strip())
  File "/usr/local/lib/python3.7/site-packages/bs4/element.py", line 2128, in __getattr__
    "ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key
AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
I've read many similar cases on Stack Overflow but I still can't work it out. If someone has any ideas, please help :)
P.S.: If I run the code without .text it shows the entire <a class=...> ... </a>
findChildren returns a list. Sometimes you get an empty list, sometimes you get a list with one element.
You should add an if statement to check that the returned list is non-empty before printing the text:
import requests
from bs4 import BeautifulSoup

def scrape_offers():
    r = requests.get("https://www.olx.bg/elektronika/kompyutrni-aksesoari-chasti/aksesoari-chasti/q-1070/?search%5Border%5D=filter_float_price%3Aasc")
    soup = BeautifulSoup(r.text, "html.parser")
    offers = soup.find_all("div", {'class': 'offer-wrapper'})
    for offer in offers:
        offer_name = offer.findChildren("a", {'class': 'marginright5 link linkWithHash detailsLink'})
        if len(offer_name) >= 1:
            print(offer_name[0].text.strip())

scrape_offers()
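Alternatively, find() (unlike findChildren()) returns a single element or None, so the length check becomes a None check; a sketch of that variant:

for offer in offers:
    offer_name = offer.find("a", {'class': 'marginright5 link linkWithHash detailsLink'})
    if offer_name is not None:
        print(offer_name.text.strip())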
I am making a program for web scraping, but this is my first time. The tutorial I am using is built for Python 2.7, but I am on 3.8.2. I have mostly edited my code to fit Python 3, but one error pops up and I can't fix it.
import requests
import csv
from bs4 import BeautifulSoup

url = 'http://www.showmeboone.com/sheriff/JailResidents/JailResidents.asp'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(features="html.parser")
results_table = soup.find('table', attrs={'class': 'resultsTable'})

output = []
for row in results_table.findAll('tr'):
    output_rows = []
    for cell in tr.findAll('td'):
        output_rows.append(cell.text.replace(' ', ''))
    output.append(output_rows)

print(output)

handle = open('out-using-requests.csv', 'a')
outfile = csv.writer(handle)
outfile.writerows(output)
The error I get is:
Traceback (most recent call last):
  File "C:\Code\scrape.py", line 17, in <module>
    for row in results_table.findAll('tr'):
AttributeError: 'NoneType' object has no attribute 'findAll'
The tutorial I am using is https://first-web-scraper.readthedocs.io/en/latest/
I tried some other questions, but they didn't help.
Please help!!!
Edit: Never mind, I got a good answer.
find returns None if it doesn't find a match. You need to check for that before attempting to find any sub-elements in it:

results_table = soup.find('table', attrs={'class': 'resultsTable'})

output = []
if results_table:
    for row in results_table.findAll('tr'):
        output_rows = []
        for cell in row.findAll('td'):
            output_rows.append(cell.text.replace(' ', ''))
        output.append(output_rows)
The error allows the following conclusion:

results_table = None

Therefore, you cannot access the findAll() method, because None.findAll() does not exist.

It is best to run through your program with a debugger and watch how the variables change line by line, to see why the mentioned line only returns None. Especially important is the line:

results_table = soup.find('table', attrs={'class': 'resultsTable'})

because this is where results_table is initialized, so this is where the None value is returned and assigned to results_table.
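If you don't have a debugger handy, a couple of print statements serve the same purpose (a quick sketch):

results_table = soup.find('table', attrs={'class': 'resultsTable'})
print(type(results_table))    # <class 'NoneType'> means the selector matched nothing
print(soup.prettify()[:500])  # inspect what the parser actually received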
I am currently writing a web-scraping script in Python to take play-by-play soccer commentary from fixtures and input it into an Excel sheet. I keep getting this when I try to run it:
Traceback (most recent call last):
  File "/Users/noahhollander/Desktop/Web_Scraping/play_by_play.py", line 9, in <module>
    tbody = soup('table',{"class":"content"})[0:].findAll('tr')
AttributeError: 'list' object has no attribute 'findAll'
[Finished in 6.207s]
I've read that this probably has something to do with the table being in text format, but I have added .text at the end and I still get the same result.
Here is a picture of my code so far.
You might have to write something like this. Note that soup('table', {"class": "content"}) is just shorthand for soup.find_all('table', {"class": "content"}), and find_all returns a list, so you have to loop over it rather than call findAll on it:

tbody = []
tclass = soup('table', {"class": "content"})
for temp in tclass:
    for t_temp in temp.find_all('tr'):
        tbody.append(t_temp)
This should get you your desired result:

div = soup.find('div', {"class": "content"})
tbody = div.find('table').findAll('tr')
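Note that find() returns None when nothing matches, so if you want to be defensive, a guard like this avoids a second AttributeError (a sketch along the same lines):

div = soup.find('div', {"class": "content"})
if div is not None:
    table = div.find('table')
    if table is not None:
        tbody = table.findAll('tr')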
I am trying to download PDFs from a page using urllib.request.urlopen, but it returns an error: 'list' object has no attribute 'timeout':
def get_hansard_data(page_url):
    # Read base_url into a Beautiful Soup object
    html = urllib.request.urlopen(page_url).read()
    soup = BeautifulSoup(html, "html.parser")

    # grab <div class="itemContainer"> that holds links and dates to all hansard pdfs
    hansard_menu = soup.find_all("div", "itemContainer")

    # Get all hansards
    # write to a tsv file
    with open("hansards.tsv", "a") as f:
        fieldnames = ("date", "hansard_url")
        output = csv.writer(f, delimiter="\t")
        for div in hansard_menu:
            hansard_link = [HANSARD_URL + div.a["href"]]
            hansard_date = div.find("h3", "catItemTitle").string

            # download
            with urllib.request.urlopen(hansard_link) as response:
                data = response.read()
            r = open("/Users/Parliament Hansards/" + hansard_date + ".txt", "wb")
            r.write(data)
            r.close()
            print(hansard_date)
            print(hansard_link)
            output.writerow([hansard_date, hansard_link])
    print("Done Writing File")
A bit late, but this might still be helpful to someone else (if not to the topic starter). I found the solution while solving the same problem.
The problem was that page_url (in your case) was a list rather than a string. The reason for that is most likely that page_url comes from argparse.parse_args() (at least it was so in my case).
Doing page_url[0] should work, but it is not nice to do that inside the get_hansard_data(page_url) function. It would be better to check the type of the parameter and return an appropriate error to the caller if the type does not match.
The type of an argument can be checked by calling type(page_url) and comparing the result, for example: type("") == type(page_url). There are surely more elegant ways to do that, but they are out of the scope of this particular question.
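For example, a minimal sketch of such a check using isinstance (my own suggestion, not the only way to do it):

import urllib.request
from bs4 import BeautifulSoup

def get_hansard_data(page_url):
    # fail fast with a clear message instead of urllib's cryptic AttributeError
    if not isinstance(page_url, str):
        raise TypeError("page_url must be a string, got %s" % type(page_url).__name__)
    html = urllib.request.urlopen(page_url).read()
    return BeautifulSoup(html, "html.parser")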