AttributeError: 'function' object has no attribute 'read' - python

I'm writing a program that searches YouTube and fetches the results, and it fails with AttributeError: 'function' object has no attribute 'read'. I am on Python 3.
import urllib.request
from bs4 import BeautifulSoup
import sys
flag = 0
textToSearch = 'hello world'
query = sys.argv[0].strip("\"").replace(" ","+")
url = "https://www.youtube.com/results?search_query=" + query
response = urllib.request.urlopen
html = response.read()
soup = BeautifulSoup(html,"lxml")
for vid in soup.findAll(attrs={'class':'yt-uix-tile-link'}):
    if ('https://www.youtube.com' + vid['href']).startswith("https://www.youtube.com/watch?v="):
        flag = 1
        print ('https://www.youtube.com' + vid['href'])
if flag == 0:
    print ("No results found")

The mistake has been made here:
response = urllib.request.urlopen
html = response.read()
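You assigned the function urllib.request.urlopen itself to the response variable, rather than the result of calling that function. A function object has no read attribute, which is exactly what the error message says.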
So instead of
response = urllib.request.urlopen
you should call it with appropriate parameters:
response = urllib.request.urlopen( .. parameters come here ... )
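To make the difference concrete, here is a minimal sketch of assigning the function versus calling it. The data: URL is my own stand-in so the example runs without a network connection; in the script from the question you would pass the url variable instead.

```python
import urllib.request

# Without parentheses, the name refers to the function object itself.
not_called = urllib.request.urlopen
print(hasattr(not_called, "read"))  # False: a function has no .read()

# With parentheses and a URL, urlopen returns a response object.
# The data: URL is an assumption that keeps this sketch offline.
with urllib.request.urlopen("data:text/plain,hello") as response:
    html = response.read()

print(html)  # b'hello'
```

Note that read() returns bytes, which BeautifulSoup accepts directly.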

Have you tried using the requests library?
Like this:
import requests
from bs4 import BeautifulSoup
import sys
flag = 0
textToSearch = 'hello world'
query = sys.argv[0].strip("\"").replace(" ","+")
url = "https://www.youtube.com/results?search_query=" + query
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html,"lxml")
for vid in soup.findAll(attrs={'class':'yt-uix-tile-link'}):
    if ('https://www.youtube.com' + vid['href']).startswith("https://www.youtube.com/watch?v="):
        flag = 1
        print ('https://www.youtube.com' + vid['href'])
if flag == 0:
    print ("No results found")

Related

Exception has occurred: AttributeError 'str' object has no attribute 'descendants'

I'm new to Python and I'm trying to build a web scraper for an internship.
from typing import Container
import requests
from bs4 import BeautifulSoup as bs
from selenium import webdriver
p1 = ["https://www.libris.ro/search?iv.q={}", "https://carturesti.ro/product/search/{}", "https://www.elefant.ro/search?SearchTerm={}&StockAvailability=true", "https://www.litera.ro/catalogsearch/result/?q{}", "https://www.librariadelfin.ro/?submitted=1&O=search&keywords{}&do_submit=1", "https://bookzone.ro/cautare?term={}", "https://www.librex.ro/search/{}/?q={}"]
#price_min = 1000000
#url_min, price_min
title = "percy jackson"
for x in p1:
    temp = x
    title = title.replace(" ", "+")
    url = temp.format(title)
    if url == "https://www.libris.ro/search?iv.q=" + title :
        **books = bs.find_all("div", class_="product-item-info imgdim-x")**
        for each_book in books:
            book_url = each_book.find("a")["href"]
            price = each_book.find("span", class_="price-wrapper")
            print(book_url)
            print(price)
I'm getting this error on the line between the two pairs of asterisks:
Exception has occurred: AttributeError
'str' object has no attribute 'descendants'
After from bs4 import BeautifulSoup as bs, bs is the class itself. You need to instantiate that class with data from the website. In the code below, I've added a requests call to fetch the page and built the BeautifulSoup document from the response. You'll find some other errors in your code that need to be sorted out, but this will get you past this problem.
from typing import Container
import requests
from bs4 import BeautifulSoup as bs
from selenium import webdriver
p1 = ["https://www.libris.ro/search?iv.q={}", "https://carturesti.ro/product/search/{}", "https://www.elefant.ro/search?SearchTerm={}&StockAvailability=true", "https://www.litera.ro/catalogsearch/result/?q{}", "https://www.librariadelfin.ro/?submitted=1&O=search&keywords{}&do_submit=1", "https://bookzone.ro/cautare?term={}", "https://www.librex.ro/search/{}/?q={}"]
#price_min = 1000000
#url_min, price_min
title = "percy jackson"
for x in p1:
    temp = x
    title = title.replace(" ", "+")
    url = temp.format(title)
    if url == "https://www.libris.ro/search?iv.q=" + title :
        # THE FIX
        resp = requests.get(url)
        # treat anything outside the 2xx range as a failure
        if not 200 <= resp.status_code < 300:
            print("failed", resp.status_code, url)
            continue
        doc = bs(resp.text, "html.parser")
        books = doc.find_all("div", class_="product-item-info imgdim-x")
        for each_book in books:
            book_url = each_book.find("a")["href"]
            price = each_book.find("span", class_="price-wrapper")
            print(book_url)
            print(price)
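If the wording of the error seems odd, this sketch may help show where it comes from. TinySoup is a hypothetical stand-in class that I made up for illustration; it is not how bs4 is implemented, it only reproduces the same mechanism: when a method is called on the class rather than an instance, the first argument is bound to self.

```python
class TinySoup:
    """Hypothetical stand-in for a parsed document; not bs4's real code."""

    def __init__(self, markup):
        # Pretend parsing: every whitespace-separated token is a "descendant".
        self.descendants = markup.split()

    def find_all(self, name=None):
        return [tag for tag in self.descendants if tag == name]


# Called on the class, "div" is bound to `self`, so the method ends up
# looking for "div".descendants and fails.
try:
    TinySoup.find_all("div")
except AttributeError as exc:
    message = str(exc)
    print(message)

# Called on an instance built from data, it works.
doc = TinySoup("div span div")
found = doc.find_all("div")
print(found)
```

The same reasoning applies to bs.find_all(...) in the question: the string "div" takes the place of the document.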

AttributeError: 'str' object has no attribute 'descendants' - BeautifulSoup4

I want to scrape the <a> elements at a specific URL, https://www.sortlist.fr/pub, and select the one whose title contains "Dupont Lewis". I am using the following code:
import requests
from bs4 import BeautifulSoup
r = requests.get("https://www.sortlist.fr/pub")
BeautifulSoup.find_all("a")
BeautifulSoup("a")
search = BeautifulSoup.select_one("a[title*=Dupont Lewis]")
if len(search)>0:
    print ('I find')
else:
    print ('None')
But I get the following error: "AttributeError: 'str' object has no attribute 'descendants'"
Can anyone help me, please?
The error is that you never create a soup object from the server response:
import requests
from bs4 import BeautifulSoup
r = requests.get("https://www.sortlist.fr/pub")
soup = BeautifulSoup(r.content, 'html.parser') # <-- create a soup
search = soup.select_one('a[title*="Dupont Lewis"]') # <-- put "Dupont Lewis" inside ""
if search: # <-- len() isn't necessary, because of .select_one
    print ('I find')
else:
    print ('None')
Prints:
I find

Using Python for web scraping: TypeError: argument of type 'NoneType' is not iterable

import urllib.request, urllib.parse, urllib.error
import json
serviceurl = "http://python-data.dr-chuck.net/geojson?"
while True:
    address = "Farmingdale State University"
    if len(address) < 1 :
        break
    url = serviceurl + urllib.parse.urlencode({'sensor':'false','address':address})
    print ('Retrieving',url)
    uh =urllib.request.urlopen(url)
    data = uh.read()
    print ('Retrived',len(data),'characters')
    try: (js) = json.loads(str(data))
    except: (js) = None
    if ('status' not in js) or (js['status'] != 'OK'):
        print ('==== Failure To Retrieve ====')
        print (data)
        continue
    placeid = js["results"][0]['place_id']
    print ("Place id",placeid)
File "", line 23, in
if ('status' not in js) or (js['status'] != 'OK'):
TypeError: argument of type 'NoneType' is not iterable
You get this error when js is None. Check that js is not None before inspecting its contents.
Right now, when json.loads fails inside the try block, the except clause sets js to None, and the if statement that follows is still evaluated against it.
json.loads() returns a Python object parsed from a string containing JSON.
Example:
import urllib.request, urllib.parse, urllib.error
import json
serviceurl = "http://python-data.dr-chuck.net/geojson?"
address = "Farmingdale State University"
url = serviceurl + urllib.parse.urlencode({'sensor':'false','address':address})
while True:
    uh =urllib.request.urlopen(url)
    response = uh.read()
    data = json.loads(response)
    if 'status' in data and data['status'] != 'OK':
        continue
    # check that place_id exists in the response
    if 'results' in data and len(data['results']) > 0 and 'place_id' in data['results'][0]:
        print(data["results"][0]['place_id'])
    break
Output:
ChIJScDUqFcq6IkRpvubNFOAVmw
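The None check described in this answer can be sketched on its own, without the network call. The payload below is made up for illustration, and parse_status is a hypothetical helper name. The sketch also shows why the original parse likely failed in the first place: calling str() on bytes produces text starting with b', which is not valid JSON.

```python
import json


def parse_status(raw_bytes):
    """Return the decoded JSON object, or None if parsing fails."""
    try:
        return json.loads(raw_bytes.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None


# Made-up payload standing in for the service response.
raw = b'{"status": "OK", "results": [{"place_id": "abc123"}]}'

# str() on bytes yields the text "b'{...}'", which json.loads rejects.
try:
    json.loads(str(raw))
    str_parse_failed = False
except json.JSONDecodeError:
    str_parse_failed = True

js = parse_status(raw)
# Test for None first, so a failed parse cannot reach the `in` check.
if js is None or js.get("status") != "OK":
    place_id = None
else:
    place_id = js["results"][0]["place_id"]

print(str_parse_failed, place_id)
```

Catching only the decoding errors, instead of a bare except, keeps unrelated bugs visible.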

Beautiful Soup error: NameError: name 'htmltext' is not defined

I'm getting this error:
NameError: name 'htmltext' is not defined
It comes from the code below:
from bs4 import BeautifulSoup
import urllib
import urllib.parse
url = "http://nytimes.com"
urls = [url]
visited = [url]
while len(urls) > 0:
    try:
        htmltext = urllib.urlopen(urls[0]).read()
    except:
        print(urls[0])
    soup = BeautifulSoup(htmltext)
    urls.pop(0)
    print(soup.findAll('a',href = true))
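In Python 3.x, you have to import urllib.request instead of urllib. Because urllib.urlopen does not exist there, the call raises inside the try block, the bare except swallows that error, and htmltext is never assigned, which is why you see the NameError afterwards. Then, change the line: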
htmltext = urllib.urlopen(urls[0]).read()
to:
htmltext = urllib.request.urlopen(urls[0]).read()
Finally, change true to True.
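As a follow-up to the fix above, here is a rough sketch of a fetch helper that catches only specific exceptions and returns None on failure, so the caller can skip the URL instead of hitting an undefined variable. The name fetch_html and the timeout value are my own choices, not something from the question.

```python
import urllib.error
import urllib.request


def fetch_html(url, timeout=10):
    """Return the page body as bytes, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (urllib.error.URLError, ValueError) as exc:
        # URLError covers network and HTTP failures; ValueError covers
        # malformed URLs such as a missing scheme.
        print("could not fetch", url, "->", exc)
        return None


# A malformed URL is rejected before any network access happens.
result = fetch_html("not-a-url")
print(result)  # None
```

In the crawl loop you would then write something like `if htmltext is None: urls.pop(0); continue` before building the soup.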

How do I resolve this error in Python code: 'module' object has no attribute 'Request'?

The part of the code containing the error:
select_link = db.GqlQuery("select * from PhishTank where url= :1",str(updated_url))
in_database_phishtank = False
for link in select_link:
    if str(updated_url) == str(link.url):
        in_database_phishtank = True
        # chk for 7 days period , update the link
        if (datetime.now()-link.timestamp) > timedelta(days = TIME_UPDATE):
            # query to the site and update the datastore
            url = "http://checkurl.phishtank.com/checkurl/"
            parameters = {"url": "%s" % updated_url,
                          "app_key": "74283d86612c6b89de0b186882446e069dd071f65e9711aa374e9cdbd2ba7ffe",
                          "format":"json"}
            data = urllib.urlencode(parameters)
            req = urllib.Request(url, data)
            try:
                response = urllib2.urlopen(req)
            except urllib.error.URLError as e:
                self.redirect('/error')
            json_post = response.read()
            data = json.loads(json_post)
Try this:
urllib.request.Request(url, data)
Be aware that in Python 3.x urllib was split in several modules: urllib.request, urllib.parse, and urllib.error. It's possible that you're importing it wrong.
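For reference, a minimal Python 3 sketch of building such a POST request with the split modules might look like this. The endpoint and parameters are placeholders I picked for illustration, and nothing is actually sent. If the code runs on Python 2 (the urllib2 call in the question suggests it might), the equivalents are urllib.urlencode and urllib2.Request instead.

```python
import urllib.parse
import urllib.request

# Placeholder endpoint and parameters, not the real service.
url = "https://example.com/checkurl/"
parameters = {"url": "http://example.org/", "format": "json"}

# urlencode returns str; Request expects bytes for the POST body.
data = urllib.parse.urlencode(parameters).encode("ascii")
req = urllib.request.Request(url, data)

print(req.get_method())  # POST, because data is not None
print(req.data)
```

Passing req to urllib.request.urlopen would then perform the request.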
