I am trying to execute this code, but I get the above error.
from urllib.request import FancyURLopener
from urllib.request import urlopen
exam = urllib.request.urlopen("192.168.2.2")
print(exam.read)
exam.close()
I expect to open the provided IP address.
from urllib.request import urlopen
# The URL must include a scheme such as http://
exam = urlopen("http://192.168.2.2")
# Call read() with parentheses to get the response body
print(exam.read())
exam.close()
Try using urlopen("url") rather than the full urllib.request.urlopen(url). Because the code imports only the name urlopen, the name urllib is never defined in the script.
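To see why the qualified call fails, here is a small sketch showing which names a from-import actually binds. The namespace dictionary is purely illustrative and not part of the original code.

```python
# Run the import in an empty namespace and inspect which names it creates.
namespace = {}
exec("from urllib.request import urlopen", namespace)

has_urlopen = "urlopen" in namespace   # the imported function is bound
has_urllib = "urllib" in namespace     # the package name itself is not

print(has_urlopen, has_urllib)
```

So either call urlopen(...) directly, or use import urllib.request and keep the fully qualified name.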
You need to add http:// or https:// to the start of your URL to make this work. Also, check out the Python requests library, a good alternative to urllib.
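As a rough sketch of the scheme fix, a tiny helper can prepend a scheme when none is present. The name ensure_scheme and the default of http are assumptions made for illustration.

```python
def ensure_scheme(url, default_scheme="http"):
    """Return url unchanged if it already has a scheme, else prepend one."""
    if "://" in url:
        return url
    return default_scheme + "://" + url


print(ensure_scheme("192.168.2.2"))
print(ensure_scheme("https://192.168.2.2"))
```

The result can then be passed to urlopen, which otherwise rejects a bare address as an unknown URL type.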
I have been trying to extract data from websites and print it using Python 2.7.13 on Windows 10. It gives me the following error:
Traceback (most recent call last):
File "C:\Python27\Scripts\i1.py", line 5, in <module>
data=urlparse.urlencode(values)
AttributeError: 'function' object has no attribute 'urlencode'
Here is the code:
from urllib import urlopen
from urlparse import urlparse
url='http://pythonprogramming.net'
values={'s':'basic','submit':'search'}
data=urllib.parse.urlencode(values)
data=data.encode('utf-8')
req=urllib.request.Request(url,data)
resp=urllib.request.urlopen(req)
respData=resp.read()
print(respData)
Since it's Python 2, I have written from urllib import urlopen and from urlparse import urlparse instead of import urllib.request and import urllib.parse.
Solution 1
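In Python 2, urlencode lives in the urllib module itself, not in urlparse. The statement from urlparse import urlparse binds the name urlparse to a function, which is why the traceback reports that a 'function' object has no attribute 'urlencode'. Call urllib.urlencode to build the data, then pass the URL and the data to urllib.urlopen. Below is the modified and working piece of code: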
import urllib
from urllib import urlopen
url='http://pythonprogramming.net'
values={'s':'basic','submit':'search'}
data=urllib.urlencode(values)
data=data.encode('utf-8')
response=urllib.urlopen(url,data)
responseData=response.read()
print responseData
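If the same script has to run on both Python 2 and Python 3, one possible sketch is to import urlencode from whichever location exists. The list of pairs below is an assumption made to keep the parameter order predictable; the original code uses a dict.

```python
try:
    from urllib import urlencode        # Python 2 location
except ImportError:
    from urllib.parse import urlencode  # Python 3 location

# A list of pairs keeps the parameter order predictable.
values = [("s", "basic"), ("submit", "search")]
query = urlencode(values)
data = query.encode("utf-8")  # POST bodies must be bytes in Python 3

print(query)
```

The encoded data can then be passed as the second argument to urlopen, as in the code above.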
Solution 2
You can also use the BeautifulSoup library to scrape the data from your website. It's pretty easy to use. Below is the code for your example:
import urllib
from urllib import urlopen
from bs4 import BeautifulSoup
url='http://pythonprogramming.net'
page = urllib.urlopen(url)
soup = BeautifulSoup(page, "lxml")
print soup
I'm trying to use Python to download the HTML source code of a website but I'm receiving this error.
Traceback (most recent call last):
File "C:\Users\Sergio.Tapia\Documents\NetBeansProjects\DICParser\src\WebDownload.py", line 3, in <module>
file = urllib.urlopen("http://www.python.org")
AttributeError: 'module' object has no attribute 'urlopen'
I'm following the guide here: http://www.boddie.org.uk/python/HTML.html
import urllib
file = urllib.urlopen("http://www.python.org")
s = file.read()
f.close()
#I'm guessing this would output the html source code?
print(s)
I'm using Python 3.
This works in Python 2.x.
For Python 3 look in the docs:
import urllib.request
with urllib.request.urlopen("http://www.python.org") as url:
    s = url.read()
    # I'm guessing this would output the html source code ?
    print(s)
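Note that in Python 3 read() returns bytes, so the printed value shows up as b'...'. A minimal sketch of turning it into text follows; the helper name decode_body and the UTF-8 fallback are assumptions, and the bytes literal stands in for a real response body. With a real response, the declared charset can usually be read with url.headers.get_content_charset().

```python
def decode_body(raw, charset=None):
    """Decode a bytes response body, falling back to UTF-8 if no charset is known."""
    return raw.decode(charset or "utf-8", errors="replace")


# Stand-in for the bytes that url.read() would return.
raw = b"<title>Caf\xc3\xa9</title>"
print(decode_body(raw))
```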
A Python 2+3 compatible solution is:
import sys
from contextlib import closing
if sys.version_info[0] == 3:
    from urllib.request import urlopen
else:
    # Not Python 3 - today, it is most likely to be Python 2
    # But note that this might need an update when Python 4
    # might be around one day
    from urllib import urlopen
# Your code where you can use urlopen
# closing() is needed because the Python 2 response object is not a context manager
with closing(urlopen("http://www.python.org")) as url:
    s = url.read()
print(s)
import urllib.request as ur
s = ur.urlopen("http://www.google.com")
sl = s.read()
print(sl)
In Python 3, urlopen lives in the urllib.request submodule, so urllib.urlopen cannot be used here.
To get 'dataX = urllib.urlopen(url).read()' working in Python 3 (this would have been correct for Python 2), you just need to change two small things.
1: The urllib statement itself (add the .request in the middle):
dataX = urllib.request.urlopen(url).read()
2: The import statement preceding it (change from 'import urllib' to):
import urllib.request
And it should work in python3 :)
Change TWO lines:
import urllib.request #line1
#Replace
urllib.urlopen("http://www.python.org")
#To
urllib.request.urlopen("http://www.python.org") #line2
If you get an "HTTP Error 403: Forbidden" exception, try this:
import urllib.request
siteurl = "http://www.python.org"
req = urllib.request.Request(siteurl, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.100 Safari/537.36'})
pageHTML = urllib.request.urlopen(req).read()
I hope this resolves your problem.
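The same idea can be wrapped so that an HTTP error is reported instead of crashing the script. This is only a sketch: the function names build_request and fetch and the shortened User-Agent string are assumptions, not part of the answer above.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Create a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Return the body as bytes, or None if the server answers with an HTTP error."""
    try:
        with urllib.request.urlopen(build_request(url)) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        print("Server returned", err.code, err.reason)
        return None


req = build_request("http://www.python.org")
# Request stores header names capitalized, so the key is "User-agent".
print(req.get_header("User-agent"))
```

Calling fetch("http://www.python.org") would then perform the actual download.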
import urllib.request as ur
filehandler = ur.urlopen('http://www.google.com')
for line in filehandler:
    print(line.strip())
For python 3, try something like this:
import urllib.request
urllib.request.urlretrieve('http://crcv.ucf.edu/THUMOS14/UCF101/UCF101/v_YoYo_g19_c02.avi', "video_name.avi")
It will download the video to the current working directory.
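The second argument of urlretrieve is the destination path, and the call returns that path together with the response headers. The sketch below uses a local file:// URL and a temporary directory as stand-ins (both are assumptions) so that it runs without network access.

```python
import os
import pathlib
import tempfile
import urllib.request

# Create a small local file to act as the remote resource.
workdir = tempfile.mkdtemp()
source = pathlib.Path(workdir) / "source.bin"
source.write_bytes(b"example payload")

destination = os.path.join(workdir, "copy.bin")
# urlretrieve returns the local filename and the response headers.
saved_path, headers = urllib.request.urlretrieve(source.as_uri(), destination)

print(saved_path == destination)
print(os.path.getsize(saved_path))
```

A relative destination such as "video_name.avi" is resolved against the current working directory, which is why the file ends up there.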
Solution for python3:
from urllib.request import urlopen
url = 'http://www.python.org'
file = urlopen(url)
html = file.read()
print(html)
import urllib
import urllib.request
from bs4 import BeautifulSoup
with urllib.request.urlopen("http://www.newegg.com/") as url:
    s = url.read()
    print(s)
soup = BeautifulSoup(s, "html.parser")
all_tag_a = soup.find_all("a", limit=10)
for links in all_tag_a:
    #print(links.get('href'))
    print(links)
One possible way to do it:
import urllib
...
try:
    # Python 2
    from urllib2 import urlopen
except ImportError:
    # Python 3
    from urllib.request import urlopen
If your code uses Python version 3.x, you can do the following:
from urllib.request import urlopen
urlopen(url)
By the way, I suggest another module called requests, which is friendlier to use. You can install it with pip and use it like this:
import requests
requests.get(url)
requests.post(url)
Use the third-party six module to make your code compatible with both Python 2 and Python 3.
from six.moves import urllib
urllib.request.urlopen("<your-url>")
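If you are using the third-party urllib3 package, note that RequestMethods.urlopen is an abstract method and cannot be called on the class directly. Create a PoolManager and request the URL through it:
import urllib3
http = urllib3.PoolManager()
imgResp = http.request("GET", url)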
I just bought a book to show me how to scrape websites, but the first example right off the bat is not working for me. I am a little upset that I bought the book in the first place, but I would like to try and get it going.
In Python 3.5 my code:
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page1.html")
BsObj = BeautifulSoup(html.read())
print(bsObj.h1)*
Here is the error I am getting
Traceback (most recent call last):
  File "C:/Users/MyName/AppData/Local/Programs/Python/Python35-32/Lib/site-packages/bs4/test.py", line 5, in <module>
    BsObj = BeautifulSoup(html.read())
  File "C:\Users\MyName\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\__init__.py", line 153, in __init__
    builder = builder_class()
  File "C:\Users\MyName\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\builder\_htmlparser.py", line 39, in __init__
    return super(HTMLParserTreeBuilder, self).__init__(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'strict'
Any ideas would be super helpful.
Thanks in advance.
I guess you transcribed the code from the book. bsObj is not named consistently and there is an unnecessary * after print(). It will work after you change those two things.
Also note that read() is not needed and that it's better to define the parser, otherwise you will get a warning.
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('http://www.pythonscraping.com/pages/page1.html')
bsObj = BeautifulSoup(html, 'html.parser')
print(bsObj.h1)
Hey, you just had a typo: BsObj, not bsObj, in the print line.
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page1.html")
BsObj = BeautifulSoup(html.read())
print(BsObj.h1)
Here is my code using find_all. It fails, but it works great with .find():
import requests
from BeautifulSoup import BeautifulSoup
r = requests.get(URL_DEFINED)
print r.status_code
soup = BeautifulSoup(r.text)
print soup.find_all('ul')
This is what I got:
Traceback (most recent call last):
File "scraper.py", line 19, in <module>
print soup.find_all('ul')
TypeError: 'NoneType' object is not callable
It looks like you're using BeautifulSoup version 3, which used a slightly different naming convention, e.g. .findAll, while BeautifulSoup 4 standardised naming to be more PEP 8-like, e.g. .find_all (but keeps the older naming for backwards compatibility). Note that soup('ul') is equivalent to a find-all call on both versions.
To download and install, use pip install beautifulsoup4.
Then change your import to be:
from bs4 import BeautifulSoup
Then you're good to go.
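As a quick check that the new import behaves as described, here is a minimal sketch run against an inline HTML string. The markup is made up for illustration, and the example assumes beautifulsoup4 is installed.

```python
from bs4 import BeautifulSoup

html = "<ul><li>one</li></ul><p>text</p><ul><li>two</li></ul>"
soup = BeautifulSoup(html, "html.parser")

by_method = soup.find_all("ul")   # BeautifulSoup 4 spelling
by_call = soup("ul")              # calling the soup is shorthand for find_all

print(len(by_method), by_method == by_call)
```

Passing the parser name explicitly ("html.parser" here) also avoids the warning BeautifulSoup 4 emits when it has to guess one.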
Download BS4 from here. http://www.crummy.com/software/BeautifulSoup/#Download
Install it and import it at the beginning of your code like this:
import requests
from bs4 import BeautifulSoup
r = requests.get(URL_DEFINED)
print r.status_code
soup = BeautifulSoup(r.text)
print soup.find_all('ul')