How do I open a URL and read its contents? - python

I'm trying to write code that can open a URL connection and read the text from the response. I've tried:
import urllib
urllib.urlopen('http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345')
But it gives me this error:
Traceback (most recent call last):
File "<pyshell#40>", line 1, in <module>
urllib.urlopen('http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345')
AttributeError: 'module' object has no attribute 'urlopen'
What's wrong with my code and what is the solution to the problem?

I think the problem is that you are on Python 3.x but using code that only works on 2.x.
In Python 3.x, the urlopen function is contained in urllib.request:
>>> from urllib.request import urlopen
>>> urlopen
<function urlopen at 0x020DA7C8>
>>>
Edit:
I think this does everything you want:
from urllib.request import urlopen
page = urlopen('http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345').read()
print(page)
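Note that in Python 3, .read() returns bytes, so print(page) shows a b'...' literal. A minimal sketch of decoding it to text, assuming the page is served as UTF-8:
from urllib.request import urlopen

page = urlopen('http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345').read()
text = page.decode('utf-8')   # assumption: the page is UTF-8 encoded
print(text)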

Try "requests". It is much easier to work with.
http://www.python-requests.org/en/latest/user/quickstart/
import requests
r = requests.get('http://www.pythonchallenge.com/pc/def/linkedlist.php', params={
    'nothing': '12345'
})
print(r.text)

Related

Why do I receive urlopen no attribute error in python 3.7.3?

I'm new to Python and trying to work with geocode data.
I have written the code below to find the drive time between two geocodes.
import urllib.request
#from urllib.request import urlopen
import simplejson, urllib
orig_lat = 52.2296756
orig_lng = 21.0122287
dest_lat = 52.406374
dest_lng = 16.9251681
orig_coord = orig_lat, orig_lng
dest_coord = dest_lat, dest_lng
url = "http://maps.googleapis.com/maps/api/distancematrix/json?origins={0}&destinations={1}&mode=driving&language=en-EN&sensor=false".format(str(orig_coord),str(dest_coord))
result= simplejson.load(urllib.urlopen(url))
driving_time = result['rows'][0]['elements'][0]['duration']['value']
However, I receive an error:
Traceback (most recent call last):
File "<ipython-input-154-c5d2043b6825>", line 11, in <module>
result= simplejson.load(urllib.urlopen(url))
AttributeError: module 'urllib' has no attribute 'urlopen'
I have seen other posts about this, but none of them work for me as a solution to this attribute error.
Assuming you're using Python 3, urllib.urlopen has been replaced with urllib.request.urlopen(). Change urllib.urlopen(url) to urllib.request.urlopen(url).
See the Python 3 documentation for more information.
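For reference, here is a minimal sketch of the snippet above with that change applied. Swapping simplejson for the standard-library json module is an optional substitution, it assumes Python 3.6+ so json.load can consume the bytes stream returned by urlopen, and whether the Distance Matrix endpoint still answers key-less requests is a separate question:
import json
from urllib.request import urlopen

orig_coord = (52.2296756, 21.0122287)
dest_coord = (52.406374, 16.9251681)
url = ("http://maps.googleapis.com/maps/api/distancematrix/json"
       "?origins={0}&destinations={1}&mode=driving&language=en-EN&sensor=false"
       ).format(str(orig_coord), str(dest_coord))

with urlopen(url) as response:       # urllib.request.urlopen, not urllib.urlopen
    result = json.load(response)     # json.load handles the bytes stream on Python 3.6+

driving_time = result['rows'][0]['elements'][0]['duration']['value']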

AttributeError: module 'copy' has no attribute 'deepcopy'

I'm actually new to Python and BS4.
I decided to create a script that scrapes a website, oscarmini.com to be precise. The code was running fine until today, when I wanted to modify it; now I keep getting errors. With the little knowledge I have about exceptions and errors, there seems to be nothing wrong with my code itself; the problem appears to come from importing the 'bs4' module.
from bs4 import BeautifulSoup as BS
import requests
url = 'https://oscarmini.com/2018/05/techfest-2018.html'
page = requests.get(url)
soup = BS(page.text, 'lxml')
mydivs = soup.find("div", {"class": "entry-content"})
soup.find('div', id="dpsp-content-top").decompose()
print(mydivs.get_text())
input()
Below is the error message I get.
Traceback (most recent call last):
File "C:/Users/USERNaME/Desktop/My Programs/Random/Oscarmini-
Scrapper.py", line 1, in <module>
from bs4 import BeautifulSoup as BS
File "C:\Users\USERNaME\AppData\Local\Programs\Python\Python36-32\lib\site-packages\bs4\__init__.py", line 35, in <module>
import xml.etree.cElementTree as default_etree
File ":\Users\USERNaME\AppData\Local\Programs\Python\Python36-32\lib\xml\etree\cElementTree.py", line 3, in <module>
from xml.etree.ElementTree import *
File "C:\Users\USERNaME\AppData\Local\Programs\Python\Python36-32\lib\xml\etree\ElementTree.py", line 1654, in <module>
from _elementtree import *
AttributeError: module 'copy' has no attribute 'deepcopy'
Process finished with exit code 1
Please, I really need help with this.
I encountered the same problem, and I finally found that the cause was another script of mine named copy.py that shadowed the standard-library copy module.
You can print the real path of the copy module with print(copy.__file__) just before the exception occurs and check whether it is the file you expect.
You can also list your PYTHONPATH environment variable with:
import os
print(os.environ.get('PYTHONPATH', '').split(os.pathsep))
just before the line that causes the exception, and see whether there is anything unexpected.
Make sure no file named copy.py exists in your project working directory,
like:
project folder:
    copy.py
    currentOpenFile.py  # where you import the copy module
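A small diagnostic sketch combining both suggestions above; run it just before the import that fails (the printed paths are machine-specific):
import copy
import os

print(copy.__file__)   # should point into the standard library, not into your project
print(os.path.exists(os.path.join(os.getcwd(), "copy.py")))   # True means a local copy.py is shadowing it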

ImportError: No module named error when importing urllib.error

I am just getting my feet wet in the art of web scraping and I am following the tutorials from this source. For some reason I cannot import the error module from 'urllib' to handle exceptions. Since this is a built-in library, I am confused as to why this is an issue.
from urllib import urlopen
from urllib.error import HTTPError
from urllib.error import URLError
yields the error
ImportErrorTraceback (most recent call last)
<ipython-input-1-30b72b3bf2ea> in <module>()
1 from urllib import urlopen
----> 2 from urllib.error import HTTPError
3 from urllib.error import URLError
I have tried the same code with another IDE (IntelliJ) and it works as expected, leading me to believe that this could be an issue with Google Colab itself. Could someone weigh in and possibly help me find a solution to this problem?
I am new to programming, so if this is a juvenile question or if this is not the appropriate place for this question, I apologize in advance.
P.S. I have double checked that the runtime is Python 3
Your problem is in
from urllib import urlopen
The right way to import urlopen is from urllib.request:
from urllib.request import urlopen
Docs
Just try this:
from urllib.request import urlopen
Always remember to search the docs of the particular library; it helps a lot.
You are trying to run this code using Python 2. Use Python 3 and it will work.
Python 2:
>>> from urllib.error import HTTPError
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named error
>>>
Python 3:
>>> from urllib.error import HTTPError
>>>
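Once the imports are on Python 3, the error handling the tutorial is building toward looks roughly like this (the URL is a placeholder):
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

try:
    html = urlopen("http://example.com/")   # placeholder URL
except HTTPError as e:
    print("The server returned an HTTP error:", e.code)
except URLError as e:
    print("The server could not be reached:", e.reason)
else:
    print(html.read()[:200])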

python: called function from other file needs modules

I am calling a function from functions.py into work.py, which works fine:
from functions import get_ad_page_urls
The get_ad_page_urls function makes use of, among others, the requests module.
Now, whether or not I import the requests module into work.py, when I run the called function in work.py it gives an error: NameError: name 'requests' is not defined.
I have defined get_ad_page_urls in functions.py with the import inside the function, like so,
def get_ad_page_urls():
    import requests
    <rest of function>
or with the import at module level, like so,
import requests

def get_ad_page_urls():
    <rest of function>
but it doesn't matter, the NameError persists.
How should I write the function such that when I call the function in work.py everything works fine?
Traceback:
get_ad_page_urls(page_root_url)
Traceback (most recent call last):
File "<ipython-input-253-ac55b8b1e24c>", line 1, in <module>
get_ad_page_urls(page_root_url)
File "/Users/myname/Documents/RentIndicator/Python Code/idealista_functions.py", line 35, in get_ad_page_urls
NameError: name 'requests' is not defined
functions.py
import requests
import bs4
import re
from bs4 import BeautifulSoup
def get_ad_page_urls(page_root_url):
    response = requests.get(page_root_url)
    soup = bs4.BeautifulSoup(response.text)
    container = soup.find("div", {"class": "items-container"})
    return [link.get("href") for link in container.findAll("a", href=re.compile("^(/inmueble/)((?!:).)*$"))]
work.py
import requests
import bs4
import re
from bs4 import BeautifulSoup
from functions import get_ad_page_urls
city='Valencia'
lcity=city.lower()
root_url = 'https://www.idealista.com'
house_href='/alquiler-habitacion/'
page_root_url = root_url + house_href + lcity + '-' + lcity + '/'
get_ad_page_urls(page_root_url)
Mine works perfectly fine running on Python 3.4.4:
functions.py
import requests
def get_ad_page_urls():
    return requests.get("https://www.google.com")
work.py
from functions import get_ad_page_urls
print(get_ad_page_urls())
# outputs <Response [200]>
Make sure they are in the same directory. You might also be using two different Python versions, and one of them doesn't have requests installed.
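If it still fails, a quick diagnostic sketch, run from the same session that raises the NameError, can show which interpreter is running and which functions.py was actually imported:
import sys
import functions

print(sys.executable)                  # the Python interpreter actually in use
print(functions.__file__)              # the functions.py that was actually imported
print(hasattr(functions, "requests"))  # False suggests the imported file lacks the module-level import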

python urllib error

so I have this code:
def crawl(self, url):
    data = urllib.request.urlopen(url)
    print(data)
but then when I call the function, it fails with
data = urllib.request.urlopen(url)
AttributeError: 'module' object has no attribute 'request'
What did I do wrong? I already imported urllib.
I am using Python 3.1.3.
In Python 3, urllib is a package containing the modules request, response, and error, each serving its respective purpose.
Wherever you had import urllib or import urllib2 in Python 2, replace them with:
import urllib.request
import urllib.response
import urllib.error
The classes and methods are the same.
By the way, use the 2to3 tool if you are converting code from Python 2 to Python 3.
urllib.request is a separate module; import it explicitly.
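Applied to the crawl method from the question, a minimal sketch looks like this (the surrounding class is hypothetical, and .read() is added so the response body is printed rather than the response object):
import urllib.request

class Crawler:
    def crawl(self, url):
        data = urllib.request.urlopen(url)   # works once urllib.request is imported explicitly
        print(data.read())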
