I'm trying to do this on remote Ubuntu Server:
>>> import urllib2, requests
>>> url = 'http://python.org/'
>>> urllib2.urlopen(url)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 406, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 444, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
>>> requests.get(url)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/django/zyq2/venv/local/lib/python2.7/site-packages/requests/api.py", line 55, in get
return request('get', url, **kwargs)
File "/home/django/zyq2/venv/local/lib/python2.7/site-packages/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "/home/django/zyq2/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 382, in request
resp = self.send(prep, **send_kwargs)
File "/home/django/zyq2/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 505, in send
history = [resp for resp in gen] if allow_redirects else []
File "/home/django/zyq2/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 99, in resolve_redir ts
raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects)
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
But it works fine on local Windows machine:
>>> urllib2.urlopen(url)
<addinfourl at 57470168 whose fp = <socket._fileobject object at 0x036CB630>>
>>> requests.get(url)
<Response [200]>
I have absolutely no idea about what's going on and would appreciate any suggestion.
Update
I tried S.M. Al Mamun's suggestion and got an exception with long traceback:
>>> req = urllib2.Request(url, headers={ 'User-Agent': 'Mozilla/5.0' })
>>> urllib2.urlopen(req).read()
...
long traceback (more than one page)
...
urllib2.HTTPError: HTTP Error 303: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
See Other
Infinite loop again (I mean TooManyRedirects exception).
Try using a user-agent:
req = urllib2.Request(url, headers={ 'User-Agent': 'Mozilla/5.0' })
urllib2.urlopen(req).read()
If it doesn't work still, that might be your Ubuntu is offline!
Related
I was wondering if somebody can help me with getting my code to work.
import urllib.request
import urllib.parse
import re
url = 'https://www.google.com'
values = {'s':'basics',
'submit':'search'}
data = urllib.parse.urlencode(values)
data = data.encode('utf-8')
req = urllib.request.Request(url,data)
resp = urllib.request.urlopen(req)
respData = resp.read()
print(respData)
It is giving me this error message when running the code.
TypeError: POST data should be bytes, an iterable of bytes, or a file object. It cannot be of type str.
I hope someone can help me with my problem. If not thanks anyways.
It is giving me this gigantic error message:
It is giving me this gigantic error
Traceback (most recent call last):
File "C:\Users\user\OneDrive\Desktop\Lotto.py", line 11, in <module>
resp = urllib.request.urlopen(req)
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\urllib \request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 503, in _call_chain
result = func(*args)
File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 405: Method Not Allowed
It's not working because you have a TYPO:
data = urllib.parse.urlencode(values)
date = data.encode('utf-8') # YOUR TYPO IS HERE
req = urllib.request.Request(url,data)
You need to change that line to:
data = data.encode('utf-8') # TYPO FIXED!
I seem to be getting this error with urllib.request and it gives me this url error that i cant seem to fix.
raceback (most recent call last):
File "C:\Users\Jarvis\Documents\Python Scripts\MultiCheck by Koala.py", line 133, in <module>
Migration()
File "C:\Users\Jarvis\Documents\Python Scripts\MultiCheck by Koala.py", line 116, in Migration
rawdata_uuid = urllib.request.urlopen(url)
File "C:\Python34\lib\urllib\request.py", line 161, in urlopen
return opener.open(url, data, timeout)
File "C:\Python34\lib\urllib\request.py", line 469, in open
response = meth(req, response)
File "C:\Python34\lib\urllib\request.py", line 579, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python34\lib\urllib\request.py", line 507, in error
return self._call_chain(*args)
File "C:\Python34\lib\urllib\request.py", line 441, in _call_chain
result = func(*args)
File "C:\Python34\lib\urllib\request.py", line 587, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 429: 42
The code im using is here is for a migration checker for a game:
def Migration():
url = "https://api.mojang.com/users/profiles/minecraft/" + einfos
rawdata = urllib.request.urlopen(url)
newrawdata = rawdata.read()
jsondata = json.loads(newrawdata.decode('utf-8'))
results = jsondata['id']
url = "https://sessionserver.mojang.com/session/minecraft/profile/" + results
rawdata_uuid = urllib.request.urlopen(url)
newrawdata_uuid = rawdata_uuid.read()
jsondata_uuid = json.loads(newrawdata_uuid.decode('utf-8'))
try:
results = jsondata_uuid['legacy']
print ("Unmigrated")
except:
print("Migrated")
Error 429 means: Too many requests. You seem to have hit a rate limit. The additional number gives are the seconds you have to wait for the limitation to be dropped. So, try again in 42s, or later.
I use url lib, urllib2, cookie lib to scrape a web:get the login page and post the data.
def getpage():
codeurl=r"http://www.xxx/sign_in"
request=urllib2.Request(codeurl)
response=urllib2.urlopen(request)
return response
def parsecode(response):
"""
parse the login page to get the changed code
"""
pattern=re.compile(r"""<meta.*?csrf-token.*?content=(.*?)\s/>""")
code=re.findall(pattern,response.read())[0]
return code
def Hand():
"""
deal with cookie and header
"""
headers={
"Referer":"xxx",
"User-Agent":"xxx"
}
ck=cookielib.MozillaCookieJar()
handle=urllib2.HTTPCookieProcessor(ck)
openner=urllib2.build_opener(handle)
head=[]
for key,value in headers.items():
tup=(key,value)
head.append(tup)
openner.addheaders = head
return openner
def postdata(code,openner):
"""
post the data xxx.com needed
"""
logurl=r"http://www.jianshu.com/sessions"
sign_in={"name":"xxx","password":"xxx","authenticity_token":code}
data=urllib.urlencode(sign_in).encode("utf-8")
x=openner.open(logurl,data)
for item in ck:
print item
However,I met this bug:
Traceback (most recent call last):
File "jianshu.py", line 80, in
postdata(code,op)
File "jianshu.py", line 43, in postdata
x=openner.open(logurl,data)
File "/usr/lib64/python2.7/urllib2.py", line 437, in open
response = meth(req, response)
File "/usr/lib64/python2.7/urllib2.py", line 550, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib64/python2.7/urllib2.py", line 475, in error
return self._call_chain(*args)
File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/usr/lib64/python2.7/urllib2.py", line 558, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 500: Internal Server Error
Are you possibly missing a ' in between the 'r' and 'http://...' this line:
codeurl=r"http://www.xxx/sign_in"
I looked at this answer: Again urllib.error.HTTPError: HTTP Error 400: Bad Request because it was very similar to my question, but that solution did not work. I'm using Python 3.3.2. The lines that are like ..something.. are just values I replaced to protect my privacy. They should be strings
I'm getting an Error 400: Bad request from this code:
import urllib.parse
import urllib.request
url = '..url..'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'email':'..my email address..',
'github':'..my github account..'}
headers = {'User-Agent':user_agent}
data = urllib.parse.urlencode(values)
data = data.encode('utf-8')
req = urllib.request.Request(url, data, headers)
response = urllib.request.urlopen(req) #this line causes the errors
page = response.read()
This is the particular error message:
Traceback (most recent call last):
File "/Users/.../Documents/Code 2040/File.py", line 23, in <module>
response = urllib.request.urlopen(req)
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 156, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 475, in open
response = meth(req, response)
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 587, in http_response
'http', request, response, code, msg, hdrs)
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 513, in error
return self._call_chain(*args)
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 447, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 595, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
Hello Guys! I want to access some web page through python script. The url is: http://www.idealo.de/preisvergleich/Shop/27039.html
When I access it through web browser it is OK. But when I want to access it with urllib2:
a = urllib2.urlopen("http://www.idealo.de/preisvergleich/Shop/27039.html")
It gives me the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 406, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 444, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
Also I tried to access it with wget:
wget http://www.idealo.de/preisvergleich/Shop/27039.html
The error is:
--2012-04-23 12:42:03-- http://www.idealo.de/preisvergleich/Shop/27039.html
Resolving www.idealo.de (www.idealo.de)... 62.146.49.133
Connecting to www.idealo.de (www.idealo.de)|62.146.49.133|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2012-04-23 12:42:03 ERROR 403: Forbidden.
Can anyone explain why it is so? And how can I access it using python?
They're blocking some user agents. If you try with the following:
wget -U "Mozilla/5.0" http://www.idealo.de/preisvergleich/Shop/27039.html
it works. So you have to find the way to fake the user agent in your python code to make it work.
Try this:
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
a = opener.open("http://www.idealo.de/preisvergleich/Shop/27039.html")