I am creating a script in Python that calls a REST API and outputs the results in JSON format. I am getting some traceback errors in my code. How can I go about fixing this issue?
'import sitecustomize' failed; use -v for traceback
Traceback (most recent call last):
  File "/home/Desktop/Sync.py", line 12, in <module>
    url = urllib2.Request(request)
  File "/usr/lib/python2.7/urllib2.py", line 202, in __init__
    self.__original = unwrap(url)
  File "/usr/lib/python2.7/urllib.py", line 1057, in unwrap
    url = url.strip()
  File "/usr/lib/python2.7/urllib2.py", line 229, in __getattr__
    raise AttributeError, attr
AttributeError: strip
Here's the code:
import urllib2
import json
url = "http://google.com"
request = urllib2.Request(url)
request.add_header("Authorization","Basic xxxxxxxxxxxxxxxxxx")
socket = urllib2.urlopen(request)
data = json.dumps(socket)
hdrs = socket.headers
source = socket.read()
socket.close()
print "---- Headers -----"
print data
print "---- Source HTML -----"
print source
print "---- END -----"
value = 0
for line in source.splitlines():
    if not line.strip(): continue
    if line.startswith("value="):
        try:
            value = line.split("=")
        except IndexError:
            pass
    if value > 0:
        break
open("some.json", "w").write("value is: %d" % value)
You seem to have an issue here:
request = urllib2.Request("http://google.com")
request.add_header("Authorization", "Basic xxxxxxxxxxxxxxxxxxxxxxxx=")
url = urllib2.Request(request)
socket = urllib2.urlopen(url)
You are trying to create a Request object named "url" by passing a Request object into the constructor.
See http://docs.python.org/2/library/urllib2.html#urllib2.Request
Try this:
request = urllib2.Request("http://google.com")
request.add_header("Authorization", "Basic xxxxxxxxxxxxxxxxxxxxxxxx=")
socket = urllib2.urlopen(request)
From the documentation of the Request class:

url should be a string containing a valid URL.

You are currently passing another Request object to the constructor, which is the reason for the error you're seeing. The correct way to do this:

request = urllib2.Request("http://google.com")
request.add_header("Authorization", "Basic xxxxxxxxxxxxxxxxxxxxxxxx=")
socket = urllib2.urlopen(request)
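One more thing worth noting: even with the Request fixed, the posted code calls json.dumps(socket), which tries to serialize the response object itself and will fail; you want to read the body and then parse it. A minimal sketch, assuming the endpoint actually returns JSON (google.com returns HTML, so treat the URL as a placeholder):

import urllib2
import json

request = urllib2.Request("http://google.com")
request.add_header("Authorization", "Basic xxxxxxxxxxxxxxxxxxxxxxxx=")
response = urllib2.urlopen(request)

# Read the body once, then parse it; json.dumps() on the response
# object itself does not do what you want.
body = response.read()
response.close()
data = json.loads(body)

print "---- Parsed JSON -----"
print json.dumps(data, indent=4)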
As a part of a small project of mine, I'm using the requests module to make an API call. Here's the snippet:
date = str(day) + '-' + str(month) + '-' + str(year)
req = "https://cdn-api.co-vin.in/api/v2/appointment/sessions/public/findByDistrict?district_id=" + str(distid) + "&date=" + date
response = requests.get(req,headers={'Content-Type': 'application/json'})
st = str(jprint(response.json()))
file = open("data.json",'w')
file.write(st)
file.close()
The jprint function is as follows:
def jprint(obj):
    text = json.dumps(obj, sort_keys=True, indent=4)
    return text
This is a part of a nested loop. On the first few runs, it worked successfully but after that it gave the following error:
Traceback (most recent call last):
  File "vax_alert2.py", line 99, in <module>
    st = str(jprint(response.json()))
  File "/usr/lib/python3/dist-packages/requests/models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3/dist-packages/simplejson/__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I tried adding a sleep of 1 second but got the same error. How should I resolve it?
Also, I checked it without using the jprint function yet got the exact same error.
I would suggest recording the response whenever parsing it fails, since the body is likely empty and carrying an error status. You're probably getting a 403 or some other error status (potentially from a DDoS-aware firewall). Once you know the errant (empty) response's status, you can detect that status and throttle your requests accordingly.
try:
    st = str(jprint(response.json()))
    file = open("data.json", 'w')
    file.write(st)
    file.close()
except ValueError:  # simplejson.JSONDecodeError is a subclass of ValueError
    print(response)
See the following (from https://docs.python-requests.org/en/master/user/quickstart/):

In case the JSON decoding fails, r.json() raises an exception. For example, if the response gets a 204 (No Content), or if the response contains invalid JSON, attempting r.json() raises simplejson.JSONDecodeError if simplejson is installed, or raises ValueError: No JSON object could be decoded on Python 2 or json.JSONDecodeError on Python 3.

It should be noted that the success of the call to r.json() does not indicate the success of the response. Some servers may return a JSON object in a failed response (e.g. error details with HTTP 500). Such JSON will be decoded and returned. To check that a request is successful, use r.raise_for_status() or check r.status_code is what you expect.
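Putting that together, here is a sketch of the detect-and-throttle idea. The helper name fetch_json, the retry count, and the backoff interval are my own placeholders, not anything the CoWIN API documents:

import time
import requests

def fetch_json(url, retries=3, backoff=5):
    """GET a URL and return parsed JSON, backing off on error statuses."""
    for attempt in range(retries):
        response = requests.get(url, headers={'Content-Type': 'application/json'})
        if response.status_code == 200:
            return response.json()  # 200 with a JSON body decodes fine
        # Probably 403/429 from rate limiting: wait, then retry.
        print("Got HTTP", response.status_code, "- sleeping", backoff, "seconds")
        time.sleep(backoff)
    response.raise_for_status()  # out of retries: surface the error

In your loop that would be something like st = jprint(fetch_json(req)).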
My program is written to scan through a large list of websites for SQLi vulnerabilities by adding a simple string query (') to the end of URLs and looking for errors in the page source.
My program keeps getting stuck on the same website. Here's the error I keep receiving:
[-] http://www.pluralsight.com/guides/microsoft-net/getting-started-with-asp-net-mvc-core-1-0-from-zero-to-hero?status=in-review'
[-] Page not found.
[-] http://lfg.go2dental.com/member/dental_search/searchprov.cgi?P=LFGDentalConnect&Network=L'
[-] http://www.parlimen.gov.my/index.php?lang=en'
[-] http://www.otakunews.com/category.php?CatID=23'
[-] http://plaine-d-aunis.bibli.fr/opac/index.php?lvl=cmspage&pageid=6&id_rubrique=100'
[-] Page not found.
[-] http://www.rvparkhunter.com/state.asp?state=britishcolumbia'
[-] http://ensec.org/index.php?option=com_content&view=article&id=547:lord-howell-british-fracking-policy--a-change-of-direction-needed&catid=143:issue-content&Itemid=433'
[-] URL Timed Out
[-] http://www.videohelp.com/tools.php?listall=1'
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\Brice\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\Brice\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "C:\Users\Brice\Desktop\My Site Hunter\sitehunter.py", line 81, in mp_worker
    mainMethod(URLS)
  File "C:\Users\Brice\Desktop\My Site Hunter\sitehunter.py", line 77, in mainMethod
    tryMethod(req, URL)
  File "C:\Users\Brice\Desktop\My Site Hunter\sitehunter.py", line 48, in tryMethod
    checkforMySQLError(req, URL)
  File "C:\Users\Brice\Desktop\My Site Hunter\sitehunter.py", line 23, in checkforMySQLError
    response = urllib.request.urlopen(req, context=gcontext, timeout=2)
  File "C:\Users\Brice\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\Brice\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "C:\Users\Brice\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\Brice\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 564, in error
    result = self._call_chain(*args)
  File "C:\Users\Brice\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Users\Brice\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 753, in http_error_302
    fp.read()
  File "C:\Users\Brice\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 462, in read
    s = self._safe_read(self.length)
  File "C:\Users\Brice\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 614, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(4659 bytes read, 15043 more expected)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "sitehunter.py", line 91, in <module>
    mp_handler(URLList)
  File "sitehunter.py", line 86, in mp_handler
    p.map(mp_worker, URLList)
  File "C:\Users\Brice\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\Brice\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 608, in get
    raise self._value
http.client.IncompleteRead: IncompleteRead(4659 bytes read, 15043 more expected)

C:\Users\Brice\Desktop\My Site Hunter>
Here's my full source code; I narrow it down to the relevant parts in the next section.
# Start off with imports
import urllib.request
import urllib.error
import socket
import threading
import multiprocessing
import time
import ssl

# Fake a header to get less errors
headers = {'User-agent': 'Mozilla/5.0'}

# Make a class to pass to upon exception errors
class MyException(Exception):
    pass

# Checks for mySQL error responses after putting a string (') query on the end of a URL
def checkforMySQLError(req, URL):
    # gcontext is to bypass a no SSL error from shutting down my program
    gcontext = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
    response = urllib.request.urlopen(req, context=gcontext, timeout=2)
    page_source = response.read()
    page_source_string = page_source.decode(encoding='cp866', errors='ignore')
    # The if statements behind the whole thing. Checks page source for these errors,
    # and returns any that come up positive.
    # I'd like to do my outputting here, if possible.
    if "You have an error in your SQL syntax" in page_source_string:
        print("\t [+] " + URL)
    elif "mysql_fetch" in page_source_string:
        print("\t [+] " + URL)
    elif "mysql_num_rows" in page_source_string:
        print("\t [+] " + URL)
    elif "MySQL Error" in page_source_string:
        print("\t [+] " + URL)
    elif "MySQL_connect()" in page_source_string:
        print("\t [+] " + URL)
    elif "UNION SELECT" in page_source_string:
        print("\t [+] " + URL)
    else:
        print("\t [-] " + URL)

# Attempts to connect to the URL, and passes an error on if it fails.
def tryMethod(req, URL):
    try:
        checkforMySQLError(req, URL)
    except urllib.error.HTTPError as e:
        if e.code == 404:
            print("\t [-] Page not found.")
        if e.code == 400:
            print("\t [+] " + URL)
    except urllib.error.URLError as e:
        print("\t [-] URL Timed Out")
    except socket.timeout as e:
        print("\t [-] URL Timed Out")
    except socket.error as e:
        print("\t [-] Error in URL")

# This is where the magic begins.
def mainMethod(URLList):
    ##### THIS IS THE WORK-AROUND I USED TO FIX THIS ERROR ####
    # URL = urllib.request.urlopen(URLList, timeout=2)
    # Replace any newlines or we get an invalid URL request.
    URL = URLList.replace("\n", "")
    # URLLib doesn't like https, not sure why.
    URL = URL.replace("https://", "http://")
    # Python likes to truncate urls after spaces, so I add a typical %20.
    URL = URL.replace("\s", "%20")
    # The blind sql query that makes the errors occur.
    URL = URL + "'"
    # Requests to connect to the URL and sends it to the tryMethod.
    req = urllib.request.Request(URL)
    tryMethod(req, URL)

# Multi-processing worker
def mp_worker(URLS):
    mainMethod(URLS)

# Multi-processing handler
def mp_handler(URLList):
    p = multiprocessing.Pool(25)
    p.map(mp_worker, URLList)

# The beginning of it all
if __name__ == '__main__':
    URLList = open('sites.txt', 'r')
    mp_handler(URLList)
Here are the important parts of the code, specifically the parts where I read from URLs using urllib:
def mainMethod(URLList):
    ##### THIS IS THE WORK-AROUND I USED TO FIX THIS ERROR ####
    # URL = urllib.request.urlopen(URLList, timeout=2)
    # Replace any newlines or we get an invalid URL request.
    URL = URLList.replace("\n", "")
    # URLLib doesn't like https, not sure why.
    URL = URL.replace("https://", "http://")
    # Python likes to truncate urls after spaces, so I add a typical %20.
    URL = URL.replace("\s", "%20")
    # The blind sql query that makes the errors occur.
    URL = URL + "'"
    # Requests to connect to the URL and sends it to the tryMethod.
    req = urllib.request.Request(URL)
    tryMethod(req, URL)

# Checks for mySQL error responses after putting a string (') query on the end of a URL
def checkforMySQLError(req, URL):
    # gcontext is to bypass a no SSL error from shutting down my program
    gcontext = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
    response = urllib.request.urlopen(req, context=gcontext, timeout=2)
    page_source = response.read()
    page_source_string = page_source.decode(encoding='cp866', errors='ignore')
I originally got past this error by making a request to read from URLList before making any changes to it (that's the commented-out work-around line above). But that fix only got me another error that looks worse and harder to fix, which is why I've included the first error even though I had worked around it.
Here's the new error when I remove the comment from that line of code:
[-] http://www.davis.k12.ut.us/site/Default.aspx?PageType=1&SiteID=6497&ChannelID=6507&DirectoryType=6'
[-] http://www.surreyschools.ca/NewsEvents/Posts/Lists/Posts/ViewPost.aspx?ID=507'
[-] http://plaine-d-aunis.bibli.fr/opac/index.php?lvl=cmspage&pageid=6&id_rubrique=100'
[-] http://www.parlimen.gov.my/index.php?lang=en'
[-] http://www.rvparkhunter.com/state.asp?state=britishcolumbia'
[-] URL Timed Out
[-] http://www.videohelp.com/tools.php?listall=1'
Traceback (most recent call last):
  File "sitehunter.py", line 91, in <module>
    mp_handler(URLList)
  File "sitehunter.py", line 86, in mp_handler
    p.map(mp_worker, URLList)
  File "C:\Users\Brice\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\Brice\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 608, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '<multiprocessing.pool.ExceptionWithTraceback object at 0x0381C790>'. Reason: 'TypeError("cannot serialize '_io.BufferedReader' object",)'

C:\Users\Brice\Desktop\My Site Hunter>
The new error seems worse than the old one, to be honest. That's why I included both. Any information on how to fix this would be greatly appreciated, as I've been stuck trying to fix it for the past few hours.
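For what it's worth, the two tracebacks appear to point at the same root cause: an exception (http.client.IncompleteRead) escapes the worker, and the pool then fails again while pickling that exception back to the parent, apparently because it drags an unpicklable _io.BufferedReader along with it. Here is a hedged sketch of one way to harden tryMethod so no exception crosses the process boundary; it extends the code above, and the printed messages are my own placeholders:

import http.client

def tryMethod(req, URL):
    try:
        checkforMySQLError(req, URL)
    except urllib.error.HTTPError as e:
        if e.code == 404:
            print("\t [-] Page not found.")
        if e.code == 400:
            print("\t [+] " + URL)
    except http.client.IncompleteRead as e:
        # The server sent fewer bytes than its Content-Length promised.
        # e.partial holds whatever did arrive, often enough to scan.
        print("\t [-] Incomplete read on URL")
    except urllib.error.URLError:
        print("\t [-] URL Timed Out")
    except (socket.timeout, socket.error):
        print("\t [-] URL Timed Out")
    except Exception:
        # Last resort: never let an exception object cross the process
        # boundary, since it may not be picklable.
        print("\t [-] Unhandled error on URL")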
I accidentally disconnected my internet connection and received this error below. However, why did this line trigger the error?
self.content += tuple(subreddit_posts)
Or perhaps I should ask, why did the following line not lead to a sys.exit? It seems it should catch all errors:
try:
    subreddit_posts = self.r.get_content(url, limit=10)
except:
    print '*** Could not connect to Reddit.'
    sys.exit()
Does this mean I am inadvertently hitting reddit's network twice?
FYI, praw is a Reddit API client, and get_content() fetches a subreddit's posts/submissions as a generator object.
The error message:
Traceback (most recent call last):
  File "beam.py", line 49, in <module>
    main()
  File "beam.py", line 44, in main
    scan.scanNSFW()
  File "beam.py", line 37, in scanNSFW
    map(self.getSub, self.nsfw)
  File "beam.py", line 26, in getSub
    self.content += tuple(subreddit_posts)
  File "/Library/Python/2.7/site-packages/praw/__init__.py", line 504, in get_co
    page_data = self.request_json(url, params=params)
  File "/Library/Python/2.7/site-packages/praw/decorators.py", line 163, in wrap
    return_value = function(reddit_session, *args, **kwargs)
  File "/Library/Python/2.7/site-packages/praw/__init__.py", line 557, in reques
    retry_on_error=retry_on_error)
  File "/Library/Python/2.7/site-packages/praw/__init__.py", line 399, in _reque
    _raise_response_exceptions(response)
  File "/Library/Python/2.7/site-packages/praw/internal.py", line 178, in _raise
    response.raise_for_status()
  File "/Library/Python/2.7/site-packages/requests/models.py", line 831, in rais
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 503 Server Error: Service Unavailable
The script (it's short):
import sys, os, pprint, praw

class Scanner(object):
    ''' A scanner object. '''
    def __init__(self):
        self.user_agent = 'debian.22990.myapp'
        self.r = praw.Reddit(user_agent=self.user_agent)
        self.nsfw = ('funny', 'nsfw')
        self.nsfw_posters = set()
        self.content = ()

    def getSub(self, subreddit):
        ''' Accepts a subreddit. Connects to subreddit and retrieves content.
        Unpacks generator object containing content into tuple. '''
        url = 'http://www.reddit.com/r/{sub}/'.format(sub=subreddit)
        print 'Scanning:', subreddit
        try:
            subreddit_posts = self.r.get_content(url, limit=10)
        except:
            print '*** Could not connect to Reddit.'
            sys.exit()
        print 'Constructing list.',
        self.content += tuple(subreddit_posts)
        print 'Done.'

    def addNSFWPoster(self, post):
        print 'Parsing author and adding to posters.'
        self.nsfw_posters.add(str(post.author))

    def scanNSFW(self):
        ''' Scans all NSFW subreddits. Makes list of posters. '''
        # Get content from all nsfw subreddits
        print 'Executing map function.'
        map(self.getSub, self.nsfw)
        # Scan content and get authors
        print 'Executing list comprehension.'
        [self.addNSFWPoster(post) for post in self.content]

def main():
    scan = Scanner()
    scan.scanNSFW()
    for i in scan.nsfw_posters:
        print i
    print len(scan.content)

main()
It looks like praw fetches objects lazily, so the HTTP request is actually made when you consume subreddit_posts; that's why it blows up on that line rather than inside the try block.
See: https://praw.readthedocs.org/en/v2.1.20/pages/lazy-loading.html
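Given that, a minimal sketch of a fix is to force the generator inside the try block, so the network error surfaces where you can catch it, and to catch the specific exception instead of everything (requests must be imported for that). This is a drop-in replacement for Scanner.getSub in the script above:

import requests  # praw surfaces HTTP failures as requests exceptions

def getSub(self, subreddit):
    ''' Accepts a subreddit. Connects to subreddit and retrieves content. '''
    url = 'http://www.reddit.com/r/{sub}/'.format(sub=subreddit)
    print 'Scanning:', subreddit
    try:
        # tuple() consumes the generator here, so the HTTP request
        # happens inside the try block, not on the += line below.
        subreddit_posts = tuple(self.r.get_content(url, limit=10))
    except requests.exceptions.HTTPError:
        print '*** Could not connect to Reddit.'
        sys.exit()
    print 'Constructing list.',
    self.content += subreddit_posts
    print 'Done.'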
I want to get data from pages like http://www.site.com/list?a=data&b=data...
I retrieve all those URLs from a page on site.com. When trying to open a link, I get the error: TypeError: expected BaseHandler instance, got <type 'str'>.
My guess is that the URL needs to be "encoded", but how?
Thanks for your help guys!
Edit:
Ok, here is the code. All connections pass through my proxy server, and I try to open the URL found earlier, as described above.
Code:
tileurl = 'http://www.site.com/list?a=data&b=data'
proxy = SocksiPyHandler(socks.PROXY_TYPE_SOCKS4, '192.168.0.190', 12500)
opener = urllib2.build_opener(proxy)
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
infile = opener.open(tileurl)
tile_bin = infile.read()
Traceback (most recent call last):
  File "C:\Users\Jean-michel\Dropbox\Projects\Python Code\Maps Saver\map.py", line 89, in <module>
    opener = urllib2.build_opener(tileurl)
  File "C:\Python27\lib\urllib2.py", line 490, in build_opener
    opener.add_handler(h)
  File "C:\Python27\lib\urllib2.py", line 326, in add_handler
    type(handler))
TypeError: expected BaseHandler instance, got <type 'str'>
tileurl = tile.replace(t1, "") ## Removing the parameters from the url
p = urlparse.parse_qs(t1) ## decoding the parameter
tileparam = urllib.urlencode(p) ## encoding the parameter...
Problem solved!! :)
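For completeness: the traceback also shows build_opener(tileurl) being called with the URL string itself, but build_opener() takes handler instances (like the SocksiPyHandler), and the URL belongs in opener.open(). A sketch combining both fixes, where t1 is assumed to be the raw query string pulled out of the scraped link:

import urllib
import urllib2
import urlparse

tileurl = 'http://www.site.com/list?a=data&b=data'
base, t1 = tileurl.split('?', 1)              # split off the raw query string

p = urlparse.parse_qs(t1)                     # decode the parameters
tileparam = urllib.urlencode(p, doseq=True)   # re-encode them safely

opener = urllib2.build_opener()               # handlers go here, not the URL
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
infile = opener.open(base + '?' + tileparam)  # the URL goes to open()
tile_bin = infile.read()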
I'm trying to submit a POST method form using lxml and I'm getting a TypeError. This is a minimal example that raises this Error:
>>> import lxml.html
>>> page = lxml.html.parse("http://www.webcom.com/html/tutor/forms/start.shtml")
>>> form = page.getroot().forms[0]
>>> form.fields['your_name'] = 'Morphit'
>>> result = lxml.html.parse(lxml.html.submit_form(form))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.3/site-packages/lxml/html/__init__.py", line 887, in submit_form
    return open_http(form.method, url, values)
  File "/usr/lib/python3.3/site-packages/lxml/html/__init__.py", line 907, in open_http_urllib
    return urlopen(url, data)
  File "/usr/lib/python3.3/urllib/request.py", line 160, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.3/urllib/request.py", line 471, in open
    req = meth(req)
  File "/usr/lib/python3.3/urllib/request.py", line 1183, in do_request_
    raise TypeError(msg)
TypeError: POST data should be bytes or an iterable of bytes. It cannot be of type str.
I've found the exact error elsewhere online, but I haven't seen it generated from inside lxml like this. Does anyone know if this is a bug, or expected behaviour and how to work around it?
From https://github.com/lxml/lxml/pull/122/files:
"In python3, urlopen expects a byte stream for the POST data. this patch encodes the data in utf-8 before transmission." In src/lxml/html/__init__.py, change line 918,
data = urlencode(values)
to
data = urlencode(values).encode('utf-8')
It is Python 3, so you should write
form.fields['your_name'] = b'Morphit'
or
form.fields['your_name'] = 'Morphit'.encode('utf-8')
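Alternatively, you can avoid patching the installed lxml sources altogether: submit_form accepts an open_http callback, so you can supply a copy of lxml's default opener with the one-line encode fix from the pull request applied: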
def myopen_http(method, url, values):
    if not url:
        raise ValueError("cannot submit, no URL provided")
    ## FIXME: should test that it's not a relative URL or something
    try:
        from urllib import urlencode, urlopen
    except ImportError:  # Python 3
        from urllib.request import urlopen
        from urllib.parse import urlencode
    if method == 'GET':
        if '?' in url:
            url += '&'
        else:
            url += '?'
        url += urlencode(values)
        data = None
    else:
        data = urlencode(values).encode('utf-8')
    return urlopen(url, data)
result = lxml.html.parse(lxml.html.submit_form(form, open_http=myopen_http))