How to pass a variable to Scrapy Spider - python

I have a list in my spider class that I need to initialize. This is what the code looks like:
from datetime import datetime
import pickle

from scrapy.crawler import CrawlerProcess
from scrapy.spiders import SitemapSpider

class Myspider(SitemapSpider):
    name = 'spidername'
    sitemap_urls = [
        'https://www.arabam.com/sitemap/otomobil_13.xml']
    sitemap_rules = [
        ('/otomobil/', 'parse'),
    ]
    custom_settings = {'FEED_FORMAT': 'csv',
                       'FEED_URI': "arabam_" + str(datetime.today().strftime('%d%m%y')) + '.csv'}
    crawled = []
    new_links = 0

    def parse(self, response):
        if self.new_links > 3:
            with open("URLs", "wb") as f:
                pickle.dump(self.crawled, f)
            self.new_links = 0
        for td in response.xpath("/html/body/div[3]/div[6]/div[4]/div/div[2]/table/tbody/tr/td[4]/div/a"):
            # 'link' is not defined in this excerpt; its extraction from td is elided
            if link[0] not in self.crawled:
                self.crawled.append(link[0])

#################################some code
process = CrawlerProcess({
})
Myspider.crawled = []
Myspider.crawled.append("hi")
try:
    with open("URLs", "rb") as openfile:
        while True:
            try:
                Myspider.crawled = pickle.load(openfile)
            except EOFError:
                break
except:
    with open("URLs", "wb") as f:
        pickle.dump("", f)
print(Myspider.crawled)
process.crawl(Myspider, Myspider.crawled)
process.start()  # the script will block here until the crawling is finished
It keeps throwing the following exception:
Traceback (most recent call last):
  File "C:\Users\fatima.arshad\AppData\Local\Continuum\anaconda2\envs\web_scraping\lib\site-packages\twisted\internet\defer.py", line 151, in maybeDeferred
    result = f(*args, **kw)
  File "C:\Users\fatima.arshad\AppData\Local\Continuum\anaconda2\envs\web_scraping\lib\site-packages\pydispatch\robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "C:\Users\fatima.arshad\AppData\Local\Continuum\anaconda2\envs\web_scraping\lib\site-packages\scrapy\extensions\feedexport.py", line 262, in item_scraped
    slot = self.slot
AttributeError: 'FeedExporter' object has no attribute 'slot'
According to some resources, it is caused by this underlying exception:
Traceback (most recent call last):
  File "C:\Users\fatima.arshad\AppData\Local\Continuum\anaconda2\envs\web_scraping\lib\site-packages\twisted\internet\defer.py", line 151, in maybeDeferred
    result = f(*args, **kw)
  File "C:\Users\fatima.arshad\AppData\Local\Continuum\anaconda2\envs\web_scraping\lib\site-packages\pydispatch\robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "C:\Users\fatima.arshad\AppData\Local\Continuum\anaconda2\envs\web_scraping\lib\site-packages\scrapy\extensions\feedexport.py", line 232, in open_spider
    uri = self.urifmt % self._get_uri_params(spider)
  File "C:\Users\fatima.arshad\AppData\Local\Continuum\anaconda2\envs\web_scraping\lib\site-packages\scrapy\extensions\feedexport.py", line 313, in _get_uri_params
    params[k] = getattr(spider, k)
  File "C:\Users\fatima.arshad\AppData\Local\Continuum\anaconda2\envs\web_scraping\lib\site-packages\scrapy\spiders\__init__.py", line 36, in logger
    logger = logging.getLogger(self.name)
  File "C:\Users\fatima.arshad\AppData\Local\Continuum\anaconda2\envs\web_scraping\lib\logging\__init__.py", line 1845, in getLogger
    return Logger.manager.getLogger(name)
  File "C:\Users\fatima.arshad\AppData\Local\Continuum\anaconda2\envs\web_scraping\lib\logging\__init__.py", line 1174, in getLogger
    raise TypeError('A logger name must be a string')
TypeError: A logger name must be a string
How do I pass the list to the spider, or is there a way to initialize this list only once for the Scrapy spider?
The list contains all the URLs that have been crawled. The list is pickled. When the code starts, it initializes this list and crawls a link only if it is not already present in the list.

You need to pass the list of URLs using the spider's attribute name, which is crawled in your case.
According to the docs, if you don't override the __init__ method of the spider, all arguments passed to the spider class are mapped to spider attributes. So in order to override the crawled attribute, you need to send the exact argument name.
Something like this:
process = CrawlerProcess()
crawled_urls = []
try:
    with open("URLs", "rb") as openfile:
        while True:
            try:
                crawled_urls = pickle.load(openfile)
            except EOFError:
                break
except:
    with open("URLs", "wb") as f:
        pickle.dump("", f)
print(crawled_urls)
process.crawl(Myspider, crawled=crawled_urls)
process.start()  # the script will block here until the crawling is finished
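If you prefer to make the initialization explicit, you can also override __init__ yourself. A minimal sketch, assuming the rest of the spider stays as in the question (the super() call is needed so SitemapSpider's own setup still runs):

class Myspider(SitemapSpider):
    # ... name, sitemap_urls, sitemap_rules, custom_settings as above ...

    def __init__(self, crawled=None, *args, **kwargs):
        super(Myspider, self).__init__(*args, **kwargs)
        # fall back to a fresh list when no argument is passed in
        self.crawled = crawled if crawled is not None else []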

Related

How to let twisted print the exact positions of unhandled errors

I am writing a simple server program. There will inevitably be typos and other errors in fresh code, and usually the Python interpreter will print a ValueError/AttributeError traceback and exit. The traceback points to the exact position of the error. Under the Twisted framework, however, these errors are not printed, as in the following example:
from twisted.internet import reactor, protocol, task
#from twisted.internet.defer import setDebugging
#setDebugging(True)

class MyProtocol(protocol.Protocol):
    def dataReceived(self, data):
        try:
            set_position(int(data))
        except ValueError:
            pass

    def connectionMade(self):
        self.factory.clientConnectionMade(self)

    def connectionLost(self, reason):
        self.factory.clientConnectionLost(self)

class MyFactory(protocol.Factory):
    protocol = MyProtocol

    def __init__(self):
        self.clients = []
        self.lc = task.LoopingCall(self.announce)
        self.lc.start(1)

    def announce(self):
        pos = A_GREAT_TYPO_HERE()
        for client in self.clients:
            client.transport.write("Position is {0}\n".format(pos).encode('utf-8'))

    def clientConnectionMade(self, client):
        self.clients.append(client)

    def clientConnectionLost(self, client):
        self.clients.remove(client)

def get_position():
    return position[0]

def set_position(pos):
    position[0] = pos

def main():
    global position
    position = [0]
    myfactory = MyFactory()
    reactor.listenTCP(5362, myfactory)
    reactor.run()

if __name__ == "__main__":
    main()
A_GREAT_TYPO_HERE() in MyFactory.announce is meant to be get_position(), but it is a typo.
When the server is run, the terminal only outputs
Unhandled error in Deferred:
and nothing else. Even if I enable Deferred debugging (uncomment the 2nd and 3rd lines), the terminal outputs:
Unhandled error in Deferred:
(debug: C: Deferred was created:
C: File "nodes/test.py", line 48, in <module>
C: main()
C: File "nodes/test.py", line 43, in main
C: myfactory = MyFactory()
C: File "nodes/test.py", line 21, in __init__
C: self.lc.start(1)
C: File "/home/sgsdxzy/anaconda3/lib/python3.6/site-packages/twisted/internet/task.py", line 189, in start
C: deferred = self._deferred = defer.Deferred()
I: First Invoker was:
I: File "nodes/test.py", line 48, in <module>
I: main()
I: File "nodes/test.py", line 43, in main
I: myfactory = MyFactory()
I: File "nodes/test.py", line 21, in __init__
I: self.lc.start(1)
I: File "/home/sgsdxzy/anaconda3/lib/python3.6/site-packages/twisted/internet/task.py", line 194, in start
I: self()
I: File "/home/sgsdxzy/anaconda3/lib/python3.6/site-packages/twisted/internet/task.py", line 241, in __call__
I: d.addErrback(eb)
I: File "/home/sgsdxzy/anaconda3/lib/python3.6/site-packages/twisted/internet/defer.py", line 332, in addErrback
I: errbackKeywords=kw)
I: File "/home/sgsdxzy/anaconda3/lib/python3.6/site-packages/twisted/internet/defer.py", line 310, in addCallbacks
I: self._runCallbacks()
I: File "/home/sgsdxzy/anaconda3/lib/python3.6/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
I: current.result = callback(current.result, *args, **kw)
I: File "/home/sgsdxzy/anaconda3/lib/python3.6/site-packages/twisted/internet/task.py", line 236, in eb
I: d.errback(failure)
)
It points to the error as close as self.lc.start(1), but not to A_GREAT_TYPO_HERE(). How can I debug my program so that tracebacks point to the actual error?
The "C" and "I" lines you're seeing are due to the fact that you've enabled Deferred debugging. The "C" lines give you the stack where the Deferred was created. The "I" lines give you the stack where the Deferred was "invoked" (its callback or errback method was called).
Neither of those is what you're looking for, it seems. If you want to see the stack associated with the Failure the Deferred has been fired with, the most straightforward solution is to make sure the Failure gets logged (and that you have a log observer so that you can actually see that log event).
You should add this to your main:
from sys import stdout
from twisted.logger import globalLogBeginner, textFileLogObserver
globalLogBeginner.beginLoggingTo([textFileLogObserver(stdout)])
This directs the log stream to stdout as text. It is most likely sufficient to get you the information you want. However, to be really safe, you also want to explicitly log failures instead of relying on the garbage collector to do it for you. So you also want to change:
self.lc.start(1)
To:
# Module scope
from twisted.logger import Logger
logger = Logger()
...
# in __init__
d = self.lc.start(1)
d.addErrback(lambda f: logger.failure("Loop thing problem", f))
(Also, you may want to consider taking this code out of __init__ and putting it in startFactory instead, and consider not using a global reactor but passing it around as a parameter; see the sketch after the sample output below.)
This will give you output like:
2017-04-25T06:53:14-0400 [__main__.MyFactory#critical] Foo
Traceback (most recent call last):
  File "debugging2.py", line 52, in main
    myfactory = MyFactory()
  File "debugging2.py", line 28, in __init__
    d = self.lc.start(1)
  File "/tmp/debugging/local/lib/python2.7/site-packages/twisted/internet/task.py", line 194, in start
    self()
  File "/tmp/debugging/local/lib/python2.7/site-packages/twisted/internet/task.py", line 239, in __call__
    d = defer.maybeDeferred(self.f, *self.a, **self.kw)
--- <exception caught here> ---
  File "/tmp/debugging/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "debugging2.py", line 32, in announce
    pos = A_GREAT_TYPO_HERE()
exceptions.NameError: global name 'A_GREAT_TYPO_HERE' is not defined
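A minimal sketch of the startFactory suggestion, using the same names as the question (startFactory is the standard twisted.internet.protocol.Factory hook that runs when the factory is first put to use; logger is the module-scope Logger from above):

class MyFactory(protocol.Factory):
    protocol = MyProtocol

    def __init__(self):
        self.clients = []

    def startFactory(self):
        # start the loop only once the factory is actually in use,
        # and log any failure explicitly instead of relying on the GC
        self.lc = task.LoopingCall(self.announce)
        d = self.lc.start(1)
        d.addErrback(lambda f: logger.failure("Loop thing problem", f))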

Scrapy: Item Loader and KeyError even when Key is defined

Intention / expected behaviour
Return the text of the links from the page https://www.bezrealitky.cz/vypis/nabidka-prodej/byt/praha, in CSV format and in the shell.
Error
I get a KeyError: 'title', even though I have defined the key in the item.py itemloader.
Full Traceback
Traceback (most recent call last):
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
    yield next(it)
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\Users\phili\Documents\Python Scripts\Scrapy Spiders\bezrealitky\bezrealitky\spiders\bezrealitky_spider.py", line 33, in parse
    yield loader.load_item()
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\loader\__init__.py", line 115, in load_item
    value = self.get_output_value(field_name)
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\loader\__init__.py", line 122, in get_output_value
    proc = self.get_output_processor(field_name)
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\loader\__init__.py", line 144, in get_output_processor
    self.default_output_processor)
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\loader\__init__.py", line 154, in _get_item_field_attr
    value = self.item.fields[field_name].get(key, default)
KeyError: 'title'
Spider.py
def parse(self, response):
    for records in response.xpath('//*[starts-with(@class,"record")]'):
        loader = BaseItemLoader(selector=records)
        loader.add_xpath('title', './/div[@class="details"]/h2/a[@href]/text()')
        yield loader.load_item()
Item.py - Itemloader
class BaseItemLoader(ItemLoader):
    title_in = MapCompose(unidecode)
Conclusion
I am a bit at a loss, as I think I followed the Scrapy manual and defined the item loader and the key via "title_in", but when I yield the value I get the KeyError. I checked in the shell that the XPath returns the text I want, so at least that part is working. Hoping to get some help!
Even if you use an ItemLoader, you should define the Item class first and then pass it to the item loader, either by defining it as the loader's property:

class CustomItemLoader(ItemLoader):
    default_item_class = MyItem

or by passing an instance to the loader's constructor:

l = CustomItemLoader(item=MyItem())

Otherwise the item loader knows nothing about the item and its fields.
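For the question's case, a minimal sketch of the missing piece (MyItem is a hypothetical name; what matters is that the field is declared):

import scrapy

class MyItem(scrapy.Item):
    # declaring the field is what makes 'title' a valid key for the loader
    title = scrapy.Field()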

Dynamically setting scrapy request call back

I'm working with Scrapy. I want to rotate proxies on a per-request basis and get a proxy from an API I have that returns a single proxy. My plan is to make a request to the API, get a proxy, then use it to set the proxy, based on:
http://stackoverflow.com/questions/39430454/making-request-to-api-from-within-scrapy-function
I have the following:
class ContactSpider(Spider):
    name = "contact"

    def parse(self, response):
        ....
        PR = Request(
            'my_api',
            headers=self.headers,
            meta={'newrequest': Request(url_to_scrape, headers=self.headers),},
            callback=self.parse_PR
        )
        yield PR

    def parse_PR(self, response):
        newrequest = response.meta['newrequest']
        proxy_data = response.body
        newrequest.meta['proxy'] = 'http://' + proxy_data
        newrequest.replace(url='http://ipinfo.io/ip')  #TESTING
        newrequest.replace(callback=self.form_output)  #TESTING
        yield newrequest

    def form_output(self, response):
        open_in_browser(response)
but I'm getting:
Traceback (most recent call last):
  File "C:\twisted\internet\defer.py", line 1126, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "C:\twisted\python\failure.py", line 389, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "C:\scrapy\core\downloader\middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "C:\scrapy\utils\defer.py", line 45, in mustbe_deferred
    result = f(*args, **kw)
  File "C:\scrapy\core\downloader\handlers\__init__.py", line 65, in download_request
    return handler.download_request(request, spider)
  File "C:\scrapy\core\downloader\handlers\http11.py", line 60, in download_request
    return agent.download_request(request)
  File "C:\scrapy\core\downloader\handlers\http11.py", line 255, in download_request
    agent = self._get_agent(request, timeout)
  File "C:\scrapy\core\downloader\handlers\http11.py", line 235, in _get_agent
    _, _, proxyHost, proxyPort, proxyParams = _parse(proxy)
  File "C:\scrapy\core\downloader\webclient.py", line 37, in _parse
    return _parsed_url_args(parsed)
  File "C:\scrapy\core\downloader\webclient.py", line 20, in _parsed_url_args
    host = b(parsed.hostname)
  File "C:\scrapy\core\downloader\webclient.py", line 17, in <lambda>
    b = lambda s: to_bytes(s, encoding='ascii')
  File "C:\scrapy\utils\python.py", line 117, in to_bytes
    'object, got %s' % type(text).__name__)
TypeError: to_bytes must receive a unicode, str or bytes object, got NoneType
What am I doing wrong?
The stack trace suggests Scrapy has encountered a request object whose url is None, where a string is expected.
These two lines in your code:
newrequest.replace(url='http://ipinfo.io/ip')  #TESTING
newrequest.replace(callback=self.form_output)  #TESTING
will not work as expected, since the method Request.replace returns a new instance instead of modifying the original request in place.
You would need something like this:
newrequest = newrequest.replace(url='http://ipinfo.io/ip')  #TESTING
newrequest = newrequest.replace(callback=self.form_output)  #TESTING
or simply:
newrequest = newrequest.replace(
    url='http://ipinfo.io/ip',
    callback=self.form_output
)
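Putting the fix back into the question's callback, a sketch with the same names as above (everything except the replace() call is unchanged from the question):

def parse_PR(self, response):
    newrequest = response.meta['newrequest']
    proxy_data = response.body
    newrequest.meta['proxy'] = 'http://' + proxy_data
    # rebind the result: replace() returns a new Request object
    newrequest = newrequest.replace(
        url='http://ipinfo.io/ip',  #TESTING
        callback=self.form_output,  #TESTING
    )
    yield newrequest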

Getting results from query with alchimia library

I'm trying to use alchimia to get an asynchronous API for the DB. I'm trying to make a simple request to the DB, like this:
def authorization(self, data):
    """
    Checking user with DB
    """
    def __gotResult(user):
        yield engine.execute(sqlalchemy.select([Users]).where(Users.name == user))

    result = __gotResult(data['user'])
    log.msg("[AUTH] User=%s trying to auth..." % data['user'])
    data, result_msg = commands.AUTH(result, data)
    log.msg(result_msg)
    return data
And I can't understand what I am doing wrong. Maybe the issue is in the engine options (where reactor=[])?
Source code:
import sys
from json import dumps, loads

import sqlalchemy
from twisted.internet import reactor, ssl
from twisted.python import log, logfile
from twisted.web.server import Site
from twisted.web.static import File
from autobahn.twisted.websocket import WebSocketServerFactory, WebSocketServerProtocol, listenWS

import commands
from db.tables import Users
from alchimia import TWISTED_STRATEGY

log_file = logfile.LogFile("service.log", ".")
log.startLogging(log_file)

engine = sqlalchemy.create_engine('postgresql://test:test@localhost/testdb', pool_size=20, max_overflow=0,
                                  strategy=TWISTED_STRATEGY, reactor=[])

class DFSServerProtocol(WebSocketServerProtocol):
    commands = commands.commands_user

    def __init__(self):
        self.commands_handlers = self.__initHandlersUser()

    def __initHandlersUser(self):
        handlers = commands.commands_handlers_server
        handlers['AUTH'] = self.authorization
        handlers['READ'] = None
        handlers['WRTE'] = None
        handlers['DELT'] = None
        handlers['RNME'] = None
        handlers['SYNC'] = None
        handlers['LIST'] = None
        return handlers

    def authorization(self, data):
        """
        Checking user with DB
        """
        def __gotResult(user):
            yield engine.execute(sqlalchemy.select([Users]).where(Users.name == data['user']))

        result = __gotResult(data['user'])
        log.msg("[AUTH] User=%s trying to auth..." % data['user'])
        data, result_msg = commands.AUTH(result, data)
        log.msg(result_msg)
        return data

    def onMessage(self, payload, isBinary):
        json_data = loads(payload)
        json_auth = json_data['auth']
        json_cmd = json_data['cmd']
        if json_auth == False:
            if json_cmd == 'AUTH':
                json_data = self.commands_handlers['AUTH'](json_data)
        # for authorized users
        else:
            if json_cmd in commands.commands_user.keys():
                if self.commands_handlers[json_cmd] is not None:
                    json_data = self.commands_handlers[json_cmd](json_data)
                else:
                    json_data['error'] = '%s command is not already realized...' % json_cmd
            else:
                json_data['auth'] = False
                json_data['error'] = 'This command is not supported on server...'
        response = dumps(json_data)
        self.sendMessage(str(response))

if __name__ == '__main__':
    if len(sys.argv) > 1 and sys.argv[1] == 'debug':
        log.startLogging(sys.stdout)
        debug = True
    else:
        debug = False

    contextFactory = ssl.DefaultOpenSSLContextFactory('keys/server.key', 'keys/server.crt')

    factory = WebSocketServerFactory("wss://localhost:9000", debug=debug, debugCodePaths=debug)
    factory.protocol = DFSServerProtocol
    factory.setProtocolOptions(allowHixie76=True)
    listenWS(factory, contextFactory)

    webdir = File("./web/")
    webdir.contentTypes['.crt'] = 'application/x-x509-ca-cert'
    web = Site(webdir)
    reactor.listenSSL(8080, web, contextFactory)
    #reactor.listenTCP(8080, web)

    reactor.run()
Traceback:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/log.py", line 88, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/log.py", line 73, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/context.py", line 81, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/posixbase.py", line 614, in _doReadOrWrite
    why = selectable.doRead()
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/tcp.py", line 215, in doRead
    return self._dataReceived(data)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/tcp.py", line 221, in _dataReceived
    rval = self.protocol.dataReceived(data)
  File "/usr/local/lib/python2.7/dist-packages/twisted/protocols/tls.py", line 419, in dataReceived
    self._flushReceiveBIO()
  File "/usr/local/lib/python2.7/dist-packages/twisted/protocols/tls.py", line 389, in _flushReceiveBIO
    ProtocolWrapper.dataReceived(self, bytes)
  File "/usr/local/lib/python2.7/dist-packages/twisted/protocols/policies.py", line 120, in dataReceived
    self.wrappedProtocol.dataReceived(data)
  File "/usr/local/lib/python2.7/dist-packages/autobahn/twisted/websocket.py", line 78, in dataReceived
    self._dataReceived(data)
  File "/usr/local/lib/python2.7/dist-packages/autobahn/websocket/protocol.py", line 1270, in _dataReceived
    self.consumeData()
  File "/usr/local/lib/python2.7/dist-packages/autobahn/websocket/protocol.py", line 1286, in consumeData
    while self.processData() and self.state != WebSocketProtocol.STATE_CLOSED:
  File "/usr/local/lib/python2.7/dist-packages/autobahn/websocket/protocol.py", line 1445, in processData
    return self.processDataHybi()
  File "/usr/local/lib/python2.7/dist-packages/autobahn/websocket/protocol.py", line 1758, in processDataHybi
    fr = self.onFrameEnd()
  File "/usr/local/lib/python2.7/dist-packages/autobahn/websocket/protocol.py", line 1887, in onFrameEnd
    self._onMessageEnd()
  File "/usr/local/lib/python2.7/dist-packages/autobahn/twisted/websocket.py", line 107, in _onMessageEnd
    self.onMessageEnd()
  File "/usr/local/lib/python2.7/dist-packages/autobahn/websocket/protocol.py", line 734, in onMessageEnd
    self._onMessage(payload, self.message_is_binary)
  File "/usr/local/lib/python2.7/dist-packages/autobahn/twisted/websocket.py", line 110, in _onMessage
    self.onMessage(payload, isBinary)
  File "server.py", line 84, in onMessage
    json_data = self.commands_handlers['AUTH'](json_data)
  File "server.py", line 68, in authorization
    data, result_msg = commands.AUTH(result, data)
  File "/home/relrin/code/Helenae/helenae/commands.py", line 68, in AUTH
    if result['name'] == data['user']:
exceptions.TypeError: 'generator' object has no attribute '__getitem__'
I think you are missing an @inlineCallbacks around __gotResult(). That might not help you quite enough, though, since a single-statement generator wrapped with inlineCallbacks is sort of pointless. You should get used to working with explicit Deferred handling anyway. Let's pull this apart:
def authorization(self, data):
    """
    Checking user with DB
    """
    # engine.execute already gives us a deferred; we'll grab on to that.
    user = data['user']
    result_d = engine.execute(sqlalchemy.select([Users]).where(Users.name == user))

    # we don't have the result in authorization;
    # we need to wrap any code that works with its result in a callback.
    def result_cb(result):
        # use a new name here: rebinding 'data' inside the nested function
        # would shadow the outer 'data' and raise UnboundLocalError
        new_data, result_msg = commands.AUTH(result, data)
        return new_data

    result_d.addCallback(result_cb)

    # we want to pass the (asynchronous) result out, but it's hiding in our deferred,
    # so we return *that* instead; callers need to add more callbacks to it.
    return result_d
If you insist, we can squish this down into an inlineCallbacks form:

from twisted.internet.defer import inlineCallbacks, returnValue

@inlineCallbacks
def authorization(self, data):
    user = data['user']
    result = yield engine.execute(sqlalchemy.select([Users]).where(Users.name == user))
    data, result_msg = commands.AUTH(result, data)
    returnValue(data)
As before, though, authorization() is asynchronous, and must be, since engine.execute is asynchronous. To use it, you must attach a callback to the Deferred it returns (although the caller may also yield it if the caller is itself wrapped in inlineCallbacks).
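For completeness, a caller-side sketch under the same assumptions; note that onMessage would also have to be adapted, since authorization now returns a Deferred rather than the data itself:

# hypothetical call site inside onMessage
d = self.authorization(json_data)
d.addCallback(lambda result_data: self.sendMessage(dumps(result_data)))
d.addErrback(log.err)  # make sure failures get logged instead of swallowed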

Handling exceptions in while loop - Python

Here is my code (an almost-full version for @cdhowie :)):
def getResult(method, argument=None):
    result = None
    while True:
        print('### loop')
        try:
            print('### try hard...')
            if argument:
                result = method(argument)
            else:
                result = method()
            break
        except Exception as e:
            print('### GithubException')
            if 403 == e.status:
                print('Warning: ' + str(e.data))
                print('I will try again after 10 minutes...')
            else:
                raise e
    return result

def getUsernames(locations, gh):
    usernames = set()
    for location in locations:
        print location
        result = getResult(gh.legacy_search_users, location)
        for user in result:
            usernames.add(user.login)
            print user.login,
    return usernames

# "main.py"
gh = Github()
locations = ['Washington', 'Berlin']
# "main.py", line 12 is below
usernames = getUsernames(locations, gh)
The problem is that the exception is raised, but I can't handle it. Here is the output:
### loop
### try hard...
Traceback (most recent call last):
  File "main.py", line 12, in <module>
    usernames = getUsernames(locations, gh)
  File "/home/ciembor/projekty/github-rank/functions.py", line 39, in getUsernames
    for user in result:
  File "/usr/lib/python2.7/site-packages/PyGithub-1.8.0-py2.7.egg/github/PaginatedList.py", line 33, in __iter__
    newElements = self.__grow()
  File "/usr/lib/python2.7/site-packages/PyGithub-1.8.0-py2.7.egg/github/PaginatedList.py", line 45, in __grow
    newElements = self._fetchNextPage()
  File "/usr/lib/python2.7/site-packages/PyGithub-1.8.0-py2.7.egg/github/Legacy.py", line 37, in _fetchNextPage
    return self.get_page(page)
  File "/usr/lib/python2.7/site-packages/PyGithub-1.8.0-py2.7.egg/github/Legacy.py", line 48, in get_page
    None
  File "/usr/lib/python2.7/site-packages/PyGithub-1.8.0-py2.7.egg/github/Requester.py", line 69, in requestAndCheck
    raise GithubException.GithubException(status, output)
github.GithubException.GithubException: 403 {u'message': u'API Rate Limit Exceeded for 11.11.11.11'}
Why doesn't it print ### GithubException?
Take a close look at the stack trace in the exception:
Traceback (most recent call last):
  File "main.py", line 12, in <module>
    usernames = getUsernames(locations, gh)
  File "/home/ciembor/projekty/github-rank/functions.py", line 39, in getUsernames
    for user in result:
  File "/usr/lib/python2.7/site-packages/PyGithub-1.8.0-py2.7.egg/github/PaginatedList.py", line 33, in __iter__
    newElements = self.__grow()
  ...
The exception is being thrown from code called by the line for user in result:, after getResult has finished executing. This means the API you're using relies on lazy evaluation, so the actual API request doesn't happen when you expect it to.
In order to catch and handle this exception, you'll need to wrap the code inside the getUsernames function in a try/except handler, as sketched below.
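A minimal sketch of that fix, reusing the names from the question; the import path matches the traceback above, and the 600-second sleep is an assumption standing in for "try again after 10 minutes":

import time
from github.GithubException import GithubException

def getUsernames(locations, gh):
    usernames = set()
    for location in locations:
        result = getResult(gh.legacy_search_users, location)
        while True:
            try:
                # pagination happens lazily, so the 403 is raised here
                for user in result:
                    usernames.add(user.login)
                break
            except GithubException as e:
                if e.status == 403:
                    time.sleep(600)  # wait out the rate limit, then retry
                else:
                    raise
    return usernames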
