I got an error message while scraping a profile. I assume I'm using my proxy wrong, but what is the main error here? Can you help?
2017-06-15 21:35:17 [scrapy.proxies] INFO: Removing failed proxy , 12 proxies left
2017-06-15 21:35:17 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.linkedin.com/in/jiajie-jacky-fan-80920083/>
Traceback (most recent call last):
  File "/Users/jiajiefan/data_mining/lib/python2.7/site-packages/twisted/internet/defer.py", line 1299, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/Users/jiajiefan/data_mining/lib/python2.7/site-packages/twisted/python/failure.py", line 393, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/Users/jiajiefan/data_mining/lib/python2.7/site-packages/Scrapy-1.4.0-py2.7.egg/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "/Users/jiajiefan/data_mining/lib/python2.7/site-packages/Scrapy-1.4.0-py2.7.egg/scrapy/utils/defer.py", line 45, in mustbe_deferred
    result = f(*args, **kw)
  File "/Users/jiajiefan/data_mining/lib/python2.7/site-packages/Scrapy-1.4.0-py2.7.egg/scrapy/core/downloader/handlers/__init__.py", line 65, in download_request
    return handler.download_request(request, spider)
  File "/Users/jiajiefan/data_mining/lib/python2.7/site-packages/Scrapy-1.4.0-py2.7.egg/scrapy/core/downloader/handlers/http11.py", line 63, in download_request
    return agent.download_request(request)
  File "/Users/jiajiefan/data_mining/lib/python2.7/site-packages/Scrapy-1.4.0-py2.7.egg/scrapy/core/downloader/handlers/http11.py", line 272, in download_request
    agent = self._get_agent(request, timeout)
  File "/Users/jiajiefan/data_mining/lib/python2.7/site-packages/Scrapy-1.4.0-py2.7.egg/scrapy/core/downloader/handlers/http11.py", line 252, in _get_agent
    _, _, proxyHost, proxyPort, proxyParams = _parse(proxy)
  File "/Users/jiajiefan/data_mining/lib/python2.7/site-packages/Scrapy-1.4.0-py2.7.egg/scrapy/core/downloader/webclient.py", line 37, in _parse
    return _parsed_url_args(parsed)
  File "/Users/jiajiefan/data_mining/lib/python2.7/site-packages/Scrapy-1.4.0-py2.7.egg/scrapy/core/downloader/webclient.py", line 21, in _parsed_url_args
    port = parsed.port
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urlparse.py", line 113, in port
    port = int(port, 10)
ValueError: invalid literal for int() with base 10: '178.32.255.199'
The proxy address should include the scheme, e.g. 'http://':
rq.meta['proxy'] = 'http://127.0.0.1:8123'
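For illustration, a minimal sketch of a request that carries the proxy in its meta with the scheme included (the spider name and the proxy port below are placeholders, not taken from the question):

import scrapy

class ProfileSpider(scrapy.Spider):
    name = 'profile'  # placeholder name

    def start_requests(self):
        # The scheme ('http://') must be part of the proxy value; without it,
        # Scrapy's _parse()/urlparse cannot split host and port and fails with
        # the ValueError shown above.
        yield scrapy.Request(
            'https://www.linkedin.com/in/jiajie-jacky-fan-80920083/',
            meta={'proxy': 'http://178.32.255.199:8080'},  # port is a placeholder
            callback=self.parse,
        )

    def parse(self, response):
        self.logger.info('Downloaded %s', response.url)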
Related
I have built a Scrapy project which worked fine. Then, in the process of turning it into an .exe file, I apparently broke something, because it now gives the following error when run from the IDE (PyCharm):
2023-02-02 20:41:14 [scrapy.core.scraper] ERROR: Spider error processing <GET https://ra.co/dj/Antigone/past-events> (referer: None)
Traceback (most recent call last):
File "C:\Users\axelz\Programmeren\RA_scrapy\venv\lib\site-packages\scrapy\utils\defer.py", line 240, in iter_errback
yield next(it)
File "C:\Users\axelz\Programmeren\RA_scrapy\venv\lib\site-packages\scrapy\utils\python.py", line 338, in __next__
return next(self.data)
File "C:\Users\axelz\Programmeren\RA_scrapy\venv\lib\site-packages\scrapy\utils\python.py", line 338, in __next__
return next(self.data)
File "C:\Users\axelz\Programmeren\RA_scrapy\venv\lib\site-packages\scrapy\core\spidermw.py", line 79, in process_sync
for r in iterable:
File "C:\Users\axelz\Programmeren\RA_scrapy\venv\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in <genexpr>
return (r for r in result or () if self._filter(r, spider))
File "C:\Users\axelz\Programmeren\RA_scrapy\venv\lib\site-packages\scrapy\core\spidermw.py", line 79, in process_sync
for r in iterable:
File "C:\Users\axelz\Programmeren\RA_scrapy\venv\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 336, in <genexpr>
return (self._set_referer(r, response) for r in result or ())
File "C:\Users\axelz\Programmeren\RA_scrapy\venv\lib\site-packages\scrapy\core\spidermw.py", line 79, in process_sync
for r in iterable:
File "C:\Users\axelz\Programmeren\RA_scrapy\venv\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 28, in <genexpr>
return (r for r in result or () if self._filter(r, spider))
File "C:\Users\axelz\Programmeren\RA_scrapy\venv\lib\site-packages\scrapy\core\spidermw.py", line 79, in process_sync
for r in iterable:
File "C:\Users\axelz\Programmeren\RA_scrapy\venv\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 32, in <genexpr>
return (r for r in result or () if self._filter(r, response, spider))
File "C:\Users\axelz\Programmeren\RA_scrapy\venv\lib\site-packages\scrapy\core\spidermw.py", line 79, in process_sync
for r in iterable:
File "C:\Users\axelz\Programmeren\RA_scrapy\rascraper\rascraper\spiders\spiderone.py", line 41, in parse
for post in response.css(''):
File "C:\Users\axelz\Programmeren\RA_scrapy\venv\lib\site-packages\scrapy\http\response\text.py", line 141, in css
return self.selector.css(query)
File "C:\Users\axelz\Programmeren\RA_scrapy\venv\lib\site-packages\parsel\selector.py", line 456, in css
return self.xpath(self._css2xpath(query))
File "C:\Users\axelz\Programmeren\RA_scrapy\venv\lib\site-packages\parsel\selector.py", line 459, in _css2xpath
return self._csstranslator.css_to_xpath(query)
File "C:\Users\axelz\Programmeren\RA_scrapy\venv\lib\site-packages\parsel\csstranslator.py", line 104, in css_to_xpath
return super().css_to_xpath(css, prefix)
File "C:\Users\axelz\Programmeren\RA_scrapy\venv\lib\site-packages\cssselect\xpath.py", line 224, in css_to_xpath
for selector in parse(css)
File "C:\Users\axelz\Programmeren\RA_scrapy\venv\lib\site-packages\cssselect\parser.py", line 543, in parse
return list(parse_selector_group(stream))
File "C:\Users\axelz\Programmeren\RA_scrapy\venv\lib\site-packages\cssselect\parser.py", line 558, in parse_selector_group
yield Selector(*parse_selector(stream))
File "C:\Users\axelz\Programmeren\RA_scrapy\venv\lib\site-packages\cssselect\parser.py", line 567, in parse_selector
result, pseudo_element = parse_simple_selector(stream)
File "C:\Users\axelz\Programmeren\RA_scrapy\venv\lib\site-packages\cssselect\parser.py", line 702, in parse_simple_selector
raise SelectorSyntaxError("Expected selector, got %s" % (stream.peek(),))
cssselect.parser.SelectorSyntaxError: Expected selector, got <EOF at 0>
2023-02-02 20:41:14 [scrapy.core.engine] INFO: Closing spider (finished)
I have tried really hard, but have no idea what the actual problem is.
Can anyone point me in the right direction?
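For what it's worth, the last frames of the traceback point at response.css('') in spiderone.py line 41, and cssselect raises exactly this error for an empty selector string. A minimal reproduction, using only parsel (which Scrapy depends on):

from parsel import Selector

sel = Selector(text='<html><body><p>hi</p></body></html>')
print(sel.css('p::text').get())  # works: prints 'hi'
sel.css('')                      # raises SelectorSyntaxError: Expected selector, got <EOF at 0>

So the selector string that ended up empty during the .exe conversion is the place to look.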
Hi, I need to run a command like this:
mlflow server --backend-store-uri postgresql://mlflow_user:mlflow#localhost:5433/mlflow --default-artifact-root file:D:/artifact_root --host 0.0.0.0 --port 5000
to start my server, and I have no problem with that. But when I try to run an example from the mlflow GitHub project,
python mlflow/examples/sklearn_elasticnet_diabetes/linux/train_diabetes.py 0.1 0.9
I get this error:
_model_registry_store_registry.register_entrypoints()
Elasticnet model (alpha=0.100000, l1_ratio=0.900000):
RMSE: 71.98302888908191
MAE: 60.5647520017933
R2: 0.21655161434654602
<function get_tracking_uri at 0x0000017F3AE885E8>
url 'http://0.0.0.0:8001'
url2 'http|//0.0.0.0|8001'
Traceback (most recent call last):
File "train_diabetes.py", line 90, in <module>
mlflow.log_param("alpha", alpha)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\fluent.py", line 210, in log_param
run_id = _get_or_start_run().info.run_id
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\fluent.py", line 508, in _get_or_start_run
return start_run()
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\fluent.py", line 148, in start_run
active_run_obj = MlflowClient().create_run(
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\client.py", line 44, in __init__
self._tracking_client = TrackingServiceClient(final_tracking_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\client.py", line 32, in __init__
self.store = utils._get_store(self.tracking_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\utils.py", line 126, in _get_store
return _tracking_store_registry.get_store(store_uri, artifact_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\registry.py", line 37, in get_store
return builder(store_uri=store_uri, artifact_uri=artifact_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\utils.py", line 81, in _get_file_store
return FileStore(store_uri, store_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\store\tracking\file_store.py", line 100, in __init__
self.root_directory = local_file_uri_to_path(root_directory or _default_root_dir())
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\utils\file_utils.py", line 387, in local_file_uri_to_path
return urllib.request.url2pathname(path)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\nturl2path.py", line 35, in url2pathname
raise OSError(error)
OSError: Bad URL: 'http|//0.0.0.0|8001'
Before running the Python code I run this command to set the tracking URI environment variable for the run:
set MLFLOW_TRACKING_URI='http://0.0.0.0:5000'
I don't know why mlflow replaces the ':' with '|'. I need help; this worked before, but now it is failing.
This looks strange because if you set MLFLOW_TRACKING_URI to http://0.0.0.0:5000, RestStore should be used, but your stacktrace says FileStore is used. Can you run the code below and see what it prints out?
from urllib.parse import urlparse
urlparse('http://0.0.0.0:5000')
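For comparison, on a healthy setup that snippet should print something like this (typical urlparse output, shown here for reference):

>>> from urllib.parse import urlparse
>>> urlparse('http://0.0.0.0:5000')
ParseResult(scheme='http', netloc='0.0.0.0:5000', path='', params='', query='', fragment='')

If the scheme does not come back as a plain 'http' (for example because the quotes from the set command ended up inside the variable), that would be consistent with mlflow falling back to FileStore as seen in the stack trace.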
I get this error:
$ python train_diabetes.py 0.1 0.9
C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_model_registry\utils.py:106: UserWarning: Failure attempting to register store for scheme "file-plugin": No module named 'mlflow_test_plugin.sqlalchemy_store'
_model_registry_store_registry.register_entrypoints()
Elasticnet model (alpha=0.100000, l1_ratio=0.900000):
RMSE: 71.98302888908191
MAE: 60.5647520017933
R2: 0.21655161434654602
Traceback (most recent call last):
File "train_diabetes.py", line 88, in <module>
mlflow.log_param("alpha", alpha)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\fluent.py", line 218, in log_param
run_id = _get_or_start_run().info.run_id
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\fluent.py", line 573, in _get_or_start_run
return start_run()
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\fluent.py", line 159, in start_run
active_run_obj = MlflowClient().create_run(experiment_id=exp_id_for_run, tags=tags)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\client.py", line 54, in __init__
self._tracking_client = TrackingServiceClient(final_tracking_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\client.py", line 39, in __init__
self.store = utils._get_store(self.tracking_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\utils.py", line 127, in _get_store
return _tracking_store_registry.get_store(store_uri, artifact_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\registry.py", line 38, in get_store
return builder(store_uri=store_uri, artifact_uri=artifact_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\utils.py", line 81, in _get_file_store
return FileStore(store_uri, store_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\store\tracking\file_store.py", line 132, in __init__
self.root_directory = local_file_uri_to_path(root_directory or _default_root_dir())
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\utils\file_utils.py", line 390, in local_file_uri_to_path
return urllib.request.url2pathname(path)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\nturl2path.py", line 33, in url2pathname
raise OSError(error)
OSError: Bad URL: 'http|//0.0.0.0|5000'
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tensorflow.py", line 577, in _flush_queue
client = mlflow.tracking.MlflowClient()
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\client.py", line 54, in __init__
self._tracking_client = TrackingServiceClient(final_tracking_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\client.py", line 39, in __init__
self.store = utils._get_store(self.tracking_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\utils.py", line 127, in _get_store
return _tracking_store_registry.get_store(store_uri, artifact_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\registry.py", line 38, in get_store
return builder(store_uri=store_uri, artifact_uri=artifact_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\utils.py", line 81, in _get_file_store
return FileStore(store_uri, store_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\store\tracking\file_store.py", line 132, in __init__
self.root_directory = local_file_uri_to_path(root_directory or _default_root_dir())
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\utils\file_utils.py", line 390, in local_file_uri_to_path
return urllib.request.url2pathname(path)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\nturl2path.py", line 33, in url2pathname
raise OSError(error)
OSError: Bad URL: 'http|//0.0.0.0|5000'
I'm having an issue with the API for Odoo v13. I am able to get the server info, but for some reason the uid is not being returned.
import xmlrpc.client
url ="localhost:8069"
db = "pnv3"
username = "test"
password = "test"
common = xmlrpc.client.ServerProxy('{}/xmlrpc/2/common'.format(url))
print(common.version())
uid = common.authenticate(db, username, password, url)
print(uid)
I'm getting this error:
Traceback (most recent call last):
File "C:/Users/Web Content/.PyCharmCE2019.3/config/scratches/scratch.py", line 11, in <module>
uid = common.authenticate(db, username, password, url)
File "C:\Users\Web Content\AppData\Local\Programs\Python\Python37\lib\xmlrpc\client.py", line 1112, in __call__
return self.__send(self.__name, args)
File "C:\Users\Web Content\AppData\Local\Programs\Python\Python37\lib\xmlrpc\client.py", line 1452, in __request
verbose=self.__verbose
File "C:\Users\Web Content\AppData\Local\Programs\Python\Python37\lib\xmlrpc\client.py", line 1154, in request
return self.single_request(host, handler, request_body, verbose)
File "C:\Users\Web Content\AppData\Local\Programs\Python\Python37\lib\xmlrpc\client.py", line 1170, in single_request
return self.parse_response(resp)
File "C:\Users\Web Content\AppData\Local\Programs\Python\Python37\lib\xmlrpc\client.py", line 1342, in parse_response
return u.close()
File "C:\Users\Web Content\AppData\Local\Programs\Python\Python37\lib\xmlrpc\client.py", line 656, in close
raise Fault(**self._stack[0])
xmlrpc.client.Fault: <Fault 1: 'Traceback (most recent call last):\n File "/odoo/odoo-server/odoo/modules/registry.py", line 59, in __new__\n return cls.registries[db_name]\n File "/odoo/odoo-server/odoo/tools/func.py", line 69, in wrapper\n return func(self, *args, **kwargs)\n File "/odoo/odoo-server/odoo/tools/lru.py", line 44, in __getitem__\n a = self.d[obj].me\nKeyError: \'pnv3\'\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File "/odoo/odoo-server/odoo/addons/base/controllers/rpc.py", line 63, in xmlrpc_2\n response = self._xmlrpc(service)\n File "/odoo/odoo-server/odoo/addons/base/controllers/rpc.py", line 43, in _xmlrpc\n result = dispatch_rpc(service, method, params)\n File "/odoo/odoo-server/odoo/http.py", line 138, in dispatch_rpc\n result = dispatch(method, params)\n File "/odoo/odoo-server/odoo/service/common.py", line 61, in dispatch\n return g[exp_method_name](*params)\n File "/odoo/odoo-server/odoo/service/common.py", line 30, in exp_authenticate\n res_users = odoo.registry(db)[\'res.users\']\n File "/odoo/odoo-server/odoo/__init__.py", line 104, in registry\n return modules.registry.Registry(database_name)\n File "/odoo/odoo-server/odoo/modules/registry.py", line 61, in __new__\n return cls.new(db_name)\n File "/odoo/odoo-server/odoo/modules/registry.py", line 73, in new\n registry.init(db_name)\n File "/odoo/odoo-server/odoo/modules/registry.py", line 141, in init\n with closing(self.cursor()) as cr:\n File "/odoo/odoo-server/odoo/modules/registry.py", line 492, in cursor\n return self._db.cursor()\n File "/odoo/odoo-server/odoo/sql_db.py", line 649, in cursor\n return Cursor(self.__pool, self.dbname, self.dsn, serialized=serialized)\n File "/odoo/odoo-server/odoo/sql_db.py", line 186, in __init__\n self._cnx = pool.borrow(dsn)\n File "/odoo/odoo-server/odoo/sql_db.py", line 532, in _locked\n return fun(self, *args, **kwargs)\n File "/odoo/odoo-server/odoo/sql_db.py", line 600, in borrow\n **connection_info)\n File "/usr/local/lib/python3.7/dist-packages/psycopg2/__init__.py", line 130, in connect\n conn = _connect(dsn, connection_factory=connection_factory, **kwasync)\npsycopg2.OperationalError: FATAL: database "pnv3" does not exist\n\n'>
Process finished with exit code 1
The database does exist, and I have triple-checked my password; I'm not sure what else to do at this point.
The URL to the Odoo server should include the protocol part "http://" at the beginning. It's strange that you get the version info at all; what do you get as output for the version?
Also, you pass the url as the last parameter to the authenticate method, and this is not required. That alone should still not give the error you received, though.
Try your code with these two fixes and report whether it helps:
import xmlrpc.client
url ="http://localhost:8069"
db = "pnv3"
username = "test"
password = "test"
common = xmlrpc.client.ServerProxy('{}/xmlrpc/2/common'.format(url))
print(common.version())
uid = common.authenticate(db, username, password, {})
print(uid)
Identical code works on my machine...
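For reference, on a stock Odoo 13 instance common.version() typically returns a dict shaped roughly like this (values here are illustrative, not from your server):

{'server_version': '13.0',
 'server_version_info': [13, 0, 0, 'final', 0],
 'server_serie': '13.0',
 'protocol_version': 1}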
I am working on a college project, and I need to make the code below work with SOCKS4 instead of Tor/SOCKS5. I have tried changing SOCKS5Agent to SOCKS4Agent, but then I receive an error:
Original code: https://stackoverflow.com/a/33944924/11219616
My code:
import scrapy.core.downloader.handlers.http11 as handler
from twisted.internet import reactor
from txsocksx.http import SOCKS4Agent
from twisted.internet.endpoints import TCP4ClientEndpoint
from scrapy.core.downloader.webclient import _parse


class TorScrapyAgent(handler.ScrapyAgent):
    _Agent = SOCKS4Agent

    def _get_agent(self, request, timeout):
        proxy = request.meta.get('proxy')
        if proxy:
            proxy_scheme, _, proxy_host, proxy_port, _ = _parse(proxy)
            if proxy_scheme == 'socks4':
                endpoint = TCP4ClientEndpoint(reactor, proxy_host, proxy_port)
                return self._Agent(reactor, proxyEndpoint=endpoint)
        return super(TorScrapyAgent, self)._get_agent(request, timeout)


class TorHTTPDownloadHandler(handler.HTTP11DownloadHandler):
    def download_request(self, request, spider):
        agent = TorScrapyAgent(contextFactory=self._contextFactory, pool=self._pool,
                               maxsize=getattr(spider, 'download_maxsize', self._default_maxsize),
                               warnsize=getattr(spider, 'download_warnsize', self._default_warnsize))
        return agent.download_request(request)
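For completeness, a handler like this is normally wired in through Scrapy's DOWNLOAD_HANDLERS setting; a sketch, assuming the class lives in a module named myproject.handlers (that path is an assumption, adjust it to your project):

# settings.py
DOWNLOAD_HANDLERS = {
    # route both schemes through the custom handler
    'http': 'myproject.handlers.TorHTTPDownloadHandler',
    'https': 'myproject.handlers.TorHTTPDownloadHandler',
}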
I get the error:
Traceback (most recent call last):
File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 1416, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "C:\Python27\lib\site-packages\twisted\python\failure.py", line 491, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "C:\Python27\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
defer.returnValue((yield download_func(request=request,spider=spider)))
File "C:\Python27\lib\site-packages\ometa\protocol.py", line 53, in dataReceived
self._parser.receive(data)
File "C:\Python27\lib\site-packages\ometa\tube.py", line 41, in receive
status = self._interp.receive(data)
File "C:\Python27\lib\site-packages\ometa\interp.py", line 48, in receive
for x in self.next:
File "C:\Python27\lib\site-packages\ometa\interp.py", line 177, in apply
for x in self._apply(f, ruleName, argvals):
File "C:\Python27\lib\site-packages\ometa\interp.py", line 110, in _apply
for x in rule():
File "C:\Python27\lib\site-packages\ometa\interp.py", line 256, in parse_Or
for x in self._eval(subexpr):
File "C:\Python27\lib\site-packages\ometa\interp.py", line 241, in parse_And
for x in self._eval(subexpr):
File "C:\Python27\lib\site-packages\ometa\interp.py", line 440, in parse_Action
val = eval(expr.data, self.globals, self._localsStack[-1])
File "<string>", line 1, in <module>
File "C:\Python27\lib\site-packages\txsocksx\client.py", line 276, in serverResponse
raise e.socks4ErrorMap.get(status)()
RequestRejectedOrFailed
This curl request works:
https://<user>:<pass>#xecdapi.xe.com/v1/convert_from.json/?from=1000000&to=SGD&amount=AED,AUD,BDT&inverse=True
But this Scrapy request doesn't work.
yield scrapy.Request("https://<user>:<pass>#xecdapi.xe.com/v1/convert_from.json/?from=1000000&to=SGD&amount=AED,AUD,BDT&inverse=True")
It returns this error:
Traceback (most recent call last):
File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\twisted\internet\defer.py", line 1297, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\twisted\python\failure.py", line 389, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
defer.returnValue((yield download_func(request=request,spider=spider)))
File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\scrapy\utils\defer.py", line 45, in mustbe_deferred
result = f(*args, **kw)
File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\scrapy\core\downloader\handlers\__init__.py", line 65, in download_request
return handler.download_request(request, spider)
File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 61, in download_request
return agent.download_request(request)
File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 286, in download_request
method, to_bytes(url, encoding='ascii'), headers, bodyproducer)
File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\twisted\web\client.py", line 1596, in request
endpoint = self._getEndpoint(parsedURI)
File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\twisted\web\client.py", line 1580, in _getEndpoint
return self._endpointFactory.endpointForURI(uri)
File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\twisted\web\client.py", line 1456, in endpointForURI
uri.port)
File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\scrapy\core\downloader\contextfactory.py", line 59, in creatorForNetloc
return ScrapyClientTLSOptions(hostname.decode("ascii"), self.getContext())
File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\twisted\internet\_sslverify.py", line 1201, in __init__
self._hostnameBytes = _idnaBytes(hostname)
File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\twisted\internet\_sslverify.py", line 87, in _idnaBytes
return idna.encode(text)
File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\idna\core.py", line 355, in encode
result.append(alabel(label))
File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\idna\core.py", line 276, in alabel
check_label(label)
File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\idna\core.py", line 253, in check_label
raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
InvalidCodepoint: Codepoint U+003A at position 28 of u'xxxxxxxxxxxxxxxxxxxxxxxxxxxx:xxxxxxxxxxxxxxxxxxxxxxxxxxx#xecdapi' not allowed
Scrapy does not support HTTP authentication via the URL; we have to use HttpAuthMiddleware instead.
in settings.py:
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware': 811,
}
in the spider:
from scrapy.spiders import CrawlSpider


class SomeIntranetSiteSpider(CrawlSpider):
    http_user = 'someuser'
    http_pass = 'somepass'
    name = 'intranet.example.com'

    # .. rest of the spider code omitted ...
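With the middleware supplying the credentials, the request itself should then use the bare URL, e.g. (same endpoint as in the question, credentials removed):

yield scrapy.Request("https://xecdapi.xe.com/v1/convert_from.json/?from=1000000&to=SGD&amount=AED,AUD,BDT&inverse=True")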