How to connect to hdfs from the docker container? - python

My goal is to read a file from HDFS in Airflow and do further manipulations.
After some research, I found that the URL I need to use is as follows:
df = pd.read_parquet('http://localhost:9870/webhdfs/v1/hadoop_files/sample_2022_01.parquet?op=OPEN'),
where localhost, 172.20.80.1 and computer-name.mshome.net can be used interchangeably,
9870 is the namenode port,
hadoop_files/sample_2022_01.parquet is my folder and file, created in the HDFS root.
I can access and read the file locally in PyCharm, but I cannot get the same result inside Airflow running in Docker. I tried both a local HDFS and an HDFS hosted in Docker, and I tried changing the host to host.docker.internal, but the connection still fails.
Stack trace:
[2022-06-12, 17:52:45 UTC] {taskinstance.py:1889} ERROR - Task failed with exception
Traceback (most recent call last):
File "/usr/local/lib/python3.7/urllib/request.py", line 1350, in do_open
encode_chunked=req.has_header('Transfer-encoding'))
File "/usr/local/lib/python3.7/http/client.py", line 1281, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/local/lib/python3.7/http/client.py", line 1327, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.7/http/client.py", line 1276, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.7/http/client.py", line 1036, in _send_output
self.send(msg)
File "/usr/local/lib/python3.7/http/client.py", line 976, in send
self.connect()
File "/usr/local/lib/python3.7/http/client.py", line 948, in connect
(self.host,self.port), self.timeout, self.source_address)
File "/usr/local/lib/python3.7/socket.py", line 728, in create_connection
raise err
File "/usr/local/lib/python3.7/socket.py", line 716, in create_connection
sock.connect(sa)
OSError: [Errno 113] No route to host
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/operators/python.py", line 207, in execute
branch = super().execute(context)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/operators/python.py", line 171, in execute
return_value = self.execute_callable()
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/operators/python.py", line 189, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/opt/airflow/dags/includes/parquet_dag/main.py", line 15, in main
df_parquet = read('hdfs://localhost:9000/hadoop_files/sample_2022_01.parquet')
File "/opt/airflow/dags/includes/parquet_dag/utils.py", line 29, in read
df = pd.read_parquet('http://172.20.80.1:9870/webhdfs/v1/hadoop_files/sample_2022_01.parquet?op=OPEN')
File "/home/airflow/.local/lib/python3.7/site-packages/pandas/io/parquet.py", line 500, in read_parquet
**kwargs,
File "/home/airflow/.local/lib/python3.7/site-packages/pandas/io/parquet.py", line 236, in read
mode="rb",
File "/home/airflow/.local/lib/python3.7/site-packages/pandas/io/parquet.py", line 102, in _get_path_or_handle
path_or_handle, mode, is_text=False, storage_options=storage_options
File "/home/airflow/.local/lib/python3.7/site-packages/pandas/io/common.py", line 614, in get_handle
storage_options=storage_options,
File "/home/airflow/.local/lib/python3.7/site-packages/pandas/io/common.py", line 312, in _get_filepath_or_buffer
with urlopen(req_info) as req:
File "/home/airflow/.local/lib/python3.7/site-packages/pandas/io/common.py", line 212, in urlopen
return urllib.request.urlopen(*args, **kwargs)
File "/usr/local/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/lib/python3.7/urllib/request.py", line 525, in open
response = self._open(req, data)
File "/usr/local/lib/python3.7/urllib/request.py", line 543, in _open
'_open', req)
File "/usr/local/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/usr/local/lib/python3.7/urllib/request.py", line 1378, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/usr/local/lib/python3.7/urllib/request.py", line 1352, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 113] No route to host>
With host.docker.internal:
urllib.error.URLError: <urlopen error [Errno 99] Cannot assign requested address>

You need to use an address that is routable from inside the Airflow Docker container.
If Hadoop runs in a Docker container as well, check its IP address using docker inspect CONTAINER (doc). If Hadoop runs on the host, you can set network_mode: "host" (doc).
Also note that if you are on macOS with the Docker Desktop app, Docker itself runs inside a virtual machine, so you need some extra settings in that case; check this, for example.
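
One way to find out which address is actually routable is to try a plain TCP connection from inside the Airflow container (for example from a Python shell started with docker exec). The sketch below is only a diagnostic aid; is_reachable is a hypothetical helper name, and the hosts in the usage comment are the candidates mentioned in the question.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "No route to host", "Connection refused", DNS failures and timeouts.
        return False


# Usage (run inside the Airflow container; hosts are taken from the question):
# for host in ("localhost", "172.20.80.1", "host.docker.internal"):
#     print(host, is_reachable(host, 9870))
```

Any host that prints False here will also fail in pd.read_parquet, so it is worth getting this check to pass before debugging anything else.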

where localhost/172.20.80.1/computer-name.mshome.net can be interchangeably used,
They shouldn't be interchangeable inside a Docker network.
From Airflow, you could use Docker service names, not IP addresses, and ensure the containers are in the same bridge network (not host mode, which only works on Linux). host.docker.internal isn't correct either, since you're trying to reach another container, not your host.
https://docs.docker.com/network/bridge/
I'd also recommend using the Airflow Spark operators to read Parquet from HDFS with Spark, rather than Pandas or WebHDFS. You can convert Spark dataframes to Pandas if needed.
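
If you stay with WebHDFS, the service-name approach could look roughly like the sketch below. It assumes the namenode container is reachable under the service name "namenode" and reads the host from an environment variable called HDFS_NAMENODE_HOST; both names are hypothetical, so replace them with whatever your compose file defines. Keep in mind that WebHDFS normally answers an OPEN request with a redirect to a datanode, so the datanode hostname has to resolve from the Airflow container as well.

```python
import os
from urllib.parse import quote, urlencode


def webhdfs_open_url(path, host=None, port=9870):
    """Build a WebHDFS OPEN URL for an absolute HDFS path."""
    # HDFS_NAMENODE_HOST and the "namenode" default are assumptions, not fixed names.
    host = host or os.environ.get("HDFS_NAMENODE_HOST", "namenode")
    query = urlencode({"op": "OPEN"})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


url = webhdfs_open_url("/hadoop_files/sample_2022_01.parquet", host="namenode")
print(url)
# df = pd.read_parquet(url)  # same call as in the question, with the new host
```

Keeping the host in one place makes it easy to switch between the local setup and the Docker setup without editing the DAG code.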

Related

Webdriver. Starting a large number of instances on virtual machine gives WinError 10061

I want to run many instances of Chrome (using Chromedriver) at the same time. I use Python and Celery. On my local computer everything works without problems, but on the server (Win7 and Win10) only seven instances of Chrome open; the rest crash when the next instance is started:
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
[2017-04-18 08:14:43,988: ERROR/MainProcess] Task bid[7fce5818-41b5-4ff5-9ae2-41913bd88f43] raised unexpected: ConnectionRefusedError(10061, 'No connection could be made because the target machine actively refused it', None, 10061, None)
Traceback (most recent call last):
File "E:\env\bot\lib\site-packages\celery\app\trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "E:\env\bot\lib\site-packages\celery\app\trace.py", line 438, in _protected_call_
return self.run(*args, **kwargs)
File "E:\bot\app.py", line 100, in bid
home_page.open()
File "E:\env\bot\lib\site-packages\webium\base_page.py", line 57, in open
self._driver.get(self.url)
File "E:\env\bot\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 248, in get
self.execute(Command.GET, {'url': url})
File "E:\env\bot\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 234, in execute
response = self.command_executor.execute(driver_command, params)
File "E:\env\bot\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 407, in execute
return self._request(command_info[0], url, body=data)
File "E:\env\bot\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 438, in _request
self._conn.request(method, parsed_url.path, body, headers)
File "C:\Python35-32\Lib\http\client.py", line 1106, in request
self._send_request(method, url, body, headers)
File "C:\Python35-32\Lib\http\client.py", line 1151, in _send_request
self.endheaders(body)
File "C:\Python35-32\Lib\http\client.py", line 1102, in endheaders
self._send_output(message_body)
File "C:\Python35-32\Lib\http\client.py", line 934, in _send_output
self.send(msg)
File "C:\Python35-32\Lib\http\client.py", line 877, in send
self.connect()
File "C:\Python35-32\Lib\http\client.py", line 849, in connect
(self.host,self.port), self.timeout, self.source_address)
File "C:\Python35-32\Lib\socket.py", line 711, in create_connection
raise err
File "C:\Python35-32\Lib\socket.py", line 702, in create_connection
sock.connect(sa)
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
I don't use Selenium Grid; I start many instances with the following settings:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def set_driver():
    chrome_options = Options()
    chrome_options.add_extension('extension_3_0_0_14.crx')
    return webdriver.Chrome(chrome_options=chrome_options)
Has anyone met this problem? What could be the reason, and how can I solve it?
PS Google suggests it is a problem with sockets. But why does everything work fine on my local machine while the VM has trouble?

httplib2.SSLHandshakeError while Installing Google Cloud SDK

While installing the Google Cloud SDK (Python), an httplib2.SSLHandshakeError keeps occurring. I have configured the unfilled_client_secrets.json (shown below the traceback), but this has not solved the handshake error.
Similar questions have been asked here, but none have been explicitly answered. Thank you in advance for any help you might be able to provide.
~ $ ./google-cloud-sdk/install.sh
Welcome to the Google Cloud SDK!
Traceback (most recent call last):
File "/Users/rptrainor/./google-cloud-sdk/bin/bootstrapping/install.py", line 206, in <module>
main()
File "/Users/rptrainor/./google-cloud-sdk/bin/bootstrapping/install.py", line 184, in main
Install(pargs.override_components, pargs.additional_components)
File "/Users/rptrainor/./google-cloud-sdk/bin/bootstrapping/install.py", line 130, in Install
_CLI.Execute(['--quiet', 'components', 'list'])
File "/Users/rptrainor/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 759, in Execute
self._HandleAllErrors(exc, command_path_string, specified_arg_names)
File "/Users/rptrainor/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 737, in Execute
resources = args.calliope_command.Run(cli=self, args=args)
File "/Users/rptrainor/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 741, in Run
display_info=self.ai.display_info).Display()
File "/Users/rptrainor/google-cloud-sdk/lib/googlecloudsdk/calliope/display.py", line 427, in Display
self._printer.Print(self._resources)
File "/Users/rptrainor/google-cloud-sdk/lib/googlecloudsdk/core/resource/resource_printer_base.py", line 251, in Print
for resource in resources:
File "/Users/rptrainor/google-cloud-sdk/lib/surface/components/list.py", line 86, in Run
result = update_manager.List()
File "/Users/rptrainor/google-cloud-sdk/lib/googlecloudsdk/core/updater/update_manager.py", line 516, in List
_, diff = self._GetStateAndDiff(command_path='components.list')
File "/Users/rptrainor/google-cloud-sdk/lib/googlecloudsdk/core/updater/update_manager.py", line 446, in _GetStateAndDiff
command_path=command_path)
File "/Users/rptrainor/google-cloud-sdk/lib/googlecloudsdk/core/updater/update_manager.py", line 429, in _GetLatestSnapshot
*effective_url.split(','), command_path=command_path)
File "/Users/rptrainor/google-cloud-sdk/lib/googlecloudsdk/core/updater/snapshots.py", line 165, in FromURLs
for url in urls]
File "/Users/rptrainor/google-cloud-sdk/lib/googlecloudsdk/core/updater/snapshots.py", line 186, in _DictFromURL
response = installers.ComponentInstaller.MakeRequest(url, command_path)
File "/Users/rptrainor/google-cloud-sdk/lib/googlecloudsdk/core/updater/installers.py", line 283, in MakeRequest
return url_opener.urlopen(req, timeout=timeout)
File "/Users/rptrainor/google-cloud-sdk/lib/googlecloudsdk/core/url_opener.py", line 69, in urlopen
return opener.open(req, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/Users/rptrainor/google-cloud-sdk/lib/googlecloudsdk/core/url_opener.py", line 54, in https_open
return self.do_open(build, req)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1181, in do_open
h.request(req.get_method(), req.get_selector(), req.data, headers)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 995, in request
self._send_request(method, url, body, headers)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1029, in _send_request
self.endheaders(body)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 991, in endheaders
self._send_output(message_body)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 844, in _send_output
self.send(msg)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 806, in send
self.connect()
File "/Users/rptrainor/google-cloud-sdk/lib/third_party/httplib2/__init__.py", line 1081, in connect
raise SSLHandshakeError(e)
httplib2.SSLHandshakeError: [Errno 1] _ssl.c:510: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
{
"web":{
"client_id":"[[CLIENT_ID_IS_HERE]]",
"project_id":"[[PROJECT_ID_IS_HERE]]",
"auth_uri":"https://accounts.google.com/o/oauth2/auth",
"token_uri":"https://accounts.google.com/o/oauth2/token",
"auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs",
"client_secret":"[[CLIENT_SECRET_IS_HERE]]"
}
}
Try updating Python to the latest 2.7.x version. I resolved the very same issue by updating Python to 2.7.13.
One silly yet effective solution could be to open these URLs in a browser once and accept their certificates.
Also check your computer's clock. If the date and time are not current, certificate verification can fail.
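
Before upgrading, it can help to see which Python and OpenSSL builds the installer actually runs with, and what the machine thinks the current time is. The snippet below is a small diagnostic sketch written to run on both Python 2.7 and Python 3; tls_environment is a hypothetical helper name.

```python
import ssl
import sys
import time


def tls_environment():
    """Collect the interpreter version, linked OpenSSL version and current UTC time."""
    return {
        "python": sys.version.split()[0],
        "openssl": ssl.OPENSSL_VERSION,
        "utc_now": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }


for key, value in sorted(tls_environment().items()):
    print("%s: %s" % (key, value))
```

An old 2.7.x release or a clock that is far off would be consistent with the suggestions above.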

socket.error: [Errno 111] Connection refused linux

I just started with Nagios XI and got this error. I can't find the right answer; does anyone know what the problem is?
I already checked the proxy and can't find where the problem is.
The code is meant to send notification messages through WhatsApp.
[root@localhost ~]# yowsup-cli registration --requestcode sms --phone 316******** --cc 31 --mcc 204 --mnc 10
yowsup-cli v2.0.15
yowsup v2.5.0
Copyright (c) 2012-2016 Tarek Galal
http://www.openwhatsapp.org
This software is provided free of charge. Copying and redistribution is
encouraged.
If you appreciate this software and you would like to support future
development please consider donating:
http://openwhatsapp.org/yowsup/donate
Traceback (most recent call last):
File "/usr/bin/yowsup-cli", line 368, in <module>
if not parser.process():
File "/usr/bin/yowsup-cli", line 189, in process
self.handleRequestCode(self.args["requestcode"], config)
File "/usr/bin/yowsup-cli", line 208, in handleRequestCode
result = codeReq.send()
File "/usr/lib/python2.6/site-packages/yowsup/registration/coderequest.py", line 62, in send
res = super(WACodeRequest, self).send(parser)
File "/usr/lib/python2.6/site-packages/yowsup/common/http/warequest.py", line 70, in send
return self.sendGetRequest(parser)
File "/usr/lib/python2.6/site-packages/yowsup/common/http/warequest.py", line 108, in sendGetRequest
self.response = WARequest.sendRequest(host, port, path, headers, params, "GET")
File "/usr/lib/python2.6/site-packages/yowsup/common/http/warequest.py", line 164, in sendRequest
conn.request(reqType, path, params, headers);
File "/usr/lib64/python2.6/httplib.py", line 973, in request
self._send_request(method, url, body, headers)
File "/usr/lib64/python2.6/httplib.py", line 1010, in _send_request
self.endheaders()
File "/usr/lib64/python2.6/httplib.py", line 967, in endheaders
self._send_output()
File "/usr/lib64/python2.6/httplib.py", line 831, in _send_output
self.send(msg)
File "/usr/lib64/python2.6/httplib.py", line 790, in send
self.connect()
File "/usr/lib64/python2.6/httplib.py", line 1171, in connect
sock = socket.create_connection((self.host, self.port), self.timeout)
File "/usr/lib64/python2.6/socket.py", line 567, in create_connection
raise error, msg
socket.error: [Errno 111] Connection refused
thanks in advance.

py2neo SocketError: Connection refused, but curl works

I'm trying to get a Flask/Neo4j app set up on a remote Ubuntu server, and I've run into a problem that I haven't been able to figure out. My app uses py2neo, but when it tries to connect to the graph, the app crashes and the Neo4j process seems to stop. I've tried connecting in a python shell like this...
test = Graph('http://localhost:7474/db/data/',username='neo4j',password='myPassword')
which fails, and also renders neo4j inoperative until I restart it. However, these return 200 responses (and the web interface also works):
curl -u neo4j http://localhost:7474/db/data/
requests.get('http://localhost:7474/db/data/', auth=('neo4j','myPassword'))
I've tried to provide more information than this similar question, because it seems like the connection works from everywhere but py2neo.
Here's the full traceback:
Traceback (most recent call last):
File "/home/deploy/toponimika/toponimikaenv/lib/python3.5/site-packages/py2neo/database/__init__.py", line 318, in __new__
inst = cls.__instances[key]
KeyError: (<class 'py2neo.database.Graph'>, <ServerAddress settings={'http_port': 7474, 'host': 'localhost'}>, 'data')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/deploy/toponimika/toponimikaenv/lib/python3.5/site-packages/py2neo/packages/httpstream/http.py", line 322, in submit
response = send()
File "/home/deploy/toponimika/toponimikaenv/lib/python3.5/site-packages/py2neo/packages/httpstream/http.py", line 317, in send
http.request(xstr(method), xstr(uri.absolute_path_reference), body, headers)
File "/usr/lib/python3.5/http/client.py", line 1106, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request
self.endheaders(body)
File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders
self._send_output(message_body)
File "/usr/lib/python3.5/http/client.py", line 934, in _send_output
self.send(msg)
File "/usr/lib/python3.5/http/client.py", line 877, in send
self.connect()
File "/home/deploy/toponimika/toponimikaenv/lib/python3.5/site-packages/py2neo/packages/httpstream/http.py", line 80, in connect
self.source_address)
File "/usr/lib/python3.5/socket.py", line 711, in create_connection
raise err
File "/usr/lib/python3.5/socket.py", line 702, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/deploy/toponimika/toponimikaenv/lib/python3.5/site-packages/py2neo/database/__init__.py", line 327, in __new__
use_bolt = version_tuple(inst.__remote__.get().content["neo4j_version"]) >= (3,)
File "/home/deploy/toponimika/toponimikaenv/lib/python3.5/site-packages/py2neo/database/http.py", line 154, in get
response = self.__base.get(headers=headers, redirect_limit=redirect_limit, **kwargs)
File "/home/deploy/toponimika/toponimikaenv/lib/python3.5/site-packages/py2neo/packages/httpstream/http.py", line 966, in get
return self.__get_or_head("GET", if_modified_since, headers, redirect_limit, **kwargs)
File "/home/deploy/toponimika/toponimikaenv/lib/python3.5/site-packages/py2neo/packages/httpstream/http.py", line 943, in __get_or_head
return rq.submit(redirect_limit=redirect_limit, **kwargs)
File "/home/deploy/toponimika/toponimikaenv/lib/python3.5/site-packages/py2neo/packages/httpstream/http.py", line 433, in submit
http, rs = submit(self.method, uri, self.body, self.headers)
File "/home/deploy/toponimika/toponimikaenv/lib/python3.5/site-packages/py2neo/packages/httpstream/http.py", line 362, in submit
raise SocketError(code, description, host_port=uri.host_port)
py2neo.packages.httpstream.http.SocketError: Connection refused
Anything I might try to figure out what's going on would be appreciated.
Changing the URL to http://username:password@localhost:7474/db/data/ made it work!
Example:
test = Graph('http://username:password@localhost:7474/db/data/')
I had the same issue and solved it by simply upgrading py2neo with pip:
pip install --upgrade py2neo
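
To check that a URL of the user:password@host form carries the credentials you expect, you can split it with the standard library before handing it to py2neo. This is only an illustration of the URL shape; "username" and "password" are the placeholders from the answer above, not real credentials.

```python
from urllib.parse import urlsplit

# Placeholder credentials embedded in the userinfo part of the URL.
parts = urlsplit("http://username:password@localhost:7474/db/data/")

print(parts.username)  # username
print(parts.password)  # password
print(parts.hostname)  # localhost
print(parts.port)      # 7474
print(parts.path)      # /db/data/
```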

Posting Content to Friendica using Python

I'm working on a project that is written in Python and needs to post updates to a Friendica server and interact with it through the available APIs. However, I have very limited experience with APIs, so I'm unsure how to code this functionality in Python. There is an example on the Friendica GitHub, but the Python example did not work for me. There is also a Python 3 module, https://bitbucket.org/tobiasd/python-friendica/overview; however, when I try to connect with it in a test script as follows:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import friendica
# make a new instance of friendica
f = friendica.friendica (server = '10.211.55.23/api/statuses/update', username = 'newtest', password = 'klaup8744')
# check that we are logged in
f.account_verify_credentials()
# get the current notifications
print (f.ping())
# post something with the default settings
f.statuses_update( status = "here is the message you are going to post" )
the connection is refused with the following message:
Traceback (most recent call last):
File "/usr/lib/python3.4/urllib/request.py", line 1182, in do_open
h.request(req.get_method(), req.selector, req.data, headers)
File "/usr/lib/python3.4/http/client.py", line 1088, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python3.4/http/client.py", line 1126, in _send_request
self.endheaders(body)
File "/usr/lib/python3.4/http/client.py", line 1084, in endheaders
self._send_output(message_body)
File "/usr/lib/python3.4/http/client.py", line 922, in _send_output
self.send(msg)
File "/usr/lib/python3.4/http/client.py", line 857, in send
self.connect()
File "/usr/lib/python3.4/http/client.py", line 1223, in connect
super().connect()
File "/usr/lib/python3.4/http/client.py", line 834, in connect
self.timeout, self.source_address)
File "/usr/lib/python3.4/socket.py", line 512, in create_connection
raise err
File "/usr/lib/python3.4/socket.py", line 503, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test_1.py", line 12, in <module>
print (f.ping())
File "/home/sambraidley/Desktop/friendica.py", line 851, in ping
res = urlopen(self.protocol()+self.apipath[:-4]+'/ping').read().decode('utf-8')
File "/usr/lib/python3.4/urllib/request.py", line 161, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.4/urllib/request.py", line 463, in open
response = self._open(req, data)
File "/usr/lib/python3.4/urllib/request.py", line 481, in _open
'_open', req)
File "/usr/lib/python3.4/urllib/request.py", line 441, in _call_chain
result = func(*args)
File "/usr/lib/python3.4/urllib/request.py", line 1225, in https_open
context=self._context, check_hostname=self._check_hostname)
File "/usr/lib/python3.4/urllib/request.py", line 1184, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 111] Connection refused>
The friendica Python 3 module can be found at https://bitbucket.org/tobiasd/python-friendica/src/b0b75ae80a6e747e8724b1ae36972ebfd939beb5/friendica.py?fileviewer=file-view-default
My Friendica server is set up in a VM with the address 10.211.55.23, with test credentials username = 'newtest' and password = 'klaup8744'. It is fully working: the curl example for posting an update worked perfectly, as follows:
/usr/bin/curl -u newtest:klaup8744 10.211.55.23/api/statuses/update.xml -d source="Testing" -d status="This is a test status"
According to the source link you posted, friendica uses HTTPS by default. Your successful curl request uses HTTP.
Try instantiating friendica using HTTP:
f = friendica.friendica (server = '10.211.55.23/api/statuses/update',
username = 'newtest', password = 'klaup8744', useHTTPS=False)
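
The reason the scheme matters is that HTTP and HTTPS connect to different default ports, so a server that only listens for plain HTTP refuses the HTTPS attempt outright. A minimal sketch using only the standard library (default_port is a hypothetical helper name, and use_https merely mirrors the useHTTPS flag above):

```python
import http.client


def default_port(use_https):
    """Return the port http.client connects to when the URL names none."""
    if use_https:
        conn_cls = http.client.HTTPSConnection
    else:
        conn_cls = http.client.HTTPConnection
    return conn_cls.default_port


print(default_port(True))   # 443
print(default_port(False))  # 80
```

The curl command in the question has no scheme, so curl falls back to HTTP on port 80, which matches the working case.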
