I get the parameters for connecting to the remote host from an API. They are different every time and I do not know them in advance. Can I pass these parameters to SSHHook?
The code is as follows:
for index, conn in enumerate(get_connections(url=CONFIG.gater_url, vendor=OSS_VENDOR)):
    ssh_hook = SSHHook(
        ssh_conn_id=CONFIG.ssh.ssh_conn_id,
        remote_host=conn.ip,
        username=conn.login,
        password=conn.password,
        port=conn.port
    )
Judging by the task logs, it tries to connect to localhost, while the address that actually comes in for the connection is different (and is logged).
In the Airflow connections, I removed all the parameters from the SSH connection, leaving only the conn_id.
Yes, it is possible, and you are almost doing it right. However, you should set ssh_conn_id=None. Otherwise, the credentials that you pass as arguments are overwritten by the values from the connection whose ID you also pass to the hook.
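For example, the loop from the question might become something like this (a sketch that keeps the question's own get_connections and CONFIG objects; the import path depends on your Airflow version):

# Recent Airflow; older versions import SSHHook from airflow.contrib.hooks.ssh_hook
from airflow.providers.ssh.hooks.ssh import SSHHook

for index, conn in enumerate(get_connections(url=CONFIG.gater_url, vendor=OSS_VENDOR)):
    ssh_hook = SSHHook(
        ssh_conn_id=None,        # do not pull credentials from an Airflow connection
        remote_host=conn.ip,
        username=conn.login,
        password=conn.password,
        port=conn.port,
    )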
I am fairly new to MongoDB, and I am wondering how I can establish multiple connections to a single Mongo instance without specifying ports or making a new config file for each user. I am running the Mongo instance in a Singularity container on a remote server.
Here is my sample config file:
# mongod.conf
# for documentation of all options, see:
# https://docs.mongodb.com/manual/reference/configuration-options/
# where to write logging data for debugging and such.
systemLog:
  destination: file
  logAppend: true
  path: /path-to-log/
# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1
  maxIncomingConnections: 65536
# security
security:
  authorization: 'enabled'
Do I need to use a replica set? If so, can someone explain the concept behind a replica set?
Do I need to change my config file? If so, what changes do I need to make to allow for multiple connections?
Here is my code that I use to connect to the server (leaving out import statements for clarity):
PWD = "/path-to-singularity-container/"
os.chdir(PWD)
self.p = subprocess.Popen(f"singularity run --bind {PWD}/data:/data/db mongo.sif --auth --config {PWD}/mongod.conf", shell=True, preexec_fn=os.setpgrp)
connection_string = "mongodb://user:password@127.0.0.1:27017/"
client = pymongo.MongoClient(connection_string, serverSelectionTimeoutMS=60_000)
EDIT: I am trying to have multiple people connect to MongoDB using pymongo at the same time, given the same connection string. I am not sure how I can achieve this without giving each user a separate config file.
Thank you for your help!
You can increase the value of ulimit. Mongod tracks each incoming connection with a file descriptor and a thread.
The link below explains each of the ulimit parameters and their recommended values:
https://docs.mongodb.com/manual/reference/ulimit/
As for high availability: if you do not want downtime in your environment, you need an HA setup, which means a three-node replica set that can tolerate one node being down at a time.
If the primary node goes down, an internal election takes place and another node is promoted to primary within seconds, so your application is only briefly impacted. As an added benefit, if a node crashes you still have another copy of the data.
Hope this answers your question.
No special work is required: each user simply creates a client with the same connection string and executes queries.
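A minimal sketch of that, reusing the connection string from the question (the credentials are the question's own placeholders):

import pymongo

# Each user runs this independently; mongod simply sees another client
# connection on port 27017, with no per-user config file needed.
connection_string = "mongodb://user:password@127.0.0.1:27017/"
client = pymongo.MongoClient(connection_string, serverSelectionTimeoutMS=60_000)
client.admin.command("ping")   # quick check that the connection works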
I am using SolrClient for python with Solr 6.6.2. It works as expected but I cannot find anything in the documentation for closing the connection after opening it.
from SolrClient import SolrClient

def getdocbyid(docidlist):
    for id in docidlist:
        solr = SolrClient('http://localhost:8983/solr', auth=("solradmin", "Admin098"))
        doc = solr.get('Collection_Test', doc_id=id)
        print(doc)
I do not know whether the client closes it automatically or not. If it doesn't, wouldn't it be a problem if several connections are left open? I just want to know if there is any way to close the connection. Here is the link to the documentation:
https://solrclient.readthedocs.io/en/latest/
The connections are not kept around indefinitely. The standard timeout for any persistent HTTP connection in Jetty is five seconds as far as I remember, so you do not have to worry about the number of kept-alive connections exploding.
The Jetty server will also just drop the connection if required, since it is under no obligation to keep it open for the client. SolrClient uses a requests session internally, so it should reuse connections (keep-alive) for subsequent queries. If you run into issues with this, you can keep a set of clients available as a pool in your application instead, and request an available client rather than creating a new one each time.
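A rough sketch of the pooling idea (untested; the pool size and credentials below are just placeholders taken from the question):

import queue
from SolrClient import SolrClient

# Create a handful of clients up front and hand them out as needed.
pool = queue.Queue()
for _ in range(4):
    pool.put(SolrClient('http://localhost:8983/solr', auth=("solradmin", "Admin098")))

def getdocbyid(docidlist):
    solr = pool.get()              # borrow a client from the pool
    try:
        for doc_id in docidlist:
            print(solr.get('Collection_Test', doc_id=doc_id))
    finally:
        pool.put(solr)             # return it so others can reuse it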
I'm however pretty sure you won't run into any issues with the default settings.
I wish to use RPyC to provide an API for a hardware board as a service.
The board can only cater for a single user at a time.
Is there any way I can get RPyC to enforce that only a single user can get access at a time?
I'm not sure if this would work (or work well), but you can try starting a OneShotServer inside a loop, so that at any given moment only one connection is served. When the connection is closed, the server terminates, and you start another one for the next client.
Something like:
from rpyc.utils.server import OneShotServer

is_aborting = False
while not is_aborting:
    server = OneShotServer(myservice, *args, **kwargs)
    # serve the next client:
    server.start()
    # done serving the client
If this doesn't work, your best bet is to subclass ThreadedServer and override the _accept_method method to keep track of whether there is already a connection open, and to refuse the new one if there is.
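Something along these lines might work (untested; it assumes ThreadedServer routes each accepted socket through _authenticate_and_serve_client, which may differ between rpyc versions):

import threading
from rpyc.utils.server import ThreadedServer

class SingleClientServer(ThreadedServer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._busy = threading.Lock()

    def _accept_method(self, sock):
        # Refuse the connection outright if another client is being served.
        if not self._busy.acquire(blocking=False):
            sock.close()
            return
        super()._accept_method(sock)   # spawns the worker thread as usual

    def _authenticate_and_serve_client(self, sock):
        try:
            super()._authenticate_and_serve_client(sock)
        finally:
            self._busy.release()       # free the slot once the client disconnects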
We are trying to improve automation of some server processes; we use Fabric. I anticipate having to manage multiple hosts, and that means that SSH connections must be made to servers that haven't been SSH'd into before. If that happens, SSH always asks for verification of connection, which will break automation.
I have worked around this issue, in the same process, by using the -o StrictHostKeyChecking=no option on an SSH command that I use to synchronize code with rsync, but I will also need it on calls made through Fabric.
Is there a way to pass ssh-specific options to Fabric, in particular the one I mentioned above?
The short answer is:
For new hosts, nothing is needed; env.reject_unknown_hosts defaults to False.
For known hosts with changed keys, setting env.disable_known_hosts = True makes Fabric proceed and connect to hosts whose keys have changed.
Read ye olde docs: http://docs.fabfile.org/en/1.5/usage/ssh.html#unknown-hosts
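For example, in a Fabric 1.x fabfile these can be set at module level (a sketch):

# fabfile.py
from fabric.api import env

env.reject_unknown_hosts = False  # default: unknown hosts are added to the in-memory list
env.disable_known_hosts = True    # skip known_hosts entirely, so changed keys don't abort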
The paramiko library is capable of loading up your known_hosts file, and will then compare any host it connects to with that mapping. Settings are available to determine what happens when an unknown host (a host whose username or IP is not found in known_hosts) is seen:
Reject: the host key is rejected and the connection is not made. This results in a Python exception, which will terminate your Fabric session with a message that the host is unknown.
Add: the new host key is added to the in-memory list of known hosts, the connection is made, and things continue normally. Note that this does not modify your on-disk known_hosts file!
Ask: not yet implemented at the Fabric level, this is a paramiko library option which would result in the user being prompted about the unknown key and whether to accept it.
Whether to reject or add hosts, as above, is controlled in Fabric via the env.reject_unknown_hosts option, which is False by default for convenience’s sake. We feel this is a valid tradeoff between convenience and security; anyone who feels otherwise can easily modify their fabfiles at module level to set env.reject_unknown_hosts = True.
http://docs.fabfile.org/en/1.5/usage/ssh.html#known-hosts-with-changed-keys
Known hosts with changed keys
The point of SSH’s key/fingerprint tracking is so that man-in-the-middle attacks can be detected: if an attacker redirects your SSH traffic to a computer under his control, and pretends to be your original destination server, the host keys will not match. Thus, the default behavior of SSH (and its Python implementation) is to immediately abort the connection when a host previously recorded in known_hosts suddenly starts sending us a different host key.
In some edge cases such as some EC2 deployments, you may want to ignore this potential problem. Our SSH layer, at the time of writing, doesn’t give us control over this exact behavior, but we can sidestep it by simply skipping the loading of known_hosts – if the host list being compared to is empty, then there’s no problem. Set env.disable_known_hosts to True when you want this behavior; it is False by default, in order to preserve default SSH behavior.
Warning: Enabling env.disable_known_hosts will leave you wide open to man-in-the-middle attacks! Please use with caution.
Is there a way to configure Pyramid so that when MongoDB fails over to a secondary replica, Pyramid starts using it?
Pyramid should be using the official Python MongoDB driver (pymongo). The driver handles fail-over "automatically", but it needs the correct connection string.
See here for the connection strings.
One thing to keep in mind, the definition of "automatic fail-over" is not clear cut.
If you create a new connection to the DB, that connection will point at the current primary.
If you use an existing connection from a pool, that connection may be pointing at the wrong server. In that case it will throw an exception the first time and should connect to the correct server the second time.
However, when a fail-over happens, there is a brief window (typically 2-10 seconds) during which there is no primary at all. If you use a connection during this period, there is no primary for it to reach.
Note that this is not specific to python, it's the way Replica Sets function.
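To illustrate, here is a minimal pymongo sketch (the hosts, replica-set name, and retry counts are made up) that retries a query once the driver has had a chance to rediscover the new primary:

import time
from pymongo import MongoClient
from pymongo.errors import AutoReconnect

# Listing the members and naming the replica set lets the driver
# discover whichever node is currently primary.
client = MongoClient(
    "mongodb://db1.example.com:27017,db2.example.com:27017,db3.example.com:27017"
    "/?replicaSet=rs0"
)

def find_one_with_retry(collection, query, retries=3, delay=2):
    for attempt in range(retries):
        try:
            return collection.find_one(query)
        except AutoReconnect:
            # Primary changed or has not been elected yet; wait briefly and retry.
            if attempt == retries - 1:
                raise
            time.sleep(delay)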