Multiple Connections to MongoDB Instance using Pymongo - python

I am fairly new to MongoDB, and I am wondering how I can establish multiple connections to a single Mongo instance without specifying ports or making a new config file for each user. I am running the Mongo instance in a Singularity container on a remote server.
Here is my sample config file:
# mongod.conf
# for documentation of all options, see:
# https://docs.mongodb.com/manual/reference/configuration-options/

# where to write logging data for debugging and such
systemLog:
  destination: file
  logAppend: true
  path: /path-to-log/

# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1
  maxIncomingConnections: 65536

# security
security:
  authorization: 'enabled'
Do I need to use a replica set? If so, can someone explain the concept behind one?
Do I need to change my config file? If so, what changes do I need to make to allow for multiple connections?
Here is my code that I use to connect to the server (leaving out import statements for clarity):
PWD = "/path-to-singularity-container/"
os.chdir(PWD)
self.p = subprocess.Popen(
    f"singularity run --bind {PWD}/data:/data/db mongo.sif --auth --config {PWD}/mongod.conf",
    shell=True,
    preexec_fn=os.setpgrp,
)
# note: "@" (not "#") must separate the credentials from the host
connection_string = "mongodb://user:password@127.0.0.1:27017/"
client = pymongo.MongoClient(connection_string, serverSelectionTimeoutMS=60_000)
EDIT: I am trying to have multiple people connect to MongoDB using pymongo at the same time, given the same connection string. I am not sure how I can achieve this without giving each user a separate config file.
Thank you for your help!

You can increase the ulimit values; mongod tracks each incoming connection with a file descriptor and a thread.
The link below explains each ulimit parameter and its recommended value:
https://docs.mongodb.com/manual/reference/ulimit/
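Since the question launches mongod from Python via subprocess, the child process inherits the launching process's limits, so you can inspect and raise them before starting mongod. A sketch using the standard resource module (Linux/macOS only; 64000 is an arbitrary target, not a MongoDB requirement):

import resource

# Current open-file limits for this process; each incoming mongod
# connection consumes a file descriptor on the server side.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft}, hard={hard}")

# Raise the soft limit toward the hard limit before launching mongod.
resource.setrlimit(resource.RLIMIT_NOFILE, (min(64000, hard), hard))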
If you do not want downtime in your environment, you need an HA solution, which means a three-node replica set that can tolerate one node being down at a time.
If the primary node goes down, an internal election takes place and a new node is promoted to primary within seconds, so your application is only briefly impacted. Another benefit: if a node crashes, you still have another copy of the data.
Hope this answers your question.

No special work is required: each user simply creates a client with the same connection string and executes queries.
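For example, every user can run the same snippet concurrently; mongod multiplexes all clients on the single port 27017, up to net.maxIncomingConnections from the config above. The database and collection names below are placeholders:

import pymongo

# Same connection string for every user; mongod accepts many
# simultaneous clients on one port.
connection_string = "mongodb://user:password@127.0.0.1:27017/"
client = pymongo.MongoClient(connection_string, serverSelectionTimeoutMS=60_000)

db = client["test_db"]                 # placeholder database name
print(db["docs"].count_documents({}))  # placeholder collection name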

Related

Is it possible to dynamically set the host in SSHHook airflow?

I get the parameters for connecting to the remote host from an API. They are different every time, and I do not know them in advance. Can I pass these parameters to SSHHook?
The code is as follows:
for index, conn in enumerate(get_connections(url=CONFIG.gater_url, vendor=OSS_VENDOR)):
    ssh_hook = SSHHook(
        ssh_conn_id=CONFIG.ssh.ssh_conn_id,
        remote_host=conn.ip,
        username=conn.login,
        password=conn.password,
        port=conn.port
    )
Judging by the task logs, it tries to connect to localhost, while the actual connection address that arrives is different (this is logged).
In the Airflow connections, I removed all the parameters from the SSH connection, leaving only the conn_id.
Yes, it is possible, and you are doing it almost right. However, you should set ssh_conn_id=None; otherwise the credentials you pass as arguments are overwritten by the values from the stored connection whose ID you also pass to the hook.
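A minimal sketch of the corrected loop (the import path assumes the SSH provider package; in older Airflow versions the hook lives in airflow.contrib.hooks.ssh_hook):

from airflow.providers.ssh.hooks.ssh import SSHHook

for index, conn in enumerate(get_connections(url=CONFIG.gater_url, vendor=OSS_VENDOR)):
    ssh_hook = SSHHook(
        ssh_conn_id=None,  # do not load a stored connection; use only the arguments below
        remote_host=conn.ip,
        username=conn.login,
        password=conn.password,
        port=conn.port,
    )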

How to close a SolrClient connection?

I am using SolrClient for python with Solr 6.6.2. It works as expected but I cannot find anything in the documentation for closing the connection after opening it.
def getdocbyid(docidlist):
    for id in docidlist:
        solr = SolrClient('http://localhost:8983/solr', auth=("solradmin", "Admin098"))
        doc = solr.get('Collection_Test', doc_id=id)
        print(doc)
I do not know if the client closes it automatically or not. If it doesn't, wouldn't it be a problem if several connections are left open? I just want to know if there is any way to close the connection. Here is the link to the documentation:
https://solrclient.readthedocs.io/en/latest/
The connections are not kept around indefinitely. The standard timeout for any persistent HTTP connection in Jetty is five seconds, as far as I remember, so you do not have to worry about the number of kept-alive connections exploding.
The Jetty server will also simply drop the connection if required, as it is not obliged to keep it around as a guarantee for the client. SolrClient uses a requests session internally, so it should reuse connections (keep-alive) for subsequent queries. If you run into issues with this, you can keep a set of clients available as a pool in your application and request an available client instead of creating a new one each time (see the sketch below).
I'm pretty sure, however, that you won't run into any issues with the default settings.
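If the per-call client creation in the question is a concern, one option is to create the client once and reuse it, so the underlying requests session keeps the connection alive across queries. A sketch reusing the names from the question:

from SolrClient import SolrClient

# Create the client once; its internal requests session keeps the
# HTTP connection alive and reuses it across queries.
solr = SolrClient('http://localhost:8983/solr', auth=("solradmin", "Admin098"))

def getdocbyid(docidlist):
    for doc_id in docidlist:
        doc = solr.get('Collection_Test', doc_id=doc_id)
        print(doc)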

How to change the dask scheduler and workers?

I am a newbie to dask and distributed. I want to point Dask at a scheduler running on another server instead of localhost, but I could not find how to do this on the internet.
Could you help me, please?
Thanks.
I suppose you can just pass the IP of the server into the worker constructor, like it's done in the docs: http://docs.dask.org/en/latest/setup/python-advanced.html?highlight=scheduler#worker
w = Worker('tcp://{your_ip_here}:8786')
From the user session's point of view, you connect to the remote Dask scheduler using the client:
client = dask.distributed.Client('tcp://machine.ip:port')
where you need to fill in the machine's address and port as appropriate. You should not be constructing a Worker in your session; I am assuming that your scheduler already has some workers set up to talk to.
Yes, there are also ways to put the default address in config files, including having the scheduler write it for you on start-up, but the XML file you mention is unlikely to be something directly read by Dask. On the other hand, whoever designed your system may have added its own config layer.
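For completeness, here is a minimal sketch of the client-side usage (the scheduler address and port are placeholders for your deployment):

from dask.distributed import Client
import dask.array as da

# Connect to the remote scheduler; address and port are placeholders.
client = Client('tcp://192.168.1.10:8786')

# Computations submitted through this client now run on the remote cluster.
x = da.ones((1000, 1000), chunks=(100, 100))
print(x.sum().compute())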

get process name for mongodb open connections

I need to know the process/file name behind each open MongoDB connection.
For example, assume there are files called F1, F2, ..., Fn, each using a connection pool to get a MongoDB connection and each running in parallel in a different process.
Is there any way to get the file name that holds an open connection to MongoDB?
I am on a mission to reduce the number of open MongoDB connections.
When I ran the query below,
db.serverStatus().connections
it gave me the currently consumed connection count and the available count. But I need the file names that opened the connections in order to optimize.
Stack: Python, Django, some server running in Apache, MongoDB, PyMongo
I figured out myself how to get more information about connections.
db.currentOp(true).inprog
The command above returns all current connection information as an array: client IP, whether the connection is active, the connection ID, the operation type, and so on.
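From Python, a rough equivalent is to run the currentOp command against the admin database (a sketch; it assumes the connecting user has permission to run currentOp):

import pymongo

client = pymongo.MongoClient("mongodb://localhost:27017/")
# "$all": True includes idle connections, like db.currentOp(true) in the shell.
ops = client.admin.command({"currentOp": 1, "$all": True})
for op in ops["inprog"]:
    print(op.get("client"), op.get("connectionId"), op.get("active"))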
You can get the connection details in their most rudimentary form using a quick shell command such as
ps -eAf | grep mongo
run on the host where your mongod process is running. Essentially, you can make a note of all active PIDs and take corrective action.

How to configure Pyramid to find MongoDB Primary replica

Is there a way to configure Pyramid so that when MongoDB fails over to a secondary replica, Pyramid starts using it?
Pyramid should be using the official Python MongoDB driver. The driver is configured to do this "automatically", but it needs the correct connection string.
See the MongoDB documentation on connection strings.
One thing to keep in mind, the definition of "automatic fail-over" is not clear cut.
If you create a new connection to the DB that connection will point at the current primary.
If you use an existing connection from a pool, that connection may be pointing at the wrong server. In this case it will throw an exception the first time and should connect to the correct server the second time.
However, when a fail-over happens, there is a brief window during which there is no primary (typically 2-10 seconds). If you use a connection during this period, there is no primary for it to reach.
Note that this is not specific to Python; it is the way replica sets function.
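As a minimal sketch, a replica-set connection string in pymongo looks like the following (the host names and the set name rs0 are placeholders for your deployment):

import pymongo

# List several replica-set members; the driver discovers the current
# primary automatically and re-routes writes after a fail-over.
client = pymongo.MongoClient(
    "mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0",
    serverSelectionTimeoutMS=30_000,
)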
