I need to know the process/file name behind each open MongoDB connection.
For example, assume there are files called F1, F2, ..., Fn, each running in a separate process and each using a connection pool to get a MongoDB connection.
Is there any way to find out which file holds an open connection to MongoDB?
I'm asking because I'm on a mission to reduce the number of open MongoDB connections.
When I run the query below,
db.serverStatus().connections
it gives me the count of connections currently in use and the count available. But I need the file names that opened those connections in order to optimize.
Stack: Python, Django, a server running under Apache, MongoDB, PyMongo
I figured out myself how to get more information about the connections.
db.currentOp(true).inprog
The command above returns information about all current connections as an array. You can see details such as the client IP, whether the connection is active, the connection id, the operation type, and so on.
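For reference, roughly the same information can be pulled from Python instead of the mongo shell. A minimal PyMongo sketch, assuming a local mongod you are allowed to run currentOp against (the connection string is a placeholder, and on newer server versions you may need the $currentOp aggregation stage instead):
import pymongo
# Placeholder connection string; replace with your own.
client = pymongo.MongoClient("mongodb://localhost:27017/")
# Equivalent of db.currentOp(true) in the shell.
ops = client.admin.command({"currentOp": 1, "$all": True})
for op in ops.get("inprog", []):
    # "client" is the ip:port of the connecting process, when reported.
    print(op.get("connectionId"), op.get("client"), op.get("active"))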
You can get the connection details in their most rudimentary form with a quick shell command run on the host where your mongod process lives, such as
ps -eAf | grep mongo
Essentially, you can make a note of all active PIDs and take corrective action.
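If the client processes run on the same host as mongod, you can go one step further and map an open connection back to its owning process. A rough sketch, assuming lsof is installed and that the client ip:port comes from the currentOp output above:
import subprocess
def pid_for_client_port(client_port):
    # List local processes holding established TCP connections.
    out = subprocess.run(
        ["lsof", "-nP", "-iTCP", "-sTCP:ESTABLISHED"],
        capture_output=True, text=True,
    ).stdout
    for line in out.splitlines():
        # The client's source port appears as ":<port>->" in the NAME column.
        if f":{client_port}->" in line:
            return line.split()[1]  # second column of lsof output is the PID
    return None
From the PID you can then recover the command line (and hence the script/file) with ps -p <pid> -o args.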
I am fairly new to MongoDB, and I am wondering how I can establish multiple connections to a single Mongo instance without specifying ports or making a new config file for each user. I am running the Mongo instance in a Singularity container on a remote server.
Here is my sample config file:
# mongod.conf
# for documentation of all options, see:
# https://docs.mongodb.com/manual/reference/configuration-options/
# where to write logging data for debugging and such.
systemLog:
  destination: file
  logAppend: true
  path: /path-to-log/
# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1
  maxIncomingConnections: 65536
# security
security:
  authorization: 'enabled'
Do I need to use a replica set? If so, can someone explain the concept behind a replica set?
Do I need to change my config file? If so, what changes do I need to make to allow for multiple connections?
Here is my code that I use to connect to the server (leaving out import statements for clarity):
PWD = "/path-to-singularity-container/"
os.chdir(PWD)
self.p = subprocess.Popen(f"singularity run --bind {PWD}/data:/data/db mongo.sif --auth --config {PWD}/mongod.conf", shell=True, preexec_fn=os.setpgrp)
connection_string = "mongodb://user:password@127.0.0.1:27017/"
client = pymongo.MongoClient(connection_string, serverSelectionTimeoutMS=60_000)
EDIT: I am trying to have multiple people connect to MongoDB using pymongo at the same time, given the same connection string. I am not sure how I can achieve this without giving each user a separate config file.
Thank you for your help!
You can raise the ulimit values; mongod tracks each incoming connection with a file descriptor and a thread.
You can go through the link below, which explains each ulimit parameter and its recommended value.
https://docs.mongodb.com/manual/reference/ulimit/
As for high availability: if you don't want downtime in your environment, then you need an HA solution, which means a three-node replica set that can tolerate one node being down at a time.
If the primary node goes down, an internal election takes place and another node is promoted to primary within seconds, so your application is only briefly impacted. Another benefit is that if a node crashes, you still have another copy of the data.
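As an illustration of how a client uses such a replica set, here is a minimal PyMongo sketch; the host names and the replica set name rs0 are placeholders, not values from your setup:
import pymongo
# Placeholder hosts and replica set name; replace with your own members.
client = pymongo.MongoClient(
    "mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0",
    serverSelectionTimeoutMS=60_000,
)
# The driver discovers the current primary and fails over automatically
# if an election promotes a different node.
print(client.admin.command("ping"))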
Hope this will answer your question.
No special work is required: each user simply creates a client with the same connection string and executes queries.
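A minimal sketch of what that looks like, reusing the user/password URI from the question; every process or user just constructs its own MongoClient:
import pymongo
# Each user/process runs this independently with the identical URI.
connection_string = "mongodb://user:password@127.0.0.1:27017/"
client = pymongo.MongoClient(connection_string, serverSelectionTimeoutMS=60_000)
# mongod accepts all of these connections concurrently, up to
# net.maxIncomingConnections (65536 in the config above).
print(client.admin.command("ping"))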
Preconditions:
I want to execute multiple dynamic commands via SSH from Python on one remote machine at a time
I couldn't find any existing modules matching my "flavour" (If you care why, see below (*) ;))
Python scripts are running local on a Ubuntu machine
In general, for single "one action" calls I simply do a native ssh call using subprocess.Popen and it works fine.
But for multiple subsequent dynamic calls, I don't want to create a new ssh connection for every command, even if the remote host might allow it. I thought of the following solution:
1) Configure my local ssh on Ubuntu to use multiplexing, so that as long as a connection is open, it is reused instead of creating a new one (https://www.admin-magazin.de/News/Tipps/Mit-SSH-Multiplexing-schneller-einloggen (sorry, in German))
2) Create an ssh connection by opening it in a background thread that does nothing itself, besides maybe a "keepalive" if necessary, and keep the connection open until it is closed (e.g. by stopping the thread). (http://sebastiandahlgren.se/2014/06/27/running-a-method-as-a-background-thread-in-python/)
3) Still execute the ssh calls simply via subprocess.Popen, but now they automatically reuse the open connection thanks to the ssh multiplexing config.
Should this work, or is there a fallacy alert?
(*)What I don't want:
Most solutions/examples I found used paramiko. On my first "happy path" it worked like a charm, but the first failure test resulted in an internal AttributeError (https://github.com/paramiko/paramiko/issues/1617) and I don't want to build anything on that.
Other libs I found, e.g. http://robotframework.org/SSHLibrary/SSHLibrary.html, don't seem to have a real community using them.
pexpect... the whole "expect" concept gives me the creeps and should, in my opinion, only be used if there's absolutely no other reasonable option ;)
What you've proposed is fine, but you don't even need to keep an ssh connection running in a background thread. If you configure ControlMaster (for reusing an existing connection) and ControlPersist (for keeping the master connection open even after all other connections have closed), then new ssh connections will keep using the shared connection (as long as they happen before the ControlPersist timeout).
This means that if you set up the ControlMaster configuration external to your code (e.g., in ~/.ssh/ssh_config), your code doesn't even need to be aware of the configuration: it can just continue to call ssh normally, and ssh will take care of reusing the connection.
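If you prefer to keep everything inside your script rather than touching ~/.ssh/ssh_config, you can pass the same options on each call. A rough sketch; the control path, 10-minute persistence, host, and commands are all placeholders:
import subprocess
# Equivalent of a ControlMaster/ControlPersist block in ssh_config.
SSH_OPTS = [
    "-o", "ControlMaster=auto",
    "-o", "ControlPath=/tmp/ssh-cm-%r@%h:%p",
    "-o", "ControlPersist=600",
]
def run_remote(host, command):
    # Every call reuses the same underlying ssh connection as long as
    # the master connection is still alive.
    return subprocess.run(["ssh", *SSH_OPTS, host, command],
                          capture_output=True, text=True)
print(run_remote("user@remote-host", "uptime").stdout)
print(run_remote("user@remote-host", "hostname").stdout)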
I am using SolrClient for python with Solr 6.6.2. It works as expected but I cannot find anything in the documentation for closing the connection after opening it.
from SolrClient import SolrClient

def getdocbyid(docidlist):
    for id in docidlist:
        solr = SolrClient('http://localhost:8983/solr', auth=("solradmin", "Admin098"))
        doc = solr.get('Collection_Test', doc_id=id)
        print(doc)
I do not know if the client closes it automatically or not. If it doesn't, wouldn't it be a problem if several connections are left open? I just want to know if there is any way to close the connection. Here is the link to the documentation:
https://solrclient.readthedocs.io/en/latest/
The connections are not kept around indefinitely. The standard timeout for any persistent http connection in Jetty is five seconds as far as I remember, so you do not have to worry about the number of connections being kept alive exploding.
The Jetty server will also just drop the connection if required, as it's not required to keep it around as a guarantee for the client. SolrClient uses a requests session internally, so it should reuse the underlying connection for subsequent queries. If you run into issues with this, you can keep a set of clients available as a pool in your application instead, then request an available client instead of creating a new one each time.
I'm however pretty sure you won't run into any issues with the default settings.
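A minimal sketch of that idea, reusing the URL and credentials from the question: create the client once and share it across lookups instead of constructing one per document.
from SolrClient import SolrClient
# One client (and one underlying requests session) reused for all ids.
solr = SolrClient('http://localhost:8983/solr', auth=("solradmin", "Admin098"))
def getdocbyid(docidlist):
    for doc_id in docidlist:
        doc = solr.get('Collection_Test', doc_id=doc_id)
        print(doc)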
I am writing a script in Python to listen to the Twitter streaming API, track specific keywords, and insert the matching tweets into a MySQL database using MySQLdb. I am not sure which way to choose:
For each incoming tweet, open a db connection, insert into the db, then close the connection.
Open a db connection once, execute the insert command for each incoming tweet, and never close the connection.
I think the script will receive 1-10 tweets per second.
It kind of depends on how your script is supposed to be run, but it should close the connection at some point - at least when the process dies. Assuming it's a long-running process (daemon etc.), the simplest strategy would be to use a "with" block to ensure the connection is closed, i.e.:
with MySQLdb.connect(**kw) as db:
    while some_condition():
        do_stuff_with(db)
but you'll probably need something a bit more involved, since MySQL tends to close idle connections by itself.
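For example, one common "more involved" pattern is to ping (and reconnect if needed) before each write. A rough sketch; the table and column names are made up, and ping(True)'s reconnect behaviour can vary between MySQLdb/mysqlclient versions:
import MySQLdb
db = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="tweets")
def insert_tweet(text):
    # Reconnect transparently if MySQL has dropped the idle connection.
    db.ping(True)
    cur = db.cursor()
    cur.execute("INSERT INTO tweets (body) VALUES (%s)", (text,))
    db.commit()
    cur.close()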
We are running an application in Amazon Elastic Beanstalk using their 64bit Python container. This application spawns threads, allows them to live for a certain amount of time and then closes them before iterating through the same pattern for an arbitrary period of time.
Each of these threads then creates a few files on the Unix system - a logfile created using the logging module with a FileHandler, along with various connections to SQS, EC2, CloudWatch, Autoscale and S3, all made using the boto module. These connections create TCP file handles that can be identified in the output of:
lsof -p {process-id}
When a thread finishes, we remove the FileHandler and close the logger. We also explicitly close every connection that has been made using boto. In any cases where it's possible, we create the connections or files using the with syntax so that any resources can (hopefully) be disposed of afterwards.
However what we are discovering is that there are TCP requests still lingering as open files on the system after the threads have been terminated - in the CLOSE_WAIT state. This is not immediately a problem but eventually the number of open files on the system exceeds the limit set in /etc/security/limits.conf and the Python script stops executing as a result of it.
Currently we are covering ourselves by intermittently attaching GDB and instructing it to close any handles we've identified as stale, but this solution lacks elegance and ignores the real issue, which is that these open TCP files continue to linger.
Is there a pattern I'm missing here outside of the options offered to me to close() a connection?
I discovered the same problem with sockets in the CLOSE_WAIT state with boto 2.3.0. The reason is that the connection is set up as follows:
import boto.ec2
conn = boto.ec2.connect_to_region("eu-west-1")
It first opens a connection to discover all regions, and then opens a second connection to the given region, but the first connection is never closed. One can set up an EC2 connection manually through boto.connect_ec2_endpoint(...) or use boto >= 2.7.0, where a static region list is used.