I want to write a function that returns all the documents contained in mycollection in MongoDB.
from pymongo import MongoClient

if __name__ == '__main__':
    client = MongoClient("localhost", 27017, maxPoolSize=50)
    db = client.mydatabase
    collection = db['mycollection']
    cursor = collection.find({})
    for document in cursor:
        print(document)
However, the script just prints: Process finished with exit code 0
Here is the sample code which works fine when you run from command prompt.
from pymongo import MongoClient

if __name__ == '__main__':
    client = MongoClient("localhost", 27017, maxPoolSize=50)
    db = client.localhost
    collection = db['chain']
    cursor = collection.find({})
    for document in cursor:
        print(document)
Please check the collection name.
pymongo's find() returns a cursor, so you only see the documents 'under' the cursor as you iterate over it. To get all the documents at once, try:
list(db.collection.find({}))
This will force the cursor to iterate over each document and put them into a list().
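For instance, a minimal sketch of a helper function built around that idea (the host, database, and collection names here are just the ones from the question):

from pymongo import MongoClient

def get_all_documents(db_name='mydatabase', coll_name='mycollection'):
    # Connect and return every document in the collection as a list.
    client = MongoClient("localhost", 27017)
    return list(client[db_name][coll_name].find({}))

print(get_all_documents())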
Have fun...
I think this will work fine in your program.
cursor = db.mycollection  # choosing the collection you need
for document in cursor.find():
    print(document)
It works fine for me; try checking the exact database name and collection name.
Also try changing db = client.mydatabase to db = client['mydatabase'].
If your database name is such that using attribute style access won’t work (like test-database), you can use dictionary style access instead.
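For example (a short sketch; the test-database name is just the illustrative one from the docs):

from pymongo import MongoClient

client = MongoClient("localhost", 27017)
# Attribute-style access (client.test-database) is not valid Python,
# so use dictionary-style access instead.
db = client['test-database']
collection = db['mycollection']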
I am trying to find a document in the database by id, but I get None. What am I doing wrong?
python:
card = mongo.db['grl'].find_one({'id': 448510476})
or:
card = mongo.db['grl'].find_one({'id': '448510476'})
document:
{"_id":{"$oid":"5f25b1d787fc4c34a7d9aabe"},
"id":{"$numberInt":"448510476"},"first_name":"Arc","last_name":"Fl"}
I'm not sure how you are initializing your database, but try this:
from pymongo import MongoClient

client = MongoClient("mongodb://127.0.0.1:27017")
db = client.database  # Selecting the database named "database"

# Find one document in the collection named "collection".
card = db.collection.find_one({"id": "448510476"})
print(card)
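One note, based on the sample document in the question (where id is stored as $numberInt): MongoDB will not match a numeric field against a string, so querying with an integer may be what actually finds the document:

# "id" is stored as a number in the sample document, so query with an int;
# the string "448510476" would not match a numeric field.
card = db.collection.find_one({"id": 448510476})
print(card)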
I would like to stream new entries from my DB, but I keep getting an error.
I took the code from the official MongoDB docs and just adapted it for my database, but I keep getting this error:
File "C:\Users\Davide\lib\site-packages\pymongo\cursor.py", line 1197, in next
raise StopIteration
StopIteration
Where could the problem be?
import time
from pymongo import MongoClient
import pymongo

client = MongoClient(port=27017)
oplog = client.local.oplog.rs
first = oplog.find().sort('$natural', pymongo.ASCENDING).limit(-1).next()
print(first)
ts = first['ts']

while True:
    # For a regular capped collection CursorType.TAILABLE_AWAIT is the
    # only option required to create a tailable cursor. When querying the
    # oplog the oplog_replay option enables an optimization to quickly
    # find the 'ts' value we're looking for. The oplog_replay option
    # can only be used when querying the oplog.
    cursor = oplog.find({'ts': {'$gt': ts}},
                        cursor_type=pymongo.CursorType.TAILABLE_AWAIT,
                        oplog_replay=True)
    while cursor.alive:
        for doc in cursor:
            ts = doc['ts']
            print(doc)
        # We end up here if the find() returned no documents or if the
        # tailable cursor timed out (no new documents were added to the
        # collection for more than 1 second).
        time.sleep(1)
I have the following:
from pymongo import MongoClient

client = MongoClient()
db = client.localhost
collection = db['accounts']
db.collection.remove({})

cursor = collection.find({})
for document in cursor:
    print(document)
This second part is to just print all the documents in the collection. However, the collection isn't clearing every time I rerun the program. Does anyone know why?
Just do this:
db.accounts.drop()
instead of
db.collection.remove({})
Or you can try this:
collection.delete_many({})
Hope that solves your problem.
Instead of doing this
db.collection.remove({})
do this
db.accounts.remove({})
Also, you won't need the line collection = db['accounts'] if you use db.accounts everywhere.
If you want dynamic collection name, you can do the following:
collection_name = 'accounts'
getattr(db, collection_name).remove({})
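Equivalently, dictionary-style access works too; a small sketch (delete_many is the newer PyMongo replacement for remove):

collection_name = 'accounts'
# Dictionary-style lookup handles any collection name, including ones
# that are not valid Python attribute names.
db[collection_name].delete_many({})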
I have a Pyramid / SQLAlchemy, MySQL python app.
When I execute a raw SQL INSERT query, nothing gets written to the DB.
When using ORM, however, I can write to the DB. I read the docs, I read up about the ZopeTransactionExtension, read a good deal of SO questions, all to no avail.
What hasn't worked so far:
transaction.commit() - nothing is written to the DB. I do realize this statement is necessary with ZopeTransactionExtension but it just doesn't do the magic here.
dbsession().commit() - doesn't work since I'm using ZopeTransactionExtension
dbsession().close() - nothing written
dbsession().flush() - nothing written
mark_changed(session) -
File "/home/dev/.virtualenvs/sc/local/lib/python2.7/site-packages/zope/sqlalchemy/datamanager.py", line 198, in join_transaction
if session.twophase:
AttributeError: 'scoped_session' object has no attribute 'twophase'"
What has worked but is not acceptable because it doesn't use scoped_session:
engine.execute(...)
I'm looking for how to execute raw SQL with a scoped_session (dbsession() in my code)
Here is my SQLAlchemy setup (models/__init__.py)
def dbsession():
    assert (_dbsession is not None)
    return _dbsession

def init_engines(settings, _testing_workarounds=False):
    import zope.sqlalchemy
    extension = zope.sqlalchemy.ZopeTransactionExtension()
    global _dbsession
    _dbsession = scoped_session(
        sessionmaker(
            autoflush=True,
            expire_on_commit=False,
            extension=extension,
        )
    )
    engine = engine_from_config(settings, 'sqlalchemy.')
    _dbsession.configure(bind=engine)
Here is a python script I wrote to isolate the problem. It resembles the real-world environment of where the problem occurs. All I want is to make the below script insert the data into the DB:
# -*- coding: utf-8 -*-
import sys
import transaction
from pyramid.paster import setup_logging, get_appsettings
from sc.models import init_engines, dbsession
from sqlalchemy.sql.expression import text

def __main__():
    if len(sys.argv) < 2:
        raise RuntimeError()
    config_uri = sys.argv[1]
    setup_logging(config_uri)
    aa = init_engines(get_appsettings(config_uri))
    session = dbsession()
    session.execute(text("""INSERT INTO
        operations (description, generated_description)
        VALUES ('hello2', 'world');"""))
    print list(session.execute("""SELECT * from operations""").fetchall())  # prints inserted data
    transaction.commit()
    print list(session.execute("""SELECT * from operations""").fetchall())  # doesn't print inserted data

if __name__ == '__main__':
    __main__()
What is interesting is that if I do:
session = dbsession()
session.execute(text("""INSERT INTO
    operations (description, generated_description)
    VALUES ('hello2', 'world');"""))
op = Operation(generated_description='aa', description='oo')
session.add(op)
then the first print outputs the raw SQL inserted row ('hello2' 'world'), and the second print prints both rows, and in fact both rows are inserted into the DB.
I cannot comprehend why using an ORM insert alongside raw SQL "fixes" it.
I really need to be able to call execute() on a scoped_session to insert data into the DB using raw SQL. Any advice?
It has been a while since I mixed raw SQL with SQLAlchemy, but whenever you mix them, you need to be aware of what happens behind the scenes with the ORM. First, check the autocommit flag. If the Zope transaction is not configured correctly, the ORM insert might be triggering a commit.
Actually, after looking at the zope docs, it seems manual execute statements need an extra step. From their readme:
By default, zope.sqlalchemy puts sessions in an 'active' state when they are
first used. ORM write operations automatically move the session into a
'changed' state. This avoids unnecessary database commits. Sometimes it
is necessary to interact with the database directly through SQL. It is not
possible to guess whether such an operation is a read or a write. Therefore we
must manually mark the session as changed when manual SQL statements write
to the DB.
>>> session = Session()
>>> conn = session.connection()
>>> users = Base.metadata.tables['test_users']
>>> conn.execute(users.update(users.c.name=='bob'), name='ben')
<sqlalchemy.engine...ResultProxy object at ...>
>>> from zope.sqlalchemy import mark_changed
>>> mark_changed(session)
>>> transaction.commit()
>>> session = Session()
>>> str(session.query(User).all()[0].name)
'ben'
>>> transaction.abort()
It seems you aren't doing that, and so the transaction.commit does nothing.
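Applied to your script, that would look roughly like the sketch below (an assumption based on your setup: dbsession() returns the scoped_session registry, and mark_changed() wants the actual Session instance, which you get by calling the registry):

import transaction
from sqlalchemy.sql.expression import text
from zope.sqlalchemy import mark_changed
from sc.models import dbsession

session = dbsession()
session.execute(text("""INSERT INTO
    operations (description, generated_description)
    VALUES ('hello2', 'world');"""))

# dbsession() is the scoped_session registry; calling it returns the
# underlying Session object, which is what mark_changed() expects.
mark_changed(session())

# The zope transaction now knows the session is dirty and will commit it.
transaction.commit()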
I have multiple Python scripts writing to MongoDB using PyMongo. How can another Python script observe changes to a Mongo query and perform some function when the change occurs? MongoDB is set up with the oplog enabled.
I wrote an incremental backup tool for MongoDB some time ago, in Python. The tool monitors data changes by tailing the oplog. Here is the relevant part of the code.
Updated answer, MongoDB 3.6+
As datdinhquoc cleverly points out in the comments below, for MongoDB 3.6 and up there are Change Streams.
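A minimal sketch of that approach (the database and collection names are placeholders; watch() needs MongoDB 3.6+, PyMongo 3.6+, and a replica set or sharded cluster):

from pymongo import MongoClient

client = MongoClient()
# watch() returns a change stream that yields one event per write.
with client.mydatabase.mycollection.watch() as stream:
    for change in stream:
        print(change)  # Do something with each change event.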
Updated answer, pymongo 3
from time import sleep
from pymongo import MongoClient, ASCENDING
from pymongo.cursor import CursorType
from pymongo.errors import AutoReconnect

# Time to wait for data or connection.
_SLEEP = 1.0

if __name__ == '__main__':
    oplog = MongoClient().local.oplog.rs
    stamp = oplog.find().sort('$natural', ASCENDING).limit(-1).next()['ts']

    while True:
        kw = {}
        kw['filter'] = {'ts': {'$gt': stamp}}
        kw['cursor_type'] = CursorType.TAILABLE_AWAIT
        kw['oplog_replay'] = True

        cursor = oplog.find(**kw)
        try:
            while cursor.alive:
                for doc in cursor:
                    stamp = doc['ts']
                    print(doc)  # Do something with doc.
                sleep(_SLEEP)
        except AutoReconnect:
            sleep(_SLEEP)
Also see http://api.mongodb.com/python/current/examples/tailable.html.
Original answer, pymongo 2
from time import sleep
from pymongo import MongoClient
from pymongo.cursor import _QUERY_OPTIONS
from pymongo.errors import AutoReconnect
from bson.timestamp import Timestamp

# Tailable cursor options.
_TAIL_OPTS = {'tailable': True, 'await_data': True}

# Time to wait for data or connection.
_SLEEP = 10

if __name__ == '__main__':
    db = MongoClient().local

    while True:
        query = {'ts': {'$gt': Timestamp(some_timestamp, 0)}}  # Replace with your query.
        cursor = db.oplog.rs.find(query, **_TAIL_OPTS)
        cursor.add_option(_QUERY_OPTIONS['oplog_replay'])
        try:
            while cursor.alive:
                try:
                    doc = next(cursor)
                    # Do something with doc.
                except (AutoReconnect, StopIteration):
                    sleep(_SLEEP)
        finally:
            cursor.close()
I ran into this issue today and haven't found an updated answer anywhere.
The Cursor class has changed as of v3.0 and no longer accepts the tailable and await_data arguments. This example will tail the oplog and print the oplog record when it finds a record newer than the last one it found.
# Adapted from the example here: https://jira.mongodb.org/browse/PYTHON-735
# to work with pymongo 3.0
import pymongo
from pymongo.cursor import CursorType

c = pymongo.MongoClient()

# Use this for master/slave.
oplog = c.local.oplog['$main']
# Uncomment this for replica sets.
#oplog = c.local.oplog.rs

first = next(oplog.find().sort('$natural', pymongo.DESCENDING).limit(-1))
ts = first['ts']

while True:
    cursor = oplog.find({'ts': {'$gt': ts}}, cursor_type=CursorType.TAILABLE_AWAIT, oplog_replay=True)
    while cursor.alive:
        for doc in cursor:
            ts = doc['ts']
            print(doc)
            # Work with doc here
Query the oplog with a tailable cursor.
It is actually funny, because oplog-monitoring is exactly what the tailable-cursor feature was added for originally. I find it extremely useful for other things as well (e.g. implementing a mongodb-based pubsub, see this post for example), but that was the original purpose.
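For illustration, a rough sketch of that pubsub idea with a capped collection and a tailable cursor (the database name, collection name, and size are placeholders; assumes PyMongo 3.6+ for list_collection_names):

import time
from pymongo import MongoClient
from pymongo.cursor import CursorType

db = MongoClient().mydatabase

# Tailable cursors only work on capped collections.
if 'messages' not in db.list_collection_names():
    db.create_collection('messages', capped=True, size=1024 * 1024)

while True:
    cursor = db.messages.find({}, cursor_type=CursorType.TAILABLE_AWAIT)
    while cursor.alive:
        for doc in cursor:
            print(doc)  # React to each newly published message.
        time.sleep(1)  # Wait a bit before polling for new messages.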
I had the same issue. I put together this rescommunes/oplog.py. Check comments and see __main__ for an example of how you could use it with your script.