I have multiple Python scripts writing to MongoDB using PyMongo. How can another Python script observe changes to a MongoDB query and perform some function when a change occurs? MongoDB is set up with the oplog enabled.
I wrote an incremental backup tool for MongoDB some time ago, in Python. The tool monitors data changes by tailing the oplog. Here is the relevant part of the code.
Updated answer, MongoDB 3.6+
As datdinhquoc cleverly points out in the comments below, for MongoDB 3.6 and up there are Change Streams.
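For completeness, here is a minimal sketch of what a change stream listener looks like with PyMongo 3.6+ against a replica set; the database and collection names are placeholders:
from pymongo import MongoClient

client = MongoClient()
collection = client.my_database.my_collection  # placeholder names

# watch() returns a blocking iterator of change events; the optional pipeline
# below filters the stream down to inserts only.
with collection.watch([{'$match': {'operationType': 'insert'}}]) as stream:
    for change in stream:
        print(change['fullDocument'])  # Do something with the inserted document.
Unlike oplog tailing, this does not require reading the local database directly.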
Updated answer, pymongo 3
from time import sleep
from pymongo import MongoClient, ASCENDING
from pymongo.cursor import CursorType
from pymongo.errors import AutoReconnect
# Time to wait for data or connection.
_SLEEP = 1.0
if __name__ == '__main__':
    oplog = MongoClient().local.oplog.rs
    stamp = oplog.find().sort('$natural', ASCENDING).limit(-1).next()['ts']

    while True:
        kw = {}
        kw['filter'] = {'ts': {'$gt': stamp}}
        kw['cursor_type'] = CursorType.TAILABLE_AWAIT
        kw['oplog_replay'] = True

        cursor = oplog.find(**kw)
        try:
            while cursor.alive:
                for doc in cursor:
                    stamp = doc['ts']
                    print(doc)  # Do something with doc.
                sleep(_SLEEP)
        except AutoReconnect:
            sleep(_SLEEP)
Also see http://api.mongodb.com/python/current/examples/tailable.html.
Original answer, pymongo 2
from time import sleep
from pymongo import MongoClient
from pymongo.cursor import _QUERY_OPTIONS
from pymongo.errors import AutoReconnect
from bson.timestamp import Timestamp
# Tailable cursor options.
_TAIL_OPTS = {'tailable': True, 'await_data': True}
# Time to wait for data or connection.
_SLEEP = 10
if __name__ == '__main__':
    db = MongoClient().local

    while True:
        query = {'ts': {'$gt': Timestamp(some_timestamp, 0)}}  # Replace with your query.

        cursor = db.oplog.rs.find(query, **_TAIL_OPTS)
        cursor.add_option(_QUERY_OPTIONS['oplog_replay'])
        try:
            while cursor.alive:
                try:
                    doc = next(cursor)
                    # Do something with doc.
                except (AutoReconnect, StopIteration):
                    sleep(_SLEEP)
        finally:
            cursor.close()
I ran into this issue today and haven't found an updated answer anywhere.
The Cursor class changed as of PyMongo 3.0 and no longer accepts the tailable and await_data arguments. This example tails the oplog and prints each oplog record it finds that is newer than the last one it saw.
# Adapted from the example here: https://jira.mongodb.org/browse/PYTHON-735
# to work with pymongo 3.0
import pymongo
from pymongo.cursor import CursorType

c = pymongo.MongoClient()

# For a master/slave setup, tail the '$main' oplog:
oplog = c.local.oplog['$main']
# For a replica set, tail oplog.rs instead:
#oplog = c.local.oplog.rs

first = next(oplog.find().sort('$natural', pymongo.DESCENDING).limit(-1))
ts = first['ts']

while True:
    cursor = oplog.find({'ts': {'$gt': ts}},
                        cursor_type=CursorType.TAILABLE_AWAIT,
                        oplog_replay=True)
    while cursor.alive:
        for doc in cursor:
            ts = doc['ts']
            print(doc)
            # Work with doc here.
Query the oplog with a tailable cursor.
It is actually funny, because oplog monitoring is exactly what the tailable-cursor feature was originally added for. I find it extremely useful for other things as well (e.g. implementing a MongoDB-based pub/sub; see this post for an example), but that was the original purpose.
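For illustration, here is a minimal sketch of tailing an ordinary capped collection the same way; the 'events' collection and its size are made up for this example, and the cursor dies immediately if the collection is empty:
import time
from pymongo import MongoClient
from pymongo.cursor import CursorType
from pymongo.errors import CollectionInvalid

db = MongoClient().test
try:
    db.create_collection('events', capped=True, size=2 ** 20)  # hypothetical capped collection
except CollectionInvalid:
    pass  # the collection already exists

cursor = db.events.find(cursor_type=CursorType.TAILABLE_AWAIT)
while cursor.alive:
    for doc in cursor:
        print(doc)  # React to each document as it is appended.
    time.sleep(1)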
I had the same issue. I put together this rescommunes/oplog.py. Check comments and see __main__ for an example of how you could use it with your script.
Related
I am setting up a new computer at work, and after installing anaconda and other various packages I have on my other computer, I am attempting to run some code that works fine on my other computer.
However, when trying to use SQLalchemy to import into redshift, I am getting a new error that I can't find anything on via google:
'SQLTable' object has no attribute 'insert_statement'
This appears to be some issue with pandas.io.sql, but I have no clue what.
Here is the code block:
import io
import time
from pandas.io.sql import SQLTable

def _execute_insert(self, conn, keys, data_iter):
    print("Using monkey-patched _execute_insert")
    data = [dict((k, v) for k, v in zip(keys, row)) for row in data_iter]
    conn.execute(self.insert_statement().values(data))

SQLTable._execute_insert = _execute_insert

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy import text

dbschema = 'xref'
engine = create_engine('not_showing_you_this_part',
                       connect_args={'options': '-csearch_path={}'.format(dbschema)})

# test
from sqlalchemy import event, create_engine

#event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        cursor.fast_executemany = True
        cursor.commit()
# end test

api_start_time = time.time()
print('starting SQL query')

# change df to the dataframe you want to upload
# under name=, enter the name of the table you want to create or append to
df.to_sql(name='computer_test', con=engine, if_exists='append', index=False)
print('sql insert took: ' + str(time.time() - api_start_time) + ' seconds')
For reference, the monkey-patch part is from:
How to speed up insertion from pandas.DataFrame .to_sql
The full error is in the attached image.
I kept searching for the answer, only to see that a gentleman had already answered your question in the comment section.
I was using very similar code for connecting and inserting into Redshift.
The mistake I was making was to use the line below:
conn.execute(self.insert_statement().values(data))
Replace the above with the code below:
conn.execute(self.table.insert().values(data))
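Put together, the corrected monkey-patch would look roughly like this (a sketch based on your snippet, assuming a pandas version where SQLTable exposes self.table rather than insert_statement()):
from pandas.io.sql import SQLTable

def _execute_insert(self, conn, keys, data_iter):
    # Build one multi-row INSERT from the iterator of row tuples.
    data = [dict(zip(keys, row)) for row in data_iter]
    conn.execute(self.table.insert().values(data))

SQLTable._execute_insert = _execute_insert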
Shoutout to https://stackoverflow.com/users/6560549/supershoot for answering it in the comments.
I am attempting to use SQL Server 2017 FILESTREAM from Python. All of the functionality I use goes through SQLAlchemy, so I am trying to find a way of doing this with it, since I haven't found any implementation within SQLAlchemy or other libraries (I may have missed something; if so, please point me to a working and tested implementation).
I have decided to approach this using the DLL, based on https://github.com/VisionMark/django-mssql-filestream/blob/master/sql_filestream/win32_streaming_api.py . However, my call to OpenSqlFilestream fails and returns -1 instead of a file handle. I have no idea what the issue is or how to fix it.
from ctypes import c_char, sizeof, windll
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
import msvcrt
import os

msodbcsql = windll.LoadLibrary(r"C:\Windows\System32\msodbcsql17.dll")
engine = create_engine("mssql+pyodbc://user:pass#test/test?TrustedConnection=yes+driver=ODBC Driver+17+for+SQL+Server")
maker = sessionmaker(bind=engine)
session = maker()

## The first query should begin the transaction.
path = session.execute("SELECT file_stream.PathName() FROM test_filetable").fetchall()[0][0]
## This returns a str like "\\\\test\\*".
context = session.execute("SELECT GET_FILESTREAM_TRANSACTION_CONTEXT()").fetchall()[0][0]
## Returns bytes.
_context = (c_char*len(context)).from_buffer_copy(context)

## This call fails.
handle = msodbcsql.OpenSqlFilestream(
    path,              # FilestreamPath
    0,                 # DesiredAccess
    0,                 # OpenOptions
    _context,          # FilestreamTransactionContext
    sizeof(_context),  # FilestreamTransactionContextLength
    0                  # AllocationSize
)
## This returns -1 instead of a handle.

## Never reached, but this should create a usable file object.
desc = msvcrt.open_osfhandle(handle, os.O_RDONLY)
_file = os.fdopen(desc, 'r')
All of the queries work and output (as far as I understand) correct data.
How do I obtain FILESTREAM access to a file on SQL Server 2017 from Python (3.7)?
Edit: The objects I read can be gigabytes in size, and the process only needs stream access.
My guess is that your issue is related to
the fact that a SQLAlchemy Session is much more than just a raw DB API Connection, and/or
the transaction context is not appropriate for your invocation of OpenSqlFilestream
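To illustrate the second point, here is an untested sketch of how both queries could be run inside one explicit transaction on the engine from your code (assuming SQLAlchemy 1.x, where plain string execution is still allowed); OpenSqlFilestream has to be called while that transaction is still open:
with engine.begin() as conn:  # one transaction kept open for both statements
    path = conn.execute(
        "SELECT file_stream.PathName() FROM test_filetable").scalar()
    context = conn.execute(
        "SELECT GET_FILESTREAM_TRANSACTION_CONTEXT()").scalar()
    # ... call OpenSqlFilestream here, while the transaction is still open ...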
For what it's worth, the following works for me with CPython 3.7.2 and pythonnet 2.4.0:
import clr
clr.AddReference("System.Data")
from System.Data import IsolationLevel
from System.Data.SqlClient import SqlCommand, SqlConnection
from System.Data.SqlTypes import SqlFileStream
from System.IO import File, FileAccess, FileOptions
# adapted from c# code at
# https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql/filestream-data
connection_string = r"Data Source=(local)\SQLEXPRESS;Initial Catalog=myDB;Integrated Security=True"
con = SqlConnection(connection_string)
con.Open()
sql = """\
SELECT Photo.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()
FROM employees WHERE EmployeeID = 1"""
cmd = SqlCommand(sql, con)
tran = con.BeginTransaction(IsolationLevel.ReadCommitted)
cmd.Transaction = tran
rdr = cmd.ExecuteReader()
rdr.Read()
path = rdr.GetString(0)
transaction_context = rdr.GetSqlBytes(1).Buffer
rdr.Close()
allocation_size = 0
input_stream = SqlFileStream(path, transaction_context,
                             FileAccess.Read, FileOptions.SequentialScan, allocation_size)
output_stream = File.Create(r"C:\Users\Gord\Desktop\photo.bmp")
input_stream.CopyTo(output_stream)
output_stream.Close()
input_stream.Close()
tran.Commit()
con.Close()
I would like to stream new entries from my DB. I took the code from the official MongoDB docs and adapted it for my database, but I keep getting this error:
File "C:\Users\Davide\lib\site-packages\pymongo\cursor.py", line 1197, in next
raise StopIteration
StopIteration
Where could the problem be?
import time
from pymongo import MongoClient
import pymongo

client = MongoClient(port=27017)
oplog = client.local.oplog.rs

first = oplog.find().sort('$natural', pymongo.ASCENDING).limit(-1).next()
print(first)
ts = first['ts']

while True:
    # For a regular capped collection CursorType.TAILABLE_AWAIT is the
    # only option required to create a tailable cursor. When querying the
    # oplog the oplog_replay option enables an optimization to quickly
    # find the 'ts' value we're looking for. The oplog_replay option
    # can only be used when querying the oplog.
    cursor = oplog.find({'ts': {'$gt': ts}},
                        cursor_type=pymongo.CursorType.TAILABLE_AWAIT,
                        oplog_replay=True)
    while cursor.alive:
        for doc in cursor:
            ts = doc['ts']
            print(doc)
        # We end up here if the find() returned no documents or if the
        # tailable cursor timed out (no new documents were added to the
        # collection for more than 1 second).
        time.sleep(1)
I want to write a function that returns all the documents contained in mycollection in MongoDB:
from pymongo import MongoClient

if __name__ == '__main__':
    client = MongoClient("localhost", 27017, maxPoolSize=50)
    db = client.mydatabase
    collection = db['mycollection']
    cursor = collection.find({})
    for document in cursor:
        print(document)
However, the script prints nothing and just ends with: Process finished with exit code 0.
Here is sample code which works fine when you run it from the command prompt.
from pymongo import MongoClient

if __name__ == '__main__':
    client = MongoClient("localhost", 27017, maxPoolSize=50)
    db = client.localhost
    collection = db['chain']
    cursor = collection.find({})
    for document in cursor:
        print(document)
Please check the collection name.
pymongo's find() returns a cursor, so you only get the document the cursor is currently positioned on. To get all documents at once, try:
list(db.collection.find({}))
This forces the cursor to iterate over every document and collect them into a list().
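For instance, reusing the database and collection names from the question (a quick sketch):
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
documents = list(client.mydatabase['mycollection'].find({}))  # materialize the cursor
print(len(documents), "documents found")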
Have fun...
I think this will work fine in your program.
cursor = db.mycollection  # choosing the collection you need
for document in cursor.find():
    print(document)
It works fine for me; try checking the exact database name and collection name.
Also try changing db=client.mydatabase to db=client['mydatabase'].
If your database name is such that using attribute style access won’t work (like test-database), you can use dictionary style access instead.
Source!
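For example (the names here are just placeholders):
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
db = client["test-database"]        # client.test-database would not be valid attribute access
collection = db["my-collection"]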
I have a Pyramid / SQLAlchemy / MySQL Python app.
When I execute a raw SQL INSERT query, nothing gets written to the DB.
When using the ORM, however, I can write to the DB. I have read the docs, read up about the ZopeTransactionExtension, and read a good deal of SO questions, all to no avail.
What hasn't worked so far:
transaction.commit() - nothing is written to the DB. I do realize this statement is necessary with ZopeTransactionExtension, but it just doesn't do the magic here.
dbsession().commit() - doesn't work since I'm using ZopeTransactionExtension
dbsession().close() - nothing written
dbsession().flush() - nothing written
mark_changed(session) -
File "/home/dev/.virtualenvs/sc/local/lib/python2.7/site-packages/zope/sqlalchemy/datamanager.py", line 198, in join_transaction
if session.twophase:
AttributeError: 'scoped_session' object has no attribute 'twophase'"
What has worked but is not acceptable because it doesn't use scoped_session:
engine.execute(...)
I'm looking for a way to execute raw SQL with a scoped_session (dbsession() in my code).
Here is my SQLAlchemy setup (models/__init__.py):
def dbsession():
    assert (_dbsession is not None)
    return _dbsession

def init_engines(settings, _testing_workarounds=False):
    import zope.sqlalchemy
    extension = zope.sqlalchemy.ZopeTransactionExtension()

    global _dbsession
    _dbsession = scoped_session(
        sessionmaker(
            autoflush=True,
            expire_on_commit=False,
            extension=extension,
        )
    )

    engine = engine_from_config(settings, 'sqlalchemy.')
    _dbsession.configure(bind=engine)
Here is a Python script I wrote to isolate the problem. It resembles the real-world environment where the problem occurs. All I want is for the script below to insert the data into the DB:
# -*- coding: utf-8 -*-
import sys
import transaction
from pyramid.paster import setup_logging, get_appsettings
from sc.models import init_engines, dbsession
from sqlalchemy.sql.expression import text

def __main__():
    if len(sys.argv) < 2:
        raise RuntimeError()
    config_uri = sys.argv[1]
    setup_logging(config_uri)
    aa = init_engines(get_appsettings(config_uri))

    session = dbsession()
    session.execute(text("""INSERT INTO
        operations (description, generated_description)
        VALUES ('hello2', 'world');"""))
    print list(session.execute("""SELECT * from operations""").fetchall())  # prints inserted data

    transaction.commit()
    print list(session.execute("""SELECT * from operations""").fetchall())  # doesn't print inserted data

if __name__ == '__main__':
    __main__()
What is interesting is that if I do:
session = dbsession()
session.execute(text("""INSERT INTO
operations (description, generated_description)
VALUES ('hello2', 'world');"""))
op = Operation(generated_description='aa', description='oo')
session.add(op)
then the first print outputs the raw SQL inserted row ('hello2' 'world'), and the second print prints both rows, and in fact both rows are inserted into the DB.
I cannot comprehend why using an ORM insert alongside raw SQL "fixes" it.
I really need to be able to call execute() on a scoped_session to insert data into the DB using raw SQL. Any advice?
It has been a while since I mixed raw SQL with SQLAlchemy, but whenever you mix them, you need to be aware of what happens behind the scenes with the ORM. First, check the autocommit flag: if the zope transaction is not configured correctly, the ORM insert might be triggering a commit.
Actually, after looking at the zope.sqlalchemy docs, it seems manual execute statements need an extra step. From their README:
By default, zope.sqlalchemy puts sessions in an 'active' state when they are
first used. ORM write operations automatically move the session into a
'changed' state. This avoids unnecessary database commits. Sometimes it
is necessary to interact with the database directly through SQL. It is not
possible to guess whether such an operation is a read or a write. Therefore we
must manually mark the session as changed when manual SQL statements write
to the DB.
>>> session = Session()
>>> conn = session.connection()
>>> users = Base.metadata.tables['test_users']
>>> conn.execute(users.update(users.c.name=='bob'), name='ben')
<sqlalchemy.engine...ResultProxy object at ...>
>>> from zope.sqlalchemy import mark_changed
>>> mark_changed(session)
>>> transaction.commit()
>>> session = Session()
>>> str(session.query(User).all()[0].name)
'ben'
>>> transaction.abort()
It seems you aren't doing that, so transaction.commit() does nothing.
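For what it's worth, here is an untested sketch of how that could look in your isolated script. Note that mark_changed() wants an actual Session, not the scoped_session proxy (which is what caused the 'twophase' AttributeError you saw), so the registry is called to get the current session:
from zope.sqlalchemy import mark_changed

session = dbsession()
session.execute(text("""INSERT INTO
    operations (description, generated_description)
    VALUES ('hello2', 'world');"""))
mark_changed(session())   # tell zope.sqlalchemy this session has pending writes
transaction.commit()      # the zope transaction now actually issues a COMMIT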