I have a large MongoDB with raw web-scraping data. I have a process that reads the Mongo docs and creates records in my MySQL reporting DB. I need to track the documents that I have processed in MongoDB. I am trying to use the ObjectID but can't seem to convert it to a string. I am using PyMongo as my client.
for i in Coll.find({"ISBN": {"$exists": True}})[20:50]:
    print('starting collection loop')
    # Check if doc has been processed
    if not ProcessingLog.objects.filter(mongoID=i['_id']).exists():
        mongoID = ProcessingLog(mongoID=i['_id'], source='amazon', createDate=datetime.datetime.now())
        ....
I get the following error
ValueError: too many values to unpack
pymongo includes methods for converting ObjectId() to and from other types.
You can see what they are in the docs here. You can probably make do with just str(o).
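For instance, a minimal round trip between an ObjectId and its hex string (a sketch; the variable names are illustrative):

from bson.objectid import ObjectId

oid = ObjectId()                   # or doc['_id'] from a find() result
oid_str = str(oid)                 # 24-character hex string, easy to store in MySQL or a log table
assert ObjectId(oid_str) == oid    # the string converts back to the same ObjectId

Storing str(i['_id']) in the ProcessingLog field (assuming it is a character field) avoids handing the raw ObjectId to the ORM.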
I am using Python with Firebase and have a table. I need to update all items that have a specific value in a field; I found out how to query for all the items having that value here and here.
Now I need to update all the items that I get in return.
What I would like to do is use a single query to perform a single operation that finds the correct items using the where query and updates a certain field to a certain value. I need that functionality in order to keep some data consistent.
Thanks
It is possible, but it takes a few more steps:
# the usual preparation
import firebase_admin
from firebase_admin import credentials, firestore
databaseURL = {'databaseURL': "https://<YOUR-DB>.firebaseio.com"}
cred = credentials.Certificate("<YOUR-SERVICE-KEY>.json")
firebase_admin.initialize_app(cred, databaseURL)
database = firestore.client()
col_ref = database.collection('<YOUR-COLLECTION>')
# Query generator for the documents; get all people named Pepe
results = col_ref.where('name', '==', 'Pepe').get()
# Update Pepe to José
field_updates = {"name": "José"}
for item in results:
    doc = col_ref.document(item.id)
    doc.update(field_updates)
(I use Cloud Firestore, maybe it is different in Realtime Database)
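If you are on the Realtime Database instead, a rough equivalent might look like this (a sketch; the 'people' path and 'name' field are assumptions, and queries on a child key usually need an .indexOn rule):

import firebase_admin
from firebase_admin import credentials, db

cred = credentials.Certificate("<YOUR-SERVICE-KEY>.json")
firebase_admin.initialize_app(cred, {'databaseURL': "https://<YOUR-DB>.firebaseio.com"})

ref = db.reference('people')                                   # assumed path to the items
matches = ref.order_by_child('name').equal_to('Pepe').get()    # all children whose 'name' is 'Pepe'
for key in matches:                                            # get() returns a dict keyed by child key
    ref.child(key).update({'name': 'José'})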
You can iterate through the query results and get the DocumentReference for each document in turn.
db = firestore.client()
a = request_json['message']['a']
b = request_json['message']['b']
ref = db.collection('doc').where(u'a', u'==', a).stream()
for i in ref:
    i.reference.update({"c": "c"})  # get the document reference and use it to update/delete ...
I have a MongoDB database that is storing data from ROS topics that my robot is logging. I am trying to print the data in MongoDB by using the following python script:
from pymongo import MongoClient
client = MongoClient('cpr-j100-0101', 62345)
db1 = client.front_scan
db2 = client.cmd_vel
db3 = client.odometry_filtered
print db1
print db2
print db3
but I don't get the result I want when I run this script. I have attached the result of running this script as an image. Instead of this, I would like to actually be able to access the data within MongoDB.
You can't print a database without accessing it first: you need to select which database and collections you want to print. For example, suppose db1 has two collections, coll1 and coll2. Printing the database really means printing the documents of the collections inside it.
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client.myDatabase
#my dummy database is myDatabase.
coll1 = db.coll1 #selecting the coll1 in myDatabase
for document in coll1.find():
    print(document)
From the above code you can print all the documents in the coll1 collection of myDatabase. In the same manner you can print the other collections and databases one by one.
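For example, a sketch that walks every database and collection on the server and prints their documents (names will differ on your server):

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
for db_name in client.list_database_names():          # every database on the server
    db = client[db_name]
    for coll_name in db.list_collection_names():      # every collection in that database
        print(db_name, coll_name)
        for document in db[coll_name].find():
            print(document)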
With this script you actually don't do much. You just get handles to three databases and that's basically it. You never insert data or read data from the database; you're just printing the database objects.
I believe the MongoDB Manual should be useful...
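As a minimal illustration of actually reading data, a sketch against one of the question's databases (the collection name 'scans' is an assumption; use whatever the ROS logger actually writes):

from pymongo import MongoClient

client = MongoClient('cpr-j100-0101', 62345)
coll = client.front_scan.scans             # a collection inside the front_scan database (name assumed)
for document in coll.find().limit(5):      # read back a few logged documents
    print(document)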
What is the proper way of moving a number of records from one collection to another? I have come across several other SO posts, such as this one, which deal with achieving the same goal, but none has a Python implementation.
#Taking a number of records from one database and returning the cursor
cursor_excess_new = db.test_collection_new.find().sort([("_id", 1)]).limit(excess_num)
# db.test.insert_many(doc for doc in cursor_excess_new).inserted_ids
# Iterating cursor and trying to write to another database
# for doc in cursor_excess_new:
#     db.test_collection_old.insert_one(doc)

result = db.test_collection_old.bulk_write([
    for doc in cursor_excess_new:
        InsertMany(doc for each doc in cur)
        pprint(doc)
])
If I use insert_many, I get the following error: pymongo.errors.OperationFailure: Writes to config servers must have batch size of 1, found 10
bulk_write is giving me a syntax error at the start of the for loop.
What is the best practice and correct way of transferring records from one collection to another in pymongo so that it is atomic?
Collection.bulk_write accepts an iterable of write operations as its argument.
pymongo has a pymongo.operations.InsertOne operation, not InsertMany.
For your situation, you can build a list of InsertOne operations, one for each document in the source collection, then do a bulk_write on the destination using the built-up list of operations.
from pymongo import InsertOne
...
cursor_excess_new = (
    db.test_collection_new
    .find()
    .sort([("_id", 1)])
    .limit(excess_num)
)
queries = [InsertOne(doc) for doc in cursor_excess_new]
db.test_collection_old.bulk_write(queries)
You don't need a for loop.
myList = list(collection1.find({}))
collection2.insert_many(myList)
collection1.delete_many({})
If you need to filter it you can use the following code:
myList = list(collection1.find({'status': 10}))
collection2.insert_many(myList)
collection1.delete_many({'status': 10})
But be careful: there is no guarantee the move will succeed, so you need to control it with a transaction. If you're going to use the following code, note that your MongoDB can't be a standalone instance; you need to enable replication and have at least one more instance (a replica set).
with myClient.start_session() as mySession:
    with mySession.start_transaction():
        ...yourcode...
Finally, the above code is guaranteed to move (insert and delete) successfully, but the transaction isn't in your hands and you can't get its result, so you can use the following code to control both the move and the transaction:
with myClient.start_session() as mySession:
    mySession.start_transaction()
    try:
        ...yourcode...
        mySession.commit_transaction()
        print("Done")
    except Exception as e:
        mySession.abort_transaction()
        print("Failed", e)
I've never used PyMongo, so I'm new to this stuff. I want to be able to save one of my lists to MongoDB. For example, I have a list imageIds = ["zw8SeIUW", "f28BYZ"], which is appended to frequently. After each append, the list imageIds should be saved to the database.
import pymongo
from pymongo import MongoClient
db = client.databaseForImages
and then later
imageIds.append(data)
db.databaseForImages.save(imageIds)
Why doesn't this work? What is the solution?
First, if you don't know what a Python dict is, I recommend brushing up on Python fundamentals. Check out Google's Python Class or Learn Python the Hard Way. Otherwise, you will be back here every 10 minutes with a new question...
Now, you have to connect to the mongoDB server/instance:
client = MongoClient('hostname', port_number)
Connect to a database:
db = client.imagedb
Then save the record to the collection "image_data".
record = {'image_ids': imageIds}
db.image_data.save(record)
Using save(), the record dict is updated with an '_id' field which now points to the record in this collection. To update it with a new appended imageIds:
record['image_ids'] = imageIds # Already contains the original _id
db.image_data.save(record)
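Note that save() was removed in PyMongo 4; with the current API the same idea can be expressed with replace_one and upsert=True (a sketch; the fixed '_id' and the connection details are assumptions):

from pymongo import MongoClient

client = MongoClient('localhost', 27017)                   # adjust host/port to your server
db = client.imagedb

imageIds = ["zw8SeIUW", "f28BYZ"]                          # the list from the question
record = {'_id': 'image_ids', 'image_ids': imageIds}
# upsert=True inserts the document the first time and replaces it on every later call
db.image_data.replace_one({'_id': record['_id']}, record, upsert=True)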
I have what is likely an easy question. I'm trying to pull a JSON from an online source, and store it in a SQLite table. In addition to storing the data in a rich table, corresponding to the many fields in the JSON, I would like to also just dump the entire JSON into a table every time it is pulled.
The table looks like:
CREATE TABLE Raw_JSONs (ID INTEGER PRIMARY KEY ASC, T DATE DEFAULT (datetime('now','localtime')), JSON text);
I've pulled a JSON from some URL using the following python code:
from pyquery import PyQuery
from lxml import etree
import urllib
x = PyQuery(url='json')
y = x('p').text()
Now, I'd like to execute the following INSERT command:
import sqlite3
db = sqlite3.connect('a.db')
c = db.cursor()
c.execute("insert into Raw_JSONs values(NULL,DATETIME('now'),?)", y)
But I'm told that I've supplied the incorrect number of bindings (i.e. thousands, instead of just 1). I gather it's reading the y variable as all the different elements of the JSON.
Can someone help me store just the JSON, in its entirety?
Also, as I'm obviously new to this JSON game, any online resources to recommend would be amazing.
Thanks!
.execute() expects a sequence of parameters, so give it a one-element tuple:
c.execute("insert into Raw_JSONs values(NULL,DATETIME('now'),?)", (y,))
A Python string is a sequence too, one of individual characters. So the .execute() call tried to treat each separate character as a parameter for your query, and unless your string happens to be exactly one character long, it won't provide the right number of parameters.
Don't forget to commit your inserts:
db.commit()
or use the database connection as a context manager:
with db:
    # inserts executed here will automatically commit if no exceptions are raised.
You may also be interested in the sqlite3 module's built-in adapters; these can convert any Python object to an SQLite column and back. See the standard documentation and the adapters section.
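For example, a sketch that registers an adapter/converter pair so a dict is stored as JSON text and parsed back automatically (the demo table and the JSONDOC type name are made up for illustration):

import json
import sqlite3

sqlite3.register_adapter(dict, json.dumps)          # dict -> JSON string on the way in
sqlite3.register_converter("JSONDOC", json.loads)   # JSON bytes -> dict on the way out

db = sqlite3.connect('a.db', detect_types=sqlite3.PARSE_DECLTYPES)
with db:
    db.execute("CREATE TABLE IF NOT EXISTS demo (id INTEGER PRIMARY KEY, doc JSONDOC)")
    db.execute("INSERT INTO demo (doc) VALUES (?)", ({'isbn': '123'},))
row = db.execute("SELECT doc FROM demo ORDER BY id DESC LIMIT 1").fetchone()
print(row[0]['isbn'])    # the dict comes back already parsed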