Please forgive me if my question is naive; I am new to Python and I am trying my hand at using collections with pymongo. I have tried to extract the names using
collects = db.collection_names()  # This returns a list with the names of the collections
But when I tried to get the cursor using
cursor = db.collects[1].find()  # This returns a cursor which has no reference to a collection.
I understand that the above code uses a string instead of a collection object. So I was wondering how I could retain a cursor for each collection in the DB, which I could use later to perform operations such as search and update.
If you are using the pymongo driver you must use the get_collection method or a dict-style lookup instead. You may also want to set include_system_collections to False in collection_names so you don't include system collections (e.g. system.indexes):
import pymongo
client = pymongo.MongoClient()
db = client.db
collects = db.collection_names(include_system_collections=False)
cursor = db.get_collection(collects[1]).find()
or
cursor = db[collects[1]].find()
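Either form returns a Collection object, which is what find() runs against. To get back to the original goal of keeping something reusable per collection, here is a minimal sketch (assuming the client and db above); note that a cursor is exhausted once iterated, so storing the Collection objects and calling find() when needed is usually more practical than storing cursors:
collections = {
    name: db[name]
    for name in db.collection_names(include_system_collections=False)
}

# Later, get a fresh cursor (coll.find()) or run updates through the stored handle:
for name, coll in collections.items():
    print(name, coll.find_one())  # peek at one document per collection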
Sorry, I can't create a comment yet, but have you tried:
cursor = db.getCollection(collects[1]).find();
Related
I have a database of reviews and want to create a new field in my database that indicates whether a review contains words relating to "pool".
import re
import pandas as pd
from pymongo import MongoClient
client = MongoClient()
db = client.Hotels_Copenhagen
collection = db.get_collection("hotel_review_table")
data = pd.DataFrame(list(collection.find()))
def common_member(a, b):
    a_set = set(a)
    b_set = set(b)
    if a_set & b_set:
        return True
    else:
        return False
pool_set = {"pool", "swim", "swimming"}
for single_review in data.review_text:
    make_it_lowercase = str(single_review).lower()
    tokenize_it = re.split(r"\s|\.|,", make_it_lowercase)
    pool_mentioned = common_member(tokenize_it, pool_set)
    db.hotel_review_table.update_one({}, {"$set": {"pool_mentioned": pool_mentioned}})
In Python I already counted the number of reviews containing words related to "pool", and it turns out that 1k of my 50k reviews talk about pools.
I solved my previously posted problem of getting the same entry everywhere by moving the db.hotel_review_table.update_one line into the loop.
Thus the main problem is solved. However, it takes quite some time to update the database like this. Is there any other way to make it faster?
You've gone to a lot of trouble to implement a feature that is available straight out of the box in MongoDB: text indexes.
Create a text index (in MongoDB shell):
db.hotel_review_table.createIndex( { "single_review": "text" } )
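If you'd rather stay in Python, pymongo's create_index should do the same thing (a sketch; db here is the same MongoClient()['Hotels_Copenhagen'] handle used below):
import pymongo
db.hotel_review_table.create_index([("single_review", pymongo.TEXT)])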
Then your code distils down to:
from pymongo import MongoClient
db = MongoClient()['Hotels_Copenhagen']
for keyword in ['pool', 'swim', 'swimming']:
    db.hotel_review_table.update_many({'$text': {'$search': keyword}}, {'$set': {'pool_mentioned': True}})
Note this doesn't set the value to false where pool isn't mentioned; if that is really needed, you can write another update to set any values that aren't true to false.
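A hedged sketch of that second update, assuming the same collection and field names as above:
# Set pool_mentioned to False wherever the updates above did not set it to True
db.hotel_review_table.update_many(
    {'pool_mentioned': {'$ne': True}},
    {'$set': {'pool_mentioned': False}}
)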
I am using Python with Firebase and have a table. I need to update all items that have a specific value in a field; I found out how to query for all the items having that value here and here.
Now I need to update all the items that I get in return.
What I would like is a single query that both finds the matching items using the where clause and updates a certain field to a certain value; I need that in order to keep some data consistent.
Thanks
It is possible, but it takes a few more steps:
# the usual preparation
import firebase_admin
from firebase_admin import credentials, firestore
databaseURL = {'databaseURL': "https://<YOUR-DB>.firebaseio.com"}
cred = credentials.Certificate("<YOUR-SERVICE-KEY>.json")
firebase_admin.initialize_app(cred, databaseURL)
database = firestore.client()
col_ref = database.collection('<YOUR-COLLECTION>')
# Query generator for the documents; get all people named Pepe
results = col_ref.where('name', '==', 'Pepe').get()
# Update Pepe to José
field_updates = {"name": "José"}
for item in results:
    doc = col_ref.document(item.id)
    doc.update(field_updates)
(I use Cloud Firestore, maybe it is different in Realtime Database)
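If there are many matching documents, the per-document update() round trips add up. Firestore's batched writes can group them; a sketch reusing database, col_ref, results and field_updates from above (a single batch is limited to 500 operations):
batch = database.batch()
for item in results:
    batch.update(col_ref.document(item.id), field_updates)
batch.commit()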
You can iterate through the query and find the DocumentReference in each iteration.
db = firestore.client()
a = request_json['message']['a']
b = request_json['message']['b']
ref = db.collection(u'doc').where(u'a', u'==', a).stream()
for i in ref:
    i.reference.update({"c": "c"})  # get the document reference and use it to update/delete ...
What is the proper way of moving a number of records from one collection to another? I have come across several other SO posts, such as this, which deal with achieving the same goal, but none has a Python implementation.
# Taking a number of records from one collection and returning a cursor
cursor_excess_new = db.test_collection_new.find().sort([("_id", 1)]).limit(excess_num)
# db.test.insert_many(doc for doc in cursor_excess_new).inserted_ids
# Iterating cursor and Trying to write to another database
# for doc in cursor_excess_new:
# db.test_collection_old.insert_one(doc)
result = db.test_collection_old.bulk_write([
    for doc in cursor_excess_new:
        InsertMany(doc for each doc in cur)
        pprint(doc)
])
If I use insert_many, I get the following error: pymongo.errors.OperationFailure: Writes to config servers must have batch size of 1, found 10
bulk_write gives me a syntax error at the start of the for loop.
What is the best practice and correct way of transferring records from one collection to another in pymongo so that it is atomic?
Collection.bulk_write accepts an iterable of write operations as its argument.
pymongo has a pymongo.operations.InsertOne operation, not InsertMany.
For your situation, you can build a list of InsertOne operations for each document in the source collection. Then do a bulk_write on the destination using the built-up list of operations.
from pymongo import InsertOne
...
cursor_excess_new = (
    db.test_collection_new
    .find()
    .sort([("_id", 1)])
    .limit(excess_num)
)
queries = [InsertOne(doc) for doc in cursor_excess_new]
db.test_collection_old.bulk_write(queries)
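Since the goal is to move rather than copy, a variation of the snippet above could collect the documents first and then delete them from the source by _id (a sketch, reusing the same names):
# Materialize the cursor so the _ids can be reused for the delete
docs = list(cursor_excess_new)
db.test_collection_old.bulk_write([InsertOne(doc) for doc in docs])
db.test_collection_new.delete_many({"_id": {"$in": [doc["_id"] for doc in docs]}})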
You don't need a for loop:
myList=list(collection1.find({}))
collection2.insert_many(myList)
collection1.delete_many({})
If you need to filter it you can use the following code:
myList=list(collection1.find({'status':10}))
collection2.insert_many(myList)
collection1.delete_many({'status':10})
But be careful: there is no guarantee the move succeeds, so you need to control the transaction yourself. If you're going to use the following code, note that your MongoDB can't be standalone; you need replication enabled and at least one other instance.
with myClient.start_session() as mySession:
    with mySession.start_transaction():
        ...yourcode...
Finally, the above code guarantees that the move (insert and delete) either completes or rolls back, but the transaction isn't in your hands and you can't inspect its result, so you can use the following code to control both the move and the transaction:
with myClient.start_session() as mySession:
    mySession.start_transaction()
    try:
        ...yourcode...
        mySession.commit_transaction()
        print("Done")
    except Exception as e:
        mySession.abort_transaction()
        print("Failed", e)
Never used PyMongo so I'm new to this stuff. I want to be able to save one of my lists to MongoDB. For example, I have a list imageIds = ["zw8SeIUW", "f28BYZ"], which is appended to frequently. After each append, the list imageIds should be saved to the database.
import pymongo
from pymongo import MongoClient
db = client.databaseForImages
and then later
imageIds.append(data)
db.databaseForImages.save(imageIds)
Why doesn't this work? What is the solution?
First, if you don't know what a Python dict is, I recommend brushing up on Python fundamentals. Check out Google's Python Class or Learn Python the Hard Way. Otherwise, you will be back here every 10 minutes with a new question...
Now, you have to connect to the MongoDB server/instance:
client = MongoClient('hostname', port_number)
Connect to a database:
db = client.imagedb
Then save the record to the collection "image_data".
record = {'image_ids': imageIds}
db.image_data.save(record)
Using save(), the record dict is updated with an '_id' field which now points to the record in this collection. To update it after appending to imageIds:
record['image_ids'] = imageIds # Already contains the original _id
db.image_data.save(record)
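One caveat: Collection.save() was deprecated in PyMongo 3 and removed in PyMongo 4. A rough modern equivalent of the pattern above (a sketch, same database and collection names; 'newId123' is just a placeholder value):
# insert_one creates the record and returns its _id
result = db.image_data.insert_one({'image_ids': imageIds})
record_id = result.inserted_id

# Instead of re-saving the whole list after each local append,
# push the new id server-side:
db.image_data.update_one({'_id': record_id}, {'$push': {'image_ids': 'newId123'}})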
I have what is likely an easy question. I'm trying to pull a JSON from an online source, and store it in a SQLite table. In addition to storing the data in a rich table, corresponding to the many fields in the JSON, I would like to also just dump the entire JSON into a table every time it is pulled.
The table looks like:
CREATE TABLE Raw_JSONs (ID INTEGER PRIMARY KEY ASC, T DATE DEFAULT (datetime('now','localtime')), JSON text);
I've pulled a JSON from some URL using the following python code:
from pyquery import PyQuery
from lxml import etree
import urllib
x = PyQuery(url='json')
y = x('p').text()
Now, I'd like to execute the following INSERT command:
import sqlite3
db = sqlite3.connect('a.db')
c = db.cursor()
c.execute("insert into Raw_JSONs values(NULL,DATETIME('now'),?)", y)
But I'm told that I've supplied the incorrect number of bindings (i.e. thousands instead of just 1). I gather it's reading the y variable as all the different elements of the JSON.
Can someone help me store just the JSON, in its entirety?
Also, as I'm obviously new to this JSON game, any online resources to recommend would be amazing.
Thanks!
.execute() expects a sequence; better to give it a one-element tuple:
c.execute("insert into Raw_JSONs values(NULL,DATETIME('now'),?)", (y,))
A Python string is a sequence too, one of individual characters. So the .execute() call tried to treat each separate character as a parameter for your query, and unless your string is exactly one character long, it won't provide the right number of parameters.
Don't forget to commit your inserts:
db.commit()
or use the database connection as a context manager:
with db:
    # Inserts executed here will automatically commit if no exceptions are raised.
You may also be interested in the sqlite3 module's built-in adapters. These can convert any Python object to an SQLite column and back. See the standard documentation and the adapters section.
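For instance, a small sketch (hypothetical, using the stdlib json module) that registers an adapter so dicts are serialized to JSON text automatically on insert:
import json
import sqlite3

# Any dict passed as a query parameter is stored as its JSON text
sqlite3.register_adapter(dict, json.dumps)

db = sqlite3.connect('a.db')
with db:
    db.execute("insert into Raw_JSONs values (NULL, DATETIME('now'), ?)",
               ({'some': 'data'},))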