How to update an entire object without changing the id in pymongo? - python

I am trying to update all properties of a record (object) stored in MongoDB. Currently I do it like this:
Delete the object, but keep the ID of the deleted object.
Create a new object with the same ID as the one I deleted.
Is this correct? Or what is the right way to achieve this using pymongo?
mongo_object = {
    _id : 123,
    prop_key_1: some_value,
    // ... many present
    prop_key_n: some_value,
}
def delete(record):
    doc = get_db().reviews.delete_many({"id" : record["_id"]})
    print(doc.deleted_count)
# all key values are changed, mongo_object is changed except the id.
delete(mongo_object)
db.collection_name.insert_one(mongo_object)
But the above code does not delete the object; doc.deleted_count is 0.

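db.collection_name.replace_one({"_id" : record["_id"]}, new_data)
Use a replace rather than an update with $set: the document gets replaced completely without changing the _id. (update_one only accepts update operators such as $set; it was the legacy update() method that replaced the whole document when given a plain document.) Also note that the filter key must be "_id". The delete in the question filters on "id", which matches nothing, hence deleted_count is 0.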

from bson.objectid import ObjectId
def replace_one(record):
    result = client.test_db.test_collection.replace_one({"_id": ObjectId(record["_id"])}, record, upsert=True)
    print(result.matched_count)
What is the correct way to query MongoDB for _id using string by using Python?
Pymongo doc - http://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.replace_one

Related

How get Id when do upsert with flask_pymongo?

I want to get the document id when I do an upsert. Currently flask-pymongo only returns the ObjectId when the document is inserted, not when it is updated.
I am using the following code:
a = mongo.db.abcd.update_one(
    {'abcd': 'abcd1'}, {"$set": {"abcd": "abcd2"}}, upsert=True)
for value in a.raw_result.items():
    print(value)
Is there any way to return the id?
Thanks
update_one() returns an instance of UpdateResult (https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.update_one), and UpdateResult has a property upserted_id.
The documentation says: The _id of the inserted document if an upsert took place. Otherwise None.
https://pymongo.readthedocs.io/en/stable/api/pymongo/results.html#pymongo.results.UpdateResult.upserted_id
It looks like that is what you need.
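Since upserted_id is None when an existing document is updated, one way to get the _id in both cases is find_one_and_update, which returns the document itself. A minimal sketch, assuming `collection` is a PyMongo collection (for example mongo.db.abcd); the helper name upsert_and_get_id is made up for illustration:

```python
def upsert_and_get_id(collection, filter_doc, update_doc):
    """Upsert and return the _id, whether the document was inserted or updated.

    `collection` is assumed to be a PyMongo Collection; the helper name is
    hypothetical.
    """
    doc = collection.find_one_and_update(
        filter_doc,
        update_doc,
        projection={"_id": True},  # only the _id is needed
        upsert=True,
        # True is the value of pymongo.ReturnDocument.AFTER: return the
        # document as it looks after the update (or after the insert).
        return_document=True,
    )
    return doc["_id"]
```

With the code from the question this would be called as upsert_and_get_id(mongo.db.abcd, {'abcd': 'abcd1'}, {'$set': {'abcd': 'abcd2'}}). In real code, importing ReturnDocument from pymongo and passing ReturnDocument.AFTER reads better than the bare True.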

How to prevent python from inserting duplicates into mongodb?

I have JSON data stored in a variable that gets inserted into MongoDB once per day with Python. The JSON data in the variable often does not change, but it still gets inserted into MongoDB, which creates masses of duplicates of the same entries.
Every entry in the JSON data has one unique key: uuid.
How do you prevent Python from inserting duplicates into MongoDB? I looked into db.collection.update(), but I'm not sure if it's suitable, and I don't know how to use it with a variable.
As long as you can check its id for uniqueness, you can use the method update_one() and set upsert=True for that.
For example,
filter_data = {'uuid': '111'}
new_data = {'$set': {'new_value': 25}}
db.collection.update_one(filter_data, new_data, upsert=True)
This checks whether a document with uuid = '111' exists; if not, it creates one, otherwise it updates the existing document.
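To apply this to data held in a variable, the same call can run once per entry. A sketch, assuming the variable is a list of dicts that each carry a 'uuid' key and no '_id' field, and that `collection` is a PyMongo collection; the helper name upsert_by_uuid is hypothetical:

```python
def upsert_by_uuid(collection, entries):
    """Insert new entries and update existing ones, keyed on 'uuid'.

    Returns the number of entries that were newly inserted.
    """
    inserted = 0
    for entry in entries:
        result = collection.update_one(
            {"uuid": entry["uuid"]},  # match on the unique key
            {"$set": entry},          # overwrite the fields with the current values
            upsert=True,              # create the document if none matches
        )
        if result.upserted_id is not None:
            inserted += 1
    return inserted
```

Running this daily on unchanged data leaves the collection as it is instead of adding copies. A unique index on uuid, created with collection.create_index("uuid", unique=True), would additionally make the server itself reject duplicates.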

solve E11000 duplicate key error collection: _id_ dup key in pymongo

I am trying to insert a large number of documents (1M+) using bulk_write. To do that, I create a list of InsertOne operations.
python version = 3.7.4
pymongo version = 3.8.0
Document creation:
document = {
    'dictionary': ObjectId(dictionary_id),
    'price': price,
    'source': source,
    'promo': promo,
    'date': now_utc,
    'updatedAt': now_utc,
    'createdAt': now_utc
}
# add line to debug
if '_id' in document.keys():
    print(document)
return document
I create the full list of documents by adding a new field from a list of elements, and build the queries using InsertOne:
bulk = []
for element in list_elements:
    for document in documents:
        document['new_field'] = element
        # add line to debug
        if '_id' in document.keys():
            print(document)
        insert = InsertOne(document)
        bulk.append(insert)
return bulk
I do the insert by using bulk_write command
collection.bulk_write(bulk, ordered=False)
I attach the documentation: https://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.bulk_write
According to the documentation, the _id field is added automatically:
Parameter - document: The document to insert. If the document is missing an _id field one will be added.
Somehow it seems to be doing this wrong, because some of the documents end up with the same value.
I receive this error (with different _id values, of course) for 700k of the 1M documents:
'E11000 duplicate key error collection: database.collection index: _id_ dup key: { _id: ObjectId(\'5f5fccb4b6f2a4ede9f6df62\') }'
This looks like a pymongo bug to me: I have used this approach in many situations, though never with this many documents.
The _id field has to be unique, of course, but since pymongo generates it automatically, I don't know how to approach this problem; perhaps I could use UpdateOne with upsert=True and an impossible filter, and hope for the best.
I would appreciate any solution or workaround for this problem.
It turned out that because I was setting the new field on the same document object and appending it to the list again, I ended up with repeated references to the same instance. I therefore had the same queries len(list_elements) times, which is why I got the duplicate key error.
To solve the problem, I append a copy of the document to the list:
bulk.append(document.copy())
and then create the queries from that list.
I would like to thank @Belly Buster for his help with this issue.
If any of the documents from your code snippet already contain an _id, a new one won't be added, and you run the risk of getting a duplicate error as you have observed.
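The aliasing can be reproduced without MongoDB. The sketch below assumes `documents` is a list of dicts and `list_elements` a list of values, as in the question; expand_documents is a hypothetical name. Each output entry is a separate copy, so none of them inherits an _id that the driver added to another one.

```python
def expand_documents(documents, list_elements):
    """Return one independent dict per (element, document) pair."""
    expanded = []
    for element in list_elements:
        for document in documents:
            new_doc = document.copy()  # shallow copy: a new dict per entry
            new_doc["new_field"] = element
            expanded.append(new_doc)
    return expanded


documents = [{"price": 1}, {"price": 2}]
expanded = expand_documents(documents, ["a", "b"])
print(len(expanded))                   # 4 dicts in total
print(len({id(d) for d in expanded}))  # 4 distinct objects
```

The result can then be wrapped as [InsertOne(d) for d in expanded] before calling bulk_write. A shallow copy is enough here because only a top-level key is added; nested mutable values would still be shared between the copies.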

ObjectID generated by server on pymongo

I am using pymongo (python module for mongodb).
I want the ObjectID to be created automatically by the server, however it seems to be created by pymongo itself when we don't specify it.
The problem it raises is that I use ObjectID to sort by time (by just sorting by the _id field). However it seems that it is using the time set on each computer so we cannot truly rely on it.
Any idea on how to solve this problem?
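If you call save and pass it a document without an _id field, you can force the server to add the _id instead of the client by setting the (enigmatically named) manipulate option to False. Note that this applies to older PyMongo releases only: save() was deprecated in PyMongo 3.0 and removed in 4.0.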
coll.save({'foo': 'bar'}, manipulate=False)
I'm not a Python user, but I'm afraid there's no way to have the server generate the _id. For performance reasons the _id is always generated by the driver, so when you insert a document you don't need another query to get the _id back.
Here's a possible way to do it by generating an integer sequence _id, just like the IDENTITY ID of SQL Server. To do this, you need to keep a record in a dedicated collection. For example, in my project there's a seed collection, which has only one record:
{_id: ObjectId("..."), seqNo: 1 }
The trick is, you have to use findAndModify to keep the find and modify in the same "transaction".
var idSeed = db.seed.findAndModify({
    query: {},
    sort: {seqNo: 1},
    update: { $inc: { seqNo: 1 } },
    new: false
});
var id = idSeed.seqNo;
This way all your instances get a unique sequence number, and you can use it to sort the records.
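In PyMongo the same idea can be written with find_one_and_update. A rough sketch, assuming `seed` is a PyMongo collection holding one counter document per name; the counter name "main" and the helper name next_sequence are made up for illustration. Unlike the shell example above (new: false), it returns the value after the increment.

```python
def next_sequence(seed, name="main"):
    """Atomically increment the named counter and return the new value."""
    doc = seed.find_one_and_update(
        {"_id": name},
        {"$inc": {"seqNo": 1}},
        upsert=True,           # create the counter document on first use
        return_document=True,  # value of pymongo.ReturnDocument.AFTER
    )
    return doc["seqNo"]
```

Because the find and the increment happen in one server-side operation, two clients calling this at the same time should not receive the same number.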

How to set _id to be 32bit Integer?

I am currently pulling data into MongoDB, and will later need to pull this data into a separate application. This application has a requirement for the _id field to be a 32bit integer.
Be sure to explicitly set the _id attribute in the result document to unique 32 bit integers.
source
I am making use of pymongo to insert documents into a collection.
def parse_tweet(in_t):
    t = {}
    t["text"] = in_t["text"]
    t["shape"] = in_t["coordinates"]["coordinates"][0], in_t["coordinates"]["coordinates"][1]
    return t
This gives me the expected documents:
{
    "_id" : ObjectId("50a0de04f26afb14f4bba03d"),
    "text" : "hello world",
    "shape" : [144.9557834, -37.8208589],
}
How can I explicitly set the _id value to be a 32bit integer?
I don't intend on storing more than 6 million documents.
Just generate an id and pass it along. The _id can be anything (except an array).
def parse_tweet(in_t):
    t = {}
    t["_id"] = get_me_an_int32_id
    t["text"] = in_t["text"]
    t["shape"] = in_t["coordinates"]["coordinates"][0], in_t["coordinates"]["coordinates"][1]
    return t
You will have to take care of its uniqueness yourself. MongoDB will only ensure that you don't store duplicate values; where you get the unique values from is up to you.
Here are some ideas: How to make an Autoincrementing field.
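If a single process does all the inserting, one simple option is an in-memory counter. The sketch below is only an illustration under that assumption: make_int32_ids is a hypothetical helper, and it is not safe across several processes or restarts. PyMongo stores Python ints that fit in 32 bits as BSON int32, so staying within the range is enough.

```python
import itertools

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def make_int32_ids(start=1):
    """Return a function that hands out increasing ids within int32 range."""
    counter = itertools.count(start)

    def next_id():
        value = next(counter)
        if value > INT32_MAX:
            raise OverflowError("ran out of 32-bit ids")
        return value

    return next_id


def parse_tweet(in_t, next_id):
    """Build the document with an explicit integer _id."""
    lon, lat = in_t["coordinates"]["coordinates"][:2]
    return {"_id": next_id(), "text": in_t["text"], "shape": [lon, lat]}
```

With about 6 million documents the counter stays far below the int32 limit of 2147483647.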
