I need help understanding what is happening here, and a suggestion on how to avoid it!
Here is my snippet:
results = [...]  # a list of dicts, each with 2 keys and 2 string values
copyResults = list(results)
# Here I try to insert each dict into MongoDB (using PyMongo)
for item in copyResults:
    dbcollection.save(item)  # This all saves fine in MongoDB.
But when I loop through the original results list again, each dictionary now shows a new field that was added automatically: the ObjectId from MongoDB!
Later in the code I need to transform the original results list to JSON, but this ObjectId is causing issues. I have no clue why it is getting added to the original list.
I have already tried copying the list and creating a new list, etc. The ObjectId still ends up in the original list after saving.
Please suggest!
Every document saved in MongoDB requires an '_id' field, which has to be unique among the documents in the collection. If you don't provide one, MongoDB will automatically create one as an ObjectId (bson.objectid.ObjectId in PyMongo). Note that list(results) only makes a shallow copy: both lists hold the very same dict objects, so save() adds the '_id' to the dicts your original list references as well.
If you need to export the documents to JSON, you have to pop the '_id' field before jsonifying them.
Or you could use:
rows['_id'] = str(rows['_id'])
Remember to set it back if you then need to update.
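A minimal sketch of both options, assuming the dbcollection handle from the question and documents read back from it:

import json

docs = list(dbcollection.find())

# Option 1: drop the '_id' field before serializing
for doc in docs:
    doc.pop('_id', None)
json_out = json.dumps(docs)

# Option 2: keep the field, but convert the ObjectId to a string first
docs = list(dbcollection.find())
for doc in docs:
    doc['_id'] = str(doc['_id'])
json_out = json.dumps(docs)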
I have JSON data stored in a variable that gets inserted into MongoDB once per day with Python. The JSON data in the variable often does not change, but it still gets inserted into MongoDB, which creates masses of duplicates of the same entries.
Every entry in the JSON data variable has one unique key: uuid.
How do I prevent Python from inserting duplicates into MongoDB? I looked into db.collection.update(), but I'm not sure if it's suitable, and I don't know how to use it with a variable.
As long as you have a field you can check for uniqueness, you can use the method update_one() and set upsert=True.
For example,
filter_data = {'uuid': '111'}
new_data = {'$set': {'new_value': 25}}
db.collection.update_one(filter_data, new_data, upsert=True)
This will check whether a document with uuid = '111' exists; if not, it will create one, otherwise it will update it.
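A sketch of how this could look with your variable; json_entries is a placeholder name for the list of dicts you insert each day:

for entry in json_entries:
    db.collection.update_one(
        {'uuid': entry['uuid']},   # match on the unique key
        {'$set': entry},           # refresh the remaining fields
        upsert=True                # insert only if no document with this uuid exists
    )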
I am trying to insert a large number of documents (1M+) using a bulk_write instruction. In order to do that, I create a list of InsertOne operations.
python version = 3.7.4
pymongo version = 3.8.0
Document creation:
document = {
    'dictionary': ObjectId(dictionary_id),
    'price': price,
    'source': source,
    'promo': promo,
    'date': now_utc,
    'updatedAt': now_utc,
    'createdAt': now_utc
}
# line added to debug
if '_id' in document.keys():
    print(document)
return document
I create the full list of documents by adding a new field from a list of elements, and build the operations using InsertOne:
bulk = []
for element in list_elements:
    for document in documents:
        document['new_field'] = element
        # line added to debug
        if '_id' in document.keys():
            print(document)
        insert = InsertOne(document)
        bulk.append(insert)
return bulk
I do the insert using the bulk_write command:
collection.bulk_write(bulk, ordered=False)
I attach the documentation: https://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.bulk_write
According to the documentation, the _id field is added automatically:
Parameter - document: The document to insert. If the document is missing an _id field one will be added.
And somehow it seems that this is going wrong, because some of them end up with the same value.
I am receiving this error (with different _id values, of course) for 700k of the 1M documents:
'E11000 duplicate key error collection: database.collection index: _id_ dup key: { _id: ObjectId(\'5f5fccb4b6f2a4ede9f6df62\') }'
It seems like a bug in pymongo to me, because I have used this approach in many situations, but never with this many documents.
The _id field has to be unique for sure, but since it is generated automatically by pymongo, I don't know how to approach this problem; perhaps using UpdateOne with upsert=True and an impossible filter and hoping for the best.
I would appreciate any solution or workaround for this problem.
It turned out that, as I was adding the new field to the document and appending it to the list, I was reusing the same dict instance, so I ended up with the same operation len(list_elements) times, and that is why I got the duplicate key error.
To solve the problem, I append a copy of the document to the list:
bulk.append(InsertOne(document.copy()))
and then run bulk_write with that list.
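For context, a minimal sketch of the corrected loop, reusing the variable names from the question:

bulk = []
for element in list_elements:
    for document in documents:
        doc = document.copy()          # fresh dict per operation
        doc['new_field'] = element
        bulk.append(InsertOne(doc))    # each InsertOne now gets its own document
collection.bulk_write(bulk, ordered=False)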
I would like to thank #Belly Buster for his help with this issue.
If any of the documents from your code snippet already contain an _id, a new one won't be added, and you run the risk of getting a duplicate error as you have observed.
I am new to Django and am having some issues writing to a MySQL DB. What I am trying to do is loop through nested JSON data and dynamically create a string to be used with a save() method.
I've looped through my nested JSON data and successfully created a string that contains the data I want to save as a single row in the MySQL table "mysqltable":
q = "station_id='thisid',stall_id='thisstaull',source='source',target='test'"
I then try to save this to the table in MySQL:
b = mysqltable(q)
b.save()
But I am getting the error:
TypeError: int() argument must be a string or a number, not 'mysqltable'
What I think is happening is that it doesn't like the fact that I have created a string to use in b = mysqltable(q). When I just write out the statement like the one below, it works fine:
q = mysqltable(station_id='thisid',stall_id='thisstaull',source='source',target='test')
q.save()
But I am not sure how to take that string and make it usable with b.save(). Any help would be greatly appreciated!
Instead of a string, create a dictionary, and then pass it directly to mysqltable:
mysqltable(**dictWithData)
Of course you could re-parse the string into a dictionary, but that is useless work...
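A minimal sketch of the idea, built from the same values as the string in the question (in practice you would fill the dict while looping over your nested JSON):

row_data = {
    'station_id': 'thisid',
    'stall_id': 'thisstaull',
    'source': 'source',
    'target': 'test',
}
b = mysqltable(**row_data)  # equivalent to mysqltable(station_id='thisid', ...)
b.save()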
I have never used PyMongo, so I'm new to this stuff. I want to be able to save one of my lists to MongoDB. For example, I have a list imageIds = ["zw8SeIUW", "f28BYZ"] which is appended to frequently. After each append, the list imageIds should be saved to the database.
import pymongo
from pymongo import MongoClient
db = client.databaseForImages
and then later
imageIds.append(data)
db.databaseForImages.save(imageIds)
Why doesn't this work? What is the solution?
First, if you don't know what a Python dict is, I recommend brushing up on Python fundamentals. Check out Google's Python Class or Learn Python the Hard Way. Otherwise, you will be back here every 10 minutes with a new question...
Now, you have to connect to the MongoDB server/instance:
client = MongoClient('hostname', port_number)
Connect to a database:
db = client.imagedb
Then save the record to the collection "image_data":
record = {'image_ids': imageIds}
db.image_data.save(record)
Using save(), the record dict is updated with an '_id' field, which now points to the record in this collection. To update it later with a newly appended imageIds:
record['image_ids'] = imageIds # Already contains the original _id
db.image_data.save(record)
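Putting it together, a minimal end-to-end sketch (hostname and port are placeholders; note that Collection.save() was removed in PyMongo 4, where replace_one(..., upsert=True) is the equivalent):

from pymongo import MongoClient

client = MongoClient('localhost', 27017)    # placeholder host/port
db = client.imagedb

imageIds = ["zw8SeIUW", "f28BYZ"]
record = {'image_ids': imageIds}
db.image_data.save(record)                  # first save assigns record['_id']

imageIds.append("newId123")                 # hypothetical new id
record['image_ids'] = imageIds
db.image_data.save(record)                  # re-save updates the same document

# On PyMongo 4+, the equivalent without save():
# db.image_data.replace_one({'_id': record['_id']}, record, upsert=True)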
I am using PyMongo to insert data (title, description, phone_number, ...) into MongoDB. However, when I use the mongo client to view the data, it displays the properties in a strange order. Specifically, the phone_number property is displayed first, followed by title, and then comes description. Is there some way I can force a particular order?
The question and answer above are quite old. Anyhow, if somebody visits this, I feel I should add:
The original answer below is completely wrong. Actually, in MongoDB documents ARE ordered key-value pairs. However, pymongo uses Python dicts for documents, which are not ordered (as of CPython 3.6 dicts do retain insertion order, but that is considered an implementation detail). So this is a limitation of the pymongo driver.
Be aware that this limitation actually impacts usability: if you query the db for a subdocument, it will only match if the order of the key-value pairs is correct.
Just try the following code yourself:
from pymongo import MongoClient
db = MongoClient().testdb
col = db.testcol
subdoc = {
    'field1': 1,
    'field2': 2,
    'field3': 3
}
document = {
    'subdoc': subdoc
}
col.insert_one(document)
print(col.find({'subdoc': subdoc}).count())
Each time this code is executed, the 'same' document is added to the collection. Thus, each time we run this snippet, the printed value 'should' increase by one. It does not, because find() only matches subdocuments with the correct ordering, while Python dicts insert the subdocument keys in arbitrary order.
See the following answer for how to use an ordered dict to overcome this: https://stackoverflow.com/a/30787769/4273834
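For illustration, a small sketch using bson.SON (an ordered dict-like type that ships with PyMongo) to control the key order of a subdocument:

from bson.son import SON
from pymongo import MongoClient

db = MongoClient().testdb

# SON preserves the order in which the keys are given, so the subdocument
# is stored and matched with exactly this field order.
subdoc = SON([('field1', 1), ('field2', 2), ('field3', 3)])
db.testcol.insert_one({'subdoc': subdoc})
print(db.testcol.count_documents({'subdoc': subdoc}))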
Original answer (2013):
MongoDB documents are BSON objects, unordered dictionaries of key-value pairs. So you can't rely on, or set, a specific field order. The only thing you can control is which fields to display and which not to; see the docs on find's projection argument.
Also see related questions on SO:
MongoDB field order and document position change after update
Can MongoDB and its drivers preserve the ordering of document elements
Ordering fields from find query with projection
Hope that helps.