mongoengine how to use a variable as a field? - python

How do you use a variable as part of a field path when updating a MongoDB document? I am trying to target a list inside a list in my collection, using an integer variable taken from a WTForms form. When I pass the variable, I get this error:
mongoengine.errors.OperationError: Update failed (Cannot create field 'studentSlot' in element {Attendance: [ [] ]})
The code segment:
if request.method == "POST":
    if "form" in request.form and form.validate_on_submit():
        klass = classes.objects(id=form.classid.data).first()
        studentSlot = form.slotid.data
        if klass:
            klass.update(push__Attendance__studentSlot__1=form.attended.data)
example document:
{
"_id" : ObjectId("637545927a45e617da1cba8e"),
"CourseName" : "Themes in U.S. History",
"Prerequiste" : [
],
"MinimumRequirement" : [
],
"Description" : "The course is an introduction to major issues in the history of the United States, from colonial times to the twentieth\r\ncentury. Topics may include: the origins of slavery and racism; industrialization and the growth of cities and suburbs;\r\nthe growth of the American empire; movements for social change.",
"RoomID" : NumberInt(31),
"BuildingID" : NumberInt(2),
"Type" : "Undergraduate",
"TimeSlot" : NumberInt(3),
"Credits" : NumberInt(4),
"MaxCapacity" : NumberInt(30),
"Day" : "Tuesday & Thursday",
"Professor" : ObjectId("638be8c63d5b3e062b56f8ea"),
"Enrolled" : [
ObjectId("6376b19ef448207c0a721245")
],
"Department" : "American Studies",
"Attendance" : [
[
]
],
"Grades" : [
],
"Year" : NumberInt(2022),
"Season" : "Spring",
"StartDate" : ISODate("2022-01-26T00:00:00.000+0000"),
"EndDate" : ISODate("2022-05-20T00:00:00.000+0000"),
"Crn" : "AS536",
"AttendanceDate" : [
]
}
If studentSlot in the last line is replaced with 0 (for example), it works perfectly fine. This would be suitable if I didn't need to grab the slotid number from the user.
Put simply, how do I make MongoEngine use the value of the studentSlot variable instead of treating studentSlot as the name of a field that doesn't exist?
Any help is appreciated.
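One possible approach, sketched here rather than taken from the thread: Python keyword names are fixed when the code is written, so push__Attendance__studentSlot__1 always contains the literal text studentSlot. Building the key as a string and unpacking it with ** lets the slot number come from the variable. The helper name build_push_kwargs is hypothetical, and the key layout simply mirrors the one in the question.

```python
def build_push_kwargs(slot, value):
    """Build MongoEngine-style update kwargs with the slot index taken from a variable."""
    # int() guards against the form handing back a string such as "3".
    key = f"push__Attendance__{int(slot)}__1"
    return {key: value}


kwargs = build_push_kwargs(0, "present")
print(kwargs)  # {'push__Attendance__0__1': 'present'}
```

With the objects from the question, this would be called as klass.update(**build_push_kwargs(studentSlot, form.attended.data)).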

Related

Delete all documents returned in a find().limit()

I'm using db.collection.find({}, {'_id': False}).limit(2000) to get the documents from a collection. These documents are sent to a Facebook API, and after the API returns success they need to be deleted from the collection.
My main question is:
Is there a way to delete all 2000 of these documents without using a for loop? I know that collection.find returns a cursor; is there a way to use this cursor in a delete_many?
The structure of my document is:
{
"_id" : ObjectId("61608068887f1a0e2162d94b"),
"event_time" : "1632582893",
"value" : "549.9000",
"contents" : [
{
"product_id" : "1-1",
"quantity" : "1.000000",
"value" : "10"
}
]
}
To solve this problem, based on the comments of @adarsh and @J.F, I've used the following code:
rm = [x['_id'] for x in MongoDB(mongo).db.get_collection("DataToSend").find({}, {'_id': 1}).limit(2000)]
MongoDB(mongo).db.get_collection("DataToSend").delete_many({'_id': {'$in': rm}})

Find a subdocument in array PyMongo

I want to query which comments users made about the Machine learning book between '2020-03-15' and '2020-04-25', ordered from the most recent to the least recent.
Here is my document.
lib_books = db.lib_books
document_book1 = ({
"bookid" : "99051fe9-6a9c-46c2-b949-38ef78858dd0",
"title" : "Machine learning",
"author" : "Tom Michael",
"date_of_first_publication" : "2000-10-02",
"number_of_pages" : 414,
"publisher" : "New York : McGraw-Hill",
"topics" : ["Machine learning", "Computer algorithms"],
"checkout_list" : [
{
"time_checked_out" : "2020-03-20 09:11:22",
"userid" : "ef1234",
"comments" : [
{
"comment1" : "I just finished it and it is worth learning!",
"time_commented" : "2020-04-01 10:35:13"
},
{
"comment2" : "Some cases are a little bit outdated.",
"time_commented" : "2020-03-25 13:19:13"
},
{
"comment3" : "Can't wait to learning it!!!",
"time_commented" : "2020-03-21 08:21:42"
}]
},
{
"time_checked_out" : "2020-03-04 16:18:02",
"userid" : "ab1234",
"comments" : [
{
"comment1" : "The book is a little bit difficult but worth reading.",
"time_commented" : "2020-03-20 12:18:02"
},
{
"comment2" : "It's hard and takes a lot of time to understand",
"time_commented" : "2020-03-15 11:22:42"
},
{
"comment3" : "I just start reading, the principle of model is well explained.",
"time_commented" : "2020-03-05 09:11:42"
}]
}]
})
I tried this code, but it returns nothing.
query_test = lib_books.find({"bookid": "99051fe9-6a9c-46c2-b949-38ef78858dd0", "checkout_list.comments.time_commented" : {"$gte" : "2020-03-20", "$lte" : "2020-04-20"}})
for x in query_test:
    print(x)
Can you try this:
pipeline = [
    {'$match': {'bookid': "99051fe9-6a9c-46c2-b949-38ef78858dd0"}},  # bookid filter
    {'$unwind': '$checkout_list'},
    {'$unwind': '$checkout_list.comments'},
    {'$match': {'checkout_list.comments.time_commented': {"$gte": "2020-03-20", "$lte": "2020-04-20"}}},
    {'$project': {'_id': 0, 'bookid': 1, 'title': 1, 'comment': '$checkout_list.comments'}},
    # After $project the comment lives under 'comment', so sort on that path.
    {'$sort': {'comment.time_commented': -1}},
]
query_test = lib_books.aggregate(pipeline)
for x in query_test:
    print(x)
I would recommend using a single field name for the comment text, rather than 'comment1', 'comment2', etc. If the field were named 'comment', it could be projected directly to the root of the result.
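As a rough illustration of that renaming, the sketch below rewrites one comment entry so that its text lives under a single 'comment' key. The helper name normalize_comment is hypothetical, and it assumes every entry has exactly one key of the form comment followed by digits, as in the sample document.

```python
import re


def normalize_comment(entry):
    """Return a copy of a comment entry with 'commentN' renamed to 'comment'."""
    normalized = {}
    for key, value in entry.items():
        # 'comment1', 'comment2', ... all collapse to the single name 'comment'.
        if re.fullmatch(r"comment\d+", key):
            normalized["comment"] = value
        else:
            normalized[key] = value
    return normalized


entry = {"comment2": "Some cases are a little bit outdated.",
         "time_commented": "2020-03-25 13:19:13"}
print(normalize_comment(entry)["comment"])  # Some cases are a little bit outdated.
```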
With that change, the aggregate can be modified as below:
pipeline = [
    {'$match': {'bookid': "99051fe9-6a9c-46c2-b949-38ef78858dd0"}},  # bookid filter
    {'$unwind': '$checkout_list'},
    {'$unwind': '$checkout_list.comments'},
    {'$match': {'checkout_list.comments.time_commented': {"$gte": "2020-03-20", "$lte": "2020-04-20"}}},
    {'$project': {'_id': 0, 'bookid': 1, 'title': 1, 'comment': '$checkout_list.comments.comment', 'time_commented': '$checkout_list.comments.time_commented'}},
    {'$sort': {'time_commented': -1}},
]
The equivalent MongoDB shell query, in case it is required:
db.books.aggregate([
    {$match:{'bookid':"99051fe9-6a9c-46c2-b949-38ef78858dd0"}},//bookid filter
    {$unwind:'$checkout_list'},
    {$unwind:'$checkout_list.comments'},
    {$match:{'checkout_list.comments.time_commented':{"$gte" : "2020-03-20", "$lte" : "2020-04-20"}}},
    {$project:{_id:0,bookid:1,title:1,comment:'$checkout_list.comments.comment',time_commented:'$checkout_list.comments.time_commented'}},
    {$sort:{'time_commented':-1}}
])
If there are multiple books that you need to search, you can use an $in condition:
{$match:{'bookid':{$in:["99051fe9-6a9c-46c2-b949-38ef78858dd0","99051fe9-6a9c-46c2-b949-38ef78858dd1"]}}},//bookid filter

Python parsing json file to access values returning TypeError

I am using python to parse a json file full of url data to try and build a url reputation classifier. There are around 2,000 entries in the json file and not all of them have all of the fields present. A typical entry looks like this:
[
{
"host_len" : 12,
"fragment" : null,
"url_len" : 84,
"default_port" : 80,
"domain_age_days" : "5621",
"tld" : "com",
"num_domain_tokens" : 3,
"ips" : [
{
"geo" : "CN",
"ip" : "115.236.98.124",
"type" : "A"
}
],
"malicious_url" : 0,
"url" : "http://www.oppo.com/?utm_source=WeiBo&utm_medium=OPPO&utm_campaign=DailyFlow",
"alexa_rank" : "25523",
"query" : "utm_source=WeiBo&utm_medium=OPPO&utm_campaign=DailyFlow",
"file_extension" : null,
"registered_domain" : "oppo.com",
"scheme" : "http",
"path" : "/",
"path_len" : 1,
"port" : 80,
"host" : "www.oppo.com",
"domain_tokens" : [
"www",
"oppo",
"com"
],
"mxhosts" : [
{
"mxhost" : "mail1.oppo.com",
"ips" : [
{
"geo" : "CN",
"ip" : "121.12.164.123",
"type" : "A"
}
]
}
],
"path_tokens" : [
""
],
"num_path_tokens" : 1
}
]
I am trying to access the data stored in the fields "ips" and "mxhosts" to compare the "geo" location. To try and access the first "ips" field I'm using:
corpus = open(file)
urldata = json.load(corpus, encoding="latin1")
for record in urldata:
    print record["ips"][0]["geo"]
But as I mentioned not all of the json entries have all of the fields. "ips" is always present but sometimes it's "null" and the same goes for "geo". I'm trying to check for the data before accessing it using:
if(record["ips"] is not None and record["ips"][0]["geo"] is not None):
But I get this error:
if(record["ips"] is not None and record["ips"][0]["geo"] is not None):
TypeError: string indices must be integers
When I try to check it using this:
if("ips" in record):
I get this error message:
print record["ips"][0]["geo"]
TypeError: 'NoneType' object has no attribute '__getitem__'
So I'm not sure how to check if the record I'm trying to access exists before I access it, or if I'm even accessing in the most correct way. Thanks.
You can simply check that record["ips"] is not None, or more simply that it is truthy, before indexing it as a list; otherwise you would be indexing into a None object.
for record in urldata:
    if record["ips"]:
        print record["ips"][0]["geo"]
So it ended up being a little convoluted due to the inconsistent nature of the json file, but I had to end up first checking that "ips" was not null and then checking that "geo" was present in record["ips"][0]. This is what it looks like:
if(record["ips"] is not None and "geo" in record["ips"][0]):
    print record["ips"][0]["geo"]
Thanks for the feedback everyone!
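The checks discussed above can be folded into one small helper. This is only a sketch: it is written for Python 3 (the thread's snippets use Python 2 print statements), the name first_geo is hypothetical, and it assumes that "ips", when present and non-null, is a list whose entries are normally dicts.

```python
def first_geo(record):
    """Return the "geo" of the first "ips" entry, or None if anything is missing."""
    # "ips" may be absent or null; treat both like an empty list.
    ips = record.get("ips") or []
    if not ips or not isinstance(ips[0], dict):
        return None
    # .get() returns None when "geo" is missing instead of raising KeyError.
    return ips[0].get("geo")


print(first_geo({"ips": [{"geo": "CN", "ip": "115.236.98.124", "type": "A"}]}))  # CN
print(first_geo({"ips": None}))  # None
```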

MongoDB generating same ID between inserts

I am using pymongo and I am trying to insert dicts into mongodb database. My dictionaries look like this
{
"name" : "abc",
"Jobs" : [
{
"position" : "Systems Engineer (Data Analyst)",
"time" : [
"October 2014",
"May 2015"
],
"is_current" : 1,
"location" : "xyz",
"organization" : "xyz"
},
{
"position" : "Systems Engineer (MDM Support Lead)",
"time" : [
"January 2014",
"October 2014"
],
"is_current" : 1,
"location" : "xxx",
"organization" : "xxx"
},
{
"position" : "Asst. Systems Engineer (ETL Support Executive)",
"time" : [
"May 2012",
"December 2013"
],
"is_current" : 1,
"location" : "zzz",
"organization" : "xzx"
},
],
"location" : "Buffalo, New York",
"education" : [
{
"school" : "State University of New York at Buffalo - School of Management",
"major" : "Management Information Systems, General",
"degree" : "Master of Science (MS), "
},
{
"school" : "Rajiv Gandhi Prodyogiki Vishwavidyalaya",
"major" : "Electrical and Electronics Engineering",
"degree" : "Bachelor of Engineering (B.E.), "
}
],
"id" : "abc123",
"profile_link" : "example.com",
"html_source" : "<html> some_source_code </html>"
}
I am getting this error:
pymongo.errors.DuplicateKeyError: E11000 duplicate key error index:
Linkedin_DB.employee_info.$id dup key: { :
ObjectId('56b64f6071c54604f02510a8') }
When I run my program, the first document is inserted properly, but when I insert the second document I get this error. When I start my script again, the document that was not inserted because of this error is inserted properly, the error occurs for the next document, and so on.
Clearly MongoDB is using the same ObjectId for two inserts. I don't understand why MongoDB is failing to generate a unique ID for new documents.
My code to save passed data:
class Mongosave:
    """
    Pass collection_name and dict data
    This module stores the passed dict in collection
    """
    def __init__(self):
        self.connection = pymongo.MongoClient()
        self.db = self.connection['Linkedin_DB']

    def _exists(self, id):
        # To check if user already exists
        return True if list(self.collection.find({'id': id})) else False

    def save(self, collection_name, data):
        self.collection = self.db[collection_name]
        if not self._exists(data['id']):
            print (data['id'])
            self.collection.insert(data)
        else:
            self.collection.update({'id':data['id']}, {"$set": data})
I can't figure out why this is happening. Any help is appreciated.
The problem is that your save method is using a field called "id" to decide if it should do an insert or an update. You want to use "_id" instead. You can read about the _id field and index here. PyMongo automatically adds an _id to your document if one is not already present. You can read more about that here.
You might have inserted two copies of the same document into your collection in one run.
I don't quite understand what you mean by:
When I start my script again the document which was not inserted because of this error get inserted properly and error comes for next document and this continues.
What I do know is if you do:
from pymongo import MongoClient
client = MongoClient()
db = client['someDB']
collection = db['someCollection']
someDocument = {'x': 1}
for i in range(10):
    collection.insert_one(someDocument)
You'll get a:
pymongo.errors.DuplicateKeyError: E11000 duplicate key error index:
This makes me think that although pymongo generates a unique _id when you don't provide one, that _id is tied to the dict you pass in: insert_one stores the generated _id in the dict itself, so inserting the same dict object again sends the same _id a second time.
Try generating your own _id, or passing a fresh dict for each insert, and see whether it happens again.
Edit:
I just tried this and it works:
for i in range(10):
    collection.insert_one({'x':1})
This makes me think the _id pymongo generates is attached to the dict object you pass in. This time I'm not passing the same object on every iteration, and the problem disappeared.
Are you giving your database two references to the same object?
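A minimal sketch of one way to avoid resending an old _id, assuming the dict being saved may already carry an _id from an earlier insert (PyMongo's insert_one adds the generated _id to the dict it is given). The helper name without_id is hypothetical; it returns a copy with any top-level _id removed, leaving the caller's dict untouched.

```python
import copy


def without_id(document):
    """Return a deep copy of document with any top-level '_id' removed."""
    clean = copy.deepcopy(document)
    # pop() with a default does nothing if '_id' is not present.
    clean.pop("_id", None)
    return clean


doc = {"x": 1, "_id": "already-used"}
print(without_id(doc))  # {'x': 1}
```

In the loop above, this would correspond to calling collection.insert_one(without_id(someDocument)) on each iteration.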

How to Query this in MongoDB?

My items are stored in MongoDB like this:
{"ProductName":"XXXX",
"Catalogs" : [
{
"50008064" : "Apple"
},
{
"50010566" : "Box"
},
{
"50016422" : "Water"
}
]}
Now I want to query all the items that belong to catalog 50008064. How can I do that?
(The catalog id is "50008064" and the catalog name is "Apple".)
You cannot query this in an efficient manner and performance will decrease as your data grows. As such I would consider it a schema bug and you should refactor/migrate to the following model which does allow for indexing :
{"ProductName":"XXXX",
"Catalogs" : [
{
id : "50008064",
value : "Apple"
},
{
id : "50010566",
value : "Box"
},
{
id : "50016422",
value : "Water"
}
]}
And then index :
ensureIndex({'Catalogs.id':1})
Again, I strongly suggest you change your schema as this is a potential performance bottleneck you cannot fix any other way.
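For the migration itself, a small conversion function along these lines might do. It is a sketch only: the name convert_catalogs is hypothetical, and it assumes every old-style entry is a dict mapping a catalog id to a catalog name, as in the question.

```python
def convert_catalogs(catalogs):
    """Turn [{"50008064": "Apple"}, ...] into [{"id": "50008064", "value": "Apple"}, ...]."""
    converted = []
    for entry in catalogs:
        # Each old-style entry is a one-key dict; items() also copes with several keys.
        for catalog_id, name in entry.items():
            converted.append({"id": catalog_id, "value": name})
    return converted


old = [{"50008064": "Apple"}, {"50010566": "Box"}, {"50016422": "Water"}]
print(convert_catalogs(old)[0])  # {'id': '50008064', 'value': 'Apple'}
```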
This should probably work according to the entry here, although it won't be very fast, as stated in the link.
db.products.find({ "Catalogs.50008064" : { $exists: true } } )
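If the catalog id comes from a variable, the same $exists filter can be built as a plain dict in Python. This is a sketch with a hypothetical helper name, catalog_exists_filter; the dict it returns has the same shape as the shell query above.

```python
def catalog_exists_filter(catalog_id):
    """Build a filter matching documents whose Catalogs entries contain catalog_id as a key."""
    # The catalog id becomes part of the dotted field path.
    key = f"Catalogs.{catalog_id}"
    return {key: {"$exists": True}}


print(catalog_exists_filter("50008064"))  # {'Catalogs.50008064': {'$exists': True}}
```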
