So I have an application that uses MongoDB as a database. The application makes use of a few collections.
When and how should I go about defining the "schema" of the database which includes setting up all the collections as well as indexes needed?
AFAIK, you are unable to define empty collections in MongoDB (correct me if I am wrong; if I can do this, it will basically answer this question). Should I insert a dummy value for each collection and use that to set up all my indexes?
What is the best practice for this?
You don't create collections in MongoDB.
You just start using them immediately whether they “exist” or not.
Now to defining the “schema”. As I said, you just start using a collection, so if you need to ensure an index, just go ahead and do it. No collection creation is needed: any collection will effectively be created when you first modify it (creating an index counts). Note that ensureIndex, used below, is the older, deprecated name for createIndex.
> db.no_such_collection.getIndices()
[ ]
> db.no_such_collection.ensureIndex({whatever: 1})
> db.no_such_collection.getIndices()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "test.no_such_collection",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "whatever" : 1
        },
        "ns" : "test.no_such_collection",
        "name" : "whatever_1"
    }
]
Create empty collection
This is how you could create an empty collection in MongoDB using the built-in interactive shell:
db.createCollection('someName'); // create empty collection
You don't really have to, though, because, as someone pointed out earlier, collections get created on the fly once you start interacting with the database.
MongoDB is schema-less, end of story, but ...
You could create your own class that interacts with the MongoDB database. In that class you could define rules that have to be fulfilled before it can insert data into a collection, and otherwise throw a custom exception.
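As a rough illustration of the "own class with rules" idea, here is a minimal Python sketch. The names ValidatedCollection, ValidationError, InMemoryCollection and the rules format are hypothetical, invented for this example. The wrapped object only needs an insert_one method (which a PyMongo collection provides); an in-memory stand-in is used so the sketch runs without a database.

```python
class ValidationError(Exception):
    """Hypothetical custom exception raised when a document breaks a rule."""


class ValidatedCollection:
    """Hypothetical wrapper: checks the rules, then delegates to insert_one()."""

    def __init__(self, collection, rules):
        self._collection = collection
        self._rules = rules  # maps field name -> (expected type, required flag)

    def insert(self, document):
        for field, (expected_type, required) in self._rules.items():
            if field not in document:
                if required:
                    raise ValidationError("missing required field: " + field)
                continue
            if not isinstance(document[field], expected_type):
                raise ValidationError("wrong type for field: " + field)
        return self._collection.insert_one(document)


class InMemoryCollection:
    """Stand-in for a real collection so the example needs no server."""

    def __init__(self):
        self.documents = []

    def insert_one(self, document):
        self.documents.append(document)
        return document


store = InMemoryCollection()
people = ValidatedCollection(store, {"name": (str, True), "age": (int, False)})

people.insert({"name": "Witkor", "age": 29})  # satisfies both rules
try:
    people.insert({"age": 29})  # no "name" field, so it is rejected
except ValidationError as exc:
    print("rejected:", exc)
```

Keeping the rules in a plain dictionary is only one possible design; the point is that the check happens in application code, before anything reaches the database.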
Or, if you are using Node.js server-side, you could install the mongoose npm package, which allows you to interact with the database in an OOP style (why bother reinventing the wheel, right?).
Mongoose provides a straight-forward, schema-based solution to model
your application data. It includes built-in type casting, validation,
query building, business logic hooks and more, out of the box.
docs: mongoose NPM installation and basic usage
https://www.npmjs.com/package/mongoose
mongoose full documentation http://mongoosejs.com
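Mongoose usage example (defining a schema and inserting data)
// Assumes mongoose.connect(...) has already been called
var mongoose = require('mongoose');
var Schema = mongoose.Schema;

// Field types, defaults, and validation rules
var personSchema = new Schema({
    name: { type: String, default: 'anonymous' },
    age: { type: Number, min: 18, index: true },
    bio: { type: String, match: /[a-zA-Z ]/ },
    date: { type: Date, default: Date.now },
});
var personModel = mongoose.model('Person', personSchema);
var comment1 = new personModel({
    name: 'Witkor',
    age: '29', // the string is cast to a Number by Mongoose
    bio: 'Description',
});
// save() returns a promise when called without a callback
comment1.save()
    .then(function (comment) { console.log('following comment was saved:', comment); })
    .catch(function (err) { console.log(err); });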
Wrapping up ...
Being able to set a schema along with restrictions in our code doesn't change the fact that MongoDB itself is schema-less, which in some scenarios is actually an advantage. If you ever decide to change the schema and don't care about backward compatibility, just edit the schema in your script and you are done. This is the basic idea behind MongoDB: being able to store different sets of data in each document within the same collection. However, some restrictions in the code base logic are always desirable.
As of version 3.2, MongoDB provides document validation at the collection level (the $jsonSchema operator used in the example below was added in version 3.6):
https://docs.mongodb.com/manual/core/schema-validation/
Example of creating a collection with a schema:
db.createCollection("students", {
    validator: {
        $jsonSchema: {
            bsonType: "object",
            required: [ "name", "year", "major", "address" ],
            properties: {
                name: {
                    bsonType: "string",
                    description: "must be a string and is required"
                },
                year: {
                    bsonType: "int",
                    minimum: 2017,
                    maximum: 3017,
                    description: "must be an integer in [ 2017, 3017 ] and is required"
                },
                major: {
                    enum: [ "Math", "English", "Computer Science", "History", null ],
                    description: "can only be one of the enum values and is required"
                },
                gpa: {
                    bsonType: [ "double" ],
                    description: "must be a double if the field exists"
                },
                address: {
                    bsonType: "object",
                    required: [ "city" ],
                    properties: {
                        street: {
                            bsonType: "string",
                            description: "must be a string if the field exists"
                        },
                        city: {
                            bsonType: "string",
                            description: "must be a string and is required"
                        }
                    }
                }
            }
        }
    }
})
const mongoose = require("mongoose");

// "required" makes Mongoose reject documents that lack the field
const RegisterSchema = new mongoose.Schema({
    username: {
        type: String,
        unique: true,
        required: true,
    },
    email: {
        type: String,
        unique: true,
        required: true,
    },
    password: {
        type: String,
        required: true,
    },
    date: {
        type: Date,
        default: Date.now,
    },
});

// Export the compiled model so other modules can require() it
module.exports = mongoose.model("Register", RegisterSchema);
I watched this tutorial.
You have already been told that MongoDB is schemaless. In practice, however, we do have a kind of "schema": the object model of the application, whose relations the MongoDB database represents. With the caveat that Ruby is my go-to language, and that I make no claims about the exhaustiveness of this answer, I recommend trying two pieces of software:
1. ActiveRecord (part of Rails)
2. Mongoid (standalone MongoDB "schema", or rather, object persistence system in Ruby)
Expect a learning curve, though. I hope that others will point you to solutions in other great languages outside my expertise, such as Python.
1. Install mongoose:
npm install mongoose
2. Set up the connection string and callbacks
// getting-started.js
var mongoose = require('mongoose');
mongoose.connect('mongodb://localhost/test');
// callbacks
var db = mongoose.connection;
db.on('error', console.error.bind(console, 'connection error:'));
db.once('open', function() {
    // we're connected!
});
3. Write your schema
var kittySchema = new mongoose.Schema({
    name: String
});
4. Model the schema
var Kitten = mongoose.model('Kitten', kittySchema);
5. Create a document
var silence = new Kitten({ name: 'Tom' });
console.log(silence.name); // Prints 'Tom' to console
// NOTE: methods must be added to the schema before compiling it with mongoose.model()
kittySchema.methods.speak = function () {
    var greeting = this.name
        ? "Meow name is " + this.name
        : "I don't have a name";
    console.log(greeting);
};
// Compile the model only after the methods are defined (do this in place of the step 4 call)
var Kitten = mongoose.model('Kitten', kittySchema);
Functions added to the methods property of a schema get compiled into the Model prototype and exposed on each document instance:
var fluffy = new Kitten({ name: 'fluffy' });
fluffy.speak(); // "Meow name is fluffy"
I have been working with the Zapier storage api through the store.zapier.com endpoint and have been successful at setting and retrieving values. However I have recently found a need to store more complex information that I would like to update over time.
The data I am storing at the moment looks like the following:
{
    "task_id_1": {"google_id": "google_id_1", "due_on": "2018-10-24T17:00:00.000Z"},
    "task_id_2": {"google_id": "google_id_2", "due_on": "2018-10-23T20:00:00.000Z"},
    "task_id_3": {"google_id": "google_id_3", "due_on": "2018-10-25T21:00:00.000Z"}
}
What I would like to do is update the "due_on" child value of any arbitrary task_id_n without having to delete and add it again. Reading the API information at store.zapier.com, I see that you can send a PATCH request combined with a specific action to get better control over the stored data. I attempted to use a PATCH request with the "set_child_value" action as follows:
def update_child(self, parent_key, child_key, child_value):
    header = self.generate_header()
    data = {
        "action" : "set_child_value",
        "data" : {
            "key" : parent_key,
            "value" : {child_key : child_value}
        }
    }
    result = requests.patch(self.URL, headers=header, json=data)
    return result
When I send this request Zapier responds with a 200 status code but the storage is not updated. Any ideas what I might be missing?
Zapier Store doesn't seem to be validating the request body past the "action" and "data" fields.
When you make a request with the "data" field set to an array, you trigger a validation error that describes the schema for the data field (What a way to find documentation for an API! smh).
In the request body, the data field schema for "set_child_value" action is:
{
    "action" : {
        "enum": [
            "delete",
            "increment_by",
            "set_child_value",
            "list_pop",
            "set_value_if",
            "remove_child_value",
            "list_push"
        ]
    },
    "data" : {
        "key" : {
            "type": "object"
        },
        "values" : {
            "type": "object"
        }
    }
}
Note that it's "values" and not "value"
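A minimal sketch of the corrected request body, based only on the schema reported above and not verified against the live API. The helper name build_set_child_payload and the new due date are hypothetical; the sketch just builds and prints the body, and the commented line shows where the question's requests.patch call would use it.

```python
import json


def build_set_child_payload(parent_key, child_key, child_value):
    """Build a PATCH body that uses "values" (plural) inside "data"."""
    return {
        "action": "set_child_value",
        "data": {
            "key": parent_key,
            "values": {child_key: child_value},
        },
    }


# Hypothetical new due date for one of the tasks from the question
payload = build_set_child_payload("task_id_1", "due_on", "2018-10-26T17:00:00.000Z")
print(json.dumps(payload, indent=2))

# In the question's client this would be sent as:
# requests.patch(self.URL, headers=header, json=payload)
```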
I was able to update specific child values by modifying my request from a PATCH to a PUT. I had to do away with the data structure of:
data = {
    "action" : "set_child_value",
    "data" : {
        "key" : parent_key,
        "value" : {child_key : child_value}
    }
}
and instead send it along as:
data = {
    parent_key : {child_key : child_value}
}
My updated request looks like:
def update_child(self, parent_key, child_key, child_value):
    header = self.generate_header()
    data = {
        parent_key : {child_key : child_value}
    }
    result = requests.put(self.URL, headers=header, json=data)
    return result
I never really resolved the issue with the PATCH method I was attempting before, although it does work for other Zapier storage methods such as "pop_from_list" and "push_to_list". Anyhow, this is a suitable solution for anyone who runs into the same problem.
I've got a fairly long MongoDB query which I have been using in the console:
db.addresses.find({
    $and: [ {"date": {$gte: "2017-06-01"}},
            {"date": {$lt: "2017-06-60"}},
            { $or: [{"address.postal_code" : { $regex: /^SW1 /i } },
                    {"address.postal_code" : { $regex: /^SW2 /i } },
                    {"address.postal_code" : { $regex: /^SW3 /i } },
                    {"address.postal_code" : { $regex: /^SW4 /i } }
            ]}
    ]})
I now need to use this in Python using PyMongo but am understandably getting a:
SyntaxError: invalid syntax
As it's fairly complex, is there an easy way to escape it? I've got several queries like this to use. I've tried enclosing the above query like this:
"""The query from above between the brackets"""
addresses_to_process = db.addresses.find(query)
But I get an error:
TypeError: filter must be an instance of dict, bson.son.SON, or other type that inherits from collections.Mapping
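The TypeError arises because a triple-quoted string is still a string, while find() expects a mapping. One possible approach, sketched below, is to write the filter as a Python dict: operator names such as $and become quoted keys, and the shell's /^SW1 /i literals become compiled patterns from the standard re module, which PyMongo accepts as regular-expression values. The helper name build_postcode_query is hypothetical, and the find() call is left commented out because it needs a live db handle.

```python
import re


def build_postcode_query(start_date, end_date, prefixes):
    """Build a filter dict equivalent to the shell query above."""
    return {
        "$and": [
            {"date": {"$gte": start_date}},
            {"date": {"$lt": end_date}},
            {"$or": [
                # "^SW1 " with the i flag, one clause per prefix
                {"address.postal_code": re.compile("^" + re.escape(p) + " ", re.IGNORECASE)}
                for p in prefixes
            ]},
        ]
    }


# Same literal values as in the shell query
query = build_postcode_query("2017-06-01", "2017-06-60", ["SW1", "SW2", "SW3", "SW4"])

# addresses_to_process = db.addresses.find(query)  # requires a PyMongo database handle
```

Building the filter in a function also helps when there are several similar queries, since only the dates and prefixes change.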
I am trying to make an aggregation query using flask-mongoengine, and from what I have read it does not sound like it is possible.
I have looked over several forum threads, e-mail chains and a few questions on Stack Overflow, but I have not found a really good example of how to implement aggregation with flask-mongoengine.
There is a comment in this question that says you have to use "raw pymongo and aggregation functionality." However, there are no examples of how that might work. I have tinkered with Python and have a basic application up using the Flask framework, but delving into full-fledged applications and connecting to and querying Mongo is pretty new to me.
Can someone provide an example (or link to an example) of how I might utilize my flask-mongoengine models, but query using the aggregation framework with PyMongo?
Will this require two connections to MongoDB (one for PyMongo to perform the aggregation query, and a second for the regular query/insert/updating via MongoEngine)?
An example of the aggregation query I would like to perform is as follows (this query gets me exactly the information I want in the Mongo shell):
db.entry.aggregate([
    { '$group' :
        { '_id' : { 'carrier' : '$carrierA', 'category' : '$category' },
          'count' : { '$sum' : 1 }
        }
    }
])
An example of the output from this query:
{ "_id" : { "carrier" : "Carrier 1", "category" : "XYZ" }, "count" : 2 }
{ "_id" : { "carrier" : "Carrier 1", "category" : "ABC" }, "count" : 4 }
{ "_id" : { "carrier" : "Carrier 2", "category" : "XYZ" }, "count" : 31 }
{ "_id" : { "carrier" : "Carrier 2", "category" : "ABC" }, "count" : 6 }
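For readers less familiar with $group, the following plain-Python sketch mimics what the pipeline computes: it counts documents per (carrierA, category) pair and emits rows shaped like the output above. It is only an illustration of the semantics, not a replacement for running the aggregation in MongoDB, and the sample documents are hypothetical.

```python
from collections import Counter

# Hypothetical sample documents; the field names follow the pipeline above
entries = [
    {"carrierA": "Carrier 1", "category": "XYZ"},
    {"carrierA": "Carrier 1", "category": "XYZ"},
    {"carrierA": "Carrier 2", "category": "ABC"},
]

# Same grouping key as '_id', with one count per document like {'$sum': 1}
counts = Counter((entry["carrierA"], entry["category"]) for entry in entries)

results = [
    {"_id": {"carrier": carrier, "category": category}, "count": n}
    for (carrier, category), n in counts.items()
]
for row in results:
    print(row)
```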
The class you define with Mongoengine actually has a _get_collection() method, which returns the "raw" collection object as implemented in the pymongo driver.
I'm just using the name Model here as a placeholder for your actual class defined for the connection in this example:
Model._get_collection().aggregate([
    { '$group' :
        { '_id' : { 'carrier' : '$carrierA', 'category' : '$category' },
          'count' : { '$sum' : 1 }
        }
    }
])
So you can always access the pymongo objects without establishing a separate connection. Mongoengine is itself built upon pymongo.
aggregate is available since Mongoengine 0.9.
Link to the API Reference.
As there is no example whatsoever around, here is how you perform an aggregate query using the aggregation framework with Mongoengine >= 0.9:
pipeline = [
    { '$group' :
        { '_id' : { 'carrier' : '$carrierA', 'category' : '$category' },
          'count' : { '$sum' : 1 }
        }
    }]
Model.objects().aggregate(*pipeline)
I have a token saved in MongoDB like this:
db.user.findOne({'token':'7fd74c28-8ba1-11e2-9073-e840f23c81a0'}['uuid'])
{
    "_id" : ObjectId("5140114fae4cb51773d8c4f8"),
    "username" : "jjj51#gmail.com",
    "name" : "vivek",
    "mobile" : "12345",
    "is_active" : false,
    "token" : BinData(3,"hLL6kIugEeKif+hA8jyBoA==")
}
The above query works fine when I execute it in the MongoDB command-line interface.
When I try to run the same query in a Django view, like
get_user = db.user.findOne({'token':token}['uuid'])
or `get_user = db.user.findOne({'token':'7fd74c28-8ba1-11e2-9073-e840f23c81a0'}['uuid'])`
I get an error:
KeyError at /activateaccount/
'uuid'
Please help me understand why I am getting this error.
My database
db.user.find()
{ "_id" : ObjectId("5140114fae4cb51773d8c4f8"), "username" : "ghgh#gmail.com", "name" : "Rohit", "mobile" : "12345", "is_active" : false, "token" : BinData(3,"hLL6kIugEeKif+hA8jyBoA==") }
{ "_id" : ObjectId("51401194ae4cb51773d8c4f9"), "username" : "ghg#gmail.com", "name" : "rohit", "mobile" : "12345", "is_active" : false, "token" : BinData(3,"rgBIMIugEeKQBuhA8jyBoA==") }
{ "_id" : ObjectId("514012fcae4cb51874ca3e6f"), "username" : "ghgh#gmail.com", "name" : "rahul", "mobile" : "8528256", "is_active" : false, "token" : BinData(3,"f9dMKIuhEeKQc+hA8jyBoA==") }
TL;DR your query is faulty.
Longer explanation:
{'token':'7fd74c28-8ba1-11e2-9073-e840f23c81a0'}['uuid']
translates to undefined, because you're trying to get the property uuid from an object that doesn't have that property. In the Mongo shell, which uses JavaScript, that translates to the following query:
db.user.findOne(undefined)
You'll get some random (okay, not so random, probably the first) result.
Python is a bit more strict when you're trying to get an unknown key from a dictionary:
{'token':token}['uuid']
Since uuid isn't a valid key in the dictionary {'token':token}, you'll get a KeyError when you try to access it.
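The difference is easy to reproduce without a database. This small sketch uses only a plain dictionary with the token string from the question; subscripting with a missing key raises, while .get() returns None.

```python
token = "7fd74c28-8ba1-11e2-9073-e840f23c81a0"
query = {"token": token}

try:
    query["uuid"]  # subscripting with a missing key raises KeyError
except KeyError as exc:
    print("KeyError:", exc)

missing = query.get("uuid")  # .get() returns None instead of raising
print(missing)
```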
EDIT: since you've used Python UUID types to store the tokens in the database, you also need to use the same type in your query:
from uuid import UUID
token = '7fd74c28-8ba1-11e2-9073-e840f23c81a0'
get_user = db.user.find_one({'token' : UUID(token) })
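To see why the plain string cannot match the stored value, the sketch below (standard library only) base64-encodes the 16 raw bytes of the UUID, which is how the shell displays a BinData payload, and compares the result with the third document in the question's db.user.find() output. It assumes the tokens were stored as the UUID's raw bytes in standard order, which the data in the question appears to confirm. Depending on the driver version, a UUID representation may also need to be configured on the client before UUID values can be encoded.

```python
import base64
from uuid import UUID

token = "7fd74c28-8ba1-11e2-9073-e840f23c81a0"

# The 16 raw bytes of the UUID, base64-encoded as the shell prints BinData
encoded = base64.b64encode(UUID(token).bytes).decode("ascii")
print(encoded)

# BinData payload of the third document in the question's db.user.find() output
stored = "f9dMKIuhEeKQc+hA8jyBoA=="
print(encoded == stored)
```

This also shows that the document returned by the shell's findOne in the question (token hLL6kIugEeKif+hA8jyBoA==) was not the one belonging to this token.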