Set schema to prevent nested values in MongoDB in Python

So I have an application that uses MongoDB as a database. The application makes use of a few collections.
When and how should I go about defining the "schema" of the database which includes setting up all the collections as well as indexes needed?
AFAIK, you are unable to define empty collections in MongoDB (correct me if I am wrong; if I can do this, it basically answers this question). Should I insert a dummy value for each collection and use that to set up all my indexes?
What is the best practice for this?

You don't create collections in MongoDB.
You just start using them immediately whether they “exist” or not.
Now to defining the “schema”. As I said, you just start using a collection, so if you need to ensure an index, just go ahead and do it. No collection creation: any collection is effectively created when you first modify it (creating an index counts). Note that the ensureIndex helper shown below is the legacy shell method; since MongoDB 3.0 it has been deprecated in favor of createIndex.
> db.no_such_collection.getIndices()
[ ]
> db.no_such_collection.ensureIndex({whatever: 1})
> db.no_such_collection.getIndices()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "test.no_such_collection",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"whatever" : 1
},
"ns" : "test.no_such_collection",
"name" : "whatever_1"
}
]
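The same implicit-creation behavior applies from Python with PyMongo. A minimal sketch (the live call needs a running mongod, so it is shown commented out; the helper just builds the createIndexes command document a driver would send):

```python
# Sketch: in PyMongo, creating an index on a collection that does not yet
# exist creates the collection as a side effect. The helper below builds
# the createIndexes command document; the live call is commented out
# because it requires a running server.

def create_index_command(collection, field, direction=1):
    """Build the createIndexes command for a single-field index."""
    return {
        "createIndexes": collection,
        "indexes": [{"key": {field: direction},
                     "name": f"{field}_{direction}"}],
    }

cmd = create_index_command("no_such_collection", "whatever")

# With a live server (hypothetical local mongod):
# from pymongo import MongoClient
# db = MongoClient().test
# db.no_such_collection.create_index("whatever")  # collection now exists
```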

Create empty collection
This is how you can create an empty collection in MongoDB using the built-in interactive shell:
db.createCollection('someName'); // create empty collection
You don't really have to, though: as noted above, collections get created on the fly once you start interacting with the database.
MongoDB is schema-less, end of story, but ...
You can create your own class that interacts with the Mongo database. In that class you can define rules that have to be fulfilled before data is inserted into a collection, and throw a custom exception otherwise.
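A minimal sketch of that idea in Python (the ValidatedCollection and SchemaError names are my own, not a library API; a real PyMongo Collection would take the place of the stand-in below):

```python
# Sketch of a thin application-level validation layer in front of a
# MongoDB collection. The rule format and class names are illustrative.

class SchemaError(Exception):
    """Raised when a document violates the application-level schema."""

class ValidatedCollection:
    def __init__(self, collection, required_fields):
        self._collection = collection          # e.g. a pymongo Collection
        self._required = set(required_fields)

    def insert_one(self, doc):
        missing = self._required - doc.keys()
        if missing:
            raise SchemaError("missing fields: %s" % sorted(missing))
        return self._collection.insert_one(doc)

# Usage with a stand-in collection that just records inserts:
class FakeCollection:
    def __init__(self):
        self.docs = []
    def insert_one(self, doc):
        self.docs.append(doc)

users = ValidatedCollection(FakeCollection(), ["name", "email"])
users.insert_one({"name": "Ann", "email": "ann@example.com"})  # passes
```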
Or, if you are using Node.js server-side, you can install the mongoose package, which lets you interact with the database in an OOP style (why reinvent the wheel, right?).
Mongoose provides a straight-forward, schema-based solution to model
your application data. It includes built-in type casting, validation,
query building, business logic hooks and more, out of the box.
docs: mongoose NPM installation and basic usage
https://www.npmjs.com/package/mongoose
mongoose full documentation http://mongoosejs.com
Mongoose use example (defining schema and inserting data)
var mongoose = require('mongoose');
var Schema = mongoose.Schema;

var personSchema = new Schema({
name: { type: String, default: 'anonymous' },
age: { type: Number, min: 18, index: true },
bio: { type: String, match: /[a-zA-Z ]/ },
date: { type: Date, default: Date.now },
});
var personModel = mongoose.model('Person', personSchema);
var comment1 = new personModel({
name: 'Witkor',
age: '29',
bio: 'Description',
});
comment1.save(function (err, comment) {
if (err) console.log(err);
else console.log('following comment was saved:', comment);
});
Wrapping up ...
Being able to define a schema with restrictions in our code doesn't change the fact that MongoDB itself is schema-less, which in some scenarios is actually an advantage. If you ever decide to change the schema and don't care about backward compatibility, you just edit the schema in your script and you are done. This is the basic idea behind MongoDB: being able to store different sets of data in each document within the same collection. Still, some restrictions in the code-base logic are usually desirable.

As of version 3.2, mongodb now provides schema validation at the collection level:
https://docs.mongodb.com/manual/core/schema-validation/

Example for create a schema :
db.createCollection("students", {
validator: {
$jsonSchema: {
bsonType: "object",
required: [ "name", "year", "major", "address" ],
properties: {
name: {
bsonType: "string",
description: "must be a string and is required"
},
year: {
bsonType: "int",
minimum: 2017,
maximum: 3017,
description: "must be an integer in [ 2017, 3017 ] and is required"
},
major: {
enum: [ "Math", "English", "Computer Science", "History", null ],
description: "can only be one of the enum values and is required"
},
gpa: {
bsonType: [ "double" ],
description: "must be a double if the field exists"
},
address: {
bsonType: "object",
required: [ "city" ],
properties: {
street: {
bsonType: "string",
description: "must be a string if the field exists"
},
city: {
bsonType: "string",
description: "must be a string and is required"
}
}
}
}
}
}
})
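The same validator can be applied from Python with PyMongo. A minimal sketch (the live create_collection call needs a running server, so it is commented out; the helper just builds a trimmed-down version of the validator document above):

```python
def student_validator():
    """Build a trimmed-down version of the $jsonSchema validator above."""
    return {
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["name", "year"],
            "properties": {
                "name": {
                    "bsonType": "string",
                    "description": "must be a string and is required",
                },
                "year": {
                    "bsonType": "int",
                    "minimum": 2017,
                    "maximum": 3017,
                    "description": "must be an integer in [2017, 3017]",
                },
            },
        }
    }

# With a live server (hypothetical local mongod):
# from pymongo import MongoClient
# db = MongoClient().test
# db.create_collection("students", validator=student_validator())
```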

const mongoose = require("mongoose");
const RegisterSchema = new mongoose.Schema({
  username: {
    type: String,
    unique: true,
    required: true,
  },
  email: {
    type: String,
    unique: true,
    required: true,
  },
  password: {
    type: String,
    required: true,
  },
  date: {
    type: Date,
    default: Date.now,
  },
});
module.exports = mongoose.model("Register", RegisterSchema);
I watched this tutorial.

You have already been taught that MongoDB is schemaless. In practice, however, we do have a kind of "schema": the object space of the application whose relations the MongoDB database represents. With the caveat that Ruby is my go-to language, and making no claim that this answer is exhaustive, I recommend trying two pieces of software:
1. ActiveRecord (part of Rails)
2. Mongoid (a standalone MongoDB "schema", or rather object persistence system, in Ruby)
Expect a learning curve, though. I hope that others will point you to solutions in other great languages outside my expertise, such as Python.

1.Install mongoose:
npm install mongoose
2. Set-up connection string and call-backs
// getting-started.js
var mongoose = require('mongoose');
mongoose.connect('mongodb://localhost/test');
//call-backs
var db = mongoose.connection;
db.on('error', console.error.bind(console, 'connection error:'));
db.once('open', function() {
// we're connected!
});
3. Write your schema
var kittySchema = new mongoose.Schema({
name: String
});
4. Model the schema
var Kitten = mongoose.model('Kitten', kittySchema);
5. Create a document
var silence = new Kitten({ name: 'Tom' });
console.log(silence.name); // Prints 'Tom' to console
// NOTE: methods must be added to the schema before compiling it with mongoose.model()
kittySchema.methods.speak = function () {
var greeting = this.name
? "Meow name is " + this.name
: "I don't have a name";
console.log(greeting);
}
var Kitten = mongoose.model('Kitten', kittySchema);
Functions added to the methods property of a schema get compiled into the Model prototype and exposed on each document instance:
var fluffy = new Kitten({ name: 'fluffy' });
fluffy.speak(); // "Meow name is fluffy"

Related

Azure vm provisioning and user assigned identity?

I am trying to apply a user-assigned identity to the VM I am creating with the Python SDK. The code:
print("Creating VM " + resource_name)
compute_client.virtual_machines.begin_create_or_update(
resource_group_name,
resource_name,
{
"location": "eastus",
"storage_profile": {
"image_reference": {
# Image ID can be retrieved from `az sig image-version show -g $RG -r $SIG -i $IMAGE_DEFINITION -e $VERSION --query id -o tsv`
"id": "/subscriptions/..image id"
}
},
"hardware_profile": {
"vm_size": "Standard_F8s_v2"
},
"os_profile": {
"computer_name": resource_name,
"admin_username": "adminuser",
"admin_password": "somepassword",
"linux_configuration": {
"disable_password_authentication": True,
"ssh": {
"public_keys": [
{
"path": "/home/adminuser/.ssh/authorized_keys",
# Add the public key for a key pair that can get access to SSH to the runners
"key_data": "ssh-rsa …"
}
]
}
}
},
"network_profile": {
"network_interfaces": [
{
"id": nic_result.id
}
]
},
"identity": {
"type": "UserAssigned",
"user_assigned_identities": {
"identity_id": { myidentity }
}
}
}
)
The last part, identity: I found somewhere on the web, (not sure where), but it is failing with some weird set/get error when I try to use it. The vm will create fine if I comment out the identity: block, but I need the user assigned identity. I spent the better part of today trying to find information on the options for the begin_create_or_update and info on the identity piece, but I have had no luck. I am looking for help on how to apply a user assigned identity with python to the vm I am creating.
The Set and Get error occurs because you are declaring the identity block in the wrong way.
If you have an existing User Assigned Identity, you can use an identity block like this:
"identity": {
"type": "UserAssigned",
"user_assigned_identities": {
'/subscriptions/948d4068-xxxxxxxxxxxxxxx/resourceGroups/ansumantest/providers/Microsoft.ManagedIdentity/userAssignedIdentities/mi-identity' : {}
}
}
As you can see, inside the user_assigned_identities it will be :
'User Assigned Identity ResourceID':{}
instead of
"identity_id":{'User Assigned Identity ResourceID'}
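In Python terms, the fix is only about the dictionary shape: the resource ID is the key, mapped to an empty dict. A sketch (the subscription path below is a placeholder, not a real resource):

```python
def identity_block(identity_resource_id):
    """Build the identity section for begin_create_or_update: the
    user-assigned identity's resource ID is the *key*, mapped to {}."""
    return {
        "type": "UserAssigned",
        "user_assigned_identities": {identity_resource_id: {}},
    }

# Placeholder resource ID (not a real subscription):
mi = ("/subscriptions/<sub-id>/resourceGroups/ansumantest/providers/"
      "Microsoft.ManagedIdentity/userAssignedIdentities/mi-identity")
params = {"location": "eastus", "identity": identity_block(mi)}
```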

How to iterate through a JSON file in Azure DevOps Pipelines?

Scene:
I am using Azure DevOps pipelines as a security separator, so that my front end is not directly accessing my AKS.
(Above is a business requirement I am not able to avoid or change in any way)
What I got so far:
I am able to put together an HTTP POST body with the information that I will be getting from my front end, and I am able to understand it and parse it as JSON inside the Azure DevOps pipeline (using Python).
Issue:
I must be able to iterate through each object in my JSON and execute the actions indicated.
JSON example:
{
"actions": [
{
"action": "action(0)",
"config": {
"actionType": "Start",
"stage": "test",
"region": "North",
"version": "v756",
"customer": "Hans"
}
},
{
"action": "action(1)",
"config": {
"actionType": "Stop",
"stage": "test",
"region": "East",
"version": "v752",
"customer": "Christian"
}
},
{
"action": "action(2)",
"config": {
"actionType": "Delete",
"stage": "prod",
"region": "South",
"version": "v759",
"customer": "Anderson"
}
}
]
}
** Edited (malformed JSON example)
TypeScript that generates my testing data
const value = {
actionType: "Create",
stage: "test",
region: "North",
version: "v753",
customer: "Hans"
}
interface Action {
readonly action: string;
readonly config: typeof value;
}
const actions: Array<Action> = [];
for (let i = 0; i < 3; i++) actions.push({
action: `action(${i})`,
config: value
})
const result = JSON.stringify({ actions });
const body = {
templateParameters: {
actions: {
value: result
}
}
}
** Edited: Added the TypeScript
Current pipeline
name: Test-Deploy-$(Date:yyyyMMdd)-$(Date:hh-mm)$(Rev:.r)

pool:
  vmImage: ubuntu-latest

parameters:
- name: actions
  type: object
  default: []

stages:
- stage: test_stage
  displayName: Test stage
  jobs:
  - job: test
    displayName: Test the values
    steps:
    - ${{ each action in parameters.actions }}:
      - task: PowerShell@2
        displayName: Print out the "Action" variable
        inputs:
          targetType: 'inline'
          script: '"${{ action }}"'
** Edited: Added the pipeline as it stands
My current thinking:
I would like to be able to iterate through the actions in a "for-each" fashion. Like in this pseudo pipeline script below:
- ${{ each action in $(actions) }}:
But I am not able to come up with exactly how that would be done in Azure DevOps Pipelines, so I am hoping that someone here can figure it out with me :)
${{ each }} is a template expression. They're intended for use with parameters rather than variables, because they are evaluated at compile time (so $(variablename) can't have a value yet).
Now, I've not actually tried this myself, but the Runs API has a JSON request body element called templateParameters, which takes an object. What you could try is something like this:
Add a parameter to your pipeline, like:
parameters:
- name: actions
type: object
default: []
When submitting the Runs API call to run your pipeline, include something like:
{
"previewRun": true,
"templateParameters": {
"actions": {
... your JSON actions content as generated in your question ...
}
},
... other parameters to your JSON run request
}
In your pipeline, reference
- ${{ each action in parameters.actions }}:
The previewRun parameter will cause Azure Pipelines to not run the pipeline, but to return the compiled template for debugging and validation purposes. Remove it when you're ready to execute it.
Also, you'll likely need to do some experimenting with templateParameters to get something acceptable to the pipeline, like
declaring it as an array "templateParameters": [ { "actions": [ your actions ] } ] (as I said, I haven't actually done this, but the documentation suggests this might be a good path to explore).
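A sketch of building that request body in Python (endpoint and auth omitted; the field names follow the Azure DevOps Runs API, and json is only used to show the final payload):

```python
import json

def runs_api_body(actions, preview=True):
    """Build the Runs API request body that passes `actions`
    through templateParameters."""
    body = {"templateParameters": {"actions": actions}}
    if preview:
        body["previewRun"] = True  # compile-only run for debugging
    return body

# Mirror of the question's generated test data:
actions = [{"action": f"action({i})",
            "config": {"actionType": "Start", "stage": "test"}}
           for i in range(3)]
payload = json.dumps(runs_api_body(actions))
```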

How to return a graphql Union from a python lambda to AWS appsync?

Is it possible to respond with graphql Union from a python lambda? How? It seems possible but I cannot get it over the line.
I am including a __typename attribute but most of the time I get this error:
{'errorType': 'BadRequestException',
'message': "Could not determine the exact type of MealMix. Missing __typename key on value.'"}
MealMix is my union, and the schema looks like:
type Meal { name: String }
type OtherMeal { name: String }
union MealMix = Meal | OtherMeal
type Query {
GetMealMix(identity_id: ID!, date: AWSDate!): [MealMix]
}
The query is:
query test_query {
GetMealMix(identity_id: "this", date: "2020-11-20") {
... on Meal {name}
... on OtherMeal {name}
}
}
From the lambda I am returning:
return [{' __typename': 'Meal',
'name': 'Kiwifruit, Zespri Gold, Raw'},
{' __typename': 'OtherMeal',
'name': 'The meal at the end of the universe'}]
The response template is the default: $util.toJson($ctx.result)
My googling seems to suggest I just need to include the __typename attribute, but there are no explicit Python examples to that effect. I'm asking first, am I flogging a dead horse (meaning this is not implemented and will never work), and second, if it does work, how exactly?
Problem
On one hand, I found examples where __typename, as you reference, may need to be included in your query.
On the other hand, it's also very interesting: the AWS documentation mentions nothing about including __typename in your query, but it does mention the use of interfaces, so I wonder if that's the key to getting things working: that all types extend the same interface.
Solution
Option 1
Try including __typename within Meal and OtherMeal fragments (you mentioned using __typename in your queries, but not sure where you were putting it).
Example
query test_query {
GetMealMix(identity_id: "this", date: "2020-11-20") {
... on Meal {
__typename
name
}
... on OtherMeal {
__typename
name
}
}
}
Option 2
Have all the types in the union implement the same interface, as demonstrated in the section of the documentation you shared titled "Type resolution example".
Example
interface MealInterface { name: String }
type Meal implements MealInterface { name: String }
type OtherMeal implements MealInterface { name: String }
Notes
Hardcoded Responses
You return a hardcoded response, but I'm unsure whether additional metadata is needed to process the GQL response on AWS Lambda. Try logging the response to determine whether __typename is included. If it is not, consider embedding the type name in the id field and using the transformation recommended in the AWS documentation you shared:
#foreach ($result in $context.result)
## Extract type name from the id field.
#set( $typeName = $result.id.split("-")[0] )
#set( $ignore = $result.put("__typename", $typeName))
#end
$util.toJson($context.result)
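For completeness, here is a sketch of what the Lambda side can look like in Python: a plain handler returning the union members with a clean __typename key (note there is no stray space in the key name, unlike the snippet in the question). The handler signature follows the usual AWS Lambda convention; the field values are made up:

```python
def handler(event, context):
    """AppSync Lambda resolver returning a list of MealMix union members.
    Each item carries __typename so AppSync can resolve the concrete type."""
    return [
        {"__typename": "Meal",
         "name": "Kiwifruit, Zespri Gold, Raw"},
        {"__typename": "OtherMeal",
         "name": "The meal at the end of the universe"},
    ]
```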
identity_id
Also, "this" in "identity_id" parameter may be causing caching issues due to the way GraphQL handles ID types (see: https://graphql.org/learn/schema/#scalar-types)
References
GraphQL __typename query example on AWS: https://github.com/LambdaSharp/AppSync-Challenge
GraphQL Scalar types: https://graphql.org/learn/schema/#scalar-types
You can write it in the way given below; it should work for you.
schema {
query: Query
}
type Query {
GetMealMix(identity_id: ID!, date: AWSDate!): [MealMix]
}
union MealMix = Meal | OtherMeal
type Meal {
name: String
}
type OtherMeal {
name: String
}
query {
GetMealMix(identity_id: "this", date: "2020-11-20") {
... on Meal {name}
... on OtherMeal {name}
}
}
You can take reference from the below URL:
Interfaces and unions in GraphQL

Unique index in mongodb using db.command

I am using MongoDB with PyMongo and I would like to separate the schema definition from the rest of the application. I have a file user_schema.json with the schema for the user collection:
{
"collMod": "user",
"validator": {
"$jsonSchema": {
"bsonType": "object",
"required": ["name"],
"properties": {
"name": {
"bsonType": "string"
}
}
}
}
}
Then in db.py:
with open("user_schema.json", "r") as coll:
data = OrderedDict(json.loads(coll.read())) # Read JSON schema.
name = data["collMod"] # Get name of collection.
db.create_collection(name) # Create collection.
db.command(data) # Add validation to the collection.
Is there any way to add unique index to name field in user collection without changing db.py (only by changing user_schema.json)? I know I can use this:
db.user.create_index("name", unique=True)
however, then I have information about the collection in two places. I would like to have all configuration of the collection in the user_schema.json file. I need something like that:
{
"collMod": "user",
"validator": {
"$jsonSchema": {
...
}
},
"index": {
"name": {
"unique": true
}
}
}
No, you won't be able to do that without changing db.py.
The content of user_schema.json is an object that can be passed to db.command to run the collMod command.
In order to create an index, you need to run the createIndexes command, or one of the helpers that calls this.
It is not possible to complete both of these actions with a single command.
A simple modification would be to store an array of commands to run in user_schema.json, and have db.py iterate the array and run each command.
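A sketch of that approach: user_schema.json holds an array of command documents, and db.py runs them in order. The FakeDB class below stands in for a real PyMongo Database object; object_pairs_hook keeps key order, which matters because MongoDB expects the command name to be the first key:

```python
import json
from collections import OrderedDict

# Example contents for user_schema.json: an *array* of command documents.
SCHEMA = """
[
  {"create": "user"},
  {"collMod": "user",
   "validator": {"$jsonSchema": {"bsonType": "object",
                                 "required": ["name"]}}},
  {"createIndexes": "user",
   "indexes": [{"key": {"name": 1}, "name": "name_1", "unique": true}]}
]
"""

def apply_schema(db, schema_json):
    """Run each command document in order, preserving key order."""
    for cmd in json.loads(schema_json, object_pairs_hook=OrderedDict):
        db.command(cmd)

class FakeDB:  # stand-in for pymongo.database.Database
    def __init__(self):
        self.commands = []
    def command(self, cmd):
        self.commands.append(cmd)
```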

ElasticSearch: Index only the fields specified in the mapping

I have an ElasticSearch setup, receiving data to index via a CouchDB river. I have the problem that most of the fields in the CouchDB documents are actually not relevant for search: they are fields internally used by the application (IDs and so on), and I do not want to get false positives because of these fields. Besides, indexing not needed data seems to me a waste of resources.
To solve this problem, I have defined a mapping where I specify the fields which I want to be indexed. I am using pyes to access ElasticSearch. The process that I follow is:
Create the CouchDB river, associated to an index. This apparently also creates the index, and creates a "couchdb" mapping in that index which, as far as I can see, includes all fields, with dynamically assigned types.
Put a mapping, restricting it to the fields which I really want to index.
This is the index definition as obtained by:
curl -XGET http://localhost:9200/notes_index/_mapping?pretty=true
{
"notes_index" : {
"default_mapping" : {
"properties" : {
"note_text" : {
"type" : "string"
}
}
},
"couchdb" : {
"properties" : {
"_rev" : {
"type" : "string"
},
"created_at_date" : {
"format" : "dateOptionalTime",
"type" : "date"
},
"note_text" : {
"type" : "string"
},
"organization_id" : {
"type" : "long"
},
"user_id" : {
"type" : "long"
},
"created_at_time" : {
"type" : "long"
}
}
}
}
}
The problems that I have are twofold:
the default "couchdb" mapping is indexing all fields, and I do not want this. Is it possible to avoid the creation of that mapping? I am confused, because that mapping seems to be the one which is somehow "connecting" to the CouchDB river.
the mapping that I create seems not to have any effect: there are no documents indexed by that mapping
Do you have any advice on this?
EDIT
This is what I am actually doing, exactly as typed:
server="localhost"
# Create the index
curl -XPUT "$server:9200/index1"
# Create the mapping
curl -XPUT "$server:9200/index1/mapping1/_mapping" -d '
{
"type1" : {
"properties" : {
"note_text" : {"type" : "string", "store" : "no"}
}
}
}
'
# Configure the river
curl -XPUT "$server:9200/_river/river1/_meta" -d '{
"type" : "couchdb",
"couchdb" : {
"host" : "localhost",
"port" : 5984,
"user" : "admin",
"password" : "admin",
"db" : "notes"
},
"index" : {
"index" : "index1",
"type" : "type1"
}
}'
The documents in index1 still contain fields other than "note_text", which is the only one that I have specifically mentioned in the mapping definition. Why is that?
The default behavior of CouchDB river is to use a 'dynamic' mapping, i.e. index all the fields that are found in the incoming CouchDB documents. You're right that it can unnecessarily increase the size of the index (your problems with search can be solved by excluding some fields from the query).
To use your own mapping instead of the 'dynamic' one, you need to configure the River plugin to use the mapping you've created (see this article):
curl -XPUT 'elasticsearch-host:9200/_river/notes_index/_meta' -d '{
"type" : "couchdb",
... your CouchDB connection configuration ...
"index" : {
"index" : "notes_index",
"type" : "mapping1"
}
}'
The name of the type that you specify in the URL when doing the mapping PUT overrides the one included in the definition body, so the type that you created is in fact mapping1. Try executing this command to see for yourself:
> curl 'localhost:9200/index1/_mapping?pretty=true'
{
"index1" : {
"mapping1" : {
"properties" : {
"note_text" : {
"type" : "string"
}
}
}
}
}
I think that if you get the name of the type right, it will start working fine.
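The key point can be sketched in Python: the type name in the river config must equal the effective type name from the mapping PUT URL, not the key inside the request body. A small consistency check (pure dictionaries, no live cluster; the river_config helper is my own):

```python
def river_config(index_name, type_name):
    """Build the CouchDB river _meta body (connection details omitted)."""
    return {
        "type": "couchdb",
        "index": {"index": index_name, "type": type_name},
    }

# The mapping was PUT to /index1/mapping1/_mapping, so the effective
# type name is "mapping1"; the "type1" key in the body was overridden.
effective_type = "mapping1"
river = river_config("index1", effective_type)
assert river["index"]["type"] == effective_type
```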
