pyArango driver for ArangoDB: validation - Python

I am using the pyArango driver (https://github.com/tariqdaouda/pyArango) for ArangoDB, but I cannot understand how the field validation works. I have set the fields of a collection as in the GitHub example:
import pyArango.Collection as COL
import pyArango.Validator as VAL
from pyArango.theExceptions import ValidationError
import types

class String_val(VAL.Validator) :
    def validate(self, value) :
        if type(value) is not types.StringType :
            raise ValidationError("Field value must be a string")
        return True

class Humans(COL.Collection) :
    _validation = {
        'on_save' : True,
        'on_set' : True,
        'allow_foreign_fields' : True  # allow fields that are not part of the schema
    }
    _fields = {
        'name' : COL.Field(validators = [VAL.NotNull(), String_val()]),
        'anything' : COL.Field(),
        'species' : COL.Field(validators = [VAL.NotNull(), VAL.Length(5, 15), String_val()])
    }
So I was expecting that when I try to add a document to the "Humans" collection and the 'name' field is not a string, an error would be raised. But it didn't seem to work that easily.
This is how I add documents to the collection:
import json

myjson = json.loads(open('file.json').read())
collection_name = "Humans"
bindVars = {"doc": myjson, '@collection': collection_name}
aql = "FOR d IN @doc INSERT d INTO @@collection LET newDoc = NEW RETURN newDoc"
queryResult = db.AQLQuery(aql, bindVars = bindVars, batchSize = 100)
So if 'name' is not a string, I actually don't get any error and the document is uploaded into the collection anyway.
Does anyone know how I can check whether a document contains the proper fields for a collection using the built-in validation of pyArango?

I don't see anything wrong with your validator; it's just that if you're using AQL queries to insert your documents, pyArango has no way of knowing the contents prior to insertion.
Validators only work on pyArango documents if you do:
humans = db["Humans"]
doc = humans.createDocument()
doc["name"] = 101
That should trigger the exception because you've defined:
'on_set': True
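For completeness, here is a minimal sketch of the document-based flow that does go through the validators (collection and field names are taken from the question above; the exact exception message may vary between pyArango versions):

humans = db["Humans"]
doc = humans.createDocument()
try :
    doc["name"] = 101            # 'on_set' validation fires here and rejects the int
except ValidationError as e :
    print("rejected on set: %s" % e)

doc["name"] = "Alice"
doc["species"] = "human being"   # 11 characters, passes VAL.Length(5, 15)
doc.save()                       # 'on_save' validation runs before the document is inserted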

ArangoDB as a document store doesn't enforce schemas itself, and neither do the drivers.
If you need schema validation, this can be done on top of the driver or inside of ArangoDB using a Foxx service (via the joi validation library).
One possible solution for doing this is to use JSON Schema with its Python implementation (the jsonschema package) on top of the driver in your application:
from jsonschema import validate

schema = {
    "type" : "object",
    "properties" : {
        "name" : {"type" : "string"},
        "species" : {"type" : "string"},
    },
}
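A short usage sketch (the sample document is invented for illustration): validating a candidate dict before handing it to the driver raises an error when a field has the wrong type:

from jsonschema import ValidationError as JSONValidationError

doc = {"name": 101, "species": "human being"}   # hypothetical document
try:
    validate(instance=doc, schema=schema)       # fails because "name" is not a string
except JSONValidationError as e:
    print("schema violation: %s" % e.message)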
Another real-life example using JSON Schema is swagger.io, which is also used to document the ArangoDB REST API and ArangoDB Foxx services.

I don't know yet what was wrong with the code I posted, but it now seems to work. However, I had to convert unicode to UTF-8 when reading the JSON file, otherwise it was not able to identify strings. I know ArangoDB itself does not enforce schemas, but I am using pyArango, which has built-in validation.
For those interested in built-in validation for ArangoDB using Python, visit the pyArango GitHub page.

Related

pymongo converts . variables into a dict

I am inserting data into a MongoDB collection through pymongo. I have logged all the information and data that is being sent to the update_one statement.
Data logged just before the update_one statement:
data = {'a': 'h9421976fc124d5756497d3b', 'b': 1611046532.4558306, 'kw_trigger_thing_name': 'ThingName.a', 'ThingName.a_capability_temperature': 44, 'ThingName.a_capability_humidity': '288', 'ThingName.a_kw_thing_name': 'ThingName.a'}
But when it got inserted into "test" then it got appended like this :
inserted_data = { "_id" : ObjectId("6005d317525e0d67866c564f"), "a" : "h9421976fc124d5756497d3b", "b" : 1611046532.4558306, "ThingName" : { "a_capability_humidity" : "288", "a_capability_temperature" : 44, "a_kw_thing_name" : "ThingName.a" } }
Using this to update the document:
collection.update_one({"a": "h9421976fc124d5756497d3b"}, {"$set": data}, upsert=True)
So here you'll see that the keys sharing the ThingName. prefix get converted into a nested dict in the Mongo collection under the key ThingName.
Why is this happening, and how can we override it?
That's perfectly valid behavior: when you update with thing.a: x, MongoDB treats the dot as a path and stores it as the object thing: { a: x }.
This falls under the field name restrictions: although MongoDB supports dots in field names in its latest versions, the drivers do not fully support them yet, hence the conversion still happens.
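A minimal sketch of the behavior (database, collection, and field names are made up for illustration): dotted keys in a $set are treated as paths into embedded documents, so only keys without dots stay flat:

from pymongo import MongoClient

client = MongoClient()               # assumes a local mongod
coll = client["testdb"]["test"]      # hypothetical database/collection

# Dotted key: stored as the nested document {"ThingName": {"a": 1}}
coll.update_one({"a": "x1"}, {"$set": {"ThingName.a": 1}}, upsert=True)

# Key without a dot: stored as a flat top-level field "ThingName_a"
coll.update_one({"a": "x2"}, {"$set": {"ThingName_a": 1}}, upsert=True)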

How do I use the SimpleQueryString function of Elasticsearch?

I am trying to write a Django app that uses Elasticsearch via the elasticsearch-dsl Python library. I don't want to write a pile of switch-case statements and then pass search queries and filters accordingly.
I want a function that does the parsing by itself.
For example, if I pass "some text url:github.com tags:es,es-dsl,django",
the function should output the corresponding query.
I searched for it in the elasticsearch-dsl documentation and found a function that does the parsing.
https://github.com/elastic/elasticsearch-dsl-py/search?utf8=%E2%9C%93&q=simplequerystring&type=
However, I don't know how to use it.
I tried s = Search(using=client).query.SimpleQueryString("1st|ldnkjsdb"), but it is showing me a parsing error.
Can anyone help me out?
You can just plug a SimpleQueryString into the Search object; instead of a dictionary, send the elements as parameters of the object.
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
from elasticsearch_dsl.query import SimpleQueryString

client = Elasticsearch()
_search = Search(using=client, index='INDEX_NAME')
_search = _search.filter(SimpleQueryString(
    query = "this + (that | thus) -those",
    fields = ["field_to_search"],
    default_operator = "and"
))
A lot of elasticsearch_dsl simply replaces the dictionary representation with classes and functions that make the code look Pythonic and avoid hard-to-read Elasticsearch JSON.
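To see that mapping, a small sketch building on the snippet above (the index and field names there are placeholders): Search.to_dict() shows the raw query body the DSL generates, and execute() runs it against the cluster:

print(_search.to_dict())     # the JSON body elasticsearch-dsl will send

response = _search.execute()
for hit in response:
    print(hit.meta.id, hit.to_dict())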
I'm guessing you are asking how to send a query string the way you would send JSON data to the Elasticsearch API directly. If that's the case, this is how you would do it:
Assume you have the query in a query variable like this:
query = {
    "query": {
        "query_string" : {
            "default_field" : "content",
            "query" : "this AND that OR thus"
        }
    }
}
and now do this:
from django.conf import settings        # Django settings holding the connection info
from elasticsearch import Elasticsearch

es = Elasticsearch(
    host=settings.ELASTICSEARCH_HOST_IP,    # Put your ES host IP
    port=settings.ELASTICSEARCH_HOST_PORT,  # Put your ES host port
)
index = settings.MY_INDEX  # Put your index name here
result = es.search(index=index, body=query)
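A short follow-up sketch for reading the response (the content field name comes from the query above): the raw client returns a plain dict, so the hits are nested under hits.hits:

for hit in result['hits']['hits']:
    print(hit['_id'], hit['_score'], hit['_source'].get('content'))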

How to bulk update a single field in Elasticsearch using Python

How do I perform a single-field bulk update in Elasticsearch? Currently, I am trying with the bulk helper as follows:
from elasticsearch import Elasticsearch, helpers

for doc in collection_1:
    try:
        uid = int(doc["uid"])
        k.append({
            "_index": "col1",
            "_type" : "col1",
            "_id" : uid,
            "_source": doc
        })
    except Exception as err:
        print(err)

helpers.bulk(es, k)
E.S version: 2.4
Please show your imports and your Elasticsearch version. Also, your code is missing a definition of the collection_1 variable. If you want to use the bulk helpers, you should follow the syntax described in the Python bulk helpers documentation.
You should specify an _op_type field in each action, set it to update, and then pass your partial document.
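A hedged sketch of what such an action could look like (index name, type, and uid come from the question; the updated field name is a made-up placeholder, and collection_1 is assumed to be defined):

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()                       # connection details are an assumption

actions = [
    {
        "_op_type": "update",              # partial update instead of a full reindex
        "_index": "col1",
        "_type": "col1",                   # _type is still required on ES 2.4
        "_id": int(doc["uid"]),
        "doc": {"some_field": doc["some_field"]},   # hypothetical field to change
    }
    for doc in collection_1
]

helpers.bulk(es, actions)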
You could also use a script to update just the required fields instead of the whole _source document:
"script": {
    "source": "ctx._source.datetime2 = new Date((long)(ctx._source.creationDate + 12600000));",
    "lang": "painless"
}
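For a single document, a hedged sketch of sending that script through the Python client (field names datetime2 and creationDate come from the snippet above; note that on Elasticsearch 2.x the script key is "inline" rather than "source" and Painless is not available, so this form targets a newer cluster):

es.update(
    index="col1",
    doc_type="col1",
    id=some_uid,                           # hypothetical document id
    body={
        "script": {
            "source": "ctx._source.datetime2 = new Date((long)(ctx._source.creationDate + 12600000));",
            "lang": "painless",
        }
    },
)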

Elasticsearch TransportError(400, u'parse_exception', u'Failed to derive xcontent')

I'm trying to add a new field to an existing document in Elasticsearch. I use the Elasticsearch Python API to do this.
My query is:
addField = { "script" : { "inline" : "ctx._source.langTranslation = test"}}
esclient.bulk(body = addField)
When I try this I get the error from the title: TransportError(400, u'parse_exception', u'Failed to derive xcontent').
I searched for how to avoid this error but have not found anything. What is wrong with my request?
Thanks for your answers.
I have found a solution: instead of using the bulk function, I use the update function:
esclient.update(index = hit['_index'], doc_type = hit['_type'], id = hit['_id'], body = {"doc": {"langTranslation": translation}})
And it works; I don't get any error message.
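For reference, a hedged sketch of what the same change could look like through the bulk helpers ('Failed to derive xcontent' usually means the bulk endpoint received a body that is not in its expected action/metadata format); hit and translation are the same variables used in the update call above:

from elasticsearch import helpers

actions = [{
    "_op_type": "update",
    "_index": hit['_index'],
    "_type": hit['_type'],
    "_id": hit['_id'],
    "doc": {"langTranslation": translation},
}]
helpers.bulk(esclient, actions)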

Best way to change the type of a column in DynamoDB

I have a DynamoDB table that is filled from different services/sources. The table has the following schema:
{
    "Id": 14782,
    "ExtId": 1478240974,  // pay attention, it is a Number
    "Name": "name1"
}
Some time after the services started working, I found that one service sends data in an incorrect format. It looks like this:
{
    "Id": 14782,
    "ExtId": "1478240974",  // pay attention, it is a String
    "Name": "name1"
}
DynamoDB is a NoSQL database, so now I have millions of mixed records that are difficult to query or scan. I understand that my main fault was missing validation.
Now I have to go through all the records and, if a record has the wrong type, remove it and re-add the same data in the correct format. Is it possible to do this in a more graceful way?
So it was pretty easy. It is possible to do this with the attribute_type condition method.
First of all, I added the imports:
from boto3.dynamodb.conditions import Attr
import boto3
And my code:
table = boto3.resource('dynamodb').Table('my_table')   # the table name here is an assumption

attr = Attr('ExtId').attribute_type('S')
response = table.scan(FilterExpression = attr)
items = response['Items']
while 'LastEvaluatedKey' in response:
    response = table.scan(FilterExpression = attr, ExclusiveStartKey = response['LastEvaluatedKey'])
    items.extend(response['Items'])
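Building on that scan, a hedged sketch of actually rewriting the mismatched items (it assumes Id is the table's partition key, as in the schema above; batching and error handling are left out):

for item in items:
    table.update_item(
        Key={'Id': item['Id']},
        UpdateExpression='SET ExtId = :n',
        ExpressionAttributeValues={':n': int(item['ExtId'])},   # store the value back as a Number
    )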
More condition customization can be found in the following article - DynamoDB Customization Reference.
