Postgres/SQLAlchemy querying JSON array of objects - python

I have a table that looks like
class Domain(db.Model):
    dns = db.Column(db.JSON)
    ...
and data that looks like
[
    {
        "type": "CNAME",
        "answers": [...]
    },
    {
        "type": "NS",
        "answers": [...]
    }, ...
]
I want to write a query using SQLAlchemy that returns rows where the array in dns contains at least one object whose type key matches one of several parameters.
I've tried the following but got an error saying "AttributeError: 'Comparator' object has no attribute 'any'"
my_values = ['value1', 'value2', 'value3']
query = session.query(Domain).filter(
    Domain.dns.any(
        text("json_array_elements(dns)->>'type' IN :my_values")
    )
).params(my_values=my_values)
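One possible approach (a sketch, not from the original post): the plain JSON comparator has no .any() (that belongs to ARRAY columns, hence the AttributeError), but if the column is cast to JSONB you can use the containment operator (@>) once per candidate value and OR the conditions together. This assumes PostgreSQL and that the column holds a JSON array of objects as shown above; 'value1' etc. stand in for real type values such as 'CNAME' or 'NS'.
from sqlalchemy import cast, or_
from sqlalchemy.dialects.postgresql import JSONB

my_values = ['value1', 'value2', 'value3']

# jsonb_column @> '[{"type": "value1"}]' is true when at least one element
# of the array contains {"type": "value1"}; OR the checks across all values.
query = session.query(Domain).filter(
    or_(*[
        cast(Domain.dns, JSONB).contains([{"type": v}])
        for v in my_values
    ])
)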

Related

Get Single value from MongoDB using Python

I have a Team class with a __find method to query MongoDB and get a single record:
class Team:
    def __find(self, key):
        team_document = self._db.get_single_data(TeamModel.TEAM_COLLECTION, key)
        return team_document
A team document will look like this:
{
    "_id": {
        "$oid": "62291a3deb9a30c9e3cf5d28"
    },
    "name": "Warriors of Hell",
    "race": "wizards",
    "matches": [
        {
            "MTT001": "won"
        },
        {
            "MCH005": "lost"
        }
    ]
}
My __find method gives me a full document if I pass a query parameter like
{'name':'Warriors of Hell'}
Is there a way I can create a query into which I can pass both the name and a match ID and get ONLY the result back? Something like:
{'name': 'Warriors of Hell', 'match_id': 'MTT001'}
and get back
won
I do not know how to implement this query to look inside the team matches. Can any MongoDB/Python guru help me please?
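One possible approach (a sketch, not from the original post): query on the dotted path matches.<match_id> and use the positional projection matches.$ so MongoDB returns only the matching array element. This assumes direct access to a pymongo collection (called teams here, a hypothetical name) rather than going through the __find wrapper.
def find_match_result(teams, team_name, match_id):
    # match the team by name, requiring an element keyed by the given match id,
    # and project only that element of the "matches" array
    doc = teams.find_one(
        {"name": team_name, f"matches.{match_id}": {"$exists": True}},
        {"matches.$": 1},
    )
    if doc is None:
        return None
    return doc["matches"][0][match_id]

# usage:
# find_match_result(db.teams, "Warriors of Hell", "MTT001")  # -> "won"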

How to append a new array of values to an existing array document in mongodb using pymongo?

An existing collection looks like the below:
"_id" : "12345",
"vals" : {
    "dynamickey1" : {}
}
I need to add
"vals" : {
"dynamickey2" : {}
}
I have tried in python 2.7 with pymongo 2.8:
col.update({'_id': id}, {'$push': {'vals': {"dynamickey2": {"values"}}}})
Error log:
pymongo.errors.OperationFailure: The field 'vals' must be an array but is of type object in document
Expected Output:
"_id" : "12345",
"vals" : {
"dynamickey1" : {},
"dynamickey2" : {}
}
Edited following question edit:
Two options: use $set with dot notation, or use Python dict manipulation.
The first method is more MongoDB-native and is one line of code; the second is a bit more work but gives more flexibility if your use case is more nuanced.
Method 1:
from pymongo import MongoClient
from bson.json_util import dumps

db = MongoClient()['mydatabase']
db.mycollection.insert_one({
    "_id": "12345",
    "vals": {
        "dynamickey1": {},
    }
})
db.mycollection.update_one({'_id': '12345'}, {'$set': {'vals.dynamickey2': {}}})
print(dumps(db.mycollection.find_one({}), indent=4))
Method 2:
from pymongo import MongoClient
from bson.json_util import dumps

db = MongoClient()['mydatabase']
db.mycollection.insert_one({
    "_id": "12345",
    "vals": {
        "dynamickey1": {},
    }
})
record = db.mycollection.find_one({'_id': '12345'})
vals = record['vals']
vals['dynamickey2'] = {}
record = db.mycollection.update_one({'_id': record['_id']}, {'$set': {'vals': vals}})
print(dumps(db.mycollection.find_one({}), indent=4))
Either way gives:
{
    "_id": "12345",
    "vals": {
        "dynamickey1": {},
        "dynamickey2": {}
    }
}
Previous answer
Your expected output has an object with duplicate fields (vals); this isn't allowed, so whatever you are trying to do, it isn't going to work.

how to write validator method to validate the json element data with python

I'm new to Python and trying to write a Python script that uses jsonschema to validate the schema of a huge JSON output file. I want to make sure my JSON file doesn't have any null values in it.
I wrote a method to read the schema file and the output file, then passed them both to the validate function. There are many repeating objects in the JSON file, so I realized I should write a validator function/class that takes each object and validates them in a loop, but I'm stuck and not sure how to do that.
{
    "id": "test",
    "name": "name",
    "cake_name": "test",
    "metric": 0.5,
    "anticipations": [
        {
            "time": "2018-01-01 00:00:00",
            "points": 0.49128797804879504,
            "top_properties": {
                "LA:TB2341": 0.23,
                "LA:TB2342": 0.23,
                "LA:TB2343": 0.23
            },
            "status": 0,
            "alert": false
        },
        {
            "time": "2018-01-02 00:00:00",
            "points": 0.588751186433263,
            "top_properties": {
                "LA:TB2342": 0.23,
                "LA:TB2341": 0.23,
                "LA:TB2344": 0.23
            },
            "status": 0,
            "alert": true
        }
    ]
}
PS: The corresponding schema file was generated from "https://jsonschema.net/"; that is my moduleschema.json, and the JSON above is modelout.json.
Code I wrote just to read files:
def test_json(self):
    with open('/Users/moduleschema.json', 'r') as json_file:
        schema = json_file.read()
        print(schema)
    with open('/Users/modelout.json', 'r') as output_json:
        outputfile = output_json.read()
        print(outputfile)
    strt = jsonschema.validate(outputfile, schema)
    jsonschema.Draft4Validator
    print(strt)
I want to parse through the JSON file to make sure all the fields have the right types (ints for ints, strings for string values). I'm a newbie in Python, so forgive me if this is a silly question. Thanks!
I am going to give an answer that relies on a third-party package that I really like. I did not contribute to it, but I have used it, and it is very useful for exactly this type of validation.
Yes, you can create a custom validator like this:
import json
import typing

# here json_data is the data in your question
def custom_validator(json_data: typing.Dict):
    string_attributes = ["id", "name", "cake_name", "status", "time", "LA:TB2342", "LA:TB2341", "LA:TB2344"]
    int_attributes = [...]
    float_attributes = [...]
    validations_errors = []
    for attribute in string_attributes:
        if attribute in json_data:
            if attribute in string_attributes and not isinstance(json_data.get(attribute), str):
                validations_errors.append(f"key {attribute} is not a string, got {json_data.get(attribute)}")
    ...
This can quickly get out of hand. Perhaps you can spend more time to make it pretty etc.
BUT, I highly suggest that you read up on dataclasses and pydantic
Here is the solution I would use:
import json
import typing
from pydantic import BaseModel

# if you look closely, this just represents those tiny dictionaries in your list
class Anticipation(BaseModel):
    time: str
    points: float
    top_properties: typing.Dict[str, float]
    status: int
    alert: bool

# this is the whole thing; note how we say that anticipations is a list of those objects we defined above
class Data(BaseModel):
    id: str
    name: str
    cake_name: str
    metric: float
    anticipations: typing.List[Anticipation]
json_data = """{
    "id": null,
    "name": "name",
    "cake_name": "test",
    "metric": 0.5,
    "anticipations": [
        {
            "time": "2018-01-01 00:00:00",
            "points": 0.49128797804879504,
            "top_properties": {
                "LA:TB2341": 0.23,
                "LA:TB2342": 0.23,
                "LA:TB2343": 0.23
            },
            "status": 0,
            "alert": false
        },
        {
            "time": "2018-01-02 00:00:00",
            "points": 0.588751186433263,
            "top_properties": {
                "LA:TB2342": 0.23,
                "LA:TB2341": 0.23,
                "LA:TB2344": 0.23
            },
            "status": null,
            "alert": true
        }
    ]
}
"""
data = json.loads(json_data)
data = Data(**data)
I changed id to null and status to null in the last anticipation. If you run this, it will fail and show you the message below, which is fairly useful:
pydantic.error_wrappers.ValidationError: 2 validation errors
id
  none is not an allowed value (type=type_error.none.not_allowed)
anticipations -> 1 -> status
  value is not a valid integer (type=type_error.integer)
Obviously this means you will have to install a third-party package, which some people would advise a new Python coder against. In that case, the template below should point you in the right direction:
def validate(my_dict: typing.Dict, string_attributes, int_attributes, float_attributes):
    validations_errors = []
    for attribute in string_attributes:
        if attribute in my_dict:
            if attribute in string_attributes and not isinstance(my_dict.get(attribute), str):
                validations_errors.append(f"key {attribute} is not a string, got {my_dict.get(attribute)}")
            if attribute in int_attributes and not isinstance(my_dict.get(attribute), int):
                # append to the list of errors
                pass
    return validations_errors

def custom_validator(json_data: typing.Dict):
    string_attributes = ["id", "name", "cake_name", "time", "LA:TB2342", "LA:TB2341", "LA:TB2344"]
    int_attributes = [...]
    float_attributes = [...]
    # now do it for anticipations
    validation_errors = validate(json_data, string_attributes, int_attributes, float_attributes)
    for i, anticipation in enumerate(json_data.get('anticipations')):
        validation_error = validate(anticipation, string_attributes, int_attributes, float_attributes)
        if validation_error:
            validation_errors.append(f"anticipation -> {i} error: {validation_error}")
    return validation_errors

data = json.loads(json_data)
custom_validator(data)
Output: ['key id is not a string, got None']
You can build on that function.
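As a side note on the code in the question: a likely reason the original jsonschema.validate call misbehaves is that it is handed raw strings rather than parsed JSON. A minimal sketch of that approach (my own illustration, not part of the answer above), reusing the file paths from the question:
import json
import jsonschema

with open('/Users/moduleschema.json') as schema_file:
    schema = json.load(schema_file)    # parse the schema into a dict
with open('/Users/modelout.json') as output_file:
    instance = json.load(output_file)  # parse the output file into a dict

# raises jsonschema.ValidationError if the output does not match the schema
jsonschema.validate(instance=instance, schema=schema, cls=jsonschema.Draft4Validator)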

BigQuery external tables with python

How can I create external tables (federated data sources) in BigQuery using Python (google-cloud-bigquery)?
I know you can use bq commands like this, but that is not how I want to do it:
bq mk --external_table_definition=path/to/json tablename
bq update tablename path/to/schemafile
with external_table_definition as:
{
    "autodetect": true,
    "maxBadRecords": 9999999,
    "csvOptions": {
        "skipLeadingRows": 1
    },
    "sourceFormat": "CSV",
    "sourceUris": [
        "gs://bucketname/file_*.csv"
    ]
}
and a schemafile like this:
[
    {
        "mode": "NULLABLE",
        "name": "mycolumn1",
        "type": "INTEGER"
    },
    {
        "mode": "NULLABLE",
        "name": "mycolumn2",
        "type": "STRING"
    },
    {
        "mode": "NULLABLE",
        "name": "mycolumn3",
        "type": "STRING"
    }
]
Thank you for your help!
Lars
table_id = 'table1'
table = bigquery.Table(dataset_ref.table(table_id), schema=schema)
external_config = bigquery.ExternalConfig('CSV')
external_config = {
    "autodetect": true,
    "options": {
        "skip_leading_rows": 1
    },
    "source_uris": [
        "gs://bucketname/file_*.csv"
    ]
}
table.external_data_configuration = external_config
table = client.create_table(table)
The schema format is:
schema = [
    bigquery.SchemaField(name='mycolumn1', field_type='INTEGER', is_nullable=True),
    bigquery.SchemaField(name='mycolumn2', field_type='STRING', is_nullable=True),
    bigquery.SchemaField(name='mycolumn3', field_type='STRING', is_nullable=True),
]
I know this is well after the question has been asked and answered, but the accepted answer above does not work. I attempted to do the same thing you are describing, and additionally tried to use the same approach to update an existing external table that had gained some new columns. This would be the correct snippet to use, assuming you have that JSON file stored somewhere like /tmp/schema.json:
[
    {
        "mode": "NULLABLE",
        "name": "mycolumn1",
        "type": "INTEGER"
    },
    {
        "mode": "NULLABLE",
        "name": "mycolumn2",
        "type": "STRING"
    },
    {
        "mode": "NULLABLE",
        "name": "mycolumn3",
        "type": "STRING"
    }
]
You should simply need the following, assuming you already have the API representation of the options you want to apply to the external table.
from google.cloud import bigquery

client = bigquery.Client()

# dataset must exist first
dataset_name = 'some_dataset'
dataset_ref = client.dataset(dataset_name)
table_name = 'tablename'

# Or wherever your json schema lives
schema = client.schema_from_json('/tmp/schema.json')

external_table_options = {
    "autodetect": True,
    "maxBadRecords": 9999999,
    "csvOptions": {
        "skipLeadingRows": 1
    },
    "sourceFormat": "CSV",
    "sourceUris": [
        "gs://bucketname/file_*.csv"
    ]
}
external_config = bigquery.ExternalConfig.from_api_repr(external_table_options)

table = bigquery.Table(dataset_ref.table(table_name), schema=schema)
table.external_data_configuration = external_config

client.create_table(
    table,
    # Now you can create the table safely with this option
    # so that it does not fail if the table already exists
    exists_ok=True
)

# And if you seek to update the table's schema and/or its
# external options through the same script then use
client.update_table(
    table,
    # As a side note, this portion of the code had me confounded for hours.
    # I could not for the life of me figure out that "fields" does not point
    # to the table's columns, but points to the `google.cloud.bigquery.Table`
    # object's attributes. IMHO, the naming of this parameter is horrible
    # given "fields" are already a thing (i.e. `SchemaField`s).
    fields=['schema', 'external_data_configuration']
)
In addition to setting the external table configuration through the API representation, you can set all of the same attributes by assigning to them directly on the bigquery.ExternalConfig object itself. So this would be another approach covering just the external_config portion of the code above.
external_config = bigquery.ExternalConfig('CSV')
external_config.autodetect = True
external_config.max_bad_records = 9999999
external_config.options.skip_leading_rows = 1
external_config.source_uris = ["gs://bucketname/file_*.csv"]
I must, however, again raise some frustration with the Google documentation. The bigquery.ExternalConfig.options attribute claims that it can be set with a dictionary:
>>> from google.cloud import bigquery
>>> help(bigquery.ExternalConfig.options)
Help on property:

    Optional[Dict[str, Any]]: Source-specific options.
but that is completely false. As you can see above, the Python object attribute names and the API representation names of those same attributes are slightly different. Either way you try it, though, if you have a dict of the source-specific options (e.g. CSVOptions, GoogleSheetsOptions, BigTableOptions, etc.) and attempt to pass that dict as the options attribute, it laughs in your face and says mean things like this:
>>> from google.cloud import bigquery
>>> external_config = bigquery.ExternalConfig('CSV')
>>> options = {'skip_leading_rows': 1}
>>> external_config.options = options
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
>>> options = {'skipLeadingRows': 1}
>>> external_config.options = options
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
>>> options = {'CSVOptions': {'skip_leading_rows': 1}}
>>> external_config.options = options
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
>>> options = {'CSVOptions': {'skipLeadingRows': 1}}
>>> external_config.options = options
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
The workaround was iterating over the options dict and using the __setattr__() method on the options object, which worked well for me. Pick your favorite approach from above. I have tested all of this code and will be using it for some time.
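A minimal sketch of that workaround (my own illustration, not code from the answer above), assuming the dict keys match the attribute names on the CSVOptions object:
from google.cloud import bigquery

external_config = bigquery.ExternalConfig('CSV')
csv_options = {'skip_leading_rows': 1, 'allow_jagged_rows': True}

# set each source-specific option directly on the CSVOptions object;
# setattr(obj, name, value) is the built-in equivalent of obj.__setattr__(name, value)
for name, value in csv_options.items():
    setattr(external_config.options, name, value)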

MongoEngine EmbeddedDocument query with array value

I have an object that looks something like this in the database, with corresponding MongoEngine models:
{
    ...
    "config" : {
        "inner_group" : {
            "individuals" : [
                {
                    "entity_id" : "54321",
                }
            ],
        },
        ...
    }
    ...
}
I am trying to query this data using the entity_id field in the object which is part of the individuals collection.
I have tried querying according to the MongoEngine docs but I have not been able to pull the data using the following:
data = Model.objects(config__inner_group__individuals__S__entity_id="54321")
How can I query the entire parent based on the entity_id?
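One possible approach (a sketch, not from the original post): for reads, MongoEngine should not need the positional __S__ operator (that is for updates); querying straight through the embedded-document path, or falling back to a raw MongoDB query on the dotted path, should match documents where any element of the individuals list has the given entity_id.
# query straight through the embedded document path
data = Model.objects(config__inner_group__individuals__entity_id="54321")

# or, equivalently, fall back to a raw MongoDB query on the dotted path
data = Model.objects(__raw__={"config.inner_group.individuals.entity_id": "54321"})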
