I'm trying to scrape a database of information but am having trouble querying it. Here's the basic document structure in MongoDB:
{
"ID": 346,
"data": [
{
"number": "23",
"name": "Winnie"
},
{
"number": "12",
"name": "Finn"
},
{
"number": "99",
"name": "Todd"
}
]
}
{
"ID": 346,
"data": [
{
"number": "12",
"name": "Ram"
},
{
"number": "34",
"name": "Greg"
},
{
"number": "155",
"name": "Arnie"
}
]
}
The relevant Python code is below:
import pymongo
import json
import io
import sys
from bson.json_util import dumps
from pymongo import MongoClient
stringArr = ['"23"', '"12"', '"155"']
for x in range(0, len(stringArr)):
    print(collection.find({"data.number" : stringArr[x]}).count())
When I enter collection.find({"data.number" : "23"}).count() directly, I get the correct number of entries that have "23" as the number in data, so I presume my find syntax in Python is wrong, likely because of how the variable holds the string. I'm fairly inexperienced with MongoDB, let alone PyMongo. Any suggestions would be greatly appreciated!
The $elemMatch operator matches documents that contain an array field with at least one element satisfying all the specified criteria.
Based on the document structure described in the question, try running the following query in the MongoDB shell:
db.collection.find({
data: {
$elemMatch: {
number: {
$in: ["23", "12", "155"]
}
}
}
})
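A PyMongo version of the same filter might look like the sketch below. It also strips the embedded quote characters from the question's list, since '"23"' is a four-character string and will not equal the stored value "23". The helper name build_filter and the database and collection names in the trailing comment are assumptions for illustration, not something the question specifies.

```python
# The list from the question wraps each value in an extra pair of quotes,
# so each query looks for the literal text "23" *including* the quote marks.
string_arr = ['"23"', '"12"', '"155"']
numbers = [s.strip('"') for s in string_arr]


def build_filter(values):
    """Build a filter for documents whose data array has an element
    with a number equal to any of the given values."""
    return {"data": {"$elemMatch": {"number": {"$in": list(values)}}}}


query = build_filter(numbers)
print(query)

# With a live connection (hypothetical database/collection names):
#   from pymongo import MongoClient
#   collection = MongoClient()["mydb"]["mycollection"]
#   print(collection.count_documents(query))
```

The comment uses count_documents because Cursor.count() was deprecated in PyMongo 3.7 and removed in 4.0.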
I am trying to validate JSON for required fields using Python. I am currently doing it manually by iterating through the JSON and reading it. However, I am looking for a library or a more generic solution that handles all scenarios.
For example I want to check in a list, if a particular attribute is available in all the list items.
Here is the sample json which I am trying to validate.
{
"service": {
"refNumber": "abc",
"item": [{
"itemnumber": "1",
"itemloc": "uk"
}, {
"itemnumber": "2",
"itemloc": "us"
}]
}
}
I want to validate if I have refNumber and itemnumber in all the list items.
A JSON Schema is a way to define the structure of JSON.
There are some accompanying python packages which can use a JSON schema to validate JSON (jsonschema).
The JSON Schema for your example would look approximately like this:
{
"type": "object",
"properties": {
"service": {
"type": "object",
"properties": {
"refNumber": {
"type": "string"
},
"item": {
"type": "array",
"items": {
"type": "object",
"properties": {
"itemnumber": {
"type": "string"
},
"itemloc": {
"type": "string"
}
}
}
}
}
}
}
}
i.e., an object containing service, which itself contains a refNumber and a list of items.
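Note that properties alone does not make a key mandatory; JSON Schema uses the required keyword for that. The sketch below adds required to the schema and checks it with a small hand-written walker that understands only required, properties and items. It is a stand-in for illustration, not the jsonschema package, and the function name missing_fields is an assumption.

```python
schema = {
    "type": "object",
    "required": ["service"],
    "properties": {
        "service": {
            "type": "object",
            "required": ["refNumber", "item"],
            "properties": {
                "refNumber": {"type": "string"},
                "item": {
                    "type": "array",
                    "items": {"type": "object", "required": ["itemnumber"]},
                },
            },
        }
    },
}


def missing_fields(instance, schema, path="$"):
    """Return the paths of required keys that are absent.

    Handles only the required/properties/items keywords; types are not checked.
    """
    problems = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                problems.append(f"{path}.{key}")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in instance:
                problems.extend(missing_fields(instance[key], sub_schema, f"{path}.{key}"))
    elif isinstance(instance, list) and "items" in schema:
        for i, element in enumerate(instance):
            problems.extend(missing_fields(element, schema["items"], f"{path}[{i}]"))
    return problems


# The second list item deliberately lacks "itemnumber".
doc = {"service": {"refNumber": "abc", "item": [{"itemnumber": "1"}, {"itemloc": "us"}]}}
print(missing_fields(doc, schema))  # ['$.service.item[1].itemnumber']
```

With the jsonschema package installed, the same schema dict can be passed to jsonschema.validate(instance=doc, schema=schema), which raises ValidationError on failure.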
Since I don't have enough rep to add a comment, I will post this as an answer.
First, I have to say I don't program in Python.
According to my Google search, there is a jsonschema module available for Python.
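from jsonschema import validate

# "required" lists the keys that must be present at each level.
schema = {
    "type": "object",
    "properties": {
        "service": {
            "type": "object",
            "properties": {
                "refNumber": {"type": "string"},
                "item": {
                    "type": "array",
                    "items": {"type": "object", "required": ["itemnumber"]},
                },
            },
            "required": ["refNumber"],
        },
    },
    "required": ["service"],
}

# yourJSON is the already-parsed document to check;
# validate() raises jsonschema.ValidationError if it does not conform.
validate(instance=yourJSON, schema=schema)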
This example is not tested, but it should give you the general idea.
Link to jsonschema docs
I am trying to update a JSON attribute value inline using JPATH.
I am trying a solution in Python but also assessing a Snowpark alternative (assuming the data is loaded in a table in a VARIANT column).
The Python code works for objects but fails when arrays are involved.
Python code:
json={
"ID": "1",
"NAME": { "FIRST_NAME": "ABC", "LAST_NAME": "XYZ" },
"ADDR": [
{ "TYPE": "HOME", "ADDR_L1": "SDGSG", "CITY": "AFAFA" },
{ "TYPE": "OFFC", "ADDR_L1": "AFASF", "CITY": "SDGSDG" }
],
"CONTACT": { "CONTACTS": [{ "TYPE": "A" }, { "TYPE": "B" }, { "TYPE": "C" }] },
"LEVEL1OBJ": {
"LEVEL2ARR": [{ "LEVEL3OBJ": "A" }, { "LEVEL3OBJ": "B" }],
"LEVEL2ARR_1":[{"LEVEL3ARR":[{"LEVEL4OBJ":"A"},{"LEVEL4OBJ":"B"}]},{"LEVEL3ARR":[{"LEVEL4OBJ":"C"},{"LEVEL4OBJ":"D"}]}],
"LEVEL2OBJ": "GFDB"
}
}
#Below input works
#keys=['NAME','FIRST_NAME']
#Below doesnt work
#keys=['ADDR','0','ADDR_L1']
#keys=['CONTACT','CONTACTS','0','TYPE']
keys=['LEVEL1OBJ','LEVEL2ARR_1','0','LEVEL3ARR','0','LEVEL4OBJ']
from functools import reduce
import operator
def get_by_path(root, items):
    return reduce(operator.getitem, items, root)

def set_by_path(root, items, value):
    get_by_path(root, items[:-1])[items[-1]] = value
set_by_path(json,keys,'')
print(json)
Has anyone had experience with this?
What could be the code in Snowpark?
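One likely cause of the array failures is that Python lists must be indexed with integers, while the key paths above carry indices as strings such as '0'. A minimal sketch of a list-aware variant follows; the helper name _step and the shortened sample document are assumptions for illustration. It covers only the Python side and says nothing about Snowpark.

```python
from functools import reduce


def _step(container, key):
    """Index into a dict by key, or into a list after converting the key to int."""
    if isinstance(container, list):
        return container[int(key)]
    return container[key]


def get_by_path(root, keys):
    """Walk the key path from root and return the value found there."""
    return reduce(_step, keys, root)


def set_by_path(root, keys, value):
    """Replace the value at the key path, handling a list as the final parent."""
    parent = get_by_path(root, keys[:-1])
    last = keys[-1]
    if isinstance(parent, list):
        parent[int(last)] = value
    else:
        parent[last] = value


# Shortened version of the document from the question.
doc = {
    "NAME": {"FIRST_NAME": "ABC"},
    "ADDR": [{"TYPE": "HOME", "ADDR_L1": "SDGSG"}],
    "LEVEL1OBJ": {"LEVEL2ARR_1": [{"LEVEL3ARR": [{"LEVEL4OBJ": "A"}]}]},
}
set_by_path(doc, ["ADDR", "0", "ADDR_L1"], "")
set_by_path(doc, ["LEVEL1OBJ", "LEVEL2ARR_1", "0", "LEVEL3ARR", "0", "LEVEL4OBJ"], "")
print(doc)
```

Converting based on the container type, rather than on whether the key looks numeric, keeps dict keys such as "0" working as ordinary string keys.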
I am using python to dereference JSON from two/more files.
Something like below,
content of file1 (Primary.json):
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://hcmdevblobsa.blob.core.windows.net/order/v2/order.json",
"title": "LCBO order schema",
"description": "Canonical order structure describing various order types",
"definitions": {
"addressDetail": {
"description": "Address Information",
"type": "object",
"properties": {
"name": {
"type": ["string","null"],
"minLength": 1,
"maxLength": 64
},
"age": {
"$ref": "secondary.json#/properties/age"
}
}
}
}
}
And the File2(Secondary.json) :
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://hcmdevblobsa.blob.core.windows.net/enumerations.json",
"title": "Enumerations used for JSON schemas",
"description": "Catalog of allowed values for schema properties",
"properties": {
"age": {
"type": "integer"
}
}
}
My idea is to use jsonref. I tried this library based on the answer to
json reference extraction in python
but in that case the reference points into the same file, like this:
json_str = """{"real": [1, 2, 3, 4], "ref": {"$ref": "#/real"}}"""
data = jsonref.loads(json_str)
In my case, however, the reference is in another file, so I tried to merge the two files with jsonmerge and then use jsonref, with the code below:
import jsonmerge
import jsonref
import pprint
head = open('AltPrimary.json')
tail = open('Secondary.json')
result = jsonmerge.merge(head,tail)
final = jsonref.loads(data3)
This errors out in jsonref.loads because, after merging, it no longer knows that the second part came from 'Secondary.json'. It fails while reading:
"$ref": "secondary.json#/properties/age"
I also tried wrapping the contents of 'secondary.json' under a "Secondary" key, like this:
"Secondary" :{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://hcmdevblobsa.blob.core.windows.net/enumerations.json",
"title": "Enumerations used for JSON schemas",
"description": "Catalog of allowed values for schema properties",
"properties": {
"age": {
"type": "integer"
}
}
}
but it failed during validation. In the real world I may need to dereference from multiple files, like below:
"person": {
"$ref": "schemas/people/Bruce-Wayne.json"
},
"place": {
"$ref": "schemas/places.yaml#/definitions/Gotham-City"
},
It would be helpful if anyone has any thoughts on this. Many thanks.
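One way to think about the problem is that each "$ref" has a file part and a JSON Pointer part, so resolution needs a lookup from file name to parsed document rather than a merge. The sketch below is a hand-rolled illustration of that idea using only the standard library; it is not jsonref's mechanism. It assumes every referenced file has already been parsed into a dict keyed by the exact name used in "$ref", does not detect circular references, and does not handle YAML. The function names are hypothetical.

```python
import copy


def resolve_pointer(document, pointer):
    """Follow a JSON Pointer such as '/properties/age' inside one parsed document."""
    node = document
    for token in pointer.lstrip("/").split("/"):
        if token == "":
            continue  # an empty pointer refers to the whole document
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def dereference(node, documents, current):
    """Return a copy of node with each {'$ref': 'file#/pointer'} replaced by its target.

    No cycle detection: a self-referencing schema would recurse forever.
    """
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str):
            file_part, _, pointer = ref.partition("#")
            target_name = file_part or current  # '#/x' means the current file
            target = resolve_pointer(documents[target_name], pointer)
            return dereference(copy.deepcopy(target), documents, target_name)
        return {k: dereference(v, documents, current) for k, v in node.items()}
    if isinstance(node, list):
        return [dereference(v, documents, current) for v in node]
    return node


# In practice each value would come from json.load() on the corresponding file.
documents = {
    "primary.json": {
        "definitions": {
            "addressDetail": {
                "type": "object",
                "properties": {
                    "name": {"type": ["string", "null"]},
                    "age": {"$ref": "secondary.json#/properties/age"},
                },
            }
        }
    },
    "secondary.json": {"properties": {"age": {"type": "integer"}}},
}

resolved = dereference(documents["primary.json"], documents, "primary.json")
print(resolved["definitions"]["addressDetail"]["properties"]["age"])  # {'type': 'integer'}
```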
I am using a site's REST APIs and have primarily been using Python's 'requests' module to GET JSON responses. The goal of the GET requests is ultimately to pull a user's form response, which ends up being a complex JSON document. To deal with this:
user_form_submission = requests.get('https://www.url/doc.json',
auth = (api_key, secret),
params = params)
python_obj = json.loads(user_form_submission.text)
trimmed_dict = python_obj['key'][0]['keys']
For context, this is what trimmed_dict would look like formatted as .json:
{
"Date": { "value": "2020-04-26", "type": "date" },
"Location": {
"value": "Test ",
"type": "text",
"geostamp": "lat=34.00000, long=-77.00000, alt=17.986118, hAccuracy=65.000000, vAccuracy=10.000000, timestamp=2020-04-26T23:39:56Z"
},
"form": {
"value": [
{
"form_Details": {
"value": [
{
"code": {
"value": "0000000000",
"type": "barcode"
},
"Name": { "value": "bob", "type": "text" }
}
],
"type": "group"
},
"Subtotal": { "value": "4", "type": "decimal" },
"form_detail2": {
"value": [
{
"name": {
"value": "billy",
"type": "text"
},
"code": {
"value": "00101001",
"type": "barcode"
},
"Classification": {
"value": "person",
"type": "select1"
},
"Start_Time": { "value": "19:43:00", "type": "time" },
"time": { "value": "4", "type": "decimal" }
}
],
"type": "subform"}
}
]
}
}
Now I have a portion of the json that contains both the useful and useless. From this point, can I pass this obj in a POST? I've tried every way that I can think of approaching it, and have been shut down.
This is how I expected it to work:
json_post = requests.post(' https://url/api/doc.json',
auth = (api_key, secret),
json = {
"form_id" : 'https://url.form.com/formid',
'payload':{
json.dumps(trimmed_dict)
}})
But, when I do this, I get the following error --
TypeError: Object of type set is not JSON serializable
How can I push this dict through this POST? If there's a more effective way of going about it, I am very open to suggestion.
Try removing the curly braces around json.dumps(trimmed_dict). json.dumps turns your trimmed_dict into a string, which becomes a python set when surrounded with braces.
Additionally you could remove json.dumps and plug the trimmed_dict into the structure directly as the value associated with payload.
Remove the extra {} from the payload. payload is itself a key, so json.dumps(trimmed_dict) on its own is enough as its value:
json_post = requests.post('https://url/api/doc.json',
                          auth=(api_key, secret),
                          json={
                              "form_id": 'https://url.form.com/formid',
                              "payload": json.dumps(trimmed_dict)
                          })
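The difference between the two spellings can be reproduced without any network call. The sketch below uses a one-field stand-in for trimmed_dict (an assumption for brevity) to show that braces around a single expression build a set, that the standard json encoder rejects it, and that passing the dict directly serializes cleanly.

```python
import json

# Small stand-in for the real trimmed_dict from the question.
trimmed_dict = {"Date": {"value": "2020-04-26", "type": "date"}}

# Braces around a single expression (no "key: value") create a set, not a dict.
wrapped = {json.dumps(trimmed_dict)}
print(type(wrapped).__name__)

# The json encoder has no representation for a set, hence the TypeError.
error_message = None
try:
    json.dumps({"payload": wrapped})
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Passing the dict itself keeps the payload as nested JSON
# rather than a JSON-encoded string inside JSON.
body = {"form_id": "https://url.form.com/formid", "payload": trimmed_dict}
encoded = json.dumps(body)
print(encoded)
```

Whether the API expects payload as a nested object or as a pre-encoded string depends on the service, so check its documentation before choosing between trimmed_dict and json.dumps(trimmed_dict).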
I get many JSON strings from a MySQL DB and need to combine them.
For example:
{
"type": "device",
"name": "Lampe",
"controls": [
{
"type": "switch",
"name": "Betrieb",
"topic": "/lampe/schalter"
}
]
}
Combined, these devices should go into an array in a single JSON document:
{
"name": "Test-System",
"devices": [
{
"type": "device",
"name": "Lampe",
"controls": [
{
"type": "switch",
"name": "Betrieb",
"topic": "/lampe/schalter"
}
]
},
{
other Device
}
]
}
I do not understand how to do this in Python.
Does someone have an idea how to do it?
The json module can be used.
#!/usr/bin/env python3
import json

def load_device(path):
    """Parse one device JSON file, closing the file afterwards."""
    with open(path) as f:
        return json.load(f)

# Parse each device JSON file.
device1 = load_device("device-switch-Lampe.json")
device2 = load_device("device-sensor-Wert.json")
# more devices ...
obj = {"name": "Test-System", "devices": [device1, device2]}
print(json.dumps(obj))
Output (prettified):
{
"devices": [{
"type": "device",
"controls": [{
"type": "switch",
"topic": "/lampe/schalter",
"name": "Betrieb"
}],
"name": "Lampe"
}, {
"type": "device",
"controls": [{
"type": "sensor",
"topic": "/sensor/wert",
"name": "Wert"
}],
"name": "Sensor"
}],
"name": "Test-System"
}
There are two ways you could do this - by working on strings, or by working with Python-JSON data structures. The former would be something like
# untested code
s = '''{
"name": "Test-System",
"devices": [ '''
while True:
    j = get_json_from_DB()
    if not j:
        break  # empty string or None
    s = s + j + ',\n'
s = s[:-2] + ']\n}\n'  # [:-2] drops the trailing ',\n' left by the loop
Or if you want to work with Python loaded-JSON then
import json
# untested code
s = {
"name": "Test-System",
"devices": []
}
while True:
    j = get_json_from_DB()
    if not j:
        break  # empty string or None
    s['devices'].append(json.loads(j))
# text = json.dumps(s)  # ought to be valid
This latter will validate all your incoming json-strings (json.loads() will throw an exception for any bad JSON) and will be more efficient for large numbers of devices. It's therefore to be preferred unless you are working in a RAM-constrained embedded system with small numbers of devices, where the greater memory footprint of the latter is a problem.
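A runnable variant of the second approach is sketched below, with a plain list of strings standing in for the database rows (the list, the function name combine_devices and the deliberately truncated second row are assumptions for illustration). It collects rows that fail to parse instead of stopping at the first bad one, which is one possible way to use the validation that json.loads() provides.

```python
import json


def combine_devices(system_name, json_strings):
    """Parse each device string and collect the results under 'devices'.

    Returns the combined structure plus a list of (index, error) for rows
    that were not valid JSON.
    """
    system = {"name": system_name, "devices": []}
    rejected = []
    for index, text in enumerate(json_strings):
        try:
            system["devices"].append(json.loads(text))
        except json.JSONDecodeError as exc:
            rejected.append((index, str(exc)))
    return system, rejected


# Stand-in for rows fetched from the database.
rows = [
    '{"type": "device", "name": "Lampe", "controls": '
    '[{"type": "switch", "name": "Betrieb", "topic": "/lampe/schalter"}]}',
    '{"type": "device", "name": "Sensor", ',  # deliberately truncated
]

system, rejected = combine_devices("Test-System", rows)
print(json.dumps(system, indent=2))
print(rejected)
```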