Importing Json to MongoDB with Python - python

I am currently trying to import a lot of JSON files into MongoDB. Some of the JSON files are simple, with just object: key: value, and I can query those uploads just fine from Python.
Example
[
{
"platform_id": 28,
"mhz": 2400,
"version": "1.1.1l"
}
]
The MongoDB compass shows it like this
The problem lies with one of the tools, which creates a document in MongoDB that I cannot figure out how to query. The tool creates a JSON file with system information, which is then pushed into the db. Example:
...
{
"systeminfo": [
{
"component": "system board",
"description": "sys board123"
},
{
"component": "bios",
"version": "xyz",
"date": "06/28/2021"
},
{
"component": "processors",
"htt": true,
"turbo": false
},
...
etc., for a total of 23 objects.
If I push it directly into MongoDB, it looks like this in Compass
So the question is: is there a way to collapse the hardware JSON one level, or a way to query the db as it is? I have found a way to collapse the JSON, but it moves each value pair into a new dictionary for upload, and every parameter is handled individually. That is not sustainable, as the tool is constantly adding new fields and I need my app to handle the changes.
Here is an example of the hw query; the same pattern works fine for the other collection
db = myclient['db_name']
col = db['HW_collection']
# query on a top-level "component" field
myquery = {"component": "processors"}
mydoc = col.find(myquery)

The followup issue that almost always arises from {"systeminfo.component":"processors"} is that the whole doc will be returned for any array that contains at least one processors entry. Matching does not mean filtering. Below is a slightly more comprehensive solution that includes "collapsing" the info into the top level doc.
Assume input is something like this:
{
"doc":1, "systeminfo": [
{"component": "system board","description": "sys board123"},
{"component": "bios","version": "xyz","date": "06/28/2021"},
{"component": "processors","htt": true,"turbo": false}
]
},{
"doc":2, "systeminfo": [
{"component": "RAM","description": "64G DIMM"},
{"component": "processors","htt": false,"turbo": false},
{"component": "bios","version": "abc","date": "06/28/2018"}
]
},{
"doc":3, "systeminfo": [
{"component": "RAM","description": "32G DIMM"},
{"component": "SCSI","version": "X","date": "01/01/2000"}
]
}
then
db.foo.aggregate([
{$project: {
doc: true, // carry doc num along for ride
// Walk the $systeminfo array and filter for component = processors and
// assign to field P (temporary field, any name is fine):
P: {$filter: {input: "$systeminfo", as: "z",
cond: {$eq:["$$z.component","processors"]} }}
}}
// Remove docs that had no processors:
,{$match: {P: {$ne:[]}}}
// A little complex but read it "backwards" to better understand. The P
// array will be left with 1 entry for processors. "Lift" that doc out of
// the array with $arrayElemAt[0] and merge it with the info in the containing
// top level doc which is $$CURRENT, and then make that merged entity the
// new root (essentially the new $$CURRENT)
,{$replaceRoot: {newRoot: {$mergeObjects: [ {$arrayElemAt:["$P",0]}, "$$CURRENT" ]}} }
// Get rid of the tmp field:
,{$unset: "P"}
]);
yields
{
"component" : "processors",
"htt" : true,
"turbo" : false,
"_id" : ObjectId("61eab547ba7d8bb5090611ee"),
"doc" : 1
}
{
"component" : "processors",
"htt" : false,
"turbo" : false,
"_id" : ObjectId("61eab547ba7d8bb5090611ef"),
"doc" : 2
}
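Since the question uses Python, the same pipeline can be written as plain dicts for PyMongo. This is a minimal sketch rather than a definitive implementation: the helper name `component_pipeline` is hypothetical, and `col` in the commented-out usage stands for the asker's collection object.

```python
def component_pipeline(component):
    """Build the aggregation pipeline shown above for any component name."""
    return [
        # Keep doc and filter the systeminfo array down to the wanted component.
        {"$project": {
            "doc": True,
            "P": {"$filter": {
                "input": "$systeminfo",
                "as": "z",
                "cond": {"$eq": ["$$z.component", component]},
            }},
        }},
        # Remove docs that had no matching component.
        {"$match": {"P": {"$ne": []}}},
        # Lift the first match out of P and merge it with the top-level doc.
        {"$replaceRoot": {"newRoot": {"$mergeObjects": [
            {"$arrayElemAt": ["$P", 0]},
            "$$CURRENT",
        ]}}},
        # Get rid of the temporary field.
        {"$unset": "P"},
    ]


pipeline = component_pipeline("processors")

# With a live connection (assumed, not shown here):
# for row in col.aggregate(pipeline):
#     print(row)
```

Because the component name is a parameter, new component types added by the tool need no code changes, only a different argument.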

Related

Export part of data field - MongoDB

I'm using MongoDB Compass to export my data as a CSV file, but I can only choose which fields to export, not which elements within a specific field.
MongoDB export data:
Actually, I'm only interested in saving the "scores" for objects "0, 1, 2".
Here is a screenshot from MongoDB Compass:
Is this something that I should handle with Python?
One option could be to "rewrite" "scoreTable" so that there are a maximum of 3 elements in the "scores" array and then "$out" to a new collection that can be exported in full.
db.devicescores.aggregate([
{
"$set": {
"scoreTable": {
"$map": {
"input": "$scoreTable",
"as": "player",
"in": {
"$mergeObjects": [
"$$player",
{"scores": {"$slice": ["$$player.scores", 3]}}
]
}
}
}
}
},
{"$out": "outCollection"}
])
Try it on mongoplayground.net.
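To answer the "should I deal with Python?" part: the same trimming can be done client-side on documents that have already been fetched. The sketch below is only an illustration. The field names `scoreTable` and `scores` come from the question, while `trim_scores` and the sample document are hypothetical. Unlike the aggregation above, a player without a `scores` list gets an empty list here.

```python
import copy


def trim_scores(doc, limit=3):
    """Return a copy of doc whose scoreTable entries keep at most `limit` scores."""
    trimmed = copy.deepcopy(doc)  # leave the caller's document untouched
    for player in trimmed.get("scoreTable", []):
        player["scores"] = player.get("scores", [])[:limit]
    return trimmed


# Hypothetical sample document; only the field names come from the question.
sample = {
    "_id": 1,
    "scoreTable": [
        {"name": "a", "scores": [10, 20, 30, 40, 50]},
        {"name": "b", "scores": [7]},
    ],
}
result = trim_scores(sample)
```

The trimmed documents could then be written out with the standard `csv` module or inserted into a separate collection for export.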

JSON jq/python file manipulation with specific key name aggregation

I need to modify the structure of this json file:
[
{
"id":"3333",
"properties":{
"label":"Computer",
"name":"My-Laptop"
}
},
{
"id":"9998",
"type":"file_system",
"properties":{
"mount_point":"/opt",
"name":"/dev/mapper/rhel-opt",
"root_container":"3333"
},
"label":"FileSystem"
},
{
"id":"9999",
"type":"file_system",
"properties":{
"mount_point":"/var",
"name":"/dev/mapper/rhel-var",
"root_container":"3333"
},
"label":"FileSystem"
}
]
in order to have this kind of output:
[
{
"id":"3333",
"properties":{
"label":"Computer",
"name":"My-Laptop",
"file_system":[
"/opt",
"/var"
]
}
}
]
The idea is for the new JSON structure to show my laptop with its two file-system partitions in an array named "file_system".
As you can see, the two partitions are related to the first entry through id and root_container.
So, imagine having not just one laptop but thousands of laptops, each with a different id and different partitions, related to the laptop by the root_container key.
Is there a way to do this with jq functions or a Python script?
Many thanks
You could employ reduce to iterate over the items while extracting their id, mount_point and root_container. Then, if a root_container was present, delete that entry and add its mount_point to the entry whose id matches their root_container. For convenience, I also employed INDEX on the items' id fields to simplify their access as .[$id] and .[$root_container], which had to be undone at the end using map(.).
jq '
reduce .[] as {$id, properties: {$mount_point, $root_container}} (
INDEX(.id);
if $root_container then
del(.[$id])
| .[$root_container].properties.file_system += [$mount_point]
else . end
)
| map(.)
'
[
{
"id": "3333",
"properties": {
"label": "Computer",
"name": "My-Laptop",
"file_system": [
"/opt",
"/var"
]
}
}
]
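The question also asked about a Python script. Here is a rough Python counterpart to the jq filter above, as a sketch under one assumption: every root_container value refers to an id that exists in the input. The function name `attach_file_systems` is made up for this example.

```python
import json


def attach_file_systems(items):
    """Fold entries that carry properties.root_container into their parent entry."""
    # Index every item by id; copy the properties so the input stays untouched.
    by_id = {
        item["id"]: {**item, "properties": dict(item.get("properties", {}))}
        for item in items
    }
    for item in items:
        props = item.get("properties", {})
        root = props.get("root_container")
        if root is None:
            continue
        # Drop the child entry and record its mount point on the parent.
        del by_id[item["id"]]
        by_id[root]["properties"].setdefault("file_system", []).append(props["mount_point"])
    return list(by_id.values())


data = json.loads("""
[
  {"id": "3333", "properties": {"label": "Computer", "name": "My-Laptop"}},
  {"id": "9998", "type": "file_system", "label": "FileSystem",
   "properties": {"mount_point": "/opt", "name": "/dev/mapper/rhel-opt", "root_container": "3333"}},
  {"id": "9999", "type": "file_system", "label": "FileSystem",
   "properties": {"mount_point": "/var", "name": "/dev/mapper/rhel-var", "root_container": "3333"}}
]
""")
merged = attach_file_systems(data)
print(json.dumps(merged, indent=2))
```

Indexing by id first plays the same role as INDEX(.id) in the jq version, so each child finds its parent with a single dictionary lookup even with thousands of laptops.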

Adding OR condition in VersionOne Query API

I am querying the V1 (/query.v1 API) via Python/Dash to get all stories tagged with certain tags.
The Where criteria for API Body is
"where": {
"TaggedWith":"Search-Module" ,
"Team.ID": "Team:009"
},
but I wanted to add OR criteria (something like assets tagged with "Search-Module OR Result-Module")
"where": {
"TaggedWith":"Search-Module;Result-Module" ,
"Team.ID": "Team:009"
},
The documentation in V1 is very basic and I am not able to find the correct way for additional criteria.
https://community.versionone.com/VersionOne_Connect/Developer_Library/Sample_Code/Tour_of_query.v1
Any pointers are appreciated.
You can set alternative values to a variable in the with property and use that variable within the where or filter property values:
{
"from": "Story",
"select": [
"Name"
],
"where": {
"Team.ID": "Team:009",
"TaggedWith": "$tags"
},
"with": {
"$tags": [
"Search-Module",
"Result-Module"
]
}
}
As an option, you can use , (comma) as a separator:
"with": {
"$tags": "Search-Module,Result-Module"
}
The last example of the multi-value variable (but for the rest-1.v1 endpoint) was found in the VersionOne Grammar project.
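Since the query is sent from Python, the body above can be assembled from a list of tags. This is only a sketch: `build_story_query` is a hypothetical helper, and the actual HTTP call is left out because the instance URL and authentication are not given in the question.

```python
import json


def build_story_query(team_id, tags):
    """Build a query.v1 body that matches stories tagged with any of `tags`."""
    return {
        "from": "Story",
        "select": ["Name"],
        "where": {
            "Team.ID": team_id,
            "TaggedWith": "$tags",  # refers to the variable defined under "with"
        },
        "with": {"$tags": list(tags)},
    }


body = build_story_query("Team:009", ["Search-Module", "Result-Module"])
payload = json.dumps(body, indent=2)
print(payload)

# Sending `payload` to your instance's /query.v1 endpoint is left to the
# caller, since the URL and credentials are assumptions not shown here.
```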

Pymongo include only the fields which are starting with a name

For example, if this is my record
{
"_id":"123",
"name":"google",
"ip_1":"10.0.0.1",
"ip_2":"10.0.0.2",
"ip_3":"10.0.1",
"ip_4":"10.0.1",
"description":""}
I want to get only those fields starting with 'ip_'. Consider I have 500 fields & only 15 of them start with 'ip_'
Can we do something like this to get the output -
db.collection.find({id:"123"}, {'ip*':1})
Output -
{
"ip_1":"10.0.0.1",
"ip_2":"10.0.0.2",
"ip_3":"10.0.1",
"ip_4":"10.0.1"
}
The following aggregation query, using PyMongo, returns documents containing only the fields whose names start with "ip_".
Note the various aggregation operators used: $filter, $regexMatch, $objectToArray, $arrayToObject. The aggregation pipeline uses two stages, $project and $replaceWith (both $regexMatch and $replaceWith require MongoDB 4.2 or later).
import pprint

# `collection` is assumed to be an existing PyMongo Collection object.
pipeline = [
    {
        "$project": {
            "ipFields": {
                "$filter": {
                    "input": {"$objectToArray": "$$ROOT"},
                    # keep only the key/value pairs whose key starts with "ip_"
                    "cond": {"$regexMatch": {"input": "$$this.k", "regex": "^ip_"}}
                }
            }
        }
    },
    {
        "$replaceWith": {"$arrayToObject": "$ipFields"}
    }
]
pprint.pprint(list(collection.aggregate(pipeline)))
I am unaware of a way to specify an expression that would decide which hash keys would be projected. MongoDB has projection operators but they deal with arrays and text search.
If you have a fixed possible set of ip fields, you can simply request all of them regardless of which fields are present in a particular document, e.g. project with
{ip_1: true, ip_2: true, ...}
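For the fixed-set approach, the projection document does not have to be typed out by hand. The sketch below assumes the fields are numbered ip_1 to ip_N. Both helper names are hypothetical, and `keep_prefixed` shows the client-side alternative of filtering a fetched document by key prefix.

```python
def ip_projection(max_index, prefix="ip_"):
    """Build a projection requesting prefix1..prefixN and hiding _id."""
    projection = {f"{prefix}{i}": True for i in range(1, max_index + 1)}
    projection["_id"] = False
    return projection


def keep_prefixed(doc, prefix="ip_"):
    """Client-side alternative: keep only the keys that start with prefix."""
    return {key: value for key, value in doc.items() if key.startswith(prefix)}


# The record from the question.
record = {
    "_id": "123",
    "name": "google",
    "ip_1": "10.0.0.1",
    "ip_2": "10.0.0.2",
    "ip_3": "10.0.1",
    "ip_4": "10.0.1",
    "description": "",
}
only_ips = keep_prefixed(record)

# With a live collection (assumed, not shown here):
# collection.find({"_id": "123"}, ip_projection(15))
```

The projection variant transfers less data, while the client-side filter works without knowing how many ip_ fields exist.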

get filtered embedded elements from all class on mongoengine

I have two classes in MongoEngine:
class UserPoints(EmbeddedDocument):
    user = ReferenceField(User, verbose_name='user')
    points = IntField(verbose_name='points', required=True)
    def __unicode__(self):
        return self.points
And
class Local(Document):
    token = StringField(max_length=250,verbose_name='token_identifier',unique=True)
    points = ListField(EmbeddedDocumentField(UserPoints),required=False)
    def __unicode__(self):
        return self.name
If I do something like "LP = Local.objects.filter(points__user=user)", I get all the Locals that have UserPoints from my user. But I want all the UserPoints from a User. How can I do that?
I also tried "lUs = UserPoints.objects.filter(user=user)", but I got an empty array.
PS: I did something like this to solve the problem, but it's not efficient.
lDPoints = []
LP = Local.objects.filter(points__user=user)
print('List P: ' + str(len(LP)))
for local in LP:
    for points in local.points:
        if points.user == user:
            dPoints = parsePoints(points)
            lDPoints.append(dPoints)
An addition to the original, now rather venerable, answer: the aggregation framework has had $filter for some time (since MongoDB 3.2), which is a lot cleaner than the $map and $setDifference method used in the original answer.
Local._get_collection().aggregate([
{ "$match": { "points.user": user } },
{ "$project": {
"token": 1,
"points": {
"$filter": {
"input": "$points",
"as": "el",
"cond": { "$eq": [ "$$el.user", user ] }
}
}
}}
])
The same principles apply, though: to obtain "multiple" matches from an array in the collection, you use the aggregate() method of the underlying driver, as called from _get_collection().
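Because the question asks for all the UserPoints of a user rather than the Local documents, the filtered arrays still need to be flattened on the client. The following sketch shows one way to do that. The helper names are hypothetical, `user_id` is assumed to be whatever value the reference is stored as in the collection, and `fake_result` only imitates the shape of the aggregation output.

```python
def user_points_pipeline(user_id):
    """Build the $filter pipeline shown above for a given stored user id."""
    return [
        {"$match": {"points.user": user_id}},
        {"$project": {
            "token": 1,
            "points": {"$filter": {
                "input": "$points",
                "as": "el",
                "cond": {"$eq": ["$$el.user", user_id]},
            }},
        }},
    ]


def flatten_points(docs):
    """Collect the embedded points of every returned document into one list."""
    return [point for doc in docs for point in doc.get("points", [])]


# Stand-in for what the aggregation would return (hypothetical values).
fake_result = [
    {"_id": 1, "token": "a", "points": [{"user": "u1", "points": 5}]},
    {"_id": 2, "token": "b", "points": [{"user": "u1", "points": 7},
                                        {"user": "u1", "points": 2}]},
]
all_points = flatten_points(fake_result)

# With a live connection (assumed, not shown here):
# docs = Local._get_collection().aggregate(user_points_pipeline(user_id))
# all_points = flatten_points(docs)
```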
Original
The answer to avoid "filtering" your embedded documents for the selected "user" only is to use the aggregation framework. This allows you to manipulate the "array content" on the database server rather than filtering the results in your client code.
Aggregation is done with the raw pymongo driver methods, but since Mongoengine is built on top of this driver you access the raw collection object from your class with the ._get_collection() method:
Local._get_collection().aggregate([
# Match the documents that have the required user
{ "$match": {
"points.user": user
}},
# unwind the embedded array to de-normalize
{ "$unwind": "$points" },
# Matching now filters the elements
{ "$match": {
"points.user": user
}},
# Group back as an array
{ "$group": {
"_id": "$_id",
"token": { "$first": "$token" },
"points": { "$push": "$points" }
}}
])
If you have MongoDB 2.6 or greater on your server and your "user/points" combination is always unique you can alternately filter without the $unwind|$match|$group cycle using the $map and $setDifference operators available there:
Local._get_collection().aggregate([
# Match the documents that have the required user
{ "$match": {
"points.user": user
}},
# Filter the array in place
{ "$project": {
"token": 1,
"points": {
"$setDifference": [
{
"$map": {
"input": "$points",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.user", user ] },
"$$el",
false
]
}
}
},
[false]
]
}
}}
])
In the second case, $cond is a ternary operator that takes a logical expression as its first argument and the values to return when that expression is true or false as its other arguments. Inside the $map, each element is tested to see whether the condition is true, in this case "is the user field equal to the selected user".
Either the content of that array position is returned or otherwise false. The $setDifference takes the resulting array and "filters" the false values out, so only the matching elements are returned.
In the legacy approach, the $unwind pipeline operator is used to effectively turn each array element into its own document with all other parent properties. This allows you to apply the same $match condition again, which, unlike the initial query, actually removes the documents that, now holding single elements, no longer match your condition. You always want the first $match stage, as there is no point processing this $unwind|$match combination on all of the documents that might not contain your matching condition.
The $group stage brings everything back into line per document, using the $first operator to return all other fields that were essentially duplicated by the $unwind and the $push operator to rebuild the array with the matching elements.
So while there are no "built-in" methods in MongoEngine to do this sort of query, you can do this the MongoDB way by accessing the raw driver.
Also note that if you only expected one element to match in any array for your given "user" or other query, then you could alternately use the field projection form available to the raw driver as well. But the aggregation method is required for any more than one matching element of the array.
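The field projection form mentioned above can be sketched with the positional `$` operator, which returns only the first array element matched by the query. The helper name is hypothetical, and `user_id` is again assumed to be the stored reference value.

```python
def first_match_query(user_id):
    """Return (filter, projection) that keeps only the first matching array element."""
    query = {"points.user": user_id}
    projection = {"token": 1, "points.$": 1}
    return query, projection


query, projection = first_match_query("u1")

# With a live connection (assumed, not shown here):
# for doc in Local._get_collection().find(query, projection):
#     print(doc["token"], doc["points"][0])
```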
