Querying for values of embedded documents in MongoDB with PyMongo - python

I have a document in MongoDB that looks like this:

{
    "_id": 0,
    "cash_balance": 50,
    "holdings": [
        {
            "name": "item1",
            "code": "code1",
            "quantity": 300
        },
        {
            "name": "item2",
            "code": "code2",
            "quantity": 100
        }
    ]
}
I would like to query for this particular document and get the quantity value of the object inside the holdings array whose code matches "code1". It can be assumed that there will be a match.
data = collection.find_one({"_id": 0, "holdings.code": "code1"}, {"holdings.$.quantity": 1})

Running the above code gives me this output:

{ "_id": 0, "holdings": [{"name": "item1", "code": "code1", "quantity": 300}] }

and I can get the quantity value by using:

data["holdings"][0]["quantity"]
300
However, this seems a rather roundabout way of getting a single value. Is there a way to query for the value of the particular key matching the code query, without getting back the whole holdings array containing the required object?

Try using the aggregate method with $unwind.
$unwind does the following:
Deconstructs an array field from the input documents to output a document for each element. Each output document is the input document with the value of the array field replaced by the element.
MongoDB documentation for $unwind
I created a playground example for you.
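For reference, a minimal PyMongo sketch of that pipeline (assuming the same collection handle and document as in the question):

pipeline = [
    {"$match": {"_id": 0}},
    {"$unwind": "$holdings"},
    {"$match": {"holdings.code": "code1"}},
    {"$project": {"_id": 0, "quantity": "$holdings.quantity"}},
]
# aggregate() returns a cursor; next() pulls the single matching document
result = next(collection.aggregate(pipeline), None)
if result is not None:
    print(result["quantity"])  # 300

This way the server hands back {"quantity": 300} directly instead of a wrapped holdings array.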

Related

How do I access nested elements inside a json array in python

I want to iterate over the below JSON array to extract all the referenceValues and the corresponding paymentIDs into one list.
{
    "payments": [{
        "paymentID": "xxx",
        "externalReferences": [{
            "referenceKind": "TRADE_ID",
            "referenceValue": "xxx"
        }, {
            "referenceKind": "ID",
            "referenceValue": "xxx"
        }]
    }, {
        "paymentID": "xxx",
        "externalReferences": [{
            "referenceKind": "ID",
            "referenceValue": "xxx"
        }]
    }]
}
The piece below only handles a single payment with a single externalReferences entry. I want to be able to do it for multiple payments and multiple externalReferences as well.
payment_ids = []
for notification in notifications:
    payments = [(payment[0], payment["externalReferences"][0]["referenceValue"])
                for payment in notification[0][0]]
    if payments[0][1] in invoice_ids:
        payment_ids.extend([payment[0] for payment in payments])
Looking at your structure, you first have to iterate through every dictionary in payments, then through each one's externalReferences. The code below extracts all reference values and their payment IDs into dictionaries, appending each to a list:
refVals = []  # List of all reference values
for payment in data["payments"]:
    for reference in payment["externalReferences"]:
        refVals.append({  # Dictionary of the current values
            "referenceValue": reference["referenceValue"],  # The current reference value
            "paymentID": payment["paymentID"]  # The current payment ID
        })
print(refVals)
This code outputs a list of dictionaries with all reference values and their payment IDs found in the data dictionary (assuming you read your JSON into the data variable).
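For completeness, a nested list comprehension builds the same list in one expression (a sketch using the same names as above):

refVals = [
    {"referenceValue": reference["referenceValue"], "paymentID": payment["paymentID"]}
    for payment in data["payments"]
    for reference in payment["externalReferences"]
]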

How do I update fields for multiple elements in an array with different values in MongoDB?

I have data of the form:
{
    '_id': asdf123b51234,
    'field2': 0,
    'array': [
        0: {
            'unique_array_elem_id': id,
            'nested_field': {
                'new_field_i_want_to_add': value
            }
        }
        ...
    ]
}
I have been trying to update like this:
for doc in update_dict:
    collection.find_one_and_update(
        {'_id': doc['_id']},
        {'$set': {
            'array.$[elem].nested_field.new_field_i_want_to_add': doc['new_field_value']
        }},
        array_filters=[{'elem.unique_array_elem_id': doc['unique_array_elem_id']}]
    )
But it is painfully slow: updating all of my data would take several days of continuous running. Is there a way to update this nested field for all array elements of a given document at once?
Thanks a lot
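One way to cut the round trips (a hedged sketch, not an answer from the original thread): batch the per-document updates into a single bulk_write call. Names follow the question's code.

from pymongo import UpdateOne

requests = [
    UpdateOne(
        {'_id': doc['_id']},
        {'$set': {
            'array.$[elem].nested_field.new_field_i_want_to_add': doc['new_field_value']
        }},
        array_filters=[{'elem.unique_array_elem_id': doc['unique_array_elem_id']}],
    )
    for doc in update_dict
]
if requests:
    # ordered=False lets the server keep applying updates after any single failure
    result = collection.bulk_write(requests, ordered=False)
    print(result.modified_count)

This sends one batch per network round trip instead of one query per document, which is usually where the time goes in a loop of find_one_and_update calls.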

Multi-level Python Dict to Pandas DataFrame only processes one level out of many

I'm parsing some XML data, doing some logic on it, and trying to display the results in an HTML table. The dictionary, after filling, looks like this:
{
    "general_info": {
        "name": "xxx",
        "description": "xxx",
        "language": "xxx",
        "prefix": "xxx",
        "version": "xxx"
    },
    "element_count": {
        "folders": 23,
        "conditions": 72,
        "listeners": 1,
        "outputs": 47
    },
    "external_resource_count": {
        "total": 9,
        "extensions": {
            "jar": 8,
            "json": 1
        },
        "paths": {
            "/lib": 9
        }
    },
    "complexity": {
        "over_1_transition": {
            "number": 4,
            "percentage": 30.769
        },
        "over_1_trigger": {
            "number": 2,
            "percentage": 15.385
        },
        "over_1_output": {
            "number": 4,
            "percentage": 30.769
        }
    }
}
Then I'm using pandas to convert the dictionary into a table, like so:
data_frame = pandas.DataFrame.from_dict(data=extracted_metrics, orient='index').stack().to_frame()
The result is a table that is mostly correct. While the first and second levels render correctly, categories with a sub-sub category get written as a string in the cell rather than as a further column. I've also tried using stack(level=1), but it raises "IndexError: Too many levels: Index has only 1 level, not 2". I've also tried making it into a series, with no luck. It seems like it only renders "complete" columns. Is there a way of filling in the empty spaces in the dictionary before processing?
How can I get, for example, external_resource_count -> extensions to have two daughter rows jar and json, with an additional column for the values?
Extra credit if anyone can tell me how to get rid of the first row with the index numbers. Thanks!
The way you load the dataframe is correct, but you should rename the 0 column to a proper name.
import pandas as pd

# this function extracts all the keys from your nested dicts
def explode_and_filter(df, filterdict):
    return [df[col].apply(lambda x: x.get(k) if type(x) == dict else x).rename(f'{k}')
            for col, nested in filterdict.items()
            for k in nested]

data_frame = pd.DataFrame.from_dict(data=extracted_metrics, orient='index').stack().to_frame(name='somecol')

# let's separate the rows where a dict is present & explode only those rows
mask = data_frame.somecol.apply(lambda x: type(x) == dict)
expp = explode_and_filter(data_frame[mask],
                          {'somecol': ['jar', 'json', '/lib', 'number', 'percentage']})

# here we concat the exploded series to a frame
exploded_df = pd.concat(expp, axis=1).stack().to_frame(name='somecol2') \
    .reset_index(level=2).rename(columns={'level_2': 'somecol'})

# and now we concat the rows with dict elements with the rows with non-dict elements
out = pd.concat([data_frame[~mask], exploded_df])
The output dataframe then contains the exploded rows alongside the scalar ones.
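As an aside (not part of the original answer), pandas' json_normalize can flatten the whole nested dict in one call, which may be a simpler starting point:

import pandas as pd

# json_normalize flattens nested dicts into dotted column names,
# e.g. "external_resource_count.extensions.jar"
flat = pd.json_normalize(extracted_metrics)
# transpose so each flattened key becomes a row, and name the value column
table = flat.T.rename(columns={0: 'value'})
print(table)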

How to check whether a comma-separated value in the database is present in JSON data using Python

I have to check the JSON data against the comma-separated e_codes in the table. How do I filter only the data whose e_codes are present for a user in the database?
In the database:

id  email      age  e_codes
1   abc#gmail  19   123456,234567,345678
2   xyz#gmail  31   234567,345678,456789
This is my JSON data:

[
    {
        "ct": 1,
        "e_code": 123456
    },
    {
        "ct": 2,
        "e_code": 234567
    },
    {
        "ct": 3,
        "e_code": 345678
    },
    {
        "ct": 4,
        "e_code": 456789
    },
    {
        "ct": 5,
        "e_code": 456710
    }
]
If efficiency is not an issue, you could loop through the table, split the values into a list with case['e_codes'].split(','), and then, for each code, loop through the JSON to see whether it is present.
This becomes inefficient if your table, your JSON, or the lists of values are long.
It might be better to first create a lookup dictionary in which the codes are the keys:
lookup = {}
for e in my_json:
    # cast to str so the keys match the strings produced by split(',') below
    lookup[str(e['e_code'])] = 1
You can then check how many of the codes in your table are actually in the JSON:
## Let's assume that the "e_codes" cell of the
## current line is data['e_codes'][i], where i is the line number
for i in lines:
    match = [0, 0]
    for code in data['e_codes'][i].split(','):
        try:
            match[0] += lookup[code]
            match[1] += 1
        except KeyError:
            match[1] += 1
    if match[1] > 0:
        share_present = match[0] / match[1]
For each case you get a share_present value, which is 1.0 if all codes appear in the JSON, 0.0 if none do, and a value in between indicating the share of codes that were present. Depending on your threshold for keeping a case, you can filter on this value.
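A set makes the same check more compact (a sketch, not from the original answer; names follow the code above):

# codes present in the JSON, as strings so they match the split() output
json_codes = {str(e['e_code']) for e in my_json}

for i in lines:
    codes = data['e_codes'][i].split(',')
    share_present = sum(code in json_codes for code in codes) / len(codes)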

SQLite JSON query to count number of items in nested list

I'm trying to query a database in which a column contains JSON data. I am using python and the json1 extension to pull the queries into my python code.
A sample of the JSON data stored in the database is below:

{
    "perf": {
        "timestamp": 1575933555,
        "frame": 0,
        "type": "BEST",
        "azimuth": 0
    },
    "dots": [{
        "a": -1.6,
        "b": -6.4,
        "c": -0.1,
        "int": 72
    }, {
        "a": -1.9,
        "b": -6.4,
        "c": 0.0,
        "int": 60
    }]
}
I am attempting to count the number of items within the "dots" nested list. In this case there are two, although there could be more or less depending on the data within the row.
SELECT COUNT(json_extract(json, ("$.dots"))) FROM json_database;
returns the total number of rows in the database; it does not dive into the "dots" list to count the items within it. Indexing with $.dots[i] does not return the total number of elements either.
How does one count the total number of items within a nested list in an SQLite query?
You missed seeing json_array_length() in the documentation. Try this:
SELECT json_array_length(json, '$.dots') AS count
FROM json_database;
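To pull that count into Python, a minimal sqlite3 sketch (the table and column names are the ones assumed in the question; the json1 functions ship with modern SQLite builds):

import sqlite3

conn = sqlite3.connect("example.db")  # hypothetical database file
for (count,) in conn.execute(
        "SELECT json_array_length(json, '$.dots') FROM json_database"):
    print(count)  # 2 for the sample row above
conn.close()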
