MongoDB search for each dict in list in collection - python

I have a collection whose documents contain a list of dicts, and I want to check whether any dict contains two specific key:value pairs.
For example, I want to find_one where a dict contains a specific first and last name. This is my collection:
{
"names": [
{
"firstName": "bob",
"lastName": "jones",
"age": "34",
"gender": "m"
},
{
"firstName": "alice",
"lastName": "smith",
"age": "56",
"gender": "f"
},
{
"firstName": "bob",
"lastName": "smith",
"age": "19",
"gender": "m"
}
]
}
I want to see if there is a record with bob smith as first and last name. I am searching like this:
first = 'bob'
last = 'smith'
nameExists = db.user.find_one({'$and':[{'names.firstName':first,'names.lastName':last}]})
Would this query retrieve the one record for bob smith?

While it has been mentioned that the $and operator is not required, neither form is the query you want. Consider the following:
db.user.find_one({ 'names.firstName': 'alice','names.lastName': 'jones' })
This does in fact match the given document, because the array contains an element with "firstName" equal to "alice" and an element with "lastName" equal to "jones". The problem is that no single element of the array is a sub-document holding both of those values.
In order to match where an array element contains "both" the criteria given, you need to use the $elemMatch operator. This applies the query condition to the "elements" of the array.
db.user.find_one({
    'names': { '$elemMatch': { 'firstName': 'alice', 'lastName': 'smith' } }
})
And of course if you tried "alice" and "jones" then that would not match, as no single element contains that combination.
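To make the difference concrete, here is a minimal pure-Python sketch of the two matching semantics. It does not talk to MongoDB; `dotted_match` and `elem_match` are hypothetical helpers that only imitate how the two query shapes treat array elements.

```python
# Pure-Python illustration of the two matching semantics; no MongoDB needed.
doc = {
    "names": [
        {"firstName": "bob", "lastName": "jones"},
        {"firstName": "alice", "lastName": "smith"},
        {"firstName": "bob", "lastName": "smith"},
    ]
}

def dotted_match(doc, first, last):
    """Imitates {'names.firstName': first, 'names.lastName': last}:
    each condition may be satisfied by a different array element."""
    names = doc["names"]
    return (any(n["firstName"] == first for n in names)
            and any(n["lastName"] == last for n in names))

def elem_match(doc, first, last):
    """Imitates {'names': {'$elemMatch': {...}}}:
    a single array element must satisfy both conditions."""
    return any(n["firstName"] == first and n["lastName"] == last
               for n in doc["names"])

print(dotted_match(doc, "alice", "jones"))  # True
print(elem_match(doc, "alice", "jones"))    # False
print(elem_match(doc, "bob", "smith"))      # True
```

The dotted form reports a match for "alice jones" even though no such person is in the list, which is the false positive that $elemMatch avoids.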

Almost!
nameExists = db.user.find_one({'$and':[{'names.firstName':first},{'names.lastName':last}]})
You need to put each condition in its own {} dict.

You don't even need to add the $and operator. In MongoDB, comma-separated fields inside a query are joined by an implicit AND, so simply using {'names.firstName':first,'names.lastName':last} inside the find_one will work.
Anyway, that's only a "clean code" fix; your code will work as written, since you are doing an "and" operation with just one element (note that the list passed to $and contains only one dictionary).

Related

SQLalchemy +Postgresql accessing Array of jsonb elements

I have a query that looks like this.
query = db.query(status_cte.c.finding_status_history)
The finding_status_history column is of type array when I check its .type. It's an array of jsonb objects; I can easily change it to json instead if that's easier. I've also tested this with json.
[
{
"data": [
{
"status": "closed",
"created_at": "2023-01-27T18:05:27.579817",
"previous_status": "open"
},
{
"status": "open",
"created_at": "2023-01-27T18:05:28.694352",
"previous_status": "closed"
}
]
},
...
]
I'm trying to access the first dictionary nested inside data and access the status column.
I've tried to grab it using query = db.query(status_cte.c.finding_status_history[0]) but this returns a list of empty dictionaries like so.
[
{},
{},
{},
{},
{},
{},
{}
]
I'm not sure why that doesn't work, as it's my impression that I should get the first entry. I'm assuming I need to access "data" somehow first, but I've also tried...
query = db.query(status_cte.c.finding_status_history.op('->>')('data'))
Which gives me "jsonb[] ->> unknown operator doesn't exist". I've tried casting data to String and I get the same error, but with jsonb[] ->> String, and so on.
Also, when looping through the items with for item in query.all(), I'm seeing that [0] results in (None,) and [1] results in
({
"status": "closed",
"created_at": "2023-01-27T18:05:27.579817",
"previous_status": "open"
},)
as a tuple...
The secret was that [0] is not the first element; [1] is, since PostgreSQL arrays are 1-based by default. Note also that [-1] doesn't give me the last element, so I also had to order my aggregated JSON objects.
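The indexing behaviour described above can be imitated in a few lines of Python. This is only a sketch: `pg_subscript` is a hypothetical helper, and it assumes the array has PostgreSQL's default lower bound of 1, where an out-of-range subscript yields NULL rather than wrapping around.

```python
def pg_subscript(items, index):
    """Emulate a PostgreSQL array subscript on a Python list.

    Assumes the default lower bound of 1; out-of-range subscripts
    yield None (SQL NULL) instead of raising or counting from the end.
    """
    if 1 <= index <= len(items):
        return items[index - 1]
    return None

history = [{"status": "closed"}, {"status": "open"}]
print(pg_subscript(history, 0))   # None
print(pg_subscript(history, 1))   # {'status': 'closed'}
print(pg_subscript(history, -1))  # None
```

This is why [0] came back as (None,) and why [-1] is not a shortcut for the last element: the ordering has to be handled in the query itself.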

Filter JSON using Jmespath and return value if expression exist, if it doesn't return None/Null (python)

How can I get JMESPath to only return the value in a json if it exists, if it doesn't exist return none/null. I am using JMESPath in a python application, below is an example of a simple JSON data.
{
"name": "Sarah",
"region": "south west",
"age": 21,
"occupation": "teacher",
"height": 145,
"education": "university",
"favouriteMovie": "matrix",
"gender": "female",
"country": "US",
"level": "medium",
"tags": [],
"data": "abc",
"moreData": "xyz",
"logging" : {
"systemLogging" : [ {
"enabled" : true,
"example" : [ "this", "is", "an", "example", "array" ]
} ]
}
}
For example, I want it to check whether the key "occupation" contains the word "banker", and return null if it doesn't.
In this case if I do jmespath query "occupation == 'banker'" I would get false. However for more complicated jmespath queries like "logging.systemLogging[?enabled == `false`]" this would result in an empty array [] because it doesn't exist, which is what I want.
The reason I want it to return none or null is because in another part of the application (my base class) I have code that checks if the dictionary/json data will return a value or not, this piece of code iterates through an array of dictionaries/ json data like the one above.
One thing I've noticed with JMESPath is that it is inconsistent in its return values. With more complicated dictionaries I am able to achieve what I want, but with simple dictionaries I can't. Also, if you use a function, e.g. starts_with, it returns a boolean, but if you just use an expression it returns the value you are looking for if it exists; otherwise it returns None or an empty array.
This is traditionally accomplished by:
import json
dictionary = json.loads(my_json)
dictionary.get(key, None)  # None is the default value returned when the key is missing.
That will work if you know the exact structure to expect from the JSON. Alternatively, you can make two calls to JMESPath, using one to try to get the value / None / empty list, and one to run the query you want.
The problem is that JMESPath is trying to answer your query: does this structure contain this information pattern? It makes sense that the result of such a query should be True/False. If you want to get something like an empty list back, you need to modify your query to ask "Give me back all instances where this structure contains the information I'm looking for" or "Give me back the first instance where this structure contains the information I'm looking for."
Filters in JMESPath do apply to arrays (or list, to speak in Python).
So, indeed, your case is not a really common one.
This said, you can create an array out of a hash (or dictionary, to speak in Python again) using the to_array function.
Then, since you do know you started from a hash, you can select back the first element of the created array, and indeed, if the array ends up being empty, it will return a null.
To me, at least, it looks consistent: an array can be empty [], but an empty object is a null.
To use this trick, though, you will also have to reset the projection you created out of the array, with the pipe expression:
Projections are an important concept in JMESPath. However, there are times when projection semantics are not what you want. A common scenario is when you want to operate of the result of a projection rather than projecting an expression onto each element in the array.
For example, the expression people[*].first will give you an array containing the first names of everyone in the people array. What if you wanted the first element in that list? If you tried people[*].first[0] that you just evaluate first[0] for each element in the people array, and because indexing is not defined for strings, the final result would be an empty array, []. To accomplish the desired result, you can use a pipe expression, <expression> | <expression>, to indicate that a projection must stop.
Source: https://jmespath.org/tutorial.html#pipe-expressions
And so, with all this, the expression ends up being:
to_array(@)[?occupation == `banker`]|[0]
Which gives
null
On your example JSON, while the expression
to_array(@)[?occupation == `teacher`]|[0]
Would return your existing object, so:
{
"name": "Sarah",
"region": "south west",
"age": 21,
"occupation": "teacher",
"height": 145,
"education": "university",
"favouriteMovie": "matrix",
"gender": "female",
"country": "US",
"level": "medium",
"tags": [],
"data": "abc",
"moreData": "xyz",
"logging": {
"systemLogging": [
{
"enabled": true,
"example": [
"this",
"is",
"an",
"example",
"array"
]
}
]
}
}
And following this trick, all your other tests will probably start to work, e.g.
to_array(@)[?starts_with(occupation, `tea`)]|[0]
will give you back your object
to_array(@)[?starts_with(occupation, `ban`)]|[0]
will give you a null
And if you only need the value of the occupation property, as you are falling back to a hash now, it is as simple as doing, e.g.
to_array(@)[?starts_with(occupation, `tea`)]|[0].occupation
Which gives
"teacher"
to_array(@)[?starts_with(occupation, `ban`)]|[0].occupation
Which gives
null
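For comparison, the same "wrap in an array, filter, take the first element" idea can be sketched in plain Python without the jmespath library. `first_match` is a hypothetical helper, not part of any JMESPath API; it only mirrors the shape of the expressions above.

```python
def first_match(obj, predicate):
    """Wrap obj in a list, keep it if predicate holds, return the first hit or None."""
    matches = [item for item in [obj] if predicate(item)]
    return matches[0] if matches else None

person = {"name": "Sarah", "occupation": "teacher"}

banker = first_match(person, lambda p: p.get("occupation") == "banker")
teacher = first_match(person, lambda p: p.get("occupation", "").startswith("tea"))

print(banker)                 # None
print(teacher["occupation"])  # teacher
```

As in the JMESPath version, a failed filter leaves an empty list, and taking the first element of an empty list is what produces the null/None.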

How to extract data from json and add additional values to the extracted values using python?

I want to parse the value from json response using python and assign additional value to the list
{ "form": [{ "box": [60,120,260,115], "text": "hello", "label": "question", "words": [{ "box": [90,190,160,215 ],"text": "hello"} ], "linking": [[0,13]],"id": 0 }]}
I am trying to parse the value and assign to a variable using python. What I am trying to achieve is:
If the actual output is ([60,120,260,115],hello), I want to add a few more values to the list, so the expected output should be:
([60,120,260,120,260,115,60,115],hello)
Try this:
tmp_json = { "form": [{ "box": [60,120,260,115], "text": "hello", "label": "question", "words": [{ "box": [90,190,160,215 ],"text": "hello"} ], "linking": [[0,13]],"id": 0 }]}
# Then do whatever you need to do with the list by accessing it as follows
# tmp_json["form"][0]["box"]
You can iterate through all elements of the list here and, if an item matches the required condition, extend the existing list with the required values.
for item in tmp_json["form"]:
    # Check that the item's box has exactly the elements 60, 120, 260, 115
    # AND that the item's text has the value "hello".
    if item["box"] == [60, 120, 260, 115] and item["text"] == "hello":
        # If it matches, add the extra values with <list>.extend([...]).
        item["box"].extend([120, 115, 260])
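Judging from the expected output in the question, the extra values look like the four corners of the box rather than arbitrary numbers. The sketch below assumes the box is laid out as [x0, y0, x1, y1] and that the desired list is the corners (x0,y0), (x1,y0), (x1,y1), (x0,y1) flattened; that reading is an interpretation of the sample, and `box_to_corners` is a hypothetical helper name.

```python
def box_to_corners(box):
    """Expand [x0, y0, x1, y1] into the four corner points, flattened."""
    x0, y0, x1, y1 = box
    return [x0, y0, x1, y0, x1, y1, x0, y1]

response = {"form": [{"box": [60, 120, 260, 115], "text": "hello", "label": "question",
                      "words": [{"box": [90, 190, 160, 215], "text": "hello"}],
                      "linking": [[0, 13]], "id": 0}]}

# Build (corners, text) pairs without modifying the original response.
results = [(box_to_corners(item["box"]), item["text"]) for item in response["form"]]
print(results)  # [([60, 120, 260, 120, 260, 115, 60, 115], 'hello')]
```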

Assigning python dictionary's nested value without mentioning the immediate key

I have dozens of lines to update values in nested dictionary like this:
dictionary["parent-key"]["child-key"] = [whatever]
That goes on with a different parent-key on each line, but always with the same child-key.
Also, the [whatever] part is written in a unique manner on each line, so simple recursion isn't an option here. (Although one might suggest making a separate list of values to be assigned, and assigning them to each dictionary entry later on.)
Is there a way to do the same in an even shorter manner, to avoid the duplicated part of the code?
I'd be happy if it could be written something like this:
update_child_val("parent-key") = [whatever]
By the way, the [whatever] part that I'm assigning will be long and complicated code, so I don't wish to use a function such as this:
def update_child_val(parent_key, child_val):
    dictionary[parent_key]["child-key"] = child_val
update_child_val("parent-key", [whatever])
Specific Use Case:
I'm building an ETL job to convert a database table into CSV, and this is part of that process. I wrote some example code below.
single_item_template = {
# Unique values will be assigned in place of `None` later
"name": {
"id": "name",
"name": "Product Name",
"val": None
},
"price": {
"id": "price",
"name": "Product Price (pre-tax)",
"val": None
},
"tax": {
"id": "tax",
"name": "Sales Tax",
"val": 10
},
"another column id": {
"id": "another column id",
"name": "another 'name' for this column",
"val": "another 'val' for this column"
},
..
}
I have a separate area that assigns values to a copy of the dictionary single_item_template for each row of the source database table.
for table_row in table:
    item = Item(table_row)
The Item class here returns a copy of the dictionary single_item_template with updated values assigned to item[column]["val"]. Each val involves its own processing in a setter function within the class, such as
self._item["name"]["val"] = table_row["prod_name"].replace('_', ' ')
self._item["price"]["val"] = int(table_row["price_0"].replace(',', ''))
..
etcetera, etcetera.
In the above example, self._item can easily be shortened by assigning it to a variable, but I was wondering if I could also save the trailing ["val"].
(..or putting the last logic part as a string and eval later, which I really really do not want to do.)
(So basically all I'm saying is that I'm too lazy to type out ["val"], though I don't mind doing it either. I was still curious whether such a thing exists, in Python or in programming in general.)
While you can't get away from doing the work, you can abstract it away in a couple of different ways.
Let's say you have a mapping of parent IDs to intended value:
values = {
'name': None,
'price': None,
'tax': 10,
'[another column id]': "[another 'val' for this column]"
}
Setting all of these at once is only two lines of code:
for parent, val in values.items():
    dictionary[parent]['val'] = val
Unfortunately there isn't an easy or legible way to transform this into a dict comprehension. You can easily put this into a utility function that will turn it into a one-line call:
def set_children(d, parents, values, child='val'):
    for parent, value in zip(parents, values):
        d[parent][child] = value
set_children(dictionary, values.keys(), values.values())
In this case, your values mapping will encode the transformations you want to perform:
values = {
'name': table_row["prod_name"].replace('_', ' '),
'price': int(table_row["price_0"].replace(',', '')),
...
}
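If the goal is mainly to stop typing ["val"] on every line, another option is a small wrapper object that forwards item assignment to the nested key. This is only a sketch: `ChildView` is a hypothetical class, not something from the question or the standard library, and the sample row is made up for illustration.

```python
class ChildView:
    """Hypothetical wrapper: view[parent] = x writes to d[parent][child]."""

    def __init__(self, d, child="val"):
        self._d = d
        self._child = child

    def __setitem__(self, parent, value):
        self._d[parent][self._child] = value

    def __getitem__(self, parent):
        return self._d[parent][self._child]

item = {
    "name": {"id": "name", "name": "Product Name", "val": None},
    "price": {"id": "price", "name": "Product Price (pre-tax)", "val": None},
}
table_row = {"prod_name": "blue_widget", "price_0": "1,200"}  # made-up sample row

val = ChildView(item)
val["name"] = table_row["prod_name"].replace("_", " ")
val["price"] = int(table_row["price_0"].replace(",", ""))

print(item["name"]["val"])   # blue widget
print(item["price"]["val"])  # 1200
```

Each assignment keeps its own right-hand side, so the per-column logic stays visible on one line while the repeated child key lives in one place.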

how do I compare 2 json files and fetch the difference of only 2 key/value pairs and print them using python

I have 2 similar json files like below with the same keys. I need to find the difference of only one key in both the files (id_number) and store the name if there is a difference. Is there any way to do that?
[
{
"id_number": "SA4784",
"name": "Mark",
"birthdate": None
},
{
"id_number": "V410Z8",
"name": "Vincent",
"birthdate": "15/02/1989"
},
{
"id_number": "CZ1094",
"name": "Paul",
"birthdate": "27/09/1994"
}
]
Load the two files into dicts, step through them with a loop and on each iteration compare the id_number of each. If they're different, output the name field.
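A minimal sketch of that loop follows. It assumes both files hold lists of the same length in the same order, so that records can be paired by position; the two JSON strings are invented stand-ins for the file contents.

```python
import json

# Hypothetical contents of the two files; in practice, read them with json.load.
file_a = '[{"id_number": "SA4784", "name": "Mark"}, {"id_number": "V410Z8", "name": "Vincent"}]'
file_b = '[{"id_number": "SA4784", "name": "Mark"}, {"id_number": "XX0000", "name": "Vincent"}]'

records_a = json.loads(file_a)
records_b = json.loads(file_b)

# Pair records by position and keep the name wherever id_number differs.
changed_names = [a["name"] for a, b in zip(records_a, records_b)
                 if a["id_number"] != b["id_number"]]
print(changed_names)  # ['Vincent']
```

If the two files are not in the same order, pairing by position is not reliable and the records would need to be matched on some other field first.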
set(x.keys()) ^ set(y.keys())
Something like that will give you the keys that appear in only one of the two dicts.
