Best Match Search Json File - python

I have a json file with something like this:
{
"Cars": [
{
"name": 'Honda'
"color": 'Black'
"age": 'Old'
},
{
"name": 'Ford'
"color": ' Red'
"age": 'Old'
},
{
"name": 'Mazda'
"color": 'Black'
"age": 'New'
}
]
}
I want to write a function that will prompt the user to provide a name, color, and age, and then it will search through the json and return the objects that best match the search parameters. I also want the user to be able to only enter some of the parameters.
Examples:
User enters {'name':'Honda', 'color':None, 'age':None}, returns all the information for the Honda
User enters {'name':None, 'color':'Black', 'age':None}, returns all the information for the Honda and the Mazda
User enters {'name':None, 'color':'Black, 'age':'New'}, returns all of the information for the Mazda
I can't think of any programmatic ways to do this that aren't incredibly inefficient. The best thing I can think of is basically searching the whole Json for match's to the first key, then searching those for matches of the second key, then searching those for matches of the third key, then printing the resultant. Skipping keys that dont exist in the query payload would help, but that still seems really inefficient.
I'm not sure if I'm allowed to ask technique questions on here, so if the answer is that I should go build said inefficient function and then come back for info on how to improve it, that's fine, but I thought I might as well ask if there were any common examples first.

#lets say your json is stored in variable json_var
car_name=input("car name")
car_color=input("car color")
car_age=input("car_age")
car_list=[]
for cars in json_var["cars"]:
if (cars["name"]==car_name or cars["color"]==car_color or cars["age"]==car_age):
car_list.append(cars["name"])
print(car_list)

Related

Filter JSON using Jmespath and return value if expression exist, if it doesn't return None/Null (python)

How can I get JMESPath to only return the value in a json if it exists, if it doesn't exist return none/null. I am using JMESPath in a python application, below is an example of a simple JSON data.
{
"name": "Sarah",
"region": "south west",
"age": 21,
"occupation": "teacher",
"height": 145,
"education": "university",
"favouriteMovie": "matrix",
"gender": "female",
"country": "US",
"level": "medium",
"tags": [],
"data": "abc",
"moreData": "xyz",
"logging" : {
"systemLogging" : [ {
"enabled" : true,
"example" : [ "this", "is", "an", "example", "array" ]
} ]
}
}
For example I want it to check if the key "occupation" contains the word "banker" if it doesn't return null.
In this case if I do jmespath query "occupation == 'banker'" I would get false. However for more complicated jmespath queries like "logging.systemLogging[?enabled == `false`]" this would result in an empty array [] because it doesn't exist, which is what I want.
The reason I want it to return none or null is because in another part of the application (my base class) I have code that checks if the dictionary/json data will return a value or not, this piece of code iterates through an array of dictionaries/ json data like the one above.
One thing I've noticed with JMESPath is that it is inconsistent with its return value. In more complicated dictionaries I am able to achieve what I want but from simple dictionaries I can't, also If you used a methods, e.g starts_with, it returns a boolean but if you just use an expression it returns the value you are looking for if it exists otherwise it will return None or an empty array.
This is traditionally accomplished by:
dictionary = json.loads(my_json)
dictionary.get(key, None) # None is the default value that is returned.
That will work if you know the exact structure to expect from the json. Alternatively you can make two calls to JMESpath, using one to try to get the value / None / empty list, and one to run the query you want.
The problem is that JMESpath is trying to answer your query: Does this structure contain this information pattern? It makes sense that the result of such a query should be True/False. If you want to get something like an empty list back, you need to modify your query to ask "Give me back all instances where this structure contains the information I'm looking for" or "Give me back the first instance where this structure contains the information I'm looking for."
Filters in JMESPath do apply to arrays (or list, to speak in Python).
So, indeed, your case is not a really common one.
This said, you can create an array out of a hash (or dictionary, to speak in Python again) using the to_array function.
Then, since you do know you started from a hash, you can select back the first element of the created array, and indeed, if the array ends up being empty, it will return a null.
To me, at least, it looks consistant, an array can be empty [], but an empty object is a null.
To use this trick, though, you will also have to reset the projection you created out of the array, with the pipe expression:
Projections are an important concept in JMESPath. However, there are times when projection semantics are not what you want. A common scenario is when you want to operate of the result of a projection rather than projecting an expression onto each element in the array.
For example, the expression people[*].first will give you an array containing the first names of everyone in the people array. What if you wanted the first element in that list? If you tried people[*].first[0] that you just evaluate first[0] for each element in the people array, and because indexing is not defined for strings, the final result would be an empty array, []. To accomplish the desired result, you can use a pipe expression, <expression> | <expression>, to indicate that a projection must stop.
Source: https://jmespath.org/tutorial.html#pipe-expressions
And so, with all this, the expression ends up being:
to_array(#)[?occupation == `banker`]|[0]
Which gives
null
On your example JSON, while the expression
to_array(#)[?occupation == `teacher`]|[0]
Would return your existing object, so:
{
"name": "Sarah",
"region": "south west",
"age": 21,
"occupation": "teacher",
"height": 145,
"education": "university",
"favouriteMovie": "matrix",
"gender": "female",
"country": "US",
"level": "medium",
"tags": [],
"data": "abc",
"moreData": "xyz",
"logging": {
"systemLogging": [
{
"enabled": true,
"example": [
"this",
"is",
"an",
"example",
"array"
]
}
]
}
}
And following this trick, all your other test will probably start to work e.g.
to_array(#)[?starts_with(occupation, `tea`)]|[0]
will give you back your object
to_array(#)[?starts_with(occupation, `ban`)]|[0]
will give you a null
And if you only need the value of the occupation property, as you are falling back to a hash now, it is as simple as doing, e.g.
to_array(#)[?starts_with(occupation, `tea`)]|[0].occupation
Which gives
"teacher"
to_array(#)[?starts_with(occupation, `ban`)]|[0].occupation
Which gives
null

Assigning python dictionary's nested value without mentioning the immediate key

I have dozens of lines to update values in nested dictionary like this:
dictionary["parent-key"]["child-key"] = [whatever]
And that goes with different parent-key for each lines, but it always has the same child-keys.
Also, the [whatever] part is written in unique manner for each lines, so the simple recursion isn't the option here. (Although one might suggest to make a separate lists of value to be assigned, and assign them to each dictionary entry later on.)
Is there a way do the same but in even shorter manner to avoid duplicated part of the code?
I'd be happy if it could be written something like this:
update_child_val("parent-key") = [whatever]
By the way, that [whatever] part that I'm assigning will be a long and complicated code, therefore I don't wish to use function such as this:
def update_child_val(parent_key, child_val):
dictionary[parent_key]["child-key"] = child_val
update_child_val("parent-key", [whatever])
Specific Use Case:
I'm making ETL to convert database's table into CSV, and this is the part of the process. I wrote some bits of example below.
single_item_template = {
# Unique values will be assigned in place of `None`later
"name": {
"id": "name",
"name": "Product Name",
"val": None
},
"price": {
"id": "price",
"name": "Product Price (pre-tax)",
"val": None
},
"tax": {
"id": "tax",
"name": "Sales Tax",
"val": 10
},
"another column id": {
"id": "another column id",
"name": "another 'name' for this column",
"val": "another 'val' for this column"
},
..
}
And I have a separate area to assign values to the copy of the dictionary single_item_template for the each row of source database table.
for table_row in table:
item = Item(table_row)
Item class here will return the copy of dictionary single_item_template with updated values assigned for item[column][val]. And each of vals will involve unique process for changing values in setter function within the given class such as
self._item["name"]["val"] = table_row["prod_name"].replace('_', ' ')
self._item["price"]["val"] = int(table_row["price_0"].replace(',', ''))
..
etcetera, etcetera.
In above example, self._item can be shortened easily by assigning it to variable, but I was wondering if I could also save the last five character ["val"].
(..or putting the last logic part as a string and eval later, which I really really do not want to do.)
(So basically all I'm saying here is that I'm lazy typing out ["val"], but I don't bother doing it either. Although I was still interested if there's such thing while I'm not even sure such thing exists in programming in general..)
While you can't get away from doing the work, you can abstract it away in a couple of different ways.
Let's say you have a mapping of parent IDs to intended value:
values = {
'name': None,
'price': None,
'tax': 10,
'[another column id]': "[another 'val' for this column]"
}
Setting all of these at once is only two lines of code:
for parent, val in values.items():
dictionary[parent]['val'] = val
Unfortunately there isn't an easy or legible way to transform this into a dict comprehension. You can easily put this into a utility function that will turn it into a one-line call:
def set_children(d, parents, values, child='val'):
for parent, values in zip(parents, values):
d[parent][child] = value
set_children(dictionary, values.keys(), values.values())
In this case, your values mapping will encode the transformations you want to perform:
values = {
'name': table_row["prod_name"].replace('_', ' '),
'price': int(table_row["price_0"].replace(',', '')),
...
}

How to remove same condition for updating dictionary

How can I remove the if condition which is repeating again and again?
if input["custom_fields"].get("billing_notes", None):
billing_notes.update({"value": input["custom_fields"]["billing_notes"]})
work_order_number = {
"name": "work_order_number",
"label": "Work Order Number",
}
if input["custom_fields"].get("work_order_number", None):
work_order_number.update({"value": input["custom_fields"]["work_order_number"]})
contact_name_for_billing = {
"name": "contact_name_for_billing",
"label": "Contact Name For Billing",
}
if input["custom_fields"].get("contact_name_for_billing", None):
contact_name_for_billing.update({"value": input["custom_fields"]["contact_name_for_billing"]})
Here in each dictionary the name and label keys will be always there but if the user entered the value for its related dictionary then only at that time it should be updated but in this case, the same logic is repeating again and again so how can I do this without repeating the same code
One way you can perform the above action is to have a dictionary of dictionaries like so,
update_dict = {
"billing_notes": {...}
"work_order_number": {...},
"contact_name_for_billing": {...}
}
And later, you can just loop through them and update, further considering you are using text and actual variable names at places, this could be beneficial in fact.
for (key, udict) in update_dict.items():
if input["custom_fields"].get(key, None):
udict.update({"value": input["custom_fields"][key]})
In my knowledge, I do not see any other plausible simpler way to do this. Hope you find this answer useful. Do drop questions in the comments.

Creating a nested lists of lists in python ... (really csv -> json conversion)

I've just been pounding at this problem which should be easy -- I'm just very new to Python which is required in this case.
I'm readying in a .csv file and trying to created a nested structure so that json.dumps gives me a pretty nice nested .json file.
The result json is actually six levels deep but I thought if I could get the bottom two working the rest would be the same. The input is working just great as I've ended up with job['fieldname'] for building the structure. The problem is getting the result to nest.
Ultimately I want:
"PAYLOAD": {
"TEST": [
{
"JOB_ONE": {
"details": {
"customerInformation": {
"lastName": "Chun",
"projectName": "N Pacific Recovery",
"firstName": "Wally",
"secondaryPhoneNumber": ""
},
"description": "N Pacific Garbage Sweep",
"productType": "Service Generation",
"address": {
"city": "Bristol",
"zipCodePlusSix": "",
"stateName": "",
"zipCode": "53104",
"line1": "12709 789441th Ave",
"county": "",
"stateCode": "WI",
"usage": "NA",
"zipCodePlusFour": "",
"territory": "",
}
}
}
},
{
"JOB_TWO": {
"details": {
.... similar to JOB_ONE ....
}
}
}
}],
"environment": "N. Pacific",
"requestorName": "Waldo P Rossem",
"requestorEmail": "waldo# no where.com",
However, with the code below, which only deals with the "details section", I end up with a stack of all addresses, followed by all of the customer information. So, the loop is processing all the csv records and appending the addresses, and then looping csv records and appending the info.
for job in csv.DictReader(csv_file):
if not job['Cancelled']:
# actually have no idea how to get these two to work
details['description']: job['DESCRIBE']
details['projectType']: job['ProjectType']
# the following cycle through the customerInformation and then
# appends the addresses. So I end up with a large block of customer
# records and then a second block of their addresses
details['customerInformation'].append({
'lastName': "job[Lastname]",
'firstName': job['FirstName'],
'projectName':"N Pacific Prototype",
})
details['address'].append({
'city': job['City'],
'zipCode': job['Zip'],
'line1': job['Address'],
'stateCode': job['State'],
'market': job['Market']
})
What I am trying to understand is how to fix this loop and get the description and project type to appear in the right place AND setup the data structure so that the bottom flags are also properly structure for the final json dump.
This is largely due to my lack of experience with Python but unfortunately, its a requirement -- otherwise, I could have had it done hours ago using gawk!
Requested CSV follows:
Sure... took me a while to dummy it up as the above is an abbreviated snippet.
JobNumber,FirstName,Lastname,secondaryPhoneNumber,Market,Address,City,State,Zip,requestorName,requestorEmail,environment
22056,Wally,Fruitvale,,N. Pacific,81 Stone Church Rd,Little Compton,RI,17007,Waldo P Rossem,waldo# no where.com,N. Pacific
22057,William,Stevens,,Southwest,355 Vt Route 8a,Jacksonville,VT,18928,Waldo P Rossem,waldo# no where.com,N. Pacific
22058,Wallace,Chen,,Northeast,1385 Jepson Rd,Stamford,VT,19403,Waldo P Rossem,waldo# no where.com,N.
You can create the details dict as a literal vs. create and key assignment:
data = []
for job in csv.DictReader(csv_file):
if job['Cancelled']:
continue
details = {
'description': job['DESCRIBE'],
'projectType': job['ProjectType'],
'customerInformation' : {
'lastName': job['Lastname'],
'firstName': job['FirstName'],
...
},
...
}
data.append(details)
json_str = json.dumps(data)
I think all you need for your puzzle is to know a few basic things about dictionaries:
Initial assignment:
my_dict = {
"key1": "value1",
"key2": "value2",
...
}
Writing key/value pairs to an already initialized dict:
my_dict["key2"] = "new value"
Reading:
my_dict["key2"]
prints> "new value"
Looping keys:
for key in my_dict:
print(key)
prints> "key1"
prints> "key2"
Looping both key and value:
for key, value in my_dict.items():
...
Looping values only:
for value in my_dict.values():
...
If all you want is a JSON compatible dict, then you won't need much else than this, without me going into defaultdicts, tuple keys and so on - just know that it's worth reading up on that once you've figured out basic dicts, lists, tuples and sets.
Edit: One more thing: Even when new I think it's worth trying Jupyter notebook to explore your ideas in Python. I find it to be much faster to try things out and get the results back immediately, since you don't have to switch between editor and console.
You're not far off.
You first need to initialise details as a dict:
details = {}
Then add the elements you want:
details['description'] = job['DESCRIBE']
details['projectType'] = job['ProjectType']
Then for the nested ones:
details['customerInformation'] = {
'lastName': job['Lastname'],
'firstName': job['FirstName'],
'projectName':"N Pacific Prototype",
}
For more details on how to use dict: https://docs.python.org/3/library/stdtypes.html?highlight=dict#dict.
Then you can get the JSON with JSON.dumps(details) (documentation here: https://docs.python.org/3/library/json.html?highlight=json#json.dumps).
Or you can first gather all the details in a list, and then turn the list into a JSON string:
all_details = []
for job in ...:
(build details dict)
all_details.append(details)
output = JSON.dumps(all_details)

Python: TypeError in referencing item in JSON feed

First, here is a sample JSON feed that I want to read in Python 2.7 with either simplejson or the built in JSON decoder. I am loading the .json file in Python and then searching for a key like "Apple" or "Orange" and when that key is found, I want to bring in the information for it like the types and quantities.
Right now there is only 3 items, but I want to be able to search one that may have up to 1000 items. Here is the code:
{
"fruits": [
{
"Apple": [
{
"type": "Gala",
"quant": 5
},
{
"type": "Honeycrisp",
"quant": 10
},
{
"type": "Red Delicious",
"quant": 4
}
]
},
{
"Banana": [
{
"type": "Plantain",
"quant": 5
}
]
},
{
"Orange": [
{
"type": "Blood",
"quant": 3
},
{
"type": "Navel",
"quant": 20
}
]
}
]
}
My sample Python code is as follows:
import simplejson as json
# Open file
fjson = open('/home/teg/projects/test/fruits.json', 'rb')
f = json.loads(fjson.read())
fjson.close()
# Search for fruit
if 'Orange' in json.dumps(f):
fruit = f['fruits']['Orange']
print(fruit)
else:
print('Orange does not exist')
But whenever I test it out, it gives me this error:
TypeError: list indices must be integers, not str
Was it wrong to have me do a json.dumps and instead should I have just checked the JSON feed as-is from the standard json.loads? I am getting this TypeError because I am not specifying the list index, but what if I don't know the index of that fruit?
Do I have to first search for a fruit and if it is there, get the index and then reference the index before the fruit like this?
fruit = f['fruits'][2]['Orange']
If so, how would I get the index of that fruit if it is found so I could then pull in the information? If you think the JSON is in the wrong format as well and is causing this issue, then I am up for that suggestion as well. I'm stuck on this and any help you guys have would be great. :-)
Your f type is list, it's a list of dictionary's with sub dictionary.
if 'Orange' in json.dumps(f): Will iterate the list and look at each item for Orange.
The problem is that f['fruits'] is a list so it expects an int number (place)
and not a dictionary key like ['Orange']
I think you should check your structure like #kindall said, if you still want to extract Orange this code will do the trick:
for value in f['fruits']:
if 'Orange' in value:
print value['Orange']
The problem is that the data structure has a list enclosing the dictionaries. If you have any control over the data source, that's the place to fix it. Otherwise, the best course is probably to post-process the data after parsing it to eliminate these extra list structures and merge the dictionaries in each list into a single dictionary. If you use an OrderedDict you can even retain the ordering of the items (which is probably why the list was used).
The square bracket in the line "fruits": [ should tell you that the item associated with fruits is (in Python parlance) a list rather than a dict and so cannot be indexed directly with a string like 'Oranges'. It sounds like you want to create a dict of fruits instead. You could do this by reformatting the input.
Or, if the input format is fixed: each item in your fruits list currently has a very specific format. Each item is a dict with exactly one key, and those keys are not duplicated between items. If those rules can be relied upon, it's pretty easy to write a small search routine—or the following code will convert a list-of-dicts into a dict:
fruits = dict(sum([x.items() for x in f['fruits']], []))
print fruits['Orange']

Categories

Resources