I have a JSON file that looks like this:
{
    "returnCode": 200,
    "message": "OK",
    "people": [
        {
            "details": {
                "first": "joe",
                "last": "doe",
                "id": 1234567
            },
            "otheDetails": {
                "employeeNum": "0000111222",
                "res": "USA",
                "address": "123 main street"
            },
            "moreDetails": {
                "family": "yes",
                "siblings": "no",
                "home": "USA"
            }
        },
        {
            "details": {
                "first": "jane",
                "last": "doe",
                "id": 987654321
            },
            "otheDetails": {
                "employeeNum": "222333444",
                "res": "UK",
                "address": "321 nottingham dr"
            },
            "moreDetails": {
                "family": "yes",
                "siblings": "yes",
                "home": "UK"
            }
        }
    ]
}
This shows two entries, but really there are hundreds or more. I do not know the number of entries at the time the code is run.
My goal is to iterate through each entry and get the 'id' under "details". I load the JSON into a Python dict named 'data' and am able to get the first 'id' by:
data['people'][0]['details']['id']
I can then get the second 'id' by incrementing the '0' to '1'. I know I can set i = 0 and then increment i, but since I do not know the number of entries, this does not work. Is there a better way?
Less Pythonic than a list comprehension, but a simple for loop will work here.
You can first calculate the number of people in the people list and then loop over the list, pulling out each id at each iteration:
id_list = []
for i in range(len(data['people'])):
    id_list.append(data['people'][i]['details']['id'])
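Equivalently, you can iterate over the list directly and skip the indexing entirely:
id_list = []
for person in data['people']:
    id_list.append(person['details']['id'])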
You can use the dict.get method in a list comprehension to avoid getting a KeyError on id. This way, dictionaries without an id are filled in with None:
ids = [dct['details'].get('id') for dct in data['people']]
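If some entries might be missing the details key entirely, a chained get covers that case too (a variant sketch that also fills None for those entries):
ids = [dct.get('details', {}).get('id') for dct in data['people']]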
If you still get a KeyError, that probably means some dicts in data['people'] don't have a details key. In that case, it might be better to wrap this in try/except. You may also want to identify which dicts lack the details key; those can be gathered in the error_dct list (uncomment the relevant lines below).
ids = []
#error_dct = []
for dct in data['people']:
    try:
        ids.append(dct['details']['id'])
    except KeyError:
        ids.append(None)
        #error_dct.append(dct)
Output:
1234567
987654321
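For completeness, a minimal end-to-end sketch that loads the file and collects the ids (the filename people.json is just an example):
import json

with open('people.json') as fp:
    data = json.load(fp)

ids = [person['details']['id'] for person in data['people']]
print(ids)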
I am getting data from an API and storing it in json format. The data I pull is in a list of dictionaries. I am using Python. My task is to only grab the information from the dictionary that matches the ticker symbol.
This is a short version of my data, printed using json.dumps:
[
{
"ticker": "BYDDF.US",
"name": "BYD Co Ltd-H",
"price": 25.635,
"change_1d_prc": 9.927101200686117
},
{
"ticker": "BYDDY.US",
"name": "BYD Co Ltd ADR",
"price": 51.22,
"change_1d_prc": 9.843448423761526
},
{
"ticker": "TSLA.US",
"name": "Tesla Inc",
"price": 194.7,
"change_1d_prc": 7.67018746889343
}
]
The task is to get only the dictionary whose ticker is TSLA.US and, if possible, only the price associated with that ticker.
I am unaware of how to reference "ticker" or loop through all of them to get the one I need.
I tried the following, but it says that it's a string, so it doesn't work:
if "ticker" == "TESLA.US":
    print(i)
Try (mylist is your list of dictionaries)
for entry in mylist:
    print(entry['ticker'])
Then try this to get what you want:
for entry in mylist:
    if entry['ticker'] == 'TSLA.US':
        print(entry)
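And since the task also asks for just the price associated with that ticker, a small extension of the same loop:
for entry in mylist:
    if entry['ticker'] == 'TSLA.US':
        print(entry['price'])
        break  # stop after the first match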
This is a solution that I've seen divide the Python community. Some say that it's a feature and "very pythonic"; others say that it's a bad design choice we're stuck with now, and bad practice. I'm personally not a fan, but it is a way to solve this problem, so do with it what you will. :)
Python for loops don't create a new scope; the loop variable persists even after the loop ends, so the following works. Assuming that your list of dictionaries is stored as json_dict:
for target_dict in json_dict:
    if target_dict["ticker"] == "TSLA.US":
        break
At this point, target_dict will be the dictionary you want.
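One caveat: if no entry matches, target_dict will simply hold the last item in the list. A for/else makes the no-match case explicit (a variant sketch):
for target_dict in json_dict:
    if target_dict["ticker"] == "TSLA.US":
        break
else:
    target_dict = None  # no match found

if target_dict is not None:
    print(target_dict["price"])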
It is possible to iterate through a list of dictionaries with a for loop. Since the snippet uses return, it needs to live inside a function (the function name here is just illustrative):
def get_tsla_price(stocks):
    for stock in stocks:
        if stock["ticker"] == "TSLA.US":
            return stock["price"]
This essentially loops through every item in the list, and for each item (which is a dictionary) looks for the key "ticker" and checks if its value is equal to "TSLA.US". If it is, then it returns the value associated with the "price" key.
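Calling the function above with the data from the question might look like this (the variable name stocks_list is just an example):
stocks_list = [
    {"ticker": "BYDDF.US", "name": "BYD Co Ltd-H", "price": 25.635},
    {"ticker": "TSLA.US", "name": "Tesla Inc", "price": 194.7},
]
print(get_tsla_price(stocks_list))  # 194.7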
I have dozens of lines to update values in nested dictionary like this:
dictionary["parent-key"]["child-key"] = [whatever]
That repeats with a different parent-key on each line, but the child-keys are always the same.
Also, the [whatever] part is written in a unique manner for each line, so simple recursion isn't an option here. (Although one might suggest making a separate list of the values to be assigned, and assigning them to each dictionary entry later on.)
Is there a way to do the same thing in an even shorter manner, to avoid the duplicated part of the code?
I'd be happy if it could be written something like this:
update_child_val("parent-key") = [whatever]
By the way, the [whatever] part I'm assigning will be a long and complicated piece of code, so I don't wish to use a function such as this:
def update_child_val(parent_key, child_val):
    dictionary[parent_key]["child-key"] = child_val
update_child_val("parent-key", [whatever])
Specific Use Case:
I'm writing an ETL process to convert a database table into CSV, and this is part of that process. I wrote up some example bits below.
single_item_template = {
    # Unique values will be assigned in place of `None` later
    "name": {
        "id": "name",
        "name": "Product Name",
        "val": None
    },
    "price": {
        "id": "price",
        "name": "Product Price (pre-tax)",
        "val": None
    },
    "tax": {
        "id": "tax",
        "name": "Sales Tax",
        "val": 10
    },
    "another column id": {
        "id": "another column id",
        "name": "another 'name' for this column",
        "val": "another 'val' for this column"
    },
    ..
}
And I have a separate area that assigns values to a copy of the dictionary single_item_template for each row of the source database table.
for table_row in table:
    item = Item(table_row)
The Item class here returns a copy of the single_item_template dictionary with updated values assigned to item[column]["val"]. Each val involves a unique transformation inside a setter within the class, such as
self._item["name"]["val"] = table_row["prod_name"].replace('_', ' ')
self._item["price"]["val"] = int(table_row["price_0"].replace(',', ''))
..
etcetera, etcetera.
In the above example, self._item can easily be shortened by assigning it to a variable, but I was wondering if I could also save the trailing ["val"] part.
(...or putting the logic part in a string and eval-ing it later, which I really, really do not want to do.)
(So basically all I'm saying is that I'm too lazy to keep typing out ["val"], though it doesn't really bother me either. I was still curious whether such a thing exists, while not even being sure it exists in programming in general.)
While you can't get away from doing the work, you can abstract it away in a couple of different ways.
Let's say you have a mapping of parent IDs to intended value:
values = {
    'name': None,
    'price': None,
    'tax': 10,
    '[another column id]': "[another 'val' for this column]"
}
Setting all of these at once is only two lines of code:
for parent, val in values.items():
    dictionary[parent]['val'] = val
Unfortunately there isn't an easy or legible way to transform this into a dict comprehension. You can easily put this into a utility function that will turn it into a one-line call:
def set_children(d, parents, values, child='val'):
    for parent, value in zip(parents, values):
        d[parent][child] = value

set_children(dictionary, values.keys(), values.values())
In this case, your values mapping will encode the transformations you want to perform:
values = {
    'name': table_row["prod_name"].replace('_', ' '),
    'price': int(table_row["price_0"].replace(',', '')),
    ...
}
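Tying this back to the ETL loop in the question, a rough sketch (copy.deepcopy gives each row its own copy of single_item_template; only the two transformations shown in the question are included):
import copy

for table_row in table:
    item = copy.deepcopy(single_item_template)
    values = {
        'name': table_row["prod_name"].replace('_', ' '),
        'price': int(table_row["price_0"].replace(',', '')),
    }
    set_children(item, values.keys(), values.values())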
I am brand new to python and have hit a roadblock I can't seem to figure out.
I have a list of values. This list could have 1 value or many.
['9589503164607', '9589503197375']
I need to output this in JSON format. My current output looks like this:
"line_items": {"id": ["9589503164607", "9589503197375"]}
I need this:
{"line_items":[{"id":9589503164607},{"id":9589503197375}]}
Currently, I am using a dictionary for this value and the rest that go with this line. However, due to having duplicate keys ("id"), I feel this may be the wrong approach.
shop_data = {
    "fulfillment": {
        "location_id": cleanslid,
        "tracking_number": trackingnumber,
        "line_items": {
            "id": iteminvids,
        }
    }
}
iteminvids is the list I referenced.
If anyone could point me in the right direction I would be so grateful!
Use a list comprehension to create a list of dictionaries.
"line_items": [{"id": item} for item in list_of_values]
If your original list is in lst, you can do
json.dumps({"line_items": [{"id": i} for i in lst]})
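One detail worth noting: the desired output shows the ids as numbers rather than strings, so if that matters you can convert them while building the list. A small sketch using the names from the question:
shop_data = {
    "fulfillment": {
        "location_id": cleanslid,
        "tracking_number": trackingnumber,
        "line_items": [{"id": int(item)} for item in iteminvids],
    }
}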
I've just been pounding at this problem which should be easy -- I'm just very new to Python which is required in this case.
I'm reading in a .csv file and trying to create a nested structure so that json.dumps gives me a nice nested .json file.
The resulting JSON is actually six levels deep, but I figured that if I could get the bottom two levels working, the rest would be the same. The input side is working just fine, as I've ended up with job['fieldname'] for building the structure. The problem is getting the result to nest.
Ultimately I want:
"PAYLOAD": {
"TEST": [
{
"JOB_ONE": {
"details": {
"customerInformation": {
"lastName": "Chun",
"projectName": "N Pacific Recovery",
"firstName": "Wally",
"secondaryPhoneNumber": ""
},
"description": "N Pacific Garbage Sweep",
"productType": "Service Generation",
"address": {
"city": "Bristol",
"zipCodePlusSix": "",
"stateName": "",
"zipCode": "53104",
"line1": "12709 789441th Ave",
"county": "",
"stateCode": "WI",
"usage": "NA",
"zipCodePlusFour": "",
"territory": "",
}
}
}
},
{
"JOB_TWO": {
"details": {
.... similar to JOB_ONE ....
}
}
}
}],
"environment": "N. Pacific",
"requestorName": "Waldo P Rossem",
"requestorEmail": "waldo# no where.com",
However, with the code below, which only deals with the "details section", I end up with a stack of all addresses, followed by all of the customer information. So, the loop is processing all the csv records and appending the addresses, and then looping csv records and appending the info.
for job in csv.DictReader(csv_file):
    if not job['Cancelled']:
        # actually have no idea how to get these two to work
        details['description']: job['DESCRIBE']
        details['projectType']: job['ProjectType']

        # the following cycle through the customerInformation and then
        # appends the addresses. So I end up with a large block of customer
        # records and then a second block of their addresses
        details['customerInformation'].append({
            'lastName': "job[Lastname]",
            'firstName': job['FirstName'],
            'projectName': "N Pacific Prototype",
        })
        details['address'].append({
            'city': job['City'],
            'zipCode': job['Zip'],
            'line1': job['Address'],
            'stateCode': job['State'],
            'market': job['Market']
        })
What I am trying to understand is how to fix this loop so the description and project type appear in the right place, AND how to set up the data structure so that the bottom flags are also properly structured for the final JSON dump.
This is largely due to my lack of experience with Python, but unfortunately it's a requirement -- otherwise, I could have had it done hours ago using gawk!
Requested CSV follows:
Sure... took me a while to dummy it up as the above is an abbreviated snippet.
JobNumber,FirstName,Lastname,secondaryPhoneNumber,Market,Address,City,State,Zip,requestorName,requestorEmail,environment
22056,Wally,Fruitvale,,N. Pacific,81 Stone Church Rd,Little Compton,RI,17007,Waldo P Rossem,waldo# no where.com,N. Pacific
22057,William,Stevens,,Southwest,355 Vt Route 8a,Jacksonville,VT,18928,Waldo P Rossem,waldo# no where.com,N. Pacific
22058,Wallace,Chen,,Northeast,1385 Jepson Rd,Stamford,VT,19403,Waldo P Rossem,waldo# no where.com,N.
You can create the details dict as a literal instead of creating it empty and assigning keys one at a time:
data = []
for job in csv.DictReader(csv_file):
    if job['Cancelled']:
        continue
    details = {
        'description': job['DESCRIBE'],
        'projectType': job['ProjectType'],
        'customerInformation': {
            'lastName': job['Lastname'],
            'firstName': job['FirstName'],
            ...
        },
        ...
    }
    data.append(details)

json_str = json.dumps(data)
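If you also need the outer PAYLOAD/TEST wrapper from the desired output, you can nest the collected list the same way before dumping. A rough sketch using the literal values shown in the question (in practice these would come from the CSV columns):
payload = {
    "PAYLOAD": {
        "TEST": data,
        "environment": "N. Pacific",
        "requestorName": "Waldo P Rossem",
        "requestorEmail": "waldo# no where.com",
    }
}
json_str = json.dumps(payload, indent=4)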
I think all you need for your puzzle is to know a few basic things about dictionaries:
Initial assignment:
my_dict = {
"key1": "value1",
"key2": "value2",
...
}
Writing key/value pairs to an already initialized dict:
my_dict["key2"] = "new value"
Reading:
my_dict["key2"]
prints> "new value"
Looping keys:
for key in my_dict:
    print(key)
prints> "key1"
prints> "key2"
Looping both key and value:
for key, value in my_dict.items():
    ...
Looping values only:
for value in my_dict.values():
    ...
If all you want is a JSON-compatible dict, then you won't need much more than this, without me going into defaultdicts, tuple keys and so on - just know that it's worth reading up on those once you've figured out basic dicts, lists, tuples and sets.
Edit: One more thing: even when you're new, I think it's worth trying a Jupyter notebook to explore your ideas in Python. I find it much faster to try things out and get the results back immediately, since you don't have to switch between editor and console.
You're not far off.
You first need to initialise details as a dict:
details = {}
Then add the elements you want:
details['description'] = job['DESCRIBE']
details['projectType'] = job['ProjectType']
Then for the nested ones:
details['customerInformation'] = {
    'lastName': job['Lastname'],
    'firstName': job['FirstName'],
    'projectName': "N Pacific Prototype",
}
For more details on how to use dict: https://docs.python.org/3/library/stdtypes.html?highlight=dict#dict.
Then you can get the JSON with json.dumps(details) (documentation here: https://docs.python.org/3/library/json.html?highlight=json#json.dumps).
Or you can first gather all the details in a list, and then turn the list into a JSON string:
all_details = []
for job in ...:
    # build the details dict as shown above
    all_details.append(details)

output = json.dumps(all_details)
First, here is a sample JSON feed that I want to read in Python 2.7 with either simplejson or the built in JSON decoder. I am loading the .json file in Python and then searching for a key like "Apple" or "Orange" and when that key is found, I want to bring in the information for it like the types and quantities.
Right now there are only 3 items, but I want to be able to search a feed that may have up to 1000 items. Here is the JSON:
{
    "fruits": [
        {
            "Apple": [
                {
                    "type": "Gala",
                    "quant": 5
                },
                {
                    "type": "Honeycrisp",
                    "quant": 10
                },
                {
                    "type": "Red Delicious",
                    "quant": 4
                }
            ]
        },
        {
            "Banana": [
                {
                    "type": "Plantain",
                    "quant": 5
                }
            ]
        },
        {
            "Orange": [
                {
                    "type": "Blood",
                    "quant": 3
                },
                {
                    "type": "Navel",
                    "quant": 20
                }
            ]
        }
    ]
}
My sample Python code is as follows:
import simplejson as json
# Open file
fjson = open('/home/teg/projects/test/fruits.json', 'rb')
f = json.loads(fjson.read())
fjson.close()
# Search for fruit
if 'Orange' in json.dumps(f):
    fruit = f['fruits']['Orange']
    print(fruit)
else:
    print('Orange does not exist')
But whenever I test it out, it gives me this error:
TypeError: list indices must be integers, not str
Was it wrong to use json.dumps here? Should I have just checked the structure returned by the standard json.loads as-is? I am getting this TypeError because I am not specifying a list index, but what if I don't know the index of that fruit?
Do I have to first search for a fruit and if it is there, get the index and then reference the index before the fruit like this?
fruit = f['fruits'][2]['Orange']
If so, how would I get the index of that fruit if it is found so I could then pull in the information? If you think the JSON is in the wrong format as well and is causing this issue, then I am up for that suggestion as well. I'm stuck on this and any help you guys have would be great. :-)
Here f is a dict, and f['fruits'] is a list of dictionaries, each containing a sub-dictionary.
if 'Orange' in json.dumps(f): only checks whether the substring 'Orange' appears anywhere in the serialized string.
The problem is that f['fruits'] is a list, so it expects an integer index (a position)
and not a dictionary key like ['Orange'].
I think you should check your structure like #kindall said. If you still want to extract Orange, this code will do the trick:
for value in f['fruits']:
    if 'Orange' in value:
        print value['Orange']
The problem is that the data structure has a list enclosing the dictionaries. If you have any control over the data source, that's the place to fix it. Otherwise, the best course is probably to post-process the data after parsing it to eliminate these extra list structures and merge the dictionaries in each list into a single dictionary. If you use an OrderedDict you can even retain the ordering of the items (which is probably why the list was used).
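A sketch of that post-processing step, merging the one-key dicts from f['fruits'] into a single OrderedDict (Python 2 syntax to match the question):
from collections import OrderedDict

fruits = OrderedDict()
for entry in f['fruits']:
    fruits.update(entry)  # each entry is a one-key dict, e.g. {"Orange": [...]}

print fruits['Orange']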
The square bracket in the line "fruits": [ should tell you that the item associated with fruits is (in Python parlance) a list rather than a dict and so cannot be indexed directly with a string like 'Oranges'. It sounds like you want to create a dict of fruits instead. You could do this by reformatting the input.
Or, if the input format is fixed: each item in your fruits list currently has a very specific format. Each item is a dict with exactly one key, and those keys are not duplicated between items. If those rules can be relied upon, it's pretty easy to write a small search routine—or the following code will convert a list-of-dicts into a dict:
fruits = dict(sum([x.items() for x in f['fruits']], []))
print fruits['Orange']
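Each fruit then maps to its list of type/quantity dicts, so pulling in the information mentioned in the question is direct:
for item in fruits['Orange']:
    print item['type'], item['quant']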