Hello, I have a JSON dataset with a whole bunch of entries like the sample below.
Each date maps to a list of people. The entries all have the same shape; only the dates and the people listed under them differ.
So I can get information by using
json_content['12-1-2021'][0]['name']
{'12-1-2021': [{'initials': 'IS',
                'name': 'Sam',
                'age': 23,
                'address': 'Freedom Dr',
                'city': 'Seattle',
                'state': 'WA'},
               {'initials': 'SD',
                'name': 'Sed',
                'age': 21,
                'address': 'Washington Dr',
                'city': 'Phoenix',
                'state': 'AZ'}]}
I want to iterate through the dataset and select, for instance, all the people who live in Seattle, regardless of the date (I may need to add the date later; the requirement isn't settled yet). But I can't see how to do that without specifying the date up front.
Thank you!
You definitely can. Since you don't care about the keys of the dictionary just go through the values:
names = [
    person['name']
    for date in json_content.values()
    for person in date if person['city'] == 'Seattle'
]
If you don't want to make any assumptions about how valid the structure of the json is, you can check it explicitly along the way in addition to checking the city:
[
    person['name']
    for date in json_content.values() if isinstance(date, list)
    for person in date if all([
        isinstance(person, dict),
        'name' in person,
        'city' in person,
        person['city'] == 'Seattle'])
]
Both of these get you ['Sam'] for your sample json.
If you want the complete person record, and may need the date later, I suggest the code below.
for person_records in json_content.values():
    if isinstance(person_records, list):
        for person in person_records:
            if person.get("city") == "Seattle":
                print(person)
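Since the question mentions possibly needing the date later, here is a minimal sketch that keeps the date next to each match by iterating over .items() instead of .values(). The name seattle_with_dates is made up for illustration, and the data is the sample from the question.

```python
# Sample data copied from the question.
json_content = {
    '12-1-2021': [
        {'initials': 'IS', 'name': 'Sam', 'age': 23,
         'address': 'Freedom Dr', 'city': 'Seattle', 'state': 'WA'},
        {'initials': 'SD', 'name': 'Sed', 'age': 21,
         'address': 'Washington Dr', 'city': 'Phoenix', 'state': 'AZ'},
    ]
}

# Hypothetical name: a list of (date, person) pairs for people in Seattle.
seattle_with_dates = [
    (date, person)
    for date, people in json_content.items()
    for person in people
    if person.get('city') == 'Seattle'  # .get avoids a KeyError if 'city' is missing
]

for date, person in seattle_with_dates:
    print(date, person['name'])
```

For the sample data this prints one line with the date and the name Sam. If the date turns out not to be needed, the first element of each pair can simply be ignored.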
I am looking to convert nested JSON into flat JSON using Python.
The data comes from an API response; the number of keys/columns can be up to 100, and the rows/overall count of elements can be 100k.
[{"Name":"John", "Location":{"City":"Los Angeles","State":"CA"}},{"Name":"Sam", "Location":{"City":"Chicago","State":"IL"}}]
I did come across this
(Python flatten multilevel JSON)
but it flattens the whole JSON completely, so everything ends up on one line, which is not what I am looking for. I also thought of applying it to the data one array element at a time in a loop, but that puts a lot of load on the system. The output I want is:
[{"Name":"John", "City":"Los Angeles","State":"CA"},{"Name":"Sam", "City":"Chicago","State":"IL"}]
Use unpacking with dict.pop (here l is the list of records; note that pop removes "Location" from each original dict):
[{**d.pop("Location"), **d} for d in l]
Output:
[{'City': 'Los Angeles', 'Name': 'John', 'State': 'CA'},
{'City': 'Chicago', 'Name': 'Sam', 'State': 'IL'}]
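If the original records should stay untouched, a non-mutating variant is possible. This is only a sketch: flatten_location and its nested_key parameter are hypothetical names, and it assumes a single level of nesting under one known key, as in the question's sample.

```python
def flatten_location(record, nested_key="Location"):
    """Return a new dict with the nested dict's keys lifted to the top level."""
    # Copy every top-level key except the nested one...
    flat = {k: v for k, v in record.items() if k != nested_key}
    # ...then merge in the nested dict (empty if the key is absent).
    flat.update(record.get(nested_key, {}))
    return flat


rows = [
    {"Name": "John", "Location": {"City": "Los Angeles", "State": "CA"}},
    {"Name": "Sam", "Location": {"City": "Chicago", "State": "IL"}},
]

flat_rows = [flatten_location(r) for r in rows]
print(flat_rows)
```

Unlike the pop version, rows still contains the "Location" key afterwards, which can matter if the API response is reused elsewhere.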
I need to compare the values of the items in two different dictionaries.
Let's say that dictionary RawData has items that represent phone numbers and number names.
Rawdata for example has items like: {'name': 'Customer Service', 'number': '123987546'} {'name': 'Switchboard', 'number': '48621364'}
Now, I got dictionary FilteredData, which already contains some items from RawData: {'name': 'IT-support', 'number': '32136994'} {'name': 'Company Customer Service', 'number': '123987546'}
As you can see, Customer Service and Company Customer Service both have the same values, but different keys. In my project, there might be hundreds of similar duplicates, and we only want unique numbers to end up in FilteredData.
FilteredData is what we will be using later in the code, and RawData will be discarded.
Their names (keys) can be close duplicates, but their numbers (values) cannot.
There are two ways to do this.
A. Remove the duplicate items in RawData, before appending them into FilteredData.
B. Append them into FilteredData, and go through the numbers(values) there, removing the duplicates. Can I use a set here to do that? It would work on a list, obviously.
I'm not looking for the most time-efficient solution. I'd like the most simple and easy to learn one, if and when someone takes over my job someday. In my project it's mandatory for the next guy working on the code to get a quick grip of it.
I've already looked at sets, and tried to face the problem by nesting two for loops, but something tells me there gotta be an easier way.
Of course I might have missed the obvious solution here.
Thanks in advance!
I hope I understand your problem here:
data = [{'name': 'Customer Service', 'number': '123987546'}, {'name': 'Switchboard', 'number': '48621364'}]
newdata = [{'name': 'IT-support', 'number': '32136994'}, {'name': 'Company Customer Service', 'number': '123987546'}]

def main():
    # Collect the numbers that are already present in data
    numbers = set()
    for entry in data:
        numbers.add(entry['number'])
    # Append only the entries whose number is not there yet
    for entry in newdata:
        if entry['number'] not in numbers:
            data.append(entry)
    print(data)

main()
Output:
[{'name': 'Customer Service', 'number': '123987546'},
{'name': 'Switchboard', 'number': '48621364'},
{'name': 'IT-support', 'number': '32136994'}]
What you can do is take dict.values() and build a set from them to remove duplicates, then go through the old dictionary, find the first key with each value, and add it to a new dictionary. Keep the set around: when the next entry arrives, add its value to the set and check whether the set is longer than before. If it is, the element is unique and you can add it to the dict.
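The set-based idea can be sketched roughly as follows, assuming the data is a list of dicts as in the question. It uses a plain membership test rather than comparing set lengths, which has the same effect. The function name unique_by_number is invented for this example.

```python
def unique_by_number(entries):
    """Keep the first entry seen for each phone number, preserving order."""
    seen = set()       # numbers encountered so far
    unique = []
    for entry in entries:
        if entry['number'] not in seen:
            seen.add(entry['number'])
            unique.append(entry)
    return unique


filtered = [
    {'name': 'IT-support', 'number': '32136994'},
    {'name': 'Company Customer Service', 'number': '123987546'},
]
raw = [
    {'name': 'Customer Service', 'number': '123987546'},
    {'name': 'Switchboard', 'number': '48621364'},
]

# Entries already in filtered come first, so they win over duplicates in raw.
result = unique_by_number(filtered + raw)
print(result)
```

Because the first occurrence wins, the order in which the two lists are concatenated decides which name is kept for a duplicated number.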
If you're willing to change how FilteredData is structured, you can use a dict with the number as the key:
RawData = [
    {'name': 'Customer Service', 'number': '123987546'},
    {'name': 'Switchboard', 'number': '48621364'}
]
# Change how FilteredData is structured
FilteredDataMap = {
    '32136994':
        {'name': 'IT-support', 'number': '32136994'},
    '123987546':
        {'name': 'Company Customer Service', 'number': '123987546'}
}
for item in RawData:
    number = item.get('number')
    if number not in FilteredDataMap:
        FilteredDataMap[number] = item
# If you need the list of items
FilteredData = list(FilteredDataMap.values())
You can just pull the actual list from the Map using .values()
I take it the numbers are unique. Then another solution is to take advantage of the uniqueness of dictionary keys. This means converting each list of dictionaries to a dictionary of 'number: name' pairs. Then you simply need to update RawData with FilteredData.
RawData = [
    {'name': 'Customer Service', 'number': '123987546'},
    {'name': 'Switchboard', 'number': '48621364'}
]
FilteredData = [
    {'name': 'IT-support', 'number': '32136994'},
    {'name': 'Company Customer Service', 'number': '123987546'}
]

def convert_list(input_list):
    return {item['number']: item['name'] for item in input_list}

def unconvert_dict(input_dict):
    return [{'name': val, 'number': key} for key, val in input_dict.items()]

NewRawData = convert_list(RawData)
NewFilteredData = convert_list(FilteredData)
# dict.update works in place and returns None, so update a copy
DesiredResultConverted = dict(NewRawData)
DesiredResultConverted.update(NewFilteredData)
DesiredResult = unconvert_dict(DesiredResultConverted)
In this example, the variables will have the following values:
NewRawData = {'123987546':'Customer Service', '48621364': 'Switchboard'}
NewFilteredData = {'32136994': 'IT-support', '123987546': 'Company Customer Service'}
When you update NewRawData with NewFilteredData, Company Customer Service will overwrite Customer Service as the value associated with the key 123987546. So,
DesiredResultConverted = {'123987546':'Company Customer Service', '48621364': 'Switchboard', '32136994': 'IT-support'}
Then, if you still prefer the original format, you can "unconvert" back.
I've got a json format list with some dictionaries within each list, it looks like the following:
[{"id":13, "name":"Albert", "venue":{"id":123, "town":"Birmingham"}, "month":"February"},
{"id":17, "name":"Alfred", "venue":{"id":456, "town":"London"}, "month":"February"},
{"id":20, "name":"David", "venue":{"id":14, "town":"Southampton"}, "month":"June"},
{"id":17, "name":"Mary", "venue":{"id":56, "town":"London"}, "month":"December"}]
The amount of entries within the list can be up to 100. I plan to present the 'name' for each entry, one result at a time, for those that have London as a town. The rest are of no use to me. I'm a beginner at python so I would appreciate a suggestion in how to go about this efficiently. I initially thought it would be best to remove all entries that don't have London and then I can go through them one by one.
I also wondered if it might be quicker to not filter but to cycle through the entire json and select the names of entries that have the town as London.
You can use filter:
data = [{"id":13, "name":"Albert", "venue":{"id":123, "town":"Birmingham"}, "month":"February"},
        {"id":17, "name":"Alfred", "venue":{"id":456, "town":"London"}, "month":"February"},
        {"id":20, "name":"David", "venue":{"id":14, "town":"Southampton"}, "month":"June"},
        {"id":17, "name":"Mary", "venue":{"id":56, "town":"London"}, "month":"December"}]
london_dicts = filter(lambda d: d['venue']['town'] == 'London', data)
for d in london_dicts:
    print(d)
This is efficient because:
The loop itself runs in C (in the case of CPython), although the lambda is still called once per item
filter returns an iterator (in Python 3), which means that the results are produced one by one as required
One way is to use a list comprehension:
>>> data = [{"id":13, "name":"Albert", "venue":{"id":123, "town":"Birmingham"}, "month":"February"},
...         {"id":17, "name":"Alfred", "venue":{"id":456, "town":"London"}, "month":"February"},
...         {"id":20, "name":"David", "venue":{"id":14, "town":"Southampton"}, "month":"June"},
...         {"id":17, "name":"Mary", "venue":{"id":56, "town":"London"}, "month":"December"}]
>>> [d for d in data if d['venue']['town'] == 'London']
[{'id': 17,
  'name': 'Alfred',
  'venue': {'id': 456, 'town': 'London'},
  'month': 'February'},
 {'id': 17,
  'name': 'Mary',
  'venue': {'id': 56, 'town': 'London'},
  'month': 'December'}]
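The question asks for only the names, presented one at a time. A generator function is one way to sketch that; iter_names_in_town is a hypothetical helper, and the data is the sample from the question.

```python
data = [
    {"id": 13, "name": "Albert", "venue": {"id": 123, "town": "Birmingham"}, "month": "February"},
    {"id": 17, "name": "Alfred", "venue": {"id": 456, "town": "London"}, "month": "February"},
    {"id": 20, "name": "David", "venue": {"id": 14, "town": "Southampton"}, "month": "June"},
    {"id": 17, "name": "Mary", "venue": {"id": 56, "town": "London"}, "month": "December"},
]


def iter_names_in_town(entries, town):
    """Yield the 'name' of each entry whose venue is in the given town."""
    for entry in entries:
        if entry["venue"]["town"] == town:
            yield entry["name"]


# Names are produced lazily, one per loop iteration.
for name in iter_names_in_town(data, "London"):
    print(name)
```

With up to 100 entries, the difference between filtering first and filtering while iterating is unlikely to be noticeable, so readability is probably the better guide here.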
I have some data like this:
{'cities': [{'abbrev': 'NY', 'name': 'New York'}, {'abbrev': 'BO', 'name': 'Boston'}]}
From my scarce knowledge of Python this looks like a dictionary within a dictionary.
But either way how can I use "NY" as a key to fetch the value "New York"?
It's a dictionary with one key-value pair. The value is a list of dictionaries.
d = {'cities': [{'abbrev': 'NY', 'name': 'New York'}, {'abbrev': 'BO', 'name': 'Boston'}]}
To find the name for an abbreviation you should iterate over the dictionaries in the list and then compare the abbrev-value for a match:
for city in d['cities']:           # iterate over the inner list
    if city['abbrev'] == 'NY':     # check for a match
        print(city['name'])        # print the matching "name"
Instead of the print you can also save the dictionary containing the abbreviation, or return it.
When a dataset isn't shaped for your needs, you don't have to use it as-is: you can build another dictionary from it with a dictionary comprehension, taking the key and the value of each new entry from the fixed keys ("abbrev" and "name") of the sub-dictionaries.
d = {'cities': [{'abbrev': 'NY', 'name': 'New York'}, {'abbrev': 'BO', 'name': 'Boston'}]}
newd = {sd["abbrev"]:sd["name"] for sd in d['cities']}
print(newd)
results in:
{'NY': 'New York', 'BO': 'Boston'}
and of course: print(newd['NY']) yields New York
Once the dictionary is built, you can reuse it as many times as you need with great lookup speed. Build other specialized dictionaries from the original dataset whenever needed.
Use next with a generator expression that filters the sub-dictionaries on the 'abbrev' key:
d = {'cities': [{'abbrev': 'NY', 'name': 'New York'},
                {'abbrev': 'BO', 'name': 'Boston'}]}
city_name = next(city['name'] for city in d['cities']
                 if city['abbrev'] == 'NY')
print(city_name)
Output:
New York
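One caveat with the next call above: it raises StopIteration when no abbreviation matches. Passing a default as the second argument to next avoids that. A small sketch, where name_for is a hypothetical helper name:

```python
d = {'cities': [{'abbrev': 'NY', 'name': 'New York'},
                {'abbrev': 'BO', 'name': 'Boston'}]}


def name_for(abbrev, default=None):
    """Return the city name for an abbreviation, or `default` if not found."""
    # The generator expression must be parenthesised when a default is passed.
    return next(
        (city['name'] for city in d['cities'] if city['abbrev'] == abbrev),
        default,
    )


print(name_for('NY'))
print(name_for('LA'))  # no such abbreviation in the sample data
```

The first call prints New York and the second prints None, since 'LA' is not in the sample list.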
I think that I understand your problem.
'NY' is a value, not a key.
Maybe you need something like {'cities': {'NY': 'New York', 'BO': 'Boston'}}, so you could type myvar['cities']['NY'] and it would return 'New York'.
If you have to use x = {'cities': [{'abbrev': 'NY', 'name': 'New York'}, {'abbrev': 'BO', 'name': 'Boston'}]} you could create a function:
def search(abbrev):
    for cities in x['cities']:
        if cities['abbrev'] == abbrev:
            return cities['name']
Output:
>>> search('NY')
'New York'
>>> search('BO')
'Boston'
P.S.: I use Python 3.6.
Also, with this code you can search by either the abbreviation or the name:
def search(term):
    for cities in x['cities']:
        if cities['abbrev'] == term: return cities['name'], cities['abbrev']
        if cities['name'] == term: return cities['name'], cities['abbrev']
What is a good way to get an ordered dictionary from a regular dictionary? I need the keys (and these keys are known ahead of time) to be in a certain order. I will be "dump"ing a list of these dictionaries into a JSON file and need things ordered a certain way.
--- Edited and added the following
For instance, I have a dictionary ...
employee = { 'phone': '1234567890', 'department': 'HR', 'country': 'us', 'name': 'Smith' }
When I dump it into JSON format, I would like it to print out as
{ 'name': 'Smith', 'department': 'HR', 'country': 'us', 'phone': '1234567890'}
Decide the key order up front and create an OrderedDict by looking up each key in that order:
from collections import OrderedDict
order = ("name","department","country","phone")
employee = { 'phone': '1234567890', 'department': 'HR', 'country': 'us', 'name': 'Smith' }
od = OrderedDict((k, employee[k]) for k in order)
But if you dump to a JSON file and load it again, you will not get an OrderedDict back by default. When you dump, it will look like:
{"name": "Smith", "department": "HR", "country": "us", "phone": "1234567890"}
On Python versions before 3.7, where normal dicts have no guaranteed order, the loaded dict may not be in the same order, like below:
{'phone': '1234567890', 'name': 'Smith', 'country': 'us', 'department': 'HR'}
If you are trying to just store the dicts to use again and want to maintain order you can pickle:
import pickle
with open("foo.pkl","wb") as f:
    pickle.dump(od,f)
with open("foo.pkl","rb") as f:
    d = pickle.load(f)
print(d)
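If the data has to stay in JSON rather than a pickle, one option is the object_pairs_hook parameter of json.loads / json.load, which builds each object from its key/value pairs in document order. A minimal sketch reusing the employee example; text and loaded are illustrative names.

```python
import json
from collections import OrderedDict

order = ("name", "department", "country", "phone")
employee = {'phone': '1234567890', 'department': 'HR', 'country': 'us', 'name': 'Smith'}
od = OrderedDict((k, employee[k]) for k in order)

# json.dumps writes the keys in the OrderedDict's order (sort_keys is off by default).
text = json.dumps(od)
print(text)

# object_pairs_hook receives the pairs in the order they appear in the JSON text.
loaded = json.loads(text, object_pairs_hook=OrderedDict)
print(list(loaded))
```

The same hook works with json.load when reading from a file object.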
You could do something like the following: collect the keys, in order, in a list of strings, traverse the list looking up each key in the dictionary, and create an ordered dictionary from the resulting pairs.
from collections import OrderedDict

def makeOrderedDict(dictToOrder, keyOrderList):
    # Build (key, value) pairs in the requested order
    tupleList = []
    for key in keyOrderList:
        tupleList.append((key, dictToOrder[key]))
    return OrderedDict(tupleList)
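The function above raises KeyError if a listed key is missing and silently drops keys that are not listed. If that is a concern, a more forgiving variant could look like the sketch below. The name reorder and the choice to append unlisted keys at the end are assumptions made for illustration, not part of the original answer.

```python
from collections import OrderedDict


def reorder(d, key_order):
    """Return an OrderedDict with listed keys first, then any remaining keys."""
    # Listed keys that are actually present, in the requested order.
    ordered = OrderedDict((k, d[k]) for k in key_order if k in d)
    # Keys not mentioned in key_order keep their original relative order.
    for k, v in d.items():
        if k not in ordered:
            ordered[k] = v
    return ordered


employee = {'phone': '1234567890', 'extra': 'x', 'name': 'Smith'}
result = reorder(employee, ['name', 'department', 'phone'])
print(list(result))
```

Here 'department' is skipped because it is absent, and 'extra' ends up last because it was not listed.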