Python group by a object key gives errros - python

Does anyone know of a (itertools.groupby if possible too) way in Python to group an array of objects by an object key then create a new array of objects based on the grouping? For example, I have an array of car objects:
cars = [
{
'make': 'audi',
'model': 'r8',
'year': '2012'
}, {
'make': 'audi',
'model': 'rs5',
'year': '2013'
}, {
'make': 'ford',
'model': 'mustang',
'year': '2012'
}, {
'make': 'ford',
'model': 'fusion',
'year': '2015'
}, {
'make': 'kia',
'model': 'optima',
'year': '2012'
},
]
I want to make a new array of car objects that's grouped by make:
newCarsList = {
'audi': [
{
'model': 'r8',
'year': '2012'
}, {
'model': 'rs5',
'year': '2013'
},
],
'ford': [
{
'model': 'mustang',
'year': '2012'
}, {
'model': 'fusion',
'year': '2015'
}
],
'kia': [
{
'model': 'optima',
'year': '2012'
}
]
}
I tried with but this groupby:
def key_func(k):
return k['make']
newCarsList = []
for key, value in groupby(cars, key_func):
newCarsList.append({key: value})
print(newCarsList)
but this returns: [{'audi': <itertools._grouper object at 0x7f9b2c2bdd30>}, ...] and can't find how to fix.
Can someone help please?

A simple, mostly-functional solution:
from operator import itemgetter
from itertools import groupby
from functools import partial
def del_ret(d, key):
del d[key]
return d
dict(map(lambda k_v: (k_v[0], tuple(map(partial(del_ret, key="make"), k_v[1]))),
groupby(cars, itemgetter("make"))))
Change tuple to list to get identical output to what you want. But I assume you aren't going to modify those, so always use tuple in those circumstances…

The itertools._grouper is an iterable object. You can extract the values by iterating over it. For example, instead of appending {key: value}, you can pull the elements with a list comprehension: {key: [item for item in value]}.
It seems that your desired output is a dict though, not a list. You can get your pattern with
result = {}
for key, value in groupby(cars, key_func):
result[key] = [item for item in value]
for item in result[key]:
del item['make']
Edit: It's nicer to not add items that we're going to delete anyway. That can be done like this:
result = {}
for key, value in groupby(cars, key_func):
result[key] = [{subkey: subval for (subkey, subval)
in make.items() if subkey != 'make'}
for make in value]

df=pd.DataFrame(cars)
print(df)
for mytuple in df.itertuples():
print(mytuple)

Related

How to get multiple key value pairs from list of JSON dictionaries

I have a JSON file called "hostnames" formatted like below
{
'propertyName': 'www.property1.com',
'propertyVersion': 1,
'etag': 'jbcas6764023nklf78354',
'rules': {
'name': 'default',
'children': [{
'name': 'Route',
'children': [],
'behaviors': [{
'name': 'origin',
'options': {
'originType': 'CUSTOMER',
'hostname': 'www.origin1.com',
and I wanted to get the values of keys "propertyName" and "hostname" and have a new JSON file like below
'properties': [{
'propertyName': 'www.property1.com',
'hostnames': ['www.origin1.com', 'www.origin2.com']
}, {
'propertyName': 'www.property1.com',
'hostnames': ['www.origin1.com', 'www.origin2.com']
}]
my code looks like this
hostnames = result.json()
hostnameslist = [host['hostname'] for host in hostnames['rules']['children']['behaviors']['options']]
print(hostnameslist)
but I'm getting the error
TypeError: list indices must be integers or slices, not str
You are trying to access a list elements with a string index ('behaviors').
Try:
hostnames = result.json()
hostnameslist = []
for child in hostnames['rules']['children']:
for behavior in child['behaviors']:
if behavior['name'] == 'origin':
hostnameslist.append(behavior['options']['hostname'])
properties = [{
'propertyName': hostnames['propertyName'],
'hostnames': hostnameslist
}]
Making an assumption about how the OP's data might be structured.
Recursive navigation of the dictionary to find all/any values associated with a dictionary key of 'hostname' appears to be well-suited here.
Doing it this way obviates the need for knowledge about the depth of the dictionary or indeed any of the dictionary key names except (obviously) 'hostname'.
Of course, there may be other dictionaries within the "master" dictionary that contain a 'hostname' key. If that's the case then this function may return values that are not needed/wanted.
data = {
'propertyName': 'www.property1.com',
'propertyVersion': 1,
'etag': 'jbcas6764023nklf78354',
'rules': {
'name': 'default',
'children': [{
'name': 'Route',
'children': [],
'behaviors': [{
'name': 'origin',
'options': {
'originType': 'CUSTOMER',
'hostname': 'www.origin1.com'
}
},
{
'name': 'origin',
'options': {
'originType': 'CUSTOMER',
'hostname': 'www.origin2.com'
}
}
]
}
]
}
}
def get_hostnames(d):
def _get_hostnames(_d, _l):
if isinstance(_d, dict):
if 'hostname' in _d:
_l.append(_d['hostname'])
else:
for _v in _d.values():
_get_hostnames(_v, _l)
else:
if isinstance(_d, list):
for _v in _d:
_get_hostnames(_v, _l)
return _l
return _get_hostnames(d, [])
result = {'properties': [{'propertyName': data.get('propertyName'), 'hostnames': get_hostnames(data)}]}
print(result)
Output:
{'properties': [{'propertyName': 'www.property1.com', 'hostnames': ['www.origin1.com', 'www.origin2.com']}]}

creating a dictionary by partitioning a dictionary with new keys in python

I have the a dictionary like this:
{"Topic":"text","title":"texttitle","abstract":"textabs","year":"textyear","authors":"authors"}
I want to create another list as follows:
[{"label":{"Topic":"text","title":"texttitle","abstract":"textabs","year":"textyear","authors":"authors"},"value":
{"Topic":"text","title":"texttitle","abstract":"textabs","year":"textyear","authors":"authors"}}]
I have tried some methods with .items() but none of them gives the desired result.
Is that what you want?
dict_ = {"Topic":"text","title":"texttitle","abstract":"textabs","year":"textyear","authors":"authors"}
output = [{"label": dict_ , "value": dict_ }]
print(output)
[{"label":{"Topic":"text","title":"texttitle","abstract":"textabs","year":"textyear","authors":"authors"},"value":
{"Topic":"text","title":"texttitle","abstract":"textabs","year":"textyear","authors":"authors"}}] == [{"label": dict_ , "value": dict_ }]
Gives True
Following my comment, below is the code I would go through assuming key and output:
# Could be the keys would get from somewhere
vals = ["1","2","3","4"]
# Probably same coming from external sources
example_op =
{"Topic":"text","title":"texttitle","abstract":"textabs","year":"textyear","authors":"authors"}
#Global list
item_list = []
temp_dict = {}
for key in vals:
temp_dict[key] = example_op
item_list.append(temp_dict)
Final output of the list would be as:
Out[9]:
[{'1': {'Topic': 'text',
'title': 'texttitle',
'abstract': 'textabs',
'year': 'textyear',
'authors': 'authors'},
'2': {'Topic': 'text',
'title': 'texttitle',
'abstract': 'textabs',
'year': 'textyear',
'authors': 'authors'},
'3': {'Topic': 'text',
'title': 'texttitle',
'abstract': 'textabs',
'year': 'textyear',
'authors': 'authors'},
'4': {'Topic': 'text',
'title': 'texttitle',
'abstract': 'textabs',
'year': 'textyear',
'authors': 'authors'}}]

Flattening a dict

I have the following array of dicts (there's only one dict):
[{
'RuntimeInMinutes': '21',
'EpisodeNumber': '21',
'Genres': ['Animation'],
'ReleaseDate': '2005-02-05',
'LanguageOfMetadata': 'EN',
'Languages': [{
'_Key': 'CC',
'Value': ['en']
}, {
'_Key': 'Primary',
'Value': ['EN']
}],
'Products': [{
'URL': 'http://www.hulu.com/watch/217566',
'Rating': 'TV-Y',
'Currency': 'USD',
'SUBSCRIPTION': '0.00',
'_Key': 'US'
}, {
'URL': 'http://www.hulu.com/d/217566',
'Rating': 'TV-Y',
'Currency': 'USD',
'SUBSCRIPTION': '0.00',
'_Key': 'DE'
}],
'ReleaseYear': '2005',
'TVSeriesID': '5638#TVSeries',
'Type': 'TVEpisode',
'Studio': '4K Media'
}]
I would like to flatten the dict as follows:
[{
'RuntimeInMinutes': '21',
'EpisodeNumber': '21',
'Genres': ['Animation'],
'ReleaseDate': '2005-02-05',
'LanguageOfMetadata': 'EN',
'Languages._Key': ['CC', 'Primary'],
'Languages.Value': ['en', 'EN'],
'Products.URL': ['http://www.hulu.com/watch/217566', 'http://www.hulu.com/d/217566'],
'Products.Rating': ['TV-Y', 'TV-Y'],
'Products.Currency': ['USD', 'USD'],
'Products.SUBSCRIPTION': ['0.00', '0.00'],
'Products._Key': ['US', 'DE'],
'ReleaseYear': '2005',
'TVSeriesID': '5638#TVSeries',
'Type': 'TVEpisode',
'Studio': '4K Media'
}]
In other words, anytime a dict is encountered, it need to convert to either a string, number, or list.
What I currently have is something along the lines of the following, which uses a while loop to iterate through all the subpaths of the json.
while True:
for key in copy(keys):
val = get_sub_object_from_path(obj, key)
if isinstance(val, dict):
FLAT_OBJ[key.replace('/', '.')] = val
else:
keys.extend(os.path.join(key, _nextkey) for _nextkey in val.keys())
keys.remove(key)
if (not keys) or (n > 5):
break
else:
n += 1
continue
You can use recursion with a generator:
from collections import defaultdict
_d = [{'RuntimeInMinutes': '21', 'EpisodeNumber': '21', 'Genres': ['Animation'], 'ReleaseDate': '2005-02-05', 'LanguageOfMetadata': 'EN', 'Languages': [{'_Key': 'CC', 'Value': ['en']}, {'_Key': 'Primary', 'Value': ['EN']}], 'Products': [{'URL': 'http://www.hulu.com/watch/217566', 'Rating': 'TV-Y', 'Currency': 'USD', 'SUBSCRIPTION': '0.00', '_Key': 'US'}, {'URL': 'http://www.hulu.com/d/217566', 'Rating': 'TV-Y', 'Currency': 'USD', 'SUBSCRIPTION': '0.00', '_Key': 'DE'}], 'ReleaseYear': '2005', 'TVSeriesID': '5638#TVSeries', 'Type': 'TVEpisode', 'Studio': '4K Media'}]
def get_vals(d, _path = []):
for a, b in getattr(d, 'items', lambda :{})():
if isinstance(b, list) and all(isinstance(i, dict) or isinstance(i, list) for i in b):
for c in b:
yield from get_vals(c, _path+[a])
elif isinstance(b, dict):
yield from get_vals(b, _path+[a])
else:
yield ['.'.join(_path+[a]), b]
results = [i for b in _d for i in get_vals(b)]
_c = defaultdict(list)
for a, b in results:
_c[a].append(b)
result = [{a:list(b) if len(b) > 1 else b[0] for a, b in _c.items()}]
import json
print(json.dumps(result, indent=4))
Output:
[
{
"RuntimeInMinutes": "21",
"EpisodeNumber": "21",
"Genres": [
"Animation"
],
"ReleaseDate": "2005-02-05",
"LanguageOfMetadata": "EN",
"Languages._Key": [
"CC",
"Primary"
],
"Languages.Value": [
[
"en"
],
[
"EN"
]
],
"Products.URL": [
"http://www.hulu.com/watch/217566",
"http://www.hulu.com/d/217566"
],
"Products.Rating": [
"TV-Y",
"TV-Y"
],
"Products.Currency": [
"USD",
"USD"
],
"Products.SUBSCRIPTION": [
"0.00",
"0.00"
],
"Products._Key": [
"US",
"DE"
],
"ReleaseYear": "2005",
"TVSeriesID": "5638#TVSeries",
"Type": "TVEpisode",
"Studio": "4K Media"
}
]
Edit: wrapping solution in outer function:
def flatten_obj(data):
def get_vals(d, _path = []):
for a, b in getattr(d, 'items', lambda :{})():
if isinstance(b, list) and all(isinstance(i, dict) or isinstance(i, list) for i in b):
for c in b:
yield from get_vals(c, _path+[a])
elif isinstance(b, dict):
yield from get_vals(b, _path+[a])
else:
yield ['.'.join(_path+[a]), b]
results = [i for b in data for i in get_vals(b)]
_c = defaultdict(list)
for a, b in results:
_c[a].append(b)
return [{a:list(b) if len(b) > 1 else b[0] for a, b in _c.items()}]
EDIT
This now appears to be fixed:
As #panda-34 correctly points out (+1), the currently accepted
solution loses data, specifically Genres and Languages.Value when
you run the posted code.
Unfortunately, #panda-34's code modifies Genres:
'Genres': 'Animation',
rather than leaving it alone as in the OP's example:
'Genres': ['Animation'],
Below's my solution which attacks the problem a different way. None of the keys in the original data contains a dictionary as a value, only non-containers or lists (e.g. lists of dictionaries). So a primary a list of dictionaries will becomes a dictionary of lists (or just a plain dictionary if there's only one dictionary in the list.) Once we've done that, then any value that's now a dictionary is expanded back into the original data structure:
def flatten(container):
# A list of dictionaries becomes a dictionary of lists (unless only one dictionary in list)
if isinstance(container, list) and all(isinstance(element, dict) for element in container):
new_dictionary = {}
first, *rest = container
for key, value in first.items():
new_dictionary[key] = [flatten(value)] if rest else flatten(value)
for dictionary in rest:
for key, value in dictionary.items():
new_dictionary[key].append(value)
container = new_dictionary
# Any dictionary value that's a dictionary is expanded into original dictionary
if isinstance(container, dict):
new_dictionary = {}
for key, value in container.items():
if isinstance(value, dict):
for sub_key, sub_value in value.items():
new_dictionary[key + "." + sub_key] = sub_value
else:
new_dictionary[key] = value
container = new_dictionary
return container
OUTPUT
{
"RuntimeInMinutes": "21",
"EpisodeNumber": "21",
"Genres": [
"Animation"
],
"ReleaseDate": "2005-02-05",
"LanguageOfMetadata": "EN",
"Languages._Key": [
"CC",
"Primary"
],
"Languages.Value": [
[
"en"
],
[
"EN"
]
],
"Products.URL": [
"http://www.hulu.com/watch/217566",
"http://www.hulu.com/d/217566"
],
"Products.Rating": [
"TV-Y",
"TV-Y"
],
"Products.Currency": [
"USD",
"USD"
],
"Products.SUBSCRIPTION": [
"0.00",
"0.00"
],
"Products._Key": [
"US",
"DE"
],
"ReleaseYear": "2005",
"TVSeriesID": "5638#TVSeries",
"Type": "TVEpisode",
"Studio": "4K Media"
}
But this solution introduces a new apparent inconsistency:
'Languages.Value': ['en', 'EN'],
vs.
"Languages.Value": [["en"], ["EN"]],
However, I believe this is tied up with the Genres inconsistency mentioned earlier and the OP needs to define a consistent resolution.
Ajax1234's answer loses values of 'Genres' and 'Languages.Value'
Here's a bit more generic version:
def flatten_obj(data):
def flatten_item(item, keys):
if isinstance(item, list):
for v in item:
yield from flatten_item(v, keys)
elif isinstance(item, dict):
for k, v in item.items():
yield from flatten_item(v, keys+[k])
else:
yield '.'.join(keys), item
res = []
for item in data:
res_item = defaultdict(list)
for k, v in flatten_item(item, []):
res_item[k].append(v)
res.append({k: (v if len(v) > 1 else v[0]) for k, v in res_item.items()})
return res
P.S. "Genres" value is also flattened. It is either an inconsistency in the OP requirements or a separate problem which is not addressed in this answer.

There are 2 dictionaries, I want to append the matching keys from second dictionaries and store them as a list f dictionaries in the parent dictionary

For example, I have below 2 dictionaries,
dict1 = [{'id': 1, 'name': 'BOB'}, {'id': 2, 'name': 'DOD'}]
dict2 = [{'idd': 1, 'comp': 'BB', }, {'idd': 1, 'work': 'pent'}, {'idd': 2, 'comp': 'DD'}]
And I want below output -
dict1 = [
{
'id': 1,
'name': 'BOB',
'Details:[
{
'idd': 1,
'comp': 'BB'
},
{
'idd': 1,
'work': 'pent'
}
]
},
{
'id': 2,
'name': 'DOD',
'Details':[
{
'idd': 2,
'comp': 'DD'
}
]
}
]
I want to get the above result, using dictionary zip or ordereddict
Convert the dict1 to a real dict, with the id as key, and add an empty Details list to each entry. Then, iterate dict2 and and the missing elements.
dict1 = {item['id']: {**item, **{'Details': []}} for item in dict1}
for item in dict2:
item = dict(item)
_id = item.pop('idd')
temp[_id]['Details'].append(item)
dict1 = [item for item in dict1.values()]

How to retrieve values from list: python

Summary: I have a list
list = [
{
'id': '1',
'elements': 'A',
'table': 'maps/partials/a.html',
'chart': 'maps/partials/charts/a.html',
},
{
'id': '2',
'elements': 'B',
'table': 'maps/partials/census/b.html',
'chart': 'maps/partials/charts/b.html',
},
{
'id': '3',
'elements': ('C','D','E','F'), //i believe it is wrong
'table': 'maps/partials/census/common.html',
'chart': 'maps/partials/charts/common.html',
},]
some_arr = ['E','2011','English','Total']
I want to compare elements with items in some_arr.
and i am doing this to get list and compare it with some_arr.
for data in list:
for i in range(len(some_arr)):
if data['elements'] == some_arr[i]:
print(data['id'])
As you can see in 'id':3 section_title has 4 values. How do i compare elements here with some_arr.
If you want to find any matches:
lst = [
{
'id': '1',
'elements': 'A',
'table': 'maps/partials/a.html',
'chart': 'maps/partials/charts/a.html',
},
{
'id': '2',
'elements': 'B',
'table': 'maps/partials/census/b.html',
'chart': 'maps/partials/charts/b.html',
},
{
'id': '3',
'elements': ('C','D','E','F'),
'table': 'maps/partials/census/common.html',
'chart': 'maps/partials/charts/common.html',
}]
some_arr = ['E','2011','English','Total']
st = set(some_arr)
from collections import Iterable
for d in lst:
val = d["elements"]
if isinstance(val, Iterable) and not isinstance(val, str):
if any(ele in st for ele in val):
print(d["id"])
else:
if val in st:
print(d["id"])
Making a set of all the elements in some_arr will give you O(1) lookups, using isinstance(val, Iterable) and not isinstance(val, str) will catch any iterable value like a list, tuple etc.. and avoid iterating over a string which could give you false positives as "F" is in "Foo".
any will short circuit on the first match so if you actually want to print id for every match then use a for loop. Lastly if you are using python2 use basestring instead of str.
You can use set.intersection to get the intersection between them but before you need to check that if it's a tuple then use intersection.
Also in the first case you can use in to check the membership to check if data['elememnt'] in in some_arr then print data['id']:
for data in list:
d=data['elements']
if isinstance(d,tuple):
if set(d).intersection(some_arr):
print(data['id'])
if d in some_arr:
print(data['id'])

Categories

Resources