Suppose I have a list of dicts (where each dict has the same keys) like this:
list_of_dicts = [
{'Id': 4726, 'Body': 'Hello from John', 'Title': None, 'Comments': 'Dallas. '},
{'Id': 4726, 'Body': 'Hello from Mary', 'Title': None, 'Comments': "Austin"},
{'Id': 4726, 'Body': 'Hello from Dylan', 'Title': None, 'Comments': "Boston"},
]
I need to concatenate only the Body, Title and Comments parts and return a single dict, like this:
{'Id': 4726, 'Body': 'Hello from John Hello from Mary Hello from Dylan', 'Title': None, 'Comments': 'Dallas. Austin Boston'}
Please note that Title is None, so we have to be careful there. This is what I have done so far, but it's failing somewhere and I cannot see where...
keys = set().union(*list_of_dicts)
print(keys)
k_value = list_of_dicts[0]['Id']
d_dict = {k: " ".join(str(dic.get(k, '')) for dic in list_of_dicts) for k in keys if k != 'Id'}
merged_dict = {'Id': k_value}
merged_dict.update(d_dict)
But the above returns this, which I do not like:
Final Merged Dict: {'Id': 4726, 'Body': 'Hello from John Hello from Mary Hello from Dylan', 'Title': 'None None None', 'Comments': 'Dallas. Austin Boston'}
First, I'd remove Id from keys to avoid having to skip it in the dictionary comprehension, and use a simple assignment rather than .update() at the end.
In the argument to join, filter out when dic[k] is None. And if the join results in an empty string (because all the values are None), convert that to None in the final result.
keys = set().union(*list_of_dicts)
keys.remove('Id')
print(keys)
k_value = list_of_dicts[0]['Id']
d_dict = {k: (" ".join(str(dic[k]) for dic in list_of_dicts if k in dic and dic[k] is not None) or None) for k in keys}
d_dict['Id'] = k_value
print(d_dict)
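With the sample list above, that should print roughly the following (key order may vary because keys is a set; the double space after 'Dallas.' comes from the trailing space in the input):
{'Body': 'Hello from John Hello from Mary Hello from Dylan', 'Title': None, 'Comments': 'Dallas.  Austin Boston', 'Id': 4726}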
As you parse your list of dictionaries, you can store the intermediate results in defaultdict objects to hold a list of the string values. Once all the dictionaries have been parsed, you can then join together the strings.
from collections import defaultdict
dd_body = defaultdict(list)
dd_comments = defaultdict(list)
dd_titles = defaultdict(list)
for row in list_of_dicts:
dd_body[row['Id']].append(row['Body'])
dd_comments[row['Id']].append(row['Comments'])
dd_titles[row['Id']].append(row['Title'] or '') # Effectively removes `None`.
result = []
for id_ in dd_body: # All three dictionaries have the same keys.
body = ' '.join(dd_body[id_]).strip()
comments = ' '.join(dd_comments[id_]).strip()
titles = ' '.join(dd_titles[id_]).strip() or None
result.append({'Id': id_, 'Body': body, 'Title': titles, 'Comments': comments})
>>> result
[{'Id': 4726,
'Body': 'Hello from John Hello from Mary Hello from Dylan',
'Title': None,
'Comments': 'Dallas. Austin Boston'}]
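Since the question asks for a single dict and the sample data contains only one Id, the merged record is just the first element of that list:
merged = result[0]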
Less Pythonic than the other answers, but I like to think that it is easy to understand.
body, title, comments = "", "", ""
list_of_dicts=[
{'Id': 4726, 'Body': 'Hello from John', 'Title': None, 'Comments': 'Dallas. '},
{'Id': 4726, 'Body': 'Hello from Mary', 'Title': None, 'Comments': "Austin"},
{'Id': 4726, 'Body': 'Hello from Dylan', 'Title': None, 'Comments': "Boston"},
]
record_id = list_of_dicts[0]['Id']
for d in list_of_dicts:
    if d['Body'] is not None:
        body = body + d['Body'] + " "
    if d['Title'] is not None:
        title = title + d['Title'] + " "
    if d['Comments'] is not None:
        comments = comments + d['Comments'] + " "
body, title, comments = body.strip(), title.strip(), comments.strip()
if title == "":
    title = None
if body == "":
    body = None
if comments == "":
    comments = None
record = {'Id': record_id, 'Body': body, 'Title': title, 'Comments': comments}
If only the Title field can be None, this can be shortened by removing the checks on the other fields.
body, title, comments = "", "", ""
list_of_dicts=[
{'Id': 4726, 'Body': 'Hello from John', 'Title': None, 'Comments': 'Dallas. '},
{'Id': 4726, 'Body': 'Hello from Mary', 'Title': None, 'Comments': "Austin"},
{'Id': 4726, 'Body': 'Hello from Dylan', 'Title': None, 'Comments': "Boston"}]
record_id = list_of_dicts[0]['Id']
for d in list_of_dicts:
    body = body + d['Body'] + " "
    comments = comments + d['Comments'] + " "
    if d['Title'] is not None:
        title = title + d['Title'] + " "
body, title, comments = body.strip(), title.strip(), comments.strip()
if title == "":
    title = None
record = {'Id': record_id, 'Body': body, 'Title': title, 'Comments': comments}
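With the sample data, record ends up as the following (note the double space after 'Dallas.' carried over from the trailing space in the input):
{'Id': 4726, 'Body': 'Hello from John Hello from Mary Hello from Dylan', 'Title': None, 'Comments': 'Dallas.  Austin Boston'}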
For this type of data manipulation, pandas is your friend.
import pandas as pd
# Your list of dictionaries.
list_of_dicts = [
{'Id': 4726, 'Body': 'Hello from John', 'Title': None, 'Comments': 'Dallas. '},
{'Id': 4726, 'Body': 'Hello from Mary', 'Title': None, 'Comments': "Austin"},
{'Id': 4726, 'Body': 'Hello from Dylan', 'Title': None, 'Comments': "Boston"},
]
# Can be read into a pandas dataframe
df = pd.DataFrame(list_of_dicts)
# Do a database style groupby() and apply the function that you want to each group
group_transformed_df = df.groupby('Id').agg(lambda x: ' '.join(x.dropna()) or None).reset_index()  # dropna() skips the None titles; reset_index() to get a normal DataFrame back.
# DataFrame() -> dict
output_dict = group_transformed_df.to_dict('records')
There are many types of dicts you can get from a DataFrame. You want the records option.
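A tiny illustration of the difference, using a made-up two-row frame:
small = pd.DataFrame({'Id': [1, 2], 'Body': ['a', 'b']})
small.to_dict('records')   # [{'Id': 1, 'Body': 'a'}, {'Id': 2, 'Body': 'b'}]
small.to_dict()            # default orientation: {'Id': {0: 1, 1: 2}, 'Body': {0: 'a', 1: 'b'}}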
Related
I have this code that calls an API and gets a response.
I need to perform a call for every issue_id and store the issue id with the correct response:
issues_ids = [10495]
def get_changelog(issue_id: int):
url = f'{base_url}/{issue_id}/changelog'
response = requests.request("GET",url,headers=headers,auth=auth)
return (response.json())
def parse_json(response):
    keylist = []
    for item in response['values']:
        key = {
            'id': item['id'],
            'items': item['items']
        }
        keylist.append(key)
    return keylist
mainlist = []
for i in issues_ids:
print(i)
mainlist.extend(parse_json(get_changelog(i)))
print(mainlist)
Current output:
10495
[{'id': '13613', 'items': [{'field': 'Organization text', 'fieldtype': 'custom', 'fieldId': 'customfield_10039', 'from': None, 'fromString': None, 'to': None, 'toString': 'Jabil'}]}]
I need to add the 10495 within this array as a new key, like this:
[{issue_id: 10495, 'id': '13613', 'items': [{'field': 'Organization text', 'fieldtype': 'custom', 'fieldId': 'customfield_10039', 'from': None, 'fromString': None, 'to': None, 'toString': 'Jabil'}]}]
I tried different methods such as insert, append...
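One possible sketch, reusing the functions above: attach the id to each parsed entry as it is collected (the key name issue_id is just an example):
mainlist = []
for i in issues_ids:
    for entry in parse_json(get_changelog(i)):
        entry['issue_id'] = i   # add the issue id to each changelog dict
        mainlist.append(entry)
print(mainlist)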
I have the following telegram export JSON dataset:
import pandas as pd
df = pd.read_json("data/result.json")
>>> df.columns
Index(['name', 'type', 'id', 'messages'], dtype='object')
>>> type(df)
<class 'pandas.core.frame.DataFrame'>
# A small sample of the messages structure
sample_df = pd.DataFrame({"messages": [
{"id": 11, "from": "user3984", "text": "Do you like soccer?"},
{"id": 312, "from": "user837", "text": ['Not sure', {'type': 'hashtag', 'text': '#confused'}]},
{"id": 4324, "from": "user3984", "text": ['O ', {'type': 'mention', 'text': '#user87324'}, ' really?']}
]})
Within df, there's a "messages" column, which has the following output:
>>> df["messages"]
0 {'id': -999713937, 'type': 'service', 'date': ...
1 {'id': -999713936, 'type': 'service', 'date': ...
2 {'id': -999713935, 'type': 'message', 'date': ...
3 {'id': -999713934, 'type': 'message', 'date': ...
4 {'id': -999713933, 'type': 'message', 'date': ...
...
22377 {'id': 22102, 'type': 'message', 'date': '2022...
22378 {'id': 22103, 'type': 'message', 'date': '2022...
22379 {'id': 22104, 'type': 'message', 'date': '2022...
22380 {'id': 22105, 'type': 'message', 'date': '2022...
22381 {'id': 22106, 'type': 'message', 'date': '2022...
Name: messages, Length: 22382, dtype: object
Within messages, there's a particular key named "text", and that's where I want to focus. It turns out that, when you explore the data, the text field can contain:
A single text:
>>> df["messages"][5]["text"]
'JAJAJAJAJAJAJA'
>>> df["messages"][22262]["text"]
'No creo'
But sometimes it's nested. Like the following:
>>> df["messages"][22373]["text"]
['O ', {'type': 'mention', 'text': '#user87324'}, ' really?']
>>> df["messages"][22189]["text"]
['The average married couple has sex roughly once a week. ', {'type': 'mention', 'text': '#googlefactss'}, ' ', {'type': 'hashtag', 'text': '#funfact'}]
>>> df["messages"][22345]["text"]
[{'type': 'mention', 'text': '#user817430'}]
In case for nested data, if I want to grab the main text, I can do the following:
>>> df["messages"][22373]["text"][0]
'O '
>>> df["messages"][22189]["text"][0]
'The average married couple has sex roughly once a week. '
>>>
From here, everything seems ok. However, the problem arrives when I do the for loop. If I try the following:
for item in df["messages"]:
tg_id = item.get("id", "None")
tg_type = item.get("type", "None")
tg_date = item.get("date", "None")
tg_from = item.get("from", "None")
tg_text = item.get("text", "None")
print(tg_id, tg_from, tg_text)
A sample output is:
21263 user3984 jajajajaja
21264 user837 ['Not sure', {'type': 'hashtag', 'text': '#confused'}]
21265 user3984 What time is it?✋
MY ASK: How do I flatten the rows? I need the following (and to store it in a data frame):
21263 user3984 jajajajaja
21264 user837 Not sure
21265 user837 type: hashtag
21266 user837 text: #confused
21267 user3984 What time is it?✋
I tried to detect "text" type like this:
for item in df["messages"]:
tg_id = item.get("id", "None")
tg_type = item.get("type", "None")
tg_date = item.get("date", "None")
tg_from = item.get("from", "None")
tg_text = item.get("text", "None")
if type(tg_text) == list:
tg_text = tg_text[0]
print(tg_id, tg_from, tg_text)
With this I only grab the first text, but I'm expecting to grab the other fields as well or to 'flatten' the data.
I also tried:
for item in df["messages"]:
tg_id = item.get("id", "None")
tg_type = item.get("type", "None")
tg_date = item.get("date", "None")
tg_from = item.get("from", "None")
tg_text = item.get("text", "None")
if type(tg_text) == list:
tg_text = tg_text[0]
tg_second = tg_text[1]["text"]
print(tg_id, tg_from, tg_text, tg_second)
But no luck, because the indices vary and the length of each message's text varies too.
In addition, even though the output wasn't close to my desired solution, I also tried:
for item in df["messages"]:
tg_text = item.get("text", "None")
if type(tg_text) == list:
for i in tg_text:
print(item, i)
mydict = {}
for k, v in df.items():
print(k, v)
mydict[k] = v
# Used df["text"].explode()
# Used json_normalize but no luck
Any thoughts?
Assuming a dataframe like the following:
df = pd.DataFrame({"messages": [
{"id": 21263, "from": "user3984", "text": "jajajajaja"},
{"id": 21264, "from": "user837", "text": ['Not sure', {'type': 'hashtag', 'text': '#confused'}]},
{"id": 21265, "from": "user3984", "text": ['O ', {'type': 'mention', 'text': '#user87324'}, ' really?']}
]})
First, expand the messages dictionaries into separate id, from and text columns.
expanded = pd.concat([df.drop("messages", axis=1), pd.json_normalize(df["messages"])], axis=1)
Then explode the dataframe to have a row for each entry in text:
exploded = expanded.explode("text")
Then expand the dictionaries that are in some of the entries, converting them to lists of text:
def convert_dict(entry):
if type(entry) is dict:
return [f"{k}: {v}" for k, v in entry.items()]
else:
return entry
exploded["text"] = exploded["text"].apply(convert_dict)
Finally, explode again to separate the converted dicts to separate rows.
final = exploded.explode("text")
The resulting output should look like this
id from text
0 21263 user3984 jajajajaja
1 21264 user837 Not sure
1 21264 user837 type: hashtag
1 21264 user837 text: #confused
2 21265 user3984 O
2 21265 user3984 type: mention
2 21265 user3984 text: #user87324
2 21265 user3984 really?
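If preferred, the same steps can be chained in one expression (a sketch that skips the concat step, since the sample frame only has the messages column):
final = (
    pd.json_normalize(df["messages"])                        # id / from / text columns
      .explode("text")                                       # one row per text entry
      .assign(text=lambda d: d["text"].apply(convert_dict))  # convert nested dicts to lists
      .explode("text")                                       # split the converted dicts again
)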
Just to share some ideas to flatten your list,
def flatlist(srclist):
    flat = []
    if isinstance(srclist, str):  # a plain string is already "flat"
        return [srclist]
    if srclist:  # check that srclist is not None
        for item in srclist:
            if isinstance(item, str):  # plain text entry
                flat.append(item)
            elif isinstance(item, dict):  # nested entry: combine key and value
                for x in item:
                    flat.append(x + ' ' + item[x])
    return flat
for item in df["messages"]:
tg_text = item.get("text", "None")
flat_list = flatlist(tg_text) # get the flattened list
for tg in flat_list: # loop through the list and get the data you want
tg_id = item.get("id", "None")
tg_from = item.get("from", "None")
print(tg_id, tg_from, tg)
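Since the question also asks to store the result in a data frame, the same loop can collect rows into a list first (a sketch building on flatlist above):
rows = []
for item in df["messages"]:
    for tg in flatlist(item.get("text", "None")):
        rows.append({"id": item.get("id"), "from": item.get("from"), "text": tg})
flat_df = pd.DataFrame(rows)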
[[{'text': '\n ', 'category': 'cooking',
   'title': {'text': 'Everyday Italian', 'lang': 'en'},
   'author': {'text': 'Giada De Laurentiis'},
   'year': {'text': '2005'}, 'price': {'text': '30.00'}},
  {'text': '\n ', 'category': 'children',
   'title': {'text': 'Harry Potter', 'lang': 'en'},
   'author': {'text': 'J K. Rowling'},
   'year': {'text': '2005'}, 'price': {'text': '29.99'}},
  {'text': '\n ', 'category': 'web',
   'title': {'text': 'XQuery Kick Start', 'lang': 'en'},
   'author': [{'text': 'James McGovern'}, {'text': 'Per Bothner'}, {'text': 'Kurt Cagle'},
              {'text': 'James Linn'}, {'text': 'Vaidyanathan Nagarajan'}],
   'year': {'text': '2003'}, 'price': {'text': '49.99'}},
  {'text': '\n ', 'category': 'web', 'cover': 'paperback',
   'title': {'text': 'Learning XML', 'lang': 'en'},
   'author': {'text': 'Erik T. Ray'},
   'year': {'text': '2003'}, 'price': {'text': '39.95'}}]]
Desired output format:
category : cooking,
title : ['Everyday Italian', 'lang': 'en'],
author : Giada De Laurentiis,
year : '2005',
price : '30.00'
category : children,
title : ['Harry Potter', 'lang': 'en'],
author : 'J K. Rowling',
year : '2005',
price : '29.99'
category : web,
title : [ 'XQuery Kick Start''lang': 'en'],
author :[ 'James McGovern' , 'Per Bothner','Kurt Cagle','James Linn', 'Vaidyanathan Nagarajan'],
year : '2003',
price : '49.99'
category : web,
cover : paperback,
title : [ 'Learning XML','lang': 'en'],
author : 'Erik T. Ray',
year : '2003',
price : '39.95'
A simple loop like the following should get the output you require.
for entry in data[0]:
for k, v in entry.items():
print(k, ':', v)
def printBook(d):
    del d['text']
    for i in d:
        if type(d[i]) == dict:
            if len(d[i]) == 1:
                d[i] = list(d[i].values())[0]
            else:
                d[i] = [('' if j == 'text' else (j + ':')) + d[i][j] for j in d[i]]
        elif type(d[i]) == list:  # e.g. several authors: keep their text values
            d[i] = [a['text'] for a in d[i]]
    s = '\n'
    for i, j in d.items():
        s += f' {i} : {j} ,\n'
    print(s)
Try this; it prints an individual dictionary in your described format.
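For example, assuming the nested list above is bound to a name such as data, it can be applied to each book:
for book in data[0]:
    printBook(book)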
Take a look at the pprint module which provides a nice way of printing data structures without the need for writing your own formatter.
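A minimal sketch, assuming the nested list above is bound to a name such as data:
from pprint import pprint
pprint(data[0][0], width=60)   # pretty-prints the first book dict with line wrapping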
Thanks for the coding exercise. Man, that output format was specific! :D Tricky, as the strings are not quoted when standalone, but quoted when coming from a text attribute. Also tricky that everything must be wrapped in [] if it is not just text. It is also a little underspecified, because hey, what if there is no text at all, yet other keys?
Output format disclaimer:
I think there is a missing , after 'XQuery Kick Start'
I think Giada De Laurentiis should have been quoted as it is coming from 'text'
import copy
def transform_value(stuff):
if type(stuff) is str:
return stuff
if type(stuff) is dict:
elements = []
text = stuff.pop("text", "")
if text:
elements.append(f"'{text}'") # we quote only if there was really a text entry
if not stuff: # dict just got empty
return elements[0] # no more attributes, so no [] needed around text
elements.extend(f"'{key}': '{transform_value(value)}'" for key, value in stuff.items())
if type(stuff) is list:
elements = [transform_value(e) for e in stuff]
# this will obviously raise an exception if stuff is not one of str, dict or list
return f"[{', '.join(elements)}]"
def transform_pub(d: dict):
d = copy.deepcopy(d) # we are gonna delete keys, so we don't mess with the outer data
tail = d.pop("text", "")
result = ",\n".join(f'{key} : {transform_value(value)}' for key, value in d.items())
return result + tail
if __name__ == "__main__":
for sublist in data:
for pub in sublist:
print(transform_pub(pub))
I first wanted to somehow use the same mechanism for the publications themselves via some recursion. But then the code became too complicated, as the text field is appended at the end for publications, while it comes first for attributes.
Once I let go of the fully structured solution, I started out with a test for the publication printer:
import pytest
from printing import transform_value
@pytest.mark.parametrize('input,output', [
("cooking", "cooking"),
({"text": "J K. Rowling"}, "'J K. Rowling'"),
({"lang": "en", "text": "Everyday Italian"},
"['Everyday Italian', 'lang': 'en']"),
([{"text": "James McGovern"},
{"text": "Per Bothner"},
{"text": "Kurt Cagle"},
{"text": "James Linn"},
{"text": "Vaidyanathan Nagarajan"}],
"['James McGovern', 'Per Bothner', 'Kurt Cagle', 'James Linn', 'Vaidyanathan Nagarajan']"
),
([{"text": "Joe"}], "['Joe']"),
({"a": "1"}, "['a': '1']"),
])
def test_transform_value(input, output):
assert transform_value(input) == output
How do I merge a specific value from one array of dicts into another array of dicts if a single specific value matches between them?
I have an array of dicts that represent books
books = [{'writer_id': '123-456-789', 'index': None, 'title': 'Yellow Snow'}, {'writer_id': '888-888-777', 'index': None, 'title': 'Python for Dummies'}, {'writer_id': '999-121-223', 'index': 'Foo', 'title': 'Something Else'}]
and I have an array of dicts that represents authors
authors = [{'roles': ['author'], 'profile_picture': None, 'author_id': '123-456-789', 'name': 'Pat'}, {'roles': ['author'], 'profile_picture': None, 'author_id': '999-121-223', 'name': 'May'}]
I want to take the name from authors and add it to the dict in books where the books writer_id matches the authors author_id.
My end result would ideally change the books array of dicts to be (notice the first dict now has 'name': 'Pat' and the last one has 'name': 'May'):
books = [{'writer_id': '123-456-789', 'index': None, 'title': 'Yellow Snow', 'name': 'Pat'}, {'writer_id': '888-888-777', 'index': None, 'title': 'Python for Dummies'}, {'writer_id': '999-121-223', 'index': 'Foo', 'title': 'Something Else', 'name': 'May'}]
My current solution is:
for book in books:
for author in authors:
if book['writer_id'] == author['author_id']:
book['author_name'] = author['name']
And this works. However, the nested statements bother me and feel unwieldy. I also have a number of other such structures so I end up with a function that has a bunch of code resembling this in it:
for book in books:
for author in authors:
if book['writer_id'] == author['author_id']:
book['author_name'] = author['name']
books_with_foo = []
for book in books:
for thing in things:
if something:
# do something
for blah in books_with_foo:
for book_foo in otherthing:
if blah['bar'] == stuff['baz']:
# etc, etc.
Alternatively, how would you aggregate data from multiple database tables into one thing... some of the data comes back as dicts, some as arrays of dicts?
Pandas is almost definitely going to help you here. Convert your dicts to DataFrames for easier manipulation, then merge them:
import pandas as pd
authors = [{'roles': ['author'], 'profile_picture': None, 'author_id': '123-456-789', 'name': 'Pat'}, {'roles': ['author'], 'profile_picture': None, 'author_id': '999-121-223', 'name': 'May'}]
books = [{'writer_id': '123-456-789', 'index': None, 'title': 'Yellow Snow'}, {'writer_id': '888-888-777', 'index': None, 'title': 'Python for Dummies'}, {'writer_id': '999-121-223', 'index': 'Foo', 'title': 'Something Else'}]
df1 = pd.DataFrame.from_dict(books)
df2 = pd.DataFrame.from_dict(authors)
df1['author_id'] = df1.writer_id
df1 = df1.set_index('author_id')
df2 = df2.set_index('author_id')
result = pd.concat([df1, df2], axis=1)
You may find the pandas documentation on merging helpful for different ways of combining (merging, concatenating, etc.) separate DataFrames.
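For instance, a left merge keyed on the two id columns (a sketch, not part of the answer above) keeps every book and adds the author's name where there is a match:
books_df = pd.DataFrame(books)
authors_df = pd.DataFrame(authors)
merged = books_df.merge(
    authors_df[['author_id', 'name']],
    left_on='writer_id', right_on='author_id',
    how='left',                              # books without a matching author get NaN for name
)
records = merged.drop(columns='author_id').to_dict('records')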
I need to get the index (number) of the item in a list whose title contains the string c = "Return To Sender (Matrey Remix)", and then get information from that index. But I get the numbers of all the items in the list, and no errors.
demo = json.loads(raw)
c = "Return To Sender (Matrey Remix)"
for i in (i for i, tr in enumerate(demo['tracks']) if str(tr['title']).find(c)):
print(i)
dict = demo['tracks'][i]
For example, I have 7 track titles in the result of this code:
for tr in demo['tracks']:
print(tr['title'])
Track titles:
Return To Sender (Original Mix)
Return To Sender (Matrey Remix)
Return To Sender (Matrey Remix)
Return To Sender (Matrey Remix)
Return To Sender (Original Mix)
Return To Sender (Original Mix)
Return To Sender (Original Mix)
But the output is empty.
The demo object:
{
'mixes': [],
'packs': [],
'stems': [],
'tracks': [{
'id': 7407969,
'mix': 'Original Mix',
'name': 'Return To Sender',
'title': 'Return To Sender (Original Mix)',
}, {
'id': 7407971,
'mix': 'Matrey Remix',
'name': 'Return To Sender',
'title': 'Return To Sender (Matrey Remix)',
}, {
'id': 9011142,
'mix': 'Matrey Remix',
'name': 'Return To Sender',
'title': 'Return To Sender (Matrey Remix)',
}, {
'id': 7846774,
'mix': 'Matrey Remix',
'name': 'Return To Sender',
'title': 'Return To Sender (Matrey Remix)',
}, {
'id': 7407969,
'mix': 'Original Mix',
'name': 'Return To Sender',
'title': 'Return To Sender (Original Mix)',
}, {
'id': 9011141,
'mix': 'Original Mix',
'name': 'Return To Sender',
'type': 'track',
}, {
'id': 7789328,
'mix': 'Original Mix',
'name': 'Return To Sender',
'title': 'Return To Sender (Original Mix)',
}]
}
str.find() returns 0 when the text is found at the start:
>>> 'foo bar'.find('foo')
0
That is considered a false value in a boolean context:
>>> if 0:
... print('Found at position 0!')
...
>>>
If the text is not there, str.find() returns -1 instead. From the str.find() documentation:
Return the lowest index in the string where substring sub is found [...]. Return -1 if sub is not found.
This means that your code will fail to print an index only when the text is found at the start of the title. In all other cases (including when the title does not contain the text at all), the track's index will be printed.
Don't use str.find(). Use in to get True if the text is there, False if it is not:
for i in (i for i, tr in enumerate(demo['tracks']) if c in tr['title']):
Demo using your json data:
>>> c = "Return To Sender (Matrey Remix)"
>>> for i in (i for i, tr in enumerate(demo['tracks']) if c in tr['title']):
... print(i)
...
1
2
3
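And if the goal is the matching track dicts themselves rather than their indices, a list comprehension works (using .get because one track in the demo object has no 'title' key):
matching = [tr for tr in demo['tracks'] if c in tr.get('title', '')]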