I have two lists of dictionaries called Reviewers_dicts and Products_dicts.
The keys are the following:
Products_dicts = ['ProductID', 'sku', 'name_title', 'description', 'list_price', 'sale_price', 'category', 'category_tree', 'average_product_rating', 'product_url', 'product_image_urls', 'brand', 'total_number_reviews', 'Reviews', 'Bought With']
Reviewers_dicts = ['Username', 'DOB', 'State', 'Reviewed_ProductID']
I want to write a generator function that takes a username and yields all the reviews the person with that username has written, one at a time.
So far I have tried:
def find_reviews(val):
    for dict_key in reviewers_dicts
        if dict_key["Username"] == val1:
            if dict_key["Reviewed"] in reviewers_dicts == dict_key["ProductD"]:
                print(products_dicts["Reviews"])
Example entry for
prodcuts_dicts = {'uniq_id': 'b6c0b6bea69c722939585baeac73c13d','total_number_reviews': 8, 'Reviews': [{'User': 'fsdv4141', 'Review': 'You never have to worry about the fit...Alfred Dunner clothing sizes are true to size and fits perfectly. Great value for the money.', 'Score': 2}, {'User': 'krpz1113', 'Review': 'Good quality fabric. Perfect fit. Washed very well no iron.', 'Score': 4}, {'User': 'mbmg3241', 'Review': 'I do not normally wear pants or capris that have an elastic waist, but I decided to try these since they were on sale and I loved the color. I was very surprised at how comfortable they are and wear really well even wearing all day. I will buy this style again!', 'Score': 4}, {'User': 'zeqg1222', 'Review': 'I love these capris! They fit true to size and are so comfortable to wear. I am planning to order more of them.', 'Score': 1}, {'User': 'nvfn3212', 'Review': 'This product is very comfortable and the fabric launders very well', 'Score': 1}, {'User': 'aajh3423', 'Review': 'I did not like the fabric. It is 100% polyester I thought it was different.I bought one at the store apprx two monts ago, and I thought it was just like it', 'Score': 5}, {'User': 'usvp2142', 'Review': 'What a great deal. Beautiful Pants. Its more than I expected.', 'Score': 3}, {'User': 'yemw3321', 'Review': 'Alfred Dunner has great pants, good fit and very comfortable', 'Score': 1}], 'Bought With': ['898e42fe937a33e8ce5e900ca7a4d924', '8c02c262567a2267cd207e35637feb1c', 'b62dd54545cdc1a05d8aaa2d25aed996', '0da4c2dcc8cfa0e71200883b00d22b30', '90c46b841e2eeece992c57071387899c']}
Example entry for
Reviewers_Dicts = [{'Username': 'bkpn1412', 'DOB': '31.07.1983', 'State': 'Oregon', 'Reviewed': ['cea76118f6a9110a893de2b7654319c0']}, {'Username': 'gqjs4414', 'DOB': '27.07.1998', 'State': 'Massachusetts', 'Reviewed': ['fa04fe6c0dd5189f54fe600838da43d3']}]
Ok, assuming your products_dicts (typo fixed) actually is a list of several dicts, this code will do what you want:
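def find_reviews(user):
    # Find the list of product IDs this user has reviewed
    prod_list = []  # stays empty if the username is unknown
    for entry in Reviewers_Dicts:
        if entry['Username'] == user:
            prod_list = entry['Reviewed']
            break
    # For each reviewed product, find the product entry and yield this user's review
    for prod_id in prod_list:
        for entry in products_dicts:
            if entry['uniq_id'] == prod_id:
                for rev in entry['Reviews']:
                    if rev['User'] == user:
                        yield rev['Review']
                        break
                break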
That said, as I commented earlier, your data structure (lists of dicts, which applies to products_dicts, Reviewers_Dicts, and Reviews inside products_dicts) is far from optimal: it forces the code to loop through each list to find the relevant entry. It would be better replaced with actual dicts of dicts.
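To illustrate the dicts-of-dicts idea, here is a minimal sketch that builds lookup tables keyed by 'uniq_id' and 'Username' once, so the generator no longer scans the outer lists. The sample data is a made-up, trimmed stand-in for the real lists, and the names products_by_id and reviewers_by_name are hypothetical.

```python
# Made-up sample data in the same shape as the question's lists
products_dicts = [
    {"uniq_id": "p1", "Reviews": [{"User": "alice", "Review": "Great fit.", "Score": 4},
                                  {"User": "bob", "Review": "Too small.", "Score": 2}]},
    {"uniq_id": "p2", "Reviews": [{"User": "alice", "Review": "Washes well.", "Score": 5}]},
]
Reviewers_Dicts = [
    {"Username": "alice", "Reviewed": ["p1", "p2"]},
    {"Username": "bob", "Reviewed": ["p1"]},
]

# Build the lookup tables once (hypothetical names)
products_by_id = {p["uniq_id"]: p for p in products_dicts}
reviewers_by_name = {r["Username"]: r for r in Reviewers_Dicts}


def find_reviews(user):
    """Yield the text of every review written by `user`, one at a time."""
    reviewer = reviewers_by_name.get(user)
    if reviewer is None:
        return  # unknown user: the generator simply yields nothing
    for prod_id in reviewer["Reviewed"]:
        product = products_by_id.get(prod_id)
        if product is None:
            continue  # reviewed product is missing from the product data
        for rev in product["Reviews"]:
            if rev["User"] == user:
                yield rev["Review"]


for review in find_reviews("alice"):
    print(review)
```

The inner Reviews list is still scanned linearly here; keying it by 'User' as well would remove that last loop.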
I have retrieved a JSON object from an API. The JSON object looks like this:
{'copyright': 'Copyright (c) 2020 The New York Times Company. All Rights '
'Reserved.',
'response': {'docs': [{'_id': 'nyt://article/e3e5e5e5-1b32-5e2b-aea7-cf20c558dbd3',
'abstract': 'LEAD: RESEARCHERS at the Brookhaven '
'National Laboratory are employing a novel '
'model to study skin cancer in humans: '
'they are exposing tiny tropical fish to '
'ultraviolet radiation.',
'byline': {'organization': None,
'original': 'By Eric Schmitt',
'person': [{'firstname': 'Eric',
'lastname': 'Schmitt',
'middlename': None,
'organization': '',
'qualifier': None,
'rank': 1,
'role': 'reported',
'title': None}]},
'document_type': 'article',
'headline': {'content_kicker': None,
'kicker': None,
'main': 'Tiny Fish Help Solve Cancer '
'Riddle',
'name': None,
'print_headline': 'Tiny Fish Help Solve '
'Cancer Riddle',
'seo': None,
'sub': None},
'keywords': [{'major': 'N',
'name': 'organizations',
'rank': 1,
'value': 'Brookhaven National '
'Laboratory'},
{'major': 'N',
'name': 'subject',
'rank': 2,
'value': 'Ozone'},
{'major': 'N',
'name': 'subject',
'rank': 3,
'value': 'Radiation'},
{'major': 'N',
'name': 'subject',
'rank': 4,
'value': 'Cancer'},
{'major': 'N',
'name': 'subject',
'rank': 5,
'value': 'Research'},
{'major': 'N',
'name': 'subject',
'rank': 6,
'value': 'Fish and Other Marine Life'}],
'lead_paragraph': 'RESEARCHERS at the Brookhaven '
'National Laboratory are employing a '
'novel model to study skin cancer in '
'humans: they are exposing tiny '
'tropical fish to ultraviolet '
'radiation.',
'multimedia': [],
'news_desk': 'Science Desk',
'print_page': '3',
'print_section': 'C',
'pub_date': '1989-12-26T05:00:00+0000',
'section_name': 'Science',
'snippet': '',
'source': 'The New York Times',
'type_of_material': 'News',
'uri': 'nyt://article/e3e5e5e5-1b32-5e2b-aea7-cf20c558dbd3',
'web_url': 'https://www.nytimes.com/1989/12/26/science/tiny-fish-help-solve-cancer-riddle.html',
'word_count': 870},
{'_id': 'nyt://article/32a2431d-623a-525b-a21d-d401be865818',
'abstract': 'LEAD: Clouds, even the ones formed by '
...and continues like that, too long to show all of it here.
Now, when I want to list just one headline, I use:
pprint(articles['response']['docs'][0]['headline']['print_headline'])
And I get the output
'Tiny Fish Help Solve Cancer Riddle'
The problem is when I want to pick out all of the headlines from this JSON object, and make a list of them. I tried:
index = 0
for headline in articles:
    headlineslist = ['response']['docs'][index]['headline']['print_headline'].split("''")
    index = index + 1
headlineslist
But I get the error TypeError: list indices must be integers or slices, not str
In other words, it worked when I "listed" just one headline, at index [0], but not when I try to repeat the process over each index. How do I iterate through each index to get a list of outputs like the first one?
To iterate over the document list you can just do the following:
for doc in articles['response']['docs']:
    print(doc['headline']['print_headline'])
This would print all headlines.
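Since the question asks for a list rather than printed output, a list comprehension over the same docs list is one way to collect the headlines. The original attempt failed because ['response'] on its own is a list literal, so ['response']['docs'] tries to index a list with a string. The sketch below uses a tiny stand-in for the API response; the second headline is a placeholder, not real data.

```python
# Tiny stand-in for the API response; only the fields used here are included
articles = {
    "response": {
        "docs": [
            {"headline": {"print_headline": "Tiny Fish Help Solve Cancer Riddle"}},
            {"headline": {"print_headline": "Second headline (placeholder)"}},
        ]
    }
}

# One entry per document, in the order the API returned them
headlines = [doc["headline"]["print_headline"] for doc in articles["response"]["docs"]]
print(headlines)
```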
I have a data frame that has different data types (list, dictionary, list of dictionary, strings, etc).
df = pd.DataFrame([{'category': [{'id': 1, 'name': 'House Targaryen'}],
'connection': ['Rhaena Targaryen', 'Aegon Targaryen'],
'description': 'Jon Snow, born Aegon Targaryen, is the son of Lyanna Stark '
'and Rhaegar Targaryen, the late Prince of Dragonstone',
'name': 'Jon Snow'},
{'category': [{'id': 2, 'name': 'House Stark'},
{'id': 3, 'name': 'Nights Watch'}],
'connection': ['Robb Stark', 'Sansa Stark', 'Arya Stark', 'Bran Stark'],
'description': 'After successfully capturing a wight and presenting it to '
'the Lannisters as proof that the Army of the Dead are real, '
'Jon pledges himself and his army to Daenerys Targaryen.',
'name': 'Jon Snow'}])
I want to merge these two rows by Jon Snow and combine all other fields together so it looks like
name category description connection
Jon Snow ['House Targaryen','House Stark','Nights Watch'] Jon Snow, born ...... his army to Daenerys Targaryen. ['Rhaena Targaryen',...,'Bran Stark']
It might be a little tricky with the list of dictionaries. Since this is a toy example with only two rows, it's easy to explode the category column and combine the two rows, but I don't think that's practical for my actual data set.
I also thought about using df.groupby('name').aggregate({'category': func1, 'description': func2, 'connection': func3}), but I'm not sure if there's a built-in function for what I need.
Thank yall for helping!
Looking at your data, it might be possible to first do a simple groupby and sum, which concatenates the strings and lists within each group. Then deal with the categories using a list comprehension:
import pandas as pd
df = pd.DataFrame([{'category': [{'id': 1, 'name':'House Targaryen'}],
'name': 'Jon Snow',
'description':'Jon Snow, born Aegon Targaryen, is the son of Lyanna Stark and Rhaegar Targaryen, the late Prince of Dragonstone',
'connection':['Rhaena Targaryen', 'Aegon Targaryen']},
{'category': [{'id': 2, 'name': 'House Stark'},{'id': 3, 'name': 'Nights Watch'}],
'name': 'Jon Snow',
'description': 'After successfully capturing a wight and presenting it to the Lannisters as proof that the Army of the Dead are real, '
'Jon pledges himself and his army to Daenerys Targaryen.',
'connection':['Robb Stark', 'Sansa Stark', 'Arya Stark', 'Bran Stark']},
{"category":[{"id":4,"name":"Some house"}],
"name": "Some name",
"description": "some desc",
"connection":["connection 1"]}])
result = df.groupby("name").sum()
result["category"] = [[item.get("name") for item in i] for i in result["category"]]
result.reset_index(inplace=True)
print (result)
#
name category description connection
0 Jon Snow [House Targaryen, House Stark, Nights Watch] Jon Snow, born Aegon Targaryen, is the son of ... [Rhaena Targaryen, Aegon Targaryen, Robb Stark...
1 Some name [Some house] some desc [connection 1]
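If relying on sum to concatenate object columns feels too implicit, a per-column aggregation along the lines of the aggregate idea from the question is another option. This is only a sketch: the helper names category_names and flatten are made up for illustration, the descriptions are shortened, and joining descriptions with a space is an assumption about the desired output.

```python
import pandas as pd

df = pd.DataFrame([
    {"name": "Jon Snow",
     "category": [{"id": 1, "name": "House Targaryen"}],
     "description": "Jon Snow, born Aegon Targaryen.",
     "connection": ["Rhaena Targaryen", "Aegon Targaryen"]},
    {"name": "Jon Snow",
     "category": [{"id": 2, "name": "House Stark"}, {"id": 3, "name": "Nights Watch"}],
     "description": "Jon pledges himself to Daenerys Targaryen.",
     "connection": ["Robb Stark", "Sansa Stark"]},
    {"name": "Some name",
     "category": [{"id": 4, "name": "Some house"}],
     "description": "some desc",
     "connection": ["connection 1"]},
])


def category_names(cells):
    # Each cell is a list of dicts; keep only the 'name' of every dict
    return [item["name"] for cell in cells for item in cell]


def flatten(cells):
    # Each cell is a list; chain all lists of the group into one
    return [value for cell in cells for value in cell]


def join_text(cells):
    # Join the group's descriptions with a single space
    return " ".join(cells)


merged = df.groupby("name", as_index=False).agg(
    {"category": category_names, "description": join_text, "connection": flatten}
)
print(merged)
```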
How do I merge a specific value from one array of dicts into another array of dicts if a single specific value matches between them?
I have an array of dicts that represent books
books = [{'writer_id': '123-456-789', 'index': None, 'title': 'Yellow Snow'}, {'writer_id': '888-888-777', 'index': None, 'title': 'Python for Dummies'}, {'writer_id': '999-121-223', 'index': 'Foo', 'title': 'Something Else'}]
and I have an array of dicts that represents authors
authors = [{'roles': ['author'], 'profile_picture': None, 'author_id': '123-456-789', 'name': 'Pat'}, {'roles': ['author'], 'profile_picture': None, 'author_id': '999-121-223', 'name': 'May'}]
I want to take the name from authors and add it to the dict in books where the books writer_id matches the authors author_id.
My end result would ideally change the book array of dicts to be (notice the first dict now has the value of 'name': 'Pat' and the second book has 'name': 'May'):
books = [{'writer_id': '123-456-789', 'index': None, 'title': 'Yellow Snow', 'name': 'Pat'}, {'writer_id': '888-888-777', 'index': None, 'title': 'Python for Dummies'}, {'writer_id': '999-121-223', 'index': 'Foo', 'title': 'Something Else', 'name': 'May'}]
My current solution is:
for book in books:
    for author in authors:
        if book['writer_id'] == author['author_id']:
            book['author_name'] = author['name']
And this works. However, the nested statements bother me and feel unwieldy. I also have a number of other such structures so I end up with a function that has a bunch of code resembling this in it:
for book in books:
    for author in authors:
        if book['writer_id'] == author['author_id']:
            book['author_name'] = author['name']
books_with_foo = []
for book in books:
    for thing in things:
        if something:
            # do something
for blah in books_with_foo:
    for book_foo in otherthing:
        if blah['bar'] == stuff['baz']:
            # etc, etc.
Alternatively, how would you aggregate data from multiple database tables into one thing... some of the data comes back as dicts, some as arrays of dicts?
Pandas is almost definitely going to help you here. Convert your dicts to DataFrames for easier manipulation, then merge them:
import pandas as pd
authors = [{'roles': ['author'], 'profile_picture': None, 'author_id': '123-456-789', 'name': 'Pat'}, {'roles': ['author'], 'profile_picture': None, 'author_id': '999-121-223', 'name': 'May'}]
books = [{'writer_id': '123-456-789', 'index': None, 'title': 'Yellow Snow'}, {'writer_id': '888-888-777', 'index': None, 'title': 'Python for Dummies'}, {'writer_id': '999-121-223', 'index': 'Foo', 'title': 'Something Else'}]
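# Build one DataFrame per list of dicts
df1 = pd.DataFrame(books)
df2 = pd.DataFrame(authors)
# Index both frames by author ID so that concat can align the rows
# (this relies on the IDs being unique within each frame)
df1['author_id'] = df1.writer_id
df1 = df1.set_index('author_id')
df2 = df2.set_index('author_id')
# Join column-wise on the shared index; books without a matching author get NaN
result = pd.concat([df1, df2], axis=1)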
You may find this page helpful for different ways of combining (merging, concatenating, etc.) separate DataFrames.
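If pulling in pandas feels heavy for this particular case, a plain-Python lookup table also removes the nested loop. The sketch below builds a dict from author_id to name once and then makes a single pass over the books; name_by_author_id is a hypothetical name, and the 'author_name' key follows the loop in the question.

```python
books = [
    {"writer_id": "123-456-789", "index": None, "title": "Yellow Snow"},
    {"writer_id": "888-888-777", "index": None, "title": "Python for Dummies"},
    {"writer_id": "999-121-223", "index": "Foo", "title": "Something Else"},
]
authors = [
    {"roles": ["author"], "profile_picture": None, "author_id": "123-456-789", "name": "Pat"},
    {"roles": ["author"], "profile_picture": None, "author_id": "999-121-223", "name": "May"},
]

# Build the lookup once: author_id -> name
name_by_author_id = {author["author_id"]: author["name"] for author in authors}

# Single pass over the books; books without a known author are left untouched
for book in books:
    name = name_by_author_id.get(book["writer_id"])
    if name is not None:
        book["author_name"] = name

print(books)
```

The same pattern (index one side by its key, then loop over the other side) applies to the other nested loops in the question.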
I have a problem when I try to merge two dictionaries so I can use the result in a POST later. For some reason the values I get are nested, and I'm not sure how to clean that up. It would be great to get some tips on optimizing the code as well; right now it looks a bit messy.
for network in networks:
    post_dict = {e1: e2 for e1, e2 in network['extattrs'].iteritems() if e1 not in keys}
    pprint(post_dict['Stuff-Name']['value'])
    post_dict['name'] = post_dict.pop('Stuff-Name')
    post_dict['sid'] = post_dict.pop('Stuff-Id')
    dict_to_post = merge_two_dicts(post_dict, default_keys)
network:
{u'_ref': u'ref number',
u'comment': u'Name of object',
u'extattrs': {u'Network-Type': {u'value': u'Internal'},
u'Stuff-Id': {u'value': 110},
u'Stuff-Name': {u'value': u'Name of object'}},
u'network': u'Subnet-A',
u'network_view': u'default'}
default_keys:
default_keys = {'status':'Active',
'group':None,
'site':'City-A',
'role':'Production',
'description':None,
'custom_fields':None,
'tenant':None}
post_dict:
{'name': {u'value': u'Name of object'},
'sid': {u'value': 110}}
So what I want to achieve is to get rid of the nested keys (within the keys "name" and "sid"), so that the key and value pairs are "name: Name of object" and "sid: 110".
The post function is not yet defined.
As I understand it, your case is quite specific, and I would probably go for a quick and dirty solution. First of all, have you tried this:
post_dict['name'] = (post_dict.pop('Stuff-Name'))['value']
Secondly, how about collapsing the indexing in the dict comprehension where you already filter the keys (taking e2['value'] instead of e2) and doing the renaming there too? This is not advisable, but if you are after a lazy workaround it will suffice. I recommend you go with my first suggestion, as I'm pretty confident that it will solve your issue.
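A rough sketch of that second idea, written for Python 3 (items() instead of iteritems()). The contents of keys are not shown in the question, so the set used here is an assumption, as are the rename mapping and the trimmed default_keys; the plain dict merge stands in for merge_two_dicts, whose definition is also not shown.

```python
network = {
    "comment": "Name of object",
    "extattrs": {
        "Network-Type": {"value": "Internal"},
        "Stuff-Id": {"value": 110},
        "Stuff-Name": {"value": "Name of object"},
    },
    "network": "Subnet-A",
}
default_keys = {"status": "Active", "site": "City-A", "role": "Production"}

keys = {"Network-Type"}                          # assumed: attributes to drop
rename = {"Stuff-Name": "name", "Stuff-Id": "sid"}  # assumed: old key -> new key

# Filter, rename and unwrap the nested {'value': ...} dicts in one pass
post_dict = {
    rename.get(attr, attr): wrapped["value"]
    for attr, wrapped in network["extattrs"].items()
    if attr not in keys
}

# Stand-in for merge_two_dicts(post_dict, default_keys)
dict_to_post = {**post_dict, **default_keys}
print(dict_to_post)
```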
To pull the single value out of each nested dictionary, you could use this:
d = {'custom_fields': None, 'description': None, 'group': None, 'name':
{'value': 'Name of object'}, 'role': 'Production', 'site': 'City-A',
'status': 'Active', 'tenant': None, 'sid': {'value': 110}}
for key in d:
    # Replace each nested dict with the value of its (only) item
    if isinstance(d[key], dict):
        d[key] = d[key].popitem()[1]
It returns
{'custom_fields': None, 'description': None, 'group': None,
'name': 'Name of object', 'role': 'Production', 'site': 'City-A',
'status': 'Active', 'tenant': None, 'sid': 110}
I think it's this step that's causing the dictionaries to be nested in the first place:
post_dict['name'] = post_dict.pop('Stuff-Name')
post_dict['sid'] = post_dict.pop('Stuff-Id')
You could try popitem()[1] here, for example post_dict.pop('Stuff-Name').popitem()[1], if you'll only ever need the value of that dictionary and not the key.
This is a response I get from my wit.ai app using a Python client.
All I want to do is extract:
The intent value field.
The entity type.
The value field of the entity.
{'msg_id': '0KqBWZaeY9qKeVvdv3n', '_text': 'what is the temperature', 'entities': {'on_off': [{'confidence': 0.98730879525862, 'value': 'on'}], 'intent': [{'confidence': 0.99846661176623, 'value': 'get_temperature'}]}}
Please note that the message can be different each time. Hard-coding locations in the dictionary might not be a great idea.
{'msg_id': '0GN7pJRwYincs2p7xCo', '_text': 'turn light 1 off', 'entities': {'number': [{'confidence': 1, 'value': 1, 'type': 'value'}], 'on_off': [{'confidence': 0.96433768880251, 'value': 'off'}], 'intent': [{'confidence': 0.99552821331643, 'value': 'lights'}]}}
Assuming you have this output stored in a variable, like so:
dictionary = {'msg_id': '0KqBWZaeY9qKeVvdv3n', '_text': 'what is the temperature', 'entities': {'on_off': [{'confidence': 0.98730879525862, 'value': 'on'}], 'intent': [{'confidence': 0.99846661176623, 'value': 'get_temperature'}]}}
Intent value would be here:
intentValue = dictionary['entities']['intent'][0]['value']
Entity value would be here:
entityValue = dictionary['entities']['on_off'][0]['value']
I don't understand what you mean by entity type.
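Because the set of entities changes from message to message, one option is to loop over whatever keys appear under 'entities' instead of hard-coding 'on_off'. The sketch below is a guess at what is wanted: it treats the key name (for example 'number' or 'on_off') as the entity name and also reports the optional 'type' field when present. The function name summarize is made up for illustration.

```python
def summarize(response):
    """Return (intent, entities) from a wit.ai-style response dict."""
    intent = None
    entities = []
    for name, matches in response.get("entities", {}).items():
        for match in matches:
            if name == "intent":
                # Keep the first intent value encountered
                if intent is None:
                    intent = match.get("value")
            else:
                entities.append({
                    "entity": name,
                    "value": match.get("value"),
                    "type": match.get("type"),  # None when the field is absent
                })
    return intent, entities


message = {
    "msg_id": "0GN7pJRwYincs2p7xCo",
    "_text": "turn light 1 off",
    "entities": {
        "number": [{"confidence": 1, "value": 1, "type": "value"}],
        "on_off": [{"confidence": 0.96433768880251, "value": "off"}],
        "intent": [{"confidence": 0.99552821331643, "value": "lights"}],
    },
}

intent, entities = summarize(message)
print(intent)
for entity in entities:
    print(entity)
```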