List Comprehension returns empty list - python

I'm trying to query a MongoDB database and throw the two sets of results ('_id' and 'Team') into two separate lists.
import pymongo
client = pymongo.MongoClient('localhost:27017')
db = client['db_name']
query = {'Team': {'$exists': 1}}
projection = {'_id': 1, 'Team': 1}
data = db['collection_name'].find(query, projection) # line 9
id_list = [value for dict in data for key, value in dict.iteritems() if key == '_id']
teams_list = [value for dict in data for key, value in dict.iteritems() if key == 'Team']
print id_list
print teams_list
client.close()
For the code above, the 'id_list' is as expected but 'teams_list' is empty. When I put 'teams_list' before 'id_list' I get the expected 'teams_list' output and 'id_list' is empty. And when I repeat the data call (line 9) in between the two list comprehensions I get the expected output for both lists.
Any idea why this is happening?

You need to define your data as:
data = list(db['collection_name'].find(query, projection))
find() returns a Cursor, which behaves like a generator: once you have iterated over its values, they are gone. Wrapping the call in list() stores the returned items in a list, which can be iterated as many times as needed.
Instead of iterating over the list twice, a better way is to do it in a single loop:
id_list, teams_list = [], []
# `dict` is a built-in type; you should not use it as a variable name
for d in data:
    for key, value in d.iteritems():
        if key == '_id':
            id_list.append(value)
        elif key == 'Team':
            teams_list.append(value)
See the Python wiki page on generators for more information about generators.
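The one-pass behaviour can be reproduced without a database. The sketch below is written for Python 3 and uses a plain generator, fake_find, as a hypothetical stand-in for collection.find() (it is not part of pymongo, and the sample documents are made up). It only illustrates that a second pass yields nothing unless the results are stored in a list first.

```python
def fake_find():
    """Hypothetical stand-in for a cursor: yields each document once."""
    yield {'_id': 1, 'Team': 'Red'}
    yield {'_id': 2, 'Team': 'Blue'}

# Iterating the one-shot object twice: the second pass finds nothing left.
data = fake_find()
first_pass = [doc['_id'] for doc in data]
second_pass = [doc['Team'] for doc in data]

# Materialising into a list first allows any number of passes.
stored = list(fake_find())
id_list = [doc['_id'] for doc in stored]
teams_list = [doc['Team'] for doc in stored]

print(first_pass, second_pass)  # [1, 2] []
print(id_list, teams_list)      # [1, 2] ['Red', 'Blue']
```

Since each document is already a dict, indexing by key also avoids looping over every key/value pair.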

As already mentioned the culprit here is the find() method which returns a Cursor object which is consumed when you iterate it the first time.
But you are using the wrong method for the job. You need to use the .aggregate() method.
query = {'Team': {'$exists': 1}}
cursor = db['collection_name'].aggregate([
    {'$match': query},
    {'$group': {
        '_id': None,
        'id_list': {'$push': '$_id'},
        'teams_list': {'$push': '$Team'}
    }}
])
The .aggregate() method, like its partner in crime .find(), returns a CommandCursor over the result set, which is a generator-like object.
Because we are grouping by None, iterating the cursor will yield a single document which means that you can safely do:
print list(cursor)[0] # return a dictionary
or
result = list(cursor)[0]
print result['id_list']
print result['teams_list']
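To make the shape of that single grouped document concrete, here is a pure-Python imitation of the grouping (Python 3, no MongoDB involved). The helper name group_all and the sample documents are made up for illustration; with the real pipeline the server does this work.

```python
def group_all(docs):
    """Imitate a '$group' stage with '_id': None and two '$push' accumulators."""
    result = {'_id': None, 'id_list': [], 'teams_list': []}
    for doc in docs:
        result['id_list'].append(doc['_id'])
        result['teams_list'].append(doc['Team'])
    return result

# Made-up documents standing in for the matched collection contents.
sample_docs = [{'_id': 1, 'Team': 'Red'}, {'_id': 2, 'Team': 'Blue'}]

result = group_all(sample_docs)
print(result['id_list'])     # [1, 2]
print(result['teams_list'])  # ['Red', 'Blue']
```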

Related

Update Python dictionary while looping

I was trying to iterate over a list of values, craft a dictionary to save each value in a structured way, and then append the dictionary to a new list of results, but I found an unexpected behavior.
Below is an example:
values_list = [1,2,3]
# Basic dict
result_dict = {
    'my_value': ''
}
# Iterate, craft a dictionary, and append result
dicts_list = []
for value in values_list:
    result_dict.update({'my_value': value})
    dicts_list.append(result_dict)
print(dicts_list)
As you can see, first I create a basic dictionary, then I'm iterating over the list of values and updating the dictionary, at the end I'm appending the crafted dictionary to a separate list of results (dicts_list).
As a result I was expecting:
[{'my_value': 1}, {'my_value': 2}, {'my_value': 3}]
but instead I was getting:
[{'my_value': 3}, {'my_value': 3}, {'my_value': 3}]
It looks like every iteration is not only updating the basic dictionary – which is expected – but also the dictionaries already appended to the list of results on the previous iteration.
To fix the issue, I nested the basic dictionary under the for loop:
values_list = [1,2,3]
# Iterate, craft a dictionary, and append result
dicts_list = []
for value in values_list:
    result_dict = {'my_value': ''}
    result_dict.update({'my_value': value})
    dicts_list.append(result_dict)
print(dicts_list)
Can anyone explain what is wrong with the first approach? How is the loop causing the list of appended dictionaries to be updated?
Thanks for any advice! :)
Franz
As explained in the comment, you're appending the same dictionary in each iteration because update() modifies the result_dict rather than returning a copy. So, the only change you need to do is to append a copy of the crafted dictionary. For example:
values_list = [1,2,3]
# Basic dict
result_dict = {
    'my_value': ''
}
# Iterate, craft a dictionary, and append result
dicts_list = []
for value in values_list:
    result_dict.update({'my_value': value})
    dicts_list.append(dict(result_dict)) # <--- this is the only change
print(dicts_list)
To understand how the loop causes the already-appended dictionaries to be updated, you can run the code from your question in Python Tutor (Visualize code in Python) and watch the effect of executing it line by line.
I also suggest reading Facts and myths about Python names and values.
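The aliasing can also be checked directly with the `is` operator. This is a small sketch reusing the names from the question: the first loop appends the same object three times, and the second stores an independent copy on each iteration.

```python
result_dict = {'my_value': ''}
dicts_list = []
for value in [1, 2, 3]:
    result_dict.update({'my_value': value})
    dicts_list.append(result_dict)       # appends a reference, not a copy

# Every element of the list is the very same object as result_dict.
all_same = all(item is result_dict for item in dicts_list)

copies = []
for value in [1, 2, 3]:
    result_dict.update({'my_value': value})
    copies.append(result_dict.copy())    # independent snapshot each time

print(all_same)  # True
print(copies)    # [{'my_value': 1}, {'my_value': 2}, {'my_value': 3}]
```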

python updating dictionary with list as the value type

I'm trying to iterate through data extracted from a file and store them in a dictionary based on each data's id
These are the id (str) for the data : "sensor", "version", "frame", "function"
And the data are in hexadecimal string.
What I basically start with is a huge list of tuples in the form (id, data) that I extracted from a file:
example_list = [("sensor", 245), ("frame", 455), ("frame", 77)] and so on
This example_list stores all the data, so it has information of data for all the id.
I want to make a dictionary with id as key and list of data as value so when done iterating through the example_list, I have list of values for specific id (so I can iterate through the value list to get all the data for a specific id (the key))
To start, all values (list) will start with an empty list
my_dict = {"sensor": [], "frame": [], "version": [], "function": []}
Then, as I iterate through example_list, if the id is in my_dict as a key, I append the value to the values list in my_dict
for itm in example_list:
    if itm[0] in my_dict:
        tmp = my_dict[itm[0]] # since itm[0] is the id
        tmp.append(itm[1])
        my_dict[itm[0]] = tmp # update the list
When I tried this, it seems like the final value list in my_dict only holds the latest data.
What I mean by this is if
example_list = [("sensor", 245), ("frame", 455), ("frame", 77)]
then
my_dict = {"sensor": [245], "frame": [77], "version": [], "function": []}
I may be wrong about this interpretation (since the data I'm reading is really big), but when I printed my_dict in the end of function, each value list had only one data inside, which is far off from what I expected (list of data instead of just one)
I tried searching, and people used the update function to update the dictionary, but that didn't seem to work either and gave me some kind of unhashable error/warning.
Any way to implement what I want to do?
Try doing it like so:
for itm in example_list:
    if itm[0] in my_dict:
        my_dict[itm[0]].append(itm[1])
Your code is working as required. To simplify, as you've already instantiated the dict with empty lists:
for i,j in example_list:
    my_dict[i].append(j)
print(my_dict)
Output:
{'sensor': [245], 'frame': [455, 77], 'version': [], 'function': []}
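What you want to do is:
for itm in example_list:
    if itm[0] in my_dict:  # check that the id is a known key
        my_dict[itm[0]].append(itm[1])  # append directly to the list stored under that key
Note that tmp = my_dict[itm[0]] does not create a new list; tmp is just another name for the list already stored in the dictionary, so your loop appends to it correctly and no old data is deleted. Indexing the dictionary directly simply removes the temporary variable and the redundant reassignment.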
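Whether the temporary variable copies the list can be checked directly. The sketch below reuses the question's sample data and asserts inside the loop that tmp and the dictionary entry are the same object, so appending through either name changes the list stored in my_dict.

```python
example_list = [("sensor", 245), ("frame", 455), ("frame", 77)]
my_dict = {"sensor": [], "frame": [], "version": [], "function": []}

for key, value in example_list:
    if key in my_dict:
        tmp = my_dict[key]
        assert tmp is my_dict[key]  # same list object, not a copy
        tmp.append(value)           # so this mutates the list held by my_dict

print(my_dict)
# {'sensor': [245], 'frame': [455, 77], 'version': [], 'function': []}
```

If the real data still ends up with one item per list, the cause is more likely in how the tuples are read from the file than in this loop.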

Trying to nest a dictionary from a previous dictionary

So I have the following scenario:
dictionary = [
    {'category1': 'clothes', 'category2': 'cheap', 'category3': 10},
    {'category1': 'clothes', 'category2': 'normal', 'category3': 20}]
I need a dictionary that goes {clothes:{cheap:10, normal:20}}
All I have figured out is something that prints them individually
for i in range(len(dictionary)):
    print({dictionary[i]['category1']: {dictionary[i]['category2']: dictionary[i]['category3']}})
But it prints them individually, and I can't figure out how to nest them together since this just gives me two dictionaries with the format I want, but the nested dictionary just has either the values from the first list or the second. I have also tried
[{item['category1']: {'Attribute': attr_key, 'Value': item[attr_key]}}
 for item in dictionary for attr_key in item if attr_key != 'category1']
It is the same, it gives more lines whereas I just need one dictionary with cat1 and the other ones nested in its dictionary.
raw = {}
for item in dictionary:
    value1 = item.get('category2')
    value2 = item.get('category3')
    raw.update({value1:value2})
data = {}
data[dictionary[0].get('category1')] = raw
Output:
{'clothes': {'cheap': 10, 'normal': 20}}
This should do it.
import collections
dictionary=[
    {'category1':'clothes', 'category2':'cheap', 'category3':10},
    {'category1':'clothes', 'category2':'normal', 'category3':20}
]
newdict = collections.defaultdict(dict)
for item in dictionary:
    newdict[item['category1']].update({item['category2']: item['category3']})
print(newdict)
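One reason to prefer the defaultdict approach is that it keeps working when the list contains more than one top-level category. The sketch below adds a made-up 'shoes' row, which is not in the question, to show each category1 value getting its own nested dictionary.

```python
import collections

items = [
    {'category1': 'clothes', 'category2': 'cheap', 'category3': 10},
    {'category1': 'clothes', 'category2': 'normal', 'category3': 20},
    {'category1': 'shoes', 'category2': 'cheap', 'category3': 15},  # made-up extra row
]

nested = collections.defaultdict(dict)
for item in items:
    # A missing category1 key gets a fresh empty dict automatically.
    nested[item['category1']][item['category2']] = item['category3']

nested = dict(nested)  # convert to a plain dict for cleaner printing
print(nested)
# {'clothes': {'cheap': 10, 'normal': 20}, 'shoes': {'cheap': 15}}
```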

Convert a list-of-dictionaries to a dictionary

I have this list of dictionaries I want to convert to one dictionary
vpcs = [{'VPCRegion': 'us-east-1', 'VPCId': '12ededd4'},
{'VPCRegion': 'us-east-1', 'VPCId': '9847'},
{'VPCRegion': 'us-west-2', 'VPCId': '99485003'}]
I want to convert it to
{'us-east-1': '12ededd4', 'us-east-1': '9847', 'us-west-2': '99485003'}
I used this function
def convert_dict(tags):
    return {tag['VPCRegion']: tag['VPCId'] for tag in tags}
but I get this output, which is missing the first dictionary in the list:
{'us-east-1': '9847', 'us-west-2': '99485003'}
Perhaps a list of dictionaries may fit your need (see the code below):
[{'us-east-1': '12ededd4'}, {'us-east-1': '9847'}, {'us-west-2': '99485003'}]
To elaborate on what others commented: dictionary keys must be unique. In the code below, dict1 ends up with a single 'us-east-1' entry because the second duplicate overwrites the first, while list_dict keeps every region/id pair as its own one-item dictionary.
vpcs = [{'VPCRegion': 'us-east-1', 'VPCId': '12ededd4'},
        {'VPCRegion': 'us-east-1', 'VPCId': '9847'},
        {'VPCRegion': 'us-west-2', 'VPCId': '99485003'}]
def changekey(listofdict):
    new_dict = {}
    new_list = []
    for member in listofdict:
        new_key = member['VPCRegion']
        new_val = member['VPCId']
        new_dict.update({new_key:new_val})
        new_list.append({new_key:new_val})
    return new_dict, new_list
dict1,list_dict=changekey(vpcs)
print(dict1)
print(list_dict)
#dict4=dict(zip(*[iter(list_dict)]*2))
#print(dict4)
Since your output must group several values under the same name, your output will be a dict of lists, not a dict of strings.
One way to quickly do it:
import collections
def group_by_region(vpcs):
    result = collections.defaultdict(list)
    for vpc in vpcs:
        result[vpc['VPCRegion']].append(vpc['VPCId'])
    return result
The result of group_by_region(vpcs) will be a defaultdict equal to {'us-east-1': ['12ededd4', '9847'], 'us-west-2': ['99485003']}.
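As an entertainment, here's a cryptic but compact way to get this in one expression. itertools.groupby only groups adjacent records, so the list is sorted by region first:
import itertools
{key: [rec['VPCId'] for rec in group]
 for (key, group) in itertools.groupby(sorted(vpcs, key=lambda vpc: vpc['VPCRegion']),
                                       key=lambda vpc: vpc['VPCRegion'])}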
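A caveat worth seeing in action: itertools.groupby starts a new group every time the key changes, so records with the same region that are not adjacent end up in separate groups, and the later group overwrites the earlier one in the resulting dict. The sketch below uses made-up, deliberately unordered records and a helper named region (both are illustrative assumptions) to compare the unsorted and sorted cases.

```python
import itertools

def region(vpc):
    """Key function: the region a record belongs to."""
    return vpc['VPCRegion']

# Made-up sample where the two 'us-east-1' records are not adjacent.
unsorted_vpcs = [{'VPCRegion': 'us-east-1', 'VPCId': 'a'},
                 {'VPCRegion': 'us-west-2', 'VPCId': 'b'},
                 {'VPCRegion': 'us-east-1', 'VPCId': 'c'}]

without_sort = {key: [rec['VPCId'] for rec in group]
                for key, group in itertools.groupby(unsorted_vpcs, region)}
with_sort = {key: [rec['VPCId'] for rec in group]
             for key, group in itertools.groupby(sorted(unsorted_vpcs, key=region), region)}

print(without_sort)  # {'us-east-1': ['c'], 'us-west-2': ['b']}
print(with_sort)     # {'us-east-1': ['a', 'c'], 'us-west-2': ['b']}
```

The defaultdict(list) version does not have this ordering requirement, which is one reason it is usually the simpler choice.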

Update dictionary if in list

I'm running through an excel file reading line by line to create dictionaries and append them to a list, so I have a list like:
myList = []
and a dictionary in this format:
dictionary = {'name': 'John', 'code': 'code1', 'date': [123,456]}
so I do this: myList.append(dictionary), so far so good. Now I'll go into the next line where I have a pretty similar dictionary:
dictionary_two = {'name': 'John', 'code': 'code1', 'date': [789]}
I'd like to check if I already have a dictionary with 'name' = 'John' in myList so I check it with this function:
def checkGuy(dude_name):
    return any(d['name'] == dude_name for d in myList)
Currently I'm writing this function to add the guys to the list:
def addGuy(row_info):
    if not checkGuy(row_info[1]):
        myList.append({'name':row_info[1],'code':row_info[0],'date':[row_info[2]]})
    else:
        pass  # HELP HERE
in this else I'd like to dict.update(updated_dict) but I don't know how to get the dictionary here.
Could someone help so dictionary appends the values of dictionary_two?
I would modify checkGuy to something like:
def findGuy(dude_name):
    for d in myList:
        if d['name'] == dude_name:
            return d
    return None  # reached only when no dictionary matched
And then do:
def addGuy(row_info):
    guy = findGuy(row_info[1])
    if guy is None:
        myList.append({'name':row_info[1],'code':row_info[0],'date':[row_info[2]]})
    else:
        guy.update(updated_dict)  # updated_dict: the new data for this row, as named in the question
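If the goal is for the existing record to accumulate dates rather than have them replaced, the else branch can append to the stored list instead of calling update. The following is a self-contained Python sketch under that assumption. It uses snake_case names (my_list, find_guy, add_guy) and assumes each row is a (code, name, date) tuple, as the indexing in the question suggests; the 'Ann' row is made up.

```python
my_list = [{'name': 'John', 'code': 'code1', 'date': [123, 456]}]

def find_guy(name):
    """Return the stored dict for name, or None if there is none."""
    for d in my_list:
        if d['name'] == name:
            return d
    return None

def add_guy(row_info):
    # Assumed row layout: (code, name, date)
    code, name, date = row_info
    guy = find_guy(name)
    if guy is None:
        my_list.append({'name': name, 'code': code, 'date': [date]})
    else:
        guy['date'].append(date)  # extend the existing list instead of replacing it

add_guy(('code1', 'John', 789))
add_guy(('code2', 'Ann', 111))
print(my_list)
# [{'name': 'John', 'code': 'code1', 'date': [123, 456, 789]},
#  {'name': 'Ann', 'code': 'code2', 'date': [111]}]
```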
This answer is based on the comments, where it was suggested that if "name" is the only criterion to search on, it could be used as the key of a dictionary instead of using a list.
master = {"John" : {'code': 'code1', 'date': [123,456]}}
def addGuy(row_info):
    key = row_info[1]
    code = row_info[0]
    date = row_info[2]
    if master.get(key):
        master.get(key).update({"code": code, "date": date})
    else:
        master[key] = {"code": code, "date": date}
If you dict.update the existing data each time you see a repeated name, your code can be reduced to a dict of dicts right where you read the file. Calling update on existing dicts with the same keys is going to overwrite the values leaving you with the last occurrence so even if you had multiple "John" dicts they would all contain the exact same data by the end.
def read_file():
    results = {name: {"code": code, "date": date}
               for code, name, date in how_you_read_into_rows}
    return results
If you actually think that the values get appended somehow, you are wrong. If you wanted to do that you would need a very different approach. If you actually want to gather the dates and codes per user, then use a defaultdict, appending the [code, date] pair to a list with the name as the key:
from collections import defaultdict
d = defaultdict(list)
def read_file():
    for code, name, date in how_you_read_into_rows:
        d[name].append([code, date])  # key on the name variable, not the literal string "name"
Or some variation depending on what you want the final output to look like.
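For a concrete picture of the defaultdict variant, here is a runnable sketch with made-up rows standing in for whatever is read from the spreadsheet. The (code, name, date) ordering and the name dates_by_name are assumptions for illustration.

```python
from collections import defaultdict

# Made-up rows in (code, name, date) order, standing in for the file contents.
rows = [('code1', 'John', 123), ('code1', 'John', 456), ('code2', 'Ann', 789)]

dates_by_name = defaultdict(list)
for code, name, date in rows:
    # The first time a name is seen, defaultdict creates its empty list.
    dates_by_name[name].append([code, date])

print(dict(dates_by_name))
# {'John': [['code1', 123], ['code1', 456]], 'Ann': [['code2', 789]]}
```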
