Creating a function to initialise multiple dictionaries with defaultdict(list)? - python

Just a general question....
I need to initialise at least 50 different dictionaries, each of which is then passed as an argument to a function I wrote (make_somefunction) that returns a dictionary with customised keys and lists as values.
Is there a way to initialise a dictionary and pass it directly to the function?
Something like this?
from collections import defaultdict
def initialise(dict):
    dict = defaultdict(list)
    return (dict)
initialise(dict).make_somefunction(dict, custom_keys, custom_listofvalues)
Instead of
dict1 = defaultdict(list)
dict2 = defaultdict(list)
dict3 = defaultdict(list)
...
dict49 = defaultdict(list)
dict50 = defaultdict(list)
each of which would then be passed as an argument to create a different customised dictionary:
make_somefunction(dict1, animals, foods)
make_somefunction(dict2, patients, drugs)
...
make_somefunction(dict50, fridge, fruits)

You can create the defaultdicts with a list comprehension and pass the resulting list:
make_somefunction([defaultdict(list) for _ in range(50)])
By the way, read the Python docs to get a better understanding of Python.
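Another way to avoid fifty numbered variables is to create each defaultdict inside a loop and hand it straight to the function. The sketch below assumes make_somefunction fills the dictionary it receives from a list of keys and a list of values; its body and the sample data are hypothetical stand-ins for the asker's own code.

```python
from collections import defaultdict

def make_somefunction(target, custom_keys, custom_values):
    # Hypothetical stand-in: append each value to the list stored under its key.
    for key, value in zip(custom_keys, custom_values):
        target[key].append(value)
    return target

# Hypothetical sample data: one (keys, values) pair per dictionary to build.
inputs = [
    (["cat", "dog"], ["fish", "bone"]),
    (["alice", "bob"], ["aspirin", "ibuprofen"]),
]

# A fresh defaultdict(list) is created for every call, so no numbered
# variables are needed and no two results share the same dictionary.
results = [make_somefunction(defaultdict(list), keys, values)
           for keys, values in inputs]

print(dict(results[0]))
```

Keeping the results in a list (or in a dict keyed by a meaningful name) makes them easier to iterate over than dict1 ... dict50.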

Related

Python Nested Dictionary append - Solved with defaultdict(lambda: defaultdict(set))

I've done this millions of times in other languages, but the Python method for this is escaping me.
Basically reading some data from database. Includes data like ID, Fruits, Colors, Name
I need to create an object / dictionary key on Name. Name then holds lists / dictionaries of Fruits and Colors.
{"greenhouse1": {"fruits": {"apples", "oranges"},
                 "colors": {"red", "orange"}}}
I'm iterating through the db by row and the Name key will appear more than once. When it does, fruits or colors may already be populated and I need to append to them.
namedict = {}
db_rows = execute_query(cursor, args.store_proc)
for row in db_rows:
    db_name = row.id
    db_fruit = row.fruit
    db_color = row.color
    if db_name in namedict:
        namedict[db_name]["fruits"].append(db_fruit)
        namedict[db_name]["color"].append(db_color)
    else:
        namedict[db_name]["fruits"] = [db_fruit]
        namedict[db_name]["color"] = [db_color]
collections.defaultdict is your friend: If you access a new key of a defaultdict, it is automatically initialised with the provided function (list or set in this case). Because you want a dictionary (e.g. "greenhouse1") of dictionaries ("fruits", "colors") with lists as values (separate fruits and colors), we need a nested defaultdict. The following should work:
from collections import defaultdict
db = defaultdict(lambda: defaultdict(list)) # alternative: set instead of list
db['greenhouse1']['fruits'].append('apples') # use `add` for sets
db['greenhouse1']['fruits'].append('oranges')
db['greenhouse1']['colors'] = ["red", "orange"]
db['greenhouse2']['fruits'].append('banana')
print(db)
# defaultdict(<function __main__.<lambda>()>,
# {'greenhouse1': defaultdict(list,
# {'fruits': ['apples', 'oranges'],
# 'colors': ['red', 'orange']}),
# 'greenhouse2': defaultdict(list, {'fruits': ['banana']})})
A defaultdict works like a regular dict, so don't get confused with the strange looking output. E.g. to access the fruits of greenhouse1 you can simply write db['greenhouse1']['fruits'] and you get back a list (or set).
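If the defaultdict repr gets in the way when printing or serialising, one option is to convert the structure to plain dicts once it has been filled. This is a minimal sketch that reuses the db example from above with shortened data:

```python
from collections import defaultdict

db = defaultdict(lambda: defaultdict(list))
db['greenhouse1']['fruits'].append('apples')
db['greenhouse1']['fruits'].append('oranges')
db['greenhouse2']['fruits'].append('banana')

# Convert the outer and the inner defaultdicts to plain dicts for display.
plain = {name: dict(inner) for name, inner in db.items()}
print(plain)
# {'greenhouse1': {'fruits': ['apples', 'oranges']}, 'greenhouse2': {'fruits': ['banana']}}
```

Keep in mind that the plain dicts raise KeyError on missing keys again, so do the conversion only after all the data has been collected.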
First, you need quotes around fruits and color in the dictionary keys.
Second, you can't create a nested dictionary element until you create the dictionary.
You can simplify the code by using collections.defaultdict. Since you want nested dictionaries, you need nested defaultdicts.
from collections import defaultdict
namedict = defaultdict(lambda: defaultdict(set))
for row in db_rows:
    namedict[row.id]['fruits'].add(row.fruit)
    namedict[row.id]['colors'].add(row.color)

accelerate comparing dictionary keys and values to strings in list in python

Sorry if this is trivial, I'm still learning, but I have a list of dictionaries that looks as follows:
[{'1102': ['00576', '00577', '00578', '00579', '00580', '00581']},
{'1102': ['00582', '00583', '00584', '00585', '00586', '00587']},
{'1102': ['00588', '00589', '00590', '00591', '00592', '00593']},
{'1102': ['00594', '00595', '00596', '00597', '00598', '00599']},
{'1102': ['00600', '00601', '00602', '00603', '00604', '00605']}
...]
It contains ~89000 dictionaries. I also have a list containing 4473208 paths, for example:
['/****/**/******_1102/00575***...**0CT.csv',
'/****/**/******_1102/00575***...**1CT.csv',
'/****/**/******_1102/00575***...**2CT.csv',
'/****/**/******_1102/00575***...**3CT.csv',
'/****/**/******_1102/00575***...**4CT.csv',
'/****/**/******_1102/00578***...**1CT.csv',
'/****/**/******_1102/00578***...**2CT.csv',
'/****/**/******_1102/00578***...**3CT.csv',
...]
What I want to do, for each dictionary, is collect the paths whose folder name contains the dictionary's key and whose file name contains one of the values grouped under that key.
I tried using for loops like this:
grpd_cts = []
for elem in tqdm(dict_list):
    temp1 = []
    for file in ct_paths:
        for key, val in elem.items():
            if (file[16:20] == key) and (any(x in file[21:26] for x in val)):
                temp1.append(file)
    grpd_cts.append(temp1)
But this takes around 30 hours. Is there a way to make it more efficient? Any itertools function or something?
Thanks a lot!
ct_paths is iterated repeatedly in your inner loop, and you're only interested in a little bit of it for testing purposes; pull that out and use it to index the rest of your data, as a dictionary.
What does make your problem complicated is that you're wanting to end up with the original list of filenames, so you need to construct a two-level dictionary where the values are lists of all originals grouped under those two keys.
ct_path_index = {}
for f in ct_paths:
    ct_path_index.setdefault(f[16:20], {}).setdefault(f[21:26], []).append(f)
grpd_cts = []
for elem in tqdm(dict_list):
    temp1 = []
    for key, val in elem.items():
        d2 = ct_path_index.get(key)
        if d2:
            for v in val:
                v2 = d2.get(v)
                if v2:
                    temp1 += v2
    grpd_cts.append(temp1)
ct_path_index looks like this, using your data:
{'1102': {'00575': ['/****/**/******_1102/00575***...**0CT.csv',
'/****/**/******_1102/00575***...**1CT.csv',
'/****/**/******_1102/00575***...**2CT.csv',
'/****/**/******_1102/00575***...**3CT.csv',
'/****/**/******_1102/00575***...**4CT.csv'],
'00578': ['/****/**/******_1102/00578***...**1CT.csv',
'/****/**/******_1102/00578***...**2CT.csv',
'/****/**/******_1102/00578***...**3CT.csv']}}
The use of setdefault (which can be a little hard to understand the first time you see it) is important when building up collections of collections, and is very common in these kinds of cases: it makes sure that the sub-collections are created on demand and then re-used for a given key.
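To make the on-demand behaviour of setdefault concrete, here is a small standalone sketch; the names index, first, second and nested are illustrative only and do not come from the question.

```python
index = {}

# First call: 'a' is missing, so the empty list is stored and returned.
first = index.setdefault('a', [])
first.append(1)

# Second call: 'a' already exists, so the stored list is returned
# and the new empty list passed as the default is simply discarded.
second = index.setdefault('a', [])
second.append(2)

print(index)            # {'a': [1, 2]}
print(first is second)  # True

# Chaining two calls builds a two-level structure, as in ct_path_index.
nested = {}
nested.setdefault('1102', {}).setdefault('00575', []).append('path_0')
nested.setdefault('1102', {}).setdefault('00575', []).append('path_1')
print(nested)           # {'1102': {'00575': ['path_0', 'path_1']}}
```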
Now ct_paths is scanned only once, to build the index. The remaining loops only walk dict_list and its short value lists, and the inner checks are done using dictionary lookups, which are close to O(1).
Other optimizations would include turning the lists in dict_list into sets, which would be worthwhile if you made more than one pass through dict_list.

Mapping Python Dict keys to another Dict

Due to some poor planning, I have a script that expects a Python dict with certain keys; however, the other script that creates this dict uses a different naming convention.
Unfortunately, due to translations that have already taken place it looks like I'll need to convert the dict keys.
Basically go from
{'oldKey':'data'}
to
{'newKey':'data'}
I was thinking of creating a dict:
{'oldKey':'newKey'}
and iterating through the dict to convert from oldKey to newKey. However, is this the most efficient/pythonic way to do it?
I can think of a couple of ways to do this which use dictionaries, but one of them might be more efficient depending on the coverage of the key usage.
a) With a dictionary comprehension:
old_dict = {'oldkey1': 'val1', 'oldkey2': 'val2',
            'oldkey3': 'val3', 'oldkey4': 'val4',
            'oldkey5': 'val5'}
key_map = {'oldkey1': 'newkey1', 'oldkey2': 'newkey2',
           'oldkey3': 'newkey3', 'oldkey4': 'newkey4',
           'oldkey5': 'newkey5'}
# Build a new dict whose keys are the mapped names (Python 3 syntax)
new_dict = {newkey: old_dict[oldkey] for (oldkey, newkey) in key_map.items()}
print(new_dict['newkey1'])
b) With a simple class that does the mapping. (Note that I have switched the order of key/value in key_map in this example.) This might be more efficient because it will use lazy evaluation - no need to iterate through all the keys - which may save time if not all the keys are used.
class DictMap(object):
    def __init__(self, key_map, old_dict):
        self.key_map = key_map
        self.old_dict = old_dict
    def __getitem__(self, key):
        # Translate the new key to the old key on each lookup
        return self.old_dict[self.key_map[key]]
key_map = {'newkey1': 'oldkey1',
           'newkey2': 'oldkey2',
           'newkey3': 'oldkey3',
           'newkey4': 'oldkey4',
           'newkey5': 'oldkey5'}
new_dict2 = DictMap(key_map, old_dict)
print(new_dict2['newkey1'])
This will solve your problem:
new_dict={key_map[oldkey]: vals for (oldkey, vals) in old_dict.items()}
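Note that the comprehension above raises KeyError if old_dict contains a key that is missing from key_map. A possible variant is sketched below, assuming that unmapped keys should be kept under their original names; the question does not say this, and the 'other' key is made up for illustration.

```python
old_dict = {'oldkey1': 'val1', 'oldkey2': 'val2', 'other': 'val3'}
key_map = {'oldkey1': 'newkey1', 'oldkey2': 'newkey2'}

# key_map.get(k, k) falls back to the original key when no mapping exists.
new_dict = {key_map.get(k, k): v for k, v in old_dict.items()}
print(new_dict)
# {'newkey1': 'val1', 'newkey2': 'val2', 'other': 'val3'}
```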

How to get keys by value in dictionary (python 2.7)

I have this dictionary that describes students' courses:
the keys are names (strings) and the values are lists of courses (strings)
students_dict={"name1":["math","computer science", "statistics"],"name2":["algebra","statistics","physics"],"name3":["statistics","math","programming"]}
I want to create a function that takes this dictionary and returns a new one:
the keys will be the courses (strings)
and the values will be lists of the names of the students who take each course (lists of strings)
course_students={"statistics":["name1","name2","name3"],"algebra":["name2"],"programming":["name3"],"computer science":["name1"],"physics":["name2"],"math":["name1","name3"]}
The order doesn't matter.
Edit: this is kind of what I'm trying to do
def swap_student_courses(students_dict):
    students_in_each_cours={}
    cours_list=[...]
    cours_names=[]
    for cours in cours_list:
        if students_dict.has_key(cours)==True:
            cours_names.append(...)
        students_in_each_cours.append(cours_names)
    return students_in_each_cours
I would use a defaultdict here for simplicity's sake, but know that you can accomplish the same with a regular dict:
from collections import defaultdict
students_dict={"name1":["math","computer science", "statistics"],
               "name2":["algebra","statistics","physics"],
               "name3":["statistics","math","programming"]}
course_students = defaultdict(list)
for name, course_list in students_dict.items():
    for course in course_list:
        course_students[course].append(name)
It can be done with a set comprehension (to first get a unique set of course names) followed by a dict comprehension (to associate course names with a list of students for whom that course appears in their respective list):
all_courses = {course for student_course_list in students_dict.values() for course in student_course_list}
course_students = {course:[student_name for student_name,student_course_list in students_dict.items() if course in student_course_list] for course in all_courses}
Your attempted approach neglected to search through each student's course list: you used students_dict.has_key(cours) forgetting that student names, not courses, are the keys of students_dict.
Here is a simple function you could use.
from collections import defaultdict
def create_new_dict(old_dict):
    new_dict = defaultdict(list)
    for student, courses in old_dict.items():
        for course in courses:
            new_dict[course].append(student)
    return new_dict
The only difference between Python's standard dict and defaultdict is what happens when you access a non-existent key: a standard dict raises KeyError, while a defaultdict calls the factory passed at creation and stores the result as the value for that key. In our case that is an empty list.
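The difference can be seen side by side in a few lines. This is only an illustrative sketch; the variable names are made up.

```python
from collections import defaultdict

plain = {}
raised = False
try:
    plain['math'].append('name1')
except KeyError:
    raised = True  # a plain dict has no entry for 'math' yet

courses = defaultdict(list)
courses['math'].append('name1')  # list() is called to create the missing value
print(raised, courses['math'])   # True ['name1']

# Note: merely reading a missing key from a defaultdict also inserts it.
courses['physics']
print('physics' in courses)      # True
```

The last two lines show a side effect worth knowing about: use `in` or `.get()` when you want to check for a key without creating it.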
Implementation without defaultdict
def create_new_dict(old_dict):
    new_dict = dict()
    for student, courses in old_dict.items():
        for course in courses:
            try:
                new_dict[course].append(student)
            except KeyError:
                new_dict[course] = [student]
    return new_dict
EDIT----
A standard dict raises KeyError because the first time we try to access some key, 'math' for example, it is not yet in the dictionary.
It is not a problem that values repeat, because in that case we simply append the new student to the list.

How to associate elements in a set with multiple dict entries

I'm extracting instances of three elements from an XML file: ComponentStr, keyID, and valueStr. Whenever I find a ComponentStr, I want to add/associate the keyID:valueStr to it. ComponentStr values are not unique. As multiple occurrences of a ComponentStr are read, I want to accumulate the keyID:valueStr for that ComponentStr group. The resulting accumulated data structure after reading the XML file might look like this:
ComponentA: key1:value1, key2:value2, key3:value3
ComponentB: key4:value4
ComponentC: key5:value5, key6:value6
After I generate the final data structure, I want to sort the keyID:valueStr entries within each ComponentStr and also sort all the ComponentStrs.
I'm trying to structure this data in Python 2. ComponentStr seem to work well as a set. The keyID:valueStr is clearly a dict. But how do I associate a ComponentStr entry in a set with its dict entries?
Alternatively, is there a better way to organize this data besides a set and associated dict entries? Each keyID is unique. Perhaps I could have one dict of keyID:some combo of ComponentStr and valueStr? After the data structure was built, I could sort it based on ComponentStr first, then perform some type of slice to group the keyID:valueStr and then sort again on the keyID? Seems complicated.
How about a dict of dicts?
data = {
    'ComponentA': {'key1':'value1', 'key2':'value2', 'key3':'value3'},
    'ComponentB': {'key4':'value4'},
    'ComponentC': {'key5':'value5', 'key6':'value6'},
}
It maintains your data structure and mapping. Interestingly enough, the underlying implementation of dicts is similar to the implementation of sets.
This would be easily constructed à la this pseudo-code:
data = {}
for file in files:
    data[get_component(file)] = {}
    for key, value in get_data(file):
        data[get_component(file)][key] = value
In the case where you have repeated components, you need the sub-dict to be created by default, but add to the existing one if it's there. I prefer setdefault to other solutions like a defaultdict or subclassing dict with a __missing__ method, as long as I only have to do it once or twice in my code:
data = {}
for file in files:
    for key, value in get_data(file):
        # The key must be hashable, so pass the component itself, not a list
        data.setdefault(get_component(file), {})[key] = value
It works like this:
>>> d = {}
>>> d.setdefault('foo', {})['bar'] = 'baz'
>>> d
{'foo': {'bar': 'baz'}}
>>> d.setdefault('foo', {})['ni'] = 'ichi'
>>> d
{'foo': {'ni': 'ichi', 'bar': 'baz'}}
Alternatively, since your comment on the other answer says you need simple code, you can keep it really simple with some more verbose and less optimized code:
data = {}
for file in files:
    for key, value in get_data(file):
        if get_component(file) not in data:
            data[get_component(file)] = {}
        data[get_component(file)][key] = value
You can then sort when you're done collecting the data.
for component in sorted(data):
    print(component)
    print('-----')
    for key in sorted(data[component]):
        print(key, data[component][key])
I want to accumulate the keyID:valueStr for that ComponentStr group
In this case you want the ComponentStr values to be the keys of your dictionary. Accumulating immediately suggests a list to me, and lists are easily sorted.
Each keyID is unique. Perhaps I could have one dict of keyID:some
combo of ComponentStr and valueStr?
You should store your data in a manner that is the most efficient when you want to retrieve it. Since you will be accessing your data by the component, even though your keys are unique there is no point in having a dictionary that is accessed by your key (since this is not how you are going to "retrieve" the data).
So, with that - how about using a defaultdict with a list, since you really want all items associated with the same component:
from collections import defaultdict
d = defaultdict(list)
with open('somefile.xml', 'r') as f:
    for component, key, value in parse_xml(f):
        d[component].append((key, value))
Now you have for each component, a list of tuples which are the associated key and values.
If you want to keep the components in the order that they are read from the file, you can use an OrderedDict (also from the collections module), but if you want to sort them in any arbitrary order, then stick with a normal dictionary.
To get a list of sorted component names, just sort the keys of the dictionary:
component_sorted = sorted(d.keys())
For a use case of printing the sorted components with their associated key/value pairs, sorted by their keys:
for key in component_sorted:
    values = d[key]
    sorted_values = sorted(values, key=lambda x: x[0])  # Sort by the keys
    print('Pairs for {}'.format(key))
    for k,v in sorted_values:
        print('{} {}'.format(k,v))
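Putting the pieces together, the whole flow can be tried without an XML file. In the sketch below the triples list is hypothetical sample data standing in for whatever parse_xml would yield, and report is an illustrative name.

```python
from collections import defaultdict

# Hypothetical parsed data standing in for the (component, key, value) triples.
triples = [
    ('ComponentC', 'key6', 'value6'),
    ('ComponentA', 'key2', 'value2'),
    ('ComponentA', 'key1', 'value1'),
    ('ComponentC', 'key5', 'value5'),
    ('ComponentB', 'key4', 'value4'),
]

d = defaultdict(list)
for component, key, value in triples:
    d[component].append((key, value))

# Sort components by name; tuples sort by their first element (the key) first.
report = [(component, sorted(d[component])) for component in sorted(d)]
for component, pairs in report:
    print(component, pairs)
```

Because each keyID is unique, sorting the (key, value) tuples directly gives the same order as sorting by key alone.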
