How to get keys by value in dictionary (python 2.7) - python

I have a dictionary that describes students' courses:
the keys are names (strings) and the values are lists of courses (strings)
students_dict={"name1":["math","computer science", "statistics"],"name2":["algebra","statistics","physics"],"name3":["statistics","math","programming"]}
I want to create a function that takes this dictionary and returns a new one:
the keys will be the courses (strings)
and the values will be lists of the names of the students who take each course (lists of strings)
course_students={"statistics":["name1","name2","name3"],"algebra":["name2"],"programming":["name3"],"computer science":["name1"],"physics":["name2"],"math":["name1","name3"]}
The order doesn't matter.
Edit: this is roughly what I'm trying to do:
def swap_student_courses(students_dict):
    students_in_each_cours={}
    cours_list=[...]
    cours_names=[]
    for cours in cours_list:
        if students_dict.has_key(cours)==True:
            cours_names.append(...)
    students_in_each_cours.append(cours_names)
    return students_in_each_cours

I would use a defaultdict here for simplicity's sake, but know that you can accomplish the same with a regular dict:
from collections import defaultdict
students_dict={"name1":["math","computer science", "statistics"],
               "name2":["algebra","statistics","physics"],
               "name3":["statistics","math","programming"]}
course_students = defaultdict(list)
for name, course_list in students_dict.items():
    for course in course_list:
        course_students[course].append(name)

It can be done with a set comprehension (to first get a unique set of course names) followed by a dict comprehension (to associate course names with a list of students for whom that course appears in their respective list):
all_courses = {course for student_course_list in students_dict.values() for course in student_course_list}
course_students = {course: [student_name
                            for student_name, student_course_list in students_dict.items()
                            if course in student_course_list]
                   for course in all_courses}
Your attempted approach neglected to search through each student's course list: you used students_dict.has_key(cours) forgetting that student names, not courses, are the keys of students_dict.
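To check either approach against the expected output in the question, a small sketch like the one below may help. The helper name invert is hypothetical, and the name lists are sorted because dictionary iteration order is not guaranteed in Python 2.7.

```python
from collections import defaultdict

students_dict = {
    "name1": ["math", "computer science", "statistics"],
    "name2": ["algebra", "statistics", "physics"],
    "name3": ["statistics", "math", "programming"],
}

def invert(students):
    """Map each course to a sorted list of the students taking it (hypothetical helper)."""
    course_students = defaultdict(list)
    for name, course_list in students.items():
        for course in course_list:
            course_students[course].append(name)
    # Sort the names so the result does not depend on dict iteration order.
    return {course: sorted(names) for course, names in course_students.items()}

result = invert(students_dict)
print(result["statistics"])
print(result["math"])
```

Returning a plain dict at the end also keeps later lookups of unknown courses from silently creating empty entries.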

Here is a simple function you could use.
from collections import defaultdict
def create_new_dict(old_dict):
    new_dict = defaultdict(list)
    for student, courses in old_dict.items():
        for course in courses:
            new_dict[course].append(student)
    return new_dict
The key difference between Python's standard dict and defaultdict is that accessing a nonexistent key in a standard dict raises a KeyError, while a defaultdict creates the key with a default value produced by the factory passed when the dict was created. In our case that is an empty list.
Implementation without defaultdict:
def create_new_dict(old_dict):
    new_dict = dict()
    for student, courses in old_dict.items():
        for course in courses:
            try:
                new_dict[course].append(student)
            except KeyError:
                new_dict[course] = [student]
    return new_dict
EDIT----
The KeyError is raised by a standard dict because the first time we try to access some key, 'math' for example, it is not yet in the dictionary. Here is an excellent explanation of dictionaries.
It is not a problem that values repeat, because in that case we simply append the new student to the list.
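The difference described above can be seen in isolation with a minimal sketch. The variable names plain and auto are only illustrative.

```python
from collections import defaultdict

plain = {}
try:
    plain["math"].append("name1")   # "math" is not a key yet, so this raises KeyError
except KeyError:
    plain["math"] = ["name1"]       # create the list the first time the course is seen

auto = defaultdict(list)
auto["math"].append("name1")        # missing key: list() is called and stored automatically
auto["math"].append("name3")        # repeated course: simply append to the existing list

print(plain)
print(dict(auto))
```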

Related

How to look up keys in dictionary when shortened key names are in another list?

I'm trying to look up values in a dictionary by shortened keys (The keys in the dictionary are full length), where the shortened keys are in a different list.
For example, the list might be
names = ["Bob", "Albert", "Man", "Richard"]
and the dictionary I want to look up is:
location_dict = {"Bob Franklin":"NYC", "Albert":"Chicago", "Manfred":"San Francisco", "Richard Walker":"Houston"}
The code I have is:
for name in names:
    if name.startswith(f'{name}') in location_dict:
        print(location_dict[name.startswith('{name}')])
But this doesn't work (because it's self-referential, and I need a list of the full key names associated with the shortened ones; I understand that much). How do I get the values of a dictionary with shortened key names? I have a list of over 1000 names like this, and I don't know the most efficient way to do this.
Expected output: If I look up "Bob" from the list of names as a key for the dictionary, then it should return "NYC" from the dictionary
Your condition is wrong. name.startswith(f'{name}') will always return True.
Try this -
for name in names:
    for k, v in location_dict.items():
        if k.startswith(name):
            print(v)
To stop after the first match add a break statement.
for name in names:
    for k, v in location_dict.items():
        if k.startswith(name):
            print(v)
            break
Try this in just one line using any():
[v for k,v in location_dict.items() if any(k.startswith(name) for name in names)]
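If the goal is to know which location belongs to which shortened name, one possible sketch is to collect the first match per name into a new dictionary. The function name lookup_by_prefix is hypothetical, and the sample data is taken from the question.

```python
names = ["Bob", "Albert", "Man", "Richard"]
location_dict = {
    "Bob Franklin": "NYC",
    "Albert": "Chicago",
    "Manfred": "San Francisco",
    "Richard Walker": "Houston",
}

def lookup_by_prefix(short_names, full_dict):
    """Return {short_name: value} using the first full key that starts with each short name."""
    found = {}
    for short in short_names:
        for full_key, value in full_dict.items():
            if full_key.startswith(short):
                found[short] = value
                break  # stop at the first match for this short name
    return found

locations = lookup_by_prefix(names, location_dict)
print(locations["Bob"])
```

This is still a nested loop, so for very large inputs it may be worth sorting the keys or building an index first; for about a thousand names it is likely fast enough.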

Creating a function to initialise multiple dictionaries with defaultlist()?

Just a general question....
I need to initialise at least 50 different dictionaries, each of which is then passed as one of the arguments to a function (make_somefunction) I made that returns a dictionary with customised keys and a list as values.
Is there a way to initialise a dictionary and output it directly to the function?
Something like this?
from collections import defaultdict
def initialise(dict):
    dict =defaultdict(list)
    return (dict)
initialise(dict).make_somefunction(dict, custom_keys, custom_listofvalues)
Instead of
dict1 = defaultdict(list)
dict2 = defaultdict(list)
dict3 = defaultdict(list)
...
dict49 = defaultdict(list)
dict50 = defaultdict(list)
which would individually go as an argument creating different customised dictionaries
make_somefunction(dict1, animals, foods)
make_somefunction(dict2, patients, drugs)
...
make_somefunction(dict50, fridge, fruits)
You can pass a list of the defaultdicts by using a list comprehension.
make_somefunction([defaultdict(list) for i in range(50)])
By the way, read the Python docs to get a better understanding of Python.
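Another option, sketched below under assumptions, is to create the defaultdict at the call site so no numbered variables are needed. The body of make_somefunction here is a hypothetical stand-in (the question does not show the real one), and build and the sample inputs are likewise made up for illustration.

```python
from collections import defaultdict

def make_somefunction(target, custom_keys, custom_values):
    """Hypothetical stand-in for the asker's function: file each value under its key."""
    for key, value in zip(custom_keys, custom_values):
        target[key].append(value)
    return target

def build(custom_keys, custom_values):
    """Create a fresh defaultdict(list) and pass it straight to the function."""
    return make_somefunction(defaultdict(list), custom_keys, custom_values)

# Hypothetical sample inputs, one (keys, values) pair per dictionary to build.
inputs = [
    (["cat", "dog"], ["fish", "bone"]),
    (["alice", "bob"], ["aspirin", "ibuprofen"]),
]
results = [build(keys, values) for keys, values in inputs]
print(dict(results[0]))
```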

How to create a nested python dictionary with keys as strings?

Summary of issue: I'm trying to create a nested Python dictionary, with keys defined by pre-defined variables and strings. And I'm populating the dictionary from regular expressions outputs. This mostly works. But I'm getting an error because the nested dictionary - not the main one - doesn't like having the key set to a string, it wants an integer. This is confusing me. So I'd like to ask you guys how I can get a nested python dictionary with string keys.
Below I'll walk you through the steps of what I've done. What is working, and what isn't. Starting from the top:
# Regular expressions module
import re
# Read text data from a file
file = open("dt.cc", "r")
dtcc = file.read()
# Create a list of stations from regular expression matches
stations = sorted(set(re.findall(r"\n(\w+)\s", dtcc)))
The result is good, and is something like this:
stations = ['AAAA','BBBB','CCCC','DDDD']
# Initialize a new dictionary
rows = {}
# Loop over each station in the station list, and start populating
for station in stations:
    rows[station] = re.findall("%s\s(.+)" %station, dtcc)
The result is good, and is something like this:
rows['AAAA'] = ['AAAA 0.1132 0.32 P',...]
However, when I try to create a sub-dictionary with a string key:
for station in stations:
    rows[station] = re.findall("%s\s(.+)" %station, dtcc)
    rows[station]["dt"] = re.findall("%s\s(\S+)" %station, dtcc)
I get the following error.
"TypeError: list indices must be integers, not str"
It doesn't seem to like that I'm specifying the second dictionary key as "dt". If I give it a number instead, it works just fine. But then my dictionary key name is a number, which isn't very descriptive.
Any thoughts on how to get this working?
The issue is that by doing
rows[station] = re.findall(...)
you are creating a dictionary with the station names as keys and the return values of re.findall as values, which happen to be lists. So when you index them again with
rows[station]["dt"] = re.findall(...)
rows[station] on the left-hand side is a list, which can only be indexed by integers, and that is what the TypeError is complaining about. For example, rows[station][0] would give you the first match from the regex. You said you want a nested dictionary, so you could do
rows[station] = dict()
rows[station]["dt"] = re.findall(...)
To make it a bit nicer, a data structure that you could use instead is a defaultdict from the collections module.
A defaultdict is a dictionary that takes a factory (typically a type constructor) used to build the value for any missing key. For example, dictlist = defaultdict(list) defines a dictionary whose values default to lists. Then immediately doing dictlist[key].append(item1) is legal, because the list is created automatically the first time the key is accessed.
In your case you could do
from collections import defaultdict
rows = defaultdict(dict)
for station in stations:
    rows[station]["bulk"] = re.findall("%s\s(.+)" %station, dtcc)
    rows[station]["dt"] = re.findall("%s\s(\S+)" %station, dtcc)
Note that you have to assign the first regex result to a new key, "bulk" here, but you can call it whatever you like. Hope this helps.
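A self-contained way to try the nested-dictionary approach is sketched below. The contents of dtcc are a made-up stand-in for the real dt.cc file, whose exact layout is not shown in the question, and raw strings are used for the patterns.

```python
import re
from collections import defaultdict

# Hypothetical sample text standing in for the contents of dt.cc.
dtcc = "\nAAAA 0.1132 0.32 P\nBBBB 0.2 0.5 S\nAAAA 0.3 0.9 S\n"

# Station names are the first word on each line.
stations = sorted(set(re.findall(r"\n(\w+)\s", dtcc)))

rows = defaultdict(dict)
for station in stations:
    # Full remainder of each line for this station.
    rows[station]["bulk"] = re.findall(r"%s\s(.+)" % station, dtcc)
    # Only the first whitespace-delimited field after the station name.
    rows[station]["dt"] = re.findall(r"%s\s(\S+)" % station, dtcc)

print(stations)
print(rows["AAAA"]["dt"])
```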

How to associate elements in a set with multiple dict entries

I'm extracting instances of three elements from an XML file: ComponentStr, keyID, and valueStr. Whenever I find a ComponentStr, I want to associate the keyID:valueStr pair with it. ComponentStr values are not unique. As multiple occurrences of a ComponentStr are read, I want to accumulate the keyID:valueStr pairs for that ComponentStr group. The resulting accumulated data structure after reading the XML file might look like this:
ComponentA: key1:value1, key2:value2, key3:value3
ComponentB: key4:value4
ComponentC: key5:value5, key6:value6
After I generate the final data structure, I want to sort the keyID:valueStr entries within each ComponentStr and also sort all the ComponentStrs.
I'm trying to structure this data in Python 2. The ComponentStr values seem to work well as a set. The keyID:valueStr pairs are clearly a dict. But how do I associate a ComponentStr entry in a set with its dict entries?
Alternatively, is there a better way to organize this data besides a set and associated dict entries? Each keyID is unique. Perhaps I could have one dict of keyID:some combo of ComponentStr and valueStr? After the data structure was built, I could sort it based on ComponentStr first, then perform some type of slice to group the keyID:valueStr and then sort again on the keyID? Seems complicated.
How about a dict of dicts?
data = {
    'ComponentA': {'key1':'value1', 'key2':'value2', 'key3':'value3'},
    'ComponentB': {'key4':'value4'},
    'ComponentC': {'key5':'value5', 'key6':'value6'},
}
It maintains your data structure and mapping. Interestingly enough, the underlying implementation of dicts is similar to the implementation of sets.
This could easily be constructed along the lines of this pseudo-code:
data = {}
for file in files:
    data[get_component(file)] = {}
    for key, value in get_data(file):
        data[get_component(file)][key] = value
In the case where you have repeated components, you need the sub-dict as the default, but you must add to the previous one if it's already there. I prefer setdefault to other solutions like a defaultdict or subclassing dict with a __missing__ method, as long as I only have to do it once or twice in my code:
data = {}
for file in files:
    for key, value in get_data(file):
        data.setdefault(get_component(file), {})[key] = value
It works like this:
>>> d = {}
>>> d.setdefault('foo', {})['bar'] = 'baz'
>>> d
{'foo': {'bar': 'baz'}}
>>> d.setdefault('foo', {})['ni'] = 'ichi'
>>> d
{'foo': {'ni': 'ichi', 'bar': 'baz'}}
Alternatively, since your comment on the other answer says you need simple code, you can keep it really simple with some more verbose and less optimized code:
data = {}
for file in files:
    for key, value in get_data(file):
        if get_component(file) not in data:
            data[get_component(file)] = {}
        data[get_component(file)][key] = value
You can then sort when you're done collecting the data.
for component in sorted(data):
    print(component)
    print('-----')
    for key in sorted(data[component]):
        print(key, data[component][key])
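To try the setdefault approach without the XML parsing, the following sketch uses a hypothetical list of (ComponentStr, keyID, valueStr) triples in place of get_component and get_data, which are not defined in this answer.

```python
# Hypothetical triples standing in for what would be read from the XML file.
records = [
    ("ComponentC", "key6", "value6"),
    ("ComponentA", "key2", "value2"),
    ("ComponentB", "key4", "value4"),
    ("ComponentA", "key1", "value1"),
    ("ComponentC", "key5", "value5"),
]

data = {}
for component, key, value in records:
    # Create the inner dict the first time a component is seen, then add the pair.
    data.setdefault(component, {})[key] = value

# Sort components, then keys within each component.
lines = []
for component in sorted(data):
    pairs = ", ".join("%s:%s" % (key, data[component][key])
                      for key in sorted(data[component]))
    lines.append("%s: %s" % (component, pairs))

print("\n".join(lines))
```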
"I want to accumulate the keyID:valueStr for that ComponentStr group"
In this case you want the ComponentStr values to be the keys of your dictionary. To me, accumulating immediately suggests a list, and lists are easily ordered.
"Each keyID is unique. Perhaps I could have one dict of keyID:some combo of ComponentStr and valueStr?"
You should store your data in a manner that is the most efficient when you want to retrieve it. Since you will be accessing your data by the component, even though your keys are unique there is no point in having a dictionary that is accessed by your key (since this is not how you are going to "retrieve" the data).
So, with that - how about using a defaultdict with a list, since you really want all items associated with the same component:
from collections import defaultdict
d = defaultdict(list)
with open('somefile.xml', 'r') as f:
    for component, key, value in parse_xml(f):
        d[component].append((key, value))
Now you have for each component, a list of tuples which are the associated key and values.
If you want to keep the components in the order that they are read from the file, you can use an OrderedDict (also from the collections module), but if you want to sort them in any arbitrary order, then stick with a normal dictionary.
To get a list of sorted component names, just sort the keys of the dictionary:
component_sorted = sorted(d.keys())
For a use case of printing the sorted components with their associated key/value pairs, sorted by their keys:
for key in component_sorted:
    values = d[key]
    sorted_values = sorted(values, key=lambda x: x[0]) # Sort by the keys
    print('Pairs for {}'.format(key))
    for k,v in sorted_values:
        print('{} {}'.format(k,v))
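Since parse_xml is left undefined above, here is one way it might look, purely as a sketch. The XML layout (entry elements with component, key and value attributes) is an assumption made for illustration; the real file's element and attribute names will differ. This version takes the XML text rather than an open file.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical XML layout; adjust the tag and attribute names to the real file.
SAMPLE_XML = """
<root>
  <entry component="ComponentB" key="key4" value="value4"/>
  <entry component="ComponentA" key="key2" value="value2"/>
  <entry component="ComponentA" key="key1" value="value1"/>
</root>
"""

def parse_xml(text):
    """Yield (component, key, value) triples from the assumed layout above."""
    for entry in ET.fromstring(text.strip()).iter("entry"):
        yield entry.get("component"), entry.get("key"), entry.get("value")

d = defaultdict(list)
for component, key, value in parse_xml(SAMPLE_XML):
    d[component].append((key, value))

# Sort the components, then sort each component's (key, value) pairs by key.
report = [(component, sorted(d[component])) for component in sorted(d)]
print(report)
```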

How to go from a values_list to a dictionary of lists

I have a django queryset that returns a list of values:
[(client pk, timestamp, value, task pk), (client pk, timestamp, value, task pk),....,].
I am trying to get it to return a dictionary of this format:
{client pk:[[timestamp, value],[timestamp, value],...,], client pk:[list of lists],...,}
The values_list may have multiple records for each client pk. I have been able to get dictionaries of lists for client or task pk using:
def dict_from_vl(values_list):
    keys=[values_list[x][3] for x in range(0,len(values_list),1)]
    values = [[values_list[x][1], values_list[x][2]] for x in range(0,len(values_list),1)]
    target_dict=dict(zip(keys,values))
    return target_dict
However, with this method, values for the same key overwrite previous values as it iterates through the values_list, rather than being appended to a list. So this works great for getting the most recent record if the values list is sorted from oldest to newest, but not for creating a list of lists for each dict value.
Instead of target_dict=dict(zip(keys,values)), do
target_dict = defaultdict(list)
for i, key in enumerate(keys):
    target_dict[key].append(values[i])
(defaultdict is available in the standard module collections.)
from collections import defaultdict
d = defaultdict(list)
for x in vls_list:
    d[x[0]].append(list(x[1:3]))
Although I'm not sure if I got the question right.
I know in Python you're supposed to cram everything into a single line, but you could do it the old fashioned way...
def dict_from_vl(vls_list):
    target_dict = {}
    for v in vls_list:
        if v[0] not in target_dict:
            target_dict[v[0]] = []
        target_dict[v[0]].append([v[1], v[2]])
    return target_dict
For better speed, I suggest you don't create the keys and values lists separately but simply use one loop:
from collections import defaultdict
tgt_dict = defaultdict(list)
for row in vls_list:
    tgt_dict[row[0]].append([row[1], row[2]])
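The single-loop version can be tried on made-up data as follows. The rows below are hypothetical and only follow the (client pk, timestamp, value, task pk) layout described in the question; a real values_list would come from the Django queryset.

```python
from collections import defaultdict

# Hypothetical rows in the (client pk, timestamp, value, task pk) layout.
vls_list = [
    (1, "2020-01-01", 10, 7),
    (2, "2020-01-01", 20, 8),
    (1, "2020-01-02", 11, 7),
]

def dict_from_vl(rows):
    """Group [timestamp, value] pairs by client pk, keeping the input order."""
    target = defaultdict(list)
    for client_pk, timestamp, value, _task_pk in rows:
        target[client_pk].append([timestamp, value])
    return dict(target)

result = dict_from_vl(vls_list)
print(result[1])
```

Unpacking each row into named variables is a readability choice; indexing with row[0], row[1] and row[2] behaves the same way.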
