Difference between dict.values and dict[key].values - python

What is the difference between studentsDict.values() and studentsDict[key].values() in the following code?
studentsDict = {'Ayush': {'maths': 24, 'english': 19, 'hindi': 97, 'bio': 20, 'science': 0}, 'Pankaj': {'maths': 52, 'english': 76, 'hindi': 68, 'bio': 97, 'science': 66}, 'Raj': {'maths': 85, 'english': 79, 'hindi': 51, 'bio': 36, 'science': 75}, 'iC5z4DK': {'maths': 24, 'english': 92, 'hindi': 31, 'bio': 29, 'science': 91}, 'Zf1WSV6': {'maths': 81, 'english': 58, 'hindi': 85, 'bio': 31, 'science': 7}}
for key in studentsDict.keys():
    for marks in studentsDict[key].values():
        if marks < 33:
            print(key, "FAILED")
            break

studentsDict.keys() gives you each of the keys in the outer dict: "Ayush", "Pankaj", "Raj", "iC5z4DK" and "Zf1WSV6".
studentsDict[key].values() gives you the values of the inner dict stored in studentsDict under key. For example, if key is "Ayush", you would get 24, 19, 97, 20, and 0.
studentsDict.values(), by contrast, gives you the inner dicts themselves, one per student.
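To make the distinction concrete, here is a minimal sketch on a shortened version of the dict (the two-subject data and the variable names are made up for illustration). It shows that the outer values() yields whole inner dicts, while the inner values() yields the marks.

```python
# Shortened, illustrative version of the nested dict from the question.
students = {
    "Ayush": {"maths": 24, "english": 19},
    "Pankaj": {"maths": 52, "english": 76},
}

# Values of the OUTER dict: one inner dict per student.
outer_values = list(students.values())
print(outer_values)  # [{'maths': 24, 'english': 19}, {'maths': 52, 'english': 76}]

# Values of ONE inner dict: the marks of that student.
inner_values = list(students["Ayush"].values())
print(inner_values)  # [24, 19]

# Same pass/fail check as in the question, iterating over items() directly.
failed = [
    name
    for name, marks in students.items()
    if any(mark < 33 for mark in marks.values())
]
print(failed)  # ['Ayush']
```

Iterating over items() avoids the repeated studentsDict[key] lookup, but the result is the same as the loop in the question.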

Related

Pandas groupby() and mean() functions in Python

How can I get the average marks for each student from the following dataframe using the Pandas groupby() and mean() methods?
The aim is to get the average marks in ascending order of all students.
import pandas as pd
# Marks of students in class 4A and 4B
data = {
    'S4A': {
        'Name': ['Amy', 'Mandy', 'Daisy', 'Ben', 'Peter', 'John'],
        'Maths': [99, 87, 88, 70, 88, 76],
        'Chemistry': [89, 90, 90, 90, 89, 82],
        'Physics': [79, 97, 68, 80, 72, 95],
        'English': [90, 65, 56, 67, 86, 82],
        'Biology': [79, 89, 59, 70, 79, 78],
        'History': [75, 81, 78, 55, 68, 84]
    },
    'S4B': {
        'Name': ['Allen', 'Gordon', 'Jimmy', 'Nancy', 'Sammy', 'William'],
        'Maths': [90, 86, 88, 80, 85, 86],
        'Chemistry': [89, 78, 88, 90, 79, 82],
        'Physics': [89, 97, 78, 81, 82, 55],
        'English': [80, 85, 86, 77, 86, 82],
        'Biology': [75, 89, 69, 70, 79, 78],
        'History': [79, 81, 80, 65, 68, 84]
    }
}
# list of subjects
subjects = ['Maths', 'Chemistry', 'Physics', 'English', 'Biology', 'History']
# create dataframe
df = pd.DataFrame(data)
You need to either create a dataframe for each class and compute the mean per class, or concatenate all the classes into one dataframe and compute the mean once:
df = pd.concat([pd.DataFrame(data[k]) for k in data], ignore_index=True)
mean_df = df.set_index('Name').mean(axis=1)
print(mean_df)
Name
Amy 85.166667
Mandy 84.833333
Daisy 73.166667
Ben 72.000000
Peter 80.333333
John 82.833333
Allen 83.666667
Gordon 86.000000
Jimmy 81.500000
Nancy 77.166667
Sammy 79.833333
William 77.833333
dtype: float64
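The question also asks for ascending order and for groupby(). A minimal sketch of both, assuming a cut-down version of the data (two students per class and two subjects, chosen only to keep the example short): sort_values() orders the row-wise means, and melting to long format lets groupby('Name') produce the same numbers.

```python
import pandas as pd

# Cut-down, illustrative version of the data from the question.
data = {
    "S4A": {"Name": ["Amy", "Ben"], "Maths": [99, 70], "English": [90, 67]},
    "S4B": {"Name": ["Allen", "Nancy"], "Maths": [90, 80], "English": [80, 77]},
}

# One row per student, one column per subject.
df = pd.concat([pd.DataFrame(data[k]) for k in data], ignore_index=True)

# Row-wise mean per student, sorted in ascending order.
avg = df.set_index("Name").mean(axis=1).sort_values()
print(avg)

# Same result via groupby(): reshape to one row per (student, subject) first.
long_df = df.melt(id_vars="Name", var_name="Subject", value_name="Mark")
avg_groupby = long_df.groupby("Name")["Mark"].mean().sort_values()
print(avg_groupby)
```

The column names Subject and Mark are arbitrary labels introduced for the long format.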

Converting a list to dictionary having key value pairs

I have a list and I would like to convert it into a dictionary whose key:value pairs look like
{'apple':87, 'fan':88 ,'jackal':89,...}
Here is the list:
values_list = ['apple', 87, 'fan', 88, 'jackal', 89, 'bat', 98, 'car', 84, 'ice', 80, 'car', 86, 'apple', 82, 'goat', 80, 'dog', 81, 'cat', 80, 'eagle', 90, 'eagle', 98, 'hawk', 89, 'dog', 79, 'fan', 89, 'goat', 85, 'car', 81, 'hawk', 90, 'ice', 85, 'cat', 78, 'goat', 84, 'jackal', 90, 'apple', 80, 'ice', 87, 'bat', 94, 'bat', 92, 'jackal', 91, 'eagle', 93, 'fan', 85]
Here is the Python script I wrote to do the task:
for i in range(0,length(values_list),2):
    value_count_dict = {values_list[i] : values_list[i+1]}
    print(value_count_dict)
    values_count_dict = dict(value_count_dict)
print(values_count_dict)
Output of the script:
But I am expecting a single dictionary with all key:value pairs in it.
Thank you in advance!
You've misspelled len as length.
The most Pythonic way of doing this is likely with a list comprehension and range using the step argument.
[{values_list[i]: values_list[i+1]} for i in range(0, len(values_list), 2)]
# [{'apple': 87}, {'fan': 88}, {'jackal': 89}, {'bat': 98}, {'car': 84}, {'ice': 80}, {'car': 86}, {'apple': 82}, {'goat': 80}, {'dog': 81}, {'cat': 80}, {'eagle': 90}, {'eagle': 98}, {'hawk': 89}, {'dog': 79}, {'fan': 89}, {'goat': 85}, {'car': 81}, {'hawk': 90}, {'ice': 85}, {'cat': 78}, {'goat': 84}, {'jackal': 90}, {'apple': 80}, {'ice': 87}, {'bat': 94}, {'bat': 92}, {'jackal': 91}, {'eagle': 93}, {'fan': 85}]
In your code you create a new dictionary on each iteration, but you don't store them anywhere, so value_count_dict at the end of the loop is just the last pair.
value_counts = []
for i in range(0, len(values_list), 2):
    value_count_dict = {values_list[i]: values_list[i+1]}
    print(value_count_dict)
    value_counts.append(value_count_dict)
Here we use a for loop that starts at 0, stops at the length of the list, and uses a step of 2, because each key sits two positions after the previous one. The key is at index x of the list and its value is at index x+1. Each pair is stored in the dictionary, which starts out empty.
values_list = ['apple', 87, 'fan', 88, 'jackal', 89, 'bat', 98, 'car', 84, 'ice', 80, 'car', 86, 'apple', 82, 'goat', 80, 'dog', 81, 'cat', 80, 'eagle', 90, 'eagle', 98, 'hawk', 89, 'dog', 79, 'fan', 89, 'goat', 85, 'car', 81, 'hawk', 90, 'ice', 85, 'cat', 78, 'goat', 84, 'jackal', 90, 'apple', 80, 'ice', 87, 'bat', 94, 'bat', 92, 'jackal', 91, 'eagle', 93, 'fan', 85]
final_dict = {}
for x in range(0, len(values_list), 2):
    final_dict[values_list[x]] = values_list[x+1]
print(final_dict)
Try zip:
dct = dict(
    zip(
        values_list[0::2],
        values_list[1::2],
    )
)
For duplicate keys in your list, the last value will be taken.
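A small sketch of that last-value-wins behaviour, using a made-up six-element list in the same alternating key, value layout:

```python
# Illustrative list in the same [key, value, key, value, ...] layout.
pairs = ["apple", 87, "fan", 88, "apple", 82]

keys = pairs[0::2]    # every element at an even index: the keys
values = pairs[1::2]  # every element at an odd index: the values

# dict() keeps one entry per key; a later value overwrites an earlier one.
dct = dict(zip(keys, values))
print(dct)  # {'apple': 82, 'fan': 88}
```

The key keeps the position of its first occurrence, but its value comes from the last occurrence.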
You cannot have duplicate keys, as mentioned in the comments above, but you can store the values for each duplicated key in a list:
result = {}
l = values_list
for i in range(0, len(l), 2):
    result.setdefault(l[i], []).append(l[i+1])
print(result)
The output would look like:
{'apple': [87, 82, 80], 'fan': [88, 89, 85], 'jackal': [89, 90, 91], 'bat': [98, 94, 92], 'car': [84, 86, 81], 'ice': [80, 85, 87], 'goat': [80, 85, 84], 'dog': [81, 79], 'cat': [80, 78], 'eagle': [90, 98, 93], 'hawk': [89, 90]}

Shifting label numbers by new string

I have an example of an annotation file:
{'text': "BELGIE BELGIQUE BELGIEN\nIDENTITEITSKAART CARTE D'IDENTITE PERSONALAUSWEIS\nBELGIUM\nIDENTITY CARD\nNaam / Name\nDermrive\nVoornamen / Given names\nBrando Jerom L\nGeslacht / Nationaliteit /\nGeboortedatum /\nSex\nNationality\nDate of birth\nM/M\nBEL\n19 05 1982\nRijksregisternr. 7 National Register Nº\n85.08.23-562.77\nKaartnr. / Card Nº\n752-0465474-34\nVervalt op / Expires on\n23 07 2025\n", 'spans': [{'start': 24, 'end': 40, 'token_start': 16, 'token_end': 16, 'label': 'CardType'}, {'start': 41, 'end': 57, 'token_start': 16, 'token_end': 16, 'label': 'CardType'}, {'start': 58, 'end': 73, 'token_start': 15, 'token_end': 15, 'label': 'CardType'}, {'start': 108, 'end': 116, 'token_start': 8, 'token_end': 8, 'label': 'LastName'}, {'start': 141, 'end': 155, 'token_start': 14, 'token_end': 14, 'label': 'FirstName'}, {'start': 229, 'end': 232, 'token_start': 3, 'token_end': 3, 'label': 'Gender_nid'}, {'start': 233, 'end': 236, 'token_start': 3, 'token_end': 3, 'label': 'Nationality_nid'}, {'start': 237, 'end': 247, 'token_start': 10, 'token_end': 10, 'label': 'DateOfBirth_nid'}, {'start': 288, 'end': 303, 'token_start': 15, 'token_end': 15, 'label': 'Ssn'}, {'start': 323, 'end': 337, 'token_start': 14, 'token_end': 14, 'label': 'CardNumber'}, {'start': 362, 'end': 372, 'token_start': 10, 'token_end': 10, 'label': 'ValidUntil_nid'}]}
I have the start and end position of the "LastName" entity, which in this example is "Dermrive". When I substitute a shorter or longer LastName, for example "Brad", I need to shift all the following spans by the difference in length between the two words, so that the other labels stay in the correct position. It works perfectly with one entity, but when I try to change all of them, the output is messy and the labels are no longer correct.
def replace_text_by_index_and_type(self, new_text, type):
    label_position = self.search_label_position_in_spans(self.annotation['spans'], type.value)
    label = self.annotation['spans'][label_position]
    begin_new_string = self.annotation["text"][:label["start"]]
    end_new_string = self.annotation["text"][label["end"]:]
    new_string = begin_new_string + new_text + end_new_string
    for to_change_ent in self.annotation['spans'][label_position+1:]:
        diff = len(new_text) - (label["end"] - label["start"])
        self.annotation['spans'][label_position]["end"] = self.annotation['spans'][label_position]["end"] + diff
        #print(f"Diff between original {to_change_ent} and new_string: {diff}")
        to_change_ent["start"] += diff
        to_change_ent["end"] += diff
    return new_string
I start changing the entities from the second one, to keep the start position of the first one, and I add diff to the end position of the first entity. As a result the first name and last name are correct, but the other entities end up shifted to the wrong positions.
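One possible cause, reading the loop as posted: diff is recomputed on every iteration from label["end"], which the same iteration has just moved, so from the second iteration on diff is 0 and only the first following span gets shifted. The function also returns the new string without storing it, so a later replacement may still slice the old text if the caller does not write it back. A minimal sketch of a fix, assuming non-overlapping spans with an exclusive end offset; replace_span_text is a hypothetical standalone helper, not the class method from the question:

```python
def replace_span_text(annotation, label, new_text):
    """Replace the text of the first span carrying `label`, shift later spans."""
    spans = annotation["spans"]
    target = next(span for span in spans if span["label"] == label)
    old_start, old_end = target["start"], target["end"]

    # Compute the length difference ONCE, before any offset is modified.
    diff = len(new_text) - (old_end - old_start)

    # Store the new text so the next replacement slices the updated string.
    text = annotation["text"]
    annotation["text"] = text[:old_start] + new_text + text[old_end:]

    # The replaced span keeps its start; only its end moves.
    target["end"] = old_end + diff

    # Every span that began at or after the old end moves by the same amount.
    for span in spans:
        if span is not target and span["start"] >= old_end:
            span["start"] += diff
            span["end"] += diff
    return annotation["text"]


# Usage on a small made-up annotation in the same shape as the example.
annotation = {
    "text": "Name\nDermrive\nGiven\nBrando\n",
    "spans": [
        {"start": 5, "end": 13, "label": "LastName"},
        {"start": 20, "end": 26, "label": "FirstName"},
    ],
}
replace_span_text(annotation, "LastName", "Brad")
replace_span_text(annotation, "FirstName", "Alexander")
for span in annotation["spans"]:
    print(span["label"], repr(annotation["text"][span["start"]:span["end"]]))
```

Comparing on start offsets rather than list position means the spans do not have to be sorted, which is a design choice of this sketch and not something the question specifies.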

How to compare two lists in Python and return them in a dictionary

I have two lists:
names: ['Mary', 'Jack', 'Rose', 'Mary', 'Carl', 'Fred', 'Meg', 'Phil', 'Carl', 'Jack', 'Fred', 'Mary', 'Phil', 'Jack', 'Mary', 'Fred', 'Meg']
grades: [80, 88, 53, 80, 64, 61, 75, 80, 91, 82, 68, 76, 95, 58, 89, 51, 81, 78]
I want to be able to take the average of each person's test scores. For example, Mary appears in the names list 4 times, and I want to take the test scores that are mapped to her and average them.
The issue is how to compare the duplicate names with the test scores.
Note: I do know that the grades list is longer than the names list, but these are the two lists that were given to me.
Here is what I have done so far:
def average_grades(names, grades):
    averages = dict()
    name_counter = 0
    for name in names:
        # if the name is the same
        if name == names:
            # count the occurrence of the name
            name_counter += 1
            print(name_counter)
        # cycle through the grades
        # for grade in grades:
        #     print(grade)
Here's a way:
from collections import defaultdict, Counter
names = ['Mary', 'Jack', 'Rose', 'Mary', 'Carl', 'Fred', 'Meg', 'Phil', 'Carl', 'Jack', 'Fred', 'Mary', 'Phil', 'Jack', 'Mary', 'Fred', 'Meg']
grades = [80, 88, 53, 80, 64, 61, 75, 80, 91, 82, 68, 76, 95, 58, 89, 51, 81, 78]
# initialize a defaultdict whose default value is 0
score = defaultdict(int)
frequency = Counter(names)
# this yields: Counter({'Mary': 4, 'Jack': 3, 'Fred': 3, 'Carl': 2, 'Meg': 2,'Phil': 2, 'Rose': 1})
for name, grade in zip(names, grades):
    # add (grade of name / count of name) to the running average for this name;
    # score.get(name, 0) returns 0 if the key does not exist yet
    score[name] = score.get(name, 0) + (grade / frequency[name])
print(score)
Output:
defaultdict(<class 'int'>, {'Mary': 81.25, 'Jack': 76.0, 'Rose': 53.0, 'Carl': 77.5, 'Fred': 60.0, 'Meg': 78.0, 'Phil': 87.5})
NOTE: zip stops at the end of the shorter list, so the extra last grade is ignored; I have no idea what to do with it.
You can iterate over both lists in parallel, sort and group the pairs by name, then compute each group's average and add it to the dictionary:
from itertools import groupby
from collections import defaultdict
names = ['Mary', 'Jack', 'Rose', 'Mary', 'Carl', 'Fred', 'Meg', 'Phil', 'Carl', 'Jack', 'Fred', 'Mary', 'Phil', 'Jack', 'Mary', 'Fred', 'Meg']
grades = [80, 88, 53, 80, 64, 61, 75, 80, 91, 82, 68, 76, 95, 58, 89, 51, 81, 78]
d = defaultdict(int)
f = lambda x: x[0]
for k, g in groupby(sorted(zip(names, grades), key=f), key=f):
    grp = list(g)
    d[k] = sum(x[1] for x in grp) / len(grp)
print(d)
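A third variant, sketched here as an alternative to the two answers above rather than a replacement for them: collect each person's grades in a list first, then divide the sum by the count. It needs no sorting and no separate frequency count. As in the other answers, zip drops the unmatched 18th grade.

```python
from collections import defaultdict

names = ['Mary', 'Jack', 'Rose', 'Mary', 'Carl', 'Fred', 'Meg', 'Phil', 'Carl',
         'Jack', 'Fred', 'Mary', 'Phil', 'Jack', 'Mary', 'Fred', 'Meg']
grades = [80, 88, 53, 80, 64, 61, 75, 80, 91, 82, 68, 76, 95, 58, 89, 51, 81, 78]

# Map each name to the list of grades paired with it.
grades_by_name = defaultdict(list)
for name, grade in zip(names, grades):  # zip stops at the shorter list
    grades_by_name[name].append(grade)

# Average each list; every list has at least one grade, so len() is never 0.
averages = {name: sum(marks) / len(marks) for name, marks in grades_by_name.items()}
print(averages)
```

Keeping the intermediate lists also makes it easy to compute other statistics per person later, such as the minimum or maximum grade.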

Seaborn.countplot : order categories by count, also by category?

So I understand how to sort the bars of a bar chart in general. What I cannot find, though, is how to sort the bars by one of the subcategories.
For example, given the following dataframe, I can get the bar plots. But what I would like to do is have them sorted from greatest to least by the count of Type 'Classic'.
import pandas as pd
test_df = pd.DataFrame([
['Jake', 38, 'MW', 'Classic'],
['John', 38,'NW', 'Classic'],
['Sam', 34, 'SE', 'Classic'],
['Sam', 22, 'E' ,'Classic'],
['Joe', 43, 'ESE2', 'Classic'],
['Joe', 34, 'MTN2', 'Classic'],
['Joe', 38, 'MTN2', 'Classic'],
['Scott', 38, 'ESE2', 'Classic'],
['Chris', 34, 'SSE1', 'Classic'],
['Joe', 43, 'S1', 'New'],
['Paul', 34, 'NE2', 'New'],
['Joe', 38, 'MC1', 'New'],
['Joe', 34, 'NE2', 'New'],
['Nick', 38, 'MC1', 'New'],
['Al', 38, 'SSE1', 'New'],
['Al', 34, 'ME', 'New'],
['Al', 34, 'MC1', 'New'],
['Joe', 43, 'S1', 'New']], columns = ['Name','Code_A','Code_B','Type'])
import seaborn as sns
sns.set(style="darkgrid")
palette ={"Classic":"#FF9999","New":"#99CC99"}
g = sns.countplot(y="Name",
                  palette=palette,
                  hue="Type",
                  data=test_df)
So instead of the default order, 'Joe' would be on top, followed by 'Sam', etc.
Add the order argument. Use pandas.crosstab and sort_values to obtain this:
import pandas as pd
test_df = pd.DataFrame([
['Jake', 38, 'MW', 'Classic'],
['John', 38,'NW', 'Classic'],
['Sam', 34, 'SE', 'Classic'],
['Sam', 22, 'E' ,'Classic'],
['Joe', 43, 'ESE2', 'Classic'],
['Joe', 34, 'MTN2', 'Classic'],
['Joe', 38, 'MTN2', 'Classic'],
['Scott', 38, 'ESE2', 'Classic'],
['Chris', 34, 'SSE1', 'Classic'],
['Joe', 43, 'S1', 'New'],
['Paul', 34, 'NE2', 'New'],
['Joe', 38, 'MC1', 'New'],
['Joe', 34, 'NE2', 'New'],
['Nick', 38, 'MC1', 'New'],
['Al', 38, 'SSE1', 'New'],
['Doug', 34, 'ME', 'New'],
['Fred', 34, 'MC1', 'New'],
['Joe', 43, 'S1', 'New']], columns = ['Name','Code_A','Code_B','Type'])
import seaborn as sns
sns.set(style="darkgrid")
palette ={"Classic":"#FF9999","New":"#99CC99"}
order = pd.crosstab(test_df.Name, test_df.Type).sort_values('Classic', ascending=False).index
g = sns.countplot(y="Name",
                  palette=palette,
                  hue="Type",
                  data=test_df,
                  order=order
                  )
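To see what the crosstab step produces before it is handed to countplot, here is a minimal pandas-only sketch. The eight-row frame is made up and deliberately has no ties in the 'Classic' column, so the resulting order is unambiguous.

```python
import pandas as pd

# Small illustrative frame with only the two columns the ordering depends on.
df = pd.DataFrame({
    "Name": ["Jake", "Sam", "Sam", "Joe", "Joe", "Joe", "Joe", "Al"],
    "Type": ["Classic", "Classic", "Classic", "Classic", "Classic", "Classic", "New", "New"],
})

# One row per Name, one column per Type, cells hold the counts.
counts = pd.crosstab(df["Name"], df["Type"])
print(counts)

# Sort the names by their 'Classic' count, largest first.
order = counts.sort_values("Classic", ascending=False).index.tolist()
print(order)  # ['Joe', 'Sam', 'Jake', 'Al']
```

Names with the same 'Classic' count may come out in either order, so a second sort key would be needed if the order among ties matters.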
import pandas as pd
test_df = pd.DataFrame([
['Jake', 38, 'MW', 'Classic'],
['John', 38,'NW', 'Classic'],
['Sam', 34, 'SE', 'Classic'],
['Sam', 22, 'E' ,'Classic'],
['Joe', 43, 'ESE2', 'Classic'],
['Joe', 34, 'MTN2', 'Classic'],
['Joe', 38, 'MTN2', 'Classic'],
['Scott', 38, 'ESE2', 'Classic'],
['Chris', 34, 'SSE1', 'Classic'],
['Joe', 43, 'S1', 'New'],
['Paul', 34, 'NE2', 'New'],
['Joe', 38, 'MC1', 'New'],
['Joe', 34, 'NE2', 'New'],
['Nick', 38, 'MC1', 'New'],
['Al', 38, 'SSE1', 'New'],
['Al', 34, 'ME', 'New'],
['Al', 34, 'MC1', 'New'],
['Joe', 43, 'S1', 'New']], columns = ['Name','Code_A','Code_B','Type'])
import seaborn as sns
sns.set(style="darkgrid")
palette ={"Classic":"#FF9999","New":"#99CC99"}
# order the bars by the total count per name (both types combined)
sns.countplot(y='Name', hue='Type', data=test_df,
              order=test_df['Name'].value_counts().index)
