Dictionary with named values? - python

I have a file like this :
A X V1
A Y V2
B X V3
B Y V4
Let's say the first column is a model type, second column is a version number and third is the value of something related.
I would like to answer the question : "What is the value of model A, version X ?"
For all values and all versions.
I wanted to use a dict, but I only know dicts with one value for each key. This one needs two keys, i.e. something like:
d[model][version] = value
How would you do this ?

You can nest dictionaries:
d['A'] = {}
d['A']['X'] = 'V1'
or you can use tuple keys instead:
d[('A', 'X')] = 'V1'
Nesting would make it easier to list all known versions for a given model:
versions_for_model = d['A'].keys()
Creating a nested dictionary setup can be simplified a little by using collections.defaultdict():
from collections import defaultdict
d = defaultdict(dict)
d['A']['X'] = 'V1'
Here trying to access d['A'] automatically creates a new dictionary value.
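To compare the two layouts side by side, here is a minimal sketch that loads the four sample rows from the question into both a nested defaultdict and a tuple-keyed dict. The names rows, nested and flat are made up for illustration.

```python
from collections import defaultdict

# Sample rows copied from the question: model, version, value.
rows = ["A X V1", "A Y V2", "B X V3", "B Y V4"]

nested = defaultdict(dict)   # nested[model][version] -> value
flat = {}                    # flat[(model, version)] -> value

for row in rows:
    model, version, value = row.split()
    nested[model][version] = value
    flat[(model, version)] = value

print(nested["A"]["X"])      # V1
print(flat[("B", "Y")])      # V4

# Listing the versions of one model is direct with nesting,
# but needs a scan over all keys with tuple keys.
print(sorted(nested["A"]))                       # ['X', 'Y']
print(sorted(v for m, v in flat if m == "A"))    # ['X', 'Y']
```

Which layout fits better probably depends on whether you mostly look up single (model, version) pairs or mostly iterate per model.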

with open("Input.txt") as inputFile:
    lines = [line.strip().split() for line in inputFile]
result = {}
for k1, k2, v in lines:
    result.setdefault(k1, {})[k2] = v
print(result)
Output
{'A': {'Y': 'V2', 'X': 'V1'}, 'B': {'Y': 'V4', 'X': 'V3'}}
You can access the individual elements like this:
print(result["A"]["Y"])
Output
V2
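If a model or version might be missing, plain indexing raises KeyError. A small sketch of a tolerant lookup, assuming the same nested structure as above (the helper name lookup is hypothetical):

```python
# Same structure as the nested result shown above.
result = {'A': {'X': 'V1', 'Y': 'V2'}, 'B': {'X': 'V3', 'Y': 'V4'}}

def lookup(table, model, version, default=None):
    """Return table[model][version], or default if either key is missing."""
    return table.get(model, {}).get(version, default)

print(lookup(result, "A", "Y"))   # V2
print(lookup(result, "C", "X"))   # None
```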

Related

Retrieving items from a nested dictionary with a nested for loop results in KeyError

I need to systematically access dictionaries that are nested within a list within a dictionary at the 3rd level, like this:
responses = {'1': {'responses': [{1st dict to be retrieved}, {2nd dict to be retrieved}, ...]},
'2': {'responses': [{1st dict to be retrieved}, {2nd dict to be retrieved}, ...]}, ...}
I need to unnest and transform these nested dicts into dataframes, so the end result should look like this:
responses = {'1': df1,
'2': df2, ...}
To achieve this, I built a for-loop that goes through all keys on the first level. Within that loop, I am using another loop to extract each item from the nested dicts into a new empty dict called responses_dict:
responses_dict = {}
for key in responses.keys():
    for item in responses[key]['responses']:
        responses_dict[key].update(item)
However, I get:
KeyError: '1'
The inner loop works if I use it individually on a key within the dict, but that doesn't really help me since the data comes from an API and has to be updated dynamically every few minutes in production.
The next loop to transform the result into dataframes would look like this:
for key in responses_dict:
    responses_df[key] = pd.DataFrame.from_dict(responses_dict[key], orient='index')
But I haven't gotten to try that out since the first operation fails.
Try this:
from collections import defaultdict
responses_dict = defaultdict(dict) # instead of {}
Then your code will work.
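As a quick check, here is the loop from the question run against made-up sample data in the described shape; the contents of responses are an assumption for illustration only.

```python
from collections import defaultdict

# Made-up sample data in the shape described in the question.
responses = {
    '1': {'responses': [{'a': 1, 'b': 2}, {'c': 3}]},
    '2': {'responses': [{'d': 4}]},
}

responses_dict = defaultdict(dict)
for key in responses:
    for item in responses[key]['responses']:
        # responses_dict[key] is created as {} on first access,
        # so update() no longer raises KeyError.
        responses_dict[key].update(item)

print(dict(responses_dict))
# {'1': {'a': 1, 'b': 2, 'c': 3}, '2': {'d': 4}}
```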
The problem is that responses_dict[key] does not exist yet when key is '1'.
Even a plain print(responses_dict[key]) raises the same error, because '1' is not a key of that dict. In addition, update is not being used the way it is meant to be.
Try the following syntax :
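responses_dict = {}
for key in responses.keys():
    print(key)
    for item in responses[key]['responses']:
        # update(key = item) would store everything under the literal name 'key',
        # so create the inner dict for this key first and merge the item into it
        responses_dict.setdefault(key, {}).update(item)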
I prefer passing a dictionary to update() when updating a dictionary.
If you update with an existing key, the value of that key will be updated.
If you update with a new key-value pair, the pair will be added to that dictionary.
>>> d1 = {1: 10, 2: 20}
>>> d1.update({1: 20})
>>> d1
{1: 20, 2: 20}
>>> d1.update({3: 30})
>>> d1
{1: 20, 2: 20, 3: 30}
Try fixing your line with:
responses_dict = {}
for key in responses.keys():
    for item in responses[key]['responses']:
        # note: each key ends up holding the last item in its list
        responses_dict.update({key: item})
So basically, use a dictionary to update a dictionary; it is more readable and easier.
Try this:
from itertools import chain
import pandas as pd
responses = {'1': {'responses': [{'a': 1, 'b': 2}, {'c': 3, 'd': 4}]},
             '2': {'responses': [{'e': 5}, {'f': 6}]}}
result = {k: pd.DataFrame(chain.from_iterable(v['responses'])) for k, v in responses.items()}
for df in result.values():
    print(df, end='\n\n')
Output:
   0
0  a
1  b
2  c
3  d

   0
0  e
1  f

Removing duplicate values and finding what key has the most values in python

I have this code that gets information from a text file that has values like key1:value1 and so on, but some of them are presented multiple times under 1 key. How can I remove duplicates and after that how can I sort which key has the most and least values?
def function1(file):
    with open("file_name.txt") as file:
        name = file.read()
    d = {}
    for x in name.split():
        key, value = x.split(':')
        try:
            values = d[key]
        except KeyError:
            values = d[key] = []
        values.append(value)
    return d
Assuming you have input like:
lines = '''
key1:val1
key2:val2
key3:val3
key1:val4
key1:val5
key2:val6
'''.strip().split()
Something like this should get you started:
from collections import defaultdict
d = defaultdict(list)
for line in lines:
    k, v = line.split(':')
    d[k].append(v)
items = sorted(d.items(), key=lambda i: len(i[1]))
print(items)
Output (sorted from fewest to most values; add reverse=True to sort from most to fewest):
[
('key3', ['val3']),
('key2', ['val2', 'val6']),
('key1', ['val1', 'val4', 'val5'])
]
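The snippet above keeps repeated values. To drop duplicates as the question asks, one option is to collect the values in a set instead of a list. A minimal sketch, using made-up sample lines in which key1:val1 appears twice:

```python
from collections import defaultdict

# Made-up sample input; key1:val1 is repeated on purpose.
lines = ["key1:val1", "key2:val2", "key1:val1", "key1:val4",
         "key3:val3", "key2:val6", "key1:val5"]

unique = defaultdict(set)
for line in lines:
    k, v = line.split(':')
    unique[k].add(v)          # a set ignores repeated values

# Sort keys by the number of distinct values, fewest first.
counts = sorted(((k, len(vs)) for k, vs in unique.items()), key=lambda kv: kv[1])
print(counts)   # [('key3', 1), ('key2', 2), ('key1', 3)]
```

Sets do not preserve the order in which values were read, so this only fits if that order does not matter.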
Look into the Counter class in the collections module:
from collections import Counter
x = Counter(mylist)
print(x)
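One way Counter could be applied to this problem, as a sketch: drop repeated key:value pairs first, then count how many pairs remain per key. The sample lines and the names per_key and ranked are assumptions for illustration.

```python
from collections import Counter

# Made-up sample lines; key1:val1 is repeated on purpose.
lines = ["key1:val1", "key2:val2", "key1:val1", "key1:val4",
         "key3:val3", "key2:val6", "key1:val5"]

# set(lines) drops repeated key:value pairs; the Counter then counts
# how many distinct values remain for each key.
per_key = Counter(line.split(':')[0] for line in set(lines))

ranked = per_key.most_common()   # keys with the most values first
print(ranked[0])    # ('key1', 3)
print(ranked[-1])   # ('key3', 1)
```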
One improvement over your code is that I used defaultdict, which automatically generates a value for a non-existing key.
Another improvement is splitting with re.split, so the colon in an input row can be surrounded by spaces.
An important detail in your question is that you want to count values without repetitions (separately for each key, as I assume).
So the program has to:
check whether a particular value has already been saved under the current key,
save the current value (add it to the list) only if it did not occur before.
So using Counter is, in my opinion, not a good idea, because it counts how many times a value has been repeated, irrespective of the key under which it occurred, whereas we should count how many different values have been read under each key.
In the following program:
the filterValues function reads lines from the input file, stores value lists under the current key and returns the dictionary,
the findMinMax function finds 2 tuples (key / value list), one for the shortest list and another for the longest.
Here is the code:
from collections import defaultdict
import re
def filterValues(fn):
    with open(fn) as file:
        lines = file.readlines()
    d = defaultdict(list)   # key -> values
    for line in lines:
        # raw string, so \s is passed to the regex engine unchanged
        key, value = re.split(r'\s*:\s*', line.strip())
        values = d[key]
        if value not in values:   # Save value, w/o repetitions
            values.append(value)
    return d
def findMinMax(d):
    t1 = min(d.items(), key=lambda x: len(x[1]))
    t2 = max(d.items(), key=lambda x: len(x[1]))
    return t1, t2
d = filterValues('file_name.txt')
print(dict(d))
t1, t2 = findMinMax(d)
print(f'Min. count: {len(t1[1])}: {t1[0]} -> {t1[1]}')
print(f'Max. count: {len(t2[1])}: {t2[0]} -> {t2[1]}')
For the following sample input:
K1 : V1
K1 : V2
K1 : V3
K1 : V1
K1 : V4
K1 : V1
K1 : V4
K2 : V5
K2 : V6
K2 : V6
K2 : V6
K3 : V2
K4 : V5
it prints:
{'K1': ['V1', 'V2', 'V3', 'V4'], 'K2': ['V5', 'V6'], 'K3': ['V2'], 'K4': ['V5']}
Min. count: 1: K3 -> ['V2']
Max. count: 4: K1 -> ['V1', 'V2', 'V3', 'V4']

python update dictionary with dictionary in loop

I am trying to update a dictionary with another dictionary in a loop:
mainDict = {}
for index in range(3):
    tempDict = {}
    tempDict['a'] = 1
    tempDict['b'] = 2
    mainDict.update(tempDict)
Output:
>>> print mainDict
{'a': 1, 'b': 2}
What I am expecting is:
{{'a': 1, 'b': 2},{'a': 1, 'b': 2},{'a': 1, 'b': 2}}
Any suggestions please. Thanks.
Dictionaries are collections of key-value pairs, and your expected output has no keys, so it is not a dictionary. Either you want a list, in which case use:
main_list = []
for index in range(3):
    temp_dict = {'a': 1, 'b': 2}
    main_list.append(temp_dict)
or add keys in the loop:
mainDict = {}
for index in range(3):
    tempDict = {}
    tempDict['a'] = 1
    tempDict['b'] = 2
    mainDict[index] = tempDict
As others noted in the comments, Python dictionary keys must be unique. Quoting from the Python docs:
It is best to think of a dictionary as an unordered set of key: value pairs, with the requirement that the keys are unique (within one dictionary)
Possible solution: Create a list instead of a dict to store the dictionaries.
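A compact sketch of the list approach, assuming each entry should be its own independent dict (the name main_list is illustrative):

```python
# Build a fresh dict on every iteration so the list holds three separate objects.
main_list = [{'a': 1, 'b': 2} for _ in range(3)]
print(main_list)
# [{'a': 1, 'b': 2}, {'a': 1, 'b': 2}, {'a': 1, 'b': 2}]

# Changing one entry leaves the others untouched.
main_list[0]['a'] = 99
print(main_list[1]['a'])   # 1
```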

Dividing dictionary into nested dictionaries, based on the key's name on Python 3.4

I have the following dictionary (short version, real data is much larger):
dict = {'C-STD-B&M-SUM:-1': 0, 'C-STD-B&M-SUM:-10': 4.520475, 'H-NSW-BAC-ART:-9': 0.33784000000000003, 'H-NSW-BAC-ART:0': 0, 'H-NSW-BAC-ENG:-59': 0.020309999999999998, 'H-NSW-BAC-ENG:-6': 0,}
I want to divide it into smaller nested dictionaries, depending on a part of the key name.
Expected output would be:
# fixed closing brackets
dict1 = {'C-STD-B&M-SUM': {'-1': 0, '-10': 4.520475}}
dict2 = {'H-NSW-BAC-ART': {'-9': 0.33784000000000003, '0': 0}}
dict3 = {'H-NSW-BAC-ENG': {'-59': 0.020309999999999998, '-6': 0}}
The logic behind it is:
dict1: if the part of the key name is 'C-STD-B&M-SUM', add to dict1.
dict2: if the part of the key name is 'H-NSW-BAC-ART', add to dict2.
dict3: if the part of the key name is 'H-NSW-BAC-ENG', add to dict3.
Partial code so far:
def divide_dictionaries(dict):
    c_std_bem_sum = {}
    for k, v in dict.items():
        if k[0:13] == 'C-STD-B&M-SUM':
            c_std_bem_sum = k[14:17], v
What I'm trying to do is to create the nested dictionaries that I need and then I'll create the dictionary and add the nested one to it, but I'm not sure if it's a good way to do it.
When I run the code above, the variable c_std_bem_sum becomes a tuple, with only two values that are changed at each iteration. How can I make it be a dictionary, so I can later create another dictionary, and use this one as the value for one of the keys?
One way to approach it would be to do something like
d = {'C-STD-B&M-SUM:-1': 0, 'C-STD-B&M-SUM:-10': 4.520475, 'H-NSW-BAC-ART:-9': 0.33784000000000003, 'H-NSW-BAC-ART:0': 0, 'H-NSW-BAC-ENG:-59': 0.020309999999999998, 'H-NSW-BAC-ENG:-6': 0,}
def divide_dictionaries(somedict):
    out = {}
    for k, v in somedict.items():
        head, tail = k.split(":")
        subdict = out.setdefault(head, {})
        subdict[tail] = v
    return out
which gives
>>> dnew = divide_dictionaries(d)
>>> import pprint
>>> pprint.pprint(dnew)
{'C-STD-B&M-SUM': {'-1': 0, '-10': 4.520475},
 'H-NSW-BAC-ART': {'-9': 0.33784000000000003, '0': 0},
 'H-NSW-BAC-ENG': {'-59': 0.020309999999999998, '-6': 0}}
A few notes:
(1) We're using nested dictionaries instead of creating separate named dictionaries, which aren't convenient.
(2) We used setdefault, which is a handy way to say "give me the value in the dictionary, but if there isn't one, add this to the dictionary and return it instead". It saves an if.
(3) We can use .split(":") instead of hardcoding the width, which isn't very robust -- at least assuming that's the delimiter, anyway!
(4) It's a bad idea to use dict, the name of a builtin type, as a variable name.
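Regarding note (3): k.split(":") raises a ValueError on unpacking if a key ever contains no colon or more than one. A hedged variant that splits on the last colon only and skips keys without one; the helper name divide_on_last_colon and the extra sample key are hypothetical.

```python
from collections import defaultdict

def divide_on_last_colon(somedict):
    """Group values by the text before the last ':' in each key."""
    out = defaultdict(dict)
    for k, v in somedict.items():
        head, sep, tail = k.rpartition(":")
        if not sep:
            # No colon in the key: skip it here; raising an error is another option.
            continue
        out[head][tail] = v
    return dict(out)

sample = {'C-STD-B&M-SUM:-1': 0, 'C-STD-B&M-SUM:-10': 4.520475,
          'H-NSW-BAC-ART:0': 0, 'no-colon-key': 1}
print(divide_on_last_colon(sample))
# {'C-STD-B&M-SUM': {'-1': 0, '-10': 4.520475}, 'H-NSW-BAC-ART': {'0': 0}}
```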
That's because you're initializing c_std_bem_sum as a dictionary and then overwriting it with a tuple:
>>> a = 1, 2
>>> print(a)
(1, 2)
Now for your example:
def divide_dictionaries(dict):
    c_std_bem_sum = {}
    for k, v in dict.items():
        if k[0:13] == 'C-STD-B&M-SUM':
            new_key = k[14:17]  # sure you don't want [14:], open ended?
            c_std_bem_sum[new_key] = v
    return c_std_bem_sum
Basically, this grabs the rest of the key (or only 3 characters, as you have it; [14:None] or [14:] would get the rest of the string) and then uses that as the new key for the dict.
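To see the difference between the fixed-width and the open-ended slice, here is a tiny sketch with a hypothetical key whose suffix is longer than any in the sample data:

```python
prefix = 'C-STD-B&M-SUM'
key = prefix + ':-100'        # hypothetical key with a 4-character suffix

start = len(prefix) + 1       # skip the prefix and the colon -> 14
print(key[start:start + 3])   # -10   (fixed width cuts the suffix short)
print(key[start:])            # -100  (open-ended slice keeps all of it)
```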

Duplicates in a dictionary (Python)

I need to write a function that returns true if the dictionary has duplicates in it. So pretty much if anything appears in the dictionary more than once, it will return true.
Here is what I have but I am very far off and not sure what to do.
d = {"a", "b", "c"}
def has_duplicates(d):
    seen = set()
    d={}
    for x in d:
        if x in seen:
            return True
        seen.add(x)
    return False
print has_duplicates(d)
If you are looking to find duplication in values of the dictionary:
def has_duplicates(d):
    return len(d) != len(set(d.values()))
print(has_duplicates({'a': 1, 'b': 1, 'c': 2}))
Outputs:
True
def has_duplicates(d):
    return False
Dictionaries do not contain duplicate keys, ever. Your function, btw., is equivalent to this definition, so it's correct (just a tad long).
If you want to find duplicate values, that's
len(set(d.values())) != len(d)
assuming the values are hashable.
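If the values are not hashable (lists, for example), set() raises a TypeError. A slower fallback sketch that only relies on equality comparison:

```python
def has_duplicate_values(d):
    """Return True if any value occurs more than once; values need not be hashable."""
    seen = []
    for value in d.values():
        if value in seen:      # equality check, linear scan per lookup
            return True
        seen.append(value)
    return False

print(has_duplicate_values({'a': [1, 2], 'b': [1, 2]}))   # True
print(has_duplicate_values({'a': [1, 2], 'b': [3]}))      # False
```

This is quadratic in the number of values, so the set-based check is preferable whenever the values are hashable.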
In your code, d = {"a", "b", "c"}, d is a set, not a dictionary.
Neither dictionary keys nor sets can contain duplicates. If you're looking for duplicate values, check if the set of the values has the same size as the dictionary itself:
def has_duplicate_values(d):
    return len(set(d.values())) != len(d)
Python dictionaries already have unique keys.
Are you possibly interested in unique values?
set(d.values())
If so, you can check the length of that set to see if it is smaller than the number of values. This works because sets eliminate duplicates from the input, so if the result is smaller than the input, it means some duplicates were found and eliminated.
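The length comparison only says whether duplicates exist. If you also want to know which values repeat, collections.Counter can report that. A minimal sketch with a made-up dictionary:

```python
from collections import Counter

d = {'a': 1, 'b': 1, 'c': 2}

counts = Counter(d.values())
duplicated = [value for value, n in counts.items() if n > 1]

print(len(set(d.values())) < len(d))   # True: at least one value repeats
print(duplicated)                      # [1]
```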
Not only is your general proposition that dictionaries can have duplicate keys false, but also your implementation is gravely flawed: d={} means that you have lost sight of your input d arg and are processing an empty dictionary!
The only thing that a dictionary can have duplicates of is values. A dictionary is a key-value store where the keys are unique. In Python, you can create a dictionary like so:
d1 = {k1: v1, k2: v2, k3: v1}
d2 = dict([(k1, v1), (k2, v2), (k3, v1)])
d1 was created using the normal dictionary notation. d2 was created from a list of (key, value) pairs. Note that both versions have a duplicate value.
If you had a function that returned the number of unique values in a dictionary then you could say something like:
len(d1) != func(d1)
Fortunately, Python makes it easy to do this using sets. Simply converting d1 into a set is not sufficient. Let's make our keys and values real so you can run some code.
v1 = 1; v2 = 2
k1 = "a"; k2 = "b"; k3 = "c"
d1 = {k1: v1, k2: v2, k3: v1}
print(len(d1))
s = set(d1)
print(s)
You will notice that s has three members too and looks like set(['c', 'b', 'a']) (Python 3 prints it as {'c', 'b', 'a'}, in some order). That's because a simple conversion only uses the keys in the dict. You want to use the values like so:
s = set(d1.values())
print(s)
As you can see there are only two elements because the value 1 occurs two times. One way of looking at a set is that it is a list with no duplicate elements. That's what print sees when it prints out a set as a bracketed list. Another way to look at it is as a dict with no values. Like many data processing activities you need to start by selecting the data that you are interested in, and then manipulating it. Start by selecting the values from the dict, then create a set, then count and compare.
This is not a dictionary, it is a set:
d = {"a", "b", "c"}
I don't know what you are trying to accomplish, but a dictionary can't hold the same key twice. Assigning to an existing key simply overwrites its value:
>>> d = {'a': 0, 'b':1}
>>> d['a'] = 2
>>> print d
{'a': 2, 'b': 1}
