Creating a dictionary within a dictionary in Python - python

Person Node Value
Bob A 2
Bob A 3
Bob A 4
Bob B 2
Bob B 3
Jill A 1
Jill B 2
I am attempting to get the following into a data structure similar to this:
{'Bob': {'A': [2, 3, 4], 'B': [2, 3]}, 'Jill': {'A': [1], 'B': [2]}}
I know this might not be the best approach, but what I am trying to do with my data structure is the following:
The outer dictionary is keyed by person, and I need to check whether the person is already a key.
The value for each person is another dictionary, and I need to check whether the node is already a key in it.
The value in the second dictionary is a list, which needs to be appended to if it already exists, as in Bob's case.
I have tried numerous approaches, but right now, my code is looking like this.
names = {}
with open('impacts.csv', 'rb') as csvfile:
    namesreaders = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in namesreaders:
        person, letter, value = row[0], row[1], row[2]
        if person not in names:
            names[person] = { letter: value}
        else:
            print 'Lost a bit'
            ### Lost here
print names

I would use a defaultdict whose default value is itself a defaultdict of lists. Populating the structure then becomes very easy:
import csv
from collections import defaultdict
d = defaultdict(lambda: defaultdict(list))
with open('impacts.csv', 'rb') as csvfile:
    namesreaders = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in namesreaders:
        person, letter, value = row[0], row[1], row[2]
        d[person][letter].append(value)
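A minimal Python 3 sketch of the same nested-defaultdict idea, assuming the space-separated sample from the question. The in-memory `sample` string and the `build_nested` helper are illustrative stand-ins for impacts.csv and are not part of the original answer:

```python
import csv
import io
from collections import defaultdict

# Hypothetical in-memory stand-in for impacts.csv (space-delimited, with a header row).
sample = """Person Node Value
Bob A 2
Bob A 3
Bob A 4
Bob B 2
Bob B 3
Jill A 1
Jill B 2
"""

def build_nested(lines):
    """Group values as {person: {node: [values...]}}."""
    nested = defaultdict(lambda: defaultdict(list))
    reader = csv.reader(lines, delimiter=' ')
    next(reader, None)  # skip the header row
    for person, node, value in reader:
        nested[person][node].append(int(value))
    # Convert to plain dicts so the result prints and compares cleanly.
    return {person: dict(nodes) for person, nodes in nested.items()}

result = build_nested(io.StringIO(sample))
print(result)
```

Converting back to plain dicts at the end is optional; it mainly keeps accidental lookups of missing keys from silently creating new entries later on.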

Use nested dictionaries by implementing Perl's autovivification feature.
class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value
# For your case
names = AutoVivification()
with open('impacts.csv', 'rb') as csvfile:
    namesreaders = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in namesreaders:
        person, letter, value = row[0], row[1], row[2]
        if names[person][letter]:
            names[person][letter].append(value)
        else:
            names[person][letter] = [value]

I would solve it like this. I am sure it could be shortened quite a bit with a recursive helper or a comprehension, but this shows the principle and stays close to your starting point (not tested for syntax errors):
names = {}
with open('impacts.csv', 'rb') as csvfile:
    namesreaders = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in namesreaders:
        person, letter, value = row[0], row[1], row[2]
        if person not in names:
            names[person] = {}
        if letter not in names[person]:
            names[person][letter] = []
        names[person][letter].append(value)
print names

Don't let the code confuse you. You defined pretty clearly what you need to do. Now just take it slow and do it.
First, as you said: the value of the second dictionary is a list. You don't want
names[person] = { letter: value }
Instead you want:
names[person] = { letter: [value] }
Now in your else, you know that person is already in names. Is letter in names[person]? If it is, add to the list; otherwise, create a new list:
else:
    if letter in names[person]:
        names[person][letter].append(value)
    else:
        names[person][letter] = [value]
Now, you can probably do this a whole lot more cleanly using the setdefault method of a dict.
This would do the same thing:
names = {}
with open('impacts.csv', 'rb') as csvfile:
    namesreaders = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in namesreaders:
        person, letter, value = row[0], row[1], row[2]
        # Make sure that if person isn't in names,
        # person is added to names as a dict
        names.setdefault(person, {})
        # Make sure that if letter isn't in the names[person] dict
        # letter is added as an empty list
        names[person].setdefault(letter, [])
        # At this point, names[person][letter] is an existing, possibly empty
        # list. Simply add the new value
        names[person][letter].append(value)
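Because setdefault returns the stored value, the three steps above can also be chained into one statement. A small Python 3 sketch, where `group_rows` and the sample `rows` are hypothetical names used only for illustration:

```python
def group_rows(rows):
    """Build {person: {letter: [values...]}} using only plain dicts."""
    names = {}
    for person, letter, value in rows:
        # setdefault returns the existing value, or inserts and returns the default,
        # so both lookups and the append can be chained.
        names.setdefault(person, {}).setdefault(letter, []).append(value)
    return names

# Hypothetical rows, shaped like what csv.reader would yield (all strings).
rows = [("Bob", "A", "2"), ("Bob", "A", "3"), ("Bob", "B", "2"), ("Jill", "A", "1")]
grouped = group_rows(rows)
print(grouped)
```

The chained form is compact, but the step-by-step version is arguably easier to read when you are still working out the logic.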

A well-known Pythonic technique is to use defaultdict() to specify complex dictionary structures at initialization time.
Disclaimer: I would highly recommend considering classes to handle this data, as nested dictionaries like these can turn into a code smell rather quickly.
import functools
from collections import defaultdict
d_inner = functools.partial(defaultdict, list)
d = defaultdict(d_inner)
# Do something with data here; rows is assumed to be an iterable of
# (name, letter, number) triples, e.g. the rows from csv.reader
for name, letter, number in rows:
    d[name][letter].append(number)
Sample Output:
defaultdict(<functools.partial object at 0x109de71b0>, {'Bob': defaultdict(<type 'list'>, {'A': [2, 3, 4], 'B': [2, 3]}), 'Jill': defaultdict(<type 'list'>, {'A': [1], 'B': [2]})})

Related

Create dictionary from txt file - Debug

name,score #an example
a,1,
s,2,
d,3,
f,4,
g,5,
h,6,
j,7,
k,8,
l,9,
q,10,
This is my file. I want to make this into a dictionary (a:1,s:2...)
number_of_lines = len(open("scores.txt").readlines( ))
d = {}
with open("scores.txt") as f:
    for line in range(number_of_lines-1): #-1 removes the last line which is only \n
        (key, value) = line.split(",")
        d[key] = value
print(d)
I keep getting the error AttributeError: 'int' object has no attribute 'split' and I don't know why.
Can you debug this?
Thanks in advance.
range() returns numbers, not the actual lines. Since the loop variable line takes its values from range(), it is an integer rather than a line of text, so line.split() fails. Instead, do something like this:
d = {}
with open("scores.txt") as f:
    for line in f:
        # [:2] drops the empty field left by the trailing comma
        key, value = line.strip().split(",")[:2]
        d[key] = value
print(d)
If you need the index of the line you're on (which you never used, so I don't know if you do), you can use the enumerate function:
d = {}
with open("scores.txt") as f:
    for index, line in enumerate(f):
        key, value = line.strip().split(",")[:2]
        d[key] = value
print(d)
As mentioned in the comments, there are issues with blank lines, the header and so on, but those can be handled safely inside the for loop:
d = {}
with open("scores.txt") as f:
    for index, line in enumerate(f):
        if not line.strip():
            continue  # skip blank lines
        elif index == 0:
            continue  # skip the header, or use the csv module
        key, value = line.strip().split(",")[:2]
        d[key] = value
print(d)
To better understand this, you can experiment with the range function on its own (if you don't like reading the docs):
for line in range(0, 10):
    print(type(line), line)
Hopefully this solves your issue and also shows what the range function does.
Lastly, consider using the csv module:
import csv
with open('scores.txt') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    for row in reader:
        print(row['name'], row['score'])
Pros: it handles empty lines, puts every row into a dictionary for you, consumes the header (more accurately, uses it as the keys of each row's dict), and handles a lot of CSV "magic" for you (special delimiters, quote characters, etc.).
You can also use the csv module to build the final result in a single expression, although it's a bit slow. You're probably better off reading and working with the data line by line, unless it's for database-like purposes such as this:
import csv
with open('scores.txt') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    d = {row['name']:row['score'] for row in reader}
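The sample rows in the question end with a trailing comma, so each data row carries an empty third field. A hedged Python 3 sketch, assuming the header line is exactly `name,score` and using an in-memory string in place of scores.txt, that reads the rows with csv.reader and ignores any extra fields (`read_scores` is a hypothetical helper name):

```python
import csv
import io

# Hypothetical stand-in for scores.txt; note the trailing commas on the data rows.
sample = "name,score\na,1,\ns,2,\nd,3,\n"

def read_scores(lines):
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    scores = {}
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        name, score = row[0], row[1]  # ignore the empty trailing field
        scores[name] = int(score)
    return scores

scores = read_scores(io.StringIO(sample))
print(scores)
```

Converting the score with int() is a choice made here for illustration; the answers above keep the values as strings.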
You can use pandas for this:
import pandas as pd
d = pd.read_csv('scores.txt').set_index('name')['score'].to_dict()
This works well with comma-separated files, and is faster.
You could use a dict comprehension:
data = """
a,1,
s,2,
d,3,
f,4,
g,5,
h,6,
j,7,
k,8,
l,9,
q,10,
"""
dct = {key: value for line in data.split("\n") if line for key, value, *_ in [line.split(",")]}
print(dct)
# {'a': '1', 's': '2', 'd': '3', 'f': '4', 'g': '5', 'h': '6', 'j': '7', 'k': '8', 'l': '9', 'q': '10'}
Or - with your file (considering the header, that is):
with open("scores.txt") as f:
    data = f.read()
dct = {key: value
       for line in data.split("\n")[1:] if line
       for key, value, *_ in [line.split(",")]}

Problem of incorrect output for dictionary returned from file

File contains student ID and ID of the solved problem.
Example:
1,2
1,4
1,3
2,1
2,2
2,3
2,4
The task is to write a function which takes a filename as an argument and returns a dictionary mapping each student ID to the number of solved tasks.
Example output:
{1:3, 2:4}
My code doesn't produce the correct output. Please help me find the mistake and a solution.
import collections
def solved_tasks(filename):
    with open(filename) as f:
        for line in f.readlines():
            key,value = line.strip().split(',')
            dictionary = {key: collections.Counter(str(value))}
    return dictionary
Since you only care about the count, not the individual exercises, you can use a Counter on the first column:
import collections
def solved_tasks(filename):
    with open(filename) as in_stream:
        counts = collections.Counter(
            line.partition(',')[0]  # first column ...
            for line in in_stream if line.strip()  # ... of every non-blank row
        )
    return {int(key): value for key, value in counts.items()}
Assuming that you want to keep the repeated instances of each student ID, you can use a defaultdict and store the problems solved by each student as a list in your dictionary:
import collections
def solved_tasks(filename):
    dictionary = collections.defaultdict(list)
    with open(filename) as f:
        for line in f.readlines():
            key,value = line.strip().split(',')
            dictionary[key].append(value)
    return dictionary
Output:
defaultdict(<type 'list'>, {'1': ['2', '4', '3'], '2': ['1', '2', '3', '4']})
If you want the count:
def solved_tasks(filename):
    dictionary = collections.defaultdict(int)
    with open(filename) as f:
        for line in f.readlines():
            key,value = line.strip().split(',')
            dictionary[key] += 1
    return dictionary
Output:
defaultdict(<type 'int'>, {'1': 3, '2': 4})
You can count how often a key appears:
marks = """1,2
1,4
1,3
2,1
2,2
2,3
2,4
2,4"""
solved = {}  # named solved rather than dict, to avoid shadowing the built-in
for line in marks.split("\n"):
    key, value = line.strip().split(",")
    solved[key] = solved.get(key, []) + [value]
for key in solved:
    solved[key] = len(set(solved[key]))  # eliminate duplicates
The solved.get(key, []) call returns an empty list as the default if the key doesn't exist in the dict yet.
Edit: You said the file may contain duplicates. This method eliminates all duplicates.
Edit: Added a multi-line string using """.
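The duplicate-elimination idea can also be expressed with a defaultdict of sets, which drops repeated rows as they are read. This is a Python 3 sketch under the assumption that every non-blank line has exactly two comma-separated fields; `count_unique_solved` and the sample `lines` are hypothetical:

```python
from collections import defaultdict

def count_unique_solved(lines):
    """Return {student_id: number of distinct problems solved}."""
    problems = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        student, problem = line.split(",")
        problems[student].add(problem)  # a set ignores repeated submissions
    return {int(student): len(solved) for student, solved in problems.items()}

# Hypothetical input with one duplicated row ("2,4").
lines = ["1,2", "1,4", "1,3", "2,1", "2,2", "2,3", "2,4", "2,4"]
unique_counts = count_unique_solved(lines)
print(unique_counts)
```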
def solved_tasks(filename):
    # Note: reads only the first character of each line, so this works for single-digit IDs only
    res = {}
    values = ""
    with open(filename, "r") as f:
        for line in f.readlines():
            values += line.strip()[0]  # take only the first character and concatenate it onto the values string
    value = values[0]  # take the first value
    res[int(value)] = values.count(value)  # put it in the dict
    for i in values:  # loop over the values
        if i != value:  # a character that differs from the current one is a newly found value
            value = i
            res[int(value)] = values.count(value)  # add the new value to the dict
    return res

Python - Create dictionary of dictionaries from CSV

I have a CSV with the first column having many duplicate values, and the second column being a predetermined code that maps to a value in the third column, such as this:
1, a, 24
1, b, 13
1, c, 30
1, d, 0
2, a, 1
2, b, 12
2, c, 82
2, d, 81
3, a, 04
3, b, 23
3, c, 74
3, d, 50
I'm trying to create a dictionary of dictionaries from a CSV, that would result in the following:
dict1 = {'1':{'a':'24', 'b':'13', 'c':'30','d':'0'},
         '2':{'a':'1', 'b':'12', 'c':'82','d':'81'},
         ... }
My code creates the key values just fine, but the resulting value dictionaries are all empty (though some print statements have shown that they aren't during the run process)...
with open(file, mode='rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    dict1 = {} # creates main dict
    for row in reader: # iterates through the rows of the csvfile
        if row[0] in dict1:
            dict2[row[1]] = row[2] # adds another key, value to dict2
        else:
            dict1[row[0]] = {} # creates a new key entry for the new dict1 key
            dict2 = {} # creates a new dict2 to start building as the value for the new dict1 key
            dict2[row[1]] = row[2] # adds the first key, value pair for dict2
Use collections.defaultdict for this.
import collections
import csv
with open(file, mode='rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    dict1 = collections.defaultdict(dict)
    for row in reader:
        dict1[row[0]][row[1]] = row[2]
defaultdict is nothing more than a dictionary which initializes values for unknown keys with a default. Here, the default is to initialize a second, new dictionary (dict is the dictionary constructor). Thus, you can easily set both mappings in the same line.
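A self-contained Python 3 sketch of this approach. It assumes the sample rows from the question, which have a space after each comma, so csv.reader is given skipinitialspace=True; the `sample` string and `nested_from_csv` helper are hypothetical:

```python
import collections
import csv
import io

# Hypothetical in-memory sample with a space after each comma, as in the question.
sample = "1, a, 24\n1, b, 13\n2, a, 1\n2, b, 12\n"

def nested_from_csv(lines):
    # skipinitialspace=True ignores the space that follows each delimiter.
    reader = csv.reader(lines, skipinitialspace=True)
    outer = collections.defaultdict(dict)
    for key, code, value in reader:
        outer[key][code] = value  # the inner dict is created on first access
    return dict(outer)

nested = nested_from_csv(io.StringIO(sample))
print(nested)
```

Without skipinitialspace, the inner keys and values would keep their leading spaces (for example ' a' instead of 'a').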
You don't need dict2, and you never set it as the value of dict1 anyway. Try this modified version:
with open(file, mode='rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    dict1 = {} # creates main dict
    for row in reader: # iterates through the rows of the csvfile
        if row[0] not in dict1:
            dict1[row[0]] = {} # creates a new key entry for the new dict1 key
        dict1[row[0]][row[1]] = row[2] # adds another key, value to the inner dict
You can also use defaultdict to skip checking for existing keys.

Check key, value of nested dictionary in python?

I'm generating a nested dictionary in my program. After generating, I want to iterate through that dictionary, and check for the dictionary key and value.
Program-Code
This is the dictionary I want to iterate whose value contains another dictionary.
main_dict = {101: {1234: [11111,11111],5678: [44444,44444]},
102: {9100: [55555,55555],1112: [77777,88888]}}
I'm reading a csv file and storing contents in this dictionary. Like this :
Input.csv -
lineno,item,total
101,1234,11111
101,1234,11111
101,5678,44444
101,5678,44444
102,9100,55555
102,9100,55555
102,1112,77777
102,1112,88888
This is the input csv file. I'm reading it and I want to know how many times each total repeats for a given item.
To do that, I'm doing this:
for line in reader:
    if line[0] in main_dict:
        if line[1] in main_dict[line[0]]:
            main_dict[line[0]][line[1]].append(line[2])
        else:
            main_dict[line[0]].update({line[1]:[line[2]]})
    else:
        main_dict[line[0]] = {line[1]:[line[2]]}
print main_dict
Output of above program :
{101: {1234: [11111,11111],5678: [44444,44444]},
102: {9100: [55555,55555],1112: [77777,88888]}}
but I'm getting the following error on this line:
if line[1] in main_dict[line[0]]:
IndexError: list index out of range
Iteration of main_dict:
for key,value in main_dict.iteritems():
    f1 = open(outputfile + op_directory +'/'+ key+'.csv', 'w')
    writer1 = csv.DictWriter(f1, delimiter=',', fieldnames = fieldname)
    writer1.writeheader()
    if type(value) == type({}):
        for k,v in value.iteritems():
            if type(v) == type([]):
                set1 = set(v)
                for se in set1:
                    writer1.writerow({'item':k,'total':se,'total_count':v.count(se)})
What is the best way to iterate over this type of dictionary?
Sometimes I get the correct result, as in the dictionary above, but many times I hit this error. What am I missing?
Thanks in advance!
As the comments pointed out, you are not checking if line is of length 3:
for line in reader:
    if len(line) != 3:
        continue
Concerning your algorithm, I would use nested defaultdict to avoid the if/else lines.
EDIT: I added a new defaultdict and the csv writing part after the question edit:
from collections import defaultdict
import csv
counter = defaultdict(lambda: defaultdict(list))
main_dict = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
fieldnames = ['item', 'total', 'total_count']
# we suppose reader is a csv.reader object
with open('input.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for line in reader:
        if len(line) != 3:
            continue
        # Remove unwanted spaces
        lineno, item, total = [el.strip() for el in line]
        # Do not deal with non digit entries (title for example)
        if not lineno.isdigit():
            continue
        counter[lineno][item].append(total)
        csvdict = {'item': item,
                   'total': total,
                   'total_count': counter[lineno][item].count(total)}
        main_dict[lineno][item][total].update(csvdict)
# The writing part
for lineno in sorted(main_dict):
    itemdict = main_dict[lineno]
    output = 'output_%s.csv' % lineno
    with open(output, 'wb') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, delimiter=',')
        writer.writeheader()
        for totaldict in itemdict.values():
            for csvdict in totaldict.values():
                writer.writerow(csvdict)
You can then use the following function to print a readable representation of the result:
def myprint(obj, ntab=0):
    if isinstance(obj, (dict, defaultdict)):
        for k in sorted(obj):
            myprint('%s%s'%(ntab*' ', k), ntab+1)
            myprint(obj[k], ntab+1)
    else:
        print('%s%s'%(ntab*' ', obj))
myprint(main_dict)
But if you want to count the item totals, I would use another defaultdict with the total as the key and a list of (lineno, item) tuples as the value:
from collections import defaultdict
import csv
total_dict = defaultdict(list)
# we suppose reader is a csv.reader object
with open('input.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for line in reader:
        if len(line) != 3:
            continue
        # Remove unwanted spaces
        lineno, item, total = [el.strip() for el in line]
        # Do not deal with non digit entries (title for example)
        if not lineno.isdigit():
            continue
        total_dict[total].append((lineno, item))
You can have the number of each total very easily:
>>> print len(total_dict['55555'])
2
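If only the repeat counts are needed, a collections.Counter keyed by (lineno, item, total) is one possible alternative to the nested structure. This Python 3 sketch swaps in that technique (it is not what the answer above uses) and reads the question's sample from an in-memory string; `count_totals` is a hypothetical helper:

```python
import csv
import io
from collections import Counter

# In-memory copy of the sample Input.csv from the question.
sample = """lineno,item,total
101,1234,11111
101,1234,11111
101,5678,44444
101,5678,44444
102,9100,55555
102,9100,55555
102,1112,77777
102,1112,88888
"""

def count_totals(lines):
    counts = Counter()
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) != 3:
            continue  # guard against short or blank rows (the IndexError case)
        lineno, item, total = (field.strip() for field in row)
        counts[(lineno, item, total)] += 1
    return counts

totals = count_totals(io.StringIO(sample))
print(totals[("101", "1234", "11111")])
```

The length check on each row is the same guard the answer recommends; it is what prevents the IndexError when the file contains blank or incomplete lines.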

Python - make a dictionary from a csv file with multiple categories

I am trying to make a dictionary from a csv file in python, but I have multiple categories. I want the keys to be the ID numbers, and the values to be the name of the items. Here is the text file:
"ID#","name","quantity","price"
"1","hello kitty","4","9999"
"2","rilakkuma","3","999"
"3","keroppi","5","1000"
"4","korilakkuma","6","699"
and this is what I have so far:
txt = open("hk.txt","rU")
file_data = txt.read()
lst = [] #first make a list, and then convert it into a dictionary.
for key in file_data:
    k = key.split(",")
    lst.append((k[0],k[1]))
dic = dict(lst)
print(dic)
This just prints an empty list though. I want the keys to be the ID#, and then the values will be the names of the products. I will make another dictionary with the names as the keys and the ID#'s as the values, but I think it will be the same thing but the other way around.
Use the csv module to handle your data; it'll remove the quoting and handle the splitting:
import csv
results = {}
with open('hk.txt', 'r', newline='') as txt:
    reader = csv.reader(txt)
    next(reader, None)  # skip the header line
    for row in reader:
        results[row[0]] = row[1]
For your sample input, this produces:
{'4': 'korilakkuma', '1': 'hello kitty', '3': 'keroppi', '2': 'rilakkuma'}
You can use csv.DictReader:
import csv
result = {}
with open('/tmp/test.csv', 'r', newline='') as f:
    for d in csv.DictReader(f):
        result[d['ID#']] = d['name']
print(result)
# {'1': 'hello kitty', '3': 'keroppi', '2': 'rilakkuma', '4': 'korilakkuma'}
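The question also asks for the reverse mapping, from name to ID. A Python 3 sketch that builds both with csv.DictReader, reading the question's sample from an in-memory string instead of a file; `id_and_name_maps` is a hypothetical helper, and the reverse mapping assumes that names are unique:

```python
import csv
import io

# In-memory copy of the sample file from the question.
sample = '''"ID#","name","quantity","price"
"1","hello kitty","4","9999"
"2","rilakkuma","3","999"
"3","keroppi","5","1000"
"4","korilakkuma","6","699"
'''

def id_and_name_maps(lines):
    id_to_name = {}
    for row in csv.DictReader(lines):
        id_to_name[row["ID#"]] = row["name"]
    # Invert the mapping; duplicate names would overwrite each other here.
    name_to_id = {name: id_ for id_, name in id_to_name.items()}
    return id_to_name, name_to_id

id_to_name, name_to_id = id_and_name_maps(io.StringIO(sample))
print(id_to_name)
print(name_to_id)
```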
You can use a dictionary directly:
dictionary = {}
with open("hk.txt") as txt:
    txt.readline()  # skip the first line
    for line in txt:  # iterate over the file object, not the string from read()
        line = line.replace('"', '').strip()
        k = line.split(",")
        dictionary[k[0]] = k[1]
Try this, or use a library to read the file:
txt = open("hk.txt", "r")
file_data = txt.read()
txt.close()
file_lines = file_data.split("\n")
lst = []  # first make a list, and then convert it into a dictionary
for linenumber in range(1, len(file_lines)):
    if not file_lines[linenumber]:
        continue  # skip blank lines, such as the one after a trailing newline
    k = file_lines[linenumber].split(",")
    lst.append((k[0][1:-1], k[1][1:-1]))  # strip the surrounding quotes
dic = dict(lst)
print(dic)
You could also build the dict directly, without the intermediate list.
