I'm generating a nested dictionary in my program. After generating, I want to iterate through that dictionary, and check for the dictionary key and value.
Program-Code
This is the dictionary I want to iterate whose value contains another dictionary.
main_dict = {101: {1234: [11111,11111],5678: [44444,44444]},
102: {9100: [55555,55555],1112: [77777,88888]}}
I'm reading a csv file and storing contents in this dictionary. Like this :
Input.csv -
lineno,item,total
101,1234,11111
101,1234,11111
101,5678,44444
101,5678,44444
102,9100,55555
102,9100,55555
102,1112,77777
102,1112,88888
This is the input CSV file. While reading it, I want to know how many times each total repeats for a given unique item.
This is what I'm doing:
for line in reader:
    if line[0] in main_dict:
        if line[1] in main_dict[line[0]]:
            main_dict[line[0]][line[1]].append(line[2])
        else:
            main_dict[line[0]].update({line[1]:[line[2]]})
    else:
        main_dict[line[0]] = {line[1]:[line[2]]}
print main_dict
Output of above program :
{101: {1234: [11111,11111],5678: [44444,44444]},
102: {9100: [55555,55555],1112: [77777,88888]}}
But I get the following error on this line:
if line[1] in main_dict[line[0]]:
IndexError: list index out of range
Iteration of main_dict-
for key,value in main_dict.iteritems():
    f1 = open(outputfile + op_directory +'/'+ key+'.csv', 'w')
    writer1 = csv.DictWriter(f1, delimiter=',', fieldnames = fieldname)
    writer1.writeheader()
    if type(value) == type({}):
        for k,v in value.iteritems():
            if type(v) == type([]):
                set1 = set(v)
                for se in set1:
                    writer1.writerow({'item':k,'total':se,'total_count':v.count(se)})
What is the best way to iterate over this kind of dictionary?
Sometimes I get the correct result, as in the dictionary above, but often I hit this error. What am I missing?
Thanks in advance!
As the comments pointed out, you are not checking if line is of length 3:
for line in reader:
    if not len(line) == 3:
        continue
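To see why the length check matters: csv.reader yields an empty list for a blank line, so indexing such a row raises IndexError. The following is a minimal Python 3 sketch, not the asker's program; the in-memory sample text is made up for illustration and stands in for Input.csv.

```python
import csv
import io

# Hypothetical sample: the same columns as Input.csv, plus a blank line
# and a truncated row, which are typical causes of the IndexError.
sample = "lineno,item,total\n101,1234,11111\n\n101,5678\n102,9100,55555\n"

good_rows = []
skipped = []
for line in csv.reader(io.StringIO(sample)):
    if len(line) != 3:
        # A blank line arrives as [] and the short row as ['101', '5678'];
        # indexing line[2] on either one would raise IndexError.
        skipped.append(line)
        continue
    good_rows.append(line)

print(skipped)
print(len(good_rows))
```

Collecting the skipped rows instead of silently dropping them makes it easier to find out which lines of the real file are malformed.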
Concerning your algorithm, I would use nested defaultdict to avoid the if/else lines.
EDIT: I added a new defaultdict and the csv writing part after the question edit:
from collections import defaultdict
import csv
counter = defaultdict(lambda: defaultdict(list))
main_dict = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
fieldnames = ['item', 'total', 'total_count']
# we suppose reader is a csv.reader object
with open('input.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for line in reader:
        if not len(line) == 3:
            continue
        # Remove unwanted spaces
        lineno, item, total = [el.strip() for el in line]
        # Do not deal with non digit entries (title for example)
        if not lineno.isdigit():
            continue
        counter[lineno][item].append(total)
        csvdict = {'item': item,
                   'total': total,
                   'total_count': counter[lineno][item].count(total)}
        main_dict[lineno][item][total].update(csvdict)
# The writing part
for lineno in sorted(main_dict):
    itemdict = main_dict[lineno]
    output = 'output_%s.csv' % lineno
    with open(output, 'wb') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, delimiter=',')
        writer.writeheader()
        for totaldict in itemdict.values():
            for csvdict in totaldict.values():
                writer.writerow(csvdict)
You can then use the following function to print a readable representation of the result:
def myprint(obj, ntab=0):
    if isinstance(obj, (dict, defaultdict)):
        for k in sorted(obj):
            myprint('%s%s'%(ntab*' ', k), ntab+1)
            myprint(obj[k], ntab+1)
    else:
        print('%s%s'%(ntab*' ', obj))
myprint(main_dict)
But if you want to count the item totals, I would use another defaultdict with the total as the key and a list of (lineno, item) tuples as the value:
from collections import defaultdict
import csv
total_dict = defaultdict(list)
# we suppose reader is a csv.reader object
with open('input.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for line in reader:
        if not len(line) == 3:
            continue
        # Remove unwanted spaces
        lineno, item, total = [el.strip() for el in line]
        # Do not deal with non digit entries (title for example)
        if not lineno.isdigit():
            continue
        total_dict[total].append((lineno, item))
You can then get the count for each total very easily:
>>> print len(total_dict['55555'])
2
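If only the counts are needed, collections.Counter keyed on the whole (lineno, item, total) triple is another option. This is a Python 3 sketch under that assumption, not the code from the answer above; the in-memory sample stands in for input.csv.

```python
import csv
import io
from collections import Counter

# In-memory stand-in for input.csv (same rows as in the question).
sample = """lineno,item,total
101,1234,11111
101,1234,11111
101,5678,44444
101,5678,44444
102,9100,55555
102,9100,55555
102,1112,77777
102,1112,88888
"""

counts = Counter()
for row in csv.reader(io.StringIO(sample)):
    if len(row) != 3:
        continue  # skip blank or malformed rows
    lineno, item, total = (cell.strip() for cell in row)
    if not lineno.isdigit():
        continue  # skip the header
    counts[(lineno, item, total)] += 1

# One line per distinct (lineno, item, total) with its repeat count.
for (lineno, item, total), n in sorted(counts.items()):
    print(lineno, item, total, n)
```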
name,score #an example
a,1,
s,2,
d,3,
f,4,
g,5,
h,6,
j,7,
k,8,
l,9,
q,10,
This is my file. I want to make this into a dictionary (a:1,s:2...)
number_of_lines = len(open("scores.txt").readlines( ))
d = {}
with open("scores.txt") as f:
    for line in range(number_of_lines-1): #-1 removes the last line which is only \n
        (key, value) = line.split(",")
        d[key] = value
print(d)
I keep getting the error AttributeError: 'int' object has no attribute 'split' and I don't know why.
Can you debug this?
Thanks in advance,
range() returns numbers, not the actual lines. Since you iterate over the output of range(), line is an integer produced by range() rather than a line of the file, so line.split() fails. Instead, do something like this:
d = {}
with open("scores.txt") as f:
    for line in f:
        key, value = line.split(",")
        d[key] = value
print(d)
If you need the index of the line you're on (which you never used, so I don't know if you do), you can use the enumerate function.
d = {}
with open("scores.txt") as f:
    for index, line in enumerate(f.readlines()):
        key, value = line.split(",")
        d[key] = value
print(d)
As mentioned in the comments, there are issues with the length of the file and so on, but these can be checked safely inside the for loop:
d = {}
with open("scores.txt") as f:
    for index, line in enumerate(f.readlines()):
        if len(line.strip()) <= 0: continue
        elif index == 0: continue # Skip the header or use the CSV lib
        key, value = line.split(",")
        d[key] = value
print(d)
To better understand this, you can experiment with the range function on its own (if you don't like reading the docs) by running:
for line in range(0, 10):
    print(type(line), line)
Hopefully this solves your issue but also teaches what the range function does.
Lastly, consider using the csv module:
import csv
with open('scores.txt') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    for row in reader:
        print(row['name'], row['score'])
Pros: it handles empty lines, sorts everything into a dictionary for you, skips the header (or, more accurately, uses it as the keys of each row's dict), and handles a lot of CSV "magic" for you (special delimiters, quote characters, etc.).
You can also use the csv lib to build the final result inline, although it's a bit slow; unless it's for database purposes like this, you're probably better off reading and working with the data line by line:
import csv
with open('scores.txt') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    d = {row['name']:row['score'] for row in reader}
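One caveat with the sample file: its data rows end with a trailing comma, so splitting such a line on "," gives three fields and a two-name unpacking raises ValueError. Below is a hedged Python 3 sketch that keeps only the first two columns, assuming the file really does contain those trailing commas. The in-memory text stands in for scores.txt, and converting the score to int is an assumption.

```python
import csv
import io

# Stand-in for scores.txt; note the trailing comma on each data row.
sample = "name,score\na,1,\ns,2,\n\nd,3,\n"

d = {}
reader = csv.reader(io.StringIO(sample))
next(reader, None)  # skip the header row
for row in reader:
    if len(row) < 2:
        continue  # blank lines come back as an empty list
    key, value = row[:2]  # ignore anything after the second column
    d[key] = int(value)   # assumption: scores are whole numbers

print(d)
```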
You can use pandas for this:
import pandas as pd
d = pd.read_csv('scores.txt').set_index('name')['score'].to_dict()
This works well with comma-separated files and is usually fast on large inputs.
You could use a dict comprehension:
data = """
a,1,
s,2,
d,3,
f,4,
g,5,
h,6,
j,7,
k,8,
l,9,
q,10,
"""
dct = {key: value for line in data.split("\n") if line for key, value, *_ in [line.split(",")]}
print(dct)
# {'a': '1', 's': '2', 'd': '3', 'f': '4', 'g': '5', 'h': '6', 'j': '7', 'k': '8', 'l': '9', 'q': '10'}
Or - with your file (considering the header, that is):
with open("scores.txt") as f:
    data = f.read()
dct = {key: value
       for line in data.split("\n")[1:] if line
       for key, value, *_ in [line.split(",")]}
File contains student ID and ID of the solved problem.
Example:
1,2
1,4
1,3
2,1
2,2
2,3
2,4
The task is to write a function that takes a filename as an argument and returns a dictionary mapping each student ID to the number of solved tasks.
Example output:
{1:3, 2:4}
My code does not produce the correct output. Please help me find the mistake and a solution.
import collections
def solved_tasks(filename):
    with open(filename) as f:
        for line in f.readlines():
            key,value = line.strip().split(',')
            dictionary = {key: collections.Counter(str(value))}
    return dictionary
Since you only care about the count, not the individual exercises, you can use a Counter on the first column:
import collections
def solved_tasks(filename):
    with open(filename) as in_stream:
        counts = collections.Counter(
            line.partition(',')[0]  # first column ...
            for line in in_stream if line.strip()  # ... of every non-blank row
        )
    return {int(key): value for key, value in counts.items()}
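A quick way to check this approach is to run it against a temporary file holding the sample data. The sketch below is Python 3 and repeats the function so that it is self-contained, skipping blank lines explicitly. The temporary file is only a stand-in for the real input.

```python
import collections
import os
import tempfile

def solved_tasks(filename):
    """Return {student_id: number_of_rows} for a 'student,problem' file."""
    with open(filename) as in_stream:
        counts = collections.Counter(
            line.partition(',')[0].strip()  # first column of the row
            for line in in_stream
            if line.strip()  # ignore blank lines
        )
    return {int(key): value for key, value in counts.items()}

# Write the sample from the question to a temporary file and count it.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write("1,2\n1,4\n1,3\n2,1\n2,2\n2,3\n2,4\n")
    path = tmp.name

result = solved_tasks(path)
os.remove(path)
print(result)
```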
Assuming that you want to keep the repeated instances of each student ID, you can use a defaultdict and store the problems solved by each student as a list in your dictionary:
import collections
def solved_tasks(filename):
    dictionary = collections.defaultdict(list)
    with open(filename) as f:
        for line in f.readlines():
            key,value = line.strip().split(',')
            dictionary[key].append(value)
    return dictionary
Output:
defaultdict(<type 'list'>, {'1': ['2', '4', '3'], '2': ['1', '2', '3', '4']})
If you want the count instead, use an int default:
def solved_tasks(filename):
    dictionary = collections.defaultdict(int)
    with open(filename) as f:
        for line in f.readlines():
            key,value = line.strip().split(',')
            dictionary[key] += 1
    return dictionary
Output:
defaultdict(<type 'int'>, {'1': 3, '2': 4})
You can count how often a key appears:
marks = """1,2
1,4
1,3
2,1
2,2
2,3
2,4
2,4"""
solved = {}
for line in marks.split("\n"):
    key,value = line.strip().split(",")
    solved[key] = solved.get(key,[]) + [value]
for key in solved:
    solved[key] = len(set(solved[key])) # eliminate duplicates
solved.get(key,[]) returns an empty list as the default when the key doesn't exist in the dictionary yet.
Edit: You said the data may contain duplicates. This method eliminates all duplicates.
Edit: Switched the sample data to a multi-line string with """.
def solved_tasks(filename):
    res = {}
    values=""
    with open(filename, "r") as f:
        for line in f.readlines():
            values += line.strip()[0] #take only the first value and concatenate with the values string
    value = values[0] #take the first value
    res[int(value)] = values.count(value) #put it in the dict
    for i in values: #loop the values
        if i != value: # if the value is not the first value, then the value is the new found value
            value = i
            res[int(value)] = values.count(value) #add the new value to the dict
    return res
Person Node Value
Bob A 2
Bob A 3
Bob A 4
Bob B 2
Bob B 3
Jill A 1
Jill B 2
I am attempting to get the data above into a data structure similar to this:
{'Bob': {'A': [2, 3, 4], 'B': [2, 3]}, 'Jill': {'A': [1], 'B': [2]}}
I know this might not be the best approach, but what I am trying to do with my data structure is the following:
The key of the outer dictionary is the person; check whether that key already exists.
The value of the outer dictionary is another dictionary; check whether the node is already a key in it.
The value of the second dictionary is a list, which needs to be appended to if the list already exists, as in Bob's case.
I have tried numerous approaches, but right now, my code is looking like this.
names = {}
with open('impacts.csv', 'rb') as csvfile:
    namesreaders = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in namesreaders:
        person, letter, value = row[0], row[1], row[2]
        if person not in names:
            names[person] = { letter: value}
        else:
            print 'Lost a bit'
            ### Lost here
print names
I would use a defaultdict whose default is another defaultdict with a list default. Then it's very easy to populate that dictionary:
import csv
from collections import defaultdict
d = defaultdict(lambda: defaultdict(list))
with open('impacts.csv', 'rb') as csvfile:
    namesreaders = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in namesreaders:
        person, letter, value = row[0], row[1], row[2]
        d[person][letter].append(value)
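For reference, here is a Python 3 sketch of the same idea that also converts the result back to plain dicts for printing. The in-memory text stands in for impacts.csv (assumed here to have no header row), and converting the values to int is an assumption about the desired output.

```python
import csv
import io
from collections import defaultdict

# Stand-in for impacts.csv: space-delimited, no header row.
sample = "Bob A 2\nBob A 3\nBob A 4\nBob B 2\nBob B 3\nJill A 1\nJill B 2\n"

d = defaultdict(lambda: defaultdict(list))
for row in csv.reader(io.StringIO(sample), delimiter=' '):
    if len(row) != 3:
        continue  # skip blank or malformed rows
    person, letter, value = row
    d[person][letter].append(int(value))  # assumption: values are integers

# Convert to ordinary dicts so the printed form is easy to read.
plain = {person: dict(inner) for person, inner in d.items()}
print(plain)
```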
You can build nested dictionaries by implementing Perl's autovivification feature.
class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value
# For your case
names = AutoVivification()
with open('impacts.csv', 'rb') as csvfile:
    namesreaders = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in namesreaders:
        person, letter, value = row[0], row[1], row[2]
        if names[person][letter]:
            names[person][letter].append(value)
        else:
            names[person][letter] = [value]
I would solve it like this. I am sure it could be shortened quite a bit if you make it more recursive and use comprehension. But to understand the principle and make it similar to your start (solution not tested for syntactic errors):
names = {}
with open('impacts.csv', 'rb') as csvfile:
    namesreaders = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in namesreaders:
        person, letter, value = row[0], row[1], row[2]
        if person not in names:
            names[person] = {}
        if letter not in names[person]:
            names[person][letter] = []
        names[person][letter].append(value)
print names
Don't let the code confuse you. You defined pretty clearly what you need to do. Now just take it slow and do it.
First, as you said: the value of the second dictionary is a list. You don't want
names[person] = { letter: value }
Instead you want:
names[person] = { letter: [value] }
Now in your else, you know that person is already in names. Is letter already in names[person]? If it is, append to the list; otherwise, create a new list:
else:
    if letter in names[person]:
        names[person][letter].append(value)
    else:
        names[person][letter] = [value]
Now, you can probably go about this a whole lot cleaner using the setdefault method of a dict.
This would do the same thing:
names = {}
with open('impacts.csv', 'rb') as csvfile:
    namesreaders = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in namesreaders:
        person, letter, value = row[0], row[1], row[2]
        # Make sure that if person isn't in names,
        # person is added to names as a dict
        names.setdefault(person, {})
        # Make sure that if letter isn't in the names[person] dict
        # letter is added as an empty list
        names[person].setdefault(letter, [])
        # At this point, names[person][letter] is an existing, possibly empty
        # list. Simply add the new value
        names[person][letter].append(value)
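Because setdefault returns the value stored under the key (the existing one, or the default it just inserted), the three steps can also be chained into a single line. A small Python 3 sketch, with hard-coded rows standing in for what the CSV reader would yield:

```python
# Hard-coded rows standing in for what csv.reader would yield.
rows = [
    ("Bob", "A", 2), ("Bob", "A", 3), ("Bob", "A", 4),
    ("Bob", "B", 2), ("Bob", "B", 3),
    ("Jill", "A", 1), ("Jill", "B", 2),
]

names = {}
for person, letter, value in rows:
    # The first setdefault returns the inner dict, the second returns the
    # list, so the value can be appended directly.
    names.setdefault(person, {}).setdefault(letter, []).append(value)

print(names)
```

The chained form is shorter, while the step-by-step version may be easier to read and debug.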
Quite a well known pythonic technique is to use defaultdict() to specify complex dictionary structures at initialization time.
Disclaimer: I would highly recommend considering class structures to handle this data, as these kinds of nested dictionaries can lead to code smells rather quickly.
import functools
from collections import defaultdict
d_inner = functools.partial(defaultdict, list)
d = defaultdict(d_inner)
# Do something with data here; rows is assumed to be an iterable of (name, letter, number) tuples
for name, letter, number in rows:
    d[name][letter].append(number)
Sample Output:
defaultdict(<functools.partial object at 0x109de71b0>, {'Bob': defaultdict(<type 'list'>, {'A': [2, 3, 4], 'B': [2, 3]}), 'Jill': defaultdict(<type 'list'>, {'A': [1], 'B': [2]})})
I have a dictionary with many, many key/value pairs.
The keys are dates and the values are worldwide top-level domains.
I want to output the dictionary to a text file so that it counts and alpha sorts similar values but only within the same key
for example:
*key: value1:count value2:count*
date1: au:4 be:12 com:44
date2: az:4 com:14 net:5
Code:
with open('access_logshort.txt','rU') as f:
    for line in f:
        list1 = re.search(r'(?P<Date>[0-9]{2}/[a-zA-Z]{3}/[0-9]{4})(.+)(GET|POST)\s(http://|https://)([a-zA-Z.]+)(\.)(?P<tld>[a-zA-Z]+)(/).+?"\s200',line)
        if list1 != None:
            print list1.groupdict()
            one_tuple = list1.group(1,7)
            my_dict[one_tuple[0]]=one_tuple[1]
output:
print my_dict
{'09/Mar/2004': 'hu'}
{'09/Mar/2004': 'hu'}
{'09/Mar/2004': 'com'}
{'09/Mar/2004': 'ru'}
{'09/Mar/2004': 'ru'}
{'09/Mar/2004': 'com'}
This should suit your case.
from collections import defaultdict
from dateutil.parser import parse
import csv
import re
data = defaultdict(lambda: defaultdict(int))
with open('access_logshort.txt','rU') as f:
    for line in f:
        list1 = re.search(r'(?P<Date>[0-9]{2}/[a-zA-Z]{3}/[0-9]{4})(.+)(GET|POST)\s(http://|https://)([a-zA-Z.]+)(\.)(?P<tld>[a-zA-Z]+)(/).+?"\s200',line)
        if list1 is not None:
            date, domain = list1.group(1,7)
            data[date.lower()][domain.lower()] += 1
with open('my_data.csv', 'wb') as ofile:
    # add delimiter='\t' to the argument list of csv.writer if you want
    # tsv rather than csv
    writer = csv.writer(ofile)
    for key, value in sorted(data.iteritems(), key=lambda x: parse(x[0])):
        domains = sorted(value.iteritems())
        writer.writerow([key] + ['{}:{}'.format(*d) for d in domains])
Output:
10/Mar/2004,com:2,hu:2,ru:2
09/Mar/2004,com:2,hu:2,ru:2
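The question asked for lines of the form "date: tld:count tld:count". A Python 3 sketch of just that output step follows. The (date, tld) pairs are hard-coded stand-ins for what the regular expression extracts, and the dates are sorted as plain strings here rather than parsed.

```python
from collections import Counter, defaultdict

# Stand-ins for the (date, tld) pairs pulled out of the log by the regex.
pairs = [
    ("09/Mar/2004", "hu"), ("09/Mar/2004", "hu"), ("09/Mar/2004", "com"),
    ("09/Mar/2004", "ru"), ("09/Mar/2004", "ru"), ("09/Mar/2004", "com"),
    ("10/Mar/2004", "com"), ("10/Mar/2004", "net"),
]

# One Counter of TLDs per date.
per_date = defaultdict(Counter)
for date, tld in pairs:
    per_date[date][tld.lower()] += 1

lines = []
for date in sorted(per_date):  # string sort; parse the dates for true ordering
    counts = " ".join(
        "{}:{}".format(tld, n) for tld, n in sorted(per_date[date].items())
    )
    lines.append("{}: {}".format(date, counts))

print("\n".join(lines))
```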
I am trying to make a dictionary from a csv file in python, but I have multiple categories. I want the keys to be the ID numbers, and the values to be the name of the items. Here is the text file:
"ID#","name","quantity","price"
"1","hello kitty","4","9999"
"2","rilakkuma","3","999"
"3","keroppi","5","1000"
"4","korilakkuma","6","699"
and this is what I have so far:
txt = open("hk.txt","rU")
file_data = txt.read()
lst = [] #first make a list, and then convert it into a dictionary.
for key in file_data:
    k = key.split(",")
    lst.append((k[0],k[1]))
dic = dict(lst)
print(dic)
This just prints an empty list though. I want the keys to be the ID#, and then the values will be the names of the products. I will make another dictionary with the names as the keys and the ID#'s as the values, but I think it will be the same thing but the other way around.
Use the csv module to handle your data; it'll remove the quoting and handle the splitting:
import csv
results = {}
with open('hk.txt', 'r', newline='') as txt:
    reader = csv.reader(txt)
    next(reader, None) # skip the header line
    for row in reader:
        results[row[0]] = row[1]
For your sample input, this produces:
{'4': 'korilakkuma', '1': 'hello kitty', '3': 'keroppi', '2': 'rilakkuma'}
You can use csv DictReader:
import csv
result={}
with open('/tmp/test.csv', 'r', newline='') as f:
    for d in csv.DictReader(f):
        result[d['ID#']]=d['name']
print(result)
# {'1': 'hello kitty', '3': 'keroppi', '2': 'rilakkuma', '4': 'korilakkuma'}
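The question also mentions wanting the reverse mapping, from name to ID#. Once the first dictionary exists, that can be sketched with a dict comprehension. This assumes the names are unique; a repeated name would keep only the last ID. The in-memory text stands in for hk.txt.

```python
import csv
import io

# Stand-in for hk.txt, using the rows from the question.
sample = '''"ID#","name","quantity","price"
"1","hello kitty","4","9999"
"2","rilakkuma","3","999"
"3","keroppi","5","1000"
"4","korilakkuma","6","699"
'''

# ID# -> name, built the same way as with csv.DictReader on a real file.
id_to_name = {row["ID#"]: row["name"] for row in csv.DictReader(io.StringIO(sample))}

# Swap keys and values to get the name -> ID# lookup.
name_to_id = {name: id_ for id_, name in id_to_name.items()}

print(id_to_name)
print(name_to_id)
```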
You can fill a dictionary directly while reading the file line by line:
dictionary = {}
with open("hk.txt") as txt:
    txt.readline() # skip the first line
    for line in txt:
        line = line.replace('"', '').strip()
        k = line.split(",")
        dictionary[k[0]] = k[1]
Try this, or use a library such as csv to read the file:
txt = open("hk.txt","rU")
file_data = txt.read()
file_lines = file_data.split("\n")
lst = [] #first make a list, and then convert it into a dictionary.
for linenumber in range(1,len(file_lines)):
    if not file_lines[linenumber]: continue # skip empty lines, e.g. after a trailing newline
    k = file_lines[linenumber].split(",")
    lst.append((k[0][1:len(k[0])-1],k[1][1:len(k[1])-1]))
dic = dict(lst)
print(dic)
You can also fill the dict directly instead of building a list first.