I have a CSV file like this:
item,#RGB
item1,#ffcc00
item1,#ffcc00
item1,#ff00cc
item2,#00ffcc
item2,#ffcc00
item2,#ffcc00
item2,#ffcc00
....
and I want to make a dictionary d, with the item name as key and a list of (RGB value, count) tuples as the value, like:
d[item] = [ (#RGB, count) ]
So for "item1" in the example, I would like to get:
d['item1'] = [ ('#ffcc00', 2), ('#ff00cc', 1) ]
I imagine some Pythonic iterator can do this in one line, but I can't see how at the moment. So far I've written this:
import csv
import sys

filename = 'data.csv'
d = {}
with open(filename, 'rb') as f:
    reader = csv.reader(f)
    try:
        for row in reader:
            try:
                if d[(row[0], row[1])]:
                    i += 1
            except KeyError:
                i = 1
            d[(row[0], row[1])] = i
    except csv.Error, e:
        sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
which gives me:
d[(item, #RGB)] = count
Is there a better way? Or am I doing this wrong from the start?
How about:
a = {}
for row in reader:
    a.setdefault(row[0], {}).setdefault(row[1], 0)
    a[row[0]][row[1]] += 1
This creates a dictionary like
{'item2': {'#00ffcc': 1, '#ffcc00': 3},
'item1': {'#ffcc00': 2, '#ff00cc': 1}}
I find it more convenient than your structure, but you can convert it to tuples if needed:
b = dict((k, v.items()) for k, v in a.items())
import csv
from collections import defaultdict, Counter
from itertools import islice

with open('infile.txt') as f:
    d = defaultdict(Counter)
    for k, v in islice(csv.reader(f), 1, None):  # islice skips the header row
        d[k].update((v,))
print d
This prints:
defaultdict(<class 'collections.Counter'>, {'item2': Counter({'#ffcc00': 3, '#00ffcc': 1}), 'item1': Counter({'#ffcc00': 2, '#ff00cc': 1})})
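The question asks for lists of (RGB, count) tuples rather than Counters. A minimal Python 3 sketch that builds that exact shape from the defaultdict(Counter) approach could look like the following; it assumes a header row and exactly two columns, count_colors is a hypothetical name, and io.StringIO stands in for data.csv.

```python
import csv
import io
from collections import Counter, defaultdict


def count_colors(f):
    """Return {item: [(rgb, count), ...]} from a two-column CSV with a header row."""
    reader = csv.reader(f)
    next(reader)  # skip the "item,#RGB" header
    counts = defaultdict(Counter)
    for item, rgb in reader:
        counts[item][rgb] += 1
    # most_common() returns (value, count) pairs sorted by descending count
    return {item: c.most_common() for item, c in counts.items()}


sample = io.StringIO(
    "item,#RGB\n"
    "item1,#ffcc00\n"
    "item1,#ffcc00\n"
    "item1,#ff00cc\n"
    "item2,#00ffcc\n"
    "item2,#ffcc00\n"
    "item2,#ffcc00\n"
    "item2,#ffcc00\n"
)
d = count_colors(sample)
print(d['item1'])  # [('#ffcc00', 2), ('#ff00cc', 1)]
```

For a real file, open it with open('data.csv', newline='') and pass the file object to the function.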
Related
I have a CSV file and want to read the file to make a 2d dictionary.
I have tried creating a new dictionary:
import csv

markovTransition = {}
f = csv.reader(open('test.csv', 'r'))
for row in f:
    k, v, p = row
    markovTransition[k] = {v: p}
The code above gives the output I want, except that it overwrites the inner dictionary whenever a key in the first column repeats.
The CSV file is in the format of:
A,A1,3
A,A2,4
B,B1,6
C,C3,7
C,C2,3
C,C5,1
The desired dictionary is:
{A: {A1: 3, A2: 4}, B: {B1: 6}, C: {C3: 7, C2: 3, C5: 1}}
The current dictionary is:
{A: {A2: 4}, B: {B1: 6}, C: {C5: 1}}
How do I create a 2d dictionary from a CSV file? Thanks.
This is a nice use case for a defaultdict:
import collections
import csv

markovTransition = collections.defaultdict(dict)
f = csv.reader(open('test.csv', 'r'))
for row in f:
    k, v, p = row
    markovTransition[k][v] = p
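A self-contained Python 3 sketch of the defaultdict approach, run on the question's sample rows. Converting the third column with int() is an assumption (csv.reader returns every field as a string, so without it the values would be '3', '4', and so on); load_transitions is a hypothetical name.

```python
import csv
import io
from collections import defaultdict


def load_transitions(f):
    """Build {outer_key: {inner_key: int_value}} from rows like 'A,A1,3'."""
    table = defaultdict(dict)
    for k, v, p in csv.reader(f):
        table[k][v] = int(p)  # CSV fields are strings; convert the count
    return dict(table)  # plain dict for printing and comparison


sample = io.StringIO("A,A1,3\nA,A2,4\nB,B1,6\nC,C3,7\nC,C2,3\nC,C5,1\n")
markov = load_transitions(sample)
print(markov)
# {'A': {'A1': 3, 'A2': 4}, 'B': {'B1': 6}, 'C': {'C3': 7, 'C2': 3, 'C5': 1}}
```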
Try this:
import csv

markovTransition = {}
f = csv.reader(open('test.csv', 'r'))
for row in f:
    k, v, p = row
    if k in markovTransition:  # the outer key already exists: add to its inner dict
        markovTransition[k].update({v: p})
    else:
        markovTransition[k] = {v: p}
First text file format:
cake,60
cake,30
tart,50
bread,89
Second text file format:
cake,10
cake,10
tart,10
bread,10
Code I have tried:
from collections import defaultdict
answer = defaultdict(int)
recordNum = int(input("Which txt files do you want to read from "))
count = 1
counter = 0
counst = 1
countesr = 0
while recordNum > counter:
    with open('txt'+str(count)+'.txt', 'r') as f:
        for line in f:
            k, v = line.strip().split(',')
            answer[k.strip()] += int(v.strip())
            count = count+1
            counter = counter+1
print(answer)
The problem:
I want the dictionary to be {'cake': '110', 'tart': '60', 'bread': '99'}
but it prints like this {'cake': '30', 'tart': '50', 'bread': '89'}
Instead of the "cake" value being added to the other cake values from text files one and two, it gets replaced with the latest value. How would I solve this? I also tried to make it so that if I enter 3, it opens and adds up the values from three text files named txt1.txt, txt2.txt and txt3.txt.
The problem is that your second file doesn't get read:
Which txt files do you want to read from 2
defaultdict(<class 'int'>, {})
defaultdict(<class 'int'>, {'cake': 60})
defaultdict(<class 'int'>, {'cake': 90})
defaultdict(<class 'int'>, {'tart': 50, 'cake': 90})
defaultdict(<class 'int'>, {'tart': 50, 'bread': 89, 'cake': 90})
>> terminating
You could make these edits to read all the files (note: this assumes your text files are named txt1.txt, txt2.txt, txt3.txt and so on):
from collections import defaultdict

answer = defaultdict(int)
number_of_records = int(input("How many text files do you want to read?"))
for i in range(1, number_of_records+1):
    with open('txt{}.txt'.format(i), 'r') as file:
        for line in file:
            k, v = line.strip().split(',')
            answer[k] += int(v)
print(answer)
How many text files do you want to read?
>> 2
defaultdict(<class 'int'>, {'bread': 99, 'tart': 60, 'cake': 110})
>> terminating
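The same summing can be sketched in Python 3 with collections.Counter and a list of file paths, which makes it easy to test. sum_files is a hypothetical helper, and the demo writes the question's two sample files to a temporary directory instead of the current one:

```python
import os
import tempfile
from collections import Counter


def sum_files(paths):
    """Sum 'name,value' lines across several text files."""
    totals = Counter()
    for path in paths:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # skip blank lines, e.g. a trailing empty line
                name, value = line.split(',')
                totals[name.strip()] += int(value)
    return totals


samples = ["cake,60\ncake,30\ntart,50\nbread,89\n",
           "cake,10\ncake,10\ntart,10\nbread,10\n"]

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i, text in enumerate(samples, start=1):
        path = os.path.join(tmp, 'txt{}.txt'.format(i))
        with open(path, 'w') as f:
            f.write(text)
        paths.append(path)
    totals = sum_files(paths)

print(dict(totals))  # {'cake': 110, 'tart': 60, 'bread': 99}
```

In the question's setup, the paths would be ['txt{}.txt'.format(i) for i in range(1, n + 1)] for the number n the user enters.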
I don't know if this is the Pythonic way, but it works for me (the file names are hardcoded):
x = {}
y = {}
with open("a.txt") as file:
    for i in file:
        (key, val) = i.split(',')
        if key in x:
            x[key] = x[key] + int(val.rstrip())
        else:
            x[key] = int(val.rstrip())
with open("b.txt") as file:
    for i in file:
        (key, val) = i.split(',')
        if key in y:
            y[key] = y[key] + int(val.rstrip())
        else:
            y[key] = int(val.rstrip())
print { k: x.get(k, 0) + y.get(k, 0) for k in set(x) | set(y) }
This may help you:
Merge and sum of two dictionaries
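For the merge step itself, collections.Counter supports +, which sums two mappings key by key. A short Python 3 sketch using the per-file totals that the two sample files from the question would produce (cake 90, tart 50, bread 89 and cake 20, tart 10, bread 10):

```python
from collections import Counter

x = {'cake': 90, 'tart': 50, 'bread': 89}
y = {'cake': 20, 'tart': 10, 'bread': 10}

# Counter addition sums values key by key; a key missing on one side counts as 0.
# Caveat: Counter + keeps only positive results, so zero or negative sums are dropped.
merged = Counter(x) + Counter(y)

# The dict comprehension form has no such caveat:
merged_all = {k: x.get(k, 0) + y.get(k, 0) for k in set(x) | set(y)}

print(dict(merged))  # {'cake': 110, 'tart': 60, 'bread': 99}
```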
I want to do the following in Python. The CSV file is:
item1,item2,item2,item3
item2,item3,item4,item1
I want to make a dictionary with the unique keys item1, item2, item3 and item4:
dictionary = {item1: value1, item2: value2, ...}, where each value is the number of times the key appears in the CSV file. How can I do this?
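Obtain a list of all items from your CSV:
with open('your.csv') as f:
    content = f.readlines()
# strip the trailing newline from each line, so 'item3\n' is counted as 'item3'
items = ','.join(line.strip() for line in content).split(',')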
Then build the mapping:
mapping = {}
for item in items:
    mapping[item] = mapping.get(item, 0) + 1
and you will get the following:
>>> mapping
{'item2': 3, 'item3': 2, 'item1': 2, 'item4': 1}
import csv
from collections import Counter

# define a generator that yields one field after another,
# ignoring newlines:
def iter_fields(filename):
    with open(filename, 'rb') as f:
        reader = csv.reader(f)
        for row in reader:
            for field in row:
                yield field

# now use collections.Counter to count your values:
counts = Counter(iter_fields('stackoverflow.csv'))
print counts
# output:
# Counter({'item3': 2, 'item2': 2, 'item1': 1,
# ' item1': 1, ' item2': 1, 'item4': 1})
See https://docs.python.org/2/library/collections.html#collections.Counter
import csv

temp = dict()
with open('stackoverflow.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        for x in row:
            if x in temp:
                temp[x] += 1
            else:
                temp[x] = 1
print temp
The output is:
{'item2': 3, 'item3': 2, 'item1': 2, 'item4': 1}
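A Python 3 variant of the Counter answer could flatten the rows with itertools.chain.from_iterable instead of a hand-written generator. The sample below assumes the file has a space after each comma (which the ' item1' keys in the Counter output above suggest) and uses csv.reader's skipinitialspace option to drop it; count_fields is a hypothetical name.

```python
import csv
import io
from collections import Counter
from itertools import chain


def count_fields(f):
    """Count how often each field value occurs anywhere in a CSV file."""
    # skipinitialspace=True ignores the blank after each comma,
    # so ' item1' and 'item1' are counted as the same key
    reader = csv.reader(f, skipinitialspace=True)
    return Counter(chain.from_iterable(reader))


sample = io.StringIO("item1, item2, item2, item3\nitem2, item3, item4, item1\n")
counts = count_fields(sample)
print(counts)
```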
I am trying to create a dictionary that has a nested list inside of it.
The goal is to have:
key : [x,y,z]
I am pulling the information from a CSV file and counting the number of times a certain key shows up in each column. However, I am getting the error below:
> d[key][i] = 1
KeyError: 'owner'
Where owner is the title of my column.
import csv

if __name__ == '__main__':
    d = {}
    with open ('sample.csv','r') as f:
        reader = csv.reader(f)
        for i in range(0,3):
            for row in reader:
                key = row[0]
                if key in d:
                    d[key][i] +=1
                else:
                    d[key][i] = 1
    for key,value in d.iteritems():
        print key,value
What do I tweak in this loop to have it create a key if it doesn't exist and then add to it if it does?
The problem is that you try to index a list with [i] where no list exists yet: for a new key, d[key] doesn't exist, so d[key][i] raises a KeyError.
So you have to replace
d[key][i] = 1
with
d[key] = [0,0,0]
d[key][i] = 1
This first creates a list with three entries (so you can use [0], [1] and [2] afterwards without error) and then assigns 1 to the correct entry in the list.
You can use defaultdict:
from collections import defaultdict
ncols = 3
d = defaultdict(lambda: [0 for i in range(ncols)])
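Putting the defaultdict together with a counting loop, a Python 3 sketch might look like this. It assumes you want to count keys from the first ncols columns of each row; the sample rows are made up, since sample.csv isn't shown.

```python
import csv
import io
from collections import defaultdict

ncols = 3
# Each new key starts with one zero per counted column
d = defaultdict(lambda: [0] * ncols)

sample = io.StringIO("a,b,c\ne,f,g\na,b,c\n")
for row in csv.reader(sample):
    # Loop over the columns inside the row loop: a csv.reader can only be consumed once
    for i, key in enumerate(row[:ncols]):
        d[key][i] += 1

print(d['a'])  # [2, 0, 0]
```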
Use a try/except block to create a list for a new key, then increment as needed. Note that the column loop has to go inside the row loop, because a csv.reader can only be iterated once:
import csv

if __name__ == '__main__':
    d = {}
    with open ('sample.csv','r') as f:
        reader = csv.reader(f)
        for row in reader:
            for i in xrange(0,3):
                key = row[i]
                try:
                    d[key][i] += 1
                except KeyError:
                    d[key] = [0, 0, 0]
                    d[key][i] = 1
    for key,value in d.iteritems():
        print key,value
Using defaultdict and Counter, you can build a dict that lets you easily measure how many times a key appeared in each position (in this case the 1st, 2nd or 3rd, selected by the slice):
csv = [
['a','b','c','d'],
['e','f','g', 4 ],
['a','b','c','d']
]
from collections import Counter, defaultdict
d = defaultdict(Counter)
for row in csv:
    for idx, value in enumerate(row[0:3]):
        d[value][idx] += 1
Example usage:
print d
print d['a'][0] #number of times 'a' has been found in the 1st position
print d['b'][2] #number of times 'b' found in the 3rd position
print d['f'][1] #number of times 'f' found in 2nd position
print [d['a'][n] for n in xrange(3)] # to match the format requested in your post
defaultdict(<class 'collections.Counter'>, {'a': Counter({0: 2}), 'c': Counter({2: 2}), 'b': Counter({1: 2}), 'e': Counter({0: 1}), 'g': Counter({2: 1}), 'f': Counter({1: 1})})
2
0
1
[2, 0, 0]
Or put into a function:
def occurrences(key):
    return [d[key][n] for n in xrange(3)]
print occurrences('a') # [2, 0, 0]
I am new to the concept of dictionaries in Python.
I have a CSV file with multiple columns, and I want to create a dictionary whose keys are taken from the first column and whose values are taken from the second, with one key:value pair for every row.
The code is as follows:
import csv

if __name__=="__main__":
    reader = csv.reader(open("file.csv", "rb"))
    for rows in reader:
        k = rows[0]
        v = rows[1]
        mydict = {k:v}
    print (mydict)
Problem: the output contains only the last (bottom-most) row of the first two columns, i.e. {'12654':'18790'}. I want the dictionary to contain all 100 rows of the first two columns in this format. How can I do that? Can I run a loop over the row numbers of the first two columns? I don't know how.
import csv

if __name__=="__main__":
    mydict = {}
    reader = csv.reader(open("file.csv", "rb"))
    for rows in reader:
        k = rows[0]
        v = rows[1]
        mydict[k] = v
    print mydict
Here:
mydict = {k:v}
You were creating a new dictionary in every iteration, so the previous data was lost.
Update:
If the same key can appear in several rows, you can store every value in a list, like this:
mydict = {}
L = [(1, 2), (2, 4), (1, 3), (3, 2), (3, 4)]
for el in L:
    k, v = el
    if k not in mydict:
        mydict[k] = [v]
    else:
        mydict[k].append(v)
print mydict
>>>
{1: [2, 3], 2: [4], 3: [2, 4]}
This way, every value for the same key is stored.
Your code would then be:
import csv

if __name__=="__main__":
    mydict = {}
    reader = csv.reader(open("file.csv", "rb"))
    for i, rows in enumerate(reader):
        if i == 0: continue  # skip the header row
        k = rows[0]
        v = rows[1]
        if k not in mydict:
            mydict[k] = [v]
        else:
            mydict[k].append(v)
    print mydict
Update 2: Do you mean this?
for k, v in mydict.items():
    print "%s: %s" % (k, v)
>>>
1: [2, 3]
2: [4]
3: [2, 4]
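The if/else grouping can also be written with collections.defaultdict(list), which creates the empty list the first time a key is seen. A Python 3 sketch, with group_first_two_columns as a hypothetical name and made-up sample rows that mirror the (1, 2), (2, 4), ... example (note that csv fields come back as strings):

```python
import csv
import io
from collections import defaultdict


def group_first_two_columns(f):
    """Map each value in column 0 to the list of column-1 values seen with it."""
    reader = csv.reader(f)
    next(reader)  # skip the header row, like `if i == 0: continue`
    groups = defaultdict(list)
    for row in reader:
        groups[row[0]].append(row[1])
    return dict(groups)


sample = io.StringIO("key,value,extra\n1,2,x\n2,4,y\n1,3,z\n3,2,w\n3,4,v\n")
mydict = group_first_two_columns(sample)
print(mydict)  # {'1': ['2', '3'], '2': ['4'], '3': ['2', '4']}
```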
You are creating a new dict and overwriting the old one in each iteration. @develerx's answer fixes this problem. I just wanted to point out an easier way, using dict comprehensions.
Assuming the CSV file contains exactly two columns:
import csv

if __name__=="__main__":
    reader = csv.reader(open("file.csv", "rb"))
    my_dict = {k: v for k, v in reader}
    print my_dict
If you are using a version older than 2.7, you can't use dict comprehensions; use the dict function instead:
my_dict = dict((k, v) for k, v in reader)
Edit: my_dict = dict(reader) should also work, since dict() accepts any iterable of two-item rows.
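Since the question's file has more than two columns, `for k, v in reader` and dict(reader) would raise a ValueError on rows with extra fields. A Python 3 sketch that indexes the first two columns instead, assuming the file has a header row (the sample data is made up apart from the 12654/18790 pair from the question):

```python
import csv
import io

sample = io.StringIO("id,other,extra\n12653,18789,a\n12654,18790,b\n")
reader = csv.reader(sample)
next(reader)  # skip the header row, if the file has one
# Indexing keeps only the first two columns, so extra columns don't cause errors
my_dict = {row[0]: row[1] for row in reader}
print(my_dict)  # {'12653': '18789', '12654': '18790'}
```

If a key appears in more than one row, the last row wins, as in the loop-based answers; for a real file, open it with open("file.csv", newline='').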