make dictionary from csv file columns - python

I am new to the concept of dictionaries in Python.
I have a CSV file with multiple columns, and I want to create a dictionary whose keys are taken from the first column and whose values are taken from the second, with one key:value pair for every row of those two columns.
The code is as follows:
if __name__=="__main__":
    reader = csv.reader(open("file.csv", "rb"))
    for rows in reader:
        k = rows[0]
        v = rows[1]
        mydict = {k:v}
    print (mydict)
Problem: the output contains only the last (bottom-most) row of the first two columns, i.e. {'12654':'18790'}. I want the dictionary to contain all 100 rows of the first two columns in this format. How can I do that? Can I run some loop over the row numbers of the first two columns? I don't know how.

import csv

if __name__=="__main__":
    mydict = {}  # create the dictionary once, before the loop
    reader = csv.reader(open("file.csv", "rb"))
    for rows in reader:
        k = rows[0]
        v = rows[1]
        mydict[k] = v  # add to the existing dictionary
    print mydict
Here:
    mydict = {k:v}
you were creating a new dictionary on every iteration, so the data from the previous rows was lost.
Update:
To keep every value when a key occurs more than once, you can do something like this:
mydict = {}
L = [(1, 2), (2, 4), (1, 3), (3, 2), (3, 4)]
for el in L:
    k, v = el
    if k not in mydict:
        mydict[k] = [v]
    else:
        mydict[k].append(v)
print mydict
>>>
{1: [2, 3], 2: [4], 3: [2, 4]}
This way, every value that shares a key is stored in that key's list.
Your code will be:
if __name__=="__main__":
    mydict = {}
    reader = csv.reader(open("file.csv", "rb"))
    for i, rows in enumerate(reader):
        if i == 0: continue  # skip the header row
        k = rows[0]
        v = rows[1]
        if k not in mydict:
            mydict[k] = [v]
        else:
            mydict[k].append(v)
    print mydict
Update 2: Do you mean this?
for k, v in mydict.items():
    print "%s: %s" % (k, v)
>>>
1: [2, 3]
2: [4]
3: [2, 4]
Update3:
This should work:
if __name__=="__main__":
    mydict = {}
    reader = csv.reader(open("file.csv", "rb"))
    for i, rows in enumerate(reader):
        if i == 0: continue  # skip the header row
        k = rows[0]
        v = rows[1]
        if k not in mydict:
            mydict[k] = [v]
        else:
            mydict[k].append(v)
    print mydict
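For readers on Python 3, the same grouping could be sketched with collections.defaultdict. This is only a minimal sketch, not the answer's original code: the rows in `sample` are made up for illustration, `group_columns` is a hypothetical helper name, and io.StringIO stands in for the real file.csv.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data standing in for file.csv (header plus five rows).
sample = "key,value\n1,2\n2,4\n1,3\n3,2\n3,4\n"


def group_columns(fileobj):
    """Map each first-column value to the list of its second-column values."""
    grouped = defaultdict(list)
    reader = csv.reader(fileobj)
    next(reader, None)  # skip the header row, like `if i == 0: continue`
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or short rows
        grouped[row[0]].append(row[1])
    return dict(grouped)


mydict = group_columns(io.StringIO(sample))
print(mydict)
```

Note that csv.reader yields strings, so both keys and values are strings here, unlike the integer tuples in the list example above.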

You are creating a new dict and overwriting the old one on each iteration. @develerx's answer fixes this problem. I just wanted to point out an easier way, using a dict comprehension.
This assumes the CSV file contains exactly two columns.
if __name__=="__main__":
    reader = csv.reader(open("file.csv", "rb"))
    my_dict = {k: v for k, v in reader}
    print my_dict
If you are using an older version of Python (older than 2.7, I think), you can't use dict comprehensions; use the dict function instead:
my_dict = dict((k, v) for k, v in reader)
Edit: It just occurred to me that my_dict = dict(reader) could also work.
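As a hedged sketch of how this might look on Python 3: files for the csv module are opened in text mode with newline="" instead of "rb". The data in `sample` is invented, and io.StringIO stands in for file.csv. Indexing row[0] and row[1] instead of unpacking is my own variation, so that files with more than two columns do not raise an error.

```python
import csv
import io

# Hypothetical two-column data standing in for file.csv.
sample = "12650,18786\n12652,18788\n12654,18790\n"

# With a real file on Python 3 this would be:
#     with open("file.csv", newline="") as f:
with io.StringIO(sample) as f:
    reader = csv.reader(f)
    # Index the first two columns so extra columns are simply ignored.
    my_dict = {row[0]: row[1] for row in reader if len(row) >= 2}

print(my_dict)

# dict() also accepts the reader directly, but only if every row
# has exactly two fields.
same_dict = dict(csv.reader(io.StringIO(sample)))
```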

Related

How to create a list of dictionaries from a csv file without list comprehension

The output must be like this:
[{'id': '1', 'first_name': 'Heidie','gender': 'Female'}, {'id': '2', 'first_name': 'Adaline', 'gender': 'Female'}, {...}
The following snippet works and meets this requirement:
with open('./test.csv', 'r') as file_read:
    reader = csv.DictReader(file_read, skipinitialspace=True)
    listDict = [{k: v for k, v in row.items()} for row in reader]
    print(listDict)
However, I can't understand some points about the code above:
List comprehension: listDict = [{k: v for k, v in row.items()} for row in reader]
How does Python interpret this?
How does the interpreter always build the list with the header names (id, first_name, gender) and their matching values?
How would this code be implemented with nested for loops?
I read these answers, but I still do not understand:
python list comprehension double for
convert csv file to list of dictionaries
My csv file:
id,first_name,last_name,email,gender
1,Heidie,Philimore,hphilimore0@msu.edu,Female
2,Adaline,Wapplington,awapplington1@icq.com,Female
3,Erin,Copland,ecopland2@google.co.uk,Female
4,Way,Buckthought,wbuckthought3@usa.gov,Male
5,Adan,McComiskey,amccomiskey4@theatlantic.com,Male
6,Kilian,Creane,kcreane5@hud.gov,Male
7,Mandy,McManamon,mmcmanamon6@omniture.com,Female
8,Cherish,Futcher,cfutcher7@accuweather.com,Female
9,Dave,Tosney,dtosney8@businesswire.com,Male
10,Torr,Kiebes,tkiebes9@dyndns.org,Male
Your list comprehension:
listDict = [{k: v for k, v in row.items()} for row in reader]
is equivalent to:
item_list = []
# go through every row
for row in reader:
    item_dict = {}
    # in every row, go through each item
    for k, v in row.items():
        # add each item's k, v to the dict
        item_dict[k] = v
    # append every item_dict to item_list
    item_list.append(item_dict)
print(item_list)
EDIT (some more explanation):
# let's create a list
list_ = [x ** 2 for x in range(0,10)]
print(list_)
This returns:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
You can write this as:
list_ = []
for x in range(0,10):
    list_.append(x ** 2)
So in that example, yes, you read it 'backwards'.
Now consider the next example:
# let's create a list
list_ = [x ** 2 for x in range(0,10) if x % 2 == 0]
print(list_)
This returns:
[0, 4, 16, 36, 64]
You can write this as:
list_ = []
for x in range(0,10):
    if x % 2 == 0:
        list_.append(x ** 2)
So that's not 100% backwards, but it should be clear what's happening. Hope this helps!
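On the question of where the header keys come from: when no field names are passed in, csv.DictReader takes the first row of the file as the field names and pairs them with the values of each later row. A rough sketch of that behavior using plain csv.reader follows; it assumes a shortened, made-up excerpt of the CSV, uses io.StringIO in place of the file, and the name `list_dict` is only illustrative.

```python
import csv
import io

# Hypothetical shortened excerpt of the CSV from the question.
sample = "id,first_name,gender\n1,Heidie,Female\n2,Adaline,Female\n"

rows = csv.reader(io.StringIO(sample))
header = next(rows)  # the first row supplies the keys

list_dict = []
for row in rows:
    # Pair each header name with the value in the same column.
    list_dict.append(dict(zip(header, row)))

print(list_dict)

# csv.DictReader does this pairing itself and yields one mapping per row.
from_dictreader = [dict(r) for r in csv.DictReader(io.StringIO(sample))]
```

Both lists hold one dictionary per data row, which is why the inner comprehension over row.items() in the question only copies a mapping that DictReader has already built.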

Problems with Python Dictionaries and nested Lists

I am trying to create a dictionary that has a nested list inside of it.
The goal would be to have it be:
key : [x,y,z]
I am pulling the information from a csv file and counting the number of times a certain key shows up in each column. However I am getting the below error
> d[key][i] = 1
KeyError: 'owner'
Where owner is the title of my column.
if __name__ == '__main__':
    d = {}
    with open ('sample.csv','r') as f:
        reader = csv.reader(f)
        for i in range(0,3):
            for row in reader:
                key = row[0]
                if key in d:
                    d[key][i] +=1
                else:
                    d[key][i] = 1
    for key,value in d.iteritems():
        print key,value
What do I tweak in this loop so that it creates a key if it doesn't exist and then adds to it if it does?
The problem is that you index into a list (d[key][i]) where no list exists yet.
So you have to replace
    d[key][i] = 1
with
    d[key] = [0,0,0]
    d[key][i] = 1
This first creates the list with three entries (so you can use [0], [1] and [2] afterward without error) and then assigns 1 to the correct entry in the list.
You can use defaultdict:
from collections import defaultdict
ncols = 3
d = defaultdict(lambda: [0 for i in range(ncols)])
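To show how that defaultdict might be used for the counting itself, here is a small Python 3 sketch. It rests on assumptions: the rows in `sample` are invented, io.StringIO stands in for sample.csv, and the loop counts each value in the column where it appears, which is my reading of what the question is after.

```python
import csv
import io
from collections import defaultdict

ncols = 3
# Hypothetical rows standing in for sample.csv.
sample = "owner,a,b\na,owner,b\nowner,b,b\n"

# Missing keys start out as a fresh [0, 0, 0] list.
d = defaultdict(lambda: [0] * ncols)
for row in csv.reader(io.StringIO(sample)):
    for i, key in enumerate(row[:ncols]):
        d[key][i] += 1  # count `key` in column i

for key, value in sorted(d.items()):
    print(key, value)
```

Because the factory builds a new list for every missing key, no KeyError is raised and no explicit existence check is needed.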
Use a try/except block to create the list for a new key, then increment as needed. The rows are iterated in the outer loop because a csv.reader can only be consumed once:
if __name__ == '__main__':
    d = {}
    with open ('sample.csv','r') as f:
        reader = csv.reader(f)
        for row in reader:
            for i in xrange(0,3):
                key = row[i]
                try:
                    d[key][i] += 1
                except KeyError:
                    # first time this key is seen: create its list
                    d[key] = [0, 0, 0]
                    d[key][i] = 1
    for key,value in d.iteritems():
        print key,value
Using defaultdict and Counter you can come up with a dict that allows you to easily measure how many times a key appeared in a position (in this case 1st, 2nd or 3rd, by the slice)
csv = [
    ['a','b','c','d'],
    ['e','f','g', 4 ],
    ['a','b','c','d']
]
from collections import Counter, defaultdict
d = defaultdict(Counter)
for row in csv:
    for idx, value in enumerate(row[0:3]):
        d[value][idx] += 1
Example usage:
print d
print d['a'][0] #number of times 'a' has been found in the 1st position
print d['b'][2] #number of times 'b' found in the 3rd position
print d['f'][1] #number of times 'f' found in 2nd position
print [d['a'][n] for n in xrange(3)] # to match the format requested in your post
defaultdict(<class 'collections.Counter'>, {'a': Counter({0: 2}), 'c': Counter({2: 2}), 'b': Counter({1: 2}), 'e': Counter({0: 1}), 'g': Counter({2: 1}), 'f': Counter({1: 1})})
2
0
1
[2, 0, 0]
Or put into a function:
def occurrences(key):
    return [d[key][n] for n in xrange(3)]
print occurrences('a') # [2, 0, 0]

Find duplicates of two columns from csv

I want to find duplicate values in one column of a multi-column CSV and replace them with the values from another column. First I put two columns from the CSV into a dictionary. Then I want to find duplicate values in that dictionary, which has string keys and values. I tried solutions for removing duplicates from a dictionary, but I got either a "not hashable" error or no result. Here is the first part of the code:
import csv
from collections import defaultdict
import itertools as it
mydict = {}
index = 0
reader = csv.reader(open(r"computing.csv", "rb"))
for i, rows in enumerate(reader):
    if i == 0:
        continue
    if len(rows) == 0:
        continue
    k = rows[3].strip()
    v = rows[2].strip()
    if k in mydict:
        mydict[k].append(v)
    else:
        mydict[k] = [v]
#mydict = hash(frozenset(mydict))
print mydict
d = {}
while True:
    try:
        d = defaultdict(list)
        for k,v in mydict.iteritems():
            #d[frozenset(mydict.items())]
            d[v].append(k)
    except:
        continue
writer = csv.writer(open(r"OLD.csv", 'wb'))
for key, value in d.items():
    writer.writerow([key, value])
Your question is unclear, so I hope I understood it correctly.
Please give an example of the input columns and the desired output columns.
Please post the full error message and tell us which line caused it.
If column1=[1,2,3,1,4] and column2=[a,b,c,d,e], do you want the output to be n_column1=[a,2,3,d,4] and column2 =[1,b,c,d,e]?
I imagine the exception was raised in d[v].append(k), since v is clearly a list. You cannot use a list as a key in a dictionary.
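To illustrate the point about list keys, here is a small Python 3 sketch. The contents of `mydict` are invented, and converting each list to a tuple is one possible workaround rather than something taken from the question.

```python
from collections import defaultdict

# Hypothetical data shaped like the question's mydict: key -> list of values.
mydict = {"k1": ["a", "b"], "k2": ["a", "b"], "k3": ["c"]}

try:
    {}[["a", "b"]] = "k1"  # a list is mutable, so it cannot be a dict key
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Tuples are hashable, so converting each list allows the inversion.
inverted = defaultdict(list)
for k, v in mydict.items():
    inverted[tuple(v)].append(k)

# Keep only the value tuples shared by more than one key.
duplicates = {vals: keys for vals, keys in inverted.items() if len(keys) > 1}
print(duplicates)
```

The bare `except: continue` inside `while True` in the question hides this TypeError and retries forever, which may explain the "no result" symptom.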
In [1]: x = [1,2,3,1,4]
In [2]: y = ['a','b','c','d','e']
In [5]: from collections import defaultdict
In [6]: d = defaultdict(int)
In [7]: for a in x:
   ...:     d[a] += 1
In [8]: d
Out[8]: defaultdict(<type 'int'>, {1: 2, 2: 1, 3: 1, 4: 1})
In [9]: x2 = []
In [10]: for a,b in zip(x,y):
   ....:     x2.append(a if d[a]==1 else b)
   ....:
In [11]: x
Out[11]: [1, 2, 3, 1, 4]
In [12]: x2
Out[12]: ['a', 2, 3, 'd', 4]
In that case, if I had to adapt your code, I'd do something like this:
import csv
from collections import defaultdict
import itertools as it
mydict = {}
index = 0
reader = csv.reader(open(r"computing.csv", "rb"))
histogram = defaultdict(int)
k = []
v = []
for i, rows in enumerate(reader):
    if i == 0:
        continue
    if len(rows) == 0:
        continue
    k.append(rows[3].strip())
    v.append(rows[2].strip())
    item = k[-1]
    histogram[item] += 1  # count how often each key value occurs
output_column = []
for first_item, second_item in zip(k,v):
    # keep unique values, replace duplicated ones with the other column
    output_column.append(first_item if histogram[first_item]==1 else second_item)
writer = csv.writer(open(r"OLD.csv", 'wb'))
for c1, c2 in zip(output_column, v):
    writer.writerow([c1, c2])

Combine Python dictionaries that have the same Key name

I have two separate Python lists whose dictionaries share common key names. The second list, recordList, has multiple dictionaries with the same key name, which I want to append to the first list, clientList. Here are example lists:
clientList = [{'client1': ['c1','f1']}, {'client2': ['c2','f2']}]
recordList = [{'client1': {'rec_1':['t1','s1']}}, {'client1': {'rec_2':['t2','s2']}}]
The end result would be something like this, so that the records end up in a new list of dictionaries within clientList:
clientList = [{'client1': [['c1','f1'], [{'rec_1':['t1','s1']},{'rec_2':['t2','s2']}]]}, {'client2': [['c2','f2']]}]
It seems simple enough, but I'm struggling to find a way to iterate over both of these lists and find where the keys match.
When you are sure that the key names are equal in both dictionaries (this assumes clientList and recordList are plain dictionaries rather than lists of dictionaries):
clientlist = dict([(k, [clientList[k], recordList[k]]) for k in clientList])
like here:
>>> a = {1:1,2:2,3:3}
>>> b = {1:11,2:12,3:13}
>>> c = dict([(k,[a[k],b[k]]) for k in a])
>>> c
{1: [1, 11], 2: [2, 12], 3: [3, 13]}
Assuming you want a list of values that correspond to each key in the two lists, try this as a start:
from pprint import pprint
clientList = [{'client1': ['c1','f1']}, {'client2': ['c2','f2']}]
recordList = [{'client1': {'rec_1':['t1','s1']}}, {'client1': {'rec_2':['t2','s2']}}]
clientList.extend(recordList)
outputList = {}
for rec in clientList:
    k = rec.keys()[0]
    v = rec.values()[0]
    if k in outputList:
        outputList[k].append(v)
    else:
        outputList[k] = [v,]
pprint(outputList)
It will produce this:
{'client1': [['c1', 'f1'], {'rec_1': ['t1', 's1']}, {'rec_2': ['t2', 's2']}],
'client2': [['c2', 'f2']]}
This could work but I am not sure I understand the rules of your data structure.
# join all the dicts for better lookup and update
clientDict = {}
for d in clientList:
    for k, v in d.items():
        clientDict[k] = clientDict.get(k, []) + v
recordDict = {}
for d in recordList:
    for k, v in d.items():
        recordDict[k] = recordDict.get(k, []) + [v]
for k, v in recordDict.items():
    clientDict[k] = [clientDict[k]] + v
# I don't know why you need a list of one-key dicts but here it is
clientList = [dict([(k, v)]) for k, v in clientDict.items()]
With the sample data you provided this gives the result you wanted, hope it helps.
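If the records need to sit in their own nested list, exactly as in the expected output shown in the question, one possible Python 3 sketch is the following. The names `records` and `merged` are illustrative, and the sketch assumes every dictionary in both lists has a single key, as in the sample data.

```python
clientList = [{'client1': ['c1', 'f1']}, {'client2': ['c2', 'f2']}]
recordList = [{'client1': {'rec_1': ['t1', 's1']}},
              {'client1': {'rec_2': ['t2', 's2']}}]

# Collect the records of each client, keeping their original order.
records = {}
for entry in recordList:
    for name, rec in entry.items():
        records.setdefault(name, []).append(rec)

# Rebuild the client list: [client_info] plus one nested list of records.
merged = []
for entry in clientList:
    for name, info in entry.items():
        value = [info]
        if name in records:
            value.append(records[name])
        merged.append({name: value})

print(merged)
```

Clients without any records keep only their own info wrapped in a list, which matches the 'client2' entry in the question's expected result.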

Update dictionary while parsing CSV file

I have csv file like this:
item,#RGB
item1,#ffcc00
item1,#ffcc00
item1,#ff00cc
item2,#00ffcc
item2,#ffcc00
item2,#ffcc00
item2,#ffcc00
....
and I want to make a dictionary d, with the item name as key and a list of (RGB value, count) tuples as value, like:
d[item] = [ (#RGB, count) ]
so for "item1" as in example, I would like to get:
d['item1'] = [ ('#ffcc00', 2), ('#ff00cc', 1) ]
I imagine some Pythonic iterator can do this in one line, but I can't see how at the moment. So far I've written this:
d={}
with open('data.csv', 'rb') as f:
    reader = csv.reader(f)
    try:
        for row in reader:
            try:
                if d[(row[0], row[1])]:
                    i +=1
            except KeyError:
                i = 1
            d[(row[0], row[1])] = i
    except csv.Error, e:
        sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
which gives me:
d[(item, #RGB)] = count
Is there a better way? Or am I doing this wrong from the start?
How about:
a = {}
for row in reader:
    a.setdefault(row[0], {}).setdefault(row[1], 0)
    a[row[0]][row[1]] += 1
This creates a dictionary like
{'item2': {'#00ffcc': 1, '#ffcc00': 3},
'item1': {'#ffcc00': 2, '#ff00cc': 1}}
I find it more convenient than your structure, but you can convert it to tuples if needed:
b = dict((k, v.items()) for k, v in a.items())
import csv
from collections import defaultdict, Counter
from itertools import islice
with open('infile.txt') as f:
    d=defaultdict(Counter)
    for k,v in islice(csv.reader(f),1,None):
        d[k].update((v,))
print d
prints
defaultdict(<class 'collections.Counter'>, {'item2': Counter({'#ffcc00': 3, '#00ffcc': 1}), 'item1': Counter({'#ffcc00': 2, '#ff00cc': 1})})
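To get from these counters to the list-of-tuples layout asked for in the question, Counter.most_common() could be used. The following is a Python 3 sketch under stated assumptions: it reuses the sample rows from the question, io.StringIO stands in for the file, and `counts` is an illustrative name.

```python
import csv
import io
from collections import Counter, defaultdict

# The sample rows from the question; io.StringIO stands in for the file.
sample = """item,#RGB
item1,#ffcc00
item1,#ffcc00
item1,#ff00cc
item2,#00ffcc
item2,#ffcc00
item2,#ffcc00
item2,#ffcc00
"""

counts = defaultdict(Counter)
reader = csv.reader(io.StringIO(sample))
next(reader, None)  # skip the header row
for item, rgb in reader:
    counts[item][rgb] += 1

# most_common() returns (value, count) pairs, highest count first.
d = {item: counter.most_common() for item, counter in counts.items()}
print(d['item1'])
```

Here d['item1'] comes out as [('#ffcc00', 2), ('#ff00cc', 1)], the shape requested in the question.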
