I am trying to create a dictionary that has a nested list inside of it.
The goal would be to have it be:
key : [x,y,z]
I am pulling the information from a CSV file and counting the number of times a certain key shows up in each column. However, I am getting the error below:
> d[key][i] = 1
KeyError: 'owner'
Where owner is the title of my column.
if __name__ == '__main__':
    d = {}
    with open ('sample.csv','r') as f:
        reader = csv.reader(f)
        for i in range(0,3):
            for row in reader:
                key = row[0]
                if key in d:
                    d[key][i] +=1
                else:
                    d[key][i] = 1
    for key,value in d.iteritems():
        print key,value
What do I tweak in this loop to have it create a key if it doesn't exist and then add to it if it does?
The problem is that you index with [i] into a list that does not exist yet: a new key has no value in d.
So you have to replace
    d[key][i] = 1
with
    d[key] = [0,0,0]
    d[key][i] = 1
This first creates the list with three entries (so you can use [0], [1] and [2] afterward without error) and then assigns 1 to the correct entry in the list.
You can use defaultdict:
from collections import defaultdict
ncols = 3
d = defaultdict(lambda: [0 for i in range(ncols)])
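To make the defaultdict suggestion concrete, here is a minimal Python 3 sketch. The `rows` list is a hypothetical stand-in for the rows read from the CSV file; it is not data from the question.

```python
from collections import defaultdict

ncols = 3
# Hypothetical sample rows standing in for the CSV file's contents.
rows = [
    ["owner", "pet", "owner"],
    ["pet", "owner", "owner"],
]

# Every new key starts with one zero counter per column.
d = defaultdict(lambda: [0] * ncols)

for row in rows:
    for i in range(ncols):
        d[row[i]][i] += 1  # no KeyError: a missing key is created on access

for key, value in d.items():
    print(key, value)
```

With this sample data, 'owner' ends up as [1, 1, 2] and 'pet' as [1, 1, 0].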
Use a try/except block to create the list for a new key, then increment as needed:
import csv
if __name__ == '__main__':
    d = {}
    with open ('sample.csv','r') as f:
        reader = csv.reader(f)
        # Loop over the rows first: a csv.reader can only be consumed once
        for row in reader:
            for i in xrange(0,3):
                key = row[i]
                try:
                    d[key][i] += 1
                except KeyError:
                    # First time this key is seen: create its three counters
                    d[key] = [0, 0, 0]
                    d[key][i] = 1
    for key,value in d.iteritems():
        print key,value
Using defaultdict and Counter you can come up with a dict that allows you to easily measure how many times a key appeared in a position (in this case 1st, 2nd or 3rd, by the slice)
csv = [
    ['a','b','c','d'],
    ['e','f','g', 4 ],
    ['a','b','c','d']
]
from collections import Counter, defaultdict
d = defaultdict(Counter)
for row in csv:
    for idx, value in enumerate(row[0:3]):
        d[value][idx] += 1
Example usage:
print d
print d['a'][0] # number of times 'a' has been found in the 1st position
print d['b'][2] # number of times 'b' found in the 3rd position
print d['f'][1] # number of times 'f' found in 2nd position
print [d['a'][n] for n in xrange(3)] # to match the format requested in your post
Output:
defaultdict(<class 'collections.Counter'>, {'a': Counter({0: 2}), 'c': Counter({2: 2}), 'b': Counter({1: 2}), 'e': Counter({0: 1}), 'g': Counter({2: 1}), 'f': Counter({1: 1})})
2
0
1
[2, 0, 0]
Or put into a function:
def occurrences(key):
    return [d[key][n] for n in xrange(3)]
print occurrences('a') # [2, 0, 0]
I'd like to write a function that takes one argument (a text file), uses its contents as keys, and assigns values to the keys. I'd like the values to go from 1 to n:
{'A': 1, 'B': 2, 'C': 3, 'D': 4... }.
I tried to write something like this:
Base code which kind of works:
filename = 'words.txt'
with open(filename, 'r') as f:
    text = f.read()
ready_text = text.split()
def create_dict(lst):
    """ go through the arg, stores items in it as keys in a dict"""
    dictionary = dict()
    for item in lst:
        if item not in dictionary:
            dictionary[item] = 1
        else:
            dictionary[item] += 1
    return dictionary
print(create_dict(ready_text))
The output: {'A': 1, 'B': 1, 'C': 1, 'D': 1... }.
Attempt to make the thing work:
def create_dict(lst):
    """ go through the arg, stores items in it as keys in a dict"""
    dictionary = dict()
    values = list(range(100)) # values
    for item in lst:
        if item not in dictionary:
            for value in values:
                dictionary[item] = values[value]
        else:
            dictionary[item] = values[value]
    return dictionary
The output: {'A': 99, 'B': 99, 'C': 99, 'D': 99... }.
My attempt doesn't work. It gives all the keys 99 as their value.
Bonus question: How can I optimize my code and make it look more elegant/cleaner?
Thank you in advance.
You can use dict comprehension with enumerate (note the start parameter):
words.txt:
colorless green ideas sleep furiously
Code:
with open('words.txt', 'r') as f:
    words = f.read().split()
dct = {word: i for i, word in enumerate(words, start=1)}
print(dct)
# {'colorless': 1, 'green': 2, 'ideas': 3, 'sleep': 4, 'furiously': 5}
Note that "to be or not to be" will result in {'to': 5, 'be': 6, 'or': 3, 'not': 4}, which is perhaps not what you want: each repeated word keeps the position of its last occurrence. Ending up with only one entry for a repeated word is not a quirk of this algorithm. It is inevitable as long as you use a dict, because a dict holds each key only once.
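If the position of the first occurrence is wanted instead, one possible sketch uses dict.setdefault, which stores a value only when the key is not present yet. The name `first_seen` is illustrative and not part of the original answer.

```python
words = "to be or not to be".split()

# Keep the position of the first occurrence instead of the last one.
first_seen = {}
for i, word in enumerate(words, start=1):
    first_seen.setdefault(word, i)  # stores i only if word is not a key yet

print(first_seen)
# {'to': 1, 'be': 2, 'or': 3, 'not': 4}
```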
Your program sends a list of strings to create_dict. For each string in the list, if that string is not in the dictionary, then the dictionary value for that key is set to 1. If that string has been encountered before, then the value of that key is increased by 1. So, since every key is being set to 1, then that must mean there are no repeat keys anywhere, meaning you're sending a list of unique strings.
So, in order to have the numerical values increase with each new key, you just have to increment some number during your loop:
num = 0
for item in lst:
    num += 1
    dictionary[item] = num
There's an easier way to loop through both numbers and list items at the same time, via enumerate():
for num, item in enumerate(lst, start=1): # start at 1 and not 0
    dictionary[item] = num
You can use this code. If an item appears in lst more than once, it gets a single number, assigned at its first occurrence:
def create_dict(lst):
    """ go through the arg, stores items in it as keys in a dict"""
    dictionary = dict()
    idx = 1
    for item in lst:
        if item not in dictionary:
            dictionary[item]=idx
            idx += 1
    return dictionary
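If each distinct word should be numbered once, with consecutive numbers in order of first appearance, a more compact variant is possible in Python 3.7 or later, where dicts preserve insertion order. This is a sketch under that assumption, not the answerer's code:

```python
def create_dict(lst):
    """Number each distinct item 1..n in order of first appearance."""
    # dict.fromkeys drops duplicates while keeping insertion order
    # (guaranteed for dicts from Python 3.7 on).
    unique_items = dict.fromkeys(lst)
    return {item: idx for idx, item in enumerate(unique_items, start=1)}


print(create_dict("to be or not to be".split()))
# {'to': 1, 'be': 2, 'or': 3, 'not': 4}
```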
I am new to Python and working on a problem where I have to match a list of indices to a list of values with 2 conditions:
If there is a repeated index, then the values should be summed
If an index is not in the list, then its value should be 0
For example, below are my 2 lists: 'List of Inds' and 'List of Vals'. At index 0, my value is 5; at index 1, my value is 4; at index 2, my value is 3 (2+1); at index 3, my value is 0 (since no value is associated with that index); and so on.
Input:
'List of Inds' = [0,1,4,2,2]
'List Vals' = [5,4,3,2,1]
Output = [5,4,3,0,3]
I have been struggling with it for a few days and can't find anything online that points me in the right direction. Thank you.
List_of_Inds = [0,1,4,2,2]
List_Vals = [5,4,3,2,1]
dic ={}
i = 0
for key in List_of_Inds:
    if key not in dic:
        dic[key] = 0
    dic[key] = List_Vals[i]+dic[key]
    i = i+1
output = []
# Cover every index up to the largest one seen, not just len(dic) entries
for key in range(0, max(dic)+1):
    if key in dic:
        output.append(dic[key])
    else:
        output.append(0)
print(dic)
print(output)
output:
{0: 5, 1: 4, 4: 3, 2: 3}
[5, 4, 3, 0, 3]
The following code works as desired. In computer science this is called a sparse vector (the one-dimensional case of a "sparse matrix"): data is kept only for the indices that occur, while the "virtual size" of the data structure looks much larger from the outside.
import logging
class SparseVector:
    def __init__(self, indices, values):
        self.d = {}
        for c, indx in enumerate(indices):
            logging.info(c)
            logging.info(indx)
            if indx not in self.d:
                self.d[indx] = 0
            self.d[indx] += values[c]
    def getItem(self, key):
        if key in self.d:
            return self.d[key]
        else:
            return 0
p1 = SparseVector([0,1,4,2,2], [5,4,3,2,1])
print p1.getItem(0)
print p1.getItem(1)
print p1.getItem(2)
print p1.getItem(3)
print p1.getItem(4)
print p1.getItem(5)
print p1.getItem(6)
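The same idea can be sketched in Python 3 with square-bracket access. The `__getitem__` method and the `to_list` helper are additions for illustration and are not part of the answer above.

```python
class SparseVector:
    """Stores summed values only for indices that actually occur."""

    def __init__(self, indices, values):
        self.d = {}
        for idx, val in zip(indices, values):
            self.d[idx] = self.d.get(idx, 0) + val

    def __getitem__(self, key):
        # Missing indices read as 0 without being stored.
        return self.d.get(key, 0)

    def to_list(self):
        """Dense list from index 0 up to the largest stored index."""
        if not self.d:
            return []
        return [self[i] for i in range(max(self.d) + 1)]


p1 = SparseVector([0, 1, 4, 2, 2], [5, 4, 3, 2, 1])
print(p1[2], p1[3], p1[6])  # 3 0 0
print(p1.to_list())         # [5, 4, 3, 0, 3]
```

Reading a missing index through dict.get leaves the underlying dict unchanged, so lookups never grow the stored data.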
The answer code is:
def ans(list1,list2):
    dic={}
    ans=[]
    if not(len(list1)==len(list2)):
        return "Not Possible"
    for i in range(0,len(list1)):
        ind=list1[i]
        val=list2[i]
        if ind not in dic:
            dic[ind]=val
        else:
            dic[ind]+=val
    # Note: this fills indices 0..len(list1)-1, so it assumes no index is larger than that
    val=len(list1)
    for i in range(0,val):
        if i not in dic:
            ans.append(0)
        else:
            ans.append(dic[i])
    return ans
To test:
print(ans([0,1,4,2,2], [5,4,3,2,1]))
output:
[5, 4, 3, 0, 3]
Hope it helps.
Comment if you don't understand any step.
What you can do is sort the indices and values in ascending order and then sum them up. Here is some example code:
import numpy as np
ind = [0,1,4,2,2]
vals = [5,4,3,2,1]
points = zip(ind,vals)
sorted_points = sorted(points)
new_ind = [point[0] for point in sorted_points]
new_val = [point[1] for point in sorted_points]
# One slot per index, from 0 up to the largest index
output = np.zeros(max(new_ind) + 1)
for i in range(len(new_ind)):
    output[new_ind[i]] += new_val[i]
In this code, the index values are sorted to be in ascending order and then the value array is rearranged according to the sorted index array. Then, using a simple for loop, you can sum the values of each existing index and calculate the output.
This is a grouping problem. You can use collections.defaultdict to build a dictionary mapping, incrementing values in each iteration. Then use a list comprehension:
indices = [0,1,4,2,2]
values = [5,4,3,2,1]
from collections import defaultdict
dd = defaultdict(int)
for idx, val in zip(indices, values):
    dd[idx] += val
res = [dd[idx] for idx in range(max(dd) + 1)]
## functional alternative (dd.get would return None for missing indices,
## because get() does not trigger the default factory):
# res = list(map(dd.__getitem__, range(max(dd) + 1)))
print(res)
# [5, 4, 3, 0, 3]
I am writing a function that takes a dictionary as input and returns the list of keys which have unique values in that dictionary. Consider,
ip = {1: 1, 2: 1, 3: 3}
so the output should be [3], as key 3 has a value that no other key in the dict has.
Now there is a problem in the given function:
def uniqueValues(aDict):
    dicta = aDict
    dum = 0
    for key in aDict.keys():
        for key1 in aDict.keys():
            if key == key1:
                dum = 0
            else:
                if aDict[key] == aDict[key1]:
                    if key in dicta:
                        dicta.pop(key)
                    if key1 in dicta:
                        dicta.pop(key1)
    listop = dicta.keys()
    print listop
    return listop
I am getting an error like:
File "main.py", line 14, in uniqueValues
    if aDict[key] == aDict[key1]:
KeyError: 1
Where am I going wrong?
Your main problem is this line:
dicta = aDict
You think you're making a copy of the dictionary, but actually you still have just one dictionary, so operations on dicta also change aDict (when you remove entries from dicta, they are also removed from aDict, and so you get your KeyError).
One solution would be
dicta = aDict.copy()
(You should also give your variables clearer names to make it more obvious to yourself what you're doing)
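The difference between a second name for the same dict and a real copy can be seen in a small Python 3 sketch. The variable names `original`, `alias` and `shallow` are illustrative only.

```python
original = {1: 1, 2: 1, 3: 3}

alias = original            # a second name for the same dict object
shallow = original.copy()   # a new dict with the same keys and values

alias.pop(1)    # also removes key 1 from `original`
shallow.pop(2)  # leaves `original` untouched

print(alias is original)    # True
print(shallow is original)  # False
print(original)             # {2: 1, 3: 3}
print(shallow)              # {1: 1, 3: 3}
```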
(edit) Also, an easier way of doing what you're doing:
def iter_unique_keys(d):
    values = list(d.values())
    for key, value in d.iteritems():
        if values.count(value) == 1:
            yield key
print list(iter_unique_keys({1: 1, 2: 1, 3: 3}))
Use Counter from collections library:
from collections import Counter
ip = {
    1: 1,
    2: 1,
    3: 3,
    4: 5,
    5: 1,
    6: 1,
    7: 9
}
# Count how many times each value occurs in the 'ip' dict
count = Counter(ip.values())
# For each (key, value) item in 'ip', look up how often its value occurs.
# The key goes into the 'results' list only if that count equals 1.
results = [x for x,y in ip.items() if count[y] == 1]
# Finally, print the results list
print results
Output:
[3, 4, 7]
I'm trying to count the number of times a specified key occurs in my list of dicts. I've used Counter() and most_common(n) to count up all the keys, but how can I find the count for a specific key? I have this code, which does not work currently:
def Artist_Stats(self, artist_pick):
    entries = TopData(self.filename).data_to_dict()
    for d in entries:
        x = d['artist']
        find_artist = Counter()
    print find_artist[x][artist_pick]
The "entries" data has about 60k entries and looks like this:
[{'album': 'Nikki Nack', 'song': 'Find a New Way', 'datetime': '2014-12-03 09:08:00', 'artist': 'tUnE-yArDs'},]
You could extract it, put it into a list, and calculate the list's length.
key_artists = [k['artist'] for k in entries if k.get('artist')]
len(key_artists)
Edit: using a generator expression might be better if your data is big:
key_artists = (1 for k in entries if k.get('artist'))
sum(key_artists)
2nd Edit:
for a specific artist, you would replace if k.get('artist') with if k.get('artist') == artist_pick
3rd Edit: you could loop as well, if you're not comfortable with comprehensions or generators, or if you feel that enhances code readability
n = 0 # number of entries whose artist is artist_pick
for k in entries:
    n += 1 if k.get('artist') == artist_pick else 0
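Since the question mentions Counter, here is a hedged Python 3 sketch that counts every artist in one pass and then looks up a specific one. The `entries` list is made-up sample data in the shape shown in the question; only the first entry is taken from it.

```python
from collections import Counter

# Hypothetical entries in the same shape as the data in the question.
entries = [
    {'album': 'Nikki Nack', 'song': 'Find a New Way', 'artist': 'tUnE-yArDs'},
    {'album': 'Album B', 'song': 'Song B', 'artist': 'tUnE-yArDs'},
    {'album': 'Album C', 'song': 'Song C', 'artist': 'Someone Else'},
    {'album': 'Album D', 'song': 'Song D'},  # entry without an 'artist' key
]

# Count the value stored under 'artist', skipping entries that lack the key.
artist_counts = Counter(d['artist'] for d in entries if 'artist' in d)

artist_pick = 'tUnE-yArDs'
print(artist_counts[artist_pick])         # 2
print(artist_counts['Not In The Data'])   # 0, a Counter returns 0 for missing keys
```

Building the Counter once is useful when several artists are looked up; for a single lookup, the generator expression with sum shown above does the same job.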
You can add Counter objects together with +. Below is a demonstration:
>>> from collections import Counter
>>> data = [{'a':1, 'b':1}, {'a':1, 'c':1}, {'b':1, 'c':1}, {'a':1, 'c':1}, {'a':1, 'd':1}]
>>> counter = Counter(data[0])
>>> for d in data[1:]:
...     counter += Counter(d)
...
>>> counter
Counter({'a': 4, 'c': 3, 'b': 2, 'd': 1})
>>> counter['a'] # Count of 'a' key
4
>>> counter['d'] # Count of 'd' key
1
>>>
Or, if you want to get fancy, replace the for-loop with sum and a generator expression:
>>> from collections import Counter
>>> data = [{'a':1, 'b':1}, {'a':1, 'c':1}, {'b':1, 'c':1}, {'a':1, 'c':1}, {'a':1, 'd':1}]
>>> counter = sum((Counter(d) for d in data[1:]), Counter(data[0]))
>>> counter
Counter({'a': 4, 'c': 3, 'b': 2, 'd': 1})
>>>
I personally prefer the readability of the for-loop though.
If you mean to count the keys rather than the distinct values to a particular key, then without Counter():
artist_key_count = 0
for track in entries:
    if 'artist' in track.keys():
        artist_key_count += 1
If you mean to count the number of times each artist appears in your list of tracks, you can also do this without Counter():
artist_counts = {}
for track in entries:
    artist = track.get('artist')
    try:
        artist_counts[artist] += 1
    except KeyError:
        artist_counts[artist] = 1
I want to find duplicate values in one column of a CSV file that has multiple columns and replace them with the value from another column. So first I put two columns from the CSV into a dictionary. Then I want to find duplicate values in that dictionary, which has string keys and values. I tried solutions for removing duplicates from a dictionary, but I either got a "not hashable" error or no result. Here is the first part of the code.
import csv
from collections import defaultdict
import itertools as it
mydict = {}
index = 0
reader = csv.reader(open(r"computing.csv", "rb"))
for i, rows in enumerate(reader):
    if i == 0:
        continue
    if len(rows) == 0:
        continue
    k = rows[3].strip()
    v = rows[2].strip()
    if k in mydict:
        mydict[k].append(v)
    else:
        mydict[k] = [v]
#mydict = hash(frozenset(mydict))
print mydict
d = {}
while True:
    try:
        d = defaultdict(list)
        for k,v in mydict.iteritems():
            #d[frozenset(mydict.items())]
            d[v].append(k)
    except:
        continue
    writer = csv.writer(open(r"OLD.csv", 'wb'))
    for key, value in d.items():
        writer.writerow([key, value])
Your question is unclear, so I hope I got it right.
Please give an example of the input columns and the desired output columns.
Please give a printout of the error and let us know which line caused it.
If column1=[1,2,3,1,4] and column2=[a,b,c,d,e], do you want the output to be n_column1=[a,2,3,d,4] and column2 =[1,b,c,d,e]?
I imagine the exception was in d[v].append(k), since v is clearly a list. You cannot use a list as a key in a dictionary.
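A short Python 3 sketch of why a list fails as a dictionary key, and of one possible workaround (converting each list to a tuple). The contents of `mydict` here are hypothetical, and `inverted` is an illustrative name.

```python
from collections import defaultdict

# Hypothetical data in the shape the question's first loop builds.
mydict = {'k1': ['a', 'b'], 'k2': ['a', 'b'], 'k3': ['c']}

error_message = None
try:
    {}[['a', 'b']] = 'k1'  # lists are mutable, hence unhashable
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# A tuple of strings is hashable, so it can serve as a key.
inverted = defaultdict(list)
for k, v in mydict.items():
    inverted[tuple(v)].append(k)

print(dict(inverted))
# {('a', 'b'): ['k1', 'k2'], ('c',): ['k3']}
```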
In [1]: x = [1,2,3,1,4]
In [2]: y = ['a','b','c','d','e']
In [5]: from collections import defaultdict
In [6]: d = defaultdict(int)
In [7]: for a in x:
   ...:     d[a] += 1
In [8]: d
Out[8]: defaultdict(<type 'int'>, {1: 2, 2: 1, 3: 1, 4: 1})
In [9]: x2 = []
In [10]: for a,b in zip(x,y):
   ....:     x2.append(a if d[a]==1 else b)
   ....:
In [11]: x
Out[11]: [1, 2, 3, 1, 4]
In [12]: x2
Out[12]: ['a', 2, 3, 'd', 4]
In that case, if I had to change your code to fit, I'd do something like this:
import csv
from collections import defaultdict
import itertools as it
mydict = {}
index = 0
reader = csv.reader(open(r"computing.csv", "rb"))
histogram = defaultdict(int)
k = []
v = []
for i, rows in enumerate(reader):
    if i == 0:
        continue
    if len(rows) == 0:
        continue
    k.append(rows[3].strip())
    v.append(rows[2].strip())
    item = k[-1]
    histogram[item] += 1
output_column = []
for first_item, second_item in zip(k,v):
    output_column.append(first_item if histogram[first_item]==1 else second_item)
writer = csv.writer(open(r"OLD.csv", 'wb'))
for c1, c2 in zip(output_column, v):
    writer.writerow([c1, c2])
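For readers on Python 3, the same approach can be sketched as follows. To keep the example self-contained it reads from and writes to in-memory buffers; the column layout of `source` is a made-up stand-in for computing.csv. With real files in Python 3 you would open them in text mode with newline='' rather than "rb"/"wb".

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for computing.csv: a header row, then data rows.
source = io.StringIO(
    "id,name,value,key\n"
    "1,x,a,1\n"
    "2,x,b,2\n"
    "3,x,c,1\n"
)

# Skip the header row and any blank rows.
rows = [r for r in list(csv.reader(source))[1:] if r]
keys = [r[3].strip() for r in rows]
vals = [r[2].strip() for r in rows]

histogram = Counter(keys)
# Keep a key that occurs once; substitute the other column where it repeats.
output_column = [k if histogram[k] == 1 else v for k, v in zip(keys, vals)]

out = io.StringIO()
csv.writer(out).writerows(zip(output_column, vals))
print(out.getvalue())
```

Here the key '1' appears twice, so both of its rows take the value from the other column, while the unique key '2' is kept.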