Append to python dictionary - python

I have a csv file that contains numbers (integers and floats), for example:
"5",7.30124705657363,2,12,7.45176205440562
"18",6.83169608190656,5,11,7.18118108407457
"20",6.40446470770985,4,10,6.70549470337383
"3",5.37498781178147,17,9,5.9902122724706
"10",5.12954203598201,8,8,5.58108702947798
"9",3.93496153596789,7,7,4.35751055597501
I am doing some arithmetic and then trying to add the results to a dictionary, but I am getting a KeyError. Here is the code that I have:
global oldPriceCompRankDict
oldPriceCompRankDict = {}

def increaseQuantityByOne(self, fileLocation):
    rows = csv.reader(open(fileLocation))
    rows.next()
    print "PricePercentage\t" + "OldQuantity\t" + "newQuantity\t" + "oldCompScore\t" + "newCompScore"
    for row in rows:
        newQuantity = float(row[2]) + 1.0
        newCompetitiveScore = float(row[1]) + float(math.log(float(newQuantity), 100))
        print row[1] + "\t", str(row[2]) + "\t", str(newQuantity) + "\t", str(row[4]) + "\t", newCompetitiveScore
        oldPriceCompRankDict[row[3]].append(row[4])
My keys are not in any particular order, but I didn't think keys had to be ordered. I thought anything (hashable) could be a key.

There is no need for the global keyword there; at module level it's a no-op.
Use a defaultdict instead:
from collections import defaultdict
oldPriceCompRankDict = defaultdict(list)
What is happening is that you never define any keys for oldPriceCompRankDict; you just expect them to map to lists by default. The defaultdict type gives you a dict that does just that: when a key is not yet found in oldPriceCompRankDict, a new list() instance is used as the starting value instead of raising a KeyError.
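For illustration, here is a minimal sketch of the loop rewritten around a defaultdict (assuming the same five-column row layout as the sample CSV; written for Python 3, so next(rows) replaces rows.next(), and the self parameter is dropped to keep it standalone):

import csv
from collections import defaultdict

oldPriceCompRankDict = defaultdict(list)  # missing keys start out as empty lists

def increaseQuantityByOne(fileLocation):
    with open(fileLocation) as f:
        rows = csv.reader(f)
        next(rows)  # skip the first row, as the original code does
        for row in rows:
            # row[3] may be a brand-new key; defaultdict creates [] for it on first access
            oldPriceCompRankDict[row[3]].append(row[4])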

A Python dictionary type does not have an append() method. What you are doing is basically trying to call an append() method of the dictionary element accessible by key row[3]. You get a KeyError because you have nothing under key row[3].
You should replace this line:
oldPriceCompRankDict[row[3]].append(row[4])
with this:
oldPriceCompRankDict[row[3]] = row[4]
In addition, the global keyword is used inside functions to indicate that a variable is a global one. You can read about it here: Using global variables in a function other than the one that created them
So the right way to declare a global dictionary is simply oldPriceCompRankDict = {}.
Your function will start adding to the dictionary from the second row because you call rows.next(). If skipping the first row is the desired behavior then that's fine; otherwise you don't need to call that method.
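If you want to keep a plain dict (rather than a defaultdict) and still collect several values per key, setdefault is another option (a sketch, not taken from the answer above):

oldPriceCompRankDict = {}

for row in rows:
    # create an empty list the first time row[3] is seen, then append to it
    oldPriceCompRankDict.setdefault(row[3], []).append(row[4])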
Hope this was helpful, happy coding!

Related

Converting a list of strings that corresponds to definition names into a definition lookup dictionary in Python

Is there a way to convert a list of function or definition names (strings) into a dictionary-based lookup table, where the key is the definition name and the value is the function object that corresponds to that name? It has been said that the safest way is to use the lookup-table approach (dictionary mapping), such as here. However, the list of definition names is variable and the user/programmer cannot write it directly into the code; it has to be built programmatically.
So how can I, in effect, do the following:
nameList = ['methodName1','methodName2','methodName3','methodName4']
methodLookup = {}
for name in nameList:
    methodLookup.update({name: name.strip('\'')})
Where the dictionary value is the function itself, not the string.
Something similar in essence to but the opposite of the following:
for var, val in dictionary.items(): exec(var + ' = val')
I guess you could use something like this:
def plus1(x):
    return x + 1

def plus2(x):
    return x + 2
function_list = ['plus1', 'plus2']
function_dict = {function_name: globals()[function_name] for function_name in function_list}
And after that you can call the plus1 function with:
function_dict['plus1'](5)
#6
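If the names refer to methods on a class or instance rather than module-level functions, getattr can play the same role as globals(). A short sketch with made-up names:

class Handlers:
    def methodName1(self):
        return "one"

    def methodName2(self):
        return "two"

handler = Handlers()
nameList = ['methodName1', 'methodName2']
# look the names up on the instance instead of in globals()
methodLookup = {name: getattr(handler, name) for name in nameList}

print(methodLookup['methodName1']())  # -> one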

Nested dictionaries from file in Python

So I'm trying to create a nested dictionary but I can't seem to wrap my head around the logic.
So say I have input coming in from a csv:
1,2,3
2,3,4
1,4,5
Now I'd like to create a dictionary as follows:
d ={1:{2:3,4:5}, 2:{3:4}}
That is, the first column is an ID, and under each ID the sub-dictionary gets a key from the second column with the third column as its value.
The way I tried it was to go:
d[row[0]] = {row[1]:row[2]}
But that overwrites the first entry instead of appending/pushing to it. How would I go about this problem? I can't seem to wrap my mind around what keys to use.
Any guidance is appreciated.
Yes, because d[row[0]] = ... is d[1] = ..., which overwrites the previous d[1] value.
You should use:
d.setdefault(row[0], {})[row[1]] = row[2]
Remember that there must then be no duplicate row[1] values within the same row[0], or later ones will overwrite earlier ones.
or
d.setdefault(row[0], {}).update({row[1]: row[2]})
if row[0] not in d:
    d[row[0]] = {row[1]: row[2]}
else:
    d[row[0]][row[1]] = row[2]
You can use defaultdict, which is similar to the built-in dictionary but creates entries with a default value; in your case the default value will be a dictionary.
from collections import defaultdict
res = defaultdict(dict)
With this code we are creating a defaultdict whose default value is an empty dict, so next we would do:
for row in l:
    res[row[0]][row[1]] = row[2]
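Putting it together, a minimal sketch (assuming the rows come from csv.reader, so every field is a string and needs converting to int to match the desired output; 'data.csv' is a placeholder file name):

import csv
from collections import defaultdict

res = defaultdict(dict)
with open('data.csv') as f:
    for row in csv.reader(f):
        a, b, c = (int(x) for x in row)  # convert the string fields to ints
        res[a][b] = c

print(dict(res))  # {1: {2: 3, 4: 5}, 2: {3: 4}}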

Creating a dictionary in Python + working with that dictionary

I am quite new to Python and am just trying to get my head around some basics.
I was wondering if anyone could show me how to perform the following tasks. I have a text file with multiple lines, those lines are as follows:
name1, variable, variable, variable
name2, variable, variable, variable
name3, variable, variable, variable
I want to store these items in a dictionary so they can be easily looked up. I want the name to be the key. I then want to be able to access the variables like this: key[0] or key[1].
The code I have at the moment does not do this:
d = {}
with open("servers.txt") as f:
    for line in f:
        (key, val) = line.split()
        d[int(key)] = val
Once this is done, I would like to be able to take an input from a user and then check the dictionary to see if that item is present. I have found a few threads on Stack Overflow, but none seem to do what I require.
There is a Similar Question asked here.
Any assistance you can provide would be amazing. I am new to this but I hope to learn fast & start contributing to threads myself in the near future :)
Cheers!
You're nearly there. Assuming that .split() actually splits the lines correctly (which it wouldn't do if there are actual commas between the values), you just need an additional unpacking operator (*):
d = {}
with open("servers.txt") as f:
    for line in f:
        key, *val = line.split()  # first element -> key, rest -> val[0], val[1] etc.
        d[int(key)] = val
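Since the sample file is actually comma-separated and the keys are names like name1, a variant of the same idea (a sketch, not part of the original answer) splits on commas and skips the int() conversion:

d = {}
with open("servers.txt") as f:
    for line in f:
        # split on commas and strip surrounding whitespace from each field
        key, *val = (part.strip() for part in line.split(","))
        d[key] = val

print(d.get("name1"))  # -> ['variable', 'variable', 'variable']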
If you want to check whether a user-entered key exists, you can do something like:
ukey = int(input("Enter key number: "))
values = d.get(ukey)
if values is not None:
    print(values)  # do something with the values
else:
    print("That key doesn't exist.")
Suppose that your file my_file.csv looks like:
name1, variable, variable, variable
name2, variable, variable, variable
name3, variable, variable, variable
Use pandas to do the work:
import pandas as pd
result = pd.read_csv('my_file.csv', index_col=0, header=None)
print(result)
print(result.loc['name1'])
Note that pandas is a third-party library; you need to install it with pip or easy_install.
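If you still want a plain dictionary keyed by name after loading with pandas, one way to build it from the DataFrame (a sketch) is:

# each row becomes a list of its values, keyed by the index (the name column)
d = {name: row.tolist() for name, row in result.iterrows()}
print(d['name1'])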

How to read 2 different text files line by line and make another file containing a dictionary using python?

I have two text files named weburl.txt and imageurl.txt. weburl.txt contains website URLs and imageurl.txt contains the corresponding image URLs. I want to create a dictionary that takes each line of weburl.txt as a key and the matching line of imageurl.txt as its value.
weburl.txt
url1
url2
url3
url4
url5......
imageurl.txt
imgurl1
imgurl2
imgurl3
imgurl4
imgurl5
The required output is:
{'url1': imgurl1, 'url2': imgurl2, 'url3': imgurl3......}
I am using this code
with open('weburl.txt') as f:
    key = f.readlines()
with open('imageurl.txt') as g:
    value = g.readlines()
dict[key] = [value]
print dict
I am not getting the required results
you can write something like
with open('weburl.txt') as f, \
     open('imageurl.txt') as g:
    # we use the `str.strip` method
    # to remove newline characters
    keys = (line.strip() for line in f)
    values = (line.strip() for line in g)
    result = dict(zip(keys, values))
    print(result)
More info about zip is in the docs.
There are problems with the statement dict[key] = [value] on so many levels that I get a kind of vertigo as we drill down through them:
The apparent intention to use a variable called dict (a bad idea because it would overshadow Python's builtin reference to the dict class). Let's call it d instead.
Not initializing the dictionary instance first. If you had called it something like d, this oversight would earn you an easy-to-understand NameError. However, since you're calling it dict, Python will actually be attempting to set items in the dict class itself (which doesn't support __setitem__) instead of inside a dict instance, so you'll get a different, more confusing error.
Attempting to make a dict entry assignment where the key is a non-hashable type (key is a list). You could convert the list to the hashable type tuple easily enough, but that's not what you want because you'd still be...
Attempting to assign a bunch of values to their respective keys all at once. This can't be done with d[key] = value syntax. It could be done all in one relatively simple statement, i.e. d = dict(zip(key, value)), but unfortunately that doesn't get around the fact that you're...
Not stripping the newline character off the end of each key and value.
Instead, this line:
d = dict((k.strip(), v.strip()) for k, v in zip(key, value))
will do what you appear to want.
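Put together, a corrected version of the original snippet (a sketch, using the variable name d as suggested above) might look like:

with open('weburl.txt') as f:
    key = f.readlines()
with open('imageurl.txt') as g:
    value = g.readlines()

# strip the trailing newlines and pair the lines up positionally
d = dict((k.strip(), v.strip()) for k, v in zip(key, value))
print(d)  # {'url1': 'imgurl1', 'url2': 'imgurl2', ...}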

Writing an Array of Dictionaries to CSV

I'm trying to get the dictionary (which the first part of the program generates) to write to a csv so that I can perform further operations on the data in Excel. I realize the code isn't efficient, but at this point I'd just like it to work; I can deal with speeding it up later.
import csv
import pprint

raw_data = csv.DictReader(open("/Users/David/Desktop/crimestats/crimeincidentdata.csv", "r"))

neighborhood = []
place_count = {}
stats = []

for row in raw_data:
    neighborhood.append(row["Neighborhood"])

for place in set(neighborhood):
    place_count.update({place: 0})

for key, value in place_count.items():
    for place in neighborhood:
        if key == place:
            place_count[key] = place_count[key] + 1

for key in place_count:
    stats.append([{"Location": str(key)}, {"Volume": str(place_count[key])}])

pp = pprint.PrettyPrinter(indent=4)
pp.pprint(stats)
The program is still running fine to this point, as is evident from the pprint output:
[ [{'Location': 'LINNTON'}, {'Volume': '109'}],
[{'Location': 'SUNDERLAND'}, {'Volume': '118'}],
[{'Location': 'KENTON'}, {'Volume': '715'}]
This is where the error is definitely happening. The program writes the headers to the csv just fine then throws the ValueError.
fieldnames = ['Location', 'Volume']
with open('/Users/David/Desktop/crimestats/localdata.csv', 'w', newline='') as output_file:
    csvwriter = csv.DictWriter(output_file, delimiter=',', fieldnames=fieldnames, dialect='excel')
    csvwriter.writeheader()
    for row in stats:
        csvwriter.writerow(row)
output_file.close()
I've spent quite a bit of time searching for this problem, but none of the suggestions I have attempted have worked. I figure I must be missing something, so I'd really appreciate any and all help.
Traceback (most recent call last):
File "/Users/David/Desktop/crimestats/statsreader.py", line 34, in <module>
csvwriter.writerow(row)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/csv.py", line 153, in writerow
return self.writer.writerow(self._dict_to_list(rowdict))
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/csv.py", line 149, in _dict_to_list
+ ", ".join([repr(x) for x in wrong_fields]))
ValueError: dict contains fields not in fieldnames: {'Location': 'SABIN'}, {'Volume': '247'}
I believe your problem is here:
for key in place_count:
    stats.append([{"Location": str(key)}, {"Volume": str(place_count[key])}])
This is creating a list of two dictionaries. The first has only a "Location" key, and the second has only a "Volume" key. However, the csv.DictWriter objects are expecting a single dictionary per row, with all the keys in the dictionary. Change that code snippet to the following and it should work:
for key in place_count:
    stats.append({"Location": str(key), "Volume": str(place_count[key])})
That should take care of the errors you're seeing.
Now, as for why the error message is complaining about fields not in fieldnames, which completely misled you away from the real problem you're having: the writerow() function expects to get a dictionary as its row parameter, but you're passing it a list. The result is confusion: it iterates over the dict in a for loop expecting to get the dict's keys (because that's what you get when you iterate over a dict in Python), and it compares those keys to the values in the fieldnames list. What it's expecting to see is:
"Location"
"Volume"
in either order (because a Python dict makes no guarantees about which order it will return its keys). The reason why they want you to pass in a fieldnames list is so that the fields can be written to the CSV in the correct order. However, because you're passing in a list of two dictionaries, when it iterates over the row parameter, it gets the following:
{'Location': 'SABIN'}
{'Volume': '247'}
Now, the dictionary {'Location': 'SABIN'} does not equal the string "Location", and the dictionary {'Volume': '247'} does not equal the string "Volume", so the writerow() function thinks it's found dict keys that aren't in the fieldnames list you supplied, and it throws that exception. What was really happening was "you passed me a list of two dicts-of-one-key, when I expected a single dict-with-two-keys", but the function wasn't written to check for that particular mistake.
Now I'll mention a couple things you could do to speed up your code. One thing that will help quite a bit is to reduce those three for loops at the start of your code down to just one. What you're trying to do is to go through the raw data, and count the number of times each neighborhood shows up. First I'll show you a better way to do that, then I'll show you an even better way that improves on my first solution.
The better way to do that is to make use of the wonderful defaultdict class that Python provides in the collections module. defaultdict is a subclass of Python's dictionary type, which will automatically create dict entries when they're accessed for the first time. Its constructor takes a single parameter, a function which will be called with no parameters and should return the desired default value for any new item. If you had used defaultdict for your place_count dict, this code:
place_count = {}
for place in set(neighborhood):
    place_count.update({place: 0})
could simply become:
place_count = defaultdict(int)
What's going on here? Well, the int function (which really isn't a function, it's the constructor for the int class, but that's a bit beyond the scope of this explanation) just happens to return 0 if it's called with no parameters. So instead of writing your own function def returnzero(): return 0, you can just use the existing int function (okay, constructor). Now every time you do place_count["NEW PLACE"], the key NEW PLACE will automatically appear in your place_count dictionary, with the value 0.
Now, your counting loop needs to be modified too: it used to go over the keys of place_count, but now that place_count automatically creates its keys the first time they're accessed, you need a different source. But you still have that source in the raw data: the row["Neighborhood"] value for each row. So your for key,value in place_count.items(): loop could become:
for row in raw_data:
    place = row["Neighborhood"]
    place_count[place] = place_count[place] + 1
And now that you're using a defaultdict, you don't even need that first loop (the one that created the neighborhood list) at all! So we've just turned three loops into one. The final version of what I'm suggesting looks like this:
from collections import defaultdict

place_count = defaultdict(int)
for row in raw_data:
    place = row["Neighborhood"]
    place_count[place] = place_count[place] + 1
    # Or: place_count[place] += 1
However, there's a way to improve that even more. The Counter object from the collections module is designed for just this case, and has some handy extra functionality, like the ability to retrieve the N most common items. So the final final version :-) of what I'm suggesting is:
from collections import Counter

place_count = Counter()
for row in raw_data:
    place = row["Neighborhood"]
    place_count[place] = place_count[place] + 1
    # Or: place_count[place] += 1
That way if you need to retrieve the 5 most crime-ridden neighborhoods, you can just call place_count.most_common(5).
You can read more about Counter and defaultdict in the documentation for the collections module.
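For completeness, here is a minimal sketch tying the Counter approach to the CSV output (the file names are placeholders; Counter can also be fed the neighborhood values directly):

import csv
from collections import Counter

with open("crimeincidentdata.csv") as f:
    place_count = Counter(row["Neighborhood"] for row in csv.DictReader(f))

with open("localdata.csv", "w", newline="") as output_file:
    writer = csv.DictWriter(output_file, fieldnames=["Location", "Volume"])
    writer.writeheader()
    for place, count in place_count.most_common():
        # one dict per row, containing both fieldnames
        writer.writerow({"Location": place, "Volume": count})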
