I have written some Python code that imports data via xlrd into two dictionaries.
Code:
import xlrd
#category.clear()
#term.clear()
book = xlrd.open_workbook("C:\Users\Koen\Google Drive\etc...etc..")
sheet = book.sheet_by_index(0)
num_rows = sheet.nrows
for i in range(1,num_rows,1):
    category = {i:( sheet.cell_value(i, 0))}
    term = {i:( sheet.cell_value(i, 1))}
When I inspect one of the two dictionaries (category or term), it presents me with a list of values.
print(category[i])
So far, so good.
However, when I try to access an individual value
print(category["2"])
it consistently gives me an error:
Traceback (most recent call last):
File "testfile", line 15, in <module>
print(category["2"])
KeyError: '2'
The keys are indeed numbered (as determined by i).
I've already tried [], {}, "", '', etc. Nothing works.
As I need those values later on in the code, I would like to know what the cause of the KeyError is.
Thanks in advance for taking a look!
First off, you are reassigning category and term in every iteration of the for loop, so each dictionary only ever holds a single key and ends up with just the last index; if your sheet has 100 rows, the dict will only have the key 99. To overcome this, define the dictionaries outside the loop and assign the keys inside the loop, like the following:
category = {}
term = {}
for i in range(1, num_rows, 1):
    category[i] = sheet.cell_value(i, 0)
    term[i] = sheet.cell_value(i, 1)
Second, since the keys are created by for i in range(1, num_rows, 1):, they are integers, so you have to access the dictionary with integer keys, e.g. category[1]. To use string keys you would need to cast them, e.g. with category[str(i)].
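For example, a quick sketch of the difference, assuming the dictionaries were filled by the corrected loop above and the sheet has at least three rows:
print(category[2])    # OK: the keys are ints
print(category["2"])  # KeyError: '2' -- the string "2" is not the int 2
# If you prefer string keys, cast them when storing:
# category[str(i)] = sheet.cell_value(i, 0)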
I hope this clarifies the problem.
Related
[I had a problem iterating through a dict to find a pair of similar words, output them, and then delete them from the dict]
My intention is to generate a random output label and store it in a dictionary, then iterate through the dictionary, take the first key, and search the rest of the dictionary for a similar key, e.g. Light1on and Light1off both contain Light1, then get the values for both keys and store them in a table in their respective columns.
For example:
Dict = {Light1on, Light2on, Light1off, ...}
Store the value for Light1on, then iterate through the dictionary to find e.g. Light1off, and store Light1on: value1 and Light1off: value2 into a table or DataFrame with columns On: value1, Off: value2.
As I didn't know how to insert the code as code, I could at first only provide an image; sorry for the trouble, it's my first time asking a question here. Thanks.
from collections import defaultdict
import difflib, random

olist = []
input = 10
olist1 = ['Light1on','Light2on','Fan1on','Kettle1on','Heater1on']
olist2 = ['Light2off','Kettle1off','Light1off','Fan1off','Heater1off']
events = list(range(input + 1))
for i in range(len(olist1)):
    output1 = random.choice(olist1)
    print(output1,'1')
    olist1.remove(output1)
    output2 = random.choice(olist2)
    print(output2,'2')
    olist2.remove(output2)
    olist.append(output1)
    olist.append(output2)
print(olist,'3')
outputList = {olist[i]:events[i] for i in range(10)}
print(str(outputList),'4')
# Iterating through the keys finding a pair match
for s in range(5):
    for i in outputList:
        if i == list(outputList)[0]:
            skeys = difflib.get_close_matches(i, outputList, n=2, cutoff=0.75)
            print(skeys,'5')
            del outputList[skeys]
# Modified Dictionary
difflib.get_close_matches('anlmal', ['car', 'animal', 'house', 'animaltion'])
['animal']
Update: I was unable to delete the pair of similar keys from the dictionary after finding the pair.
You're probably getting an error about a dictionary changing size during iteration. That's because you're deleting keys from a dictionary you're iterating over, and Python doesn't like that:
d = {1:2, 3:4}
for i in d:
    del d[i]
That will throw:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
To work around that, one solution is to store a list of the keys you want to delete, then delete all those keys after you've finished iterating:
keys_to_delete = []
d = {1:2, 3:4}
for i in d:
    if i%2 == 1:
        keys_to_delete.append(i)
for i in keys_to_delete:
    del d[i]
Ta-da! Same effect, but this way avoids the error.
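Another common workaround is to iterate over a snapshot of the keys, so the dictionary itself is free to shrink while you loop; a minimal sketch:
d = {1:2, 3:4}
for i in list(d):  # list(d) copies the keys before the loop starts
    if i%2 == 1:
        del d[i]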
Also, your code above doesn't call the difflib.get_close_matches function properly. You can use print(help(difflib.get_close_matches)) to see how you are meant to call that function. You need to provide a second argument that indicates the items to which you wish to compare your first argument for possible matches.
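For example, reusing outputList from the code above as the pool of candidates, a correct call could look like this:
import difflib
# the word to match comes first, then the iterable of candidate strings to compare against
matches = difflib.get_close_matches('Light1on', list(outputList), n=2, cutoff=0.75)
print(matches)  # e.g. ['Light1on', 'Light1off'] if both keys are present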
All of that said, I have a feeling that you can accomplish your fundamental goals much more simply. If you spend a few minutes describing what you're really trying to do (this shouldn't involve any references to data types, it should just involve a description of your data and your goals), then I bet someone on this site can help you solve that problem much more simply!
I'm new to Python and I'm trying to write a brief script. I want to run a loop in which I read many files and, for each file, run a calculation. In particular, I want to do a calculation using the two rows of every file and return an output whose name refers to the corresponding file.
I was able to load the files into a container ('work'). I wrote the inner loop for the calculation on a single file from the list and it runs correctly. The problem is that I'm not able to iterate it over all the files and obtain each 'integr' value for the corresponding file.
Let me show what I tried to do:
import numpy as np

# I'm loading the files that contain the values with which I want to do my calculation in a loop
work = {}
for i in range(0,100):
    work[i] = np.loadtxt('work{}.txt'.format(i), float).T

# Now I'm trying to write a double loop in which I iterate the calculation over the files (which don't have the same length)
integr = 0
for k in work:
    for i in range(1, len(k[1,:])):
        integr = integr + k[1,i]*(k[0,i] - k[0,i-1])
    # I would like to print every 'integr' which comes from the calculation on each file
    print(integr)
When I try to run this, I obtain this error message:
Traceback (most recent call last):
File "lavoro.py", line 11, in <module>
for i in range(1, len(k[1,:])):
TypeError: 'int' object has no attribute '__getitem__'
Thank you in advance.
I'm guessing a bit, but if I understood correctly, you want work to be a list rather than a dictionary. Even if that wasn't your intention, you can certainly use a list instead of a dictionary in this context.
This is how you can create your work list:
work = []
for i in range(0,100):
    work.append(np.loadtxt('work{}.txt'.format(i), float).T)
Or, using the equivalent list comprehension (usually the list comprehension is faster):
work = [np.loadtxt('work{}.txt'.format(i), float).T for i in range(100)]
Now you can loop over the work list to do your calculations (I assume they are correct, no way for me to check this):
for k in work:
    integr = 0
    for i in range(1, len(k[1,:])):
        integr = integr + k[1,i]*(k[0,i] - k[0,i-1])
Note that I moved integr = 0 inside the loop, so that it is reinitialized to 0 for each file; otherwise each inner loop would add to the result of the previous ones.
However, if that was the desired behaviour, move integr = 0 outside the loop as in your original code.
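Since you mentioned you would like to print every 'integr', a minimal sketch that prints the result for each file (inside the outer loop, after the inner one finishes) could look like this:
for k in work:
    integr = 0
    for i in range(1, len(k[1,:])):
        integr = integr + k[1,i]*(k[0,i] - k[0,i-1])
    # print the accumulated value for this file before moving on to the next one
    print(integr)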
Guessing from the context, in your original code you wanted:
for k in work.values():
because iterating over a dictionary produces only its keys, not its values.
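To illustrate the difference with a tiny made-up example:
work = {0: 'first array', 1: 'second array'}
for k in work:           # iterates over the keys: 0, 1
    print(k)
for v in work.values():  # iterates over the values: 'first array', 'second array'
    print(v)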
I am trying to figure out the most efficient way of finding similar values of a specific cell in a specified column (not all columns) in an Excel .xlsx document. The code I have currently assumes all of the strings are unsorted. However, the file I am using, and the files I will be using, all have strings sorted from A to Z. So instead of doing a linear search I wonder what other search algorithm I could use (e.g. binary search), as well as how to fix my code.
So far I have created a function, find(). Before the function runs, the program takes in a value from the user's input, which is then set as the sheet name; I print out all available sheet names in the Excel doc just to help the user. I created an empty list, results[], to store well... the results.
I created a for loop that iterates through only column A, because I only want to iterate through a custom column. The variable start is the coordinate of the current cell in column A, e.g. A1 or A400, and changes depending on the iteration the loop is on. The variable next gets compared with start; next is essentially start + 1, but since I can't add 1 to a string, I concatenate and typecast everything so that the iteration ranges over A1-A100 or however many cells are in column A.
My function getVal() gets called with two parameters: the coordinate of the cell and the worksheet we are working from. The value returned from getVal() is also passed to my function similar(), which just calls SequenceMatcher() from difflib and returns the percentage of how similar two strings are, e.g. similar(hello, helloo) returns something like 90. If the strings are more than 40 percent similar, the coordinates are appended to the results[] list.
def setSheet(ws):
    sheet = wb[ws]
    return sheet

def getVal(coordinate, worksheet):
    value = worksheet[coordinate].value
    return value

def similar(first, second):
    percent = SequenceMatcher(None, first, second).ratio() * 100
    return percent

def find():
    column = "A"
    print("\n")
    print("These are all available sheets: ", wb.sheetnames)
    print("\n")
    name = input("What sheet are we working out of> ")
    results = []
    ws = setSheet(name)
    for i in range(1, ws.max_row):
        temp = str(column + str(i))
        x = ws[temp]
        start = ws[x].coordinate
        y = str(column + str(i + 1))
        next = ws[y].coordinate
        if(similar(getVal(start,ws), getVal(next,ws)) > 40):
            results.append(getVal(start))
    return results
This is some nasty-looking code, so I do apologize in advance. The expected result should just be a list of strings that are "similar".
My little program uses the Riot API (a game API), where I put players into either the 'ally team' or the 'enemy team'. Since the data comes from JSON, there are lots of lists and dicts involved, and my issue probably stems from there, though I have not been able to find out where.
Here is the part that causes the issue:
first_game_test = game_list[0]
summ_team_ID = first_game_test["teamId"]
summoners_in_game = first_game_test["fellowPlayers"]
ally_team = []
enemy_team = []
for i in range(len(summoners_in_game)):
    for name, value in summoners_in_game[i].iteritems():
        if summoners_in_game[i]["teamId"] == summ_team_ID:
            #if summoners_in_game[i] not in ally_team:
            summoner_name = idtosummoner.idToSummonerName(summoners_in_game[i]['summonerId'])
            summoner_champ = champion_id.champIdGet(summoners_in_game[i]['championId'])
            ally_team.append({summoner_name: summoner_champ})
        else:
            #if summoners_in_game[i] not in enemy_team:
            enemy_team.append(summoners_in_game[i])
The idtosummoner and champion_id modules have been checked multiple times; I'm quite certain that the issue does not stem from there.
As you can see, I used a simple duplicate-check fix (commented out). However, it started to interfere with further code: the summoner_name and summoner_champ variables cause an error at the 3rd or 4th index (I haven't added those lines to the else branch yet, since I want to fix the issue first).
The console output shows the following:
PS C:\Users\ptnfolder> python matchhistory.py
Nussen
Nussen
Nussen
kimbb
Traceback (most recent call last):
File "matchhistory.py", line 67, in <module>
matchHistory("thebirdistheword")
File "matchhistory.py", line 39, in matchHistory
print idtosummoner.idToSummonerName(summoners_in_game[i].get('summonerId'))
File "C:\Users\ptnfolder\idtosummoner.py", line 10, in idToSummonerName
champ_name_dict = json_data[str(summID)]
KeyError: '29716673'
The strange part is that the key should actually resolve to 'kimbb' (the for loop somehow triples every entry); it works once, and then the program crashes.
You are looping over the keys and values of the dictionaries in a list:
for i in range(len(summoners_in_game)):
    for name, value in summoners_in_game[i].iteritems():
so for each key-value pair, you execute your loop body. In your loop body, you test a specific key:
if summoners_in_game[i]["teamId"] == summ_team_ID:
so for each key in the dictionary, you test if the value for the 'teamId' key matches summ_team_ID.
This executes as many times as there are keys in the dictionary, but you only want to test one of the keys.
Just remove the loop over the key-value pairs:
for i in range(len(summoners_in_game)):
    if summoners_in_game[i]["teamId"] == summ_team_ID:
        summoner_name = idtosummoner.idToSummonerName(summoners_in_game[i]['summonerId'])
        summoner_champ = champion_id.champIdGet(summoners_in_game[i]['championId'])
        ally_team.append({summoner_name: summoner_champ})
    else:
        enemy_team.append(summoners_in_game[i])
Rather than use indices generated by range(), you could just loop over the list directly, and not have to keep indexing:
for team in summoners_in_game:
    if team["teamId"] == summ_team_ID:
        summoner_name = idtosummoner.idToSummonerName(team['summonerId'])
        summoner_champ = champion_id.champIdGet(team['championId'])
        ally_team.append({summoner_name: summoner_champ})
    else:
        enemy_team.append(team)
I'm trying to get the dictionary (which the first part of the program generates) to write to a csv so that I can perform further operations on the data in excel. I realize the code isn't efficient but at this point I'd just like it to work. I can deal with speeding it up later.
import csv
import pprint

raw_data = csv.DictReader(open("/Users/David/Desktop/crimestats/crimeincidentdata.csv", "r"))

neighborhood = []
place_count = {}
stats = []

for row in raw_data:
    neighborhood.append(row["Neighborhood"])

for place in set(neighborhood):
    place_count.update({place:0})

for key,value in place_count.items():
    for place in neighborhood:
        if key == place:
            place_count[key] = place_count[key]+1

for key in place_count:
    stats.append([{"Location":str(key)},{"Volume":str(place_count[key])}])

pp = pprint.PrettyPrinter(indent=4)
pp.pprint(stats)
The program is still running fine to this point, as is evident from the pprint output:
[   [{'Location': 'LINNTON'}, {'Volume': '109'}],
    [{'Location': 'SUNDERLAND'}, {'Volume': '118'}],
    [{'Location': 'KENTON'}, {'Volume': '715'}]
This is where the error is definitely happening. The program writes the headers to the csv just fine then throws the ValueError.
fieldnames = ['Location', 'Volume']
with open('/Users/David/Desktop/crimestats/localdata.csv', 'w', newline='') as output_file:
    csvwriter = csv.DictWriter(output_file, delimiter=',', fieldnames=fieldnames, dialect='excel')
    csvwriter.writeheader()
    for row in stats:
        csvwriter.writerow(row)
output_file.close()
I've spent quite a bit of time searching for this problem, but none of the suggestions I have attempted to use have worked. I figure I must be missing something, so I'd really appreciate any and all help.
Traceback (most recent call last):
File "/Users/David/Desktop/crimestats/statsreader.py", line 34, in <module>
csvwriter.writerow(row)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/csv.py", line 153, in writerow
return self.writer.writerow(self._dict_to_list(rowdict))
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/csv.py", line 149, in _dict_to_list
+ ", ".join([repr(x) for x in wrong_fields]))
ValueError: dict contains fields not in fieldnames: {'Location': 'SABIN'}, {'Volume': '247'}
I believe your problem is here:
for key in place_count:
    stats.append([{"Location":str(key)},{"Volume":str(place_count[key])}])
This is creating a list of two dictionaries. The first has only a "Location" key, and the second has only a "Volume" key. However, the csv.DictWriter objects are expecting a single dictionary per row, with all the keys in the dictionary. Change that code snippet to the following and it should work:
for key in place_count:
    stats.append({"Location": str(key), "Volume": str(place_count[key])})
That should take care of the errors you're seeing.
Now, as for why the error message is complaining about fields not in fieldnames, which completely misled you away from the real problem you're having: the writerow() function expects to get a dictionary as its row parameter, but you're passing it a list. The result is confusion: it iterates over the dict in a for loop expecting to get the dict's keys (because that's what you get when you iterate over a dict in Python), and it compares those keys to the values in the fieldnames list. What it's expecting to see is:
"Location"
"Volume"
in either order (because a Python dict makes no guarantees about which order it will return its keys). The reason why they want you to pass in a fieldnames list is so that the fields can be written to the CSV in the correct order. However, because you're passing in a list of two dictionaries, when it iterates over the row parameter, it gets the following:
{'Location': 'SABIN'}
{'Volume': '247'}
Now, the dictionary {'Location': 'SABIN'} does not equal the string "Location", and the dictionary {'Volume': '247'} does not equal the string "Volume", so the writerow() function thinks it's found dict keys that aren't in the fieldnames list you supplied, and it throws that exception. What was really happening was "you passed me a list of two dicts-of-one-key, when I expected a single dict-with-two-keys", but the function wasn't written to check for that particular mistake.
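To make that concrete, here is a small sketch of what iteration yields in each case, using the values from the traceback above:
row_as_list = [{'Location': 'SABIN'}, {'Volume': '247'}]
row_as_dict = {'Location': 'SABIN', 'Volume': '247'}

# Iterating over the list yields the two one-key dicts, which DictWriter
# compares (unsuccessfully) against the strings in fieldnames:
print([x for x in row_as_list])  # [{'Location': 'SABIN'}, {'Volume': '247'}]

# Iterating over the dict yields its keys, which is what DictWriter expects:
print([x for x in row_as_dict])  # ['Location', 'Volume'] (order may vary on older Pythons)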
Now I'll mention a couple things you could do to speed up your code. One thing that will help quite a bit is to reduce those three for loops at the start of your code down to just one. What you're trying to do is to go through the raw data, and count the number of times each neighborhood shows up. First I'll show you a better way to do that, then I'll show you an even better way that improves on my first solution.
The better way to do that is to make use of the wonderful defaultdict class that Python provides in the collections module. defaultdict is a subclass of Python's dictionary type, which will automatically create dict entries when they're accessed for the first time. Its constructor takes a single parameter, a function which will be called with no parameters and should return the desired default value for any new item. If you had used defaultdict for your place_count dict, this code:
place_count = {}
for place in set(neighborhood):
    place_count.update({place:0})
could simply become:
place_count = defaultdict(int)
What's going on here? Well, the int function (which really isn't a function, it's the constructor for the int class, but that's a bit beyond the scope of this explanation) just happens to return 0 if it's called with no parameters. So instead of writing your own function def returnzero(): return 0, you can just use the existing int function (okay, constructor). Now every time you do place_count["NEW PLACE"], the key NEW PLACE will automatically appear in your place_count dictionary, with the value 0.
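A quick illustration of that behaviour:
from collections import defaultdict

place_count = defaultdict(int)
place_count["NEW PLACE"] += 1    # the key is created with the default 0, then incremented
print(place_count["NEW PLACE"])  # 1
print(place_count["UNSEEN"])     # 0 -- merely accessing the key creates it with value 0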
Now, your counting loop needs to be modified too: it used to go over the keys of place_count, but now that place_count automatically creates its keys the first time they're accessed, you need a different source. But you still have that source in the raw data: the row["Neighborhood"] value for each row. So your for key,value in place_count.items(): loop could become:
for row in raw_data:
    place = row["Neighborhood"]
    place_count[place] = place_count[place] + 1
And now that you're using a defaultdict, you don't even need that first loop (the one that created the neighborhood list) at all! So we've just turned three loops into one. The final version of what I'm suggesting looks like this:
from collections import defaultdict

place_count = defaultdict(int)
for row in raw_data:
    place = row["Neighborhood"]
    place_count[place] = place_count[place] + 1
    # Or: place_count[place] += 1
However, there's a way to improve that even more. The Counter object from the collections module is designed for just this case, and has some handy extra functionality, like the ability to retrieve the N most common items. So the final final version :-) of what I'm suggesting is:
from collections import Counter

place_count = Counter()
for row in raw_data:
    place = row["Neighborhood"]
    place_count[place] = place_count[place] + 1
    # Or: place_count[place] += 1
That way if you need to retrieve the 5 most crime-ridden neighborhoods, you can just call place_count.most_common(5).
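As a side note, a Counter can also be built directly from an iterable, so the counting loop can collapse to a single expression; a minimal sketch:
from collections import Counter

# builds the same counts in a single pass over the reader
place_count = Counter(row["Neighborhood"] for row in raw_data)
print(place_count.most_common(5))  # the 5 most common neighborhoods and their counts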
You can read more about Counter and defaultdict in the documentation for the collections module.