I have a dictionary:
vd = {'Klein': [1,1,1], 'Fox-Epstein': [1,-1,0], 'Ravella': [-1,0,0]}
I need a procedure that iterates over the dictionary and finds which entry is most similar to the one passed as an argument. I have two procedures; the first is called from within the second.
def policy_compare(sen_a, sen_b, voting_dict):
    a = 0
    for i in range(len(voting_dict[sen_a])):
        a += voting_dict[sen_a][i] * voting_dict[sen_b][i]
    return a
This returns the dot product of two of the selected entries.
def most_similar(sen, voting_dict):
    a = []
    for i in voting_dict.keys():
        score = policy_compare(sen,i, voting_dict)
        a += score
    return a
The second procedure is not complete, for two reasons:
1. At the moment it raises an error and I can't see where I am going wrong.
2. It just returns a list of the dot products (the one with the greatest dot product is the most similar), whereas I need the key whose dot product with the chosen sen is largest.
Full error:
Traceback (most recent call last):
File "<pyshell#17>", line 1, in <module>
most_similar('Klein', vd)
File "/Users/anthony/Desktop/matrix/megalol.py", line 15, in most_similar
a += score
TypeError: 'int' object is not iterable
a is a list and score is an int, so you can't add the two together. For a list, a += x works like a.extend(x): Python iterates over x and adds each of its items to a. It can't iterate over an int (score), hence the "weird" error.
Try a.append(score) to add score to the end of the list.
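A quick sketch of the difference between append and +=, using a made-up score value:

```python
a = []
score = 3

a.append(score)   # adds the int as a single element: a is now [3]
a += [score]      # += extends the list with every item of an iterable: [3, 3]

try:
    a += score    # an int is not iterable, so this raises TypeError
except TypeError as exc:
    error_message = str(exc)

print(a)              # [3, 3]
print(error_message)  # 'int' object is not iterable
```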
Here are some modifications to approach the solution you want:
vd = {'Klein': [1,1,1], 'Fox-Epstein': [1,-1,0], 'Ravella': [-1,0,0]}

def policy_compare(sen_a, sen_b, voting_dict):
    a = 0
    for i in range(len(voting_dict[sen_a])):
        a += voting_dict[sen_a][i] * voting_dict[sen_b][i]
    return a

def most_similar(sen, voting_dict):
    a = []
    for this_sen in voting_dict.keys():
        if this_sen == sen:
            continue  # don't compare a senator with themselves
        score = policy_compare(sen, this_sen, voting_dict)
        a.append((this_sen, score))
    # pick the (name, score) tuple with the largest score
    return max(a, key=lambda pair: pair[1])

print(most_similar('Klein', vd))
As someone else has said, you want to append to your list a. I've stored each senator's name alongside the dot product in a tuple, because otherwise you won't know which senator each entry in a refers to (dictionary keys came out in arbitrary order before Python 3.7). most_similar now returns the (name, score) tuple with the maximum dot product. Also, to avoid comparing a senator with themselves, use continue (skip straight to the next iteration of the loop), not pass (a no-op, so the rest of the current iteration still runs).
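For comparison, here is a more compact sketch of the same idea. It computes the dot product with zip and sum instead of calling policy_compare, and returns just the name rather than a (name, score) tuple. Ties go to whichever senator is seen first, and max raises ValueError if the dictionary contains no other senator.

```python
def most_similar(sen, voting_dict):
    """Return the senator (other than sen) whose record has the largest dot product with sen's."""
    def dot(u, v):
        # Sum of element-wise products, the same value policy_compare computes
        return sum(x * y for x, y in zip(u, v))

    others = (name for name in voting_dict if name != sen)
    return max(others, key=lambda name: dot(voting_dict[sen], voting_dict[name]))


vd = {'Klein': [1, 1, 1], 'Fox-Epstein': [1, -1, 0], 'Ravella': [-1, 0, 0]}
print(most_similar('Klein', vd))  # Fox-Epstein (score 0, versus -1 for Ravella)
```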
[I have a problem iterating through a dict to find a pair of similar words, output them, and then delete them from the dict]
My intention is to generate random output labels and store them in a dictionary, then iterate through the dictionary: take the first key, search the dictionary for a similar key (e.g. Light1on and Light1off both contain Light1), and store the values of both keys in a table, each in its respective column.
For example:
Dict = {Light1on,Light2on,Light1off...}
Take the value for Light1on, iterate through the dictionary to find e.g. Light1off, then store Light1on:value1 and Light1off:value2 in a table or DataFrame with the columns On: value1 and Off: value2.
Sorry for any formatting trouble; it's my first time asking a question here. Thanks.
from collections import defaultdict
import difflib, random

olist = []
input = 10
olist1 = ['Light1on','Light2on','Fan1on','Kettle1on','Heater1on']
olist2 = ['Light2off','Kettle1off','Light1off','Fan1off','Heater1off']
events = list(range(input + 1))

for i in range(len(olist1)):
    output1 = random.choice(olist1)
    print(output1,'1')
    olist1.remove(output1)
    output2 = random.choice(olist2)
    print(output2,'2')
    olist2.remove(output2)
    olist.append(output1)
    olist.append(output2)
print(olist,'3')

outputList = {olist[i]:events[i] for i in range(10)}
print (str(outputList),'4')

# Iterating through the keys finding a pair match
for s in range(5):
    for i in outputList:
        if i == list(outputList)[0]:
            skeys = difflib.get_close_matches(i, outputList, n=2, cutoff=0.75)
            print(skeys,'5')
            del outputList[skeys]
# Modified Dictionary
difflib.get_close_matches('anlmal', ['car', 'animal', 'house', 'animaltion'])
['animal']
Update: I was unable to delete the pair of similar keys from the dictionary after finding the pair.
You're probably getting an error about a dictionary changing size during iteration. That's because you're deleting keys from a dictionary you're iterating over, and Python doesn't like that:
d = {1:2, 3:4}
for i in d:
    del d[i]
That will throw:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
To work around that, one solution is to store a list of the keys you want to delete, then delete all those keys after you've finished iterating:
keys_to_delete = []
d = {1:2, 3:4}
for i in d:
    if i%2 == 1:
        keys_to_delete.append(i)
for i in keys_to_delete:
    del d[i]
Ta-da! Same effect, but this way avoids the error.
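Two other common workarounds, sketched on a similar toy dict: iterate over a snapshot of the keys (list(d)) so the dict can be modified inside the loop, or build a new dict with a comprehension that keeps only the entries you want.

```python
d = {1: 2, 3: 4, 6: 8}

# list(d) copies the keys up front, so deleting from d inside the loop is safe
for key in list(d):
    if key % 2 == 1:
        del d[key]

# Alternatively, build a new dict containing only the entries to keep
e = {1: 2, 3: 4, 6: 8}
e = {k: v for k, v in e.items() if k % 2 == 0}

print(d, e)  # {6: 8} {6: 8}
```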
Also, your code above doesn't call the difflib.get_close_matches function properly. You can run help(difflib.get_close_matches) to see how the function is meant to be called. You need to provide a second argument containing the items you wish to compare your first argument against for possible matches.
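To illustrate the call, here is a small sketch with a made-up dictionary d. get_close_matches takes the word first and the candidates second, and returns a list (best match first, with the word itself included if it is among the candidates). Because the result is a list, the question's del outputList[skeys] would fail with TypeError (lists are unhashable), so the sketch deletes the matched keys one at a time.

```python
import difflib

d = {'Light1on': 0, 'Light1off': 2, 'Fan1on': 4}

# Second argument: the candidates to compare 'Light1on' against
matches = difflib.get_close_matches('Light1on', list(d), n=2, cutoff=0.75)
print(matches)  # ['Light1on', 'Light1off']

# del d[matches] would raise TypeError, so remove each matched key individually
for key in matches:
    del d[key]

print(d)  # {'Fan1on': 4}
```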
All of that said, I have a feeling that you can accomplish your fundamental goals much more simply. If you spend a few minutes describing what you're really trying to do (this shouldn't involve any references to data types, it should just involve a description of your data and your goals), then I bet someone on this site can help you solve that problem much more simply!
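One possible simpler approach, as a minimal sketch assuming every key ends in either "on" or "off" and the part before that names the device (e.g. Light1). It swaps difflib for plain suffix matching: string similarity is a fragile way to pair these names, since 'Light1on' and 'Light2on' differ by a single character and can score as more similar to each other than 'Light1on' and 'Light1off'. The function name pair_on_off is hypothetical.

```python
def pair_on_off(events):
    """Group keys like 'Light1on' / 'Light1off' by device name."""
    table = {}
    for key, value in events.items():
        if key.endswith('off'):
            device, state = key[:-3], 'off'
        elif key.endswith('on'):
            device, state = key[:-2], 'on'
        else:
            continue  # skip keys that don't follow the <device>on/<device>off pattern
        table.setdefault(device, {})[state] = value
    return table


events = {'Light1on': 0, 'Light2off': 1, 'Light1off': 2, 'Light2on': 3}
print(pair_on_off(events))
# {'Light1': {'on': 0, 'off': 2}, 'Light2': {'off': 1, 'on': 3}}
```

No keys are deleted along the way, so the "dictionary changed size during iteration" problem never comes up. If you use pandas, pandas.DataFrame.from_dict(table, orient='index') should turn the result into a table with one row per device and on/off columns.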
I have lists of items:
['MRS_103_005_010_BG_001_v001',
'MRS_103_005_010_BG_001_v002',
'MRS_103_005_010_FG_001_v001',
'MRS_103_005_010_FG_001_v002',
'MRS_103_005_010_FG_001_v003',
'MRS_103_005_020_BG_001_v001',
'MRS_103_005_020_BG_001_v002',
'MRS_103_005_020_BG_001_v003']
I need to identify the latest version of each item and store it to a new list. Having trouble with my logic.
Based on how this has been built, I believe I first need to compare the entries to each other. If I find a match, I then check which version number is greater.
I figured I first needed to check whether the folder names matched between the current index and the next index. I did this by making two index variables, starting at 0 and 1, so I could do a staggered, incremental comparison of the list against itself. If the two entries matched, I then needed to check the vXXX number at the end; whichever one was highest would be appended to the new list.
I suspect that the problem lies in one copy of the list getting to an empty index before the other one does but I'm unsure of how to compensate for that.
Again, I am not a programmer by trade. Any help would be appreciated! Thank you.
# Preparing variables for filtering the folders
versions = foundVerList
verAmountTotal = len(foundVerList)
verIndex = 0
verNextIndex = 1
highestVerCount = 1
filteredVersions = []

# Filtering, this will find the latest version of each folder and store to a list
while verIndex < verAmountTotal:
    try:
        nextVer = (versions[verIndex])
        nextVerCompare = (versions[verNextIndex])
    except IndexError:
        verNextIndex -= 1
    if nextVer[0:24] == nextVerCompare[0:24]:
        if nextVer[-3:] < nextVerCompare [-3:]:
            filteredVersions.append(nextVerCompare)
        else:
            filteredVersions.append(nextVer)
    verIndex += 1
    verNextIndex += 1
My expected output is:
print filteredVersions
['MRS_103_005_010_BG_001_v002', 'MRS_103_005_010_FG_001_v003']
['MRS_103_005_020_BG_001_v003']
The actual output is:
print filteredVersions
['MRS_103_005_010_BG_001_v002', 'MRS_103_005_010_FG_001_v002',
'MRS_103_005_010_FG_001_v003']
['MRS_103_005_020_BG_001_v002', 'MRS_103_005_020_BG_001_v003']
During the while loop I am using os.listdir on each folder referenced via verIndex. I believe the problem is that a separate list is generated for every folder that is searched, but I want all the searches combined into a single list, which will THEN go through the groupby and sorted steps.
Seems like a case for itertools.groupby:
from itertools import groupby
grouped = groupby(data, key=lambda version: version.rsplit('_', 1)[0])
result = [sorted(group, reverse=True)[0] for key, group in grouped]
print(result)
Output:
['MRS_103_005_010_BG_001_v002',
'MRS_103_005_010_FG_001_v003',
'MRS_103_005_020_BG_001_v003']
This groups the entries by everything before the last underscore, which I understand to be the "item code".
Then, it sorts each group in reverse order. The elements of each group differ only by the version, so the entry with the highest version number will be first.
Lastly, it extracts the first entry from each group, and puts it back into a result list.
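One caveat: groupby only groups consecutive items that share a key, so the code above relies on the list already being sorted by item code (as it is in the question). If entries from several folders are combined into one list, that may not hold. The sketch below, with hypothetical helper names, sorts by item code first and compares versions as integers, so an unpadded or longer version such as v1000 would still beat v999, which plain string sorting would not guarantee.

```python
from itertools import groupby


def item_code(name):
    """Everything before the last underscore, e.g. 'MRS_103_005_010_BG_001'."""
    return name.rsplit('_', 1)[0]


def version_number(name):
    """Numeric version from the suffix, e.g. 'v003' -> 3."""
    return int(name.rsplit('_', 1)[1].lstrip('v'))


def latest_versions(names):
    """Return the highest version of each item code, whatever the input order."""
    ordered = sorted(names, key=item_code)  # groupby only merges adjacent items
    return [max(group, key=version_number)
            for _, group in groupby(ordered, key=item_code)]


data = ['MRS_103_005_020_BG_001_v002',
        'MRS_103_005_010_BG_001_v001',
        'MRS_103_005_010_FG_001_v003',
        'MRS_103_005_010_BG_001_v002',
        'MRS_103_005_020_BG_001_v003',
        'MRS_103_005_010_FG_001_v001']
print(latest_versions(data))
# ['MRS_103_005_010_BG_001_v002', 'MRS_103_005_010_FG_001_v003', 'MRS_103_005_020_BG_001_v003']
```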
Try this:
text = """MRS_103_005_010_BG_001_v001
MRS_103_005_010_BG_001_v002
MRS_103_005_010_FG_001_v001
MRS_103_005_010_FG_001_v002
MRS_103_005_010_FG_001_v003
MRS_103_005_020_BG_001_v001
MRS_103_005_020_BG_001_v002
MRS_103_005_020_BG_001_v003
"""
result = {}
versions = text.splitlines()
for item in versions:
    v = item.split('_')
    num = int(v.pop()[1:])   # 'v003' -> 3
    name = item[:-3]         # everything up to and including the 'v'
    if result.get(name, 0) < num:
        result[name] = num
# Pad the number back to three digits so the names match the originals
filteredVersions = ['%s%03d' % (k, v) for k, v in result.items()]
print(filteredVersions)
output:
['MRS_103_005_010_BG_001_v002', 'MRS_103_005_010_FG_001_v003', 'MRS_103_005_020_BG_001_v003']
I have written code that uses xlrd to import data into two dictionaries in Python.
Code:
import xlrd

#category.clear()
#term.clear()

book = xlrd.open_workbook(r"C:\Users\Koen\Google Drive\etc...etc..")
sheet = book.sheet_by_index(0)
num_rows = sheet.nrows
for i in range(1,num_rows,1):
    category = {i:( sheet.cell_value(i, 0))}
    term = {i:( sheet.cell_value(i, 1))}
When I open one of the two dictionaries (category or term), it presents me with a list of values:
print(category[i])
So far, so good.
However, when I try to access an individual value with print(category["2"]), I consistently get an error:
Traceback (most recent call last):
File "testfile", line 15, in <module>
print(category["2"])
KeyError: '2'
The keys are indeed numbered (as determined by i).
I've already tried various combinations of brackets and quotes ([], {}, "", ''), etc. Nothing works.
As I need those values later on in the code, I would like to know what the cause of the key-error is.
Thanks in advance for taking a look!
First off, you are reassigning category and term in every iteration of the for loop, so each dictionary only ever holds one key, and after the loop it is the key from the last iteration: if your sheet has 100 rows, the dict will only have the key 99. To fix this, define the dictionaries before the loop and assign the keys inside it, like this:
category = {}
term = {}
for i in range(1, num_rows, 1):
    category[i] = (sheet.cell_value(i, 0))
    term[i] = (sheet.cell_value(i, 1))
Second, because the keys come from for i in range(1, num_rows, 1):, they are integers, so you have to access them as category[1] (or category[2]), not category["2"]. If you want string keys, convert them when you assign, e.g. category[str(i)] = ...
I hope this clarifies the problem.
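A small sketch of both points, with a hypothetical rows list standing in for the sheet.cell_value(i, 0) and sheet.cell_value(i, 1) calls so it runs without xlrd or the workbook:

```python
# Hypothetical stand-in for the worksheet: row 0 is a header, as in the question
rows = [("Category", "Term"), ("fruit", "apple"), ("vegetable", "leek")]

category = {}
term = {}
for i in range(1, len(rows)):  # same shape as range(1, num_rows, 1)
    category[i] = rows[i][0]
    term[i] = rows[i][1]

print(category)           # {1: 'fruit', 2: 'vegetable'}
print(category[2])        # vegetable (int key)
print(category.get("2"))  # None: the str "2" is a different key from the int 2

# If string keys are really wanted, convert them when building the dict
category_by_str = {str(k): v for k, v in category.items()}
print(category_by_str["2"])  # vegetable
```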
I need to write a function that accepts a dictionary as the inventory and a product_list of (name, number) pairs, each indicating that the inventory of that product should be updated by adding number to it (number may be negative).
When a product is mentioned for the first time it is added to the dictionary, and when its count reaches zero it should remain in the dictionary. If the count ever becomes negative, I need to raise a ValueError.
Example:
d = {"apple":50, "pear":30, "orange":25}
ps = [("apple",20),("pear",-10),("grape",18)]
shelve(d,ps)
d
{'pear': 20, 'grape': 18, 'orange': 25, 'apple': 70}
shelve(d,[("apple",-1000)])
Traceback (most recent call last):
ValueError: negative amount for apple
My code gives either an unexpected EOF error or invalid syntax, depending on whether I include the last print line. It definitely doesn't accomplish the goal yet, but I believe this is roughly the format and logic I'll need. I need the function to print 'negative amount for x', where x is the product whose count goes negative. Any help with this is appreciated.
Code:
def shelve(inventory,product_list):
    count = 0
    try:
        for x in product_list:
            if x == True:
                product_list.append(x)
                count += key
            else:
                return product_list
    except ValueError:
        print ('negative amount for (product)')

print "hello program starts here"
d = {"apple":50, "pear":30, "orange":25}
ps = [("apple",20),("pear",-10),("grape",18)]
shelve(d,ps)
The important part of your task is to split the problem into subproblems. Working with dicts and lists mostly comes down to iterating over them, so start simple and do one step at a time. One way to solve the problem could be:
1.) Iterate over the product list (you can print the items to see what is happening). This will be the product loop.
for x in ps:
    print(x)
Check how you can access the tuple's elements, e.g. by changing print(x) to print(x[0]) or print(x[1]).
2.) Now, for every product in the product loop, you need to update the corresponding value in the inventory. Start by just iterating over the inventory and printing its contents. Check how that works and play around with it before doing anything more complicated.
(I just noticed there is a simpler solution than iterating: since the inventory is a dict, you can look each product up directly by its name.)
3.) Finally, add the ValueError handling: raise a ValueError when a count would become negative.
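For reference, here is one way the three steps could fit together. It is a sketch, assuming the inventory should be changed in place and that an update which would go negative should raise before that product is touched; updates earlier in the same product_list have already been applied at that point.

```python
def shelve(inventory, product_list):
    """Add each (name, number) change to inventory, in place.

    Products not yet in the inventory start at 0. A count of 0 stays in the
    dict; a count that would drop below 0 raises ValueError instead.
    """
    for name, number in product_list:
        new_count = inventory.get(name, 0) + number  # direct dict lookup, no inner loop
        if new_count < 0:
            raise ValueError("negative amount for %s" % name)
        inventory[name] = new_count


d = {"apple": 50, "pear": 30, "orange": 25}
shelve(d, [("apple", 20), ("pear", -10), ("grape", 18)])
print(d)  # {'apple': 70, 'pear': 20, 'orange': 25, 'grape': 18}

try:
    shelve(d, [("apple", -1000)])
except ValueError as exc:
    print(exc)  # negative amount for apple
```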
Hope this helps
I am trying to sort a file of sequences according to a certain parameter. The data looks as follows:
ID1 ID2 32
MVKVYAPASSANMSVGFDVLGAAVTP ...
ID1 ID2 18
MKLYNLKDHNEQVSFAQAVTQGLGKN ...
....
There are about 3000 sequences like this, i.e. the first line contains two ID fields and one rank field (the sorting key), while the second line contains the sequence. My approach is to open the file, convert the file object to a list, separate the annotation lines (ID1, ID2, rank) from the sequence lines (annotation lines always occur at even indices, sequence lines at odd indices), merge them into a dictionary, and sort the dictionary by the rank field. The code reads like so:
#!/usr/bin/python
with open("unsorted.out","rb") as f:
    f = f.readlines()
    assert type(f) == list, "ERROR: file object not converted to list"

annot=[]
seq=[]
for i in range(len(f)):
    # IDs
    if i%2 == 0:
        annot.append(f[i])
    # Sequences
    elif i%2 != 0:
        seq.append(f[i])

# Make dictionary
ids_seqs = {}
ids_seqs = dict(zip(annot,seq))

# Solub rankings are the third field of the annot list, i.e. annot[i].split()[2]
# Use this index notation to rank sequences according to solubility measurements
sorted_niwa = sorted(ids_seqs.items(), key = lambda val: val[0].split()[2], reverse=False)

# Save to file
with open("sorted.out","wb") as out:
    out.write("".join("%s %s" % i for i in sorted_niwa))
The problem I have encountered is that when I open the sorted file to inspect manually, as I scroll down I notice that some sequences have been wrongly sorted. For example, I see the rank 9 placed after rank 89. Up until a certain point the sorting is correct, but I don't understand why it hasn't worked throughout.
Many thanks for any help!
Sounds like you're comparing strings instead of numbers. "9" > "89" because the character '9' comes lexicographically after the character '8'. Try converting to integers in your key.
sorted_niwa = sorted(ids_seqs.items(), key = lambda val: int(val[0].split()[2]), reverse=False)
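A small sketch with made-up lines shows the difference. It pairs each annotation line with the following sequence line using slicing and zip, and keeps the pairs in a list rather than a dict; that is a design choice, not required by the fix, but a dict would silently collapse two records whose annotation lines happened to be identical. The function name sort_records is hypothetical.

```python
def sort_records(lines):
    """Pair each annotation line with the sequence line after it; sort by numeric rank."""
    pairs = list(zip(lines[0::2], lines[1::2]))  # [(annotation, sequence), ...]
    return sorted(pairs, key=lambda pair: int(pair[0].split()[2]))


lines = ["ID1 ID2 89", "SEQA", "ID3 ID4 9", "SEQB", "ID5 ID6 32", "SEQC"]

# String keys compare character by character, so "9" lands after "89"
by_string = sorted(lines[0::2], key=lambda ann: ann.split()[2])
print(by_string)  # ['ID5 ID6 32', 'ID1 ID2 89', 'ID3 ID4 9']

for annotation, sequence in sort_records(lines):
    print(annotation, sequence)
# ID3 ID4 9 SEQB
# ID5 ID6 32 SEQC
# ID1 ID2 89 SEQA
```

With a real file, lines could come from f.read().splitlines(), which also drops the trailing newlines.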