In my list (not a dictionary) I have these strings:
"K:60",
"M:37",
"M_4:47",
"M_5:89",
"M_6:91",
"N:15",
"O:24",
"P:50",
"Q:50",
"Q_7:89"
In the output I need to have:
"K:60",
"M_6:91",
"N:15",
"O:24",
"P:50",
"Q_7:89"
What is a possible solution?
Or, more generally, how can I keep only the string with the maximum value among strings that share the same tag?
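Use re.split and a list comprehension, as shown below. This relies on the fact that when the dictionary dct is created, only the last value is kept for each repeated key. That gives the maximum only because, in your list, the entry with the highest value is the last one for each tag.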
import re
lst = [
"K:60",
"M:37",
"M_4:47",
"M_5:89",
"M_6:91",
"N:15",
"O:24",
"P:50",
"Q:50",
"Q_7:89"
]
dct = dict([ (re.split(r'[:_]', s)[0], s) for s in lst])
lst_uniq = list(dct.values())
print(lst_uniq)
# ['K:60', 'M_6:91', 'N:15', 'O:24', 'P:50', 'Q_7:89']
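If the entries for a tag are not already in ascending order of value, one way to keep the same "last value wins" trick is to sort the list by its numeric part first. This is a sketch, not part of the original answer, and the shuffled input list is hypothetical:

```python
import re

# Hypothetical input in which the largest value for a tag is NOT always last.
lst = ["Q_7:89", "K:60", "M_6:91", "M:37", "Q:50", "N:15"]

# Sort ascending by the number after ':' so that, within each tag,
# the largest value comes last; the dict then keeps that last string.
by_value = sorted(lst, key=lambda s: int(s.split(':')[1]))
dct = {re.split(r'[:_]', s)[0]: s for s in by_value}

# Sorting by value scrambled the tag order, so list the tags alphabetically.
lst_uniq = [dct[tag] for tag in sorted(dct)]
print(lst_uniq)
```

The final re-sort by tag assumes alphabetical tag order is acceptable; for equal values, the stable sort keeps whichever entry came last in the input.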
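Probably far from the cleanest, but here is a method that is quite easy to understand. It assumes each tag is a single character (x[0]), as in your example.
l = ["K:60", "M:37", "M_4:47", "M_5:89", "M_6:91", "N:15", "O:24", "P:50", "Q:50", "Q_7:89"]
reponse = []       # tags seen so far (first character of each string)
val = []           # best value found for each tag
complete_val = []  # full name ("M_6") that holds the best value
for x in l:
    name, number = x.split(':')
    if x[0] not in reponse:
        reponse.append(x[0])
        complete_val.append(name)
        val.append(int(number))
    elif int(number) > val[reponse.index(x[0])]:
        i = reponse.index(x[0])
        val[i] = int(number)
        complete_val[i] = name  # keep the full name, e.g. "M_6", not just "M"
for x in range(len(complete_val)):
    print(str(complete_val[x]) + ":" + str(val[x]))
which prints:
K:60
M_6:91
N:15
O:24
P:50
Q_7:89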
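Because entries with the same tag are adjacent in the question's list, the same one-pass idea can also be written with itertools.groupby and max. This is a sketch that relies on that adjacency (groupby only merges neighbouring items); the helper names tag and value are my own:

```python
from itertools import groupby

lst = ["K:60", "M:37", "M_4:47", "M_5:89", "M_6:91",
       "N:15", "O:24", "P:50", "Q:50", "Q_7:89"]

def tag(s):
    # "M_6:91" -> "M"
    return s.split(':')[0].split('_')[0]

def value(s):
    # "M_6:91" -> 91
    return int(s.split(':')[1])

# For each run of adjacent strings sharing a tag, keep the one with the largest value.
result = [max(group, key=value) for _, group in groupby(lst, key=tag)]
print(result)
```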
I do not see any straightforward technique. Other than iterating over the whole list and computing the result yourself, I don't think any built-in can be used. I have written this so that your values do not need to be sorted in the input.
But I like the answer posted by Timur Shtatland; you can make use of that if your values are already sorted in the input.
a = ["K:60", "M:37", "M_4:47", "M_5:89", "M_6:91", "N:15", "O:24", "P:50", "Q:50", "Q_7:89"]  # the list from the question
intermediate = {}  # tag -> (best value so far, original string)
for item in a:
    key, val = item.split(':')
    key = key.split('_')[0]  # "M_6" -> "M"
    val = int(val)
    if intermediate.get(key, (float('-inf'), None))[0] < val:
        intermediate[key] = (val, item)
ans = [x[1] for x in intermediate.values()]
print(ans)
which gives:
['K:60', 'M_6:91', 'N:15', 'O:24', 'P:50', 'Q_7:89']
How do I write a function in Python that can:
iterate through a list of word strings (which may contain duplicate words), looking each word up in a dictionary,
find the word with the highest absolute sum, and
output it along with the corresponding absolute value.
The function also has to ignore words which are not in the dictionary.
For example,
Assume the function is called H_abs_W().
Given the following list and dict:
list_1 = ['apples','oranges','pears','apples']
Dict_1 = {'apples':5.23,'pears':-7.62}
Then calling the function as:
H_abs_W(list_1,Dict_1)
Should give the output:
'apples',10.46
EDIT:
I managed to do it in the end with the code below. Looking over the answers, it turns out I could have done it in a shorter way, lol.
def H_abs_W(list_1,Dict_1):
    freqW = {}
    for char in list_1:
        if char in freqW:
            freqW[char] += 1
        else:
            freqW[char] = 1
    ASum_W = 0
    i_word = ''
    for a,b in freqW.items():
        x = 0
        d = Dict_1.get(a,0)
        x = abs(float(b)*float(d))
        if x > ASum_W:
            ASum_W = x
            i_word = a
    return(i_word,ASum_W)
list_1 = ['apples','oranges','pears','apples']
Dict_1 = {'apples':5.23,'pears':-7.62}
d = {k:0 for k in list_1}
for x in list_1:
    if x in Dict_1:
        d[x] += Dict_1[x]  # add the word's value once per occurrence
m = max(d, key=lambda k: abs(d[k]))  # word with the largest absolute sum
print(m, abs(d[m]))
Try this:
key, value = max(((w, list_1.count(w) * v) for w, v in Dict_1.items()),
                 key=lambda kv: abs(kv[1]))
print(f"{key}, {abs(value)}")
# apples, 10.46
You can use Counter to calculate the frequency (number of occurrences) of each item in the list. Multiplying each count by the word's value in the dictionary gives that word's sum, and max(sums, key=sums.get) then picks the word whose absolute sum is largest. Words that are not in the dictionary are skipped.
from collections import Counter
def H_abs_W(list_1, Dict_1):
    counter = Counter(list_1)
    # absolute sum (count * value) for each word that is in the dictionary
    sums = {w: abs(c * Dict_1[w]) for w, c in counter.items() if w in Dict_1}
    item = max(sums, key=sums.get)
    return item, sums[item]
I have a question where I need to implement the word ladder problem with different logic.
In each step, the player must either add one letter to the word
from the previous step, or take away one letter, and then rearrange the letters to make a new word.
croissant(-C) -> arsonist(-S) -> aroints(+E)->notaries(+B)->baritones(-S)->baritone
The new word must be a valid word from wordList.txt, which is a dictionary of words.
My code looks like this.
First I calculate the characters to remove ("remove_list") and the characters to add ("add_list"), and store them in lists.
Then I read the file and store the words in a dictionary keyed by their sorted letters.
Then I start removing letters from and adding letters to the start word, and match the result against the dictionary.
But the challenge now is that some words, after a deletion or addition, don't match anything in the dictionary, so the search misses the goal.
In that case, it should backtrack to the previous step and add instead of subtracting.
I am looking for some sort of recursive function that could help with this, or completely new logic that would help me achieve the output.
Sample of my code:
start = 'croissant'
goal = 'baritone'
list_start = list(start)  # list of characters (map() is lazy in Python 3)
list_goal = list(goal)
remove_list = [x for x in list_start if x not in list_goal]
add_list = [x for x in list_goal if x not in list_start]
file = open('wordList.txt','r')
dict_words = {}
for word in file:
    strip_word = word.rstrip()
    dict_words[''.join(sorted(strip_word))]=strip_word
file.close()
final_list = []
flag_remove = 0
for i in remove_list:
    sorted_removed_list = sorted(start.replace(''.join(map(str, i)),"",1))
    sorted_removed_string = ''.join(map(str, sorted_removed_list))
    if sorted_removed_string in dict_words.keys():
        print(dict_words[sorted_removed_string])
        final_list.append(sorted_removed_string)
        flag_remove = 1
        start = sorted_removed_string
print(final_list)
flag_add = 0
for i in add_list:
    first_character = ''.join(map(str,i))
    sorted_joined_list = sorted(''.join([first_character, final_list[-1]]))
    sorted_joined_string = ''.join(map(str, sorted_joined_list))
    if sorted_joined_string in dict_words.keys():
        print(dict_words[sorted_joined_string])
        final_list.append(sorted_joined_string)
        flag_add = 1
        sorted_removed_string = sorted_joined_string
Recursion-based backtracking isn't a good idea for a search problem of this sort. Depth-first recursion blindly goes down the search tree without exploiting the fact that two words are almost never more than 10-12 steps away from each other, so it can go very deep and cause a stack overflow (or exceed the recursion limit in Python).
The solution here uses breadth-first search. It uses mate(s) as a helper, which, given a word s, finds all possible words we can travel to next. mate in turn uses a global dictionary wdict, pre-processed at the beginning of the program, which for a given word finds all its anagrams (i.e. rearrangements of its letters).
from collections import deque
# Read the word list once; strip() removes the trailing newline.
with open("wordsEn.txt") as f:
    words = {line.strip() for line in f if line.strip()}
# wdict maps a word's sorted letters to every word made of those letters.
wdict = {}
for w in words:
    s = ''.join(sorted(w))
    wdict.setdefault(s, []).append(w)
def mate(s):
    # Candidates: s with one letter removed, or s with one letter a-z added.
    ans = [s[:c] + s[c+1:] for c in range(len(s))]
    for c in range(97, 123): ans.append(s + chr(c))
    # Every real word that is an anagram of a candidate is a valid next step.
    for m in ans: yield from wdict.get(''.join(sorted(m)), [])
def bfs(start, goal):
    already = {start}
    prev = {}
    q = deque([start])
    while q:
        cur = q.popleft()
        if cur == goal:
            ans = []
            while cur:
                ans.append(cur)
                cur = prev.get(cur)
            return ans[::-1]  # reverse the path so it runs start -> goal
        for m in mate(cur):
            if m not in already:
                already.add(m)
                q.append(m)
                prev[m] = cur
print(bfs('croissant','baritone'))
which outputs: ['croissant', 'arsonist', 'rations', 'senorita', 'baritones', 'baritone']
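To try the breadth-first search without a word file, here is a self-contained sketch of the same idea that takes the word list as an argument. The function names and the four-word dictionary are hypothetical, chosen only so the expected path is easy to check by hand:

```python
from collections import deque
from string import ascii_lowercase

def build_anagram_index(words):
    # sorted letters -> list of words made of exactly those letters
    index = {}
    for w in words:
        index.setdefault(''.join(sorted(w)), []).append(w)
    return index

def neighbours(word, index):
    # one letter removed, or one lowercase letter added, then any rearrangement
    candidates = [word[:i] + word[i+1:] for i in range(len(word))]
    candidates += [word + c for c in ascii_lowercase]
    for cand in candidates:
        yield from index.get(''.join(sorted(cand)), [])

def ladder(start, goal, words):
    index = build_anagram_index(words)
    prev = {start: None}          # also serves as the "already visited" set
    q = deque([start])
    while q:
        cur = q.popleft()
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]     # start -> goal
        for nxt in neighbours(cur, index):
            if nxt not in prev:
                prev[nxt] = cur
                q.append(nxt)
    return None                   # goal unreachable with these words

# Tiny hypothetical word list, just for demonstration.
print(ladder('cat', 'star', {'cat', 'at', 'tar', 'star'}))
```

The explicit `return None` documents the unreachable-goal case, which the script version above handles implicitly.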
I have a list with the following structure:
[('0','927','928'),('2','693','694'),('2','742','743'),('2','776','777'),('2','804','805'),
('2','987','988'),('2','997','998'),('2','1019','1020'),
('2','1038','1039'),('2','1047','1048'),('2','1083','1084'),('2','659','660'),
('2','677','678'),('2','743','744'),('2','777','778'),('2','805','806'),('2','830','831')]
The first number is an id, the second is the position of a word, and the third is the position of a second word. What I need to do, and am struggling with, is finding sets of words that are next to each other.
These results are given for searches of 3 words, so there are the positions of word 1 with word 2 and the positions of word 2 with word 3. For example:
I run the phrase query "women in science" and get the values given in the list above, so ('2','776','777') is the result for 'women in' and ('2','777','778') is the result for 'in science'.
I need to find a way to match these results up, so that for every document the words are grouped together according to the number of words in the query (so if there are 4 words in the query, there will be 3 results that need to be matched together).
Is this possible?
You need to quickly find word info by its position. Create a dictionary keyed by word position:
# from your example; I wonder why you use strings and not numbers.
positions = [('0','927','928'),('2','693','694'),('2','742','743'),('2','776','777'),('2','804','805'),
('2','987','988'),('2','997','998'),('2','1019','1020'),
('2','1038','1039'),('2','1047','1048'),('2','1083','1084'),('2','659','660'),
('2','677','678'),('2','743','744'),('2','777','778'),('2','805','806'),('2','830','831')]
# create the dictionary
dict_by_position = {w_pos:(w_id, w_next) for (w_id, w_pos, w_next) in positions}
Now it's a piece of cake to follow chains:
>>> dict_by_position['776']
('2', '777')
>>> dict_by_position['777']
('2', '778')
Or programmatically:
def followChain(start, position_dict):
    result = []
    scanner = start
    while scanner in position_dict:
        next_item = position_dict[scanner]
        result.append(next_item)
        unused_id, scanner = next_item  # unpack the (id, next_position)
    return result
>>> followChain('776', dict_by_position)
[('2', '777'), ('2', '778')]
Finding all chains that are not subchains of each other: a chain starts at a position that is not the "next" position of any pair.
next_positions = {nxt for (_, nxt) in dict_by_position.values()}
for start in dict_by_position:
    if start not in next_positions:
        chain = followChain(start, dict_by_position)
        print(chain)  # or do something reasonable instead
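The dictionary above is keyed by position only, so positions from different documents could collide. Here is a sketch of a variant keyed by (document id, position) that keeps only chains as long as the query. It assumes, as in the question, that an n-word phrase shows up as n-1 linked pairs; phrase_matches is a hypothetical name:

```python
positions = [('0','927','928'),('2','693','694'),('2','742','743'),('2','776','777'),('2','804','805'),
             ('2','987','988'),('2','997','998'),('2','1019','1020'),
             ('2','1038','1039'),('2','1047','1048'),('2','1083','1084'),('2','659','660'),
             ('2','677','678'),('2','743','744'),('2','777','778'),('2','805','806'),('2','830','831')]

def phrase_matches(pairs, query_len):
    """Return (doc_id, [positions]) for every run of query_len linked positions."""
    # (doc_id, position) -> next position, so documents cannot collide
    links = {(d, p): n for d, p, n in pairs}
    matches = []
    for (doc_id, pos) in links:
        chain = [pos]
        # follow the links until the chain is as long as the query
        while len(chain) < query_len and (doc_id, chain[-1]) in links:
            chain.append(links[(doc_id, chain[-1])])
        if len(chain) == query_len:
            matches.append((doc_id, chain))
    return matches

print(phrase_matches(positions, 3))  # 3-word query such as "women in science"
```

Because every start position is tried, a 3-word match is reported once per complete chain; shorter fragments such as 743 -> 744 are dropped by the length check.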
The following will do what you're asking, as I understand it. It's not the prettiest output in the world, and if possible you should use numbers if numbers are what you're trying to work with.
There are probably more elegant solutions, and simplifications that could be made to this:
positions = [('0','927','928'),('2','693','694'),('2','742','743'),('2','776','777'),('2','804','805'),
('2','987','988'),('2','997','998'),('2','1019','1020'),
('2','1038','1039'),('2','1047','1048'),('2','1083','1084'),('2','659','660'),
('2','677','678'),('2','743','744'),('2','777','778'),('2','805','806'),('2','830','831')]
sorted_dict = {}
doc_ids = []
def sort_func(positions):
    # Collect the distinct document ids in order of first appearance.
    for item in positions:
        if item[0] not in doc_ids:
            doc_ids.append(item[0])
    # For each document, gather every word position that occurs in its pairs.
    for doc_id in doc_ids:
        sorted_set = []
        for item in positions:
            if item[0] != doc_id:
                continue
            if item[1] not in sorted_set:
                sorted_set.append(item[1])
            if item[2] not in sorted_set:
                sorted_set.append(item[2])
        # Sort numerically: a plain string sort would put '1019' before '659'.
        sorted_dict[doc_id] = sorted(sorted_set, key=int)
    # Split each document's positions into runs of consecutive numbers.
    for k in sorted_dict:
        group = []
        grouped_list = []
        for i in sorted_dict[k]:
            if group and int(i) - 1 == int(group[-1]):
                group.append(i)  # i continues the current run
            else:
                if group:
                    grouped_list.append(group)
                group = [i]  # i starts a new run
        if group:
            grouped_list.append(group)  # keep the final run too
        sorted_dict[k] = grouped_list
    return sorted_dict
print(sort_func(positions))
The output for the above is:
{'0': [['927', '928']], '2': [['659', '660'], ['677', '678'], ['693', '694'], ['742', '743', '744'], ['776', '777', '778'], ['804', '805', '806'], ['830', '831'], ['987', '988'], ['997', '998'], ['1019', '1020'], ['1038', '1039'], ['1047', '1048'], ['1083', '1084']]}
I have a list containing strings of the form 'Country-Points'.
For example:
lst = ['Albania-10', 'Albania-5', 'Andorra-0', 'Andorra-4', 'Andorra-8', ...other countries...]
I want to calculate the average for each country without creating a new list. So the output would be (in the case above):
lst = ['Albania-7.5', 'Andorra-4.0', ...other countries...]
I would really appreciate it if anyone can help me with this.
EDIT:
This is what I've got so far. "data" is actually a dictionary, where the keys are countries and the values are lists of the points other countries gave to this country (the key). Again, I'm new to Python, so I don't really know all the built-in functions.
for key in self.data:
    lst = []
    index = 0
    score = 0
    cnt = 0
    s = str(self.data[key][0]).split("-")[0]
    for i in range(len(self.data[key])):
        if s in self.data[key][i]:
            a = str(self.data[key][i]).split("-")
            score += int(float(a[1]))
            cnt+=1
            index+=1
        if i+1 != len(self.data[key]) and not s in self.data[key][i+1]:
            lst.append(s + "-" + str(float(score/cnt)))
            s = str(self.data[key][index]).split("-")[0]
            score = 0
    self.data[key] = lst
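itertools.groupby with a suitable key function can help. Note that groupby only merges adjacent items, so this assumes each country's entries are next to each other, as in your example: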
import itertools
def get_country_name(item):
    return item.split('-', 1)[0]
def get_country_value(item):
    return float(item.split('-', 1)[1])
def country_avg_grouper(lst):
    for ctry, group in itertools.groupby(lst, key=get_country_name):
        values = list(get_country_value(c) for c in group)
        avg = sum(values)/len(values)
        yield '{country}-{avg}'.format(country=ctry, avg=avg)
lst[:] = country_avg_grouper(lst)
The key here is that I wrote a function to do the change out of place and then I can easily make the substitution happen in place by using slice assignment.
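If the list is not already grouped by country, groupby would report the same country several times. A minimal sketch of the same grouper idea is to sort by country name first. This builds one sorted copy, which is then written back in place with slice assignment. The unsorted sample list is hypothetical, and rsplit is my own choice so that country names containing a hyphen still split correctly (it assumes the points are never negative):

```python
import itertools

def get_country_name(item):
    # split on the LAST hyphen: "Guinea-Bissau-5" -> "Guinea-Bissau"
    return item.rsplit('-', 1)[0]

def country_avg_grouper(lst):
    for ctry, group in itertools.groupby(lst, key=get_country_name):
        values = [float(item.rsplit('-', 1)[1]) for item in group]
        yield '{}-{}'.format(ctry, sum(values) / len(values))

# Hypothetical input where each country's entries are NOT adjacent.
lst = ['Andorra-0', 'Albania-10', 'Andorra-4', 'Albania-5', 'Andorra-8']
lst[:] = country_avg_grouper(sorted(lst, key=get_country_name))
print(lst)
```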
I would probably do this with an intermediate dictionary.
def country(s):
    return s.split('-')[0]
def value(s):
    return float(s.split('-')[1])
def country_average(lst):
    country_map = {}  # country -> (running total, count)
    for point in lst:
        c = country(point)
        v = value(point)
        old = country_map.get(c, (0, 0))
        country_map[c] = (old[0] + v, old[1] + 1)
    return ['%s-%f' % (c, total / count)
            for (c, (total, count)) in country_map.items()]
It tries hard to traverse the original list only once, at the expense of quite a few tuple allocations.