Unclear error when I try to reverse dictionary python - python

This is my code:
my_dict = {'Anthony Hopkins': ' Hannibal, The Edge, Meet Joe Black, Proof', 'Julia Roberts': ' Pretty Woman, Oceans Eleven, Runaway Bride', 'Salma Hayek': ' Desperado, Wild Wild West', 'Gwyneth Paltrow': ' Shakespeare in Love, Bounce, Proof', 'Meg Ryan': ' You have got mail, Sleepless in Seattle', 'Russell Crowe': ' Gladiator, A Beautiful Mind, Cinderella Man, American Gangster' .....}
dictrev={}
for i in mydict:
    for j in mydict[i] :
        if j not in dictrev:
            dictrev.setdefault(j, []).append(i)
print (dictrev)
The problem is that when I debug, I see that the inner loop (this line: for j in mydict[i]:) reads the value one character at a time, but I need each complete value (there are multiple values per key).
Any suggestions as to what the problem is?
Thank you very much for your help.

Could you please format your code like this:
    do whatever
You do that by pressing Enter twice, then indenting each line of code by four spaces. To type normally after that, start a new line without the four leading spaces.
If I understand what you are asking, you want to swap the key and value of the dictionary, and you are getting an error while doing so. I cannot read your unformatted code (no offense), so I will provide a dictionary swapping technique that works for me.
my_dict = {1: "bob", 2: "bill", 3: "rob"}
new_dict = {}
for key in my_dict:
    new_key = my_dict[key]
    new_value = key
    new_dict.update({new_key:new_value})
print(new_dict)
This code works by having the original dictionary, my_dict and the uncompleted reversed dictionary, new_dict. It iterates through my_dict, which only provides the key, and using that key, it finds the value. The value that we want to be a key is assigned to new_key and the key that we want to be a value is assigned to new_value. It then updates the reversed dictionary with the new key/value. The final line prints the new, reversed dictionary. If you want to set my_dict to the reversed dict, use my_dict = new_dict. I hope this answers your question.
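For comparison, the same key/value swap can be sketched as a dict comprehension. This is only an alternative spelling of the loop above, and it assumes the values are hashable and unique; if two keys share a value, the later one silently wins.

```python
my_dict = {1: "bob", 2: "bill", 3: "rob"}

# Build the reversed dictionary in one pass: each value becomes a key.
new_dict = {value: key for key, value in my_dict.items()}

print(new_dict)  # {'bob': 1, 'bill': 2, 'rob': 3}
```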

As has been pointed out in the comments, the values in your dict are strings, thus iterating over them will produce single characters. Split them into the desired tokens and it will work:
dictrev = {}  # movie: list of actors (I assume)
for k in mydict:
    # strip the leading space, then iterate through the comma-separated titles
    for v in mydict[k].strip().split(', '):
        dictrev.setdefault(v, []).append(k)
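A self-contained sketch of the same idea, using two entries from the question's data. The name movie_to_actors is made up for illustration, and each title is stripped individually so that stray spaces around the commas do not end up in the keys.

```python
my_dict = {
    'Anthony Hopkins': ' Hannibal, The Edge, Meet Joe Black, Proof',
    'Gwyneth Paltrow': ' Shakespeare in Love, Bounce, Proof',
}

movie_to_actors = {}
for actor, titles in my_dict.items():
    # Split the string into titles; iterating the string itself yields characters.
    for title in titles.split(','):
        movie_to_actors.setdefault(title.strip(), []).append(actor)

print(movie_to_actors['Proof'])  # ['Anthony Hopkins', 'Gwyneth Paltrow']
```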

If what you want is to reverse the order of the comma-separated values in your dictionary, the following may be the solution that you're looking for:
my_dict = {
    'Anthony Hopkins': ' Hannibal, The Edge, Meet Joe Black, Proof',
    'Julia Roberts' : ' Pretty Woman, Oceans Eleven, Runaway Bride'
}
res_dict = {}
for item in my_dict:
    res_dict[item] = ', '.join(reversed(my_dict[item].strip().split(', ')))
strip() removes the spaces at the beginning and end of each value
split(', ') splits each value into a list of titles (using ", " as the separator)
reversed() reverses the resulting list
join() forms the final value for each key of res_dict
Output:
>>> res_dict
{'Anthony Hopkins': 'Proof, Meet Joe Black, The Edge, Hannibal', 'Julia Roberts': 'Runaway Bride, Oceans Eleven, Pretty Woman'}

Related

How to find required word in novel in python?

I have a text and I have got a task in python with reading module:
Find the names of people who are referred to as Mr. XXX. Save the result in a dictionary with the name as key and number of times it is used as value. For example:
If Mr. Churchill is in the novel, then include {'Churchill' : 2}
If Mr. Frank Churchill is in the novel, then include {'Frank Churchill' : 4}
The file is .txt and it contains around 10-15 paragraphs.
Do you have any ideas about how it can be improved? (It gives me an error after some words; I guess the error happens because one of the Mr. tokens is at the end of a line.)
orig_text= open('emma.txt', encoding = 'UTF-8')
lines= orig_text.readlines()[32:16267]
counts = dict()
for line in lines:
    wordsdirty = line.split()
    try:
        print (wordsdirty[wordsdirty.index('Mr.') + 1])
    except ValueError:
        continue
Try this:
import re
text = "When did Mr. Churchill told Mr. James Brown about the fish"
m = [x[0] for x in re.findall(r'(Mr\.( [A-Z][a-z]*)+)', text)]
You get:
['Mr. Churchill', 'Mr. James Brown']
To solve the line issue simply read the entire file:
text = file.read()
Then, to count the occurrences, simply run:
from collections import Counter
Counter(m)
Finally, if you'd like to drop 'Mr. ' from all your dictionary entries, use x[0][4:] instead of x[0].
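Putting these pieces together, a rough sketch might look like the following. The sample text, the variable names and the use of \s+ (so that a "Mr." at the end of a line still matches the name on the next line) are my own assumptions, and the pattern only recognises names made of capitalised words.

```python
import re
from collections import Counter

# Illustrative text; note the "Mr." that sits at the end of a line.
text = ("When did Mr. Churchill tell Mr. James Brown about the fish? Mr.\n"
        "Churchill never said.")

# Capture one or more capitalised words after "Mr."; \s+ also matches newlines.
pattern = re.compile(r"Mr\.\s+((?:[A-Z][a-z]+)(?:\s+[A-Z][a-z]+)*)")

# Normalise internal whitespace so a name split across lines counts as one key.
counts = Counter(" ".join(name.split()) for name in pattern.findall(text))

print(dict(counts))  # {'Churchill': 2, 'James Brown': 1}
```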
This can be easily done using regex and capturing group.
Take a look here for reference, in this scenario you might want to do something like
import re
from collections import Counter
# retrieve a list of strings that match your regex
matches = re.findall(r"Mr\. ([a-zA-Z]+)", your_entire_file) # not sure about the regex
# then create a dictionary and count the occurrences of each match
# if you are allowed to use modules, this can be done using Counter
Counter(matches)
To access the entire file like that, you might want to map it to memory, take a look at this question

Remove mirrored duplicate strings in list python?

What is an efficient python algorithm to remove all mirrored text duplicates in a list where the items are in the format as below?
ExList = [' dutch italian english', ' italian english dutch', ' dutch italian german', ' dutch german italian' ]
Required result: [' dutch english italian ', 'dutch german italian' ]
This solution uses the set data structure and focuses on producing compact code, mostly with list/set comprehensions and generator expressions. If this is a homework task for a beginner course and you just copy the result, it will be very obvious that you did not write the code yourself. Try to follow the thought process and reproduce the results yourself.
1) Split each element at " " (space):
for item in ExList:
    splitted = item.split(" ")
2) Remove the empty elements caused by superfluous spaces in the input. This can be done in one line together with the step above (empty strings are "falsy") using a list comprehension:
for item in ExList:
    splitted = [lang for lang in item.split(" ") if lang]
3) Put the result in a set, which by definition disregards order and ignores duplicates. For this step we primarily need the property that equality does not depend on order, meaning set([1, 2]) == set([2, 1]). This can be combined with the line above using a generator expression:
for item in ExList:
    itemSet = set(lang for lang in item.split(" ") if lang)
Now, within that loop, put all those sets of languages into another set. This time, because all the item sets with the same items in any order are considered equal, the outer set will automatically disregard any duplicates. To be able to put the item set into another set, it needs to be immutable (because mutability might cause a change in identity), which is called a frozenset in python. The code looks like this:
ExList = [' dutch italian english', ' italian english dutch', ' dutch italian german', ' dutch german italian' ]
result = set()
for item in ExList:
    result.add(frozenset(lang for lang in item.split(" ") if lang))
Or, as a set comprehension on one line:
result = {frozenset(lang for lang in item.split(" ") if lang) for item in ExList}
The result is as follows:
>>> print(result)
{frozenset({'italian', 'dutch', 'german'}), frozenset({'italian', 'dutch', 'english'})}
You can turn that back into lists if the printed set output looks confusing to you:
>>> print([list(itemSet) for itemSet in result])
[['italian', 'dutch', 'german'], ['italian', 'dutch', 'english']]
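To get closer to the string format shown in the question, the frozensets could be joined back into strings. This is a sketch that assumes alphabetical order within each item is acceptable and that no language is repeated inside a single item (a set would drop the repeat).

```python
ExList = [' dutch italian english', ' italian english dutch',
          ' dutch italian german', ' dutch german italian']

# str.split() with no argument already discards the empty strings.
unique_sets = {frozenset(item.split()) for item in ExList}

# Sort the words inside each item, then sort the items for a stable output.
result = sorted(' '.join(sorted(langs)) for langs in unique_sets)

print(result)  # ['dutch english italian', 'dutch german italian']
```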
This may work for you:
def unique_list(items):
    # sort the words of each string so mirrored duplicates become identical tuples
    x = set(tuple(sorted(s.split())) for s in items)
    return [" ".join(s) for s in x]
print(unique_list(ExList))
This might not be the most efficient solution, but hope it will be of some help.
Using the property that keys of dictionary are unique.
m_dict = {}
for a in ExList:
    b = a.split()
    b.sort()
    m_dict[' '.join(b)] = None
print(list(m_dict.keys()))
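The same dictionary-key idea can be written more compactly with dict.fromkeys. This sketch relies on dictionaries preserving insertion order (Python 3.7+), so the first occurrence of each combination keeps its position.

```python
ExList = [' dutch italian english', ' italian english dutch',
          ' dutch italian german', ' dutch german italian']

# Normalise each item to a sorted, space-joined string; duplicate keys collapse.
unique = list(dict.fromkeys(' '.join(sorted(item.split())) for item in ExList))

print(unique)  # ['dutch english italian', 'dutch german italian']
```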

Replacing emoji with text in python pandas?

How to replace the Values of the dictionary with the keys of the dictionary in the Data?
I have this dictionary
Dict = {' butterfly': "Ƹ̵̡Ӝ̵̨̄Ʒ'",
' clapping hands': "o/', '*o/*'",
' face with raised eyebrow': "O?O'",
' face with symbols on mouth': ">.'",
' grimacing face': "e.e', 'O.e', 'O.e'",
' rolling on the floor laughing': "m/*.*m/'"}
Keys = text/meaning of emoji,
Values = emoji,
I want to replace the emoji(values) with the text(key) in my data.
Please suggest any better way to proceed.
sample data which has emoji....
.#AnnaKendrick47 My set up at the electronics boat at work. ^_^
"Fun update for everyone who's requested, #EW is now IN!! #WordsWFriends\n ⬇️ ⬇️ ⬇️
'#AnnaKendrick47 please sing #DrewGasparini \'s Circus""',
One way would be like this:
>>> [k for k,v in Dict.items() if v==">.'"]
[' face with symbols on mouth']
But if you can define the dictionary however you like, it would probably be better to do so with the emoji as the keys rather than the values. If you can't change the definition, you could define a second dictionary this way round:
>>> dict2 = dict(zip(Dict.values(),Dict.keys()))
>>> dict2[">.'"]
' face with symbols on mouth'
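Once the emoji are the keys, replacing them inside a piece of text could look roughly like this. The mapping below is a hypothetical, cleaned-up example (one emoticon per key, including an invented "^_^" entry), not the dictionary from the question, and replace_emoji is a made-up helper name.

```python
import re

# Hypothetical mapping: emoticon -> description (one emoticon per key).
emoji_to_text = {
    "^_^": "happy face",
    "O?O": "face with raised eyebrow",
    "e.e": "grimacing face",
}

# Escape each emoticon and try longer ones first so they are not cut short.
pattern = re.compile("|".join(
    re.escape(e) for e in sorted(emoji_to_text, key=len, reverse=True)))

def replace_emoji(text):
    """Replace every known emoticon in text with its description."""
    return pattern.sub(lambda m: emoji_to_text[m.group(0)], text)

print(replace_emoji("My set up at work. ^_^"))  # My set up at work. happy face
```

With a pandas column, something like df["text"].map(replace_emoji) should apply the function row by row; the column name "text" is an assumption.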

Python: I want to check for the count of the words in the string

I managed to do that but the case I'm struggling with is when I have to consider 'color' equal to 'colour' for all such words and return count accordingly. To do this, I wrote a dictionary of common words with spelling changes in American and GB English for this, but pretty sure this isn't the right approach.
ukus=dict()
ukus={'COLOUR':'COLOR','CHEQUE':'CHECK',
'PROGRAMME':'PROGRAM','GREY':'GRAY',
'JEWELLERY':'JEWELERY','ALUMINIUM':'ALUMINUM',
'THEATER':'THEATRE','LICENSE':'LICENCE','ARMOUR':'ARMOR',
'ARTEFACT':'ARTIFACT','CENTRE':'CENTER',
'CYPHER':'CIPHER','DISC':'DISK','FIBRE':'FIBER',
'FULFILL':'FULFIL','METRE':'METER',
'SAVOURY':'SAVORY','TONNE':'TON','TYRE':'TIRE',
'COLOR':'COLOUR','CHECK':'CHEQUE',
'PROGRAM':'PROGRAMME','GRAY':'GREY',
'JEWELERY':'JEWELLERY','ALUMINUM':'ALUMINIUM',
'THEATRE':'THEATER','LICENCE':'LICENSE','ARMOR':'ARMOUR',
'ARTIFACT':'ARTEFACT','CENTER':'CENTRE',
'CIPHER':'CYPHER','DISK':'DISC','FIBER':'FIBRE',
'FULFIL':'FULFILL','METER':'METRE','SAVORY':'SAVOURY',
'TON':'TONNE','TIRE':'TYRE'}
This is the dictionary I wrote to check the values. As you can see this is degrading the performance. Pyenchant isn't available for 64bit python. Someone please help me out. Thank you in advance.
Okay, I think I know enough from your comments to provide this as a solution. The function below lets you choose either UK or US replacement (it defaults to US, but you can of course flip that) and lets you optionally perform minor hygiene on the string.
import re
ukus={'COLOUR':'COLOR','CHEQUE':'CHECK',
'PROGRAMME':'PROGRAM','GREY':'GRAY',
'JEWELLERY':'JEWELERY','ALUMINIUM':'ALUMINUM',
'THEATER':'THEATRE','LICENSE':'LICENCE','ARMOUR':'ARMOR',
'ARTEFACT':'ARTIFACT','CENTRE':'CENTER',
'CYPHER':'CIPHER','DISC':'DISK','FIBRE':'FIBER',
'FULFILL':'FULFIL','METRE':'METER',
'SAVOURY':'SAVORY','TONNE':'TON','TYRE':'TIRE'}
usuk={'COLOR':'COLOUR','CHECK':'CHEQUE',
'PROGRAM':'PROGRAMME','GRAY':'GREY',
'JEWELERY':'JEWELLERY','ALUMINUM':'ALUMINIUM',
'THEATRE':'THEATER','LICENCE':'LICENSE','ARMOR':'ARMOUR',
'ARTIFACT':'ARTEFACT','CENTER':'CENTRE',
'CIPHER':'CYPHER','DISK':'DISC','FIBER':'FIBRE',
'FULFIL':'FULFILL','METER':'METRE','SAVORY':'SAVOURY',
'TON':'TONNE','TIRE':'TYRE'}
def str_wd_count(my_string, uk=False, hygiene=True):
    us = not(uk)
    # if the UK flag is True, default to UK version, else default to US version
    print("Using the "+uk*"UK"+us*"US"+" dictionary for default words")
    # optional hygiene of non-alphanumeric characters for pure word counting
    if hygiene:
        my_string = re.sub(r'[^ \d\w]',' ',my_string)
        my_string = re.sub(' {1,}',' ',my_string)
    # create a list of all words in the text, normalized to the chosen spelling
    ttl_wds = [ukus.get(w,w) if us else usuk.get(w,w) for w in my_string.upper().split(' ')]
    wd_counts = {}
    for wd in ttl_wds:
        wd_counts[wd] = wd_counts.get(wd,0)+1
    return wd_counts
As a sample of use, consider the string
str1 = 'The colour of the dog is not the same as the color of the tire, or is it tyre, I can never tell which one will fulfill'
# Resulting sorted dict.items() With Default Settings
'[(THE,5),(TIRE,2),(COLOR,2),(OF,2),(IS,2),(FULFIL,1),(NEVER,1),(DOG,1),(SAME,1),(IT,1),(WILL,1),(I,1),(AS,1),(CAN,1),(WHICH,1),(TELL,1),(NOT,1),(ONE,1),(OR,1)]'
# Resulting sorted dict.items() With hygiene=False
'[(THE,5),(COLOR,2),(OF,2),(IS,2),(FULFIL,1),(NEVER,1),(DOG,1),(SAME,1),(TIRE,,1),(WILL,1),(I,1),(AS,1),(CAN,1),(WHICH,1),(TELL,1),(NOT,1),(ONE,1),(OR,1),(IT,1),(TYRE,,1)]'
# Resulting sorted dict.items() With UK Swap, hygiene=True
'[(THE,5),(OF,2),(IS,2),(TYRE,2),(COLOUR,2),(WHICH,1),(I,1),(NEVER,1),(DOG,1),(SAME,1),(OR,1),(WILL,1),(AS,1),(CAN,1),(TELL,1),(NOT,1),(FULFILL,1),(ONE,1),(IT,1)]'
# Resulting sorted dict.items() With UK Swap, hygiene=False
'[(THE,5),(OF,2),(IS,2),(COLOUR,2),(ONE,1),(I,1),(NEVER,1),(DOG,1),(SAME,1),(TIRE,,1),(WILL,1),(AS,1),(CAN,1),(WHICH,1),(TELL,1),(NOT,1),(FULFILL,1),(TYRE,,1),(IT,1),(OR,1)]'
You can use the resulting dictionary of word counts in any way you'd like, and if you need the original string with the modifications added it is easy enough to modify the function to also return that.
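A shorter Python 3 sketch of the same normalise-then-count idea, using collections.Counter. UK_TO_US here is a deliberately tiny, illustrative subset of the spelling table, and count_words is a hypothetical name rather than part of the function above.

```python
import re
from collections import Counter

# Small illustrative subset of a UK -> US spelling table.
UK_TO_US = {'COLOUR': 'COLOR', 'TYRE': 'TIRE'}

def count_words(text, mapping=UK_TO_US):
    """Count words case-insensitively after mapping variant spellings."""
    # Extract runs of letters/apostrophes, which drops punctuation such as commas.
    words = re.findall(r"[A-Z']+", text.upper())
    return Counter(mapping.get(w, w) for w in words)

counts = count_words(
    'The colour of the dog is not the same as the color of the tire, or is it tyre')
print(counts['COLOR'], counts['TIRE'], counts['THE'])  # 2 2 5
```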
Step 1:
Create a temporary string, then replace every word that appears as a value in your dict with its corresponding key:
>>> temp_string = str(my_string)
>>> for k, v in ukus.items():
...     temp_string = temp_string.replace(" {} ".format(v), " {} ".format(k)) # <--surround by space " " to replace only words
Step 2:
Now, in order to find words in the string, first split it into a list of words and then use collections.Counter() to get the count of each element in the list. Below is the sample code:
>>> from collections import Counter
>>> my_string = 'Hello World! Hello again. I am saying Hello one more time'
>>> count_dict = Counter(my_string.split())
# Value of count_dict:
# Counter({'Hello': 3, 'saying': 1, 'again.': 1, 'I': 1, 'am': 1, 'one': 1, 'World!': 1, 'time': 1, 'more': 1})
>>> count_dict['Hello']
3
Step 3:
Now, since you want the count of both "colour" and "color" in your dict, re-iterate the dict to add those values, and the missing values as "0"
for k, v in ukus.items():
    if k in count_dict:
        count_dict[v] = count_dict[k]
    else:
        count_dict[v] = count_dict[k] = 0

Keeping a count of words in a list without using any count method in python?

I need to keep a count of words in the list that appear once in a list, and one list for words that appear twice without using any count method, I tried using a set but it removes only the duplicate not the original. Is there any way to keep the words appearing once in one list and words that appear twice in another list?
the sample file is text = ['Andy Fennimore Cooper\n', 'Peter, Paul, and Mary\n',
'Andy Gosling\n'], so technically Andy, and Andy would be in one list, and the rest in the other.
Using dictionaries is not allowed :/
for word in text:
    clean = clean_up(word)
    for words in clean.split():
        clean2 = clean_up(words)
        l = clean_list.append(clean2)
        if clean2 not in clean_list:
            clean_list.append(clean2)
print(clean_list)
This is a very bad, unPythonic way of doing things; but once you disallow Counter and dict, this is about all that's left. (Edit: except for sets, d'oh!)
text = ['Andy Fennimore Cooper\n', 'Peter, Paul, and Mary\n', 'Andy Gosling\n']
once_words = []
more_than_once_words = []
for sentence in text:
    for word in sentence.split():
        if word in more_than_once_words:
            pass # do nothing
        elif word in once_words:
            once_words.remove(word)
            more_than_once_words.append(word)
        else:
            once_words.append(word)
which results in
# once_words
['Fennimore', 'Cooper', 'Peter,', 'Paul,', 'and', 'Mary', 'Gosling']
# more_than_once_words
['Andy']
It is a silly problem, removing key data structures or loops or whatever. Why not just program in C then? Tell your teacher to get a job...
Editorial aside, here is a solution:
>>> text = ['Andy Fennimore Cooper\n', 'Peter, Paul, and Mary\n','Andy Gosling\n']
>>> data=' '.join(e.strip('\n,.') for e in ''.join(text).split()).split()
>>> data
['Andy', 'Fennimore', 'Cooper', 'Peter', 'Paul', 'and', 'Mary', 'Andy', 'Gosling']
>>> [e for e in data if data.count(e)==1]
['Fennimore', 'Cooper', 'Peter', 'Paul', 'and', 'Mary', 'Gosling']
>>> list({e for e in data if data.count(e)==2})
['Andy']
If you can use a set (I wouldn't use it either, if you're not allowed to use dictionaries), then you can use the set to keep track of what words you have 'seen'... and another one for the words that appear more than once. Eg:
seen = set()
duplicate = set()
Then, each time you get a word, test whether it is in seen. If it is not, add it to seen. If it is already in seen, add it to duplicate.
At the end, you'd have a set of seen words, containing all the words, and a duplicate set, with all those that appear more than once.
Then you only need to subtract duplicate from seen, and the result is the words that have no duplicates (i.e. the ones that appear only once).
This can also be implemented using only lists (which would be more honest to your homework, if a bit more laborious).
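The set-based approach described above could be sketched as follows. Stripping commas and full stops from each word is my own assumption about the clean-up step, since clean_up from the question is not shown.

```python
text = ['Andy Fennimore Cooper\n', 'Peter, Paul, and Mary\n', 'Andy Gosling\n']

seen = set()       # every word encountered at least once
duplicate = set()  # words encountered more than once

for sentence in text:
    for raw in sentence.split():
        word = raw.strip(',.')  # assumed clean-up: drop trailing punctuation
        if word in seen:
            duplicate.add(word)
        else:
            seen.add(word)

# Words that appear exactly once are those seen but never repeated.
once = seen - duplicate

print(sorted(duplicate))  # ['Andy']
print(sorted(once))  # ['Cooper', 'Fennimore', 'Gosling', 'Mary', 'Paul', 'Peter', 'and']
```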
from itertools import groupby
from operator import itemgetter
text = ['Andy Fennimore Cooper\n', 'Peter, Paul, and Mary\n', 'Andy Gosling\n']
one, two = [list(group) for key, group in groupby( sorted(((key, len(list(group))) for key, group in groupby( sorted(' '.join(text).split()))), key=itemgetter(1)), key=itemgetter(1))]
