Replacing emoji with text in python pandas? - python

How do I replace occurrences of a dictionary's values in my data with the corresponding keys?
I have this dictionary
Dict = {' butterfly': "Ƹ̵̡Ӝ̵̨̄Ʒ'",
' clapping hands': "o/', '*o/*'",
' face with raised eyebrow': "O?O'",
' face with symbols on mouth': ">.'",
' grimacing face': "e.e', 'O.e', 'O.e'",
' rolling on the floor laughing': "m/*.*m/'"}
Keys = text/meaning of the emoji,
Values = emoji.
I want to replace the emoji (values) in my data with the text (keys).
Please suggest a better way to proceed.
Sample data which contains emoji:
.#AnnaKendrick47 My set up at the electronics boat at work. ^_^
"Fun update for everyone who's requested, #EW is now IN!! #WordsWFriends\n ⬇️ ⬇️ ⬇️
'#AnnaKendrick47 please sing #DrewGasparini \'s Circus""',

One way would be like this:
>>> [k for k,v in Dict.items() if v==">.'"]
[' face with symbols on mouth']
But if you can define the dictionary however you like, it would probably be better to do so with the emoji as the keys rather than the values. If you can't change the definition, you could define a second dictionary this way round:
>>> dict2 = dict(zip(Dict.values(),Dict.keys()))
>>> dict2[">.'"]
' face with symbols on mouth'
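Once the mapping goes from emoji to text, the replacement itself can be done with one compiled regular expression. The following is only a minimal sketch: the emoticon_to_text mapping and the replace_emoticons helper are hypothetical names made up for illustration, not part of the original data.

```python
import re

# Hypothetical mapping with the emoticon as the key and its meaning as the value.
emoticon_to_text = {
    "^_^": "happy face",
    ":)": "smiling face",
    ">.<": "face with symbols on mouth",
}

# Escape every emoticon (most contain regex metacharacters) and try longer
# ones first so that a short emoticon never pre-empts a longer one.
pattern = re.compile(
    "|".join(re.escape(e) for e in sorted(emoticon_to_text, key=len, reverse=True))
)

def replace_emoticons(text):
    """Replace every known emoticon in text with its meaning."""
    return pattern.sub(lambda m: emoticon_to_text[m.group(0)], text)

sample = "My set up at the electronics boat at work. ^_^"
print(replace_emoticons(sample))
```

With pandas, the same function could be applied to a column with something like df['text'].apply(replace_emoticons), where 'text' is an assumed column name.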

Related

how do i use string.replace() to replace only when the string is exactly matching

I have a dataframe with a list of poorly spelled clothing types. I want them all in the same format. For example, I have "trous", "trouse" and "trousers", and I would like to replace the first two with "trousers".
I have tried using string.replace. It changes the first "trous" to "trousers" as it should, and it also works for "trouse", but when it gets to "trousers" it produces "trousersersers"! I think it matches every string that contains trous, trouse or trousers and changes them all.
Is there a way I can limit string.replace to look for exactly "trous"?
Here's what I've tried so far. As you can see, I have a good few changes to make. Most of them work OK, but the likes of trousers and t-shirts, which need several similar changes, are causing the trouble.
newTypes=[]
for string in types:
    underwear = string.replace(('UNDERW'), 'UNDERWEAR').replace('HANKY', 'HANKIES').replace('TIECLI', 'TIECLIPS').replace('FRAGRA', 'FRAGRANCES').replace('ROBE', 'ROBES').replace('CUFFLI', 'CUFFLINKS').replace('WALLET', 'WALLETS').replace('GIFTSE', 'GIFTSETS').replace('SUNGLA', 'SUNGLASSES').replace('SCARVE', 'SCARVES').replace('TROUSE ', 'TROUSERS').replace('SHIRT', 'SHIRTS').replace('CHINO', 'CHINOS').replace('JACKET', 'JACKETS').replace('KNIT', 'KNITWEAR').replace('POLO', 'POLOS').replace('SWEAT', 'SWEATERS').replace('TEES', 'T-SHIRTS').replace('TSHIRT', 'T-SHIRTS').replace('SHORT', 'SHORTS').replace('ZIP', 'ZIP-TOPS').replace('GILET ', 'GILETS').replace('HOODIE', 'HOODIES').replace('HOODZIP', 'HOODIES').replace('JOGGER', 'JOGGERS').replace('JUMP', 'SWEATERS').replace('SWESHI', 'SWEATERS').replace('BLAZE ', 'BLAZERS').replace('BLAZER ', 'BLAZERS').replace('WC', 'WAISTCOATS').replace('TTOP', 'T-SHIRTS').replace('TROUS', 'TROUSERS').replace('COAT', 'COATS').replace('SLIPPE', 'SLIPPERS').replace('TRAINE', 'TRAINERS').replace('DECK', 'SHOES').replace('FLIP', 'SLIDERS').replace('SUIT', 'SUITS').replace('GIFTVO', 'GIFTVOUCHERS')
    newTypes.append(underwear)
types = newTypes
Assuming you're okay with not using string.replace(), you can simply do this:
lst = ["trousers", "trous" , "trouse"]
for i in range(len(lst)):
    if "trous" in lst[i]:
        lst[i] = "trousers"
print(lst)
# Prints ['trousers', 'trousers', 'trousers']
This checks if the shortest substring, trous, is part of the string, and if so converts the entire string to trousers.
Use a dict that maps each string to be replaced to its replacement:
d = {
    'trous': 'trousers',
    'trouse': 'trousers',
    # ...
}
newtypes = [d.get(string, string) for string in types]
d.get(string, string) returns string unchanged if string is not a key of d, so only exact matches are replaced.
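If the abbreviation can also appear inside a longer string (for example "BLUE TROUS"), a regular expression with word boundaries is one way to keep the match exact. This is only a sketch under that assumption: the fixes dictionary holds a few made-up entries and normalise is a hypothetical helper name.

```python
import re

# Hypothetical subset of the corrections, abbreviation -> full name.
fixes = {"TROUS": "TROUSERS", "TROUSE": "TROUSERS", "TSHIRT": "T-SHIRTS"}

# \b on both sides means an abbreviation only matches as a whole word,
# so "TROUS" is never matched inside "TROUSERS".
pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in fixes) + r")\b")

def normalise(s):
    """Replace whole-word abbreviations with their full names."""
    return pattern.sub(lambda m: fixes[m.group(1)], s)

types = ["TROUS", "TROUSE", "TROUSERS", "BLUE TROUS"]
print([normalise(t) for t in types])
# ['TROUSERS', 'TROUSERS', 'TROUSERS', 'BLUE TROUSERS']
```

Because all corrections are applied in a single pass, the output of one replacement is never fed into another, which is what produced "trousersersers" in the chained replace calls.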

Remove mirrored duplicate strings in list python?

What is an efficient python algorithm to remove all mirrored text duplicates in a list where the items are in the format as below?
ExList = [' dutch italian english', ' italian english dutch', ' dutch italian german', ' dutch german italian' ]
Required result: [' dutch english italian ', 'dutch german italian' ]
This solution uses the set data structure and focuses on producing compact code, mostly with list/set/generator comprehensions. If this is a homework task for a beginner course and you just copy the result, it will be very obvious that you did not write the code yourself. Try to follow the thought process and reproduce the results yourself.
1) Split each element at " " (space):
for item in ExList:
    splitted = item.split(" ")
2) Remove the elements that are now empty because of superfluous spaces in the input. This can be done in one line together with the step above (empty strings are "falsy") using a list comprehension:
for item in ExList:
    splitted = [lang for lang in item.split(" ") if lang]
3) Put the result in a set, which by definition disregards order and ignores duplicates. For this step we primarily need the property that equality does not depend on order, meaning set([1, 2]) == set([2, 1]). This can be combined with the line above using a generator expression:
for item in ExList:
    itemSet = set(lang for lang in item.split(" ") if lang)
Now, within that loop, put all those sets of languages into another set. This time, because all the item sets with the same items in any order are considered equal, the outer set will automatically disregard any duplicates. To be able to put the item set into another set, it needs to be immutable (because mutability might cause a change in identity), which is called a frozenset in python. The code looks like this:
ExList = [' dutch italian english', ' italian english dutch', ' dutch italian german', ' dutch german italian' ]
result = set()
for item in ExList:
    result.add(frozenset(lang for lang in item.split(" ") if lang))
Or, as a set comprehension on one line:
result = {frozenset(lang for lang in item.split(" ") if lang) for item in ExList}
The result is as follows:
>>> print(result)
{frozenset({'italian', 'dutch', 'german'}), frozenset({'italian', 'dutch', 'english'})}
You can turn that back into lists if the printed set output looks confusing to you (the order of the elements may differ from run to run):
>>> print([list(itemSet) for itemSet in result])
[['italian', 'dutch', 'german'], ['italian', 'dutch', 'english']]
This may work for you:
def unique_list(items):
    # sort the words of each item so that mirrored duplicates become identical tuples
    x = {tuple(sorted(item.split())) for item in items}
    return [" ".join(t) for t in x]
print(unique_list(ExList))
This might not be the most efficient solution, but I hope it will be of some help.
Using the property that the keys of a dictionary are unique:
m_dict = {}
for a in ExList:
    b = a.split()
    b.sort()
    m_dict[' '.join(b)] = None
print(list(m_dict.keys()))
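If the order in which the items first appear matters, the sorted-words idea can be combined with a "seen" set. A minimal sketch, assuming each item is a whitespace-separated string; the names ex_list, seen and result are illustrative only.

```python
ex_list = [' dutch italian english', ' italian english dutch',
           ' dutch italian german', ' dutch german italian']

seen = set()    # canonical keys that were already emitted
result = []     # de-duplicated items, in order of first appearance

for item in ex_list:
    # Sorting the words gives the same key for every ordering of the same words.
    key = tuple(sorted(item.split()))
    if key not in seen:
        seen.add(key)
        result.append(" ".join(key))

print(result)
# ['dutch english italian', 'dutch german italian']
```

Each item is split and sorted once and every set lookup is constant time on average, so the work grows roughly linearly with the number of items.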

Replace specific words by user dictionary and others by 0

So I have a review dataset having reviews like
Simply the best. I bought this last year. Still using. No problems
faced till date.Amazing battery life. Works fine in darkness or broad
daylight. Best gift for any book lover.
(This is from the original dataset, I have removed all punctuation and have all lower case in my processed dataset)
What I want to do is replace some words by 1(as per my dictionary) and others by 0.
My dictionary is
dict = {"amazing":"1","super":"1","good":"1","useful":"1","nice":"1","awesome":"1","quality":"1","resolution":"1","perfect":"1","revolutionary":"1","and":"1","good":"1","purchase":"1","product":"1","impression":"1","watch":"1","quality":"1","weight":"1","stopped":"1","i":"1","easy":"1","read":"1","best":"1","better":"1","bad":"1"}
I want my output like:
0010000000000001000000000100000
I have used this code:
df['newreviews'] = df['reviews'].map(dict).fillna("0")
This always returns 0 as output. I did not want this so I took 1s and 0s as strings, but despite that I'm getting the same result.
Any suggestions how to solve this?
First, don't use dict as a variable name, because it shadows the Python builtin dict. Then use a list comprehension with get to replace the words that are not matched with 0.
Notice:
If the data contains text like date.Amazing, with no space after the punctuation, the punctuation has to be replaced by whitespace.
df = pd.DataFrame({'reviews':['Simply the best. I bought this last year. Still using. No problems faced till date.Amazing battery life. Works fine in darkness or broad daylight. Best gift for any book lover.']})
d = {"amazing":"1","super":"1","good":"1","useful":"1","nice":"1","awesome":"1","quality":"1","resolution":"1","perfect":"1","revolutionary":"1","and":"1","good":"1","purchase":"1","product":"1","impression":"1","watch":"1","quality":"1","weight":"1","stopped":"1","i":"1","easy":"1","read":"1","best":"1","better":"1","bad":"1"}
df['reviews'] = df['reviews'].str.replace(r'[^\w\s]+', ' ', regex=True).str.lower()
df['newreviews'] = [''.join(d.get(y, '0') for y in x.split()) for x in df['reviews']]
Alternative:
df['newreviews'] = df['reviews'].apply(lambda x: ''.join(d.get(y, '0') for y in x.split()))
print (df)
reviews \
0 simply the best i bought this last year stil...
newreviews
0 0011000000000001000000000100000
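The original attempt returns only 0 because Series.map looks up each whole cell (the entire review) in the dictionary, not the individual words. The per-word lookup can be sketched without pandas as follows; positive_words is a small hypothetical subset of the dictionary keys and encode is an illustrative helper name.

```python
import re

# Hypothetical subset of the dictionary keys; a set suffices because every value is "1".
positive_words = {"amazing", "best", "i"}

def encode(review):
    """Return one character per word: '1' if the word is listed, else '0'."""
    # Replace punctuation with spaces so "date.Amazing" becomes two words.
    words = re.sub(r"[^\w\s]+", " ", review).lower().split()
    return "".join("1" if w in positive_words else "0" for w in words)

print(encode("Simply the best. Amazing battery life."))
# 001100
```

A function like this could then be passed to df['reviews'].apply(encode) on the unprocessed column.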
You can do (here sent is one review string and mydict is your dictionary):
# clean the sentence: replace full stops with a space so that "date.Amazing" is split into two words
import re
sent = re.sub(r'\.', ' ', sent)
# convert to list
sent = sent.lower().split()
# get values from dict using comprehension
new_sent = ''.join('1' if x in mydict else '0' for x in sent)
print(new_sent)
0011000000000001000000000100000
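You can do it by
df.replace(repl, regex=True, inplace=True)
where df is your dataframe and repl is your dictionary. Note that this only substitutes the patterns that match; the remaining words are left unchanged rather than replaced by 0.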

Remove whitespace before a specific character in python?

I was wondering if you knew the best way to do this.
This program uses OCR to read text. Occasionally, spaces appear before a decimal point like so:
{'MORTON BASSET BLK SESAME SEE': '$6.89'}
{"KELLOGG'S RICE KRISPIES": '$3.49'}
{'RAID FLY RIBBON 4PK': '$1 .49'}
As you can see, a space appears before the decimal point in the last entry. Any ideas on how to strip JUST this whitespace?
Thank you :)
EDIT: contents before decimal point may contain a varying amount of whitespace. Like
$1 .49
$1 .49
$1 .49
Use regular expressions.
import re
a_list = ["1 .49", "1  .49", "1   .49"]
for a in a_list:
    print(re.sub(r' +\.', '.', a))
The dot is escaped so that it matches a literal decimal point rather than any character. The result will be
1.49
1.49
1.49
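The same regular expression can be applied to the price of every entry. A minimal sketch, assuming the OCR output is a list of one-entry dictionaries as in the question; receipts and fix_price are hypothetical names.

```python
import re

# Hypothetical OCR output, shaped like the dictionaries in the question.
receipts = [
    {'RAID FLY RIBBON 4PK': '$1   .49'},
    {"KELLOGG'S RICE KRISPIES": '$3.49'},
]

def fix_price(price):
    """Remove any run of whitespace that sits directly before a decimal point."""
    # (?=\.) is a lookahead: the dot must follow but is not consumed.
    return re.sub(r"\s+(?=\.)", "", price)

cleaned = [{name: fix_price(price) for name, price in r.items()} for r in receipts]
print(cleaned)
# [{'RAID FLY RIBBON 4PK': '$1.49'}, {"KELLOGG'S RICE KRISPIES": '$3.49'}]
```

Only the values are rewritten, so spaces inside product names are left alone.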
You can just strip out all whitespace from the string, assuming that the values all follow the same format. Something like this:
for item in items:
    for key in item.keys():
        item[key] = item[key].replace(" ", "")
The key part is replacing the whitespace with an empty string.
If you just want to remove the whitespace before the ".", then you could use:
.replace(" .", ".") instead.
This would only remove one space. To remove several, you could use a while loop like this:
while ' .' in item[key]:
    item[key] = item[key].replace(' .', '.')
For your dict obj:-
>>> d = {'RAID FLY RIBBON 4PK': '$1 .49'}
>>> d['RAID FLY RIBBON 4PK'] = d['RAID FLY RIBBON 4PK'].replace(' ','')
>>> d
{'RAID FLY RIBBON 4PK': '$1.49'}
Even if there is varying space; replace would work fine. See this:-
>>> d = {'RAID FLY RIBBON 4PK': '$1 .49'}
>>> d['RAID FLY RIBBON 4PK'] = d['RAID FLY RIBBON 4PK'].replace(' ','')
>>> d
{'RAID FLY RIBBON 4PK': '$1.49'}
This is trivial with split and join:
"".join("1 .49".split())
This works because split() with no arguments splits on runs of one or more whitespace characters. To do this for each value in a dictionary:
{k: "".join(v.split()) for k, v in dict_.items()}
I think that maybe you want something more generic, not only for that key:
for key, value in d.items():
    d[key] = value.replace(" ", "")
This way, regardless of the key or the number of spaces, the result will contain no whitespace.
Sure:
string.replace(' .', '.')

Unclear error when I try to reverse dictionary python

This is my code:
my_dict = {'Anthony Hopkins': ' Hannibal, The Edge, Meet Joe Black, Proof', 'Julia Roberts': ' Pretty Woman, Oceans Eleven, Runaway Bride', 'Salma Hayek': ' Desperado, Wild Wild West', 'Gwyneth Paltrow': ' Shakespeare in Love, Bounce, Proof', 'Meg Ryan': ' You have got mail, Sleepless in Seattle', 'Russell Crowe': ' Gladiator, A Beautiful Mind, Cinderella Man, American Gangster' .....}
dictrev={}
for i in mydict:
    for j in mydict[i] :
        if j not in dictrev:
            dictrev.setdefault(j, []).append(i)
print (dictrev)
The problem is that, when I debug, I see that the inner loop (the line for j in mydict[i]:) reads the values one character at a time, whereas I need each whole value (there are multiple values per key).
Any suggestions as to what the problem is?
Thank you very much for your help.
Could you please format your code like this:
do whatever
You do that by typing enter two times, then for each line of code indenting four spaces. To type normally after that, start a new line and do not type the four spaces at the start of it.
If I understand what you are asking, you want to swap the key and value of the dictionary, and you are getting an error while doing so. I cannot read your unformatted code (no offense), so I will provide a dictionary swapping technique that works for me.
my_dict = {1: "bob", 2: "bill", 3: "rob"}
new_dict = {}
for key in my_dict:
    new_key = my_dict[key]
    new_value = key
    new_dict.update({new_key:new_value})
print(new_dict)
This code works by having the original dictionary, my_dict and the uncompleted reversed dictionary, new_dict. It iterates through my_dict, which only provides the key, and using that key, it finds the value. The value that we want to be a key is assigned to new_key and the key that we want to be a value is assigned to new_value. It then updates the reversed dictionary with the new key/value. The final line prints the new, reversed dictionary. If you want to set my_dict to the reversed dict, use my_dict = new_dict. I hope this answers your question.
As has been pointed out in the comments, the values in your dict are strings, thus iterating over them will produce single characters. Split them into the desired tokens and it will work:
dictrev = {}  # movie: actors-list (I assume)
for k in mydict:
    # strip the leading space, then iterate through the comma-separated titles
    for v in mydict[k].strip().split(', '):
        dictrev.setdefault(v, []).append(k)
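For reference, here is a self-contained version of that idea, run on two entries taken from the question. It is a sketch that assumes titles never contain a comma themselves; movie_to_actors is an illustrative name.

```python
my_dict = {
    'Anthony Hopkins': ' Hannibal, The Edge, Meet Joe Black, Proof',
    'Gwyneth Paltrow': ' Shakespeare in Love, Bounce, Proof',
}

movie_to_actors = {}
for actor, titles in my_dict.items():
    # Split the string into titles; iterating the string directly
    # would yield single characters instead.
    for title in titles.split(','):
        # strip() removes the space left around each title by the split.
        movie_to_actors.setdefault(title.strip(), []).append(actor)

print(movie_to_actors['Proof'])
# ['Anthony Hopkins', 'Gwyneth Paltrow']
```

Splitting on ',' and stripping each piece also handles the leading space at the start of every value.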
If what you want is to reverse the order of the comma-separated values in your dictionary, the following may be the solution that you're looking for:
my_dict = {
    'Anthony Hopkins': ' Hannibal, The Edge, Meet Joe Black, Proof',
    'Julia Roberts' : ' Pretty Woman, Oceans Eleven, Runaway Bride'
}
res_dict = {}
for item in my_dict:
    res_dict[item] = ', '.join(reversed(my_dict[item].strip().split(', ')))
strip() is used to remove spaces at the beginning / end of each value
split() is used to split the values (using the ', ' separator)
reversed() is used to reverse the resulting list
join() is used to form the final value for each key of res_dict
Output:
>>> res_dict
{'Anthony Hopkins': 'Proof, Meet Joe Black, The Edge, Hannibal', 'Julia Roberts': 'Runaway Bride, Oceans Eleven, Pretty Woman'}
