Remove mirrored duplicate strings in list python?


What is an efficient Python algorithm to remove all mirrored text duplicates from a list whose items are in the format below?
ExList = [' dutch italian english', ' italian english dutch', ' dutch italian german', ' dutch german italian' ]
Required result: ['dutch english italian', 'dutch german italian']

This solution uses the set data structure and focuses on producing compact code, mostly with list/set comprehensions and generator expressions. If this is a homework task for a beginner course and you just copy the result, it will be very obvious that you did not write the code yourself. Try to follow the thought process and reproduce the results yourself.
1) Split each element at " " (space):
for item in ExList:
    splitted = item.split(" ")
2) Remove the empty elements left over from superfluous spaces in the input. Because empty strings are "falsy", this can be combined with the step above in one line using a list comprehension:
for item in ExList:
    splitted = [lang for lang in item.split(" ") if lang]
3) Put the result in a set, which by definition disregards order and ignores duplicates. For this step we primarily need order-independent equality, meaning set([1, 2]) == set([2, 1]). This can be combined with the line above using a generator expression:
for item in ExList:
    itemSet = set(lang for lang in item.split(" ") if lang)
Now, within that loop, put all those sets of languages into another set. Because item sets with the same items in any order are considered equal, the outer set automatically discards duplicates. To put an item set into another set, it must be hashable, which in practice means immutable (a mutable set could change after insertion, which would change its hash); Python's immutable set type is frozenset. The code looks like this:
ExList = [' dutch italian english', ' italian english dutch', ' dutch italian german', ' dutch german italian' ]
result = set()
for item in ExList:
    result.add(frozenset(lang for lang in item.split(" ") if lang))
Or, as a set comprehension on one line:
result = {frozenset(lang for lang in item.split(" ") if lang) for item in ExList}
The result is as follows:
>>> print(result)
{frozenset({'italian', 'dutch', 'german'}), frozenset({'italian', 'dutch', 'english'})}
You can turn that back into lists if the set print output looks confusing (note that the order of the sets, and of the words inside them, is arbitrary):
>>> print([list(itemSet) for itemSet in result])
[['italian', 'dutch', 'german'], ['italian', 'dutch', 'english']]
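To get output in the same shape as the required result (strings rather than frozensets), one option is to sort each set's words and join them. This is a sketch; sorting alphabetically is an assumption, since a frozenset keeps no record of the original word order.

```python
ExList = [' dutch italian english', ' italian english dutch', ' dutch italian german', ' dutch german italian']

# Deduplicate by unordered word set, as above
result = {frozenset(lang for lang in item.split(" ") if lang) for item in ExList}

# Turn each frozenset back into a string; sort the words (and the outer list)
# so the output is deterministic, because set iteration order is arbitrary
as_strings = sorted(" ".join(sorted(item_set)) for item_set in result)
print(as_strings)  # ['dutch english italian', 'dutch german italian']
```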

This may work for you:
def unique_list(lst):
    # Sort the words of each entry so that mirrored entries become identical tuples
    x = set(tuple(sorted(s.split())) for s in lst)
    return [" ".join(t) for t in x]
print(unique_list(ExList))

This might not be the most efficient solution, but I hope it is of some help. It uses the property that dictionary keys are unique.
m_dict = {}
for a in ExList:
    b = a.split()
    b.sort()
    m_dict[' '.join(b)] = None
print(list(m_dict.keys()))
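Since Python 3.7, dictionaries preserve insertion order, so the same dictionary-key idea can be written with dict.fromkeys and keeps the entries in first-seen order. A minimal sketch:

```python
ExList = [' dutch italian english', ' italian english dutch', ' dutch italian german', ' dutch german italian']

# Normalise each entry: split on any whitespace (which drops empty strings), sort, rejoin
normalised = (" ".join(sorted(item.split())) for item in ExList)

# dict.fromkeys drops duplicate keys while keeping first-seen order (Python 3.7+)
unique = list(dict.fromkeys(normalised))
print(unique)  # ['dutch english italian', 'dutch german italian']
```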

Related

Python: Locate word that contains certain string in list of tuples

I am trying to locate words that contain a certain string inside a list of tuples in Python. For example, if I have a list of tuples like:
the_list = [
('Had denoting properly #T-jointure you occasion directly raillery'),
('. In said to of poor full be post face snug. Introduced imprudence'),
('see say #T-unpleasing devonshire acceptance son.'),
('Exeter longer #T-wisdom gay nor design age.', 'Am weather to entered norland'),
('no in showing service. Nor repeated speaking', ' shy appetite.'),
('Excited it hastily an pasture #T-it observe.', 'Snug #T-hand how dare here too.')
]
I want to search for a specific substring and extract each complete word that contains it, for example:
for sentence in the_list:
    for word in sentence:
        if '#T-' in word:
            print(word)

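import re
word_search = re.compile(r'#T-\S*')  # '#T-' followed by the rest of the word
for entry in the_list:
    # single-string entries are plain str, not 2-tuples, so wrap them first
    parts = (entry,) if isinstance(entry, str) else entry
    for part in parts:
        found = word_search.search(part)  # search scans the whole string; match only checks the start
        if found:
            print(found.group(0))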

You could use a list comprehension on a flattened version of your list:
from pandas.core.common import flatten
[[word for word in x.split(' ') if '#T-' in word] for x in list(flatten(the_list)) if '#T-' in x]
#[['#T-jointure'], ['#T-unpleasing'], ['#T-wisdom'], ['#T-it'], ['#T-hand']]
Relevant places: How to make a flat list out of list of lists? (specifically this answer), Double for loop list comprehension.
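pandas.core.common.flatten lives in pandas' internal pandas.core package rather than its documented public API, so it may move between pandas versions. A minimal pure-Python sketch of the same idea, assuming each entry is either a single string or a tuple of strings (note that ('some text') without a trailing comma is just a string, not a tuple). flatten and words_containing are hypothetical helper names, and unlike the pandas version this returns one flat list of words.

```python
def flatten(items):
    """Yield every string from a list whose entries are str or tuples of str."""
    for item in items:
        if isinstance(item, str):
            yield item          # a bare string: yield it whole, not char by char
        else:
            yield from item     # a tuple (or list) of strings


def words_containing(items, needle):
    """Return every whitespace-separated word that contains `needle`."""
    return [word for text in flatten(items) for word in text.split() if needle in word]


the_list = [
    ('Had denoting properly #T-jointure you occasion directly raillery'),
    ('. In said to of poor full be post face snug. Introduced imprudence'),
    ('see say #T-unpleasing devonshire acceptance son.'),
    ('Exeter longer #T-wisdom gay nor design age.', 'Am weather to entered norland'),
    ('no in showing service. Nor repeated speaking', ' shy appetite.'),
    ('Excited it hastily an pasture #T-it observe.', 'Snug #T-hand how dare here too.')
]

print(words_containing(the_list, '#T-'))
# ['#T-jointure', '#T-unpleasing', '#T-wisdom', '#T-it', '#T-hand']
```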

You can also use re for this task:
import re
a = re.search(r"#\S+", 'Exeter longer #T-wisdom gay nor design age.')
a.group(0)  # '#T-wisdom'
Note: re.search returns None when there is no match, so you need to handle that case; otherwise calling .group() on the result raises an AttributeError.
for name in the_list:
    # entries are either a single string or a tuple of strings
    texts = name if isinstance(name, (list, tuple)) else [name]
    for text in texts:
        result = re.search(r"#\S+", text)
        if result is not None:  # skip strings without a '#' tag
            print(result.group(0))
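re.search only returns the first match in each string, so a string containing two tagged words would report only one of them. If every match is wanted, re.findall is an alternative. A sketch, assuming a tag is "#" followed by non-whitespace characters (the r"#\S+" pattern is an assumption about the tag format, and "#a and #b" is a made-up sample string):

```python
import re

the_list = [
    ('Had denoting properly #T-jointure you occasion directly raillery'),
    ('. In said to of poor full be post face snug. Introduced imprudence'),
    ('see say #T-unpleasing devonshire acceptance son.'),
    ('Exeter longer #T-wisdom gay nor design age.', 'Am weather to entered norland'),
    ('no in showing service. Nor repeated speaking', ' shy appetite.'),
    ('Excited it hastily an pasture #T-it observe.', 'Snug #T-hand how dare here too.')
]

TAG = re.compile(r"#\S+")  # assumption: a tag is '#' followed by non-whitespace

tags = []
for entry in the_list:
    # entries are either a plain string or a tuple of strings
    texts = [entry] if isinstance(entry, str) else entry
    for text in texts:
        tags.extend(TAG.findall(text))  # findall returns [] when nothing matches

print(tags)  # ['#T-jointure', '#T-unpleasing', '#T-wisdom', '#T-it', '#T-hand']

# Hypothetical sample with two tags in one string
print(TAG.search("#a and #b").group(0))  # #a
print(TAG.findall("#a and #b"))          # ['#a', '#b']
```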

Why is my RegEx code replacing some strings, but not others?

I have abstracts of academic articles. Sometimes, the abstract will contain lines like "PurposeThis article explores...." or "Design/methodology/approachThe design of our study....". I call terms like "Purpose" and "Design/methodology/approach" labels. I want the string to look like this: [label][:][space]. For example: "Purpose: This article explores...."
The code below gets me the result I want when the original string has a space between the label and the text (e.g. "Purpose This article explores...."). But I don't understand why it doesn't also work when there is no space. What do I need to change in the code below so that the labels are formatted the way I want, even when the original text has no space between the label and the text? Note that I imported re.sub.
def clean_abstract(my_abstract):
    labels = ['Purpose', 'Design/methodology/approach', 'Methodology/Approach', 'Methodology/approach' 'Findings', 'Research limitations/implications', 'Research limitations/Implications' 'Practical implications', 'Social implications', 'Originality/value']
    for i in labels:
        cleaned_abstract = sub(i, i + ': ', cleaned_abstract)
    return cleaned_abstract
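Two things in the code above are worth checking. First, 'Methodology/approach' 'Findings' (and likewise 'Research limitations/Implications' 'Practical implications') has no comma between the strings, and Python concatenates adjacent string literals, so the list actually contains 'Methodology/approachFindings' and neither label matches on its own. Second, cleaned_abstract is read before it is assigned, so it needs to start from my_abstract. A sketch of a corrected version, assuming every label should be followed by exactly ": " regardless of any whitespace already there; re.escape makes the label match literally and \s* absorbs an existing space:

```python
import re

LABELS = [
    'Purpose', 'Design/methodology/approach', 'Methodology/Approach',
    'Methodology/approach', 'Findings', 'Research limitations/implications',
    'Research limitations/Implications', 'Practical implications',
    'Social implications', 'Originality/value',
]


def clean_abstract(my_abstract):
    cleaned_abstract = my_abstract  # start from the input text
    for label in LABELS:
        # Match the label literally, plus any whitespace after it, and replace with "label: "
        cleaned_abstract = re.sub(re.escape(label) + r'\s*', label + ': ', cleaned_abstract)
    return cleaned_abstract


print(clean_abstract('PurposeThis article explores....'))   # Purpose: This article explores....
print(clean_abstract('Purpose This article explores....'))  # Purpose: This article explores....
```

Because this is a plain substitution, a label word that appears elsewhere in the abstract (for example "Findings" mid-sentence) would also get a colon; the pattern would need tighter anchoring if that matters.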

Code
labels = ['Purpose', 'Design/methodology/approach', 'Methodology/Approach', 'Methodology/approach', 'Findings', 'Research limitations/implications', 'Research limitations/Implications', 'Practical implications', 'Social implications', 'Originality/value']
strings = ['PurposeThis article explores....', 'Design/methodology/approachThe design of our study....']
print([l + ": " + s.split(l)[1].lstrip() for l in labels for s in strings if l in s])
Results
[
'Purpose: This article explores....',
'Design/methodology/approach: The design of our study....'
]
Explanation
Using the logic from this post.
print([...]) prints the list of results
l + ": " + s.split(l)[1].lstrip() builds each output string:
l is the label (see for l in labels below)
": " is a literal colon followed by a space
s.split(l)[1].lstrip() splits s on l, takes the text after the label, and removes any whitespace from its left side
for l in labels loops over labels, setting l to each label in turn
for s in strings loops over strings, setting s to each string in turn
if l in s keeps only the combinations where l is found in s

How to split list elements to a line separated by space

I have a list in Python:
values = ['Maths\n', 'English\n', 'Hindi\n', 'Science\n', 'Physical_Edu\n', 'Accounts\n', '\n']
print("".join(values))
I want the output to be:
Subjects: Maths English Hindi Science Physical_Edu Accounts
I am new to Python. I used the join() method but was unable to get the expected output.

You could map the str.strip function to every element in the list and join them afterwards.
values = ['Maths\n', 'English\n', 'Hindi\n', 'Science\n', 'Physical_Edu\n', 'Accounts\n', '\n']
print("Subjects:", " ".join(map(str.strip, values)))

Using a regular expression approach:
import re
lst = ['Maths\n', 'English\n', 'Hindi\n', 'Science\n', 'Physical_Edu\n', 'Accounts\n', '\n']
rx = re.compile(r'.*')
print("Subjects: {}".format(" ".join(match.group(0) for item in lst for match in [rx.match(item)])))
# Subjects: Maths English Hindi Science Physical_Edu Accounts
But it is better to use strip() (or, even better, rstrip()) as in the other answers:
string = "Subjects: {}".format(" ".join(map(str.rstrip, lst)))
print(string)

strip() each element of the list and then join() them with a space in between:
a = ['Maths\n', 'English\n', 'Hindi\n', 'Science\n', 'Physical_Edu\n', 'Accounts\n', '\n']
print("Subjects: " +" ".join(map(lambda x:x.strip(), a)))
Output:
Subjects: Maths English Hindi Science Physical_Edu Accounts
As pointed out by @miindlek, you can achieve the same thing by using map(str.strip, a) in place of map(lambda x: x.strip(), a).

You can strip the newlines first and then join the stripped items with a space:
stripped_array = [value.strip() for value in values]
joined_string = " ".join(stripped_array)
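One detail shared by the answers above: the last element '\n' becomes an empty string, so " ".join(...) leaves a trailing space after Accounts. If that matters, the empty strings can be filtered out before joining. A minimal sketch:

```python
values = ['Maths\n', 'English\n', 'Hindi\n', 'Science\n', 'Physical_Edu\n', 'Accounts\n', '\n']

# Strip each entry and keep only the non-empty results
subjects = [v.strip() for v in values if v.strip()]
line = "Subjects: " + " ".join(subjects)
print(line)  # Subjects: Maths English Hindi Science Physical_Edu Accounts
```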

Unclear error when I try to reverse dictionary python

This is my code:
my_dict = {'Anthony Hopkins': ' Hannibal, The Edge, Meet Joe Black, Proof', 'Julia Roberts': ' Pretty Woman, Oceans Eleven, Runaway Bride', 'Salma Hayek': ' Desperado, Wild Wild West', 'Gwyneth Paltrow': ' Shakespeare in Love, Bounce, Proof', 'Meg Ryan': ' You have got mail, Sleepless in Seattle', 'Russell Crowe': ' Gladiator, A Beautiful Mind, Cinderella Man, American Gangster' .....}
dictrev = {}
for i in mydict:
    for j in mydict[i]:
        if j not in dictrev:
            dictrev.setdefault(j, []).append(i)
print(dictrev)
The problem is that when I debug, I see that the line for j in mydict[i]: reads the value one character at a time, whereas I need each full value (there are multiple values per key).
Any suggestions as to what the problem is?
Thank you very much for your help

Could you please format your code like this:
    do whatever
You do that by typing enter two times, then for each line of code indenting four spaces. To type normally after that, start a new line and do not type the four spaces at the start of it.
If I understand what you are asking, you want to swap the key and value of the dictionary, and you are getting an error while doing so. I cannot read your unformatted code (no offense), so I will provide a dictionary swapping technique that works for me.
my_dict = {1: "bob", 2: "bill", 3: "rob"}
new_dict = {}
for key in my_dict:
    new_key = my_dict[key]
    new_value = key
    new_dict.update({new_key: new_value})
print(new_dict)
This code starts from the original dictionary, my_dict, and an empty dictionary for the result, new_dict. Iterating through my_dict yields only the keys, and each key is used to look up its value. The value that we want to become a key is assigned to new_key, and the key that we want to become a value is assigned to new_value. The reversed dictionary is then updated with the new key/value pair, and the final line prints it. If you want my_dict itself to hold the reversed dict, use my_dict = new_dict. I hope this answers your question.
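The same swap can be written as a dict comprehension. Note that if two keys share a value, only the last one survives in the reversed dict, which is why the next answer collects lists instead. A minimal sketch:

```python
my_dict = {1: "bob", 2: "bill", 3: "rob"}

# Swap each key/value pair; a repeated value would overwrite the earlier entry
new_dict = {value: key for key, value in my_dict.items()}
print(new_dict)  # {'bob': 1, 'bill': 2, 'rob': 3}
```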

As has been pointed out in the comments, the values in your dict are strings, so iterating over them produces single characters. Split them into the desired tokens and it will work:
dictrev = {}  # movie: actors-list (I assume)
for k in my_dict:
    for v in my_dict[k].split(','):  # iterate through the comma-separated titles
        dictrev.setdefault(v.strip(), []).append(k)  # strip the surrounding spaces
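A self-contained version of the same idea, run on a subset of the question's data (the full dict is elided in the question). It swaps setdefault for collections.defaultdict, and strips each title because the values start with a space (' Hannibal, ...'):

```python
from collections import defaultdict

my_dict = {
    'Anthony Hopkins': ' Hannibal, The Edge, Meet Joe Black, Proof',
    'Gwyneth Paltrow': ' Shakespeare in Love, Bounce, Proof',
    'Salma Hayek': ' Desperado, Wild Wild West',
}

dictrev = defaultdict(list)  # movie title -> list of actors
for actor, titles in my_dict.items():
    for title in titles.split(','):
        dictrev[title.strip()].append(actor)  # strip the spaces around each title

print(dictrev['Proof'])  # ['Anthony Hopkins', 'Gwyneth Paltrow']
```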

If what you want is to reverse the order of the comma-separated values in your dictionary, the following may be the solution you're looking for:
my_dict = {
'Anthony Hopkins': ' Hannibal, The Edge, Meet Joe Black, Proof',
'Julia Roberts' : ' Pretty Woman, Oceans Eleven, Runaway Bride'
}
res_dict = {}
for item in my_dict:
    res_dict[item] = ', '.join(reversed(my_dict[item].strip().split(', ')))
strip() is used to remove the spaces at the beginning and end of each value
split() is used to split each value into titles (using ', ' as the separator)
reversed() is used to reverse the resulting list
join() is used to build the final value for each key of res_dict
Output:
>>> res_dict
{'Anthony Hopkins': 'Proof, Meet Joe Black, The Edge, Hannibal', 'Julia Roberts': 'Runaway Bride, Oceans Eleven, Pretty Woman'}

Keeping a count of words in a list without using any count method in python?

I need to keep one list of the words that appear once and another list of the words that appear twice, without using any count method. I tried using a set, but it removes only the duplicate, not the original. Is there any way to keep the words appearing once in one list and the words appearing twice in another list?
The sample input is text = ['Andy Fennimore Cooper\n', 'Peter, Paul, and Mary\n', 'Andy Gosling\n'], so Andy (which appears twice) would go in one list and the rest in the other.
Using dictionaries is not allowed :/
for word in text:
    clean = clean_up(word)
    for words in clean.split():
        clean2 = clean_up(words)
        l = clean_list.append(clean2)
        if clean2 not in clean_list:
            clean_list.append(clean2)
print(clean_list)

This is a very bad, un-Pythonic way of doing things, but once you disallow Counter and dict, this is about all that's left. (Edit: except for sets, d'oh!)
text = ['Andy Fennimore Cooper\n', 'Peter, Paul, and Mary\n', 'Andy Gosling\n']
once_words = []
more_than_once_words = []
for sentence in text:
    for word in sentence.split():
        if word in more_than_once_words:
            pass  # do nothing
        elif word in once_words:
            once_words.remove(word)
            more_than_once_words.append(word)
        else:
            once_words.append(word)
which results in
# once_words
['Fennimore', 'Cooper', 'Peter,', 'Paul,', 'and', 'Mary', 'Gosling']
# more_than_once_words
['Andy']

It is a silly problem, removing key data structures or loops or whatever. Why not just program in C then? Tell your teacher to get a job...
Editorial aside, here is a solution:
>>> text = ['Andy Fennimore Cooper\n', 'Peter, Paul, and Mary\n','Andy Gosling\n']
>>> data=' '.join(e.strip('\n,.') for e in ''.join(text).split()).split()
>>> data
['Andy', 'Fennimore', 'Cooper', 'Peter', 'Paul', 'and', 'Mary', 'Andy', 'Gosling']
>>> [e for e in data if data.count(e)==1]
['Fennimore', 'Cooper', 'Peter', 'Paul', 'and', 'Mary', 'Gosling']
>>> list({e for e in data if data.count(e)==2})
['Andy']

If you can use a set (I wouldn't use one either if you're not allowed to use dictionaries), then you can use one set to keep track of the words you have 'seen' and another for the words that appear more than once. E.g.:
seen = set()
duplicate = set()
Then, each time you get a word, test if it is on seen. If it is not, add it to seen. If it is in seen, add it to duplicate.
At the end, you'd have a set of seen words, containing all the words, and a duplicate set, with all those that appear more than once.
Then you only need to subtract duplicate from seen, and the result is the words that have no duplicates (i.e. the ones that appear only once).
This can also be implemented using only lists (which would be more honest to your homework, if a bit more laborious).
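A minimal sketch of the seen/duplicate approach described above, using sets. Stripping commas and full stops from each word is an added assumption, so that 'Peter,' and 'Peter' count as the same word:

```python
text = ['Andy Fennimore Cooper\n', 'Peter, Paul, and Mary\n', 'Andy Gosling\n']

seen = set()       # every distinct word encountered so far
duplicate = set()  # words encountered more than once

for line in text:
    for word in line.split():
        word = word.strip(',.')  # assumption: ignore trailing punctuation
        if word in seen:
            duplicate.add(word)
        else:
            seen.add(word)

once = seen - duplicate   # words that appeared exactly once
print(sorted(once))       # ['Cooper', 'Fennimore', 'Gosling', 'Mary', 'Paul', 'Peter', 'and']
print(sorted(duplicate))  # ['Andy']
```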

from itertools import groupby
from operator import itemgetter
text = ['Andy Fennimore Cooper\n', 'Peter, Paul, and Mary\n', 'Andy Gosling\n']
one, two = [list(group) for key, group in groupby( sorted(((key, len(list(group))) for key, group in groupby( sorted(' '.join(text).split()))), key=itemgetter(1)), key=itemgetter(1))]
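In the one-liner above, each group holds (word, count) pairs rather than bare words. The same pipeline split into steps, with a final step that keeps only the words; like the original, the unpacking into one, two assumes there are exactly two distinct counts:

```python
from itertools import groupby
from operator import itemgetter

text = ['Andy Fennimore Cooper\n', 'Peter, Paul, and Mary\n', 'Andy Gosling\n']

# Count each word: sorting puts equal words next to each other for groupby
counts = [(word, len(list(run))) for word, run in groupby(sorted(' '.join(text).split()))]

# Group the (word, count) pairs by count; sorted() is stable, so words stay in order
one, two = [list(group) for _, group in groupby(sorted(counts, key=itemgetter(1)), key=itemgetter(1))]

# Keep only the words, dropping the counts
once_words = [word for word, _ in one]
twice_words = [word for word, _ in two]
print(once_words)   # ['Cooper', 'Fennimore', 'Gosling', 'Mary', 'Paul,', 'Peter,', 'and']
print(twice_words)  # ['Andy']
```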
