This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to check if my list has an item from another list(dictionary)?
This is actually homework for a mark.
The user of the program must type in a sentence. Then the program checks the words and prints the wrong ones (if a wrong word appears more than once, the program must print it only once). Wrong words must be printed in the order they appear in the sentence.
Here is how I did it, but there is one problem: the wrong words do not appear in the same order as in the sentence because of the built-in function sorted. Is there any other method to delete duplicates from a list?
The dictionary is read from dictionary.txt.
import re

sentence = input("Sentence:")
dictionary = open("dictionary.txt", encoding="latin2").read().lower().split()
words = re.findall(r"\w+", sentence.lower())
# sorted(set(...)) removes duplicates but also reorders the words alphabetically
words = sorted(set(words))
sez = []
for i in words:
    if i not in dictionary:
        sez.append(i)
print(sez)
words = [item for index, item in enumerate(words) if words.index(item) == index]
It'll filter out every duplicate and will maintain the order.
As Thomas pointed out, this is a rather heavy approach, because words.index() rescans the list for every item. If you need to process a larger number of words, you could use this for loop:
dups = set()
filtered_list = []
for word in words:
    if word not in dups:
        filtered_list.append(word)
        dups.add(word)
To delete duplicates in a list, add the items to a dictionary as keys. A dictionary holds each key only once, so repeated items collapse into a single KEY:VALUE pair.
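A minimal sketch of the dictionary idea, assuming Python 3.7 or later, where a plain dict keeps its keys in insertion order. The sample list is made up for illustration.

```python
# Hypothetical sample input: words in the order they appear in a sentence.
words = ["teh", "cat", "teh", "sat", "cat", "onn"]

# dict.fromkeys builds a dict whose keys are the words; a repeated word
# just overwrites its own key, so only the first position is kept.
unique_words = list(dict.fromkeys(words))

print(unique_words)  # ['teh', 'cat', 'sat', 'onn']
```

On older Python versions, collections.OrderedDict.fromkeys(words) gives the same ordering guarantee.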
You can use the OrderedSet recipe.
Edit: BTW, if the dictionary is big, it's better to convert the dictionary list into a set: checking whether an element exists in a set takes constant time on average, instead of O(n) for a list.
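As a rough sketch of how the set lookup could fit into the original task: the word list and the sentence below are hypothetical stand-ins for dictionary.txt and the user's input, and the names known, wrong and reported are chosen only for this example.

```python
import re

# Hypothetical stand-in for the contents of dictionary.txt.
dictionary_words = ["the", "cat", "sat", "on", "mat"]
known = set(dictionary_words)  # membership tests are O(1) on average

# Hypothetical stand-in for the sentence typed by the user.
sentence = "The cat zat on teh mat and teh cat"
words = re.findall(r"\w+", sentence.lower())

wrong = []        # misspelled words, in order of first appearance
reported = set()  # words already added to `wrong`
for word in words:
    if word not in known and word not in reported:
        wrong.append(word)
        reported.add(word)

print(wrong)  # ['zat', 'teh', 'and']
```

No sorting is involved, so the output keeps the order of the sentence.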
You should check this answer:
https://stackoverflow.com/a/7961425/1225541
If you use his method and stop sorting the words array (remove the words=sorted(set(words)) line) it should do what you expect.
This question already has answers here:
How can I check if two strings are anagrams of each other?
(27 answers)
Closed 11 months ago.
I want to write a function that finds anagrams. Anagrams are words that are written with the same characters. For example, "abba" and "baba".
I have gotten as far as writing a function that can recognize if a certain string has the same letters as another string. However, I can't account for the number of repeated letters in the string.
How should I do this?
This is the code I have written so far:
def anagrams(word, words):
    list1 = []
    for i in words:
        if set(word) == set(i):
            list1.append(i)
    return list1
The inputs look something like this:
('abba', ['aabb', 'abcd', 'bbaa', 'dada'])
I want to find an anagram for the first string, within the list.
You kind of point to the solution in your question. Your current problem is that using a set, you ignore the count of times each individual letter is contained in the input and target strings. So to fix this, start using these counts. For instance you can have a mapping between letter and the number of its occurrences in each of the two strings and then compare those mappings.
For the purposes of learning, I would encourage you to use a dict to solve the problem. Once you know how to do that, note that there is a built-in container in collections, called Counter, that can do the same for you.
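A minimal sketch of both suggestions, using the input from the question. The helper name letter_counts and the second function name anagrams_counter are made up for this example.

```python
from collections import Counter


def letter_counts(text):
    """Map each character to the number of times it occurs in text."""
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    return counts


def anagrams(word, words):
    """Dict-based version: keep candidates with identical letter counts."""
    target = letter_counts(word)
    return [candidate for candidate in words if letter_counts(candidate) == target]


def anagrams_counter(word, words):
    """Same comparison, with collections.Counter doing the counting."""
    target = Counter(word)
    return [candidate for candidate in words if Counter(candidate) == target]


print(anagrams('abba', ['aabb', 'abcd', 'bbaa', 'dada']))          # ['aabb', 'bbaa']
print(anagrams_counter('abba', ['aabb', 'abcd', 'bbaa', 'dada']))  # ['aabb', 'bbaa']
```

Unlike the set comparison, 'abba' and 'ab' no longer match, because the counts differ.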
I'm having trouble in a school project because I don't know how to join elements of a list in segments. Here's an example: Let's say I have the following list:
list = ["T","h","i","s","I","s","A","L","i","s","t",]
How could I join this list so that the program outputs the following?:
Output: ["This","Is","A","List"]
Assuming list is your input, and without giving you the answer outright since it's a school project you should do yourself, here are some hints.
You'll want to check if a character is uppercase to know when the start of a word is. With python, you can use isupper() (ex: 'C'.isupper() would return True).
Python strings are iterable.
You can add a character to the end of a string using += (ex: myWord += 'a')
You can add a string to a list using append (ex: myList.append(myWord))
Remember this is a learning experience and there's no real value to being given the answer outright, if that's what you were hoping for. Best of luck and welcome to StackOverflow.
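You can use a regex for this:
import re
list = ["T","h","i","s","I","s","A","L","i","s","t",]
# split keeps the captured groups (one capital plus the non-capitals after it); drop the empty strings in between
sep = [s for s in re.split(r"([A-Z][^A-Z]*)", ''.join(list)) if s]
print(sep)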
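For comparison, here is a sketch of a plain loop built on the isupper() check, with no regex. The function name split_on_capitals and the variable letters are chosen for this example, and it assumes every word starts with exactly one uppercase letter.

```python
def split_on_capitals(chars):
    """Join single characters into words, starting a new word at each capital."""
    words = []
    current = ""
    for ch in chars:
        # An uppercase letter closes the word collected so far (if any).
        if ch.isupper() and current:
            words.append(current)
            current = ""
        current += ch
    # Do not forget the last word, which has no capital after it.
    if current:
        words.append(current)
    return words


letters = ["T", "h", "i", "s", "I", "s", "A", "L", "i", "s", "t"]
print(split_on_capitals(letters))  # ['This', 'Is', 'A', 'List']
```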
I have a set (not a list) of strings (words). It is a big one. (It's ripped out of images with OpenCV and Tesseract, so there's no reliable way to predict its contents.)
At some point while working with this set, I need to find out whether it contains at least one word that begins with the part I'm currently processing.
So it's something like this (NOT actual code):
if exists(word.startswith(word_part) in word_set) then continue else break
There is a very good answer on how to find all strings in list that start with something here:
result = [s for s in string_list if s.startswith(lookup)]
or
result = filter(lambda s: s.startswith(lookup), string_list)
But they return a list or an iterator of all the strings found.
I only need to find out whether any such string exists in the set, not get them all.
Performance-wise, it seems wasteful to build the list, take its len, check that it's more than zero, and then just drop the list.
Is there a better / faster / cleaner way?
Your pseudocode is very close to real code!
if any(word.startswith(word_part) for word in word_set):
    continue
else:
    break
any returns as soon as it finds one true element, so it's efficient.
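A small sketch that makes the early exit visible. The helper names has_prefix and noisy_checks are made up for this example, and a list is used in the second part only so the iteration order is predictable.

```python
def has_prefix(word_set, word_part):
    """True if at least one word in word_set starts with word_part."""
    return any(word.startswith(word_part) for word in word_set)


word_set = {"apple", "apricot", "banana"}
print(has_prefix(word_set, "ap"))  # True
print(has_prefix(word_set, "ch"))  # False

# Record which words were actually examined to show that any() stops early.
checked = []


def noisy_checks(words, prefix):
    for word in words:
        checked.append(word)
        yield word.startswith(prefix)


any(noisy_checks(["apple", "banana", "cherry"], "ap"))
print(checked)  # ['apple']
```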
You can use yield:
def find_word(word_set, letter):
    for word in word_set:
        if word.startswith(letter):
            yield word
    yield None
if next(find_word(word_set, letter)):
    print('word exists')
yield produces words lazily, so a single call to next() runs the generator only until the first match (or until the final None if nothing matches).
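A shorter variant of the same lazy idea, sketched with a generator expression and the default argument of next(). The function name first_with_prefix is hypothetical. Comparing against None, rather than relying on truthiness, avoids treating an empty-string match as "not found".

```python
def first_with_prefix(word_set, prefix):
    """Return some word starting with prefix, or None if there is none."""
    return next((word for word in word_set if word.startswith(prefix)), None)


word_set = {"apple", "banana"}
print(first_with_prefix(word_set, "b"))          # banana
print(first_with_prefix(word_set, "z") is None)  # True

if first_with_prefix(word_set, "a") is not None:
    print("word exists")
```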
I want to compare a list of strings and if a certain sequence of characters match, I want to put those matching strings into a new_list, like so:
string_list1 = ['CE.1.FXZ', 'CE.1.FXX', 'CE.1.FXY', 'CE.4.FXZ', 'CE.4.FXX', 'CE.4.FXY']
new_list = ['CE.1.FXZ', 'CE.1.FXX', 'CE.1.FXY']
As you can see, the common character in each is either 1 or 4.
My question is: how can I separate strings based on a common character if I do not know the common character beforehand? For example, I would like to pass string_list1 into a function and have the function automatically identify the common characters and then separate the strings based on them. Any help would be great! Thanks.
You can isolate the "common character" in your example with Python's built-in str.split() method (more info at https://docs.python.org/fr/2.7/library/stdtypes.html#str.split) like so:
for i in string_list1:
    common_character = i.split(".")[1]
Next step would be creating a list each time you see a novel "common_character" or adding your element to an existing list using the list.append() method (one by one).
Best of luck !
If the common char is always the second token (when split on the .), you can use a defaultdict where each key is a common char and each value is the list of strings that share it.
from collections import defaultdict
string_list1 = ['CE.1.FXZ', 'CE.1.FXX', 'CE.1.FXY', 'CE.4.FXZ', 'CE.4.FXX', 'CE.4.FXY']
common_chars = defaultdict(list)
# group each string under its second dot-separated token
for s in string_list1:
    common_chars[s.split('.')[1]].append(s)
for common_group in common_chars.values():
    print(common_group)
Outputs:
['CE.1.FXZ', 'CE.1.FXX', 'CE.1.FXY']
['CE.4.FXZ', 'CE.4.FXX', 'CE.4.FXY']
I have a text file with thousands of words in it. I have to count the number of words that are in alphabetical order. The following is cut out from a bunch of other code I've got:
Counter = 0
for word in wordStr:
    word = word.strip()
    if len(word) > 4:
        a = 0
        b = 1
        while word[a] < word[b]:
            a += 1
            b += 1
            Counter += 1
return Counter
There are some obvious things wrong here and I know it, but don't know how to fix it. My reasoning is this: if the first letter of a word is < the second letter of the word, that part of the word is alphabetical. So I need to go through and perform this kind of operation on a word until I find the entire word to be alphabetical or run into a situation where letter a is > letter b.
At the moment, my code increases the Counter when word[a] < word[b]. However, I need to change this so it only increases when the entire word is alphabetical, not just the first two letters. My other problem is that I get errors because eventually the while loop tries to compare string indexes that don't exist because of the way I am incrementing a and b. I know lots need to be rewritten and I have got the logic down.. just struggling to implement it.
EDIT: I forgot I have had this problem before and someone on my other question helped me solve it. Sorry for the confusion.
An easy way to see if a word is in alphabetical order is to sort it, then see if the sorted version is the same as the original version. Python has a function sorted() that can be used to sort a string; however, the result will come out as a list. So you'll need to convert the sorted version back to a string, or else convert the original string to a list (the second is a bit easier: just pass the string into list()), before comparing them.
You might also want to convert the string to lower case (or upper case -- doesn't matter as long as it's consistent) first because that will affect the sorting order: all capital letters come before lower case ones, so Cat would test as already being in alphabetical order even though it isn't. You can do this using the .lower() method on the string object.
Since this looks like homework I won't post working code but it should be very simple to put together from what I've given you.
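For later readers, a minimal sketch of the sort-and-compare approach described above. The function names and the sample words are made up for this example, and the length filter mirrors the len(word) > 4 check from the question. Note that this comparison treats repeated adjacent letters as ordered, whereas the strict < in the question would not.

```python
def is_alphabetical(word):
    """True if the letters of word are already in sorted order (case-insensitive)."""
    letters = list(word.lower())
    return letters == sorted(letters)


def count_alphabetical(words, min_length=5):
    """Count words of at least min_length letters whose letters are in order."""
    count = 0
    for word in words:
        word = word.strip()  # drop surrounding whitespace such as newlines
        if len(word) >= min_length and is_alphabetical(word):
            count += 1
    return count


# Hypothetical sample; "Cat" is too short and "zebra" is out of order.
sample = ["almost\n", "begin", "Cat", "zebra", "chimps"]
print(count_alphabetical(sample))  # 3
```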