Duplicate words in a list [closed] - python

How can I return a duplicated word in a list?
I was asked to create a function, word_count(text, n). The text is converted into a list, and the function should return the words that are repeated n times. I've tried to write it, but it seems to return every single word.
>>> repeat_word_count("one one was a racehorse two two was one too", 3)
['one']
I've used a for loop with a condition. I wanted to post my code, but I'm scared that my school will find the code online :(

I think I know what you want to do. Without seeing your code I can't point out exactly where you're going wrong, so I'll take you through creating this function step by step, and you should be able to figure out where you went wrong.
I think you are trying to create this:
def function(a,b):
    """where a is a sentence and b is the target number.
    The function will return to you each word in the
    given sentence that occurs exactly b times."""
To do this we have to do the following:
Convert the sentence into a list of words, removing punctuation, capitalization, and spaces.
Iterate through each unique word in the sentence and print it if it occurs exactly b times in the sentence.
Put these together to make a function.
So in your example the sentence was "one one was a racehorse two two was one too", and you're looking for all words that occur exactly 3 times, so the function should return the word "one".
We'll look at each step one at a time.
FIRST STEP -
We have to take the sentence or sentences and convert them into a list of words. Since I don't know whether you will be using sentences with punctuation and/or capitalization, I'll assume that it's possible and plan to deal with them. We'll have to omit any punctuation and spaces from the list and also change all the letters in each word to lowercase, because even though "Cat" and "cat" are the same word to us, to a computer "Cat" does NOT equal any of these:
"cat" - lowercase c doesn't match uppercase C in "Cat"
" Cat" - there is an extra space at the start of the word
"Cat." - There is a period after the word
"Cat " - There is a space after the word
So if we use "One one was a racehorse two two was one, too." as our input we'll need to handle spaces, punctuation, and capitalization. Luckily all of this work can be done with 2 lines of code by using a regular expression and list comprehension to get rid of all the junk and create a list of words.
import re
wordlist=[i.lower() for i in re.findall(r"[\w']+",sentence)]
This gives us our list of words:
['one', 'one', 'was', 'a', 'racehorse', 'two', 'two', 'was', 'one', 'too']
SECOND STEP -
Now we need to iterate through each unique word in the wordlist and see if it occurs exactly b times. Since we only need unique words, we can convert the wordlist from a list to a set, which contains each word exactly once, then loop through each word in the set and count the number of times it appears in the wordlist. Any words that occur exactly b times are our solutions. I'm not exactly sure how you want the results returned, but I'm going to assume you want each word that fits the criteria to be printed one at a time.
for word in set(wordlist):
    if wordlist.count(word)==b:
        print(word)
THIRD STEP -
Now I'll put all this together to create my function:
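import re
def repeat_word_count(a,b):
    # split the sentence into lowercase words, dropping punctuation and spaces
    wordlist=[i.lower() for i in re.findall(r"[\w']+",a)]
    # check each unique word once
    for word in set(wordlist):
        if wordlist.count(word)==b:
            print(word)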
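The example in the question shows the function returning a list rather than printing. Here is a minimal sketch of that variant, assuming the same tokenizing regular expression as above. It uses collections.Counter to count every word in a single pass instead of calling count() once per unique word. The name repeated_words is made up for illustration.

```python
import re
from collections import Counter

def repeated_words(text, n):
    """Return the words that occur exactly n times, in order of first appearance."""
    # Lowercase words, with punctuation and spaces dropped by the regex.
    words = [w.lower() for w in re.findall(r"[\w']+", text)]
    counts = Counter(words)
    # Counter keeps insertion order on Python 3.7+, so results follow first appearance.
    return [word for word, count in counts.items() if count == n]

print(repeated_words("one one was a racehorse two two was one too", 3))  # ['one']
```

Returning the list leaves it to the caller to decide whether to print the words, count them, or test them.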
I hope this helps you understand a bit better

Related

How can I parse only string without using regex in python? [closed]

I am learning Python and have a question about parsing strings without regex. We should use a while loop. Here is the question:
We get a string from the user with the input function, and then extract just the alphabetic words from this sentence into a list.
For example, sentence: "The weather is so lovely today. Jack (our Jack) – Jason - and Alex went to park..? "
Example output: ["The", "weather", "is", "so","lovely","today","Jack","our","Jack","and","Alex","went","to","park"]
I have to note that punctuation marks and special characters such as parentheses are not part of words.
Below is the code I tried. I couldn't find where the error is.
s=" The weather is so lovely today. Jack (our Jack) – Jason - and Alex went to park..?"
i = 0
j = 0
l=[]
k=[]
count = 0
while s:
    while j<len(s) and not s[j].isalpha():
        j+=1
    l = s[j:]
    s=s[j:]
    while j < len(s) and l[j].isalpha():
        j+=1
    s=s[j:]
    k.append(l[0:i])
print(k)
print(l)
On the other hand, I did parse the first word with the code below.
s=" The weather is so lovely today. Jack (our Jack) – Jason - and Alex went to park..?"
i = 0
j = 0
l=[]
k=[]
while j<len(s) and not s[j].isalpha():
    j+=1
l = s[j:]
while i < len(l) and l[i].isalpha():
    i+=1
s=s[i:]
k.append(l[0:i])
print(k)
print(l)
Thanks for your help.
By and large, if your goal is to parse a string and you find yourself modifying the string, you're probably doing it wrong. That's particularly true of languages like Python where strings are immutable, and modifying a string really means creating a new one, which takes time proportional to the length of the string. Doing that in a loop effectively turns a linear scan into a quadratic-time algorithm; you might not notice the dramatic consequences with a few short test cases, but sooner or later you (or someone) will try your code out on a significantly longer string, and the quadratic time will come back to bite you.
Anyway, there's no need. All you need to do is to look at the characters, or more accurately, look at each position between two characters, in order to find the positions of the beginnings of the words (where an alphabetic character follows a non-alphabetic character) and the ends of the words (where a non-alphabetic character follows an alphabetic character). Once the beginning and end of each word is discovered, the complete word can be added to the word list.
Note that we don't actually care what each character is, only whether it is alphabetic. So in the following code, I don't save the previous character; rather I save the boolean value of whether the previous character was alphabetic. At the start of the scan, previous_was_alphabetic is set to False, so if the first character in the string is alphabetic, that counts as the start of a word.
There's one little Python trick here, to handle the end of the string. If the last character in the string is alphabetic, then it's the end of a word, so it would be convenient to ensure that the string ends with a non-alphabetic character. But I don't really want to create a modified string, and I'd prefer not to have to write special purpose code for the end of the string. What I do instead is to use a slice; instead of looking at s[i] (the ith character), I use s[i:i+1], the one-character slice starting at position i. Conveniently, if i happens to be the length of s, then s[i:i+1] is an empty string, '', and even more conveniently, ''.isalpha() is False. So that will act as though there were an invisible non-alphabetic character at the end of the string.
This is not really very Pythonic, but your assignment seems to be insisting that you use a while loop rather than the much more natural for loop (which would require a different way of dealing with the end of the string).
def words_from(s):
    """Returns a list of the "words" (contiguous sequences of alphabetic
    characters) from the string s
    """
    words = []
    previous_was_alphabetic = False
    i = 0
    while i <= len(s):
        next_is_alphabetic = s[i:i+1].isalpha()
        if not previous_was_alphabetic and next_is_alphabetic:
            # i is the start of a word
            start = i
        elif previous_was_alphabetic and not next_is_alphabetic:
            # i is the position after the end of a word
            words.append(s[start:i])
        # Move to the next position
        previous_was_alphabetic = next_is_alphabetic
        i += 1
    return words
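For comparison, here is a rough sketch of what the for-loop version mentioned above might look like. It is not part of the assignment, which asks for a while loop, and the name words_from_for is invented for this example. Because a for loop over the characters never reaches the position past the end of the string, a word that runs to the end has to be flushed explicitly after the loop.

```python
def words_from_for(s):
    """Return the contiguous runs of alphabetic characters in s."""
    words = []
    start = None  # index where the current word began, or None between words
    for i, ch in enumerate(s):
        if ch.isalpha():
            if start is None:
                start = i  # first letter of a new word
        elif start is not None:
            words.append(s[start:i])  # a non-letter ends the current word
            start = None
    if start is not None:
        words.append(s[start:])  # the string ended in the middle of a word
    return words
```

The string is still only sliced, never modified, so the scan stays linear in the length of the input.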
I think you might want something like this:
s = "The weather is so lovely today. Jack (our Jack) – Jason - and Alex went to park..? "
punc = '''!()-[]{};:'"\,–,<>./?##$%^&*_~'''
# Removing punctuations in string
# Using loop + punctuation string
for i in s:
    if i in punc:
        s = s.replace(i, "")
print(s.split())
output:
['The', 'weather', 'is', 'so', 'lovely', 'today', 'Jack', 'our', 'Jack', 'Jason', 'and', 'Alex', 'went', 'to', 'park']

How can I write "EvenWord" Recursive in Python [closed]

We tried to solve the following problem with friends but we couldn't come to a conclusion. How can we approach this question?
The full question is:
Even Words Problem: An even word is a word that contains an even number of copies of every letter. For example, the word "tattletale"
is an even word, since there are four copies of 't' and two copies of
'a,' 'e,' and 'l.' Similarly, "appeases" and "arraigning" are even
words. However, "banana" is not an even word, because there is just
one 'b' and three copies of 'a.'
Write a function def isEvenWord(word) that accepts as input a string
representing a single word and returns whether or not that word is an
even word.
Your solution should be recursive and must not use any loops (e.g.
while, for). As a hint, this problem has a beautiful recursive
decomposition:
• The empty string is an even word, since it has 0 copies of every
letter.
• Otherwise, a word is an even word if there are at least two copies
of the first letter and the word formed by removing two copies of the
first letter is itself an even word.
For example, we can see that the word "appeases" is an even word using
the following logic:
"appeases" is an even word, because "ppeses" is an even word, because
"eses" is an even word, because "ss" is an even word, because "" is an
even word.
I assume the tricky part is not using a loop, since the solution must be recursive.
In this problem, you want to find whether the count of each letter can be divided by two.
Here are the steps you can follow:
1) Define your base condition
In this problem, the base condition is when there are no more letters in the word to check; in other words, your word is an empty string ''.
If the base condition is reached, it means that you have checked the count of all the letters in the word and the count was always even. So you can stop there. You've checked all the letters in the word and their count and they are even --> you return True and you are done.
2) Define what you do if the base condition is not reached:
In this problem, you need to check that the count of each letter in the word is even. You can store the letter in a variable and check its count in the word.
Each time, you check the last letter, create a new word that doesn't contain the letter already checked, and run the isEvenWord(word) function again on the new word.
If the count of the letter you are checking is not even, then you are done: you know that the word is not even, since at least one letter occurs an odd number of times, so you return False.
If the count of the letter you are checking is even, then you continue to check the next letter by calling your function again on the new word made of the remaining letters that you haven't checked yet.
Here is the code version of the explanation above:
def isEvenWord(word):
    if word == '':  # base condition
        return True
    else:
        last_letter = word[-1]
        if word.count(last_letter) % 2 != 0:
            return False
        else:
            # create the next word to test: replace every copy of last_letter
            # (the letter you just counted) by an empty string '', which removes it
            next_word = word.replace(last_letter, '')
            return isEvenWord(next_word)
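The function above relies on count() and replace(), which is a different decomposition from the one hinted at in the problem statement. As a sketch only, here is how the hinted version (remove two copies of the first letter, then recurse) might be written. The name is_even_word_hint is made up, and str.find is used to locate the second copy without writing a loop.

```python
def is_even_word_hint(word):
    """Recursive check following the hint: strip two copies of the first letter."""
    if word == "":
        return True  # the empty string has 0 copies of every letter
    first = word[0]
    rest = word[1:]  # one copy of the first letter removed
    pos = rest.find(first)  # position of a second copy, or -1 if there is none
    if pos == -1:
        return False  # the first letter has no partner, so its count is odd
    # Remove the second copy and check the shorter word.
    return is_even_word_hint(rest[:pos] + rest[pos + 1:])

print(is_even_word_hint("appeases"))  # True
print(is_even_word_hint("banana"))    # False
```

Each call shortens the word by exactly two letters, which mirrors the "appeases" trace in the problem statement.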
Very nice little puzzle, thanks for sharing.
I hope this helps.

What code is needed to complete this task? [closed]

I'm new to python and I'm trying to teach myself how to use it by completing tasks. I am trying to complete the task below and have written the code beneath it. However, my code does not disregard the punctuation of the input sentence and does not store the sentence's words in a list. What do I need to add to it? (keep in mind, I am BRAND NEW to python, so I have very little knowledge)
Develop a program that identifies individual words in a sentence, stores these in a list and replaces each word in the original sentence with the position of that word in the list.
For example, the sentence:
ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU ASK WHAT YOU CAN DO FOR YOUR
COUNTRY
contains the words ASK, NOT, WHAT, YOUR, COUNTRY, CAN, DO, FOR, YOU
The sentence can be recreated from the positions of these words in this list using the sequence
1,2,3,4,5,6,7,8,9,1,3,9,6,7,8,4,5
Save the list of words and the positions of these words in the sentence as separate files or as a single
file.
Analyse the requirements for this system and design, develop, test and evaluate a program to:
• identify the individual words in a sentence and store them in a list
• create a list of positions for words in that list
• save these lists as a single file or as separate files.
restart = 'y'
while (True):
    sentence = input("What is your sentence?: ")
    sentence_split = sentence.split()
    sentence2 = [0]
    print(sentence)
    for count, i in enumerate(sentence_split):
        if sentence_split.count(i) < 2:
            sentence2.append(max(sentence2) + 1)
        else:
            sentence2.append(sentence_split.index(i) +1)
    sentence2.remove(0)
    print(sentence2)
    restart = input("would you like restart the programme y/n?").lower()
    if (restart == "n"):
        print ("programme terminated")
        break
    elif (restart == "y"):
        pass
    else:
        print ("Please enter y or n")
Since these are several questions in one, here are a few pointers (I won't help you with the file I/O as that's not really part of the problem).
First, to filter punctuation from a sentence, refer to this question.
Second, in order to get an ordered list of unique words and their first positions, you can use an ordered dictionary. Demonstration:
>>> from collections import OrderedDict
>>> s = 'ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU ASK WHAT YOU CAN DO FOR YOUR COUNTRY'
>>> words = s.split()
>>> word2pos = OrderedDict()
>>>
>>> for index, word in enumerate(words, 1):
...     if word not in word2pos:
...         word2pos[word] = index
...
>>> list(word2pos.keys())
['ASK', 'NOT', 'WHAT', 'YOUR', 'COUNTRY', 'CAN', 'DO', 'FOR', 'YOU']
If you are not allowed to use an ordered dictionary you will have to work a little harder and read through the answers of this question.
Finally, once you have a mapping of words to their first position, no matter how you acquired it, creating the list of positions is straightforward:
>>> [word2pos[word] for word in words]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 3, 9, 6, 7, 8, 4, 5]
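One caveat: the mapping above stores each word's first position in the sentence. That happens to equal its position in the list of unique words here, because all nine distinct words appear before any repeat. If the task means the position within the unique-word list, a sketch along these lines may be closer (the function name words_and_positions is hypothetical, and punctuation is not handled):

```python
def words_and_positions(sentence):
    """Return (unique_words, positions), with 1-based positions into unique_words."""
    unique_words = []
    index_of = {}  # word -> its 1-based position in unique_words
    positions = []
    for word in sentence.split():
        if word not in index_of:
            unique_words.append(word)
            index_of[word] = len(unique_words)
        positions.append(index_of[word])
    return unique_words, positions

words, positions = words_and_positions("A B A C")
print(words)      # ['A', 'B', 'C']
print(positions)  # [1, 2, 1, 3]
```

With first positions in the sentence, "A B A C" would instead give 1, 2, 1, 4, which cannot be used to index the three-word list.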
You have to consider a few things first, such as what to do with punctuation, as you already noted. Since you are trying to teach yourself, I will only give you some tips and information that you can look into.
The strip method removes given characters, such as a , or a ., from the start and end of a string.
The split command will split a string into a list of smaller strings, based on your splitting command. However, to see the place they had in the original string you could look at the index of the list. For instance, in your sentence list you can get the first word by accessing sentence[0] and so forth.
However, considering that words can be repeated this will be a bit trickier, so you might look into something called a dictionary, which is perfect for what you want to do as it allows you do something as follows:
words = {'Word': 'Ask', 'Position': [1,10]}
Now if you stick with the simplistic approach (using a list), you can iterate over the list with an index and process each word individually to write them to a file, for instance along the lines of (warning, this is pseudo code).
for index, word in sentence:
    do things with word
    write things to a file
To get a more 'true' starting point check the below spoiler
for index, word in enumerate(sentence):
    filename = str(word)+".txt"
    with open(filename,'w') as fw:
        fw.write("Word: "+str(word)+"\tPlace: "+str(index))
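The snippet above writes one file per word. The task also allows saving both lists in a single file; one possible way, purely as an assumption about the format (the task does not prescribe one), is to store them together as JSON. The helper names serialize, save, and rebuild are made up for this sketch.

```python
import json

def serialize(words, positions):
    """Pack the unique words and the 1-based position list into one JSON string."""
    return json.dumps({"words": words, "positions": positions})

def save(path, words, positions):
    """Write both lists to a single file at path."""
    with open(path, "w") as f:
        f.write(serialize(words, positions))

def rebuild(text):
    """Recreate the sentence from the JSON text produced by serialize()."""
    data = json.loads(text)
    return " ".join(data["words"][p - 1] for p in data["positions"])

print(rebuild(serialize(["ASK", "NOT"], [1, 2, 1])))  # ASK NOT ASK
```

Being able to rebuild the sentence from the saved text is a quick way to test that the two lists were stored correctly.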
I hope this gets you under way!

Python script search a text file for a word [closed]

I'm writing a Python script. I need to search a text file for words that end with "s", "es" or "ies" and are longer than three letters. I need to know the number of such words and the words themselves. It's a hard task and I can't get it to work; please help me.
I agree with the comment that you need to go work on the basics. Here are some ideas to get you started.
1) You say "search a file." Open a file and read line by line like this:
with open('myFile.txt', 'r') as infile:
    for line in infile:
        pass  # do something to each line
2) You probably want to store each line in a data structure, like a list:
# before you open the file...
lines = []
# while handling the file:
lines.append(line)
3) You'll need to work with each word. Look into the 'split' method of strings.
4) You'll need to look at individual letters of each word. Look into 'string slicing.'
All said and done, you can probably do this with 10 - 15 lines of code.
Try to divide the task into different tasks if it feels overwhelming.
The following code is by no means good, but hopefully it is clear enough so you can get the point.
1 First you need to get your text. If your text is in a file in your computer you need to put it into something that python can use.
# this code takes the content of "text.txt" and store it into my_text
with open("text.txt") as file:
    my_text = file.read()
2 Now you need to work with every individual word. All your words are together in a string called my_text, and you would like them separated (split) into a list so you can work with them individually. Usually words are separated by spaces, so that's what you use to separate them:
# take the text and split it into words
my_words = my_text.split(" ")
3 I don't know exactly what you want, but let's suppose you want to store separately the words in different lists. Then you will need those lists:
# three list to store the words:
words_s = []
words_es = []
words_ies = []
4 Now you need to iterate through the words and do stuff with them. For that the easiest thing to do is to use a for loop:
#iterate through each word
for word in my_words:
    # you're not interested in short words:
    if len(word) <= 3:
        continue # this means: do nothing with this word
    # now, if the word's length is greater than 3, you classify it:
    if word.endswith("ies"):
        words_ies.append(word) # add it to the list
    if word.endswith("es"):
        words_es.append(word) # add it to the list
    if word.endswith("s"):
        words_s.append(word) # add it to the list
5 Finally, outside the for loop, you can print the list of words and also get the length of the list:
print(words_s)
print(len(words_s))
Something that you need to consider is whether you want the same word stored in more than one list. Note that the condition 'word that ends with "s", "es" or "ies"' is equivalent to 'word that ends with "s"'. The code above distributes the words into the lists redundantly: if a word ends with "ies" it also ends with "es" and "s", so it'll be stored in all three lists. If you want to avoid this overlap, you can replace the second and third if statements with elif statements.
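To make the elif suggestion concrete, here is a small sketch in which each word lands in at most one group, with the longest suffix tested first. The function name classify_by_suffix and the dictionary layout are assumptions for illustration, not part of the code above.

```python
def classify_by_suffix(words):
    """Group words longer than three letters by their longest matching suffix."""
    groups = {"ies": [], "es": [], "s": []}
    for word in words:
        if len(word) <= 3:
            continue  # skip short words
        # Test the longest suffix first; elif guarantees only one branch runs.
        if word.endswith("ies"):
            groups["ies"].append(word)
        elif word.endswith("es"):
            groups["es"].append(word)
        elif word.endswith("s"):
            groups["s"].append(word)
    return groups

print(classify_by_suffix(["parties", "boxes", "cats", "is", "tree"]))
# {'ies': ['parties'], 'es': ['boxes'], 's': ['cats']}
```

The order of the tests matters: checking "s" first would put every word in the "s" group and leave the other two empty.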
Keep learning the basics as other answers suggest and soon you'll be able to understand scary code like this :D
with open("text.txt") as myfile:
    words = [word for word in myfile.read().split(" ") if word.endswith("s") and len(word) > 3]
    print("There are {} words ending with 's' and longer than 3".format(len(words)))

if allwords in title: match

Using Python 3, I have a list of words like:
['foot', 'stool', 'carpet']
These lists vary in length from 1 to 6 words or so. I have thousands and thousands of strings to check, and I need to make sure that all the words in the list are present in a title, where:
'carpet stand upon the stool of foot balls.'
is a correct match, as all the words are present here, even though they are out of order.
I've wondered about this for a long time, and the only thing I could think of was some sort of iteration like:
for word in list: if word in title: match!
But this gives me results like 'carpet cleaner', which is incorrect. I feel as though there is some sort of shortcut to do this, but I can't seem to figure it out without using excessive list(), continue, break or other methods/terminology that I'm not yet familiar with.
You can use all():
words = ['foot', 'stool', 'carpet']
title = "carpet stand upon the stool of foot balls."
matches = all(word in title for word in words)
Or, invert the logic with not any() and not in:
matches = not any(word not in title for word in words)
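Note that word in title is a substring test, so 'foot' would also match a title containing 'football'. If whole-word matches are what is needed, one possible sketch is to split the title into a set of words first. The helper name title_has_all_words and the choice of \w+ as the definition of a word are assumptions.

```python
import re

def title_has_all_words(words, title):
    """True if every word appears in title as a whole word, ignoring case."""
    # Build the set of words once so each membership test is a fast lookup.
    title_words = set(re.findall(r"\w+", title.lower()))
    return all(word.lower() in title_words for word in words)

words = ['foot', 'stool', 'carpet']
print(title_has_all_words(words, "carpet stand upon the stool of foot balls."))  # True
print(title_has_all_words(words, "carpet cleaner"))                              # False
```

With thousands of titles, building the set per title keeps each check proportional to the title length plus the handful of words being looked up.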
