I'm writing a Python script. I need to search a text file for words that end in "s", "es", or "ies" and are longer than three letters, and I need to know both the number of such words and the words themselves. I'm finding this task hard and can't get it to work; please help.
I agree with the comment that you need to go work on the basics. Here are some ideas to get you started.
1) You say "search a file." Open a file and read line by line like this:
with open('myFile.txt', 'r') as infile:
    for line in infile:
        # do something to each line
2) You probably want to store each line in a data structure, like a list:
# before you open the file...
lines = []
# while handling the file:
lines.append(line)
3) You'll need to work with each word. Look into the 'split' method of strings.
4) You'll need to look at individual letters of each word. Look into 'string slicing.'
All said and done, you can probably do this with 10 - 15 lines of code.
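Putting those ideas together, a rough sketch might look like this (an illustration only; the filename is made up, and it uses str.endswith as a shortcut instead of slicing):
# sketch: collect words longer than 3 letters that end in "s", "es" or "ies"
matches = []
with open('myFile.txt', 'r') as infile:
    for line in infile:
        for word in line.split():
            if len(word) > 3 and word.endswith(('s', 'es', 'ies')):
                matches.append(word)

print(matches)       # the words themselves
print(len(matches))  # how many there are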
Try to divide the task into different tasks if it feels overwhelming.
The following code is by no means good, but hopefully it is clear enough so you can get the point.
1 First you need to get your text. If your text is in a file on your computer, you need to put it into something that Python can use.
# this code takes the content of "text.txt" and stores it in my_text
with open("text.txt") as file:
    my_text = file.read()
2 Now you need to work with every individual word. All your words are together in a string called my_text, and you would like them separated (split) into a list so you can work with them individually. Usually words are separated by spaces, so that's what you use to separate them:
# take the text and split it into words
my_words = my_text.split(" ")
3 I don't know exactly what you want, but let's suppose you want to store the words separately in different lists. Then you will need those lists:
# three lists to store the words:
words_s = []
words_es = []
words_ies = []
4 Now you need to iterate through the words and do stuff with them. For that the easiest thing to do is to use a for loop:
# iterate through each word
for word in my_words:
    # you're not interested in short words:
    if len(word) <= 3:
        continue  # this means: skip this word and move on to the next one
    # now, if the word's length is greater than 3, you classify it:
    if word.endswith("ies"):
        words_ies.append(word)  # add it to the list
    if word.endswith("es"):
        words_es.append(word)  # add it to the list
    if word.endswith("s"):
        words_s.append(word)  # add it to the list
5 Finally, outside the for loop, you can print a list of words and also get its length:
print(words_s)
print(len(words_s))
Something that you need to consider is whether you want the words repeated or not. Note that the condition 'word that ends with "s", "es" or "ies"' is equivalent to 'word that ends with "s"'. The code above distributes the words into the different lists redundantly: if a word ends with "ies" it also ends with "es" and "s", so it will be stored in all three lists. If you want to avoid this overlap, you can replace the if statements with elif statements.
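For example, the classification inside the loop could use elif so that each word ends up in exactly one list (a sketch of the non-overlapping variant just described):
for word in my_words:
    if len(word) <= 3:
        continue
    if word.endswith("ies"):
        words_ies.append(word)   # "ies" endings only
    elif word.endswith("es"):
        words_es.append(word)    # "es" but not "ies"
    elif word.endswith("s"):
        words_s.append(word)     # plain "s" endings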
Keep learning the basics as other answers suggest and soon you'll be able to understand scary code like this :D
with open("text.txt") as myfile:
words = [word for word in myfile.read().split(" ") if word.endswith("s") and len(word) > 3]
print("There are {} words ending with 's' and longer than 3".format(len(words)))
I've already made a post about this, but since then I've managed to solve the issues I had at first, and just editing the old question would only make things more complicated.
I have a text file with about 10'000 words. The output of the function should be a list of tuples, each containing a word and the number of occurrences of that word, in descending order. For example:
out = [("word1",10),("word3",8),("word2",5)...]
So this is my code so far (keep in mind, this does currently work to a certain extent; it is just extremely inefficient):
def text(inp):
    with open(inp, "r") as file:
        content = file.readlines()
    delimiters = ["\n", " ", ",", ".", "?", "!", ":", ";", "-"]
    words = content
    spaces = ["", "'", '']
    out = []
    for delimiter in delimiters:
        new_words = []
        for word in words:
            if word in spaces:
                continue
            new_words += word.split(delimiter)
        words = new_words
    for word in words:
        x = (words.count(word), word)
        out.append(x)
    return out
I found some help from older posts on Stackoverflow for the first few lines.
The input should be the file path, which works in my case. The first part (the lines I found on here) works nicely, although there are elements in the list such as empty strings. My questions now are:
How can I sort the output so that the word with the most occurrences comes first, followed by the rest in descending order? Currently the order is random. Also, I'm not sure whether the same word comes up multiple times in this list. If it does, how can I make it occur only once in the output?
Also, how can I make this code more efficient? I used time.time() to check and it took almost 419 seconds, which is obviously terribly inefficient, since the task stated that it should take less than 30 seconds.
I apologize in advance for any mistakes I made and my lack of knowledge on this
Instead of running so many loops and conditions, you can use re.split:
import re

def text(inp):
    with open(inp, "r") as file:
        content = file.readlines()
    # delimiters = ["\n", " ", ",", ".", "?", "!", ":", ";", "-"]
    words = content
    spaces = ["", "'", '']
    out = []
    temp_list = []
    for word in words:
        # using re.split
        temp_list.extend(re.split('[\n, .?!:;-]', word))
    for word in set(temp_list):
        x = (word, temp_list.count(word))
        out.append(x)
    return out
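The question also asked how to sort the result by occurrences in descending order, which this answer leaves out. One way to do both the counting and the sorting at once is collections.Counter (my suggestion, not part of the posted code), which also avoids calling count() repeatedly:
import re
from collections import Counter

def text(inp):
    with open(inp, "r") as file:
        content = file.read()
    # split on the same delimiters and drop empty strings
    words = [w for w in re.split('[\n, .?!:;-]', content) if w]
    # most_common() returns (word, count) tuples sorted by count, descending
    return Counter(words).most_common()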
I have just started learning Python this semester and we were given some revision exercises. However, I am stuck on one of the questions. The text file given contains tweets from the 2016 US elections. Sample below:
I wish they would show out takes of Dick Cheney #GOPdebates
Candidates went after #HillaryClinton 32 times in the #GOPdebate-but remained silent about the issues that affect us.
It seems like Ben Carson REALLY doesn't want to be there. #GOPdebates
RT #ColorOfChange: Or better said: #KKKorGOP #GOPDebate
The question requires me to write a Python program that reads from the file tweets.txt. Remember that each line contains one tweet. For each tweet, the program should remove any word that is less than 8 characters long, and also any word that contains a hash (#), at (@), or colon (:) character. What I have now:
for line in open("tweets.txt"):
aline=line.strip()
words=aline.split()
length=len(words)
remove=['#','#',':']
for char in words:
if "#" in char:
char=''
if "#" in char:
char=''
if ":" in char:
char=''
This did not work, and the resulting list still contains #, @ or :. Any help appreciated! Thank you!
Assigning char = '' in the loop does not change or remove the actual element (actually a word) in the list; it just assigns a different value to the variable char.
Instead, you might use a list comprehension / generator expression for filtering the words that satisfy the conditions.
>>> tweet = "Candidates went after #HillaryClinton 32 times in the #GOPdebate-but remained silent about the issues that affect us."
>>> [w for w in tweet.split() if not any(c in w for c in "#@:") and len(w) >= 8]
['Candidates', 'remained']
Optionally, use ' '.join(...) to join the remaining words back to a "sentence", although that might not make too much sense.
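Applied to the whole file, that filter might look something like this (a minimal sketch; printing each cleaned tweet is my assumption about the desired output):
with open("tweets.txt") as f:
    for tweet in f:
        kept = [w for w in tweet.split()
                if not any(c in w for c in "#@:") and len(w) >= 8]
        print(' '.join(kept))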
Use this code.
import re

tweet = re.sub(r'#', '', tweet)
tweet = re.sub(r'@', '', tweet)
tweet = re.sub(r':', '', tweet)
The code below opens the file (it's usually better to use "with open" when working with files), loops through all the lines, and removes the '#', '@' and ':' characters using translate. It then drops the words with fewer than 8 characters, giving you the output new_line.
with open('tweets.txt') as rf:
    for sentence in rf:
        line = sentence.strip()
        line = line.translate({ord(i): None for i in '#@:'})
        line = line.split()
        new_line = [word for word in line if len(word) >= 8]
        print(new_line)
It's not the most succinct way and there are definitely better ways to do it, but it's probably a bit easier to read and understand, seeing as you've just started learning, like me.
I'm having a bit of difficulty figuring this out. This function is supposed to do the following/adhere to the following guidelines:
def mostCommonWords(filename, N):
    return "stub"
- Read the file from filename in your function and returns a dictionary
with the frequency of each word as its value.
- Words are separated by whitespace characters, but do not include
the following punctuation characters (,.!?;). You can assume contractions
count as one word (i.e. "don't", "you'll", etc. are one word).
- The split and strip functions may be useful.
- You can assume contractions count as one word
(i.e. "don't", "you'll", etc. are one word).
- Your function should open the file for reading, and close
the file before returning.
I have the helper function completed:
def wordFrequency(filename):
    frequency = {}
    file = open(filename, 'r')
    for line in file.readlines():
        for word in line.strip().split():
            if word not in frequency:
                frequency[word] = 0
            frequency[word] += 1
    file.close()
    return frequency
However, I am unsure how to go from here. Could anyone provide some guidance?
I guess the next step would be to learn how to sort your result from highest to lowest, and then decide a way to represent this information to the user. Probably the simplest would be to print it.
I guess the N parameter is meant to limit how many values (words) you print?
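To make that concrete, mostCommonWords could build on the wordFrequency helper above, sorting by count and keeping the top N (a rough sketch; the exact return format isn't specified in the question):
def mostCommonWords(filename, N):
    frequency = wordFrequency(filename)
    # sort the (word, count) pairs by count, highest first
    ranked = sorted(frequency.items(), key=lambda pair: pair[1], reverse=True)
    # keep only the N most common words
    return dict(ranked[:N])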
I'm new to Python and I'm trying to teach myself how to use it by completing tasks. I am trying to complete the task below and have written the code beneath it. However, my code does not disregard the punctuation of the input sentence and does not store the sentence's words in a list. What do I need to add to it? (Keep in mind, I am BRAND NEW to Python, so I have very little knowledge.)
Develop a program that identifies individual words in a sentence, stores these in a list and replaces each word in the original sentence with the position of that word in the list.
For example, the sentence:
ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU ASK WHAT YOU CAN DO FOR YOUR
COUNTRY
contains the words ASK, NOT, WHAT, YOUR, COUNTRY, CAN, DO, FOR, YOU
The sentence can be recreated from the positions of these words in this list using the sequence
1,2,3,4,5,6,7,8,9,1,3,9,6,7,8,4,5
Save the list of words and the positions of these words in the sentence as separate files or as a single file.
Analyse the requirements for this system and design, develop, test and evaluate a program to:
• identify the individual words in a sentence and store them in a list
• create a list of positions for words in that list
• save these lists as a single file or as separate files.
restart = 'y'
while (True):
    sentence = input("What is your sentence?: ")
    sentence_split = sentence.split()
    sentence2 = [0]
    print(sentence)
    for count, i in enumerate(sentence_split):
        if sentence_split.count(i) < 2:
            sentence2.append(max(sentence2) + 1)
        else:
            sentence2.append(sentence_split.index(i) + 1)
    sentence2.remove(0)
    print(sentence2)
    restart = input("would you like restart the programme y/n?").lower()
    if (restart == "n"):
        print("programme terminated")
        break
    elif (restart == "y"):
        pass
    else:
        print("Please enter y or n")
Since this is several questions in one, here are a few pointers (I won't help you with the file I/O, as that's not really part of the problem).
First, to filter punctuation from a sentence, refer to this question.
Second, in order to get an ordered list of unique words and their first positions, you can use an ordered dictionary. Demonstration:
>>> from collections import OrderedDict
>>> s = 'ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU ASK WHAT YOU CAN DO FOR YOUR COUNTRY'
>>> words = s.split()
>>> word2pos = OrderedDict()
>>>
>>> for index, word in enumerate(words, 1):
...     if word not in word2pos:
...         word2pos[word] = index
...
>>> word2pos.keys()
['ASK', 'NOT', 'WHAT', 'YOUR', 'COUNTRY', 'CAN', 'DO', 'FOR', 'YOU']
If you are not allowed to use an ordered dictionary you will have to work a little harder and read through the answers of this question.
Finally, once you have a mapping of words to their first position, no matter how you acquired it, creating the list of positions is straightforward:
>>> [word2pos[word] for word in words]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 3, 9, 6, 7, 8, 4, 5]
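If OrderedDict is not allowed, a plain list of first-seen words gives the same mapping with a little more work (a sketch of the alternative hinted at above; note that on Python 3.7+ a regular dict keeps insertion order anyway):
s = 'ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU ASK WHAT YOU CAN DO FOR YOUR COUNTRY'
words = s.split()

unique_words = []  # words in order of first appearance
for word in words:
    if word not in unique_words:
        unique_words.append(word)

# position of each word = its 1-based index in the unique list
positions = [unique_words.index(word) + 1 for word in words]
print(unique_words)
print(positions)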
You have to consider a few things first, such as what to do with punctuation, as you already noted. Since you are trying to teach yourself, I will only give you some tips and pointers that you can look into.
The strip method lets you remove certain characters, such as a ',' or '.', from the ends of a string.
The split method will split a string into a list of smaller strings, based on the separator you give it. To see the place each word had in the original string you can look at its index in the list. For instance, in your sentence list you can get the first word by accessing sentence[0], and so forth.
However, considering that words can be repeated this will be a bit trickier, so you might look into something called a dictionary, which is perfect for what you want to do as it allows you do something as follows:
words = {'Word': 'Ask', 'Position': [1,10]}
Now if you stick with the simplistic approach (using a list), you can iterate over the list with an index and process each word individually to write them to a file, for instance along the lines of the following (warning, this is pseudo code).
for index, word in sentence:
    do things with word
    write things to a file
To get a more 'true' starting point, check the spoiler below.
for index, word in enumerate(sentence):
    filename = str(word) + ".txt"
    with open(filename, 'w') as fw:
        fw.write("Word: " + str(word) + "\tPlace: " + str(index))
I hope this gets you under way!
How can I return a duplicated word in a list?
I was asked to create a function, word_count(text, n). The text is converted into a list and the function should then return the words that are repeated n times. I've tried to write it, but it seems to return every single word.
>>> repeat_word_count("one one was a racehorse two two was one too", 3)
['one']
I've used a for loop with a condition. I wanted to post my code, but I'm scared that my school will find the code online :(
I think I know what you are trying to do, and without seeing your code I can't point out exactly where you're going wrong, so I'll take you through creating this function step by step and you should be able to figure out where you went wrong.
I think you are trying to create this:
def function(a, b):
    """where a is a sentence and b is the target number.
    The function will return to you each word in the
    given sentence that occurs exactly b times."""
To do this we have to do the following:
convert the sentence into a list of words, removing punctuation, capitalization, and spaces
iterate through each unique word in the sentence and print it if it occurs exactly b times in the sentence
put these together to make a function
So in your example your sentence was "one one was a racehorse two two was one too", and you're looking for all words that occur exactly 3 times, so the function should return the word "one".
We'll look at each step one at a time.
FIRST STEP -
We have to take the sentence or sentences and convert them into a list of words. Since I don't know whether you will be using sentences with punctuation and/or capitalization, I'll assume it's possible and plan to deal with them. We'll have to omit any punctuation/spaces from the list and also change all the letters in each word to lowercase if they happen to have a capital letter, because even though "Cat" and "cat" are the same word, according to a computer brain, "Cat" does NOT equal any of these:
"cat" - lowercase c doesn't match uppercase C in "Cat"
" Cat" - there is an extra space at the start of the word
"Cat." - There is a period after the word
"Cat " - There is a space after the word
So if we use "One one was a racehorse two two was one, too." as our input we'll need to handle spaces, punctuation, and capitalization. Luckily all of this work can be done with 2 lines of code by using a regular expression and list comprehension to get rid of all the junk and create a list of words.
import re

wordlist = [i.lower() for i in re.findall(r"[\w']+", sentence)]
This gives us our list of words:
['one', 'one', 'was', 'a', 'racehorse', 'two', 'two', 'was', 'one', 'too']
SECOND STEP -
Now we need to iterate through each unique word in the wordlist and see if it occurs exactly b times. Since we only need unique words, we can convert the wordlist to a set (which contains each word exactly once), loop through each word in the set, and count the number of times it appears in the wordlist. Any word that occurs exactly b times is a solution. I'm not exactly sure how you want the results returned, but I'm going to assume you want each word that fits the criteria to be printed one at a time.
for word in set(wordlist):
    if wordlist.count(word) == b:
        print(word)
THIRD STEP -
Now I'll put all this together to create my function:
import re

def repeat_word_count(a, b):
    wordlist = [i.lower() for i in re.findall(r"[\w']+", a)]
    for word in set(wordlist):
        if wordlist.count(word) == b:
            print(word)
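A quick usage check with the sentence from the question (assuming the function above):
# prints every word that occurs exactly 3 times in the sentence: here, "one"
repeat_word_count("one one was a racehorse two two was one too", 3)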
I hope this helps you understand it a bit better.