I am a beginner programmer looking for some help with what is probably a very simple problem. I am trying to write a program that will read a .txt file and then replace any words with an 'e' in them with 'xxxxx'.
Here is what I have so far:
def replace_ewords(text):
    ntext = text.replace('e', 'xxxxx')

def main():
    filename = "gb.txt"
    text = readfile(filename)
    new_text = replace_ewords(text)
    print(new_text)

main()
Could someone help me with this and give me any critiques/pointers?
def replace_ewords(text):
    words = []
    text = text.split(' ')
    for word in text:
        if "e" in word:
            words.append("x" * len(word))  # mask the whole word
        else:
            words.append(word)
    return " ".join(words)
with open('filename') as f:  # Better way to open files :)
    line_list = []
    for line in f:  # iterate line by line
        word_list = []
        for word in line.split():  # split each line into words
            if 'e' in word:
                word = 'xxxxx'  # replace content if word has 'e'
            word_list.append(word)  # collect the word content for the new file
        line_list.append(' '.join(word_list))  # list of modified lines
    # write this content to the file
The two loops can be written as a single list comprehension:
[' '.join([('xxxxx' if 'e' in word else word) for word in line.split()]) for line in f.readlines()]
One-liner:
print("\n".join([" ".join([[word, "xxxxx"]["e" in word] for word in line.split()]) for line in open("file").readlines()]))
Related
I have a txt file with lines of text like this, and I want to swap the word in
quotations with the last word that is separated from the sentence with a tab:
It looks like this:
This "is" a person are
She was not "here" right
"The" pencil is not sharpened a
desired output:
This "are" a person is
She was not "right" here
Some ideas:
#1: Use Numpy
Separate all the words by whitespace with numpy -> ['This', '"is"', 'a', 'person', '\t', 'are']
Problems:
How do I tell Python the position of the quoted word?
How do I convert the list back to normal text? Concatenate it all?
#2: Use Regex
Use regex and find the word in ""
import re

with open('readme.txt', 'r') as x:
    x = x.readlines()
swap = x[-1]
re.findall(r'"(\w+)"', swap)
Problems:
I don't know how to read the txt file with regex; most examples I see here assign the entire sentence to a variable.
Is it something like this?
with open('readme.txt') as f:
    lines = f.readlines()
    lines.findall(....)
Thanks guys
You don't really need re for something this trivial.
Assuming you want to rewrite the file:
with open('foo.txt', 'r+') as txt:
    lines = txt.readlines()
    for k, line in enumerate(lines):
        words = line.split()
        for i, word in enumerate(words[:-1]):
            if word[0] == '"' and word[-1] == '"':
                words[i] = f'"{words[-1]}"'
                words[-1] = word[1:-1]
                break
        lines[k] = ' '.join(words[:-1]) + f'\t{words[-1]}'
    txt.seek(0)
    print(*lines, sep='\n', file=txt)
    txt.truncate()
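A note on the file-handling pattern here: opening with 'r+' lets the same handle be read and then rewritten, seek(0) moves back to the start of the file before writing, and truncate() discards any leftover bytes in case the new content is shorter than the original.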
This is my solution:
regex = r'"[\s\S]*"'
import re
file1 = open('test.txt', 'r')
count = 0
while True:
# Get next line from file
line = file1.readline()
# if line is empty
# end of file is reached
if not line:
break
get_tab = line.strip().split('\t')[1]
regex = r'\"[\s\S]*\"'
print("original: {} mod ----> {}".format(line.strip(), re.sub(regex, get_tab, line.strip().split('\t')[0])))
Try:
import re

pat = re.compile(r'"([^"]*)"(.*\t)(.*)')
with open("your_file.txt", "r") as f_in:
    for line in f_in:
        print(pat.sub(r'"\3"\2\1', line.rstrip()))
Prints:
This "are" a person is
She was not "right" here
"a" pencil is not sharpened The
I guess this is also a way to solve it:
Input readme.txt contents:
This "is" a person are
She was not "here" right
"The" pencil is not sharpened a
Code:
import re

changed_text = []
with open('readme.txt') as x:
    for line in x:
        splitted_text = line.strip().split("\t")  # ['This "is" a person', 'are'] etc.
        if re.search(r'\".*\"', line.strip()):  # If a quote is found
            qouted_text = re.search(r'\"(.*)\"', line.strip()).group(1)
            changed_text.append(splitted_text[0].replace(qouted_text, splitted_text[1]) + "\t" + qouted_text)

with open('readme.txt.modified', 'w') as x:
    for line in changed_text:
        print(line)
        x.write(line + "\n")
Result (readme.txt.modified):
Thare "are" a person is
She was not "right" here
"a" pencil is not sharpened The
This is what I tried. But I got an error message:
Error: <re.Match object; span=(74, 76), match='ai'>
The program should print out any words that contain two consecutive vowels.
Text.txt file content:
text = "This is a test file with a single word per line. Print any words that contain two vowels next to each other." a = text.split(" ").rstrip("\n")
my_file = open("test.txt", "w")
Python code:
reg = r"[aeiou][aeiou]"
with open("text.txt") as f:
for word in f:
word = word.strip()
print(re.search(reg, word, re.I))
You can go through it like this:
import re

with open('test.txt') as f:
    for line in f:
        line = line.strip()
        if re.search(r"[aeiou][aeiou]", line, re.I):
            print(line)
You can do this without a regex as well:
with open(ur_file) as f:
    for line in f:
        for x, y in zip(line, line[1:]):
            if all(e.lower() in 'aeiou' for e in (x, y)):
                print(line.rstrip())
                break
Also:
import re
words = "This is a test file with a single word per line. Print any words that contain two vowels next to each other.".split()
print([w for w in words if re.match(r".*[aeiouy][aeiouy].*",w)])
For a small number of words, you could create a list of all two-vowel pairs and check if any of them are in your target word. This reads a little cleaner than the other solutions, but is likely much less performant.
def has_vowel_pair(word):
    vowels = 'aeiouy'
    # all two-vowel combinations, e.g. 'ae', 'ea', 'oo', ...
    vowel_pairs = {x + y for x in vowels for y in vowels}
    return any(pair in word for pair in vowel_pairs)
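A quick usage check of the helper above (example words chosen for illustration):
print(has_vowel_pair("rain"))      # True  ('ai')
print(has_vowel_pair("strength"))  # False (no adjacent vowels)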
I am relatively new to Python and I am currently working on a compression program that uses two lists: one containing the positions of words in a sentence and one containing the words that make up the sentence. So far I have written my program inside two functions. The first function, 'compress', gets the words that make up the sentence and the positions of those words. My second function is called 'recreate'; this function uses the lists to recreate the sentence. The recreated sentence is then stored in a file called recreate.txt. My issue is that the positions of words and the words that make up the sentence are not being written to their respective files, and the 'recreate' file is not being created and written to. Any help would be greatly appreciated. Thanks :)
sentence = input("Input the sentence that you wish to be compressed")
sentence.lower()
sentencelist = sentence.split()
d = {}
plist = []
wds = []
def compress():
for i in sentencelist:
if i not in wds:
wds.append(i)
for i ,j in enumerate(sentencelist):
if j in (d):
plist.append(d[j])
else:
plist.append(i)
print (plist)
tsk3pos = open ("tsk3pos.txt", "wt")
for item in plist:
tsk3pos.write("%s\n" % item)
tsk3pos.close()
tsk3wds = open ("tsk3wds.txt", "wt")
for item in wds:
tsk3wds.write("%s\n" % item)
tsk3wds.close()
print (wds)
def recreate(compress):
    compress()
    num = list()
    wds = list()
    with open("tsk3wds.txt", "r") as txt:
        for line in txt:
            words += line.split()
    with open("tsk3pos.txt", "r") as txt:
        for line in txt:
            num += [int(i) for i in line.split()]
    recreate = ' '.join(words[pos] for pos in num)
    with open("recreate.txt", "wt") as txt:
        txt.write(recreate)
UPDATED
I have fixed all the other problems except the recreate function, which will not create the 'recreate' file and will not recreate the sentence with the words, although it does recreate the sentence with the positions.
def recreate(compress):  # function that will be used to recreate the compressed sentence
    compress()
    num = list()
    wds = list()
    with open("words.txt", "r") as txt:  # with statement opening the word text file
        for line in txt:  # iterating over each line in the text file
            words += line.split()  # turning the text file into a list and appending it to num
    with open("tsk3pos.txt", "r") as txt:
        for line in txt:
            num += [int(i) for i in line.split()]
    recreate = ' '.join(wds[pos] for pos in num)
    with open("recreate.txt", "wt") as txt:
        txt.write(recreate)
    main()
def main():
    print("Do you want to compress an input or recreate a compressed input?")
    user = input("Type 'a' if you want to compress an input. Type 'b' if you want to recreate an input").lower()
    if user not in ("a", "b"):
        print("That's not an option. Please try again")
    elif user == "a":
        compress()
    elif user == "b":
        recreate(compress)
    main()

main()
A simpler (yet less efficient) approach:
recreate_file_object = open("C:/FullPathToWriteFolder/recreate.txt", "w")
recreate_file_object.write(recreate)
recreate_file_object.close()
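As a hedged pointer at the remaining bug: the updated recreate() reads the words into a list called words but then indexes wds, which stays empty. Keeping one name throughout (this sketch assumes the word file written by compress() is tsk3wds.txt, as in the first version) would look roughly like this:
def recreate():
    wds = []
    num = []
    with open("tsk3wds.txt") as txt:    # words written by compress()
        for line in txt:
            wds += line.split()
    with open("tsk3pos.txt") as txt:    # positions written by compress()
        for line in txt:
            num += [int(i) for i in line.split()]
    sentence = ' '.join(wds[pos] for pos in num)
    with open("recreate.txt", "w") as txt:
        txt.write(sentence)
Note that for wds[pos] to line up, compress() also has to write positions into the word list, for example plist.append(wds.index(j)) instead of plist.append(i).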
How do I return all the unique words from a text file using Python?
For example:
I am not a robot
I am a human
Should return:
I
am
not
a
robot
human
Here is what I've done so far:
def unique_file(input_filename, output_filename):
    input_file = open(input_filename, 'r')
    file_contents = input_file.read()
    input_file.close()

    word_list = file_contents.split()

    file = open(output_filename, 'w')
    for word in word_list:
        if word not in word_list:
            file.write(str(word) + "\n")
    file.close()
The text file that Python creates has nothing in it. I'm not sure what I am doing wrong.
for word in word_list:
    if word not in word_list:
Every word is in word_list by definition (it comes straight from the split on the first line), so the condition is never true.
Instead of that logic, use a set:
unique_words = set(word_list)
for word in unique_words:
    file.write(str(word) + "\n")
Sets only hold unique members, which is exactly what you're trying to achieve.
Note that order won't be preserved, but you didn't specify whether that's a requirement.
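Putting that together, here is a hedged sketch of the whole function from the question, with the same parameters but using a set (the order of the output lines is not guaranteed):
def unique_file(input_filename, output_filename):
    with open(input_filename, 'r') as input_file:
        word_list = input_file.read().split()

    unique_words = set(word_list)  # drops duplicates

    with open(output_filename, 'w') as output_file:
        for word in unique_words:
            output_file.write(word + "\n")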
Simply iterate over the lines in the file and use set to keep only the unique ones.
from itertools import chain
def unique_words(lines):
    return set(chain(*(line.split() for line in lines if line)))
Then simply do the following to read all unique lines from a file and print them
with open(filename, 'r') as f:
    print(unique_words(f))
This seems to be a typical application for a collection:
...
import collections

d = collections.OrderedDict()
for word in wordlist:
    d[word] = None
# use this if you also want to count the words:
# for word in wordlist: d[word] = d.get(word, 0) + 1
for k in d.keys():
    print(k)
You could also use a collections.Counter(), which would also count the elements you feed in. The order of the words would get lost, though. I added a commented line above for counting while keeping the order.
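For illustration, a small hedged sketch of the Counter variant (the word list is made up):
from collections import Counter

word_list = "I am not a robot I am a human".split()
counts = Counter(word_list)

print(list(counts))           # the unique words
print(counts.most_common(2))  # e.g. [('I', 2), ('am', 2)]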
string = "I am not a robot\n I am a human"
list_str = string.split()
print list(set(list_str))
def unique_file(input_filename, output_filename):
    input_file = open(input_filename, 'r')
    file_contents = input_file.read()
    input_file.close()

    duplicates = []
    word_list = file_contents.split()

    file = open(output_filename, 'w')
    for word in word_list:
        if word not in duplicates:
            duplicates.append(word)
            file.write(str(word) + "\n")
    file.close()
This code loops over every word and, if the word is not already in the duplicates list, appends it there and writes it to the output file.
Using Regex and Set:
import re

words = re.findall(r'\w+', text.lower())
uniq_words = set(words)
Another way is to create a dict and insert the words as keys:
dict_word = {}
for i in range(len(doc)):
    frase = doc[i].split(" ")
    for palavra in frase:
        if palavra not in dict_word:
            dict_word[palavra] = 1
print(dict_word.keys())
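For a quick illustration, assuming doc is the list of lines from the question's example (on Python 3.7+ the insertion order of the keys is kept):
doc = ["I am not a robot", "I am a human"]
dict_word = {}
for line in doc:
    for palavra in line.split(" "):
        dict_word[palavra] = 1    # re-inserting an existing key is harmless
print(list(dict_word.keys()))     # ['I', 'am', 'not', 'a', 'robot', 'human']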
The problem with your code is that word_list already has all the words of the input file. When iterating over the loop you are basically checking whether a word in word_list is not present in itself, so it will always be false. This should work (note that this will also preserve the order):
def unique_file(input_filename, output_filename):
    z = []
    with open(input_filename, 'r') as fileIn, open(output_filename, 'w') as fileOut:
        for line in fileIn:
            for word in line.split():
                if word not in z:
                    z.append(word)
                    fileOut.write(word + '\n')
Use a set. You don't need to import anything to do this.
# Open the file
my_File = open(file_Name, 'r')

# Read the file
read_File = my_File.read()

# Split the words
words = read_File.split()

# Using a set will only save the unique words
unique_words = set(words)

# You can then print the set as a whole or loop through the set etc.
for word in unique_words:
    print(word)
try:
    with open("gridlex.txt", mode="r", encoding="utf-8") as india:
        for data in india:
            print("no of chars", len(data))
except IOError:
    print("sorry")
I would like to define a function scaryDict() which takes one parameter (a text file) and returns the words from the text file in alphabetical order, basically producing a dictionary, but it should not print any one- or two-letter words.
Here is what I have so far... it isn't much, but I don't know the next step:
def scaryDict(fineName):
    inFile = open(fileName,'r')
    lines = inFile.read()
    line = lines.split()
    myDict = {}
    for word in inFile:
        myDict[words] = []
    #I am not sure what goes between the line above and below
    for x in lines:
        print(word, end='\n')
You are doing fine till line = lines.split(). But your for loop must loop through the line array, not the inFile.
for word in line:
    if len(word) > 2:  # Make sure to check the word length!
        myDict[word] = 'something'
I'm not sure what you want with the dictionary (maybe get the word count?), but once you have it, you can get the words you added to it by,
allWords = list(myDict.keys())  # so allWords is now a list of words
And then you can sort allWords to get them in alphabetical order.
allWords.sort()
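Assembled, a hedged sketch of this dict-based approach might look like the following (the filename and the use of word counts as dict values are illustrative assumptions):
def scaryDict(fileName):
    myDict = {}
    with open(fileName, 'r') as inFile:
        for line in inFile:
            for word in line.split():
                if len(word) > 2:                          # skip one- and two-letter words
                    myDict[word] = myDict.get(word, 0) + 1  # count occurrences as a bonus
    allWords = list(myDict.keys())
    allWords.sort()                                        # alphabetical order
    return allWords

for word in scaryDict('frankenstein.txt'):
    print(word)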
I would store all of the words into a set (to eliminate dups), then sort that set:
#!/usr/bin/python3
def scaryDict(fileName):
with open(fileName) as inFile:
return sorted(set(word
for line in inFile
for word in line.split()
if len(word) > 2))
scaryWords = scaryDict('frankenstein.txt')
print ('\n'.join(scaryWords))
Also keep in mind that, as of 2.5, file objects have __enter__ and __exit__ methods, so the 'with' statement can prevent some issues (such as the file never getting closed):
with open(...) as f:
    for line in f:
        <do something with line>
Unique set
Sort the set
Now you can put it all together.
Sorry that I am 3 years late :) Here is my version:
def scaryDict():
    infile = open('filename', 'r')
    content = infile.read()
    infile.close()
    table = str.maketrans('.`/()|,\';!:"?=-', 15 * ' ')
    content = content.translate(table)
    words = content.split()
    new_words = list()
    for word in words:
        if len(word) > 2:
            new_words.append(word)
    new_words = list(set(new_words))
    new_words.sort()
    for word in new_words:
        print(word)