I use this code to split an unstructured text file into its tokens and print each token on its own line:
with open("C:\\...\\...\\...\\record-13.txt") as f:
    lines = f.readlines()
for line in lines:
    words = line.split()
    for word in words:
        print(word)
Now I want to save the output to a new text file instead of printing it, so I modified the code to this:
with open("C:\\...\\...\\...\\record-13.txt") as f:
    lines = f.readlines()
for line in lines:
    words = line.split()
    for word in words:
        file = open("tokens.txt", "w")
        file.write(word)
        file.close()
but it doesn't work. Would you please tell me what's wrong with that?
You are opening the file for each token, and because you are opening with mode 'w' the file is truncated. You can open with mode 'a' to append to the file, but that would be very inefficient.
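To see why only one token survives, here is a small sketch that contrasts reopening with 'w' inside the loop against opening the file once. The helper names and the throwaway temporary file are my own, not part of the question.

```python
import os
import tempfile


def write_reopening_each_time(tokens, path):
    """Mimic the question's loop: mode 'w' truncates the file on every open."""
    for token in tokens:
        with open(path, "w") as out_file:
            out_file.write(token + "\n")


def write_opening_once(tokens, path):
    """Open the output a single time, so every token is kept."""
    with open(path, "w") as out_file:
        for token in tokens:
            out_file.write(token + "\n")


def read_text(path):
    """Return the whole content of a small text file."""
    with open(path) as f:
        return f.read()


# Hypothetical demo data written to a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "tokens.txt")

    write_reopening_each_time(["alpha", "beta", "gamma"], demo_path)
    reopened_result = read_text(demo_path)      # only the last token remains

    write_opening_once(["alpha", "beta", "gamma"], demo_path)
    single_open_result = read_text(demo_path)   # all three tokens, one per line
```

With the reopening version the file ends up holding just the last token, which matches the symptom in the question.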
A better way is to open the output file at the very start and let the context manager close it for you. There's also no need to read the entire file into memory at the start.
with open("in.txt") as in_file, open("tokens.txt", "w") as out_file:
    for line in in_file:
        words = line.split()
        for word in words:
            out_file.write(word)
            out_file.write("\n")
I suspect you want each word on its own line, so make sure you also write a newline character.
Related
How do I keep the lines that contain a specific string and remove the other lines from a .txt file?
Example: I want to keep the line that contains the word "hey" and remove the others.
test.txt file:
first line
second one
heyy yo yo
fourth line
Code:
keeplist = ["hey"]
with open("test.txt") as f:
    for line in f:
        for word in keeplist:
It's hard to remove lines from a file in place. It's usually better to write a temporary file with the desired content and then rename it to the original file name.
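import os

keeplist = ["hey"]
with open("test.txt") as f, open("test.txt.tmp", "w") as outf:
    for line in f:
        for word in keeplist:
            if word in line:
                outf.write(line)
                break
# os.replace overwrites an existing destination on every platform;
# os.rename raises an error on Windows when test.txt already exists.
os.replace("test.txt.tmp", "test.txt")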
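If you would rather not pick the temporary file name yourself, the same idea can be sketched with the tempfile module. This is only one possible arrangement; the function name keep_matching_lines is hypothetical.

```python
import os
import tempfile


def keep_matching_lines(path, keeplist):
    """Rewrite `path` so it only contains lines with a string from `keeplist`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final
    # os.replace() stays on one filesystem.
    with open(path) as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as tmp:
        for line in src:
            if any(word in line for word in keeplist):
                tmp.write(line)
    # Both files are closed here, so the swap is safe on Windows too.
    os.replace(tmp.name, path)
```

Calling keep_matching_lines("test.txt", ["hey"]) on the example file should leave only the "heyy yo yo" line.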
I have a text file with the following content
this is the first line
this is the second line
this is the third line
this is the fourth line and contains the word fox.
The goal is to write code that reads the file, extracts the line with
the word fox in it, and saves that line to a new text file. Here is the code I have so far:
import os
import re

my_absolute_path = os.path.abspath(os.path.dirname(__file__))
with open('textfile', 'r') as helloFile:
    for line in helloFile:
        if re.findall("fox", line):
            print(line.strip())
This code prints the matching line, but that's not really what I want it to do. Instead, I would like the code to create a new text file containing that line. Is there a way to accomplish this in Python?
You can do:
with open('textfile', 'r') as in_file, open('outfile', 'a') as out_file:
    for line in in_file:
        if 'fox' in line:
            out_file.write(line)
Here I've opened the outfile in append (a) mode, so repeated runs add to the file instead of overwriting it; mode 'w' would also handle several writes within a single run. I also used the in (str.__contains__) check for substring existence (regex is absolute overkill here).
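The choice between 'a' and 'w' mostly matters when the script runs more than once. A minimal sketch of that difference, using a hypothetical helper named copy_matching_lines that takes the mode as a parameter:

```python
def copy_matching_lines(in_path, out_path, needle, mode="w"):
    """Copy every line containing `needle` from in_path to out_path.

    mode="w" starts from an empty output file on each call;
    mode="a" keeps whatever earlier calls already wrote.
    """
    with open(in_path) as in_file, open(out_path, mode) as out_file:
        for line in in_file:
            if needle in line:
                out_file.write(line)
```

Running it twice with mode="a" stores every matching line twice, while mode="w" leaves a single copy.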
I am trying to read and write to the same file. Currently the data in 2289newsML.txt exists as normal sentences, but I want to append to the file so it stores only tokenized versions of the same sentences.
I used the code below, but even though it prints out tokenized sentences, it doesn't write them to the file.
from pathlib import Path
from nltk.tokenize import word_tokenize

news_folder = Path("file\\path\\")
news_file = (news_folder / "2289newsML.txt")
f = open(news_file, 'r+')
data = f.readlines()
for line in data:
    words = word_tokenize(line)
    print(words)
    f.writelines(words)
f.close
any help will be appreciated.
Thanks :)
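from nltk.tokenize import word_tokenize

with open("input.txt") as f1, open("output.txt", "w") as f2:
    # Write one token per line; the trailing "\n" keeps the last token of
    # one input line from running into the first token of the next.
    f2.writelines("\n".join(word_tokenize(line)) + "\n" for line in f1)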
Using a with statement ensures the file handles are closed for you, so you do not need f1.close().
This program is writing to a different file.
Of course, you can do it this way too:
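f = open(news_file)
data = f.readlines()
file = open("output.txt", "w")
for line in data:
    words = word_tokenize(line)
    print(words)
    file.write('\n'.join(words) + '\n')  # end each line's tokens with a newline
f.close()     # close must be called; a bare f.close does nothing
file.close()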
output.txt will contain the tokenized words.
I am trying to read and write to the same file. Currently the data
in 2289newsML.txt exists as normal sentences but I want to append the
file...
Because you are opening the file in r+ mode.
'r+' Open for reading and writing. The stream is positioned at the beginning of the file.
If you want to append new text at the end of the file, consider opening the file in a+ mode.
Read more about open
Read more about file modes
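If the goal is to replace the sentences with their tokens rather than append, one possible sketch with r+ is to read everything, seek back to the start, write, and truncate. Here str.split is only a stand-in tokenizer so the example runs without NLTK; passing word_tokenize as the tokenize argument is an assumption about how you would plug it in, and the function name is hypothetical.

```python
def tokenize_file_in_place(path, tokenize=str.split):
    """Overwrite `path` with one token per line, using a single r+ handle."""
    with open(path, "r+") as f:
        lines = f.readlines()   # after this the position is at the end of the file
        f.seek(0)               # go back to the start before writing
        for line in lines:
            tokens = tokenize(line)
            if tokens:
                f.write("\n".join(tokens) + "\n")
        f.truncate()            # drop leftover text if the new content is shorter
```

This holds the whole file in memory, so it suits small files; for large ones, writing to a second file is the safer route.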
I'm quite new to Python. With help from the Stack Overflow community, I have managed to do part of my task. My program removes a random word from a small text file, assigns it to a variable, and puts it in another text file.
However, at the end of my program, I need to put that random word back into the text file so that the program can be used multiple times.
The words in the text file are in no specific order, but each word is, and needs to be, on a separate line.
This is the program which removes the random word from the text file.
with open("words.txt") as f:  # Open the text file
    wordlist = [x.rstrip() for x in f]
replaced_word = random.choice(wordlist)
newwordlist = [word for word in wordlist if word != replaced_word]
with open("words.txt", 'w') as f:  # Open file for writing
    f.write('\n'.join(newwordlist))
If I have missed out any vital information which is needed I'm happy to provide that :)
Why not just copy the text file at the start of your program? Perform what you have so far on your copy so you will always leave the original file unaltered.
import random
import shutil

shutil.copyfile("words.txt", "newwords.txt")
with open("newwords.txt") as f:  # Open the text file
    wordlist = [x.rstrip() for x in f]
replaced_word = random.choice(wordlist)
newwordlist = [word for word in wordlist if word != replaced_word]
with open("newwords.txt", 'w') as f:  # Open file for writing
    f.write('\n'.join(newwordlist))
You are overwriting your words.txt file, therefore losing words. If you just create a new file for the random word, you don't need to rewrite your original file. Something like:
...
with open("words_random.txt", 'w') as f:
    f.write(replaced_word)
You will have a new text file with only the random word.
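If you do want to keep removing the word from words.txt and restore it at the end, a rough sketch could look like this. The function names take_random_word and put_word_back are hypothetical, and this version always ends the file with a newline so that appending stays simple.

```python
import random


def take_random_word(path):
    """Remove one randomly chosen word from the file and return it."""
    with open(path) as f:
        wordlist = [line.rstrip() for line in f if line.strip()]
    chosen = random.choice(wordlist)
    wordlist.remove(chosen)  # removes only the first occurrence
    with open(path, "w") as f:
        f.writelines(word + "\n" for word in wordlist)
    return chosen


def put_word_back(path, word):
    """Append the word on its own line so the file is complete again."""
    with open(path, "a") as f:
        f.write(word + "\n")
```

The restored word ends up on the last line, which should be fine since the words are in no specific order.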
Basically, I want a script that opens a file, and then goes through the file and sees if the file contains any curse words. If a line in the file contains a curse word, then I want to replace that line with "CENSORED". So far, I think I'm just messing up the code somehow because I'm new to Python:
filename = input("Enter a file name: ")
censor = input("Enter the curse word that you want censored: ")
with open(filename) as fi:
    for line in fi:
        if censor in line:
            fi.write(fi.replace(line, "CENSORED"))
    print(fi)
I am new to this, so I'm probably just messing something up...
By "line" I mean that this text (if "Hat" were a curse word):
There Is
A
Hat
Would be:
There Is
A
CENSORED
You cannot write to the same file you are reading, for two reasons:
1. You opened the file in read-only mode, so you cannot write to it. You'd have to open the file in read-write mode (using open(filename, mode='r+')) to be able to do what you want.
2. You are replacing data as you read, with lines that are most likely going to be shorter or longer. You cannot do that in a file. For example, replacing the word cute with censored would create a longer line, and that would overwrite not just the old line but the start of the next line as well.
You need to write out your changed lines to a new file, and at the end of that process replace the old file with the new.
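The overwrite problem is easy to reproduce. In this small sketch (the two-line content and the temporary file are made up for the demonstration), writing a longer line over a shorter one eats into the line that follows:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")

    # newline="" disables newline translation so the character counts
    # below are the same on every platform.
    with open(demo_path, "w", newline="") as f:
        f.write("cute\nkitten\n")

    with open(demo_path, "r+", newline="") as f:
        f.write("censored\n")  # 9 characters written over a 5-character line

    with open(demo_path, newline="") as f:
        clobbered = f.read()

print(repr(clobbered))  # 'censored\nen\n'
```

Only the tail "en" of the second line survives, because the first nine characters of the file were overwritten.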
Note that your replace() call is also incorrect; you'd call it on the line:
line = line.replace(censor, 'CENSORED')
The easiest way for you to achieve what you want is to use the fileinput module; it'll let you replace a file in-place, as it'll handle writing to another file and the file swap for you:
import fileinput

filename = input("Enter a file name: ")
censor = input("Enter the curse word that you want censored: ")
for line in fileinput.input(filename, inplace=True):
    line = line.replace(censor, 'CENSORED')
    print(line, end='')
The print() call is a little magic here; the fileinput module temporarily replaces sys.stdout meaning that print() will write to the replacement file rather than your console. The end='' tells print() not to include a newline; that newline is already part of the original line read from the input file.
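Since the question asks for the whole line to become CENSORED, a variation might look like the sketch below. The function name is hypothetical; the backup argument is a real fileinput option that keeps the original file around under the given suffix.

```python
import fileinput


def censor_lines_in_place(path, censor, backup_suffix=".bak"):
    """Replace every line containing `censor` with CENSORED, editing in place."""
    with fileinput.input(files=[path], inplace=True, backup=backup_suffix) as lines:
        for line in lines:
            if censor in line:
                print("CENSORED")    # print() adds the newline itself
            else:
                print(line, end="")  # the line already ends with a newline
```

After the call, the untouched original should be available as the same path with ".bak" appended, which is handy while testing.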
Consider:
filename = input("Enter a file name: ")
censor = input("Enter the curse word that you want censored: ")

# Open the file, iterate through the lines and censor them, storing them in lines list
with open(filename) as f:
    lines = [line.replace(censor, 'CENSORED').strip() for line in f]

# If you want to re-write the censored file, re-open it, and write the lines
with open(filename, 'w') as f:
    f.write('\n'.join(lines))
We're using a list comprehension to censor the lines of the file.
If you want to replace the entire line, and not just the word, replace
lines = [line.replace(censor, 'CENSORED').strip() for line in f]
with
lines = ['CENSORED' if censor in line else line.strip() for line in f]
filename = input("Enter a file name: ")
censor = input("Enter the curse word that you want censored: ")
with open(filename) as fi:
    for line in fi:
        if censor in line:
            print("CENSORED")
        else:
            print(line, end="")  # the line already ends with a newline
curse_word = input("Enter the curse word that you want censored: ")
with open('filename.txt', 'r') as data:
    the_lines = data.readlines()
with open('filename.txt', 'w') as data:
    for line_content in the_lines:
        if curse_word in line_content:
            data.write('CENSORED\n')  # keep the replacement on its own line
        else:
            data.write(line_content)
You have only opened the file for reading. Some options:
Read the whole file in, do the replacement, and write it over the original file again.
Read the file line-by-line, process and write the lines to a new file, then delete the old file and rename in the new file.
Use the fileinput module, which does all the work for you.
Here's an example of the last option:
import fileinput
import sys

for line in fileinput.input(inplace=True):
    line = line.replace('bad', 'CENSORED')
    sys.stdout.write(line)
Then run it with the files to edit as arguments:
test.py file1.txt file2.txt file3.txt
Each file will be edited in place.