Issue with appending to txt file - python

I am trying to read and write to the same file. Currently the data in 2289newsML.txt exists as normal sentences, but I want to append to the file so it also stores tokenized versions of the same sentences.
I used the code below, but even though it prints out tokenized sentences, it doesn't write them to the file.
from pathlib import Path
from nltk.tokenize import word_tokenize
news_folder = Path("file\\path\\")
news_file = (news_folder / "2289newsML.txt")
f = open(news_file, 'r+')
data = f.readlines()
for line in data:
    words = word_tokenize(line)
    print(words)
    f.writelines(words)
f.close()
Any help will be appreciated.
Thanks :)

from nltk.tokenize import word_tokenize

with open("input.txt") as f1, open("output.txt", "w") as f2:
    # add a trailing newline per line so lines don't run together
    f2.writelines("\n".join(word_tokenize(line)) + "\n" for line in f1)
Using with ensures the file handles will be taken care of, so you do not need f1.close().
Note that this program writes to a different file.
Of course, you can do it this way too:
f = open(news_file)
data = f.readlines()
out_file = open("output.txt", "w")
for line in data:
    words = word_tokenize(line)
    print(words)
    out_file.write('\n'.join(words))
f.close()
out_file.close()
Output.txt will have the tokenized words.

I am trying to read and write to the same file. currently the data
in 2289newsML.txt exists as normal sentences but I want to append the
file...
Because you are opening the file in r+ mode.
'r+' Open for reading and writing. The stream is positioned at the beginning of the file.
If you want to append new text at the end of the file, consider opening it in a+ mode.
Read more about open
Read more about file modes
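For the original goal of appending the tokenized lines to the same file, here is a minimal self-contained sketch; the filename is illustrative, and str.split() stands in for nltk's word_tokenize so the example runs on its own:

```python
# Minimal sketch: read the existing sentences, then append tokenized
# versions to the same file using 'a' mode, which positions the stream
# at the end of the file.
path = "sample.txt"
with open(path, "w") as f:
    f.write("This is a sentence.\n")

with open(path) as f:
    lines = f.readlines()

with open(path, "a") as f:
    for line in lines:
        tokens = line.split()  # swap in word_tokenize(line) when using nltk
        f.write(" ".join(tokens) + "\n")
```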

How to keep lines which contains specific string and remove other lines from .txt file?
Example: I want to keep the line which has word "hey" and remove others.
test.txt file:
first line
second one
heyy yo yo
fourth line
Code:
keeplist = ["hey"]
with open("test.txt") as f:
    for line in f:
        for word in keeplist:
It's hard to remove lines from a file in place. It's usually better to write a temporary file with the desired content and then rename it to the original file name.
import os

keeplist = ["hey"]
with open("test.txt") as f, open("test.txt.tmp", "w") as outf:
    for line in f:
        for word in keeplist:
            if word in line:
                outf.write(line)
                break
os.rename("test.txt.tmp", "test.txt")  # os.replace() also overwrites an existing file on Windows
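The inner keyword loop can also be collapsed with any(). A self-contained sketch of the same temporary-file approach, with the sample data from the question written first:

```python
import os

# Build the sample file from the question.
with open("test.txt", "w") as f:
    f.write("first line\nsecond one\nheyy yo yo\nfourth line\n")

keeplist = ["hey"]
with open("test.txt") as f, open("test.txt.tmp", "w") as outf:
    for line in f:
        if any(word in line for word in keeplist):  # keep matching lines only
            outf.write(line)

os.replace("test.txt.tmp", "test.txt")  # overwrites the target, even on Windows
```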

how to save extracted data to a text file

I have a text file with the following content
this is the first line
this is the second line
this is the third line
this is the fourth line and contains the word fox.
The goal is to write code that reads the file, extracts the line with
the word fox in it, and saves that line to a new text file. Here is the code I have so far:
import os
import re

my_absolute_path = os.path.abspath(os.path.dirname(__file__))
with open('textfile', 'r') as helloFile:
    for line in helloFile:
        if re.findall("fox", line):
            print(line.strip())
This code prints the result of the parsed text, but that's not really what I want it to do. Instead, I would like the code to create a new text file containing that line. Is there a way to accomplish this in Python?
You can do:
with open('textfile', 'r') as in_file, open('outfile', 'a') as out_file:
    for line in in_file:
        if 'fox' in line:
            out_file.write(line)
Here I've opened the outfile in append ('a') mode to accommodate multiple writes, and used the in operator (str.__contains__) to check for substring existence (a regex is overkill here).
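A self-contained run of the same idea, using the sample text from the question (the file names here are illustrative):

```python
# Write the sample input, then copy every line containing 'fox'
# to a separate output file.
with open("textfile.txt", "w") as f:
    f.write("this is the first line\n"
            "this is the fourth line and contains the word fox.\n")

# 'a' mode appends across repeated runs, as in the answer above.
with open("textfile.txt") as in_file, open("outfile.txt", "a") as out_file:
    for line in in_file:
        if 'fox' in line:
            out_file.write(line)
```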

Python Removing Custom Stop-Words from CSV files

Hi, I am new to Python programming and I need help removing custom-made stop-words from multiple files in a directory. I have read almost all the relevant posts online!
I am using Python 2.7
Here are two sample lines of one of my files. I want to keep this format and just remove the stop-words from the rows:
"8806";"Demonstrators [in Chad] demand dissolution of Legis Assembly many hurt as police disperse crowd.";"19"
"44801";"Role that American oil companies played in Iraq's oil-for-food program is coming under greater scrutiny.";"19"
I have a list of stop-words in a dat file called Stopwords.
This is my code:
import io
import os
import os.path
import csv

os.chdir('/home/Documents/filesdirectory')
stopwords = open('/home/StopWords.dat', 'r').read().split('\n')
for i in os.listdir(os.getcwd()):
    name = os.path.splitext(i)[0]
    with open(i, "r") as fin:
        with open(name, "w") as fout:
            writer = csv.writer(fout)
            for w in csv.reader(fin):
                if w not in stopwords:
                    writer.writerow(w)
It does not give me any errors but creates empty files. Any help is very much appreciated.
import os
import os.path

os.chdir('/home/filesdirectory')
stopwords = open('/home/stopwords.dat', 'r').read().split()  # read the list once
for i in os.listdir(os.getcwd()):
    filein = open(i, 'r').readlines()
    fileout = open(i, 'w')
    for line in filein:
        linewords = line.split()
        filteredtext1 = [t for t in linewords if t not in stopwords]
        filteredtext = str(filteredtext1)
        fileout.write(filteredtext + '\n')
    fileout.close()  # close so the last write is flushed to disk
Well, I solved the problem.
This code removes the stopwords (or any list of words you give it) from each line, writes each line to a file with the same filename, and at the end replaces the old file with a new file without stopwords. Here are the steps:
declare the working directory
enter a loop to go over each file
open the file to read and read each line using readlines()
open a file to write
read the stopwords file and split its words
enter a for loop to deal with each line separately
split the line to words
create a list
write the words of the line as items of the list if they are not in the stopwords list
change the list to string
write the string to a file
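One caveat with the code above: str(filteredtext1) writes Python's list repr (brackets, quotes, and commas) into the file. The steps can be sketched so the output stays plain text by joining the kept words with spaces instead; the sample data and stopword list here are illustrative:

```python
# Self-contained sketch: filter stopwords per line, writing plain text.
stopwords = {"the", "of", "in"}  # stand-in for the stopwords.dat contents

with open("doc.txt", "w") as f:
    f.write("Role of oil companies in Iraq\n")

with open("doc.txt") as fin:
    lines = fin.readlines()

with open("doc.txt", "w") as fout:  # with-blocks flush and close for us
    for line in lines:
        kept = [w for w in line.split() if w not in stopwords]
        fout.write(" ".join(kept) + "\n")
```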

How to add a removed word back into the original text file

I'm quite new to Python. With help from the Stack Overflow community, I have managed to do part of my task. My program removes a random word from a small text file, assigns it to a variable, and puts it in another text file.
However, at the end of my program, I need to put that random word back into the text file so someone can use the program multiple times.
The words in the text file are in no specific order, but each word is, and needs to be, on a separate line.
This is the program which removes the random word from the text file.
import random  # needed for random.choice

with open("words.txt") as f:  # Open the text file
    wordlist = [x.rstrip() for x in f]

replaced_word = random.choice(wordlist)
newwordlist = [word for word in wordlist if word != replaced_word]

with open("words.txt", 'w') as f:  # Open file for writing
    f.write('\n'.join(newwordlist))
If I have missed out any vital information which is needed I'm happy to provide that :)
Why not just copy the text file at the start of your program? Perform what you have so far on your copy so you will always leave the original file unaltered.
import random
import shutil

shutil.copyfile("words.txt", "newwords.txt")

with open("newwords.txt") as f:  # Open the text file
    wordlist = [x.rstrip() for x in f]

replaced_word = random.choice(wordlist)
newwordlist = [word for word in wordlist if word != replaced_word]

with open("newwords.txt", 'w') as f:  # Open file for writing
    f.write('\n'.join(newwordlist))
You are replacing your words.txt file, therefore losing all the words. If you just create a new file for the random word, you don't need to rewrite your original file. Something like:
...
with open("words_random.txt", 'w') as f:
    f.write(replaced_word)
You will have a new text file with only the random word.
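To literally put the removed word back at the end of the program, appending it on its own line is enough, since the words are in no particular order. A minimal self-contained sketch (the sample words are illustrative):

```python
import random

# Set up a sample words file, one word per line.
with open("words.txt", "w") as f:
    f.write("alpha\nbeta\ngamma")

with open("words.txt") as f:
    wordlist = [x.rstrip() for x in f]

replaced_word = random.choice(wordlist)
newwordlist = [word for word in wordlist if word != replaced_word]

with open("words.txt", "w") as f:
    f.write("\n".join(newwordlist))

# ... use replaced_word in the rest of the program ...

with open("words.txt", "a") as f:  # 'a' appends after the last word
    f.write("\n" + replaced_word)
```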

How to save the output into a new txt file?

I use this code to split an unstructured text file to its tokens and output each token in one line:
with open("C:\\...\\...\\...\\record-13.txt") as f:
    lines = f.readlines()
    for line in lines:
        words = line.split()
        for word in words:
            print(word)
Now I want to save the output into a new text file instead of printing it, I modify the code to this:
with open("C:\\...\\...\\...\\record-13.txt") as f:
    lines = f.readlines()
    for line in lines:
        words = line.split()
        for word in words:
            file = open("tokens.txt", "w")
            file.write(word)
            file.close()
but it doesn't work. Would you please tell me what's wrong with that?
You are opening the file for each token, and because you are opening with mode 'w' the file is truncated. You can open with mode 'a' to append to the file, but that would be very inefficient.
A better way is to open the output file at the very start and let the context manager close it for you. There's also no need to read the entire file into memory at the start.
with open("in.txt") as in_file, open("tokens.txt", "w") as out_file:
    for line in in_file:
        words = line.split()
        for word in words:
            out_file.write(word)
            out_file.write("\n")
I suspect you want each word to be on a different line, so make sure you also write a new line character.
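A self-contained run of the same pattern (the sample file names are illustrative):

```python
# Create a small input, then write one token per line to tokens.txt.
with open("in.txt", "w") as f:
    f.write("one two\nthree four\n")

with open("in.txt") as in_file, open("tokens.txt", "w") as out_file:
    for line in in_file:
        for word in line.split():
            out_file.write(word + "\n")  # the newline gives one token per line
```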
