I want to delete text from multiple .docx files using Python.
Let's say the contents of a file are:
This is line 1
This is line 2
This is line 3
This is line 4
I want to delete only the very last line, i.e. "This is line 4".
I've tried several approaches but keep getting errors.
Try 1:
with open(r"FILE_PATH.docx", 'r+', errors='ignore') as fp:
    # read and store all lines in a list
    lines = fp.readlines()
    # move the file pointer to the beginning of the file
    fp.seek(0)
    # truncate the file
    fp.truncate()
    # write back every line except the last one
    # lines[:-1] runs from line 0 to the second-to-last line
    fp.writelines(lines[:-1])
The above code runs without errors, but some data in the .docx file is lost.
You will not get the correct lines from a .docx file using that method, because a .docx file is not like a text file: it is a ZIP archive containing XML files. (Your current method would work on a .txt file.)
Do this and you can see what you are removing:
with open(r"FILE_PATH.docx", 'r+', errors='ignore') as fp:
    # read and store all lines in a list
    lines = fp.readlines()
    print(lines[-1])  # or print(lines) to see all the lines
You are not removing "This is line 4"; you are removing part of the .docx file's underlying data, which is why the file gets damaged.
Although there are ways to read a docx without additional libraries, using something like docx2txt or textract might be easier.
There are other questions on Stack Overflow that address how to read and modify a .docx file; take a look and you will find a way to adapt your code if a .docx is still what you want to work with.
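To see what the "lines" of a .docx actually are, a minimal standard-library sketch can open the ZIP archive and list the text of each paragraph in word/document.xml. It assumes a simple document where each visible line is one `<w:p>` paragraph; `docx_paragraphs` is a hypothetical helper name, not part of any library.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace behind the "w:" prefix in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Return the text of each <w:p> paragraph in a .docx file, in document order."""
    # A .docx is a ZIP archive; the main body text lives in word/document.xml
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # One paragraph's text can be split across several <w:t> runs
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return paragraphs
```

This only reads the document. Deleting the last paragraph means rewriting word/document.xml inside the archive, which is where a dedicated .docx library is safer than editing the XML by hand.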
Thank you for taking the time to read this post. This is literally my first time using Python, so bear with me.
My goal: edit the original text file (Original.txt) so that an "OR" is added between every listed domain (see the target formatting image below). Any help is greatly appreciated.
I have managed to find out how to open and read the .txt file, but I am not sure how to do the formatting part.
[Screenshots: script, original .txt file, target formatting]
You can achieve this in a couple of lines:
with open(my_file) as fd:
    # strip() removes the trailing newline so the result doesn't end with " OR "
    result = fd.read().strip().replace("\n", " OR ")
You could then write this to another file with:
with open(formatted_file, "w") as fd:
    fd.write(result)
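Replacing every "\n" directly can leave a dangling " OR " when the file ends with a newline, and doubled ones when there are blank lines between domains. A minimal sketch that avoids both, assuming one domain per line (`join_with_or` is a hypothetical helper name):

```python
def join_with_or(text):
    """Join the non-blank lines of `text` with " OR "."""
    # splitlines() handles both \n and \r\n; blank lines (including a trailing one) are skipped
    domains = [line.strip() for line in text.splitlines() if line.strip()]
    return " OR ".join(domains)


print(join_with_or("example.com\nexample.org\n\nexample.net\n"))
# example.com OR example.org OR example.net
```

You would call it on the result of `fd.read()` instead of using `replace()`.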
Something you could do is the following:
import re
# This opens the file in read mode
with open('Original.txt', 'r') as file:
    # Read the contents of the file
    contents = file.read()
# Your original file seems to have a line break after each domain, so
# you could replace the line breaks with the word "OR" using a regular expression
# (strip() first so a trailing newline doesn't leave a dangling " OR ")
contents = re.sub(r'\n+', ' OR ', contents.strip())
# Then open the file in write mode
with open('Original.txt', 'w') as file:
    # and finally write the modified contents to the file
    file.write(contents)
A suggestion: you may want to write to a different file first to check that you are happy with the results (or make a copy of Original.txt just in case):
with open('AnotherOriginal.txt', 'w') as file:
    file.write(contents)
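The "make a copy just in case" suggestion can be folded into the script itself. A minimal sketch, assuming the backup should sit next to the original with a `.bak` suffix (`rewrite_with_backup` and the suffix are hypothetical choices):

```python
import re
import shutil


def rewrite_with_backup(path, backup_suffix=".bak"):
    """Back up `path`, then replace runs of newlines with ' OR ' in place."""
    backup_path = path + backup_suffix
    # Keep an untouched copy of the original before overwriting it
    shutil.copyfile(path, backup_path)
    with open(path, "r") as f:
        contents = f.read()
    # strip() first so a trailing newline doesn't leave a dangling " OR "
    contents = re.sub(r"\n+", " OR ", contents.strip())
    with open(path, "w") as f:
        f.write(contents)
    return backup_path
```

For example, `rewrite_with_backup('Original.txt')` rewrites Original.txt and leaves the old contents in Original.txt.bak.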
I am new to Python and am using it for my internship. My goal is to pull specific data from about 100 .ls documents (all in the same folder), write it to another .txt file, and from there import it into Excel. My problem is that I can read all the files, but I cannot figure out how to pull the specific data from each file into a list. From the list, I want to write the data into a .txt file and then import it into Excel.
Is there any way to set readlines() to capture only certain lines?
It's hard to know exactly what you want without an example or sample code/content. What you might do is create a list and append each desired line to it.
result_list = []  # Create an empty list
with open("myfile.txt", "r") as f:
    lines = f.readlines()  # read the lines of the file
    for line in lines:  # loop through the lines
        if "desired_string" in line:
            result_list.append(line)  # if the line contains the string, add it
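Since the question involves about 100 files in one folder, the same filter can be run over all of them. A minimal sketch, assuming the .ls files are plain text and that `"desired_string"` stands in for whatever marks the lines you want. It writes a CSV rather than a .txt file, because Excel opens CSV directly; `collect_matching_lines` and `write_csv` are hypothetical helper names.

```python
import csv
from pathlib import Path


def collect_matching_lines(folder, needle, pattern="*.ls"):
    """Return (file name, line) pairs for every line in the matching files that contains `needle`."""
    rows = []
    for path in sorted(Path(folder).glob(pattern)):
        # errors="replace" keeps one oddly encoded file from stopping the whole run
        with open(path, "r", errors="replace") as f:
            for line in f:  # iterate lazily instead of calling readlines()
                if needle in line:
                    rows.append((path.name, line.rstrip("\n")))
    return rows


def write_csv(rows, out_path):
    """Write the collected rows to a CSV file with a header row."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "line"])
        writer.writerows(rows)
```

Usage would look like `write_csv(collect_matching_lines("my_folder", "desired_string"), "results.csv")`.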
I have the following txt file which is generated by an instrument:
https://www.dropbox.com/s/01rx9fk5e64y4b1/Test.txt?dl=0
Every time I run an experiment, the instrument's software appends a header starting with "//", followed by the acquired data, to the same txt file. Therefore, the txt file above contains many different experiments separated by headers. I need to split the file into separate txt files, each containing only one experiment and including its header.
Would the best strategy be to read the file line by line and create a new txt file every time the program encounters the first line that starts with "//"?
Maybe there is a better way using Pandas?
Any suggestions would be very much appreciated.
I examined your file, and it seems every header is preceded by an empty line, so I propose exploiting that fact as follows:
fileno = 0
inputfile = open('Test.txt', 'r')
outfile = open(f'out{fileno}.txt', 'w')
for line in inputfile:
    if not line.strip():
        # empty line: close the current output file and start the next one
        fileno += 1
        outfile.close()
        outfile = open(f'out{fileno}.txt', 'w')
    else:
        outfile.write(line)
outfile.close()
inputfile.close()
Explanation: the file is processed line by line, so there is no need to load the whole file into memory. When an empty line is encountered, the current output file is closed and a new one with the next number is opened. Disclaimer: I used f-strings, so Python 3.6 or newer is required.
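If the blank lines turn out not to be reliable, the strategy from the question (start a new file at the first "//" line of each header) can be used instead. A minimal sketch, assuming header lines start with "//" at the very beginning of the line, that a header may span several consecutive "//" lines, and that anything before the first header can be ignored. `split_experiments` and `write_experiments` are hypothetical helper names.

```python
def split_experiments(lines, marker="//"):
    """Group lines into experiments; a new group starts at the first line of each header block."""
    experiments = []
    previous_was_header = False
    for line in lines:
        is_header = line.startswith(marker)
        if is_header and not previous_was_header:
            experiments.append([])  # first line of a header block: start a new experiment
        if experiments:  # lines before the first header are skipped
            experiments[-1].append(line)
        previous_was_header = is_header
    return experiments


def write_experiments(input_path, prefix="experiment"):
    """Write each experiment, header included, to its own numbered .txt file."""
    with open(input_path, "r") as f:
        experiments = split_experiments(f)
    for number, block in enumerate(experiments):
        with open(f"{prefix}{number}.txt", "w") as out:
            out.writelines(block)
    return len(experiments)
```

Unlike the blank-line approach, this version keeps everything in memory before writing, which is fine unless the instrument file is very large.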
The code below is meant to find any xls or csv file used in a process. The .log file contains full paths with extensions and definitely contains multiple values with "xls" or "csv". However, Python can't find anything. Any idea? The weird thing is that when I copy the content of the log file, paste it into a new Notepad file and save it as .log, it works...
infile = r"C:\Users\me\Desktop\test.log"
important = []
keep_words = ["xls", "csv"]
with open(infile, 'r') as f:
    for line in f:
        for word in keep_words:
            if word in line:
                important.append(line)
print(important)
I was able to figure it out: it was an encoding issue. The log file is UTF-16 encoded, so it has to be opened with that encoding:
import io
with io.open(infile, encoding='utf16') as f:
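A small self-contained sketch shows why the search fails without the right encoding. The log contents below are made up. Latin-1 stands in for a wrong default encoding because it decodes any byte without raising an error, so the failure is silent, much like in the question. In UTF-16, each ASCII character is stored alongside a NUL byte, so "xls" never appears as a contiguous substring. `find_lines` is a hypothetical helper name. In Python 3, `io.open` is the same function as the built-in `open`.

```python
import os
import tempfile


def find_lines(path, keywords, encoding):
    """Return the lines of `path`, decoded with `encoding`, that contain any keyword."""
    with open(path, "r", encoding=encoding) as f:
        return [line for line in f if any(word in line for word in keywords)]


# Simulate a UTF-16 log file (made-up contents)
tmp_dir = tempfile.mkdtemp()
log_path = os.path.join(tmp_dir, "test.log")
with open(log_path, "w", encoding="utf-16") as f:
    f.write("C:\\data\\report.xls\nC:\\data\\notes.txt\nC:\\data\\table.csv\n")

print(len(find_lines(log_path, ["xls", "csv"], "latin-1")))  # 0: characters are interleaved with NUL bytes
print(len(find_lines(log_path, ["xls", "csv"], "utf-16")))   # 2
```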
You must change the line
for line in f:
to
for line in f.readlines():
You made Python search the opened file object rather than its content split into lines (a list, just like the one the readlines() method returns).
I hope I was able to help (sorry for my bad English).
I have an existing file /tmp/ps/snaps.txt.
It has the following data:
key=default_value
I want the contents of this file to be:
key=default_value,value,value.....valuen
My code for this is (it runs every time the main Python code runs):
with open("/tmp/ps/snaps.txt", "a+") as text_file:
    text_file.write("value")
But the output I get is:
key=default_value,
value,value.....value
Basically, I don't want my values written on the next line.
Is there any solution for this?
The line-terminator at the end of the original file is preventing you from appending on the same line.
You have 3 options:
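1. Remove that line terminator: your code will then work as-is.
2. Open the file in read/write mode ("r+"), seek back past the linefeed, and write from there, adding a linefeed for next time (otherwise the last character(s) will be overwritten on the next run). Plain append mode ("a+") is not suitable here, because on many systems every write in append mode goes to the end of the file regardless of the seek position.
code:
import os
with open(filename, "r+") as text_file:
    # move back over the trailing line terminator
    text_file.seek(os.path.getsize(filename) - len(os.linesep))
    text_file.write("{},\n".format(snapshot_name))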
3. Read the file fully, strip the trailing linefeed (using str.rstrip()), and write back the contents plus the extra value. This is the most robust option if you can afford the memory and read overhead for the existing contents.
code:
with open(filename, "r") as text_file:
    contents = text_file.read().rstrip()
with open(filename, "w") as text_file:
    text_file.write(contents)
    text_file.write("{},".format(snapshot_name))
Option 2 is a hack because it edits a text file in read/write mode, which is not very good practice, but it demonstrates that it can be done.
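Option 3 can be wrapped in a small function that is safe to call on every run. This is a minimal sketch, assuming the file holds a single line and that values should be separated by commas with no trailing comma. Unlike the code above, it writes a newline back at the end, so the file stays a well-formed text file and is stripped again on the next call. `append_value` is a hypothetical helper name.

```python
def append_value(path, value, separator=","):
    """Append `value` to the end of the single line in `path`, keeping it on the same line."""
    with open(path, "r") as f:
        contents = f.read().rstrip("\r\n")  # drop the trailing line terminator
    with open(path, "w") as f:
        # Put the separator before the new value and restore the final newline
        f.write(f"{contents}{separator}{value}\n")
```

Starting from `key=default_value`, two calls with `"a"` and then `"b"` leave `key=default_value,a,b` on one line.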