How to delete 'empty' lines in txt file - python

I think I've tried everything, but I still can't get the result I want. I want to delete the empty lines in a txt file that my other script created. I've tried .isspace(), deleting lines with n spaces, and deleting lines equal to '\n'. None of these worked; can you help me? Here is part of the txt file and my code:
Gmina Wiejska
Urząd Gminy Brzeziny
ul. Sienkiewicza 16a 95-060 Brzeziny
Łącko
Gmina Wiejska
Urząd Gminy Łącko
Łącko 445 33-390 Łącko
Węgliniec
Gmina Miejsko-wiejska
Urząd Gminy i Miasta Węgliniec
ul. Sikorskiego 3 59-940 Węgliniec
code:
delete = ['<td align="center" class="top" colspan="3"><b>',
          '</td>',
          '<br/></b></td>',
          '<br/></b>',
          'None',
          'brak',
          '[]',
          '\n'
          ]
with open('/Users/dominikgrzeskowiak/python/gminy/text/text1.txt','r+') as file:
    for line in file:
        print(a)
        for i in delete:
            line = line.replace(i,'')
            print(i)
        print(line)
        if line != ' ' or line != ' \n' or line != ' ':
            with open('/Users/dominikgrzeskowiak/python/gminy/text/text2.txt','a') as f:
                f.write(line+'\n')

Just check whether the line is non-empty after stripping whitespace with strip():
with open('text1.txt', 'r+', encoding='utf-8') as file, open('text2.txt', 'a', encoding='utf-8') as f:
    for line in file:
        if line.strip():
            f.write(line)
Also, open text2.txt once, not once for every line of text1.txt.
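If the HTML fragments from the question's delete list still need removing, a minimal sketch can do the replacements first and only then test for blankness, opening each file once. The function names clean_lines and clean_file are hypothetical, and the sketch assumes UTF-8 input and that leading and trailing whitespace on kept lines can be discarded:

```python
# Patterns from the question's delete list; '\n' is left out because
# strip() below already takes care of line endings.
DELETE = ['<td align="center" class="top" colspan="3"><b>', '</td>',
          '<br/></b></td>', '<br/></b>', 'None', 'brak', '[]']


def clean_lines(lines, patterns=DELETE):
    """Yield each line with every pattern removed, skipping blank results."""
    for line in lines:
        for pattern in patterns:
            line = line.replace(pattern, '')
        line = line.strip()          # also removes the trailing '\n'
        if line:                     # '' means the line was empty or whitespace-only
            yield line + '\n'


def clean_file(src, dst):
    """Hypothetical helper: open both files once and write the cleaned lines."""
    with open(src, encoding='utf-8') as fin, open(dst, 'w', encoding='utf-8') as fout:
        fout.writelines(clean_lines(fin))
```

Because the check runs after the replacements, a line that contained only "None" or a stray </td> is dropped as well. Opening dst with 'w' rather than the question's 'a' means rerunning the script doesn't append a second copy of the output.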

You could use a regular expression to match whitespace-only lines:
import re
with open('somefile.txt','r') as file:
    txt = file.readlines()
for i in txt:
    if re.fullmatch(r"\s*", i):
        continue
    print(i, end="")
But you can do it in pure Python too:
with open('somefile.txt','r') as file:
    txt = file.readlines()
for i in txt:
    if i.strip() == '':
        continue
    print(i, end='')
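A likely reason the question's filter kept every line: a test of the form line != ' ' or line != ' \n' is true for every string, because no line can equal both values at once. A small sketch contrasting that condition (simplified to two of its comparisons) with the strip() test used in the answers; buggy_keep and fixed_keep are illustrative names only:

```python
def buggy_keep(line):
    # The question's test: no string equals both ' ' and ' \n',
    # so at least one "!=" is always True and nothing is filtered out.
    return line != ' ' or line != ' \n'


def fixed_keep(line):
    # Keep the line only if something other than whitespace remains.
    return line.strip() != ''


samples = ['Łącko\n', '\n', '   \n', ' \n', '']
for s in samples:
    print(repr(s), buggy_keep(s), fixed_keep(s))
```

Replacing or with and would work for the exact strings listed, but strip() covers any mix of spaces, tabs and newlines. The same edge case affects the .isspace() attempt: '   \n'.isspace() is True, but ''.isspace() is False, so not line.isspace() keeps a completely empty string.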

Related

Not printing file contents

I'm trying to read and print the contents of a text file, but nothing shows up:
coffee = open('coffeeInventory.txt' , 'r')
coffee.seek(0)
line = coffee.readline()
while line != '':
    print(line)
coffee.close()
Thank you for any advice.
Try this:
with open('coffeeInventory.txt') as inf:
    for line in inf:
        print(line, end='')
Each line read from the file keeps its trailing newline, so end='' stops print from appending a second one.
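In the question's while loop, readline() is never called again, so if the file has any content, line never changes and the loop never ends (the seek(0) is also unnecessary, since a freshly opened file starts at position 0). A minimal sketch of a corrected readline() loop; the function name print_lines is hypothetical, and it takes an already-open file so it works with any file-like object:

```python
def print_lines(f):
    """Print every line of an open text file using an explicit readline() loop."""
    line = f.readline()
    while line != '':            # readline() returns '' only at end of file
        print(line, end='')      # the line already ends with '\n'
        line = f.readline()      # advance to the next line; the question's loop lacks this


# Usage (file name from the question):
# with open('coffeeInventory.txt') as coffee:
#     print_lines(coffee)
```

A blank line in the file comes back as '\n', not '', so the loop only stops at the real end of the file.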
Alternatively, read all the lines at once with readlines():
file = open('coffeeInventory.txt')
lines = file.readlines()
for line in lines:
    print(line, end='')  # each line already ends with '\n'
file.close()

How can we write a text file from variable using python?

I am working on an NLP project and have extracted text from a PDF using PyPDF2. I then removed the blank lines. The output now appears on the console, but I want to write the same data, which is stored in my variable file, to a text file.
Below is the code that removes the blank lines from a text file:
for line in open('resume1.txt'):
    line = line.rstrip()
    if line != '':
        file=line
        print(file)
Output on Console:
Eclipse,
Visual Studio 2012,
Arduino IDE,
Java
,
HTML,
CSS
2013
Excel
.
Now I want the same data in my text file (resume1.txt). I have tried three methods, but each of them leaves only a single dot in resume1.txt, at the end of the file.
Method 1:
with open("resume1.txt", "w") as out_file:
out_file.write(file)
Method 2:
print(file, file=open("resume1.txt", 'w'))
Method 3:
pathlib.Path('resume1.txt').write_text(file)
Could you please help me populate the text file? Thank you.
First of all, note that you are writing to the same file you read from, which discards its old contents; I don't know whether you want that. Beyond that, the variable file is reassigned on every iteration, so after the loop it holds only the last line ('.'), and that is all that gets written. If you called those methods inside the loop instead, opening with "w" would overwrite whatever you wrote before. So to use these methods, collect all the data and write it just once.
SOLUTIONS
Using method 1:
to_file = []
for line in open('resume1.txt'):
    line = line.rstrip()
    if line != '':
        file = line
        print(file)
        to_file.append(file)
to_save = '\n'.join(to_file)
with open("resume1.txt", "w") as out_file:
    out_file.write(to_save)
Using method 2:
to_file = []
for line in open('resume1.txt'):
    line = line.rstrip()
    if line != '':
        file = line
        print(file)
        to_file.append(file)
to_save = '\n'.join(to_file)
print(to_save, file=open("resume1.txt", 'w'))
Using method 3:
import pathlib
to_file = []
for line in open('resume1.txt'):
    line = line.rstrip()
    if line != '':
        file = line
        print(file)
        to_file.append(file)
to_save = '\n'.join(to_file)
pathlib.Path('resume1.txt').write_text(to_save)
In all three methods I used to_save = '\n'.join(to_file) because I assume you want each line separated by an EOL; if I'm wrong, use ''.join(to_file) for no separator, or ' '.join(to_file) to put all the lines on a single one.
Other method
You can also write to a different file, say 'output.txt':
out_file = open('output.txt', 'w')
for line in open('resume1.txt'):
    line = line.rstrip()
    if line != '':
        file = line
        print(file)
        out_file.write(file)
        out_file.write('\n') # EOL
out_file.close()
Or, which I prefer, use a with statement:
with open('output.txt', 'w') as out_file:
    for line in open('resume1.txt'):
        line = line.rstrip()
        if line != '':
            file = line
            print(file)
            out_file.write(file)
            out_file.write('\n') # EOL
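A compact variant of the same idea, as a sketch: read the whole file first, then write the non-blank lines in a single call. Because reading has finished before writing starts, this also makes overwriting resume1.txt in place safe. The function name remove_blank_lines and its dst parameter are hypothetical, and UTF-8 is assumed:

```python
from pathlib import Path


def remove_blank_lines(src, dst=None, encoding='utf-8'):
    """Drop empty and whitespace-only lines from src and write the rest to dst.

    If dst is None, src is overwritten. The whole file is read (and closed)
    before anything is written, so overwriting in place does not lose data.
    Returns the list of kept lines, without line endings.
    """
    src = Path(src)
    dst = src if dst is None else Path(dst)
    kept = [line.rstrip() for line in src.read_text(encoding=encoding).splitlines()
            if line.strip()]
    # One write call: '\n'.join separates the lines and the final '\n' ends the file.
    dst.write_text('\n'.join(kept) + '\n' if kept else '', encoding=encoding)
    return kept
```

For example, remove_blank_lines('resume1.txt', 'output.txt') leaves resume1.txt untouched, while remove_blank_lines('resume1.txt') rewrites it in place.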
First post on Stack Overflow, so excuse the formatting.
new_line = ""
for line in open('resume1.txt', "r"):
    for char in line:
        if char != " ":
            new_line += char
print(new_line)
with open('resume1.txt', "w") as f:
    f.write(new_line)

Python to read txt files and delete lines that contains same part

I have tons (1000+) of txt files that look like this:
TextTextText('aaa/bbb`ccc' , "ddd.eee");
TextTextText('yyy/iii`ooo' , "rrr.ttt");
TextTextText('aaa/fff`ggg' , "hhh.jjj");
I want to delete all lines that contain the same "aaa" part and leave only one of them (i.e., remove all duplicates).
My code so far:
import os
from collections import Counter
sourcepath = os.listdir('Process_Directory3/')
for file in sourcepath:
    inputfile = 'Process_Directory3/' + file
    outputfile = "Output_Directory/" + file
    lines_seen = set()
    outfile = open(outputfile, "w")
    for line in open(inputfile, "r"):
        print(line)
        cut_line = line.split("'")
        new_line = cut_line[1]
        cut_line1 = new_line.split("/")
        new_line1 = cut_line1[0]
        if new_line1 not in lines_seen:
            outfile.write(new_line1)
            lines_seen.add(new_line1)
outfile.close()
My code doesn't work at all; I don't get any results.
Console report:
  line 13, in <module>
    new_line = cut_line[1]
IndexError: list index out of range
Sorry for my bad writing, it's my first post so far :D
Best Regards
Update:
I added
startPattern = "TextTextText"
if(startPattern in line):
to make sure I target only lines that begin with "TextTextText", but for some reason each .txt file in the destination folder contains only one line: "aaa".
At the end of the day, here is the fully working code:
import os
sourcepath = os.listdir('Process_Directory3/')
for file in sourcepath:
    inputfile = 'Process_Directory3/' + file
    outputfile = "Output_Directory/" + file
    lines_seen = set()
    outfile = open(outputfile, "w")
    for line in open(inputfile, "r"):
        if line.startswith("TextTextText"):
            try:
                cut_line = line.split("'")
                new_line = cut_line[1]
                cut_line1 = new_line.split("/")
                new_line1 = cut_line1[0]
                if new_line1 not in lines_seen:
                    outfile.write(line)
                    lines_seen.add(new_line1)
            except IndexError:  # the line contains no ' character
                pass
        else:
            outfile.write(line)
    outfile.close()
Thanks for the great help, guys!
Use a try/except block in the inner for loop. This keeps your program from being interrupted when it reaches a line that contains no ' character, which is what makes cut_line[1] fail.
Update:
I've tried the code given below and it worked fine for me.
import os

sourcepath = os.listdir('Process_Directory3/')
for file in sourcepath:
    inputfile = 'Process_Directory3/' + file
    outputfile = "Output_Directory/" + file
    lines_seen = set()
    outfile = open(outputfile, "w")
    for line in open(inputfile, "r"):
        try:
            cut_line = line.split("'")
            new_line = cut_line[1]
            cut_line1 = new_line.split("/")
            new_line1 = cut_line1[0]
            if new_line1 not in lines_seen:
                outfile.write(line)  # Replaced new_line1 with line
                lines_seen.add(new_line1)
        except IndexError:  # the line contains no ' character
            pass
    outfile.close()  # This line was having bad indentation
Input file:
TextTextText('aaa/bbb`ccc' , "ddd.eee");
TextTextText('yyy/iii`ooo' , "rrr.ttt");
TextTextText('aaa/fff`ggg' , "hhh.jjj");
TextTextText('WWW/fff`ggg' , "hhh.jjj");
TextTextText('yyy/iii`ooo' , "rrr.ttt");
Output File:
TextTextText('aaa/bbb`ccc' , "ddd.eee");
TextTextText('yyy/iii`ooo' , "rrr.ttt");
TextTextText('WWW/fff`ggg' , "hhh.jjj");
It looks like your file contains a line with no ' in it. In that case line.split("'") produces a list with a single element, for example:
line = "blah blah blah"
cut_line = line.split("'")
print(cut_line) # ['blah blah blah']
so trying to get cut_line[1] raises an error, as there is only cut_line[0]. Since this code is inside a loop, you can avoid the error by skipping to the next iteration with continue when cut_line doesn't have enough elements. Just replace:
cut_line = line.split("'")
new_line = cut_line[1]
by:
cut_line = line.split("'")
if len(cut_line) < 2:
    continue
new_line = cut_line[1]
This ignores every line that contains no '.
I think a regular expression would make this easier. Here is simplified working code using re:
import re
lines = [
    "",
    "dfdsa sadfsadf sa",
    "TextTextText('aaa/bbb`ccc' ,dsafdsafsA ",
    "TextTextText('yyy/iii`ooo' ,SDFSDFSDFSA ",
    "TextTextText('aaa/fff`ggg' ,SDFSADFSDF ",
]
lines_seen = set()
out_lines = []
for line in lines:
    # SEARCH FOR 'xxx/ TEXT in the line -----------------------------------
    re_result = re.findall(r"'[a-z]+\/", line)
    if re_result:
        print(f're_result {re_result[0]}')
        if re_result[0] not in lines_seen:
            print(f'>>> newly found {re_result[0]}')
            lines_seen.add(re_result[0])
            out_lines.append(line)
print('------------')
for line in out_lines:
    print(line)
Result
re_result 'aaa/
>>> newly found 'aaa/
re_result 'yyy/
>>> newly found 'yyy/
re_result 'aaa/
------------
TextTextText('aaa/bbb`ccc' ,dsafdsafsA
TextTextText('yyy/iii`ooo' ,SDFSDFSDFSA
You can experiment with regular expressions at regex101.com.
Try r"'.+/" to match any characters between ' and /, or r"'[a-zA-Z]+/" to match lower- and uppercase letters between ' and /.
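Combining the ideas above (a per-file set of seen keys, a regular expression, and passing non-matching lines through), a sketch might look like this. It assumes the key is the text between TextTextText(' and the first /, that the files are UTF-8 and end in .txt, and that non-matching lines should be copied unchanged, as in the asker's final version; KEY_RE, dedupe_lines and dedupe_directory are hypothetical names:

```python
import re
from pathlib import Path

# Capture the key between the opening ' and the first /, e.g. "aaa" in
# TextTextText('aaa/bbb`ccc' , "ddd.eee");
KEY_RE = re.compile(r"^TextTextText\('([^/']+)/")


def dedupe_lines(lines):
    """Keep the first line for each key; lines without a key pass through."""
    seen = set()
    for line in lines:
        m = KEY_RE.match(line)
        if m is None:
            yield line              # not a TextTextText('.../ line: keep it
        elif m.group(1) not in seen:
            seen.add(m.group(1))
            yield line              # first occurrence of this key


def dedupe_directory(src_dir, dst_dir):
    """Write a deduplicated copy of every .txt file in src_dir into dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob('*.txt'):
        out_path = dst / path.name
        with path.open(encoding='utf-8') as fin, out_path.open('w', encoding='utf-8') as fout:
            fout.writelines(dedupe_lines(fin))   # a fresh "seen" set per file
```

Using match() with a capture group avoids the IndexError entirely: a line without the expected shape simply yields None instead of a short list.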

Open a JS file and edit a line with Python

I'm trying to modify a specific line in a js file using python.
Here's the JS file:
...
hide: [""]
...
Here's my Python code:
with open('./config.js','r') as f:
    lines = f.readlines()
with open('./config.js','w') as f:
    for line in lines:
        line = line.replace('hide', 'something')
        f.write(line)
It works, but it's not what I want to do: I want to write 'something' between the brackets, not replace 'hide'.
I don't know how to do it. Do I have to replace the whole line, or can I just add a word between the brackets?
Thanks
If you want to replace text at this exact line you could just do:
with open('./config.js','r') as f:
    lines = f.readlines()
with open('./config.js','w') as f:
    new_value = 'Something New'
    for line in lines:
        if line.startswith('hide'):
            line = 'hide: ["{}"]\n'.format(new_value)  # keep the newline
        f.write(line)
Or, alternatively, inside the conditional:
if line.startswith('hide'):
    line = line.replace('""', '"Something new"')
Here's a way to replace any value in the brackets of a hide line, whatever leading spacing it has:
lines = '''\
first line
hide: [""]
hide: ["something"]
last line\
'''
new_value = 'new value'
for line in lines.splitlines():
    if line.strip().startswith('hide'):
        line = line[:line.index('[')+2] + new_value + line[line.index(']')-1:]
    print(line)
Output:
first line
hide: ["new value"]
hide: ["new value"]
last line
You can use fileinput and replace the text in place:
import fileinput
import sys

def replaceAll(file,searchExp,replaceExp):
    for line in fileinput.input(file, inplace=1):
        if searchExp in line:
            line = line.replace(searchExp,replaceExp)
        sys.stdout.write(line)

replaceAll("config.js",'hide: [""]','hide: ["something"]')
If hide: [""] is unambiguous, you could simply load the whole file, replace the text, and write it back:
newline = 'Something new'
with open('./config.js','r') as f:
    txt = f.read()
txt = txt.replace('hide: [""]', 'hide: ["' + newline + '"]')
with open('./config.js','w') as f:
    f.write(txt)
As long as "hide" doesn't appear anywhere else in the file, you could just do:
with open('./config.js','r') as f:
    lines = f.readlines()
with open('./config.js','w') as f:
    for line in lines:
        line = line.replace('hide: [""]', 'hide: ["something"]')
        f.write(line)
You can do this using re.sub():
import re
with open('./config.js','r') as f:
    lines = f.readlines()
with open('./config.js','w') as f:
    for line in lines:
        line = re.sub(r'(\[")("\])', r'\1' + 'something' + r'\2', line)
        f.write(line)
It searches with a regular expression that captures one group for the text on the left ((\[")) and one for the text on the right (("\])). The replacement then places these groups on either side of the text you want to insert (in this example, 'something').
Each pair of parentheses creates a group; in the replacement, the first group is referenced as r'\1' and the second as r'\2'.
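The regex above only matches an empty [""]. A sketch that extends the same capture-group idea to any current value and any indentation, assuming the value is a single double-quoted string without escaped quotes and that new_value contains no double quote; HIDE_RE, set_hide_value and update_config are hypothetical names:

```python
import re
from pathlib import Path

# Group 1: indentation up to and including the opening bracket and quote,
#          e.g. '    hide: ["'.
# Group 2: the closing quote and bracket, '"]'.
# The current value (anything without a double quote) sits between them.
HIDE_RE = re.compile(r'^(\s*hide:\s*\[")[^"]*("\])', re.MULTILINE)


def set_hide_value(js_text, new_value):
    """Return js_text with the value inside every hide: ["..."] replaced."""
    # A function as the replacement inserts new_value literally, so any
    # backslashes in it are not treated as group references.
    return HIDE_RE.sub(lambda m: m.group(1) + new_value + m.group(2), js_text)


def update_config(path, new_value):
    """Hypothetical helper: rewrite the file at path in place."""
    p = Path(path)
    p.write_text(set_hide_value(p.read_text(encoding='utf-8'), new_value), encoding='utf-8')
```

Working on the whole file text with re.MULTILINE lets ^ anchor at the start of each line, so the pattern no longer depends on hide being the first character of the line.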

Why is my code printing incorrectly to the text file?

I have this code:
with open("pool2.txt", "r") as f:
    content = f.readlines()
    for line in content:
        line = line.strip().split(' ')
        try:
            line[0] = float(line[0])+24
            line[0] = "%.5f" % line[0]
            line = ' ' + ' '.join(line)
        except:
            pass
with open("pool3.txt", "w") as f:
    f.writelines(content)
It should take lines that look like this:
-0.597976 -6.85293 8.10038
and add 24 to the first number, like so:
23.402024 -6.85293 8.10038
When I print the line inside the loop it is correct, but the text file still contains the original lines.
The original text file can be found here.
When you loop through an iterable like:
for line in content:
    line = ...
line is effectively a copy¹ of the element: assigning a new value to line only rebinds that name, so the change won't affect content.
What can you do? Iterate over the indices instead, so you access the current element directly:
for i in range(len(content)):
    content[i] = ...
¹ See MarkRansom's comment.
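Building on the index-based fix, a sketch that returns a new list instead of mutating content, and that also restores the newline removed by strip() (otherwise every modified line would run into the next one in pool3.txt). The function name shift_first_column and its fmt parameter are hypothetical. Note that the question's "%.5f" produces 23.40202, not the 23.402024 shown, so "%.6f" may be what is actually wanted:

```python
def shift_first_column(lines, offset=24.0, fmt="%.5f"):
    """Return a new list of lines with offset added to each line's first number.

    Lines whose first field is missing or not a number are copied unchanged.
    """
    result = []
    for line in lines:
        fields = line.split()
        try:
            fields[0] = fmt % (float(fields[0]) + offset)
        except (IndexError, ValueError):   # blank line or non-numeric first field
            result.append(line)
            continue
        # Leading space as in the question's code; the trailing '\n' is restored.
        result.append(' ' + ' '.join(fields) + '\n')
    return result


# Usage with the question's file names:
# with open("pool2.txt") as f:
#     content = f.readlines()
# with open("pool3.txt", "w") as f:
#     f.writelines(shift_first_column(content))
```

Catching IndexError and ValueError explicitly, rather than using a bare except, means other mistakes still surface instead of being silently skipped.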
