Delete x-line paragraphs from text file with Python - python

I have a long text file made up of paragraphs of 6 or 7 lines each. I need to write all the seven-line paragraphs to one file and all the six-line paragraphs to another file.
Alternatively, I could delete the 6-line (or the 7-line) paragraphs.
Paragraphs are separated by one blank line (or two blank lines).
Text file example:
Firs Name Last Name
address1
Address2
Note 1
Note 2
Note3
Note 4

First Name LastName
add 1
add 2
Note2
Note3
Note4

etc...
I want to use Python 3 on Windows. Any help is welcome. Thanks!

As a welcome to Stack Overflow, and because I think you have by now searched further for a solution, here is some code.
It checks that each paragraph has no more than 7 and no fewer than 6 lines, and it warns when paragraphs outside that range exist in the source.
You can remove all the prints to get clean code, but with them you can follow the algorithm.
I think there is no bug in it, but don't take that as 100% sure.
It isn't the only way to do it, but I chose an approach that works for all files, big or small: iterating one line at a time. Reading the entire file in one pass is possible, followed by splitting it into a list of lines or processing it with regexes; however, when a file is enormous, reading it all at once consumes a lot of memory.
with open('source.txt') as fsource,\
     open('SIX.txt','w') as six, open('SEVEN.txt','w') as seven:
    buf = []
    cnt = 0
    exceeding7paragraphs = 0
    tinyparagraphs = 0
    line = 'go'
    while line:
        line = fsource.readline()
        cnt += 1
        buf.append(line)
        if len(buf)<6 and line.rstrip('\n\r')=='':
            tinyparagraphs += 1
            print(cnt,repr(line),"this line of paragraph < 6 is void,"+
                  "\nthe treatment of all this paragraph is skipped\n"+
                  '\n# '+str(cnt)+' '+ repr(line)+" skipped line ")
            buf = []
            while line and line.rstrip('\n\r')=='':
                line = fsource.readline()
                cnt += 1
                if line=='':
                    print("line",cnt,"is '' , EOF -> the program will be stopped")
                elif line.rstrip('\n\r')=='':
                    print('#',cnt,repr(line))
                else:
                    buf.append(line)
                    print('!',cnt,repr(line),' put in void buf')
        else:
            print(cnt,repr(line),' put in buf')
        if len(buf)==6:
            line = fsource.readline() # reading a potential seventh line of a paragraph
            cnt += 1
            if line.rstrip('\n\r'): # means the content of the seventh line isn't void
                buf.append(line)
                print(cnt,repr(line),'seventh line put in buf')
                line = fsource.readline()
                cnt += 1
                if line.rstrip('\n\r'): # means the content of the eighth line isn't void
                    exceeding7paragraphs += 1
                    print(cnt,repr(line),"the eighth line isn't void,"+
                          "\nthe treatment of all this paragraph is skipped"+
                          "\neighth line skipped")
                    buf = []
                    while line and line.rstrip('\n\r'):
                        line = fsource.readline()
                        cnt += 1
                        if line=='':
                            print("line",cnt,"is '' , EOF -> the program will be stopped")
                        elif line.rstrip('\n\r')=='':
                            print('\n#',cnt,repr(line))
                        else:
                            print(str(cnt) + ' ' + repr(line)+' skipped line')
                else:
                    if line=='':
                        print(cnt,"line is '' , EOF -> the program will be stopped\n")
                    else: # line.rstrip('\n\r') is ''
                        print(cnt,'eighth line is void',repr(line))
                    seven.write(''.join(buf) + '\n')
                    print(buf,'\n',len(buf),'lines recorded in file SEVEN\n')
                    buf = []
            else:
                print(cnt,repr(line),'seventh line: void')
                six.write(''.join(buf) + '\n')
                print(buf,'\n',len(buf),'lines recorded in file SIX')
                buf = []
                if line=='':
                    print("line",cnt,"is '' , EOF -> the program will be stopped")
                else:
                    print('\nthe line is',cnt, repr(line))
            # skip the blank line(s) that follow a paragraph
            while line and line.rstrip('\n\r')=='':
                line = fsource.readline()
                cnt += 1
                if line=='':
                    print("line",cnt,"is '' , EOF -> the program will be stopped")
                elif line.rstrip('\n\r')=='':
                    print('#',cnt,repr(line))
                else: # line.rstrip('\n\r') != ''
                    buf.append(line)
                    print('!',cnt,repr(line),' put in void buf')
if exceeding7paragraphs>0:
    print('\nWARNING :'+
          '\nThere are '+str(exceeding7paragraphs)+' paragraphs whose number of lines exceeds 7.')
if tinyparagraphs>0:
    print('\nWARNING :'+
          '\nThere are '+str(tinyparagraphs)+' paragraphs whose number of lines is less than 6.')
print('\n====================================================================')
print('File SIX\n')
with open('SIX.txt') as six:
    print(six.read())
print('====================================================================')
print('File SEVEN\n')
with open('SEVEN.txt') as seven:
    print(seven.read())
I also upvoted your question, because the problem is not as easy to solve as it seems, and so as not to leave you with one post and one downvote, which is a demoralizing start. Try to improve your presentation next time, as others said.
EDIT:
Here's simplified code for a text containing only paragraphs of exactly 6 or 7 lines, separated by exactly 1 or 2 blank lines, as stated in the problem's wording:
with open('source2.txt') as fsource,\
     open('SIX.txt','w') as six, open('SEVEN.txt','w') as seven:
    buf = []
    line = fsource.readline()
    # skip leading blank lines (readline() returns '' only at EOF)
    while line and not line.rstrip('\r\n'):
        line = fsource.readline()
    while True:
        buf.append(line) # this line is the first of a paragraph
        print('\n- first line of a paragraph',repr(line))
        for i in range(5):
            buf.append(fsource.readline())
        # at this point, 6 lines of a paragraph have been read
        print('-- buf 6 : ',buf)
        line = fsource.readline()
        print('--- line seventh',repr(line),id(line))
        if line.rstrip('\r\n'):
            buf.append(line)
            seven.write(''.join(buf) + '\n')
            buf = []
            line = fsource.readline()
        else:
            six.write(''.join(buf) + '\n')
            buf = []
        # at this point, line is the empty line after a paragraph or EOF
        print('---- line after',repr(line),id(line))
        line = fsource.readline()
        print('----- second line after',repr(line))
        # at this point, line is an empty line after a paragraph or EOF
        # or the first line of a new paragraph
        if not line: # it is EOF
            break
        if not line.rstrip('\r\n'): # it is a second empty line
            line = fsource.readline()
            # now line is the first of a new paragraph
print('\n====================================================================')
print('File SIX\n')
with open('SIX.txt') as six:
    print(six.read())
print('====================================================================')
print('File SEVEN\n')
with open('SEVEN.txt') as seven:
    print(seven.read())
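The same sorting can also be sketched more compactly in Python 3 by grouping consecutive non-blank lines with `itertools.groupby`, which still consumes the source one line at a time. This is only a sketch under assumptions that are not in the question: lines containing only whitespace are treated as blank, and the helper names `iter_paragraphs`, `split_by_length` and `write_paragraphs` are made up for illustration.

```python
from itertools import groupby


def iter_paragraphs(lines):
    """Yield each paragraph as a list of lines with line endings removed."""
    stripped = (line.rstrip('\r\n') for line in lines)
    # groupby splits the stream into alternating runs of text and blank lines
    for has_text, group in groupby(stripped, key=lambda s: bool(s.strip())):
        if has_text:
            yield list(group)


def split_by_length(lines):
    """Return (six_line, seven_line, other) lists of paragraphs."""
    six, seven, other = [], [], []
    for paragraph in iter_paragraphs(lines):
        if len(paragraph) == 6:
            six.append(paragraph)
        elif len(paragraph) == 7:
            seven.append(paragraph)
        else:
            other.append(paragraph)  # unexpected size: keep it apart
    return six, seven, other


def write_paragraphs(path, paragraphs):
    """Write paragraphs to path, separated by one blank line."""
    with open(path, 'w') as out:
        out.write('\n\n'.join('\n'.join(p) for p in paragraphs))
        if paragraphs:
            out.write('\n')
```

Passing an open file object to `split_by_length` and then calling `write_paragraphs('SIX.txt', six)` and `write_paragraphs('SEVEN.txt', seven)` would reproduce the two output files. The sketch collects the paragraphs in lists for simplicity; for a very large file it would be better to write each paragraph as soon as `iter_paragraphs` yields it.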

Related

Python write blank

I'm trying to make a program which would replace tags in a Markdown file (.md) as follows:
If it's an opening $ tag, replace it with \(; if it's a closing $ tag, replace it with \); copy every other character.
Unfortunately, when I try it, the written file is really strange. Some lines are copied but others aren't. First, the first and last lines of each of my test files weren't copied. Other lines in the middle weren't copied either. Identical text on different lines is not always copied in both places.
Here is my program:
import os

def conv1(path):
    """convert $$ tags to \( \)"""
    file = open(path, mode ='r') # open lesson with $ (.md)
    new = open(path + '.tmp', mode = 'w') # open blank file
    test = 0
    for lines in file:
        line = file.readline()
        i = 0
        length = len(line)
        while i < length:
            if line[i] == '$':
                if test % 2 == 0: # replace opening tag
                    line = line[:i] + '\(' + line [i + 1:]
                elif test % 2 == 1: # replace closing tag
                    line = line[:i] + '\)' + line [i + 1:]
                test +=1
                i += 2
                length += 1
            else :
                i += 1
        new.write(line + '\n')
    file.close()
    new.close()
    os.rename(str(path) + '.tmp', str(path))
    print('Done!')
Do you have any idea how to fix my issue?
Thanks in advance
EloiLmr
These lines cause every other line to be skipped:
for lines in file:
    line = file.readline()
The for loop already reads one line per iteration; calling file.readline() inside it advances the file pointer by one more line. It's enough to iterate over the file:
for line in file:
    ...
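Putting that fix together with the tag replacement, the conversion could be sketched as below. This is an illustration rather than the asker's final code: the names `convert_dollar_tags` and `convert_file` are hypothetical, and the sketch assumes the Markdown file is small enough to read in one go. Note also that lines read from a file keep their trailing '\n', so writing `line + '\n'` as in the question would double every line break.

```python
def convert_dollar_tags(text):
    """Replace alternating '$' delimiters with '\\(' and '\\)'."""
    pieces = []
    opening = True  # the next '$' seen opens a formula
    for ch in text:
        if ch == '$':
            pieces.append('\\(' if opening else '\\)')
            opening = not opening
        else:
            pieces.append(ch)
    return ''.join(pieces)


def convert_file(path):
    """Rewrite the file at path in place with converted delimiters."""
    with open(path) as src:
        converted = convert_dollar_tags(src.read())
    with open(path, 'w') as dst:
        dst.write(converted)
```

Keeping the open/close state in a boolean avoids the index arithmetic of the original loop, and building the result from a list of pieces avoids modifying the string while scanning it.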

How to write Python Shell to an output text file?

I need to write my Python shell output to a text file. Some of it is already written there, but I still need to add the number of lines and the count of numbers in each line to my output text file.
I have tried adding another for loop outside the for loop. I've also tried putting it inside the for loop, but that just got complicated.
Text file list of numbers:
1.0, 1.12, 1.123
1.0,1.12,1.123
1
Code:
import re

index = 0
comma_string = ', '
outfile = "output2.txt"
wp_string = " White Space Detected"
tab_string = " tab detected"
mc_string = " Missing carriage return"
ne_string = " No Error"
baconFile = open(outfile,"wt")
with open("Version2_file.txt", 'r') as f:
    for line in f:
        flag = 0
        carrera = ""
        index = index +1
        print("Line {}: ".format(index))
        baconFile.write("Line {}: ".format(index))
        if " " in line: #checking for whitespace
            carrera = carrera + wp_string + comma_string + carrera
            flag = 1
            a = 1
        if "\t" in line: #checking for tabs return
            carrera = carrera + tab_string + comma_string + carrera
            flag = 1
        if '\n' not in line:
            carrera = carrera + mc_string + ne_string + carrera
            flag = 1
        if flag == 0: #checking if no error is true by setting flag equal to zero
            carrera = ne_string
        print('\t'.join(str(len(g)) for g in re.findall(r'\d+\.?(\d+)?', line )))
        print (carrera)
        baconFile.write('\t'.join(str(len(g)) for g in re.findall(r'\d+\.?(\d+)?', line ) ))
        baconFile.write(carrera + "\n")
with open("Version2_file.txt", 'r') as f:
    content = f.readlines()
    print('Number of Lines: {}'.format(len(content)))
    for i in range(len(content)):
        print('Numbers in Line {}: {}'.format(i+1, len(content[i].split(','))))
        baconFile.write('Number of lines: {}'.format(len(content)))
        baconFile.write('Numbers in Line {}: {}'.format(i+1, len(content[i].split(','))))
baconFile.close()
Expected to write in output file:
Line 1: 1 2 3 Tab detected, whitespace detected
Line 2: 1 2 3 No error
Line 3: 1 Missing carriage return No error
Number of Lines: 3
Numbers in Line 1: 3
Numbers in Line 2: 3
Numbers in Line 3: 1
Actual from output file:
Line 1: 1 3 2White Space Detected, tab detected, White Space Detected,
Line 2: 1 3 2No Error
Line 3: 0Missing carriage returnNo Error
Number of lines: 3Numbers in Line 1: 3Number of lines: 3Numbers in Line 2: 3Numb
You have closed baconFile in the first open block, but do not open it again in the second open block. Additionally, you never write to baconFile in the second open block, which makes sense considering you've not opened it there, but then you can't expect to have written to it. It seems you simply forgot to add some write statements. Perhaps you confused write with print. Add those write statements in and you should be golden.
baconFile = open(outfile,"wt")
with open("Version2_file.txt", 'r') as f:
    for line in f:
        # ... line processing ...
        baconFile.write(...) # line format info here
# baconFile.close() ## <-- move this
with open("Version2_file.txt", 'r') as f:
    content = f.readlines()
    baconFile.write(...) # number of lines info here
    for i in range(len(content)):
        baconFile.write(...) # numbers in each line info here
baconFile.close() # <-- over here
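The summary in the actual output also runs together on one line because, unlike print, write() does not append a newline, and because the line-count write sits inside the loop. A minimal sketch of the summary part is shown below; the helper name `write_summary` is made up for illustration, and an in-memory `io.StringIO` stands in for output2.txt so the example is self-contained.

```python
import io


def write_summary(lines, out):
    """Write the line count and the per-line number counts to out."""
    out.write('Number of Lines: {}\n'.format(len(lines)))
    for i, line in enumerate(lines, start=1):
        count = len(line.split(','))
        out.write('Numbers in Line {}: {}\n'.format(i, count))


# Demonstration with an in-memory file instead of output2.txt
buffer = io.StringIO()
write_summary(['1.0, 1.12, 1.123\n', '1.0,1.12,1.123\n', '1'], buffer)
print(buffer.getvalue(), end='')
```

With a real file, `out` would be the object returned by `open(outfile, 'wt')`, ideally opened in a `with` block so it is closed automatically.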
Here's a useful trick you can use to make print calls send their output to a specified file instead of the screen (i.e. stdout):
from contextlib import contextmanager
import sys

@contextmanager
def redirect_stdout(target_file):
    save_stdout = sys.stdout
    sys.stdout = target_file
    try:
        yield
    finally:
        sys.stdout = save_stdout  # restore stdout even if the body raises

# Sample usage
with open('output2.txt', 'wt') as target_file:
    with redirect_stdout(target_file):
        print('hello world')
        print('testing', (1, 2, 3))
print('done') # Won't be redirected.
Contents of output2.txt file after running the above:
hello world
testing (1, 2, 3)
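On Python 3.4 and later the standard library already provides this context manager as `contextlib.redirect_stdout`, so the hand-written version is not required there. The sketch below uses an in-memory `io.StringIO` in place of the output file, purely so that it is self-contained.

```python
import contextlib
import io

buffer = io.StringIO()  # stands in for an open output file
with contextlib.redirect_stdout(buffer):
    print('hello world')
    print('testing', (1, 2, 3))
print('done')  # not redirected

captured = buffer.getvalue()
```

For a single call, `print(..., file=target_file)` achieves the same effect without any redirection.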

Why is my program not reading the first line of code in the referenced file(fileName)?

def main():
    read()

def read():
    fileName=input("Enter the file you want to count: ")
    infile=open(fileName , "r")
    text=infile.readline()
    count=0
    while text != "":
        text=str(count)
        count+=1
        text=infile.readline()
        print(str(count)+ ": " + text)
    infile.close()

main()
-the referenced .txt file has only two elements
44
33
-the output of this code should look like
1: 44
2: 33
-my output is
1: 33
2:
I'm not sure why the program is not picking up the first line of the referenced .txt file. The line numbers are correct, but 33 should come second, after 44.
The reason is explained in the comments:
def main():
    read()

def read():
    fileName=input("Enter the file you want to count: ")
    infile=open(fileName , "r")
    text=infile.readline() ##Reading the first line here but not printing
    count=0
    while text != "":
        text=str(count)
        count+=1
        text=infile.readline() ##Reading the 2nd line here
        print(str(count)+ ": " + text) ##Printing the 2nd line here, missed the first line
    infile.close()

main()
Modify the program as:
def main():
    read()

def read():
    fileName= input("Enter the file you want to count: ")
    infile = open(fileName , "r")
    text = infile.readline()
    count = 1 # Set count to 1
    while text != "":
        # Print 1st line here; text already ends with a newline
        print(str(count)+ ": " + text, end="")
        count = count + 1 # Increment count to 2
        text = infile.readline() # Read 2nd line
    infile.close() # Close the file

main()
def main():
    read()

def read():
    fileName=input("Enter the file you want to count: ")
    with open(fileName,'r') as f:
        print('\n'.join([' : '.join([str(i+1),v.rstrip()]) for i,v in enumerate(f.readlines())]))

main()
I'm very confused by your read function. You start by reading the first line into text:
text=infile.readline()
Presumably at this point text contains 44.
You then immediately demolish this value, before you've done anything with it, by overwriting it with:
text = str(count)
and then overwrite it again with the next readline, i.e. you read two lines before printing anything at all.
You should print the value of text before you overwrite it with the next readline.
Simply remove that assignment and move the print statement before readline:
while text != "":
    count+=1
    print(str(count)+ ": " + text)
    text=infile.readline()
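The same numbering can also be written without manual readline calls by iterating over the file with `enumerate`. This is a sketch, not the asker's code: the helper name `number_lines` is made up, and an `io.StringIO` stands in for the opened file.

```python
import io


def number_lines(lines):
    """Return 'N: text' strings, numbering from 1."""
    return ['{}: {}'.format(n, line.rstrip('\n'))
            for n, line in enumerate(lines, start=1)]


sample = io.StringIO('44\n33\n')  # stands in for the opened file
numbered = number_lines(sample)
print('\n'.join(numbered))
```

Because the loop never calls readline itself, there is no way to consume a line without also numbering it.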

Script is skipping function

I have no idea what's going on with this code; for some reason it seems to skip the function entirely.
try:
    readHandle = open(fileName, 'r')
except IOError, ioe:
    print "Cannot open file: ", fileName,"\n"
    print "%s" %ioe
    raise
lines = readHandle.readlines()
lineNum = 1
#read file line by line
for line in lines:
    if line.startswith(':'):
        #remove : from line
        bits0 = line.partition(':')
        #remove \n newlines
        bits1 = bits0[2].partition('\n')
        #split in to an array using , as delimiter
        bits2 = bits1[0].split(',')
        DrvrNum = bits2[0]
        DrvrNam = bits2[1]
        # Debug
        if DBUG == 1:
            print "DrvrNum and DrvrNam variable values"
            print DrvrNum, DrvrNam
        crcDrvr(DrvrNum, DrvrNam)
    elif line.startswith('#'):
        #Comment line
        pass
    elif line.startswith('Ss'):
        #Crc line
        pass
    elif line.startswith('Zz'):
        #end of file
        pass
    else:
        print '\nError: line', lineNum , 'is an illegal entry'
        print '\nPlease Check'
        sys,exit(0)
    lineNum = lineNum + 1
This is the function that is being skipped:
def crcDrvr(number,name):
    convNum = int(number,16)
    convNam = ''
    for char in name:
        hexChar = char.encode("hex")
        print hexChar
Can anyone tell me where I've gone wrong to cause my code to skip?
Sample data:
#DrvrDB
#
#
#
Ss1234
:744,Bob Hope
:747,Testy Tester
:777,Extra Guy
:0,dummy
Zz
#Driver#,DriverName
#end of file padding 1
I figured it out: some genius created the function crcDrvr twice, the second time with only a variable declaration, so the call must have been hitting that one.
– Jim
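The effect described in that comment can be reproduced in a few lines: when a module defines the same function name twice, the later `def` silently rebinds the name and the earlier body is never called. The sketch below is written for Python 3 with a hypothetical name `crc_drvr` and a made-up first body; it is not the original script.

```python
def crc_drvr(number, name):
    """Intended implementation: return the hex code of each character."""
    return [format(ord(char), 'x') for char in name]


def crc_drvr(number, name):  # same name again: replaces the function above
    conv_num = int(number, 16)  # only a variable assignment, returns None


result = crc_drvr('744', 'Bob')
print(result)  # None: the second definition is the one that runs
```

Python raises no error or warning for the duplicate definition, which is why the call appears to be "skipped".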

Read special characters from .txt file in python

The goal of this code is to find the frequency of words used in a book.
I am trying to read in the text of a book, but the following line keeps throwing my code off:
precious protégés. No, gentlemen; he'll always show 'em a clean pair
specifically the é character.
I have looked at the following documentation, but I don't quite understand it: https://docs.python.org/3.4/howto/unicode.html
Here's my code:
import string

# Create word dictionary from the comprehensive word list
word_dict = {}
def create_word_dict ():
    # open words.txt and populate dictionary
    word_file = open ("./words.txt", "r")
    for line in word_file:
        line = line.strip()
        word_dict[line] = 1

# Removes punctuation marks from a string
def parseString (st):
    st = st.encode("ascii", "replace")
    new_line = ""
    st = st.strip()
    for ch in st:
        ch = str(ch)
        if (n for n in (1,2,3,4,5,6,7,8,9,0)) in ch or ' ' in ch or ch.isspace() or ch == u'\xe9':
            print (ch)
            new_line += ch
        else:
            new_line += ""
    # now remove all instances of 's or ' at end of line
    new_line = new_line.strip()
    print (new_line)
    if (new_line[-1] == "'"):
        new_line = new_line[:-1]
    new_line.replace("'s", "")
    # Conversion from ASCII codes back to useable text
    message = new_line
    decodedMessage = ""
    for item in message.split():
        decodedMessage += chr(int(item))
    print (decodedMessage)
    return new_line

# Returns a dictionary of words and their frequencies
def getWordFreq (file):
    # Open file for reading the book.txt
    book = open (file, "r")
    # create an empty set for all Capitalized words
    cap_words = set()
    # create a dictionary for words
    book_dict = {}
    total_words = 0
    # remove all punctuation marks other than '[not s]
    for line in book:
        line = line.strip()
        if (len(line) > 0):
            line = parseString (line)
            word_list = line.split()
            # add words to the book dictionary
            for word in word_list:
                total_words += 1
                if (word in book_dict):
                    book_dict[word] = book_dict[word] + 1
                else:
                    book_dict[word] = 1
    print (book_dict)
    # close the file
    book.close()

def main():
    wordFreq1 = getWordFreq ("./Tale.txt")
    print (wordFreq1)

main()
The error that I received is as follows:
Traceback (most recent call last):
  File "Books.py", line 80, in <module>
    main()
  File "Books.py", line 77, in main
    wordFreq1 = getWordFreq ("./Tale.txt")
  File "Books.py", line 60, in getWordFreq
    line = parseString (line)
  File "Books.py", line 36, in parseString
    decodedMessage += chr(int(item))
OverflowError: Python int too large to convert to C long
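When you open a text file in Python without specifying an encoding, the platform's default encoding is used (on Windows this is typically an ANSI code page), so a file saved as UTF-8 will not have its é character decoded correctly. Try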
word_file = open ("./words.txt", "r", encoding='utf-8')
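Once the file is opened with an explicit encoding, the word count itself needs no manual ASCII handling. The following is only a sketch, assuming the book is saved as UTF-8; the function name `word_frequencies` is made up, and it uses `collections.Counter` and a regular expression instead of the question's character loop.

```python
import re
from collections import Counter


def word_frequencies(path, encoding='utf-8'):
    """Count words in a text file decoded with the given encoding."""
    counts = Counter()
    with open(path, encoding=encoding) as book:
        for line in book:
            # In Python 3, \w also matches accented letters such as 'é'
            counts.update(re.findall(r'\w+', line.lower()))
    return counts
```

With this pattern an apostrophe splits a word, so "he'll" is counted as "he" and "ll"; a different pattern would be needed to keep contractions together.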
The best way I could think of is to read each character as an ASCII value into an array, and then take the character value. For example, 97 is the ASCII code for "a", and chr(97) returns "a". Check out some online ASCII tables, which also provide values for special characters.
Try:
def parseString(st):
    st = st.encode("ascii", "replace")
    # rest of code here
The new error you are getting is because you are calling isalpha on an int (i.e. a number).
Try this:
for ch in st:
    ch = str(ch)
    if (n for n in (1,2,3,4,5,6,7,8,9,0) if n in ch) or ' ' in ch or ch.isspace() or ch == u'\xe9':
        print (ch)
