Removing extra space from text file

Removing extra space from text file - python

I am currently keeping high scores in a text file called "score.txt". The program works fine, updating the file with the new high scores as normal, except that every time the program updates the file, there is always one blank line before the first high score, which causes an error when I try to save the scores the next time. The code:
scores_list = []
score = 10
def take_score():
    # Save old scores into list
    f = open("score.txt", "r")
    lines = f.readlines()
    for line in lines:
        scores_list.append(line)
    print scores_list
    f.close()
take_score()
def save_score():
    # Clear file
    f = open("score.txt", "w")
    print >> f, ""
    f.close()
    # Rewrite scores into text files
    w = open("score.txt", "a")
    for i in range(0, len(scores_list)):
        new_string = scores_list[i].replace("\n", "")
        scores_list[i] = int(new_string)
        if score > scores_list[i]:
            scores_list[i] = score
    for p in range(0, len(scores_list)):
        print >> w, str(scores_list[p])
        print repr(str(scores_list[p]))
save_score()
The problem happens in the save_score() function. I have tried the related question "Removing spaces and empty lines from a file Using Python", but it requires opening the file in "r" mode. Is there a way to accomplish the same thing when the file is opened in "a" (append) mode?

You are specifically printing an empty line as soon as you create the file.
print >> f, ""
You then append to it, keeping the empty line.
If you just want to clear the contents every time you run this, get rid of this:
# Clear file
f = open("score.txt", "w")
print >> f, ""
f.close()
And modify the opening to this:
w = open("score.txt", "w")
The 'w' mode already truncates the file, and you were already using it. There's no need to truncate, write an empty line, close, and then append lines. Just truncate and write what you want to write.
That said, you should use the with construct and file methods for working with files:
with open("score.txt", "w") as output: # here's the with construct
    for i in xrange(len(scores_list)):
        # int() can handle leading/trailing whitespace
        scores_list[i] = int(scores_list[i])
        if score > scores_list[i]:
            scores_list[i] = score
    for p in xrange(len(scores_list)):
        output.write(str(scores_list[p]) + '\n') # writing to the file
        print repr(str(scores_list[p]))
You will then not need to explicitly close() the file handle, as with takes care of that automatically and more reliably. Also note that you can pass a single argument to range and it will iterate from 0 (inclusive) up to that argument (exclusive), so I've removed the redundant starting argument, 0. I've also changed range to the more efficient xrange: range would only be worth keeping here for Python 3 compatibility, and since you're using Python 2-style print statements anyway, there isn't much point.
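For reference, a Python 3 version of the same approach (print is a function there and xrange no longer exists) might look like the sketch below. It assumes score.txt holds one integer per line and skips any blank lines left behind by the old code; the helper name update_scores is hypothetical, not part of the answer above.

```python
from pathlib import Path


def update_scores(path: Path, score: int) -> list[int]:
    """Raise every stored score below `score` to `score` and rewrite the file.

    Assumes one integer per line; blank lines are skipped.
    """
    # Read the existing scores; int() tolerates the trailing newline.
    with open(path, "r") as f:
        scores = [int(line) for line in f if line.strip()]

    # Same rule as the loop above: replace any score lower than the new one.
    scores = [max(old, score) for old in scores]

    # "w" truncates the file, so there is no need to write a blank line first.
    with open(path, "w") as f:
        for value in scores:
            f.write(f"{value}\n")
    return scores


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "score.txt"
        demo.write_text("5\n12\n8\n")
        print(update_scores(demo, 10))  # [10, 12, 10]
        print(repr(demo.read_text()))   # '10\n12\n10\n'
```

Because the file is opened once in "w" mode and only the scores are written, the output starts directly with the first score.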

print appends a newline to what you print. In the line
print >> f, ""
You're writing a newline to the file. This newline still exists when you reopen in append mode.
As @Zizouz212 mentions, you don't need to do all this. Just open the file in write mode, which truncates it, then write what you need.
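You can watch print add that newline by printing into an in-memory buffer instead of a real file. This is a small Python 3 illustration, where print("", file=f) plays the role of Python 2's print >> f, ""; io.StringIO simply stands in for score.txt.

```python
import io

# An in-memory text buffer behaves like a file opened in "w" mode.
buf = io.StringIO()
print("", file=buf)          # same idea as Python 2's: print >> f, ""
print(repr(buf.getvalue()))  # '\n' (the "empty" print still wrote a newline)

buf = io.StringIO()
print("", end="", file=buf)  # end="" suppresses the trailing newline
print(repr(buf.getvalue()))  # ''
```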

You're opening a file and clearing it, but then you open the same file again unnecessarily. When you open the file, you also print a newline, even if you don't think so. Here is the offending line:
print >> f, ""
In Python 2, that line really writes this to the file:
f.write("" + "\n")
This is because the print statement adds a newline to the end of whatever it prints. To stop this, you could add a trailing comma to the statement:
print >> f, "",
Or just write directly:
f.write("my data")
However, if you're trying to save a Python data type, and it does not have to be human-readable, you may have luck using pickle. It's really simple to use:
import pickle
def save_score(score_data):
    with open('scores.txt', 'wb') as f:  # pickle data is binary, so use 'wb'
        pickle.dump(score_data, f)
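A minimal save/load pair built on pickle might look like the sketch below. The names save_scores and load_scores and the file name are illustrative. Pickle files are binary, so they are opened with "wb" and "rb", and unpickling data from an untrusted source is unsafe.

```python
import pickle


def save_scores(path, scores):
    """Serialize the scores list to a binary pickle file."""
    with open(path, "wb") as f:
        pickle.dump(scores, f)


def load_scores(path):
    """Load the scores list back; only unpickle files you created yourself."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "scores.pkl")
        save_scores(path, [10, 12, 10])
        print(load_scores(path))  # [10, 12, 10]
```

The trade-off is that the file is no longer human-readable, which is exactly the case the answer has in mind.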

It is not really an answer to the question.
It is my version of your code (not tested). And don't be afraid to rewrite everything ;)
# --- functions ---
def take_score():
    '''read values and convert to int'''
    scores = []
    with open("score.txt", "r") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, such as the one the old code left behind
                scores.append(int(line))
    return scores
def save_score(scores):
    '''save values'''
    with open("score.txt", "w") as f:
        for value in scores:
            f.write(str(value))
            f.write("\n")
def check_scores(scores, min_value):
    results = []
    for value in scores:
        if value < min_value:
            value = min_value
        results.append(value)
    return results
# --- main ---
score = 10
scores_list = take_score()
scores_list = check_scores(scores_list, score)
save_score(scores_list)
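check_scores can also be written as a single list comprehension with max(), which states the "raise every value to at least min_value" rule directly. This is an equivalent sketch of that function, not a change in behavior.

```python
def check_scores(scores, min_value):
    """Return a new list where every value is at least min_value."""
    return [max(value, min_value) for value in scores]


print(check_scores([5, 12, 8], 10))  # [10, 12, 10]
```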

Related

Python Question - How to extract text between {textblock}{/textblock} of a .txt file?

I want to extract the text between {textblock_content} and {/textblock_content}.
With the script below, only the first line of introtext.txt is extracted and written to a newly created text file. I don't know why the script doesn't also extract the other lines of introtext.txt.
f = open("introtext.txt")
r = open("textcontent.txt", "w")
for l in f.readlines():
    if "{textblock_content}" in l:
        pos_text_begin = l.find("{textblock_content}") + 19
        pos_text_end = l.find("{/textblock_content}")
        text = l[pos_text_begin:pos_text_end]
        r.write(text)
f.close()
r.close()
How to solve this problem?

Your code actually works fine, assuming each line contains both the begin and the end tag. But I think this is not quite what you want: it can't read multiple blocks on one line, and it can't read a block that starts and ends on different lines.
First of all, take a look at the file object returned by the open function: its read method gives you the whole text at once. Also take a look at the with statement, which makes working with files easier and safer. To rewrite your code so that it reads everything between {textblock_content} and {/textblock_content}, we could write something like this:
def get_all_tags_content(
    text: str,
    tag_begin: str = "{textblock_content}",
    tag_end: str = "{/textblock_content}"
) -> list[str]:  # list[str] annotations need Python 3.9+
    useful_text = text
    ans = []
    # Heavy loop, needs some optimization:
    # works in O(len(text) ** 2) in the worst case; we can do better
    while tag_begin in useful_text:
        useful_text = useful_text.split(tag_begin, 1)[1]
        if tag_end not in useful_text:
            break
        block_content, useful_text = useful_text.split(tag_end, 1)
        ans.append(block_content)
    return ans
with open("introtext.txt", "r") as f:
    with open("textcontent.txt", "w+") as r:
        r.write(str(get_all_tags_content(f.read())))
Making this function efficient enough for really big files is left to you. This implementation copies the remaining text every time a content block appears, which is unnecessary and slows the program down (imagine millions of lines like {textblock_content}"hello world"{/textblock_content}: for every one of them, the whole rest of the text is copied again). A single loop over the text can avoid the copying. Try to solve it yourself.
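One possible solution to that exercise, sketched below, walks through the text with str.find() and a moving start index, so the remaining text is never copied; only the block contents are sliced out. The name get_all_tags_content_linear is hypothetical. It keeps the same default tags and, like the split-based version, ignores an unclosed final block.

```python
def get_all_tags_content_linear(
    text: str,
    tag_begin: str = "{textblock_content}",
    tag_end: str = "{/textblock_content}",
) -> list[str]:
    """Return the contents of every tag_begin...tag_end block, in order.

    Blocks may span several lines. A begin tag without a matching end tag
    is ignored, as in the split-based version.
    """
    results = []
    pos = 0
    while True:
        start = text.find(tag_begin, pos)
        if start == -1:
            break  # no more opening tags
        start += len(tag_begin)  # content begins right after the tag
        end = text.find(tag_end, start)
        if end == -1:
            break  # unclosed block: stop, like the original
        results.append(text[start:end])
        pos = end + len(tag_end)  # keep searching after the closing tag
    return results


sample = "a{textblock_content}x{/textblock_content}b{textblock_content}y\nz{/textblock_content}"
print(get_all_tags_content_linear(sample))  # ['x', 'y\nz']
```

Each character is scanned a bounded number of times, so the work grows roughly linearly with the size of the text.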

When you call file.readlines(), the file pointer reaches the end of the file. Further calls return an empty list, so if you change your code to something like one of the snippets below, it should work properly:
f = open("introtext.txt")
r = open("textcontent.txt", "w")
f_lines = f.readlines()
for l in f_lines:
    if "{textblock_content}" in l:
        pos_text_begin = l.find("{textblock_content}") + 19
        pos_text_end = l.find("{/textblock_content}")
        text = l[pos_text_begin:pos_text_end]
        r.write(text)
f.close()
r.close()
You can also implement it using the with statement as a context manager, as in the snippet below:
with open("textcontent.txt", "w") as r:
    with open("introtext.txt") as f:
        for line in f:
            if "{textblock_content}" in line:
                pos_text_begin = line.find("{textblock_content}") + 19
                pos_text_end = line.find("{/textblock_content}")
                text = line[pos_text_begin:pos_text_end]
                r.write(text)

How can I print the lines from the text file next to random generated numbers

The text file is "ics2o.txt", and I don't know how to print random numbers next to its lines.
import random
print ("----------------------------------------------------------")
print ("Student Name Student Mark")
print ("----------------------------------------------------------")
f = open("ics2o.txt")
for line in f:
    x = len(f.readlines())
    for i in range (x):
        contents = f.read()
        print(str(contents) + str(random.randint(75,100)))

for line in f:
    x = len(f.readlines())
    for i in range (x):
        contents = f.read()
        print(str(contents) + str(random.randint(75,100)))
The problem is that you are reading the file in at least 3 different ways which causes none of them to work the way you want. In particular, f.readlines() consumes the entire file buffer, so when you next do f.read() there is nothing left to read. Don't mix and match these. Instead, you should use line since you are iterating over the file already:
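for line in f:
    # rstrip("\n") removes the line's own newline so the mark prints on the same line
    print(line.rstrip("\n") + " " + str(random.randint(75,100)))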
The lesson here is don't make things any more complicated than they need to be.

Firstly, rather than typing out print("----..."), use string multiplication: print("-" * 10).
Secondly, always open files with the with statement, which closes the file automatically, even if an error occurs.
Thirdly, the code:
with open("ics2o.txt") as f:
    for i,j in enumerate(f):
        print(i,j)
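Putting those points together, a sketch that pairs each name from ics2o.txt with a random mark from 75 to 100 could look like this. The helper format_rows, the column width of 20, and the optional rng parameter (so a seeded random.Random can be passed in for repeatable output) are assumptions for illustration.

```python
import random


def format_rows(lines, rng=random, width=20):
    """Pair each non-empty line with a random mark from 75 to 100 (inclusive)."""
    rows = []
    for line in lines:
        name = line.rstrip("\n")  # drop the newline so the mark stays on the same row
        if not name:
            continue  # skip blank lines
        rows.append(f"{name:<{width}}{rng.randint(75, 100)}")
    return rows


# Usage with the real file (the with statement closes it automatically):
#     with open("ics2o.txt") as f:
#         for row in format_rows(f):
#             print(row)

# Self-contained demo with a seeded generator for repeatable marks:
for row in format_rows(["Alice\n", "Bob\n"], rng=random.Random(0)):
    print(row)
```

Since a file object yields its lines one at a time, format_rows works the same on an open file as on a list of strings.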

Python: replace string in each line

I'm trying to replace each "zzz" with a number. There is no output at execution and the file content remains the same. Here's what I have:
n = 0
file = open("filename", "a")
for line in file:
    if "zzz" in line:
        line.replace("zzz", n, 1)
        n += 1

There are a few issues here. First, the replace method doesn't alter the string in place but rather creates a new string. Therefore, if you don't assign the new value to something, you're going to lose it.
Second, you're trying to read from the file but you're opening it in append mode. You need to open it with "r" instead of "a".
Third, you can't pass an integer (n) as the second argument to replace. You need to convert it to a string.
Finally, you're not writing the contents back to the file. That's why it's unchanged. I recommend reading all of the data in, altering it, and then re-opening the file in write mode.
n = 0
with open("filename", "r") as f:
    lines = f.readlines()
for k, line in enumerate(lines):
    if "zzz" in line:
        lines[k] = line.replace("zzz", str(n), 1)
        n += 1
with open("filename", "w") as f:
    f.write("".join(lines))
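The code above replaces only the first "zzz" on each line and increments n once per matching line. If every occurrence should get its own number instead, one alternative is re.sub with a replacement function fed by itertools.count. The helper name number_placeholders is hypothetical, and "filename" remains the placeholder path from the question.

```python
import itertools
import re


def number_placeholders(text, placeholder="zzz", start=0):
    """Replace each occurrence of placeholder with start, start + 1, ... in order."""
    counter = itertools.count(start)
    # re.escape makes the placeholder match literally; the lambda ignores the
    # match object and returns the next number as a string.
    return re.sub(re.escape(placeholder), lambda _m: str(next(counter)), text)


print(number_placeholders("a zzz b zzz\nzzz c\n"))
```

To apply it to the file, read the whole text in "r" mode, pass it through number_placeholders, and write the result back in "w" mode, as in the answer.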

Write all lines for each set of a range to new file each time the range changes Python 3.6

I'm trying to find a way of making this process work, pythonically or at all. Basically, I have a really long text file that is split into lines. Every x number of lines there is one that is mainly uppercase, which should roughly be the title of that particular section. Ideally, I'd want the title and everything after it to go into a text file, using the title as the name of the file. This would have to happen 3039 times in this case, as that is how many titles there are.
My process so far is this: I wrote a function that checks whether a line of text is mostly uppercase.
import numpy as np
def mostly_uppercase(text):
    threshold = 0.7
    isupper_bools = [character.isupper() for character in text]
    isupper_ints = [int(val) for val in isupper_bools]
    try:
        upper_percentage = np.mean(isupper_ints)
    except:
        return False
    if upper_percentage >= threshold:
        return True
    else:
        return False
Afterwards, I used a counter to record the index of each headline, and then combined the indices into article slices:
counter = 0
headline_indices = []
for line in page_text:
    if mostly_uppercase(line):
        print(line)
        headline_indices.append(counter)
    counter+=1
headlines_with_articles = []
headline_indices_expanded = [0] + headline_indices + [len(page_text)-1]
for first, second in list(zip(headline_indices_expanded, headline_indices_expanded[1:])):
    article_text = (page_text[first:second])
    headlines_with_articles.append(article_text)
All of that seems to be working fine as far as I can tell. But when I try to print the pieces that I want to files, all I manage to do is print the entire text into all of the txt files.
for i in range(100):
    out_pathname = '/sharedfolder/temp_directory/' + 'new_file_' + str(i) + '.txt'
    with open(out_pathname, 'w') as fo:
        fo.write(articles_filtered[2])
Edit: This got me halfway there. Now, I just need a way of naming each file with the first line.
for i,text in enumerate(articles_filtered):
    open('/sharedfolder/temp_directory' + str(i + 1) + '.txt', 'w').write(str(text))

One conventional way of processing a single input file involves using a Python with statement and a for loop, in the following way. I have also adapted a good answer from someone else for counting uppercase characters, to get the fraction you need.
def mostly_upper(text):
    threshold = 0.7
    if not text:
        return False  # blank lines are never titles (and would divide by zero)
    ## adapted from https://stackoverflow.com/a/18129868/131187
    upper_count = sum(1 for c in text if c.isupper())
    return upper_count/len(text) >= threshold
first = True
out_file = None
with open('some_uppers.txt') as some_uppers:
    for line in some_uppers:
        line = line.rstrip()
        if first or mostly_upper(line):
            first = False
            if out_file: out_file.close()
            out_file = open(line+'.txt', 'w')
        print(line, file=out_file)
if out_file: out_file.close()
In the loop, we read each line, asking whether it's mostly uppercase. If it is we close the file that was being used for the previous collection of lines and open a new file for the next collection, using the contents of the current line as a title.
I allow for the possibility that the first line might not be a title. In this case the code creates a file with the contents of the first line as its name, and proceeds to write everything it finds to that file until it does find a title line.
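If you'd rather test the splitting logic separately from the file handling, the same idea can be written as a pure function that groups lines under their titles. split_sections is a hypothetical helper; it reuses the 0.7 threshold, treats an empty line as not a title, and, like the loop above, starts a section at the first line even if that line is not a title.

```python
def mostly_upper(text, threshold=0.7):
    """True if at least `threshold` of the characters are uppercase."""
    if not text:
        return False  # avoid dividing by zero on blank lines
    return sum(1 for c in text if c.isupper()) / len(text) >= threshold


def split_sections(lines):
    """Group lines into (title, body_lines) pairs; each title line opens a new section."""
    sections = []
    for line in lines:
        line = line.rstrip()
        if not sections or mostly_upper(line):
            sections.append((line, []))
        else:
            sections[-1][1].append(line)
    return sections


text = "intro line\nFIRST TITLE\nbody one\nSECOND TITLE\nbody two\nmore\n"
for title, body in split_sections(text.splitlines()):
    print(title, body)
```

Each (title, body) pair can then be written to its own file named after the title; titles containing characters such as "/" may need cleaning up before being used as file names.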

Read a multielement list, look for an element and print it out in python

I am writing a Python script to write a .tex file, but I have to use some information from another file. That file has the name of a menu on each line, which I need to use. I use split to get a list for each line of my "menu".
For example, I have to write a section using the second element of each list, but after running the script I got nothing. What can I do?
This is roughly what I am doing:
texfile = open(outputtex.tex', 'w')
infile = open(txtfile.txt, 'r')
for line in infile.readlines():
    linesplit = line.split('^')
    for i in range(1,len(infile.readlines())):
        texfile.write('\section{}\n'.format(linesplit[1]))
        texfile.write('\\begin{figure*}[h!]\n')
        texfile.write('\centering\n')
        texfile.write('\includegraphics[scale=0.95]{pg_000%i.pdf}\n' %i)
        texfile.write('\end{figure*}\n')
        texfile.write('\\newpage\n')
texfile.write('\end{document}')
texfile.close()
By the way, in the includegraphics line, I need to increase the number after pg_ from "0001" to "25050". Any clues?
I really appreciate your help.

I don't quite follow your question. But I see several errors in your code. Most importantly:
for line in infile.readlines():
    ...
    ...
    for i in range(1,len(infile.readlines())):
Once you read a file, it's gone. (You can get it back, but in this case there's no point.) That means that the second call to readlines is yielding nothing, so len(infile.readlines()) == 0. Assuming what you've written here really is what you want to do (i.e. write file_len * (file_len - 1) + 1 lines?) then perhaps you should save the file to a list. Also, you didn't put quotes around your filenames, and your indentation is strange. Try this:
with open('txtfile.txt', 'r') as infile: # (with automatically closes infile)
    in_lines = infile.readlines()
in_len = len(in_lines)
texfile = open('outputtex.tex', 'w')
for line in in_lines:
    linesplit = line.split('^')
    for i in range(1, in_len):
        # backslashes are doubled so Python doesn't treat them as escape sequences
        texfile.write('\\section{}\n'.format(linesplit[1]))
        texfile.write('\\begin{figure*}[h!]\n')
        texfile.write('\\centering\n')
        texfile.write('\\includegraphics[scale=0.95]{pg_000%i.pdf}\n' %i)
        texfile.write('\\end{figure*}\n')
        texfile.write('\\newpage\n')
texfile.write('\\end{document}')
texfile.close()
Perhaps you don't actually want nested loops?
infile = open('txtfile.txt', 'r')
texfile = open('outputtex.tex', 'w')
for line_number, line in enumerate(infile):
    linesplit = line.split('^')
    texfile.write('\\section{{{0}}}\n'.format(linesplit[1]))
    texfile.write('\\begin{figure*}[h!]\n')
    texfile.write('\\centering\n')
    texfile.write('\\includegraphics[scale=0.95]{pg_000%i.pdf}\n' % line_number)
    texfile.write('\\end{figure*}\n')
    texfile.write('\\newpage\n')
texfile.write('\\end{document}')
texfile.close()
infile.close()
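On the follow-up about page numbers: a format spec such as {:04d} zero-pads to at least four digits, so 1 becomes 0001 while 25050 stays 25050, whereas the fixed "pg_000" prefix produces pg_00010 for page 10. The sketch below combines that with enumerate; it assumes the PDFs are named pg_0001.pdf onward and that numbering starts at 1, which the question suggests but does not state. The helper name build_tex is hypothetical, and the "^" separator comes from the question's code.

```python
def build_tex(lines, sep="^"):
    """Build one LaTeX figure page per input line, using field 1 as the section title."""
    parts = []
    for page, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(sep)
        parts.append("\\section{%s}\n" % fields[1])
        parts.append("\\begin{figure*}[h!]\n")
        parts.append("\\centering\n")
        # {:04d} pads to at least four digits: 1 -> 0001, 25050 -> 25050.
        # The doubled braces {{ }} produce literal LaTeX braces in str.format.
        parts.append("\\includegraphics[scale=0.95]{{pg_{:04d}.pdf}}\n".format(page))
        parts.append("\\end{figure*}\n")
        parts.append("\\newpage\n")
    parts.append("\\end{document}")
    return "".join(parts)


print(build_tex(["a^Starters^x\n", "b^Mains^y\n"]))
```

The result can be written to outputtex.tex in a single texfile.write call.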
