Convert spaces to new lines in a text file - python

I'm trying to convert all whitespace in a text file to newlines, so that I end up with a list of all the words in the text.
with open('keywords.txt', 'w+') as g:
    replace = string.replace(" ","\n")
    replace.writelines
Sadly, this is not working for me.
I'm open to any tips or ideas. I can't believe I can't get something to work that requires 3-5 lines of code.

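Opening with 'w+' empties your file, you never read in the current contents, and string.replace does not work like that: replace is a method you call on the string you have read. Open the file with 'r+', read it, and write the replaced text back: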
with open('keywords.txt', 'r+') as g:
    s = g.read()
    s = s.replace(" ", "\n")
    g.seek(0)
    g.truncate()
    g.write(s)
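If the goal is simply one word per line, splitting on any run of whitespace avoids the empty lines that a plain space-to-newline replacement leaves behind when words are separated by several spaces. A minimal sketch, assuming the whole text fits in memory; the helper name words_per_line is made up for illustration:

```python
def words_per_line(text):
    """Return text with every whitespace-separated word on its own line."""
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines) and drops empty strings.
    return "\n".join(text.split())


print(words_per_line("alpha  beta\tgamma\ndelta"))
```

The result could then be written back with the same read, seek, truncate, write sequence shown above.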

Related

How to delete empty space from a text file using python?

[Image: the text file]
What I want is the complete text of this file written on a single line, so that it looks like this: [Image: what I want it to be like]
How do I do this using Python? I tried the strip function, the replace function, etc. It just isn't working.
You just need to read the file in, remove the new lines and write it again.
with open("./foo.txt", "r") as f:
    formatted = ""
    for line in f.readlines():
        formatted += line.replace('\n', " ")  # Replace each newline with a space
    # str.replace returns a new string, so the result must be assigned back
    formatted = formatted.replace("  ", " ")  # Replace double spaces with one space
# Write the single line to a text file
with open("./bar.txt", "w") as out:
    out.write(formatted)
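An alternative sketch that collapses newlines and any repeated spaces in one step, assuming it is acceptable to normalize all whitespace (tabs included) to single spaces; to_single_line is a hypothetical helper name:

```python
def to_single_line(text):
    """Join all words in text with single spaces, dropping line breaks."""
    # split() breaks on any whitespace, including newlines, and discards
    # empty pieces, so runs of spaces or blank lines collapse to one space.
    return " ".join(text.split())


sample = "first line\nsecond   line\n\nthird line\n"
print(to_single_line(sample))
```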

How do I remove the "\n" characters when reading my file but deleting one variable then replacing it

I am trying to make a program that reads from a file and deletes one specific line inside of it and then puts all the data stored back to the file separated with a new line. The file uses this format:
Jones|20|20|00
bob|30|19|90
James|40|19|80
So I want to delete (backup contains this and is the line I want to delete)
bob|30|19|90
but the code I am using removes the newline and doesn't put it back. When I try to add \n to it, the file can no longer be read, because this happens (it adds two "\n"s):
Jones|20|20|00
James|40|19|80
I am using this code below:
def deleteccsaver(backup):
    lockaccount =""
    lockaccount = lockaccount.strip("\n")
    with open('accounts_project.txt','r+') as f:
        newline=[]
        for line in f.readlines():
            newline.append(line.replace(backup, lockaccount).strip("\n"))
    with open('accounts_project.txt','w+') as f:
        for line in newline:
            f.writelines(line +"\n")
    f.close()
    resetlogin()
Please help, as I don't know how to add the \n back without it appearing as "\n\n".
Without the "\n" it appears as:
Jones|20|20|00James|40|19|80
Any suggestions?
Here I read the entire file at once; please don't do this if the file is very large. The contents are then split into a list using "\n" as the delimiter (read about Python's split function to learn more).
In the list, backup is replaced with lockaccount, as in your code; these are your variable names, and I hope I have not mixed them up. The list is then joined with a newline between the elements, i.e. between the lines of the original file, and written back to the file, so the result has the same contents as before minus what you wanted to remove.
Note that lockaccount is itself an empty string, so putting it in the list leaves an empty line in your file. If you don't want lockaccount to replace backup in the file, just remove backup from the list with contents.remove(backup) instead of contents[contents.index(backup)] = lockaccount, keeping the rest of the code the same. Hope this explains it better.
def deleteccsaver(backup):
    lockaccount =""
    lockaccount = lockaccount.strip("\n")
    with open('accounts_project.txt','r+') as f:
        contents = f.read().split("\n")
    if backup in contents:
        contents[contents.index(backup)] = lockaccount
    new_contents = "\n".join(contents)
    with open('accounts_project.txt','w+') as f:
        f.write(new_contents)
    resetlogin()
You are printing a newline character after each element in the list. So if you replace a line with the empty string, you will get an empty line.
Try simply skipping the line you want to delete:
if line == backup:
    continue
else:
    lines.append(...)
PS. There is room for improvement in the code above, but I'm on my phone. I will get back with an edit later if nobody gets ahead of me.
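The skip-over idea can be sketched as a complete function. This is only an illustration under assumptions: remove_line is a hypothetical helper, it works on a list of lines rather than on accounts_project.txt directly, and it compares whole lines exactly.

```python
def remove_line(lines, target):
    """Return the lines (without trailing newlines), dropping exact matches of target."""
    kept = []
    for line in lines:
        stripped = line.rstrip("\n")
        if stripped == target:
            continue  # skip the line to delete instead of appending ""
        kept.append(stripped)
    return kept


records = ["Jones|20|20|00\n", "bob|30|19|90\n", "James|40|19|80\n"]
print("\n".join(remove_line(records, "bob|30|19|90")))
```

Because nothing is appended for the deleted line, joining the result with "\n" cannot produce an empty line.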
You can try to add newline = '\n'.join(newline) after your first for loop and then just write it into the accounts_project.txt file without a loop.
The code should then look like:
def deleteccsaver(backup):
    lockaccount =""
    lockaccount = lockaccount.strip("\n")
    with open('accounts_project.txt','r+') as f:
        newline=[]
        for line in f.readlines():
            newline.append(line.replace(backup, lockaccount).strip("\n"))
    newline = '\n'.join(newline)
    with open('accounts_project.txt','w+') as f:
        f.write(newline)
        f.close() # you don't necessarily need it inside a with statement
    resetlogin()
Edit:
Above code still results in
Jones|20|20|00
James|40|19|80
as output.
That's because during the replacement loop an empty string will be appended to newline (like newline: ['Jones|20|20|00','','James|40|19|80']) and newline = '\n'.join(newline) will then result in 'Jones|20|20|00\n\nJames|40|19|80'.
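A small check of that behaviour, using the same three-element list; the variable names are only for illustration:

```python
with_blank = ["Jones|20|20|00", "", "James|40|19|80"]

# The empty element still gets a separator on each side,
# which shows up as two consecutive newlines.
joined = "\n".join(with_blank)
print(repr(joined))

# Dropping empty elements before joining removes the blank line.
without_blank = [line for line in with_blank if line != ""]
print(repr("\n".join(without_blank)))
```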
A possible fix can be to replace:
for line in f.readlines():
    newline.append(line.replace(backup, lockaccount).strip("\n"))
with
for line in f.readlines():
    line = line.strip('\n')
    if line != backup:
        newline.append(line)
def deleteccsaver(backup):
    lockaccount =""
    lockaccount = lockaccount.strip("\n")
    with open('accounts_project.txt','r+') as f:
        contents = f.read().split("\n")
    if backup in contents:
        contents.remove(backup)
    new_contents = "\n".join(contents)
    with open('accounts_project.txt','w+') as f:
        f.write(new_contents)
    resetlogin()

Multiple line file into one string

Hello I'm making a python program that takes in a file. I want this to be set to a single string. My current code is:
with open('myfile.txt') as f:
    title = f.readline().strip();
    content = f.readlines();
The text file (simplified) is:
Title of Document
asdfad
adfadadf
adfadaf
adfadfad
I want to strip the title (which my program does) and then make the rest one string. Right now the output is:
['asdfad\n', 'adfadadf\n', etc...]
and I want:
asdfadadfadadf etc...
I am new to python and I have spent some time trying to figure this out but I can't find a solution that works. Any help would be appreciated!
You can do this:
with open('/tmp/test.txt') as f:
    title = next(f)  # read the title line so it is not part of data
    data = ''.join(line.rstrip() for line in f)
Use list.pop(0) to remove the first line from content (this applies only if content still includes the title line).
Then use str.join(iterable). You'll also need to strip off the newlines.
content.pop(0)
done = "".join([l.strip() for l in content])
print(done)
Another option is to read the entire file, then remove the newlines instead of joining together:
with open('somefile') as fin:
    next(fin, None) # ignore first line
    one_big_string = fin.read().replace('\n', '')
If you want the rest of the file in a single chunk, just call the read() function:
with open('myfile.txt') as f:
    title = f.readline().strip()
    content = f.read()
This will read the file until EOF is encountered.
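Note that read() keeps the newlines inside the returned string. A sketch that separates the title and joins the remaining lines with the newlines removed, using io.StringIO as a stand-in for the real file so it can run anywhere; read_title_and_body is a hypothetical helper name:

```python
import io


def read_title_and_body(f):
    """Return (title, body) where body is all remaining lines joined together."""
    title = f.readline().strip()
    # Strip each remaining line and concatenate without any separator.
    body = "".join(line.strip() for line in f)
    return title, body


sample = io.StringIO("Title of Document\nasdfad\nadfadadf\nadfadaf\nadfadfad\n")
title, body = read_title_and_body(sample)
print(title)
print(body)
```

With a real file, the same function could be called as read_title_and_body(f) inside a with open('myfile.txt') as f: block.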

How can I remove carriage return from a text file with Python?

The things I've googled haven't worked, so I'm turning to experts!
I have some text in a tab-delimited text file that has some sort of carriage return in it (when I open it in Notepad++ and use "show all characters", I see [CR][LF] at the end of the line). I need to remove this carriage return (or whatever it is), but I can't seem to figure it out. Here's a snippet of the text file showing a line with the carriage return:
firstcolumn secondcolumn third fourth fifth sixth seventh
moreoftheseventh 8th 9th 10th 11th 12th 13th
Here's the code I'm trying to use to replace it, but it's not finding the return:
with open(infile, "r") as f:
    for line in f:
        if "\n" in line:
            line = line.replace("\n", " ")
My script just doesn't find the carriage return. Am I doing something wrong or making an incorrect assumption about this carriage return? I could just remove it manually in a text editor, but there are about 5000 records in the text file that may also contain this issue.
Further information:
The goal here is select two columns from the text file, so I split on \t characters and refer to the values as parts of an array. It works on any line without the returns, but fails on the lines with the returns because, for example, there is no element 9 in those lines.
vals = line.split("\t")
print(vals[0] + " " + vals[9])
So, for the line of text above, this code fails because there is no index 9 in that particular array. For lines of text that don't have the [CR][LF], it works as expected.
Depending on the type of file (and the OS it comes from, etc.), your line ending might be '\r', '\n', or '\r\n'. The best way to get rid of them, regardless of which one they are, is to use line.rstrip().
with open(infile, "r") as f:
    for line in f:
        line = line.rstrip() # strip out all trailing whitespace
If you want to get rid of ONLY the carriage returns and not any extra whitespaces that might be at the end, you can supply the optional argument to rstrip:
with open(infile, "r") as f:
    for line in f:
        line = line.rstrip('\r\n') # strip only trailing carriage returns and newlines
Hope this helps
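The difference between the two calls matters for tab-delimited data, where trailing tabs can mark empty columns. A short illustration with a made-up line:

```python
line = "seventh\tvalue \t\r\n"

# No argument: all trailing whitespace goes, including the tab and space.
print(repr(line.rstrip()))

# With '\r\n': only trailing carriage returns and newlines are removed.
print(repr(line.rstrip("\r\n")))
```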
Here's how to remove carriage returns without using a temporary file:
with open(file_name, 'r') as file:
    content = file.read()
with open(file_name, 'w', newline='\n') as file:
    file.write(content)
Python opens files in so-called universal newline mode, so newlines are always \n.
Python is usually built with universal newlines support; supplying 'U'
opens the file as a text file, but lines may be terminated by any of
the following: the Unix end-of-line convention '\n', the Macintosh
convention '\r', or the Windows convention '\r\n'. All of these
external representations are seen as '\n' by the Python program.
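You iterate through the file line by line and replace \n in each line. Because the file is opened in universal newline mode, every [CR][LF] has already been translated to a single \n by the time you see the line, and the result of replace is only assigned to the local variable line, so nothing in the file changes.
You can instead read the whole file with f.read() and then replace \n in it.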
with open(infile, "r") as f:
    content = f.read()
    content = content.replace('\n', ' ')
    #do something with content
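The newline translation can be observed directly in Python 3 by reading the same bytes with and without it. This sketch writes a temporary file so it is self-contained; the file contents are invented for the demonstration:

```python
import os
import tempfile

# Write raw bytes so the file really contains Windows line endings.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"one\r\ntwo\r\n")

with open(path, "r") as f:               # default: '\r\n' is translated to '\n'
    translated = f.read()
with open(path, "r", newline="") as f:   # newline="" returns line endings untranslated
    untranslated = f.read()
os.remove(path)

print(repr(translated))
print(repr(untranslated))
```

Opening with newline="" is one way to see the carriage returns in text mode without switching to binary mode.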
Technically, there is an answer!
with open(filetoread, "rb") as inf:
    with open(filetowrite, "w") as fixed:
        for line in inf:
            fixed.write(line)
The b in open(filetoread, "rb") apparently opens the file in such a way that I can access those line breaks and remove them. This answer actually came from Stack Overflow user Kenneth Reitz off the site.
Thanks everyone!
I've created a code to do it and it works:
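# Raw strings keep the backslashes in the Windows paths from being read as escapes
end1 = r'C:\...\file1.txt'
end2 = r'C:\...\file2.txt'
with open(end1, "rb") as inf:
    with open(end2, "wb") as fixed:
        for line in inf:
            # In binary mode each line is bytes, so the arguments must be bytes too
            line = line.replace(b"\n", b"")
            line = line.replace(b"\r", b"")
            fixed.write(line)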

Delete a specific string (not line) from a text file python

I have a text file with two lines:
<BLAHBLAH>483920349<FOOFOO>
<BLAHBLAH>4493<FOOFOO>
That's the only thing in the text file. Using Python, I want to rewrite the text file so that BLAHBLAH and FOOFOO are removed from each line. It seems like a simple task, but after refreshing my memory on file manipulation I can't seem to find a way to do it.
Help is greatly appreciated :)
Thanks!
If it's a text file as you say, and not HTML/XML/something else, just use replace:
for line in infile.readlines():
    cleaned_line = line.replace("BLAHBLAH","")
    cleaned_line = cleaned_line.replace("FOOFOO","")
and write cleaned_line to an output file.
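# "w+" would truncate the file before it is read, so open with "r+" instead
with open(path_to_file, "r+") as f:
    text = f.read().replace("<BLAHBLAH>","").replace("<FOOFOO>","")
    f.seek(0)
    f.truncate()
    f.write(text)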
Update (saving to another file):
f = open(path_to_input_file, "r")
output = open(path_to_output_file, "w")
output.write(f.read().replace("<BLAHBLAH>","").replace("<FOOFOO>",""))
f.close()
output.close()
Consider the regular expressions module re.
result_text = re.sub('<(.|\n)*?>',replacement_text,source_text)
The pattern matches the strings enclosed in < and >. It is non-greedy, i.e. it accepts a substring of the least possible length. For example, given "<1> text <2> more text", a greedy match would take in "<1> text <2>", but a non-greedy match takes in "<1>" and "<2>" separately.
And of course, your replacement_text would be '' and source_text would be each line from the file.
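Applied to the two lines from the question, that substitution could look like the sketch below. The pattern is the one given above; the list of lines stands in for the file contents:

```python
import re

# Non-greedy pattern from the answer above: each <...> is matched separately.
TAG = re.compile(r"<(.|\n)*?>")

lines = ["<BLAHBLAH>483920349<FOOFOO>", "<BLAHBLAH>4493<FOOFOO>"]
cleaned = [TAG.sub("", line) for line in lines]
print(cleaned)
```

Unlike the replace approach, this removes any angle-bracketed marker, not only BLAHBLAH and FOOFOO, which may or may not be what is wanted.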
