Python: How to remove all quotations from a text file?

I have a text file that contains quotation marks of the form ' and ". The file looks like this:
"string1", "string2", 'string3', 'string4', ''string5'', etc.
How can I remove all of the quotations ' and " while leaving the rest of the file the way it is now?
I suspect it should be something like this:
with open('input.txt', 'r') as f, open('output.txt', 'w') as fo:
    for line in f:
        fo.write(line.strip())
Where line.strip() somehow strips the strings of the quotation marks. Is anything else required?

You're close. Instead of str.strip(), try str.replace():
with open('input.txt', 'r') as f, open('output.txt', 'w') as fo:
    for line in f:
        fo.write(line.replace('"', '').replace("'", ""))
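If you would rather remove both quote characters in a single pass over each line, str.translate with a deletion table is another option. This is only a minimal sketch; the helper names remove_quotes and remove_quotes_from_file are made up for illustration and do not come from the answer above.

```python
# Deletion table: maps both quote characters to None, so translate() drops them.
QUOTE_TABLE = str.maketrans('', '', '"\'')


def remove_quotes(text):
    """Return text with every ' and " removed (hypothetical helper)."""
    return text.translate(QUOTE_TABLE)


def remove_quotes_from_file(src_path, dst_path):
    """Copy src_path to dst_path line by line, dropping quote characters."""
    with open(src_path, 'r') as f, open(dst_path, 'w') as fo:
        for line in f:
            fo.write(remove_quotes(line))
```

Calling remove_quotes_from_file('input.txt', 'output.txt') would then do the same job as the two chained replace() calls.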

Related

Trying to remove multiple space in txt file using python [duplicate]

So I have this crazy long text file made by my crawler, and for some reason it added some spaces in between the links, like this:
https://example.com/asdf.html (note the spaces)
https://example.com/johndoe.php (again)
I want to get rid of the spaces but keep the newlines. Keep in mind that the text file is 4,000+ lines long. I tried to do it myself but realized that I have no idea how to loop through the lines of a file.
It seems you can't edit a file in place directly, so here is my suggestion:
# first, read all lines from the file
with open('file.txt', 'r') as f:
    lines = f.readlines()
# remove spaces
lines = [line.replace(' ', '') for line in lines]
# finally, write the lines back to the file
with open('file.txt', 'w') as f:
    f.writelines(lines)
You can open the file, read it line by line, and strip the whitespace:
Python 3.x:
with open('filename') as f:
    for line in f:
        print(line.strip())
Python 2.x:
with open('filename') as f:
    for line in f:
        print line.strip()
This removes leading and trailing whitespace from each line and prints it; spaces inside a line are left untouched.
Hope it helps!
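The answers in this thread use strip() and replace(' ', '') almost interchangeably, but they behave differently. A small sketch of the difference, using a made-up sample line (the URL with a space in it is an assumption about what the crawler output looks like):

```python
# Made-up sample line: padded at both ends, with one space inside the URL.
line = "  https://example.com/ asdf.html  \n"

stripped = line.strip()                # only the ends are trimmed
no_spaces = line.replace(' ', '')      # every space removed, newline kept
no_whitespace = ''.join(line.split())  # all whitespace removed, including the newline

print(repr(stripped))
print(repr(no_spaces))
print(repr(no_whitespace))
```

For the question as asked (drop the spaces, keep the line breaks), replace(' ', '') is the variant that fits.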
Read text from file, remove spaces, write text to file:
with open('file.txt', 'r') as f:
    txt = f.read().replace(' ', '')
with open('file.txt', 'w') as f:
    f.write(txt)
In @Leonardo Chirivì's solution, it's unnecessary to create a list to store the file contents when a string is sufficient and more memory efficient. The .replace(' ', '') operation is called only once on the whole string, which is more efficient than iterating through a list and calling replace on each line individually.
To avoid opening the file twice:
with open('file.txt', 'r+') as f:
    txt = f.read().replace(' ', '')
    f.seek(0)
    f.write(txt)
    f.truncate()
It would be more efficient to open the file only once. This requires moving the file pointer back to the start of the file after reading, and truncating whatever content is left over after you write the shorter text back. A drawback of this solution, however, is that it is not as easy to read.
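If holding the whole file in memory or overwriting it while it is still open is a concern, one alternative is to write the cleaned lines to a temporary file and then swap it into place. The sketch below is not from any of the answers; remove_spaces_in_place is a hypothetical helper name, and it assumes the file's directory is writable.

```python
import os
import tempfile


def remove_spaces_in_place(path):
    """Rewrite path without spaces via a temporary file (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename
    # stays on one filesystem.
    with open(path, 'r') as src, tempfile.NamedTemporaryFile(
            'w', dir=directory, delete=False) as tmp:
        for line in src:
            tmp.write(line.replace(' ', ''))
    # Both files are closed here; swap the finished copy into place.
    os.replace(tmp.name, path)
```

The original file is only replaced once the new content has been written out completely, at the cost of briefly needing disk space for both copies.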
I had something similar that I'd been dealing with.
This is what worked for me (note: this converts runs of 2+ whitespace characters into a comma, but below the code block I explain how you can get rid of ALL whitespace):
import re
# read the file
with open('C:\\path\\to\\test_file.txt') as f:
    read_file = f.read()
print(type(read_file))  # to confirm that it's a string
read_file = re.sub(r'\s{2,}', ',', read_file)  # convert runs of 2+ whitespace characters into ','
# write the file
with open('C:\\path\\to\\test_file.txt', 'w') as f:
    f.write(read_file)  # write the variable's contents, not the literal string 'read_file'
This let me send the updated data to a CSV, which suited my needs, but it can work for you as well: instead of converting to a comma (','), convert to an empty string (''), and/or call read_file.replace(' ', '') if you don't want any spaces at all.
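One thing to watch with \s is that it also matches newline characters, so a blank line (two consecutive newlines) counts as a run of 2+ whitespace characters and the surrounding lines get merged. A hedged sketch of a pattern that leaves line breaks alone; the sample text is invented for illustration:

```python
import re

# Invented sample: two links with stray spaces, separated by a blank line.
text = "https://example.com/  asdf.html\n\nhttps://example.com/  johndoe.php\n"

# \s also matches '\n', so the "\n\n" run is replaced too and the lines merge.
merged = re.sub(r'\s{2,}', '', text)

# [^\S\n] means "whitespace other than a newline", so line breaks survive.
kept = re.sub(r'[^\S\n]+', '', text)

print(repr(merged))
print(repr(kept))
```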
Let's not forget to add the \n back so that each line still ends with a newline.
The complete function would be:
def strip_lines(str_path, bl_right=False, bl_left=False):  # function name is illustrative
    with open(str_path, 'r') as file:
        str_lines = file.readlines()
    # remove spaces
    if bl_right:
        str_lines = [line.rstrip() + '\n' for line in str_lines]
    elif bl_left:
        str_lines = [line.lstrip() + '\n' for line in str_lines]
    else:
        str_lines = [line.strip() + '\n' for line in str_lines]
    # write the file out again
    with open(str_path, 'w') as file:
        file.writelines(str_lines)

Cannot save string to file with `\n` characters

The following code produces a file with content test\\nstring, but I need the file to contain test\nstring. I can't figure out a way to replace the \\ symbol either.
s = "test\nstring"
with open('test.txt', 'w') as f:
    f.write(s)
How can I make sure that the file contains only \n instead of \\n?
Use s = "test\\nstring".
I tried the following code and it worked:
s = "test\\nstring"
with open('test.txt', 'w') as f:
    f.write(s)
The test.txt file then contains:
test\nstring
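Besides escaping and raw strings, you can encode the string with an escape codec: 'string_escape' in Python 2, or 'unicode_escape' in Python 3. In Python 3 the result is bytes, so decode it before writing in text mode:
s = "test\nstring".encode('unicode_escape').decode('ascii')
with open('test.txt', 'w') as f:
    f.write(s)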
Raw strings may help:
s = r"test\nstring"
with open('test.txt', 'w') as f:
    f.write(s)
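To check that these approaches really produce the same text, here is a small Python 3 sketch. It assumes the goal is a literal backslash followed by the letter n; the variable names are illustrative only.

```python
escaped = "test\\nstring"   # backslash escaped by hand
raw = r"test\nstring"       # raw string literal
# Codec route: 'unicode_escape' yields bytes in Python 3, hence the decode().
encoded = "test\nstring".encode('unicode_escape').decode('ascii')

# All three hold the same 12 characters: a backslash followed by 'n',
# not a newline character.
print(escaped == raw == encoded)
print(len(raw))
```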

How to read quoted string from File and write it without quotes?

I am trying to write a Python script to convert rows in a file to JSON output, where each line contains a JSON blob.
My code so far is:
with open("/Users/me/tmp/events.txt") as f:
    content = f.readlines()
# strip to remove newlines
lines = [x.strip() for x in content]
i = 1
for line in lines:
    filename = "input" + str(i) + ".json"
    i += 1
    f = open(filename, "w")
    f.write(line)
    f.close()
However, I am running into an issue where if I have an entry in the file that is quoted, for example:
client:"mac"
This will be output as:
"client:""mac"""
Using a second strip on writing to file will give:
client:""mac
But I want to see:
client:"mac"
Is there any way to force Python to read text in the format ' "something" ' without appending extra quotes around it?
Instead of creating an auxiliary list to strip the newline from content, just open the input and output files at the same time. Write to the output file as you iterate through the lines of the input, stripping whatever you deem necessary. Try something like this:
with open('events.txt', 'r') as infile, open('input1.json', 'w') as outfile:
    for line in infile:
        # text mode, so strip() can take a str; it removes quotes only at the ends of the line
        line = line.strip('"')
        outfile.write(line)
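A plain write() does not add quotes on its own. The doubled quotes shown in the question are exactly what CSV-style quoting produces, so one possible explanation (an assumption, since the question does not show it) is that the data passes through a CSV writer or a CSV-aware tool somewhere. A sketch comparing the two, using in-memory buffers instead of real files:

```python
import csv
import io

line = 'client:"mac"'

# Plain write(): the text goes out exactly as it came in.
plain = io.StringIO()
plain.write(line)

# csv.writer: the field contains the quote character, so it is wrapped in
# quotes and every embedded quote is doubled.
quoted = io.StringIO()
csv.writer(quoted).writerow([line])

print(plain.getvalue())
print(quoted.getvalue())
```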

Using multiple re.sub() calls in one file with Python

I have a file containing a large number of random strings. There are certain patterns that I want to remove, so I decided to use regular expressions to find them. So far, this code does exactly what I want it to:
#!/usr/bin/python
import csv
import re
import sys
import pdb
f=open('output.csv', 'w')
with open('retweet.csv', 'rb') as inputfile:
    read=csv.reader(inputfile, delimiter=',')
    for row in read:
        f.write(re.sub(r'#\s\w+', ' ', row[0]))
        f.write("\n")
f.close()
f=open('output2.csv', 'w')
with open('output.csv', 'rb') as inputfile2:
    read2=csv.reader(inputfile2, delimiter='\n')
    for row in read2:
        a= re.sub('[^a-zA-Z0-9]', ' ', row[0])
        b= str.split(a)
        c= "+".join(b)
        f.write("http://www.google.com/webhp#q="+c+"&btnI\n")
f.close()
The problem is that I would like to avoid opening and closing a file for every pattern, as this gets messy if I need to check for more patterns. How can I perform multiple re.sub() calls on the same file and write it out to a new file with all substitutions applied?
Thanks for any help!
Apply all your substitutions in one go on the current line:
with open('retweet.csv', 'rb') as inputfile:
    read=csv.reader(inputfile, delimiter=',')
    for row in read:
        text = row[0]
        text = re.sub(r'#\s\w+', ' ', text)
        text = re.sub(another_expression, another_replacement, text)
        # etc.
        f.write(text + '\n')
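When the number of patterns grows, the chain of re.sub() calls can be kept in a list and applied in a loop. The following is a sketch under assumptions: both patterns are taken from the question, while the names SUBSTITUTIONS and apply_substitutions are invented for illustration.

```python
import re

# (compiled pattern, replacement) pairs, applied in order.
SUBSTITUTIONS = [
    (re.compile(r'#\s\w+'), ' '),        # '#', one whitespace character, then a word
    (re.compile(r'[^a-zA-Z0-9]'), ' '),  # anything that is not a letter or digit
]


def apply_substitutions(text, substitutions=SUBSTITUTIONS):
    """Run every substitution over text, feeding each result into the next."""
    for pattern, replacement in substitutions:
        text = pattern.sub(replacement, text)
    return text


# Example: build the '+'-joined query string used in the question.
query = "+".join(apply_substitutions("hello # tag world!").split())
```

Adding another pattern then means adding one more pair to the list instead of another pass over an intermediate file.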
Note that opening a file with csv.reader(..., delimiter='\n') sounds very much as if you are treating that file as a sequence of lines; you could just loop over the file:
with open('output.csv', 'rb') as inputfile2:
    for line in inputfile2:
