Search and replace lines in csv with double quotes using Python - python

I need to process some .csv files. Some of them have field entries of 1 double quote (") or possibly several mixed in with other text. I need to escape them all. So far I'm doing this:
def process_file():
    input_path = 'input.txt'
    output_path = 'output.txt'
    with open(input_path) as input_file, open(output_path, 'w+') as output_file:
        for line in input_file:
            newline = line.replace('"', '""""')
            output_file.write(newline)
How can I make sure that the replacement only applies to a single, isolated double quote and does not touch runs such as "" or """", for example?
I'd like to use python instead of any command line solution. Also, these files are very large, which is why I'm looping through lines instead of loading the whole thing into memory.

Thanks to @mkrieger1 and this question, I was able to put together this solution:
import re
def process_file():
    input_path = 'input.txt'
    output_path = 'output.txt'
    with open(input_path) as input_file, open(output_path, 'w+') as output_file:
        for line in input_file:
            # Match a double quote that has no double quote directly before or after it
            newline = re.sub(r'(?<!")"(?!")', '""""', line)
            output_file.write(newline)
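To see what the lookbehind/lookahead pair in that pattern does, here is a small sketch that runs it on in-memory strings instead of a file. The helper name `escape_lone_quotes` is made up for illustration; the pattern and the four-quote replacement are taken from the solution above.

```python
import re

# A double quote that is neither preceded nor followed by another double quote.
LONE_QUOTE = re.compile(r'(?<!")"(?!")')

def escape_lone_quotes(line):
    """Replace each isolated double quote; leave runs of quotes untouched."""
    return LONE_QUOTE.sub('""""', line)

print(escape_lone_quotes('a"b'))    # the isolated quote is replaced
print(escape_lone_quotes('a""b'))   # the run of two quotes is left alone
```

Because both assertions are zero-width, the pattern consumes only the quote itself, so neighbouring characters are never altered.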

You can use a regular expression:
import re
# Note: '^"$' only matches a line that consists of a single double quote
newline = re.sub(r'^"$', '"""', line)

Related

Python Regex to find CRLF

I'm trying to write a regex that will find any CRLF in python.
I am able to open the file and use f.newlines to determine which newline convention it uses, CRLF or LF. My numerous regex attempts, however, have failed:
import re
with open('test.txt', 'rU') as f:
    text = f.read()
    print repr(f.newlines)
regex = re.compile(r"[^\r\n]+", re.MULTILINE)
print(regex.match(text))
I've done numerous iterations on the regex, and in every case it will either detect \n as \r\n or not work at all.
You could try using the re library to search for the \r & \n patterns.
import re
with open("test.txt", "rU") as f:
    for line in f:
        if re.search(r"\r\n", line):
            print("Found CRLF")
            regex = re.compile(r"\r\n")
            line = regex.sub("\n", line)
        if re.search(r"\r", line):
            print("Found CR")
            regex = re.compile(r"\r")
            line = regex.sub("\n", line)
        if re.search(r"\n", line):
            print("Found LF")
            regex = re.compile(r"\n")
            line = regex.sub("\n", line)
        print(line)
Assuming your test.txt file looks something like this:
This is a test file
with a line break
at the end of the file.
As I mentioned in a comment, you're opening the file with universal newlines, which means that Python will automatically perform newline conversion when reading from or writing to the file. Your program therefore will not see CR-LF sequences; they will be converted to just LF.
Generally, if you want to portably observe all bytes from a file unchanged, then you must open the file in binary mode:
In Python 2:
from __future__ import print_function
import re
with open('test.txt', 'rb') as f:
    text = f.read()
regex = re.compile(r"[^\r\n]+", re.MULTILINE)
print(regex.match(text))
In Python 3:
import re
with open('test.txt', 'rb') as f:
    text = f.read()
regex = re.compile(rb"[^\r\n]+", re.MULTILINE)
print(regex.match(text))
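Once the file is read as bytes, the different line endings can be told apart with bytes patterns. The following is only a sketch for Python 3; the function name `count_line_endings` and the sample data are invented for illustration.

```python
import re

def count_line_endings(data):
    """Count CRLF pairs, bare CRs and bare LFs in a bytes object."""
    return {
        "CRLF": len(re.findall(rb"\r\n", data)),
        "CR": len(re.findall(rb"\r(?!\n)", data)),    # CR not followed by LF
        "LF": len(re.findall(rb"(?<!\r)\n", data)),   # LF not preceded by CR
    }

# In practice `data` would come from open('test.txt', 'rb').read()
sample = b"a\r\nb\nc\rd\r\n"
print(count_line_endings(sample))
```

The lookahead and lookbehind keep the two halves of a CRLF pair from also being counted as a bare CR or a bare LF.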

Write() starts halfway through

I am new to Python and I really don't understand why this is happening: when I run my code, lower() is only applied to half (or less) of the text file. How can I fix this?
import glob, os, string, re
list_of_files = glob.glob("/Users/louis/Downloads/assignment/data2/**/*.txt")
for file_name in list_of_files:
    f = open(file_name, 'r+')
    for line in f:
        line = line.lower()
        f.write(line)
The problem is most likely that you are reading and writing through the same file handle at the same time. Read the whole content first, then return to the start of the file and write over the original content. Try this:
for file_name in list_of_files:
    with open(file_name, 'r+') as f:
        content = f.read().lower()
        f.seek(0, 0) # returns to the start of the file
        f.write(content)
        f.truncate() # drops leftover bytes if the new content is shorter

How to print ASCII white space from file

I tried searching on Google, but it was no use. I use PyCharm and I want to print the lines of a file with the ASCII whitespace characters shown as "\n", "\t", "\r", etc. All I get is the normal string, just as I would see it in any text editor, but I want to see those characters too, so that I know how to reproduce that text file for some personal projects.
I do not want to write them out by hand, so I want to show them on the console and just copy-paste them.
Thanks in advance.
# Local variable
fileName = r"C:\Users\...\textFile.txt"
# Create .mcmeta file
def mcmetaMaker(fileName=fileName):
    try:
        with open(file=fileName, mode="r", encoding="utf-8") as f:
            line = f.readline()
            while line:
                print("\b{}".format(line))
                line = f.readline()
    finally:
        f.close()
# Local variable
fileName = r"C:\Users\...\textFile.txt"
# Create .mcmeta file
def mcmetaMaker(fileName):
    try:
        with open(fileName, mode='rb') as f:
            line = f.readline()
            while line:
                betterline = str(line)
                print(betterline[2:len(betterline)-1])
                line = f.readline()
    finally:
        f.close()
This worked for me. The 'b' in 'rb' stands for binary mode, which doesn't change the original bytes of the string.
print(line) produces b'asdf\r\n' so I converted it to string and sliced it. Hope it doesn't run too slow this way.
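As an alternative to slicing the string form of a bytes object, `repr()` can be applied to each text line. This is only a sketch: the names `show_whitespace` and `print_visible_lines` are made up, and it assumes the file is UTF-8. Passing `newline=""` to `open` keeps "\r\n" and "\r" as they are stored instead of translating them to "\n".

```python
def show_whitespace(line):
    """Return the line with control characters written as escape sequences."""
    return repr(line)

def print_visible_lines(path):
    # newline="" disables newline translation, so "\r\n" stays visible
    with open(path, "r", encoding="utf-8", newline="") as f:
        for line in f:
            print(show_whitespace(line))

print(show_whitespace("asdf\t1\r\n"))
```

The output keeps the surrounding quotes that `repr()` adds, which also makes trailing spaces easy to spot.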

Replacing a line in an already opened file python [duplicate]

I want to loop over the contents of a text file and do a search and replace on some lines and write the result back to the file. I could first load the whole file in memory and then write it back, but that probably is not the best way to do it.
What is the best way to do this, within the following code?
f = open(file)
for line in f:
    if 'foo' in line:
        newline = line.replace('foo', 'bar')
        # how to write this newline back to the file
The shortest way would probably be to use the fileinput module. For example, the following adds line numbers to a file, in-place:
import fileinput
for line in fileinput.input("test.txt", inplace=True):
    print('{} {}'.format(fileinput.filelineno(), line), end='') # for Python 3
    # print "%d: %s" % (fileinput.filelineno(), line), # for Python 2
What happens here is:
The original file is moved to a backup file
The standard output is redirected to the original file within the loop
Thus any print statements write back into the original file
fileinput has more bells and whistles. For example, it can be used to automatically operate on all files in sys.argv[1:], without your having to iterate over them explicitly. Starting with Python 3.2 it also provides a convenient context manager for use in a with statement.
While fileinput is great for throwaway scripts, I would be wary of using it in real code because admittedly it's not very readable or familiar. In real (production) code it's worthwhile to spend just a few more lines of code to make the process explicit and thus make the code readable.
There are two options:
The file is not overly large, and you can just read it wholly to memory. Then close the file, reopen it in writing mode and write the modified contents back.
The file is too large to be stored in memory; you can move it over to a temporary file and open that, reading it line by line, writing back into the original file. Note that this requires twice the storage.
I guess something like this should do it. It basically writes the content to a new file and replaces the old file with the new file:
from tempfile import mkstemp
from shutil import move, copymode
from os import fdopen, remove
def replace(file_path, pattern, subst):
    #Create temp file
    fh, abs_path = mkstemp()
    with fdopen(fh,'w') as new_file:
        with open(file_path) as old_file:
            for line in old_file:
                new_file.write(line.replace(pattern, subst))
    #Copy the file permissions from the old file to the new file
    copymode(file_path, abs_path)
    #Remove original file
    remove(file_path)
    #Move new file
    move(abs_path, file_path)
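A variation on the temporary-file approach, offered only as a sketch: create the temporary file in the same directory as the original and swap it in with `os.replace`, so there is no separate remove-then-move step. The name `replace_in_file` is an assumption, and the default text encoding is used just as in the code above.

```python
import os
import shutil
import tempfile

def replace_in_file(file_path, pattern, subst):
    """Rewrite file_path line by line, swapping pattern for subst."""
    directory = os.path.dirname(os.path.abspath(file_path))
    # Keep the temp file next to the original so os.replace stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as new_file, open(file_path) as old_file:
            for line in old_file:
                new_file.write(line.replace(pattern, subst))
        # Carry the permission bits over before swapping the files
        shutil.copymode(file_path, tmp_path)
        os.replace(tmp_path, file_path)
    except BaseException:
        # Clean up the temp file if anything failed before the swap
        os.remove(tmp_path)
        raise
```

Only one line is held in memory at a time, so this also suits files that are too large to read whole.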
Here's another example that was tested, and will match search & replace patterns:
import fileinput
import sys
def replaceAll(file,searchExp,replaceExp):
    for line in fileinput.input(file, inplace=1):
        if searchExp in line:
            line = line.replace(searchExp,replaceExp)
        sys.stdout.write(line)
Example use:
replaceAll("/fooBar.txt","Hello\sWorld!$","Goodbye\sWorld.")
This should work: (inplace editing)
import fileinput
# Does a list of files, and
# redirects STDOUT to the file in question
for line in fileinput.input(files, inplace = 1):
    print line.replace("foo", "bar"),
Based on the answer by Thomas Watnedal.
However, this does not answer the line-to-line part of the original question exactly. The function can still replace on a line-to-line basis
This implementation replaces the file contents without using temporary files, as a consequence file permissions remain unchanged.
Also re.sub instead of replace, allows regex replacement instead of plain text replacement only.
Reading the file as a single string instead of line by line allows for multiline match and replacement.
import re
def replace(file, pattern, subst):
    # Read contents from file as a single string
    file_handle = open(file, 'r')
    file_string = file_handle.read()
    file_handle.close()
    # Use RE package to allow for replacement (also allowing for (multiline) REGEX)
    file_string = (re.sub(pattern, subst, file_string))
    # Write contents to file.
    # Using mode 'w' truncates the file.
    file_handle = open(file, 'w')
    file_handle.write(file_string)
    file_handle.close()
As lassevk suggests, write out the new file as you go, here is some example code:
fin = open("a.txt")
fout = open("b.txt", "wt")
for line in fin:
    fout.write( line.replace('foo', 'bar') )
fin.close()
fout.close()
If you're wanting a generic function that replaces any text with some other text, this is likely the best way to go, particularly if you're a fan of regex's:
import re
def replace( filePath, text, subs, flags=0 ):
    with open( filePath, "r+" ) as file:
        fileContents = file.read()
        textPattern = re.compile( re.escape( text ), flags )
        fileContents = textPattern.sub( subs, fileContents )
        file.seek( 0 )
        file.truncate()
        file.write( fileContents )
A more pythonic way would be to use context managers like the code below:
from tempfile import mkstemp
from shutil import move
from os import remove
def replace(source_file_path, pattern, substring):
    fh, target_file_path = mkstemp()
    with open(target_file_path, 'w') as target_file:
        with open(source_file_path, 'r') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    remove(source_file_path)
    move(target_file_path, source_file_path)
fileinput is quite straightforward as mentioned on previous answers:
import fileinput
def replace_in_file(file_path, search_text, new_text):
    with fileinput.input(file_path, inplace=True) as file:
        for line in file:
            new_line = line.replace(search_text, new_text)
            print(new_line, end='')
Explanation:
fileinput can accept multiple files, but I prefer to close each single file as soon as it is being processed. So placed single file_path in with statement.
print statement does not print anything when inplace=True, because STDOUT is being forwarded to the original file.
end='' in print statement is to eliminate intermediate blank new lines.
You can use it as follows:
file_path = '/path/to/my/file'
replace_in_file(file_path, 'old-text', 'new-text')
Create a new file, copy lines from the old to the new, and do the replacing before you write the lines to the new file.
Expanding on @Kiran's answer, which I agree is more succinct and Pythonic, this adds codecs to support the reading and writing of UTF-8:
import codecs
from tempfile import mkstemp
from shutil import move
from os import remove
def replace(source_file_path, pattern, substring):
    fh, target_file_path = mkstemp()
    with codecs.open(target_file_path, 'w', 'utf-8') as target_file:
        with codecs.open(source_file_path, 'r', 'utf-8') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    remove(source_file_path)
    move(target_file_path, source_file_path)
Using hamishmcn's answer as a template I was able to search for a line in a file that match my regex and replacing it with empty string.
import re
fin = open("in.txt", 'r') # in file
fout = open("out.txt", 'w') # out file
for line in fin:
    p = re.compile('[-][0-9]*[.][0-9]*[,]|[-][0-9]*[,]') # pattern
    newline = p.sub('',line) # replace matching strings with empty string
    print newline
    fout.write(newline)
fin.close()
fout.close()
If you remove the indent at the line shown below, it will search and replace across multiple lines.
See below for example.
def replace(file, pattern, subst):
    #Create temp file
    fh, abs_path = mkstemp()
    print fh, abs_path
    new_file = open(abs_path,'w')
    old_file = open(file)
    for line in old_file:
        new_file.write(line.replace(pattern, subst))
    #close temp file
    new_file.close()
    close(fh)
    old_file.close()
    #Remove original file
    remove(file)
    #Move new file
    move(abs_path, file)

How do I read / write a file in Python (3) on Windows without introducing carriage returns?

I want to open a file using Python on Windows, perform some regex operations, optionally alter the content and then write the result back to a file.
I can create an example file which looks right (based on the comments on using binary mode in other posts on SO and within the documentation). What I can't see is how I convert the 'binary' data to a usable form without introducing '\r' characters.
An example:
import re
# Create an example file which represents the one I'm actually working on (a Jenkins config file if you're interested).
testFileName = 'testFile.txt'
with open(testFileName, 'wb') as output_file:
    output_file.write(b'this\nis\na\ntest')
# Try and read the file in as I would in the script I was trying to write.
content = ""
with open(testFileName, 'rb') as content_file:
    content = content_file.read()
# Do something to the content
exampleRegex = re.compile("a\\ntest")
content = exampleRegex.sub("a\\nworking\\ntest", content) # <-- Fails because it won't operate on 'binary data'
# Write the file back to disk and then realise, frustratingly that something in this process has introduced carriage returns onto every line.
outputFilename = 'output_'+testFileName
with open(outputFilename, 'wb') as output_file:
    output_file.write(content)
I presume you mean that your text file has carriage returns and you don't want them included in the text.
If you use
with open(fileName, 'r', encoding="utf-8", errors="ignore", newline="\r\n") as content_file:
or more specifically, set newline="\r\n" in your open call, it should consume the carriage returns at the ends of lines.
Edit: Or if you want to operate only on \n then this working example should do it.
import re
testFileName = 'testFile.txt'
with open(testFileName, 'w', newline='\n') as output_file:
    output_file.write('this\nis\na\ntest')
content = ""
with open(testFileName, 'r', newline='\n') as content_file:
    content = content_file.read()
exampleRegex = re.compile("a\\ntest")
content = exampleRegex.sub("a\\nworking\\ntest", content)
outputFilename = 'output_'+testFileName
with open(outputFilename, 'w', newline='\n') as output_file:
    output_file.write(content)
If I interpreted the question correctly, I first decoded the bytes to string, then did the regex sub. Next, I encoded the string into bytes to be written into the output file.
import re
testFileName = 'testFile.txt'
with open(testFileName, 'wb') as output_file:
    output_file.write(b'this\nis\na\ntest')
content = ""
with open(testFileName, 'rb') as content_file:
    content = content_file.read().decode('utf-8')
exampleRegex = re.compile("a\\ntest")
content = exampleRegex.sub("a\\nworking\\ntest", content)
outputFilename = 'output_'+testFileName
with open(outputFilename, 'wb') as output_file:
    output_file.write(content.encode('utf-8'))
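For comparison, a text-mode round trip can also avoid the extra carriage returns by passing newline="" to open, which switches off newline translation for both reading and writing. This is a sketch under assumptions: the function name `rewrite_text` is invented, and the file is assumed to be UTF-8.

```python
import re

def rewrite_text(src_path, dst_path, pattern, repl):
    """Apply a regex substitution without touching the file's line endings."""
    # newline="" means "\n" is read and written exactly as stored
    with open(src_path, "r", encoding="utf-8", newline="") as src:
        content = src.read()
    content = re.sub(pattern, repl, content)
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(content)
```

Called as rewrite_text('testFile.txt', 'output_testFile.txt', r"a\ntest", "a\nworking\ntest"), it should leave the LF-only line endings of the example file unchanged, including on Windows.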
