Python Regex to find CRLF - python

I'm trying to write a regex that will find any CRLF in Python.
I am able to open the file and use its newlines attribute to determine whether it uses CRLF or LF, but my numerous regex attempts have failed:
import re
with open('test.txt', 'rU') as f:
    text = f.read()
    print repr(f.newlines)
regex = re.compile(r"[^\r\n]+", re.MULTILINE)
print(regex.match(text))
I've done numerous iterations on the regex, and in every case it will either detect \n as \r\n or not work at all.

You could try using the re library to search for the \r & \n patterns.
import re
with open("test.txt", "rU") as f:
    for line in f:
        if re.search(r"\r\n", line):
            print("Found CRLF")
            regex = re.compile(r"\r\n")
            line = regex.sub("\n", line)
        if re.search(r"\r", line):
            print("Found CR")
            regex = re.compile(r"\r")
            line = regex.sub("\n", line)
        if re.search(r"\n", line):
            print("Found LF")
            regex = re.compile(r"\n")
            line = regex.sub("\n", line)
        print(line)
Assuming your test.txt file looks something like this:
This is a test file
with a line break
at the end of the file.

As I mentioned in a comment, you're opening the file with universal newlines, which means that Python will automatically perform newline conversion when reading from or writing to the file. Your program therefore will not see CR-LF sequences; they will be converted to just LF.
Generally, if you want to portably observe all bytes from a file unchanged, then you must open the file in binary mode:
In Python 2:
from __future__ import print_function
import re
with open('test.txt', 'rb') as f:
    text = f.read()
regex = re.compile(r"[^\r\n]+", re.MULTILINE)
print(regex.match(text))
In Python 3:
import re
with open('test.txt', 'rb') as f:
    text = f.read()
regex = re.compile(rb"[^\r\n]+", re.MULTILINE)
print(regex.match(text))
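As a rough sketch of how the binary-mode approach can be extended to actually find CRLF sequences (Python 3 assumed; the helper name count_line_endings is illustrative and not part of the answers above), the alternation below tries \r\n first so that a CRLF pair is never counted as a separate CR and LF:

```python
import re

# Try the two-byte CRLF first, then fall back to a lone CR or a lone LF.
LINE_ENDING = re.compile(rb"\r\n|\r|\n")

# Map each matched byte sequence to a readable label.
_LABELS = {b"\r\n": "CRLF", b"\r": "CR", b"\n": "LF"}


def count_line_endings(data):
    """Count CRLF, lone CR, and lone LF sequences in a bytes object."""
    counts = {"CRLF": 0, "CR": 0, "LF": 0}
    for match in LINE_ENDING.finditer(data):
        counts[_LABELS[match.group()]] += 1
    return counts
```

It would typically be called on the bytes read from a file opened with mode 'rb', so that no newline translation happens before the regex sees the data.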

Related

In python, why slicing doesn't work in readline()? [duplicate]

I'm using Python 3 to loop through lines of a .txt file that contains strings. These strings will be used in a curl command. However, it is only working correctly for the last line of the file. I believe the other lines end with newlines, which throws the string off:
url = "https://"
with open(file) as f:
    for line in f:
        str = (url + line)
        print(str)
This will return:
https://
endpoint1
https://
endpoint2
https://endpoint3
How can I get all strings to concatenate like the last line?
I've looked at a couple of answers like How to read a file without newlines?, but this answer converts all content in the file to one line.
Use str.strip
Ex:
url = "https://"
with open(file) as f:
    for line in f:
        s = (url + line.strip())
        print(s)
If the strings end with newlines, you can call .strip() to remove them, i.e.:
url = "https://"
with open(file) as f:
    for line in f:
        full_url = (url + line.strip())
        print(full_url)
I think str.strip() will solve your problem.
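A small self-contained sketch of the same idea, using an in-memory file so it runs without a real input file. The function name build_urls and the choice to skip blank lines are assumptions made for illustration, not part of the question:

```python
import io


def build_urls(lines, prefix="https://"):
    """Prefix each line with `prefix` after stripping surrounding whitespace."""
    urls = []
    for line in lines:
        endpoint = line.strip()  # removes the trailing "\n" (and "\r\n")
        if endpoint:  # assumption: blank lines are skipped
            urls.append(prefix + endpoint)
    return urls


# io.StringIO stands in for the opened file from the question.
sample = io.StringIO("endpoint1\nendpoint2\nendpoint3")
print(build_urls(sample))
```

If only the line ending should go and other whitespace must be kept, line.rstrip("\r\n") is a narrower alternative to strip().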

How to print ASCII white space from file

I searched Google without success. I use PyCharm and I want to print the lines of a file with ASCII whitespace characters such as "\n", "\t", "\r", etc. shown explicitly. All I get is the normal string, as I would see it in any text editor, but I want to see those characters too, so that I know how to reproduce the text file for some personal projects.
I do not want to write them out by hand, so I want to show them on the console and just copy-paste them.
Thanks in advance.
# Local variable
fileName = r"C:\Users\...\textFile.txt"
# Create .mcmeta file
def mcmetaMaker(fileName=fileName):
    try:
        with open(file=fileName, mode="r", encoding="utf-8") as f:
            line = f.readline()
            while line:
                print("\b{}".format(line))
                line = f.readline()
    finally:
        f.close()
# Local variable
fileName = r"C:\Users\...\textFile.txt"
# Create .mcmeta file
def mcmetaMaker(fileName):
    try:
        with open(fileName, mode='rb') as f:
            line = f.readline()
            while line:
                betterline = str(line)
                print(betterline[2:len(betterline)-1])
                line = f.readline()
    finally:
        f.close()
This worked for me. The 'b' in 'rb' stands for binary mode, which doesn't change the original bytes of the string.
print(line) produces b'asdf\r\n' so I converted it to string and sliced it. Hope it doesn't run too slow this way.
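An alternative sketch that stays in text mode, assuming Python 3: repr() already renders whitespace as escape sequences, and opening the file with newline="" keeps "\r\n" from being translated to "\n". The names show_escapes and print_file_escaped are hypothetical helpers, not from the answer above:

```python
def show_escapes(text):
    """Return `text` with control characters shown as escapes such as \\n and \\t."""
    # repr() wraps the string in quotes; slice off the first and last character.
    return repr(text)[1:-1]


def print_file_escaped(path, encoding="utf-8"):
    """Print each line of the file at `path` with its whitespace made visible."""
    # newline="" disables newline translation, so "\r\n" stays "\r\n".
    with open(path, mode="r", encoding=encoding, newline="") as f:
        for line in f:
            print(show_escapes(line))
```

Unlike the bytes-based version, this keeps non-ASCII printable characters readable instead of showing them as byte escapes.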

trying to print human readable ascii string

I am trying to print a string that is human-readable ASCII, but I am not getting any output. What am I missing?
import string
file = open("file.txt", "r")
data = file.read()
data = data.split("\n")
for line in data:
    if line not in string.printable:
        continue
    else:
        print line
If your file's content is text, you should read files like this:
import string
with open("file.txt", "r") as file:
    for line in file:
        if all(c in string.printable for c in line):
            print line
You must check every character individually to see if it is printable. There is another post about checking whether a string is printable: Test if a python string is printable
Also, you can read about context managers and the right way to open a file: What is the most pythonic way to open a file?
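To illustrate why the per-character check matters, here is a minimal Python 3 sketch (the helper names are made up for this example). The expression line not in string.printable from the question is a substring test against one long string, so almost every real line fails it:

```python
import string

# A set gives fast membership tests for single characters.
PRINTABLE = set(string.printable)


def is_printable_ascii(line):
    """True if every character in `line` is in string.printable."""
    return all(ch in PRINTABLE for ch in line)


def printable_lines(lines):
    """Keep only the lines made up entirely of printable ASCII characters."""
    return [line for line in lines if is_printable_ascii(line)]
```

Note that string.printable includes whitespace such as "\n" and "\t", so lines keep passing the check even with their trailing newline attached.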

Search and replace lines in csv with double quotes using Python

I need to process some .csv files. Some of them have field entries of 1 double quote (") or possibly several mixed in with other text. I need to escape them all. So far I'm doing this:
def process_file():
    input_path = 'input.txt'
    output_path = 'output.txt'
    with open(input_path) as input_file, open(output_path, 'w+') as output_file:
        for line in input_file:
            newline = line.replace('"', '""""')
            output_file.write(newline)
How can I make sure that the replacement only happens for single quote characters and does not replace "" or """", for example?
I'd like to use python instead of any command line solution. Also, these files are very large, which is why I'm looping through lines instead of loading the whole thing into memory.
Thanks to @mkrieger1 and this question, I was able to put together this solution:
import re
def process_file():
    input_path = 'input.txt'
    output_path = 'output.txt'
    with open(input_path) as input_file, open(output_path, 'w+') as output_file:
        for line in input_file:
            newline = re.sub(r'(?<!")"(?!")', '""""', line)
            output_file.write(newline)
You can use a regular expression:
import re
newline = re.sub(r'^"$', '"""', line)
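A short sketch of how the lookaround pattern behaves, assuming Python 3. The function name escape_lone_quotes is illustrative, and the default replacement simply mirrors the four quotes used in the question:

```python
import re

# A double quote that is neither preceded nor followed by another double quote.
LONE_QUOTE = re.compile(r'(?<!")"(?!")')


def escape_lone_quotes(line, replacement='""""'):
    """Replace isolated double quotes, leaving runs such as "" untouched."""
    return LONE_QUOTE.sub(replacement, line)


print(escape_lone_quotes('a"b'))   # the lone quote is replaced
print(escape_lone_quotes('a""b'))  # the pair is left as it is
```

The negative lookbehind and lookahead do not consume characters, so neighbouring text is never altered by the substitution.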

Replacing a line in an already opened file python [duplicate]

I want to loop over the contents of a text file and do a search and replace on some lines and write the result back to the file. I could first load the whole file in memory and then write it back, but that probably is not the best way to do it.
What is the best way to do this, within the following code?
f = open(file)
for line in f:
    if 'foo' in line:
        newline = line.replace('foo', 'bar')
        # how to write this newline back to the file
The shortest way would probably be to use the fileinput module. For example, the following adds line numbers to a file, in-place:
import fileinput
for line in fileinput.input("test.txt", inplace=True):
    print('{} {}'.format(fileinput.filelineno(), line), end='') # for Python 3
    # print "%d: %s" % (fileinput.filelineno(), line), # for Python 2
What happens here is:
The original file is moved to a backup file
The standard output is redirected to the original file within the loop
Thus any print statements write back into the original file
fileinput has more bells and whistles. For example, it can be used to automatically operate on all files in sys.argv[1:], without your having to iterate over them explicitly. Starting with Python 3.2 it also provides a convenient context manager for use in a with statement.
While fileinput is great for throwaway scripts, I would be wary of using it in real code because admittedly it's not very readable or familiar. In real (production) code it's worthwhile to spend just a few more lines of code to make the process explicit and thus make the code readable.
There are two options:
The file is not overly large, and you can just read it wholly to memory. Then close the file, reopen it in writing mode and write the modified contents back.
The file is too large to be stored in memory; you can move it over to a temporary file and open that, reading it line by line, writing back into the original file. Note that this requires twice the storage.
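A hedged sketch of a temp-file approach for large files, assuming Python 3. It is a variant of the second option rather than a literal implementation: it writes the modified lines to a temporary file and then swaps that file into place, instead of moving the original aside first. The function name replace_in_file and its parameters are illustrative:

```python
import os
import shutil
import tempfile


def replace_in_file(path, old, new, encoding="utf-8"):
    """Rewrite the file at `path` line by line, replacing `old` with `new`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so the final swap
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        # newline="" preserves the original line endings on both sides.
        with os.fdopen(fd, "w", encoding=encoding, newline="") as tmp, \
                open(path, "r", encoding=encoding, newline="") as src:
            for line in src:
                tmp.write(line.replace(old, new))
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)       # swap the new file into place
    except BaseException:
        os.remove(tmp_path)              # clean up the temp file on failure
        raise
```

Only one line is held in memory at a time, at the cost of needing disk space for a second copy while the rewrite is in progress.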
I guess something like this should do it. It basically writes the content to a new file and replaces the old file with the new file:
from tempfile import mkstemp
from shutil import move, copymode
from os import fdopen, remove
def replace(file_path, pattern, subst):
    #Create temp file
    fh, abs_path = mkstemp()
    with fdopen(fh,'w') as new_file:
        with open(file_path) as old_file:
            for line in old_file:
                new_file.write(line.replace(pattern, subst))
    #Copy the file permissions from the old file to the new file
    copymode(file_path, abs_path)
    #Remove original file
    remove(file_path)
    #Move new file
    move(abs_path, file_path)
Here's another example that was tested. Note that it matches plain substrings rather than regex patterns, because it uses the in operator and str.replace:
import fileinput
import sys
def replaceAll(file, searchExp, replaceExp):
    for line in fileinput.input(file, inplace=1):
        if searchExp in line:
            line = line.replace(searchExp, replaceExp)
        sys.stdout.write(line)
Example use:
replaceAll("/fooBar.txt", "Hello World!", "Goodbye World.")
This should work: (inplace editing)
import fileinput
# Does a list of files, and
# redirects STDOUT to the file in question
for line in fileinput.input(files, inplace = 1):
    print line.replace("foo", "bar"),
Based on the answer by Thomas Watnedal.
However, this does not answer the line-to-line part of the original question exactly. The function can still replace on a line-to-line basis.
This implementation replaces the file contents without using temporary files; as a consequence, file permissions remain unchanged.
Also, using re.sub instead of replace allows regex replacement instead of plain-text replacement only.
Reading the file as a single string instead of line by line allows for multiline match and replacement.
import re
def replace(file, pattern, subst):
    # Read contents from file as a single string
    file_handle = open(file, 'r')
    file_string = file_handle.read()
    file_handle.close()
    # Use RE package to allow for replacement (also allowing for (multiline) REGEX)
    file_string = (re.sub(pattern, subst, file_string))
    # Write contents to file.
    # Using mode 'w' truncates the file.
    file_handle = open(file, 'w')
    file_handle.write(file_string)
    file_handle.close()
As lassevk suggests, write out the new file as you go, here is some example code:
fin = open("a.txt")
fout = open("b.txt", "wt")
for line in fin:
    fout.write( line.replace('foo', 'bar') )
fin.close()
fout.close()
If you want a generic function that replaces any text with some other text, this is likely the best way to go, particularly if you're a fan of regexes:
import re
def replace( filePath, text, subs, flags=0 ):
    with open( filePath, "r+" ) as file:
        fileContents = file.read()
        textPattern = re.compile( re.escape( text ), flags )
        fileContents = textPattern.sub( subs, fileContents )
        file.seek( 0 )
        file.truncate()
        file.write( fileContents )
A more pythonic way would be to use context managers like the code below:
from tempfile import mkstemp
from shutil import move
from os import remove
def replace(source_file_path, pattern, substring):
    fh, target_file_path = mkstemp()
    with open(target_file_path, 'w') as target_file:
        with open(source_file_path, 'r') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    remove(source_file_path)
    move(target_file_path, source_file_path)
fileinput is quite straightforward, as mentioned in previous answers:
import fileinput
def replace_in_file(file_path, search_text, new_text):
    with fileinput.input(file_path, inplace=True) as file:
        for line in file:
            new_line = line.replace(search_text, new_text)
            print(new_line, end='')
Explanation:
fileinput can accept multiple files, but I prefer to close each file as soon as it has been processed, so I pass a single file_path in the with statement.
The print statement does not print anything to the console when inplace=True, because STDOUT is redirected to the original file.
end='' in the print call prevents extra blank lines between lines.
You can use it as follows:
file_path = '/path/to/my/file'
replace_in_file(file_path, 'old-text', 'new-text')
Create a new file, copy lines from the old to the new, and do the replacing before you write the lines to the new file.
Expanding on @Kiran's answer, which I agree is more succinct and Pythonic, this adds codecs to support the reading and writing of UTF-8:
import codecs
from tempfile import mkstemp
from shutil import move
from os import remove
def replace(source_file_path, pattern, substring):
    fh, target_file_path = mkstemp()
    with codecs.open(target_file_path, 'w', 'utf-8') as target_file:
        with codecs.open(source_file_path, 'r', 'utf-8') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    remove(source_file_path)
    move(target_file_path, source_file_path)
Using hamishmcn's answer as a template, I was able to search for lines in a file that match my regex and replace the matches with an empty string.
import re
fin = open("in.txt", 'r') # in file
fout = open("out.txt", 'w') # out file
for line in fin:
    p = re.compile('[-][0-9]*[.][0-9]*[,]|[-][0-9]*[,]') # pattern
    newline = p.sub('',line) # replace matching strings with empty string
    print newline
    fout.write(newline)
fin.close()
fout.close()
If you remove the indent at the line below, it will search and replace across multiple lines.
See below for example.
from tempfile import mkstemp
from shutil import move
from os import remove, close
def replace(file, pattern, subst):
    #Create temp file
    fh, abs_path = mkstemp()
    print fh, abs_path
    new_file = open(abs_path,'w')
    old_file = open(file)
    for line in old_file:
        new_file.write(line.replace(pattern, subst))
    #close temp file
    new_file.close()
    close(fh)
    old_file.close()
    #Remove original file
    remove(file)
    #Move new file
    move(abs_path, file)
