How to replace CRLF in Python? [duplicate]


This question already has answers here:
What exactly do "u" and "r" string prefixes do, and what are raw string literals?
(7 answers)
Closed 2 years ago.
>>> print 'aaa\rbbb'.replace('\r','ccc')
aaacccbbb
>>> print 'aaa\rbbb'.replace('\\r','ccc')
bbb
>>> print 'aaa\rbbb'.replace(r'\r','ccc')
bbb
>>>
I am wondering about the reason for the output of the last two statements. I am confused about what is happening.

The last two variations do not replace the carriage return character, so they print the original 'aaa\rbbb'. The terminal first prints aaa, then the carriage return moves the cursor back to the beginning of the line, and then bbb is printed over the existing aaa.
The reason '\\r' and r'\r' don't replace '\r' is that '\r' is the single carriage return character, whereas both '\\r' and r'\r' are two characters: a backslash followed by the letter "r".

That's because '\\r' is literally the character '\' followed by the character 'r'.
The same goes for r'\r': the r prefix marks a raw string, in which backslashes are not treated as escape characters.
Neither will match the character \r, because it is a single character (the carriage return).
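A quick way to see the difference is to compare the lengths of the three literals. The sketch below uses Python 3 syntax, and the variable names are only illustrative. It also shows one way to replace an actual CRLF pair, which is what the title asks about:

```python
# '\r' is one character (carriage return); '\\r' and r'\r' are two
# characters each: a backslash followed by the letter r.
cr = '\r'
escaped = '\\r'
raw = r'\r'

print(len(cr), len(escaped), len(raw))   # 1 2 2
print(escaped == raw)                    # True

text = 'aaa\rbbb'
replaced = text.replace('\r', 'ccc')     # replaces the carriage return
unchanged = text.replace(r'\r', 'ccc')   # no backslash in text, so nothing changes
print(repr(replaced))                    # 'aaacccbbb'
print(repr(unchanged))                   # 'aaa\rbbb'

# Replacing a real CRLF sequence (carriage return + line feed):
crlf_text = 'line1\r\nline2\r\n'
lf_text = crlf_text.replace('\r\n', '\n')
print(repr(lf_text))                     # 'line1\nline2\n'
```

Printing with repr() sidesteps the terminal effect where the carriage return moves the cursor and bbb overwrites aaa.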

Related

Regex to exclude specific special characters, spaces and alphabets [duplicate]

This question already has answers here:
What special characters must be escaped in regular expressions?
(13 answers)
Closed 4 years ago.
I want a regular expression which converts this:
91009-01-28-00 Maximum (c/s)................ 1543.5
to this:
91009-01-28-00 1543.5
So basically, I need a regular expression that removes letters, spaces, forward slashes and brackets.
I have written the following python code so far:
with open('lcstats.txt', 'r') as lcstats_file:
    with open(lcstats_full_path + '_lcstats_full.txt', "a+") as lcstats_full_file:
        lcstats_full_file.write(obsid)
        for line in lcstats_file.readlines():
            if not re.search(r'Maximum [(c/s)]', line):
                continue
            line = (re.sub(**REGEX**,'',line))
            lcstats_full_file.write(line)

It appears you want to keep the first and last parts of the string. If that is the case for every line, then splitting it accordingly can help, as in the following code:
line = "91009-01-28-00 Maximum (c/s) ................ 1543.5"
parts = line.split()  # split on any run of whitespace
line = parts[0] + ' ' + parts[-1]  # keep only the first and last fields
print(line)
Output:
91009-01-28-00 1543.5

In your code you are using search to check whether you can match Maximum (c/s), and then you want to use a regex to remove it.
I think that with your regex Maximum [(c/s)] you mean Maximum \(c/s\). The square brackets make it a character class, which matches just one of the characters (, c, /, s or ). Without the brackets, an unescaped (c/s) would capture c/s in a capturing group, which is not required if you only want to match it.
What you could do is match Maximum \(c/s\), then match a space or a dot one or more times using the character class [ .]+, and replace the match with an empty string:
Maximum \(c/s\)[ .]+
import re
s = "91009-01-28-00 Maximum (c/s)................ 1543.5"
print(re.sub(r"Maximum \(c/s\)[ .]+", "", s))

Try using this regex: /\s[^0-9]+/. It matches the first whitespace character followed by one or more non-digit characters. You will need to use a space as the replacement string to keep the two remaining pieces of data separate.
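In Python the surrounding slashes are not part of the pattern. A minimal sketch of this suggestion, using the sample line from the question (the variable names are illustrative):

```python
import re

s = "91009-01-28-00 Maximum (c/s)................ 1543.5"

# \s matches the first space; [^0-9]+ then consumes everything up to the next digit.
# The replacement is a single space so the two remaining fields stay separated.
result = re.sub(r"\s[^0-9]+", " ", s)
print(result)  # 91009-01-28-00 1543.5
```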

Regex:
((?<!\d)\D)
This matches every non-digit (\D) that is not preceded by a digit; (?<!\d) is a negative lookbehind.
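A short sketch of how this pattern could be applied with re.sub, again on the sample line from the question. Because the lookbehind spares any non-digit that directly follows a digit, the hyphens, the first space, and the decimal point survive:

```python
import re

s = "91009-01-28-00 Maximum (c/s)................ 1543.5"

# Remove every non-digit that is NOT immediately preceded by a digit.
result = re.sub(r"(?<!\d)\D", "", s)
print(result)  # 91009-01-28-00 1543.5
```

This relies on the layout of the sample line; a line whose text part contains digits would need a different pattern.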

Weird behavior of strip() [duplicate]

This question already has answers here:
How do the .strip/.rstrip/.lstrip string methods work in Python?
(4 answers)
Closed 6 years ago.
>>> adf = "123 ABCD#"
>>> df = "<ABCD#>"
>>> adf.strip(df)
'123 '
>>> xc = "dfdfd ABCD#!"
>>> xc.strip(df)
'dfdfd ABCD#!'
Why does strip() take out ABCD# in adf?
Does strip completely ignore "<" and ">"? Why does it remove the characters when there is no "<" or ">" in the original string?

The method strip() returns a copy of the string from which the given characters have been stripped at the beginning and the end (whitespace by default). The argument is treated as a set of characters, not as a prefix or suffix.
The characters at the end of adf (ABCD#) all occur in df, so they are removed. This is not the case for xc, whose first and last characters are d and !, neither of which occurs in df.
str.strip([chars]): if the first or last character of str occurs in chars, that character is stripped from str, and the check is repeated. When no more characters can be stripped, it stops.
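The loop described above can be sketched in a few lines. Note that strip_chars is a hypothetical helper written only for illustration, not how CPython implements str.strip, and it does not handle the default (whitespace) case:

```python
def strip_chars(s, chars):
    """Illustrative re-implementation of str.strip(chars)."""
    allowed = set(chars)          # the argument acts as a set of characters
    start, end = 0, len(s)
    # Advance from the left while the leading character is in the set.
    while start < end and s[start] in allowed:
        start += 1
    # Retreat from the right while the trailing character is in the set.
    while end > start and s[end - 1] in allowed:
        end -= 1
    return s[start:end]

df = "<ABCD#>"
first = strip_chars("123 ABCD#", df)
second = strip_chars("dfdfd ABCD#!", df)
print(repr(first))    # '123 '
print(repr(second))   # 'dfdfd ABCD#!'
```

In the first string the scan from the right removes #, D, C, B and A and stops at the space. In the second it stops immediately, because neither d nor ! is in the set.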

Regex can't escape question mark? [duplicate]

This question already has an answer here:
match trailing slash with Python regex
(1 answer)
Closed 8 years ago.
I can't match the question mark character although I escaped it.
I tried escaping with multiple backslashes and also using re.escape().
What am I missing?
Code:
import re
text = 'test?'
result = ''
result = re.match(r'\?',text)
print ("input: "+text)
print ("found: "+str(result))
Output:
input: test?
found: None

re.match only matches a pattern at the beginning of the string; as the docs say:
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object.
So, either use:
>>> re.match(r'.*\?', text).group(0)
'test?'
or use re.search:
>>> re.search(r'\?', text).group(0)
'?'
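For a side-by-side comparison, here is a small sketch using the text from the question (the result variable names are illustrative):

```python
import re

text = 'test?'

at_start = re.match(r'\?', text)       # anchored at position 0, where 't' is
anywhere = re.search(r'\?', text)      # scans the whole string
prefixed = re.match(r'.*\?', text)     # .* lets match() consume 'test' first

print(at_start)            # None
print(anywhere.group(0))   # ?
print(prefixed.group(0))   # test?
print(re.escape('?'))      # \?
```

The escaping in the question was already correct; only the choice of re.match prevented a match.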

How to treat \t as a regular string in python [duplicate]

This question already has answers here:
How can I put an actual backslash in a string literal (not use it for an escape sequence)?
(4 answers)
Closed 8 years ago.
I need to remove \t from a string that is being written; however, whenever I do
str(contents).replace('\t', ' ')
it just removes all of the tabs. I understand this is because \t is how you write a tab, but I want to know how to treat it as a regular string.

You can prefix the string literal with r to create a raw string:
str(contents).replace(r'\t', ' ')
Raw strings do not process escape sequences. Below is a demonstration:
>>> mystr = r'a\t\tb' # Escape sequences are ignored
>>> print(mystr)
a\t\tb
>>> print(mystr.replace('\t', ' ')) # This replaces tab characters
a\t\tb
>>> print(mystr.replace(r'\t', ' ')) # This replaces the string '\t'
a  b
>>>
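As a further sketch, the example below mixes a literal backslash-t with a real tab in one string to show that each replace call touches only one of them. The sample value of contents is made up for illustration:

```python
# Literal backslash + 't', followed later by a real tab character.
contents = r'col1\tcol2' + '\t' + 'col3'

# r'\t' and '\\t' are the same two-character string.
same = (r'\t' == '\\t')

only_literal = contents.replace(r'\t', ' ')  # replaces the backslash-t text, keeps the tab
only_tab = contents.replace('\t', ' ')       # replaces the real tab, keeps the backslash-t text

print(same)                # True
print(repr(only_literal))  # 'col1 col2\tcol3'
print(repr(only_tab))      # 'col1\\tcol2 col3'
```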

Python: Split a string, respect and preserve quotes [duplicate]

This question already has answers here:
Split a string by spaces -- preserving quoted substrings -- in Python
(16 answers)
Closed 7 years ago.
Using python, I want to split the following string:
a=foo, b=bar, c="foo, bar", d=false, e="false"
This should result in the following list:
['a=foo', 'b=bar', 'c="foo, bar"', 'd=false', 'e="false"']
When using shlex in posix mode and splitting on ", ", the argument for c gets treated correctly. However, it removes the quotes. I need them because false is not the same as "false", for instance.
My code so far:
import shlex
mystring = 'a=foo, b=bar, c="foo, bar", d=false, e="false"'
splitter = shlex.shlex(mystring, posix=True)
splitter.whitespace += ','
splitter.whitespace_split = True
print list(splitter) # ['a=foo', 'b=bar', 'c=foo, bar', 'd=false', 'e=false']

>>> import re
>>> s = r'a=foo, b=bar, c="foo, bar", d=false, e="false", f="foo\", bar"'
>>> re.findall(r'(?:[^\s,"]|"(?:\\.|[^"])*")+', s)
['a=foo', 'b=bar', 'c="foo, bar"', 'd=false', 'e="false"', 'f="foo\\", bar"']
The regex pattern "[^"]*" matches a simple quoted string.
The pattern "(?:\\.|[^"])*" matches a quoted string and skips over escaped quotes, because \\. consumes two characters: a backslash and any character.
The pattern [^\s,"] matches a single non-delimiter character.
Combining the second and third patterns inside (?: | )+ matches a sequence of non-delimiters and quoted strings, which is the desired result.
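For reuse, the pattern can be compiled once and wrapped in a function. This is a minimal Python 3 sketch; TOKEN_RE and split_preserving_quotes are hypothetical names chosen for illustration:

```python
import re

# Same pattern as above: runs of non-delimiters and double-quoted strings.
TOKEN_RE = re.compile(r'(?:[^\s,"]|"(?:\\.|[^"])*")+')

def split_preserving_quotes(text):
    """Split on commas and whitespace, keeping quoted parts (and their quotes) intact."""
    return TOKEN_RE.findall(text)

mystring = 'a=foo, b=bar, c="foo, bar", d=false, e="false"'
tokens = split_preserving_quotes(mystring)
print(tokens)
# ['a=foo', 'b=bar', 'c="foo, bar"', 'd=false', 'e="false"']
```

Unlike the shlex approach in the question, this keeps the quotes, so false and "false" stay distinguishable.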
