How do you remove \r and \n using re.py? - python

I'm trying to figure out how to remove \r's, \n's and "\ from a JSON URL, but every time I try, the output gets cut off. The sequences that appear are:
\r\n\r\n
\n\n
\n
\r
"\wordhere"\
If you can help me, I would appreciate it.

Use strict=False when loading; see the Python json docs:
>>> s
'\n{\n\r\n\r\n\n\n\n\n\n\n\r\n"wordhere": 0}\n'
>>> json.loads(s, strict=False)
{u'wordhere': 0}

You don't need regex for this.
You could use the replace method of the str class.
string = 'abc\r\n\r\n\\\\'
string = string.replace('\r', '')
string = string.replace('\n', '')
string = string.replace('\\', '')
But if you really want to use regex, a possible approach would be:
string = re.sub('\\r*\\n*\\\\*', '', string)
In a regular expression, special characters must be escaped with a backslash. To match a literal backslash, the pattern itself needs two backslashes, which takes four in an ordinary string literal (or two in a raw string such as r'\\').
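As a minimal sketch of the same cleanup written with a raw string and a character class (the sample value is made up for illustration, and this is an alternative spelling rather than the pattern given above):

```python
import re

# Hypothetical sample containing carriage returns, newlines and backslashes.
raw = 'abc\r\n\r\n\\\\def'

# In a raw string the regex engine interprets \r and \n itself,
# and \\ stands for one literal backslash.
cleaned = re.sub(r'[\r\n\\]', '', raw)

print(cleaned)  # abcdef
```

A character class removes each unwanted character wherever it occurs, so the order in which \r, \n and \ appear in the input does not matter.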

Related

Find a character immediately before a match using regex

I need to find a regex where I can reliably find a " that happens before a "" but there are a lot of " before it as well.
For example:
{"Field":"String data "Other String Data""}
I need to fix an error I'm getting in the raw JSON string. I need to turn that "" into " and remove the extra " inside the value. If I don't remove these, I can't turn the string into an object that I can iterate through.
I am importing this string into Python.
I have tried to work out some lookbehinds and lookarounds, but they don't seem to work.
For example, I tried this: (?=(?=(")).*"")
Have you tried just finding all "" and replacing them with "?
re.sub('""', '"', s)
Though this will work for your example, it can cause issues if the doubled double quote is intended in a string.
You could use re.split to break down your string into parts that are between quotes, then replace the non-escaped inside quotes with properly escaped ones.
To break the string apart, you can use an expression that finds quoted character sequences followed by one of the JSON delimiters that can appear after a closing quote (i.e. : , ] }):
import re
s = '{"Field":"String data "Other String Data""}'
parts = re.split(r'(".*?"(?=[:,}\]]))',s)
fixed = "".join(re.sub(r'(?<!^)"(?!$)',r'\"',p) for p in parts)
print(parts) # ['{', '"Field"', ':', '"String data "Other String Data""', '}']
print(fixed) # {"Field":"String data \"Other String Data\""}
Obviously this will not cover all possible edge cases (otherwise JSON wouldn't need to escape quotes as it does) but, depending on your data, it may be sufficient.

Python re.sub(): trying to replace escaped characters only

With Python 3.x, I need to replace escaped double quotes in some text with a custom pattern, leaving non-escaped double quotes as they are. So I wrote code as trivial as:
text = 'These are "quotes", and these are \"escaped quotes\"'
print(re.sub(r'\"', '~', text))
And expect to see:
These are "quotes", and these are ~escaped quotes~
But instead of above, I get:
These are ~quotes~, and these are ~escaped quotes~
So, what's the correct pattern to replace escaped quotes only?
The background of this issue is an attempt to read an 'invalid' JSON file that contains a JavaScript function, placed with line feeds as is but with escaped quotes. If there is an easier way to parse JSON with newline characters in its values, I would appreciate a hint on that.
First, you need to use a raw string to assign text, so that the backslashes will be kept literally (or you can escape the backslashes).
text = r'These are "quotes", and these are \"escaped quotes\"'
Second, you need to escape the backslash in the regexp so that it will be treated literally by the regexp engine.
print(re.sub(r'\\"', '~', text))
Using a raw string might help.
import re
text = r'These are "quotes", and these are \"escaped quotes\"'
print(re.sub(r'\\"', '~', text))
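A short sketch of why the original attempt replaced every quote, assuming the same sample sentence as above: without the r prefix Python discards the backslash before the string ever reaches the regex engine.

```python
import re

# In an ordinary string literal, '\"' is just a one-character string: '"'.
plain_len = len('\"')   # 1
raw_len = len(r'\"')    # 2: the raw string keeps the backslash

text = r'These are "quotes", and these are \"escaped quotes\"'

# r'\\"' is the regex \\" : one literal backslash followed by a quote.
result = re.sub(r'\\"', '~', text)

print(plain_len, raw_len)  # 1 2
print(result)              # These are "quotes", and these are ~escaped quotes~
```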

Simple python regex

I have a text file and I want to replace the following pattern:
\"
with:
"
The initial version of what I'm looking at looks like:
{"latestProfileVersion":51,
"scannerAccess":true,
"productRatings":"[{\"7H65018000\":{\"reviewCount\":0,\"avgRating\":0}}
So someone embedded a JSON string inside a JSON response.
This is what I have currently:
rawAuthResponseTextFile = open(rawAuthResponseFilename,'r')
formattedAuthResponse = open('formattedAuthResponse.txt', 'w')
try:
    stringVersionOfAuthResponse = rawAuthResponseTextFile.read().replace('\n','')
    cleanedStringVersionOfAuthResponse = re.sub(r'\"', '"', stringVersionOfAuthResponse)
    jsonVersionOfAuthResponse = json.dumps(cleanedStringVersionOfAuthResponse)
    formattedAuthResponse.write(jsonVersionOfAuthResponse)
finally:
    rawAuthResponseTextFile.close()
    formattedAuthResponse.close()
Using http://pythex.org/ I have found that r'\"' should match only \", but this is not the case when I look at the output, which appears to contain additional escape characters.
I know I am doing something wrong because I cannot get the quotes around the embedded string to look like the quotes in the regular JSON no matter how much I tweak it, escape characters or not.
Use this regex instead:
\\"
In a regex, a literal backslash must itself be escaped with another backslash.
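Since the question describes a JSON string embedded inside a JSON response, another option (not the regex fix given above) is to decode twice with json.loads. The sketch below assumes the complete response is valid JSON; the shortened response_text is a hypothetical stand-in for the truncated sample in the question.

```python
import json

# Hypothetical, shortened response: the value of "productRatings" is itself JSON text.
response_text = r'{"latestProfileVersion": 51, "productRatings": "[{\"7H65018000\": {\"reviewCount\": 0}}]"}'

outer = json.loads(response_text)            # decodes the outer object
inner = json.loads(outer["productRatings"])  # decodes the embedded JSON string

print(inner[0]["7H65018000"]["reviewCount"])  # 0
```

The first json.loads call already turns each \" into a plain quote, so no manual unescaping is needed. Note that json.dumps applied to a str re-escapes its quotes, which would explain extra escape characters in the written output.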

remove this unidentified character "\" from string python

I want to remove the character "\" from a string, but I can't because it is also used for escapes such as "\t" or "\n". I then tried """"\"""", but Python still doesn't do anything. I think the string isn't being recognized. This is the code:
remove = string.replace(""""\"""", " ")
And I want
"\workspace\file.txt" to become "workspace file.txt"
Does anyone have any idea? Thanks in advance.
You're trying to replace a backslash, but since Python uses backslashes as escape characters, you actually have to escape the backslash itself.
remove = string.replace("\\", " ")
DEMO:
In [1]: r"\workspace\file.txt".replace("\\", " ")
Out[1]: ' workspace file.txt'
Note the leading space. You may want to call str.strip on the end result.
You have to escape the backslash, as it has special meaning in strings. Even in your original string, if you leave it like that, it'll come out...not as you expect:
"\workspace\file.txt" --> '\\workspace\x0cile.txt'
Here's something that will break the string up by a backslash, join them together with a space where the backslash was, and trim the whitespace before and after the string. It also contains the correctly escaped string you need.
" ".join("\\workspace\\file.txt".split("\\")).strip()
You can achieve it this way:
>>> path = r"\workspace\file.txt"
>>> path.replace("\\", " ").strip()
'workspace file.txt'

Use string as input to re.compile

I want to use a variable in a regex, like this:
variables = ['variableA','variableB']
for i in range(len(variables)):
    regex = r"'('+variables[i]+')[:|=|\(](-?\d+(?:\.\d+)?)(?:\))?'"
    pattern_variable = re.compile(regex)
    match = re.search(pattern_variable, line)
The problem is that Python adds an extra backslash for each backslash in my regex string (shown here in IPython), which makes my regex invalid:
In [76]: regex
Out[76]: "'('+variables[i]+')[:|=|\\(](-?\\d+(?:\\.\\d+)?)(?:\\))?'"
Any tips on how I can avoid this?
No, it only displays extra backslashes so that the string could be read in again and have the correct number of backslashes. Try
print(regex)
and you will see the difference.
There is no problem there. What you're seeing is the output of the repr() of the string. Since the repr is supposed to be more-or-less reversible back into the original object, it doubles up all backslashes, as well as escaping the type of quote used at the ends of the repr.
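A small sketch of both points: repr() doubles the backslashes while the string itself is unchanged, and a variable can be spliced into a pattern by ordinary concatenation outside the quotes. The sample line and the simplified character class [:=(] are assumptions for illustration, not taken from the question's data.

```python
import re

regex = r"\d+(?:\.\d+)?"
print(repr(regex))  # '\\d+(?:\\.\\d+)?'  (what the interactive prompt shows)
print(regex)        # \d+(?:\.\d+)?       (the actual characters)

# Hypothetical input line.
line = "variableA=3.5 variableB(-2)"

found = []
for name in ["variableA", "variableB"]:
    # re.escape guards against regex metacharacters in the variable name.
    pattern = re.compile("(" + re.escape(name) + r")[:=(](-?\d+(?:\.\d+)?)\)?")
    match = pattern.search(line)
    found.append((match.group(1), match.group(2)))

print(found)  # [('variableA', '3.5'), ('variableB', '-2')]
```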
