When calling an external API I get a response like this. It is 2 lines of 44 characters each, 88 in total, which is perfect.
r.text = "P<RUSBASZNAGDCIEWS<<AZIZAS<<<<<<<<<<<<<<<<<<"
"00000000<ORUS5911239F160828525911531023<<<10"
But sometimes I get a response like the one below, and I need to make it look like example 1: 2 lines of 44 characters.
Every full-width く should be replaced with a normal <, and the spaces should also be removed.
r.text = "P<RUSALUZAFEE<<ZUZILLAS<<<<
くくくくくくくくくく、
00000000<ORUS7803118 F210127747803111025<<<64"
expected OUTPUT:
string = "P<RUSALUZAFEE<<ZUZILLAS<<<<<<<<<<<<<<<<<<<<<
00000000<ORUS7803118F210127747803111025<<<64"
Here is my best attempt; I hope you find it helpful:
import re
txt =""" P<RUSALUZAFEE<<ZUZILLAS<<<<
くくくくくくくくくく、
00000000<ORUS7803118 F210127747803111025<<<64"""
txt_1 = re.sub('(く |く)', '<', txt).replace('、','')
txt_2 = re.sub(r'\s+', '', txt_1)
regex = r"(\w<?\w+<+\w+<+)(\w*<?\w+<+\w+)"
result = re.match(regex, txt_2)
print(f'{result.group(1)}\n{result.group(2)}')
Output
P<RUSALUZAFEE<<ZUZILLAS<<<<<<<<<<<<<<
00000000<ORUS7803118F210127747803111025<<<64
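Note that the first line of the output above is still only 37 characters long, not 44. Here is a minimal sketch of one way to get back to two 44-character lines. It assumes (the question does not state this) that the second line always arrives complete, so the last 44 cleaned characters are line 2, and that the first line only ever needs `<` padding on the right. The function name `normalize_mrz` is hypothetical.

```python
import re


def normalize_mrz(raw, width=44):
    """Return two lines of `width` characters from a noisy two-line response."""
    # Replace the full-width character with a plain '<'.
    cleaned = raw.replace("く", "<")
    # Drop all whitespace (including newlines) and the stray '、'.
    cleaned = re.sub(r"[\s、]", "", cleaned)
    # Assumption: the second line is complete, so it is the last `width` chars.
    line2 = cleaned[-width:]
    # Whatever precedes it is the first line; pad it on the right with '<'.
    line1 = cleaned[:-width].ljust(width, "<")[:width]
    return line1 + "\n" + line2


txt = """P<RUSALUZAFEE<<ZUZILLAS<<<<
くくくくくくくくくく、
00000000<ORUS7803118 F210127747803111025<<<64"""

print(normalize_mrz(txt))
```

A response that is already well formed passes through unchanged, because its first 44 characters need no padding.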
import re
pattern = r'\n.*く.*\n'
s = re.compile(pattern)
string = s.sub('\n', r.text)
You can do it with re.sub from the re module, like this:
new_txt = re.sub("く", "<", old_txt)
or with str.replace, like this:
new_str = old_str.replace("く", "<")
or check for the character first and substitute only when it is present:
if "く" in old_txt:
    new_txt = old_txt.replace("く", "<")  # or re.sub
else:
    new_txt = old_txt
I have a string like this: 'approved:rakeshc#IAD.GOOGLE.COM'
I would like to extract the text after ':' and before '#'.
In this case the text to be extracted is rakeshc.
It can be done using the split method: 'approved:rakeshc#IAD.GOOGLE.COM'.split(':')[1].split('#')[0]
but I want to do this using a regular expression.
This is what I have tried so far:
import re
iptext = 'approved:rakeshc#IAD.GOOGLE.COM'
re.sub('^(.*approved:)', "", iptext)  # gives everything after ':'
re.sub('(#IAD.GOOGLE.COM)$', "", iptext)  # gives everything before '#'
I want to get the result with a single expression. The expression would be used to replace the string with only the middle part.
Here is a regex one-liner:
inp = "approved:rakeshc#IAD.GOOGLE.COM"
output = re.sub(r'^.*:|#.*$', '', inp)
print(output) # rakeshc
The approach above strips all text from the start up to and including the :, and all text from # to the end. This leaves behind only the part in between, rakeshc.
Use a capture group to copy the part between the matches to the result.
result = re.sub(r'.*approved:(.*)#IAD\.GOOGLE\.COM$', r'\1', iptext)
Hope this works for you:
import re
input_text = "approved:rakeshc#IAD.GOOGLE.COM"
out = re.search(':(.+?)#', input_text)
if out:
    found = out.group(1)
    print(found)
You can use this one-liner:
re.sub(r'^.*:(\w+)#.*$', r'\1', iptext)
Output:
rakeshc
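The answers above behave differently when the input does not contain the delimiters: re.sub returns the input unchanged when nothing matches, while re.search returns None. A small sketch of that difference follows; the helper name `extract_middle` and the `bad` sample string are made up for illustration.

```python
import re


def extract_middle(text):
    """Return the text between ':' and the next '#', or None if absent."""
    match = re.search(r':(.+?)#', text)
    return match.group(1) if match else None


good = "approved:rakeshc#IAD.GOOGLE.COM"
bad = "no delimiters here"  # hypothetical input without ':' or '#'

print(extract_middle(good))                    # rakeshc
print(extract_middle(bad))                     # None
# re.sub leaves the string untouched when the pattern does not match.
print(re.sub(r'^.*:(\w+)#.*$', r'\1', bad))    # no delimiters here
```

If a failed extraction must be detectable, the re.search form is probably the safer choice, since an unchanged string from re.sub is hard to tell apart from a valid result.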
Here is my data string:
MYDATA=DATANORMAL
MYDATA=DATA_NOTNORMAL
I use this code, but when I run it, it shows nothing for DATANORMAL:
mydata = re.findall(r'MYDATA=(.*)' r'_.*', mystring)
print mydata
and it just shows: NOTNORMAL
I want both to work and to display the data like this:
DATANORMAL
NOTNORMAL
How do I do it? Thanks.
import re
mystring = """
MYDATA=DATANORMAL
MYDATA=DATA_NOTNORMAL
"""
mydata = re.findall(r'^\s*MYDATA=(?:.+_)?(.+?)\s*$', mystring, re.M)
print(mydata)
If you need the word before _ rather than the one after it, use the regex r'^\s*MYDATA=(.+?)(?:_.+)?\s*$' in the code above instead.
Based on what you describe, you might want to use an alternation here:
\bMYDATA=((?:DATA|(?:DATA_))\S+)\b
Script:
inp = "some text MYDATA=DATANORMAL more text MYDATA=DATA_NOTNORMAL"
mydata = re.findall(r'\bMYDATA=((?:DATA|(?:DATA_))\S+)\b', inp)
print(mydata)
This prints:
['DATANORMAL', 'DATA_NOTNORMAL']
I guess you need to add flags=re.M?
import re
mystring = """
MYDATA=DATANORMAL
MYDATA=DATA_NOTNORMAL"""
pattern = re.compile(r"MYDATA=(?:DATA_)?(\w+)", flags=re.M)
print(pattern.findall(mystring))
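The patterns above hard-code the DATA_ prefix. As an alternative sketch, assuming the wanted value is always whatever follows the last underscore (or the whole value when there is none), the regex can capture the full value and str.rpartition can do the splitting. The helper name `last_part` is hypothetical.

```python
import re

mystring = """
MYDATA=DATANORMAL
MYDATA=DATA_NOTNORMAL
"""


def last_part(value):
    """Return the text after the last '_', or the whole value if none."""
    # rpartition returns ('', '', value) when the separator is missing.
    return value.rpartition("_")[2]


# re.M lets ^ and $ match at the start and end of every line.
values = re.findall(r"^MYDATA=(\S+)$", mystring, flags=re.M)
mydata = [last_part(v) for v in values]
print(mydata)  # ['DATANORMAL', 'NOTNORMAL']
```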
I'm trying to repair a JSON feed using re.sub() regex expressions in Python. (I'm also working with the feed provider to fix it). I have two expressions to fix:
1.
"milepost": "
"milepost": "723.46
which are missing an end quote, and
2.
},
}
which shouldn't have the comma. Note, there is no blank line between them, it's just "},\n }" (trouble with this editor...)
I have a short snippet of the feed, located at:
http://hardhat.ahmct.ucdavis.edu/tmp/test.txt
Sample code below. Here, I have tests for finding the patterns, and then for doing the replacements. The match for #2 gives some odd results, but I can't see why:
Brace matches found:
[('}', '\r\n }')]
The match for #1 seems good.
Main problem is, when I do the re.sub, my resulting string has "\x01\x02" in it. I have no clue where this is coming from. Any advice greatly appreciated.
Sample code:
import urllib2
import json
import re
if __name__ == "__main__":
    # wget version of real feed:
    # url = "http://hardhat.ahmct.ucdavis.edu/tmp/test.json"
    # Short text, for milepost and brace substitution test:
    url = "http://hardhat.ahmct.ucdavis.edu/tmp/test.txt"
    request = urllib2.urlopen(url)
    rawResponse = request.read()
    # print("Raw response:")
    # print(rawResponse)
    # Find extra comma after end of records:
    p1 = re.compile('(}),(\r?\n *})')
    l1 = p1.findall(rawResponse)
    print("Brace matches found:")
    print(l1)
    # Check milepost:
    #p2 = re.compile('( *\"milepost\": *\")')
    p2 = re.compile('( *\"milepost\": *\")([0-9]*\.?[0-9]*)\r?\n')
    l2 = p2.findall(rawResponse)
    print("Milepost matches found:")
    print(l2)
    # Do brace substitutions:
    subst = "\1\2"
    response = re.sub(p1, subst, rawResponse)
    # Do milepost substitutions:
    subst = "\1\2\""
    response = re.sub(p2, subst, response)
    print(response)
You need to use raw strings, or "\1\2" will be interpreted by the Python string processor as ASCII 01 ASCII 02 instead of backslash 1 backslash 2.
Instead of
subst = "\1\2"
use
subst = r"\1\2" # or subst = "\\1\\2"
Things get a bit trickier with the second replacement:
subst = "\1\2\""
needs to become
subst = r'\1\2"' # or subst = "\\1\\2\""
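To see the effect in isolation, here is a short sketch written for Python 3 (the question uses Python 2, but the escape handling is the same). It reuses the two patterns from the question on a made-up three-line `snippet` rather than the real feed, which is an assumption for illustration only.

```python
import re

# In a normal string literal, "\1" and "\2" are octal escapes, i.e. the
# control characters U+0001 and U+0002, not regex backreferences.
print("\1\2" == "\x01\x02")  # True
print(len(r"\1\2"))          # 4: backslash, '1', backslash, '2'

# The patterns from the question, written as raw strings.
p1 = re.compile(r'(}),(\r?\n *})')
p2 = re.compile(r'( *"milepost": *")([0-9]*\.?[0-9]*)\r?\n')

# Hypothetical stand-in for the broken feed: missing end quote, extra comma.
snippet = '"milepost": "723.46\n},\n}'

fixed = p1.sub(r'\1\2', snippet)   # drop the comma between the braces
fixed = p2.sub(r'\1\2"', fixed)    # add the missing end quote
print(repr(fixed))
```

Because the milepost pattern consumes the line break without capturing it, the closing brace ends up on the same line as the repaired value; that is harmless for JSON.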
I have a file with lines of this form:
ClientsName(0) = "SUPERBRAND": ClientsName(1) = "GREATSTUFF": cClientsNames.Add Key:="SUPER", Item:=ClientsName
and I would like to capture the names in quotes "" after ClientsName(0) = and ClientsName(1) =.
So far, I came up with this code
import re
f = open('corrected_clients_data.txt', 'r')
result = ''
re_name = "ClientsName\(0\) = (.*)"
for line in f:
    name = re.search(line, re_name)
    print (name)
which is returning None at each line...
Two sources of error can be: the backslashes and the capture sequence (.*)...
You can do that more easily using re.findall and using \d instead of 0 to make it more general:
import re
s = '''ClientsName(0) = "SUPERBRAND": ClientsName(1) = "GREATSTUFF": cClientsNames.Add Key:="SUPER", Item:=ClientsName'''
print(re.findall(r'ClientsName\(\d\) = "([^"]*)"', s))
# ['SUPERBRAND', 'GREATSTUFF']
Another thing you must note is that your order of arguments to search() or findall() is wrong. It should be as follows: re.search(pattern, string)
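Putting both points together, a corrected version of the loop might look like the sketch below. It iterates over an in-memory list instead of the file so that it runs on its own; the names `CLIENT_RE`, `extract_client_names`, and `sample_lines` are hypothetical, and capturing the index as well as the name is an addition not requested in the question.

```python
import re

# Pattern first, string second. \d+ captures the index, [^"]* the name.
CLIENT_RE = re.compile(r'ClientsName\((\d+)\) = "([^"]*)"')


def extract_client_names(line):
    """Map each ClientsName index on the line to its quoted value."""
    return {int(index): name for index, name in CLIENT_RE.findall(line)}


# Stand-in for the lines of corrected_clients_data.txt.
sample_lines = [
    'ClientsName(0) = "SUPERBRAND": ClientsName(1) = "GREATSTUFF": '
    'cClientsNames.Add Key:="SUPER", Item:=ClientsName',
]

for line in sample_lines:
    print(extract_client_names(line))
```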
You can use re.findall and just take the first two matches:
>>> s = '''ClientsName(0) = "SUPERBRAND": ClientsName(1) = "GREATSTUFF": cClientsNames.Add Key:="SUPER", Item:=ClientsName'''
>>> re.findall(r'\"([^"]+)\"' , s)[:2]
['SUPERBRAND', 'GREATSTUFF']
try this
import re
text_file = open("corrected_clients_data.txt", "r")
text = text_file.read()
matches=re.findall(r'\"(.+?)\"',text)
text_file.close()
Note that the question mark (?) makes the match non-greedy, so it stops at the first closing double quote it encounters.
Hope this is helpful.
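A quick sketch of the difference between the greedy and non-greedy forms on the sample line from the question. Note that the non-greedy pattern matches every quoted string on the line, including the "SUPER" key, so the result may need slicing if only the first two values are wanted.

```python
import re

s = ('ClientsName(0) = "SUPERBRAND": ClientsName(1) = "GREATSTUFF": '
     'cClientsNames.Add Key:="SUPER", Item:=ClientsName')

# Greedy: .+ runs from the first quote all the way to the last one.
greedy = re.findall(r'"(.+)"', s)

# Non-greedy: .+? stops at the first closing quote after each opening quote.
lazy = re.findall(r'"(.+?)"', s)

print(len(greedy))  # 1
print(lazy)         # ['SUPERBRAND', 'GREATSTUFF', 'SUPER']
```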
Use a lookbehind to get the values of ClientsName(0) and ClientsName(1) with re.findall:
>>> import re
>>> str = '''ClientsName(0) = "SUPERBRAND": ClientsName(1) = "GREATSTUFF": cClientsNames.Add Key:="SUPER", Item:=ClientsName'''
>>> m = re.findall(r'(?<=ClientsName\(0\) = \")[^"]*|(?<=ClientsName\(1\) = \")[^"]*', str)
>>> m
['SUPERBRAND', 'GREATSTUFF']
Explanation:
(?<=ClientsName\(0\) = \") A positive lookbehind sets the match position just after the string ClientsName(0) = "
[^"]* Then matches any character other than " zero or more times, so it matches the first value, i.e. SUPERBRAND
| The alternation (logical OR) operator, used to combine the two regexes.
(?<=ClientsName\(1\) = \")[^"]* Matches any characters just after the string ClientsName(1) = " up to the next ", so it matches the second value, i.e. GREATSTUFF
I would like to replace (and not remove) all punctuation characters by " " in a string in Python.
Is there something efficient along the following lines?
text = text.translate(string.maketrans("",""), string.punctuation)
This answer is for Python 2 and will only work for ASCII strings:
The string module contains two things that will help you: a list of punctuation characters and the "maketrans" function. Here is how you can use them:
import string
replace_punctuation = string.maketrans(string.punctuation, ' '*len(string.punctuation))
text = text.translate(replace_punctuation)
Modified solution from Best way to strip punctuation from a string in Python
import string
import re
regex = re.compile('[%s]' % re.escape(string.punctuation))
out = regex.sub(' ', "This is, fortunately. A Test! string")
# out = 'This is  fortunately  A Test  string'  (each punctuation mark becomes one space)
This workaround works in python 3:
import string
ex_str = 'SFDF-OIU .df !hello.dfasf sad - - d-f - sd'
# one space per punctuation character (len(string.punctuation) is 32)
table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
res = ex_str.translate(table)
# res = 'SFDF OIU  df  hello dfasf sad     d f   sd'
There is a more robust solution which relies on a regex exclusion rather than inclusion through an extensive list of punctuation characters.
import re
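print(re.sub(r'[^\w\s]', '', 'This is, fortunately. A Test! string'))
#Output - 'This is fortunately A Test string'
The regex matches anything that is not a word character (letter, digit, or underscore) or whitespace. Note that this call removes the matches; pass ' ' as the replacement to substitute a space instead.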
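Since the question asks for replacement rather than removal, here is a sketch of the exclusion approach with a space as the replacement. It also adds `_` to the pattern, on the assumption that the underscore should count as punctuation (string.punctuation includes it, but \w does too). The function name `replace_punctuation` is hypothetical.

```python
import re


def replace_punctuation(text, repl=" "):
    """Replace every non-word, non-space character (and '_') with `repl`."""
    # [^\w\s] covers everything except letters, digits, '_' and whitespace;
    # the extra |_ alternative catches the underscore as well.
    return re.sub(r"[^\w\s]|_", repl, text)


print(replace_punctuation("This is, fortunately. A Test! string"))
print(replace_punctuation("snake_case"))
```

Each punctuation mark becomes exactly one space, so a mark that is followed by a space leaves two consecutive spaces in the result.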
Replace by ''?
What's the difference between translating every ; into '' and removing every ;?
Here is how to remove all ; (Python 2):
import string
s = 'dsda;;dsd;sad'
table = string.maketrans('','')
string.translate(s, table, ';')
You can do your replacement with translate as well.
In my case, I removed "+" and "&" from the punctuation list:
import re
import string
all_punctuations = string.punctuation
selected_punctuations = re.sub(r'(\&|\+)', "", all_punctuations)
print(selected_punctuations)
text = "he+llo* ithis& place% if you * here ##"
punctuation_regex = re.compile('[%s]' % re.escape(selected_punctuations))
punc_free = punctuation_regex.sub("", text)
print(punc_free)
Result: he+llo ithis& place if you  here
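The same selective removal can be sketched without a regex in Python 3, using the three-argument form of str.maketrans, whose third argument lists characters to delete. The variable names `keep`, `to_remove`, and `table` are made up for this example.

```python
import string

keep = "+&"  # punctuation characters that should survive
to_remove = "".join(ch for ch in string.punctuation if ch not in keep)

# With two empty strings, maketrans maps nothing and only deletes
# the characters given as the third argument.
table = str.maketrans("", "", to_remove)

text = "he+llo* ithis& place% if you * here ##"
punc_free = text.translate(table)
print(repr(punc_free))
```

To replace the unwanted characters with spaces instead of deleting them, the table could be built with str.maketrans(to_remove, ' ' * len(to_remove)).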