Find replacing dictionaries in regex - python

Okay, so I have the following piece of code.
out = out + re.sub('\{\{([A-z]+)\}\}', values[re.search('\{\{([A-z]+)\}\}',item).group().strip('{}')],item) + " "
Or, more broken down:
out = out + re.sub(
    '\{\{([A-z]+)\}\}',
    values[
        re.search(
            '\{\{([A-z]+)\}\}',
            item
        ).group().strip('{}')
    ],
    item
) + " "
So, basically, if you give it a string which contains {{reference}}, it will find instances of that, and replace them with the given reference. The issue with it in its current form is that it only ever uses the first reference. For example, say my values dictionary was
values = {
    'bob': 'steve',
    'foo': 'bar'
}
and we passed it the string
item = 'this is a test string for {{bob}}, made using {{foo}}'
I want it to put into out
'this is a test string for steve, made using bar'
but what it currently outputs is
'this is a test string for steve, made using steve'
How can I change the code so that it takes into account the position in the loop?
Note that doing a word split would not work, as the code needs to work even if the input is {{foo}}{{steve}}

I got the output using the following code:
import re
replace_dict = {'bob': 'steve', 'foo': 'bar'}
item = 'this is a test string for {{foo}}, made using {{steve}}'
# collect every name that appears between double braces
replace_lst = re.findall(r'\{\{([A-Za-z]+)\}\}', item)
for r in replace_lst:
    # only replace names that have an entry in the dictionary
    if r in replace_dict:
        item = item.replace('{{' + r + '}}', replace_dict[r])
print(item)

How's this?
import re
values = {
    'bob': 'steve',
    'foo': 'bar'
}
item = 'this is a test string for {{bob}}, made using {{foo}}'
pat = re.compile(r'\{\{(.*?)\}\}')
# split() keeps the captured names at the odd indices
fields = pat.split(item)
fields[1] = values[fields[1]]
fields[3] = values[fields[3]]
print(''.join(fields))

If you can change the reference format from {{reference}} to {reference}, you can do this with the str.format method alone (no regex needed):
values = {
    'bob': 'steve',
    'foo': 'bar'
}
item = 'this is a test string for {bob}, made using {foo}'
print(item.format(**values))
# prints: this is a test string for steve, made using bar

In your code, re.search will start looking from the beginning of the string each time you call it, thus always returning the first match {{bob}}.
You can access the match object you are currently replacing by passing a function as replacement to re.sub:
import re
values = {'bob': 'steve', 'foo': 'bar'}
item = 'this is a test string for {{bob}}, made using {{foo}}'
pattern = r'{{([A-Za-z]+)}}'
# replacement function: called once per match with that match object
def get_value(match):
    return values[match.group(1)]
result = re.sub(pattern, get_value, item)
# print(result) => 'this is a test string for steve, made using bar'
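A minimal sketch of the same idea that also tolerates unknown references. The helper name fill and the choice to leave unknown placeholders untouched are assumptions for illustration, not part of the original answer:

```python
import re

values = {'bob': 'steve', 'foo': 'bar'}
pattern = re.compile(r'\{\{([A-Za-z]+)\}\}')

def fill(template, mapping):
    # Look up each captured name; keep the original placeholder if the key is missing.
    return pattern.sub(lambda m: mapping.get(m.group(1), m.group(0)), template)

print(fill('{{foo}}{{bob}} and {{unknown}}', values))
# prints: barsteve and {{unknown}}
```

Because the lookup happens per match, adjacent references such as {{foo}}{{bob}} are handled without any word splitting.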

Related

Replace a python string using format function

I have a string which gets replaced in the backend code. ${} indicates that the string pattern is to be replaced. Example -
I am going to ${location} for ${days}
I have a dict with values to be replaced below. I want to find if ${location} is present in the text and replace it with the corresponding value in str_replacements. Below is my code. The string replacement does not work using .format. It works using %s, but I do not want to use it.
text = "I am going to ${location} for ${days}"
str_replacements = {
    'location': 'earth',
    'days': 100,
    'vehicle': 'car',
}
for key, val in str_replacements.iteritems():
    str_to_replace = '${{}}'.format(key)
    # str_to_replace returned is ${}. I want the key to be present here.
    # For instance the value of str_to_replace needs to be ${location} so
    # that i can replace it in the text
    if str_to_replace in text:
        text = text.replace(str_to_replace, val)
I do not want to use %s to substitute the string. I want to achieve the functionality with .format function.
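Use an extra pair of braces. In a format string, {{ and }} produce literal braces, so '${{{}}}' yields ${key}.
Example: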
text = "I am going to ${location} for ${days}"
str_replacements = {
    'location': 'earth',
    'days': 100,
    'vehicle': 'car',
}
for key, val in str_replacements.items():
    str_to_replace = '${{{}}}'.format(key)
    if str_to_replace in text:
        text = text.replace(str_to_replace, str(val))
print(text)
# -> I am going to earth for 100
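To see why the extra braces matter, here is a small sketch comparing the two format strings side by side. The helper names broken_placeholder and placeholder are made up for illustration:

```python
def broken_placeholder(key):
    # '{{' and '}}' are escaped braces, so there is no replacement field left for key.
    return '${{}}'.format(key)

def placeholder(key):
    # Outer '{{' and '}}' are literal braces, the inner '{}' is the replacement field.
    return '${{{}}}'.format(key)

print(broken_placeholder('location'))  # prints: ${}
print(placeholder('location'))         # prints: ${location}
```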
You could use a small regular expression instead:
import re
text = "I am going to ${location} for ${days} ${leave_me_alone}"
str_replacements = {
    'location': 'earth',
    'days': 100,
    'vehicle': 'car',
}
rx = re.compile(r'\$\{([^{}]+)\}')
text = rx.sub(lambda m: str(str_replacements.get(m.group(1), m.group(0))), text)
print(text)
This would yield
I am going to earth for 100 ${leave_me_alone}
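As an alternative sketch, the standard library's string.Template already understands the ${name} syntax, and safe_substitute leaves unknown placeholders alone. This is not what the question asked for (it wanted .format), so treat it as an option rather than the answer:

```python
from string import Template

text = "I am going to ${location} for ${days} ${leave_me_alone}"
str_replacements = {
    'location': 'earth',
    'days': 100,
    'vehicle': 'car',
}

# safe_substitute converts values with str() and keeps placeholders that have no key.
result = Template(text).safe_substitute(str_replacements)
print(result)
# prints: I am going to earth for 100 ${leave_me_alone}
```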
You can do it in two ways:
Parameterised (named fields) - Order of parameters is not followed strictly
Non-parameterised (positional fields) - Order of parameters is followed strictly
Example as follows:
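A minimal sketch of the two styles, assuming "parameterised" refers to named fields and "non-parameterised" to positional fields of str.format:

```python
# Named fields: the keyword arguments can be passed in any order.
named = "I am going to {location} for {days} days".format(days=100, location='earth')

# Positional fields: arguments are consumed strictly in order.
positional = "I am going to {} for {} days".format('earth', 100)
swapped = "I am going to {} for {} days".format(100, 'earth')

print(named)       # prints: I am going to earth for 100 days
print(positional)  # prints: I am going to earth for 100 days
print(swapped)     # prints: I am going to 100 for earth days
```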

Replacing string by using regex and for loop value in python

I want to replace the values in the first variable using the second variable, but I want to keep the commas. I used regex, but I don't know if it's possible, because I'm still learning it. Here is my code.
import re
names = 'Mat,Rex,Jay'
nicknames = 'AgentMat LegendRex KillerJay'
split_nicknames = nicknames.split(' ')
for a in range(len(split_nicknames)):
    replace = re.sub('\\w+', split_nicknames[a], names)
print(replace)
my output is:
KillerJay,KillerJay,KillerJay
and i want a output like this:
AgentMat,LegendRex,KillerJay
I suspect what you are looking for should resemble something like this:
import re
testString = 'This is my complicated test string where Mat, Rex and Jay are all having a lark, but MatReyRex is not changed'
mapping = {
    'Mat': 'AgentMat',
    'Jay': 'KillerJay',
    'Rex': 'LegendRex'
}
reNames = re.compile(r'\b('+'|'.join(mapping)+r')\b')
res = reNames.sub(lambda m: mapping[m.group(0)], testString)
print(res)
Executing this results in the mapped result:
This is my complicated test string where AgentMat, LegendRex and KillerJay are all having a lark, but MatReyRex is not changed
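If the keys might contain regex metacharacters, joining them unescaped can produce a pattern that matches more than intended. A hedged sketch of the same technique with re.escape follows; the function name replace_names and the 'A.J' key are invented for illustration:

```python
import re

def replace_names(text, mapping):
    # Escape each key so characters like '.' are matched literally.
    alternation = '|'.join(re.escape(k) for k in mapping)
    pattern = re.compile(r'\b(' + alternation + r')\b')
    return pattern.sub(lambda m: mapping[m.group(0)], text)

mapping = {'Mat': 'AgentMat', 'Jay': 'KillerJay', 'Rex': 'LegendRex'}
print(replace_names('Mat,Rex,Jay', mapping))
# prints: AgentMat,LegendRex,KillerJay

# Without re.escape, 'A.J' would also match 'AxJ' and the lookup would raise KeyError.
print(replace_names('A.J and AxJ', {'A.J': 'Ace'}))
# prints: Ace and AxJ
```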
We can build the mapping with zip and then use a lambda to apply it:
import re
names = 'Mat,Rex,Jay'
nicknames = 'AgentMat LegendRex KillerJay'
my_dict = dict(zip(names.split(','), nicknames.split(' ')))
replace = re.sub(r'\b\w+\b', lambda m: my_dict[m.group(0)], names)
print(replace)
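If the replacement should depend purely on position rather than on the name itself, a sketch using an iterator may be closer to what the question's loop was attempting. It assumes there are at least as many nicknames as names:

```python
import re

names = 'Mat,Rex,Jay'
nicknames = 'AgentMat LegendRex KillerJay'

replacements = iter(nicknames.split(' '))
# Each match consumes the next nickname in order; the commas are never matched by \w+.
result = re.sub(r'\w+', lambda m: next(replacements), names)
print(result)
# prints: AgentMat,LegendRex,KillerJay
```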

Using Python Parse to get a string of numbers, letters, whitespace, and symbols

I am attempting to parse a log using the Parse library from Python. (https://pypi.python.org/pypi/parse) For my purposes I need to use the type specifiers in the format string, however, some of the data that I am parsing might be a combination of several of those types.
For example:
"4.56|test-1 Cool|dog"
I can parse the number at the front using the format specifier g (general number) and w (word) for "dog" at the end. However, the middle phrase "test-1 Cool" is a mix of a number, letters, whitespace, and a dash. Using any of the specifiers alone doesn't seem to work (I have tried W, w, s, and S). I would like to extract that phrase as a string.
Without the problem phrase, I would just do this:
test = "|4.56|dog|"
result = parse('|{number:g}|{word:w}|', test)
EDIT: I have had some success using a custom type conversion shown below:
def SString(string):
    return string
test = "|4.56|test-1 Cool|dog|"
result = parse('|{number:g}|{other:SString}|{word:w}|', test, dict(SString=SString))
You can do that with some code like this:
from parse import parse
test = "4.56|test-1 Cool|dog"
result = parse('{number:g}|{other}|{word:w}', test)
print(result)
#<Result () {'other': 'test-1 Cool', 'word': 'dog', 'number': 4.56}>
Also, for type checking you can use the re module, for example:
from parse import parse
import re
def SString(string):
    # accept only values shaped like "word-digits word"
    if re.match(r'\w+-\d+ \w+', string):
        return string
    else:
        return None
test = "|4.56|test-1 Cool|dog|"
result = parse('|{number:g}|{other:SString}|{word:w}|', test, dict(SString=SString))
print(result)
#<Result () {'other': 'test-1 Cool', 'word': 'dog', 'number': 4.56}>
test = "|4.56|t3est Cool|dog|"
result = parse('|{number:g}|{other:SString}|{word:w}|', test, dict(SString=SString))
print(result)
#<Result () {'other': None, 'word': 'dog', 'number': 4.56}>
How about trying
test.split("|")
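A short sketch of the split approach, assuming the fields themselves never contain a pipe. The helper name parse_line is made up, and strip("|") is only there so the variant with leading and trailing pipes works too:

```python
def parse_line(line):
    # Drop optional outer pipes, then split into exactly three fields.
    number, other, word = line.strip("|").split("|")
    return float(number), other, word

print(parse_line("4.56|test-1 Cool|dog"))
# prints: (4.56, 'test-1 Cool', 'dog')
print(parse_line("|4.56|test-1 Cool|dog|"))
# prints: (4.56, 'test-1 Cool', 'dog')
```

Unlike the parse-based version, this does no type checking beyond float(), so a malformed line raises ValueError instead of returning None.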

match only exact word/string in python

How can I match an exact string/word while searching a list? I have tried, but it's not correct. Below I have given the sample list, my code and the test results.
list = ['Hi, friend', 'can you help me?']
My code:
dic=dict()
for item in list:
    for word in item.split():
        dic.setdefault(word, list()).append(item)
print(dic.get(s))
test results:
s = "can" ~ expected output: 'can you help me?' ~ output I get: 'can you help me?'
s = "you" ~ expected output: *nothing* ~ output I get: 'can you help me?'
s = "Hi," ~ expected output: 'Hi, friend' ~ output I get: 'Hi, friend'
s = "friend" ~ expected output: *nothing* ~ output I get: 'Hi, friend'
My list contains 1500 strings. Can anybody help me?
Looks like you need a map of sentences and their starting word, so you don't need to map all words in that sentence but only the first one.
from collections import defaultdict
sentences = ['Hi, friend', 'can you help me?']
start_sentence_map = defaultdict(list)
for sentence in sentences:
    # index each sentence under its first word only
    start = sentence.split()[0]
    start_sentence_map[start].append(sentence)
for s in ["can", "you", "Hi,", "friend"]:
    print(s, ":", start_sentence_map.get(s))
output:
can : ['can you help me?']
you : None
Hi, : ['Hi, friend']
friend : None
Also note a few things about the code above:
Don't use list as a variable name, because Python uses it for the list class
Use defaultdict, which lets you add entries to the dictionary directly instead of first adding a default entry
Use descriptive names instead of mylist or dic
If you just want to see whether the sentence starts with a given word, you can use startswith if you don't need the searched word to end at a word boundary, or split()[0] if you want it to match a whole word. For example:
>>> def foo(s):  # word boundary
...     return [x for x in l if x.split()[0] == s]
>>> def bar(s):  # prefix
...     return [x for x in l if x.startswith(s)]
Also refrain from shadowing Python's built-in names, as you did when you named your list list. I have called it l in my example.
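A self-contained sketch showing where the two checks differ. The function names first_word_is and starts_with are illustrative renamings of foo and bar:

```python
sentences = ['Hi, friend', 'can you help me?']

def first_word_is(s):
    # Whole first word must equal s (assumes no empty strings in the list).
    return [x for x in sentences if x.split()[0] == s]

def starts_with(s):
    # Plain prefix test, so a partial word also matches.
    return [x for x in sentences if x.startswith(s)]

print(first_word_is('ca'))   # prints: []
print(starts_with('ca'))     # prints: ['can you help me?']
print(first_word_is('can'))  # prints: ['can you help me?']
```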

Simple way to convert a string to a dictionary

What is the simplest way to convert a string of keyword=values to a dictionary, for example the following string:
name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"
to the following python dictionary:
{'name':'John Smith', 'age':34, 'height':173.2, 'location':'US', 'avatar':':,=)'}
The 'avatar' key is just to show that the strings can contain = and , so a simple 'split' won't do. Any ideas? Thanks!
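This works for me:
import re
s = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
# get all the items
matches = re.findall(r'\w+=".+?"', s) + re.findall(r'\w+=[\d.]+', s)
# partition each match at '=' (findall returns plain strings, not match objects)
matches = [m.split('=', 1) for m in matches]
# use results to make a dict
d = dict(matches)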
I would suggest a lazy way of doing this.
test_string = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
eval("dict({})".format(test_string))
{'age': 34, 'location': 'US', 'avatar': ':,=)', 'name': 'John Smith', 'height': 173.2}
Hope this helps someone !
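Since eval will execute whatever the string contains, here is a hedged sketch of a stricter variant built on the ast module: it parses the same dict(...) call but only accepts literal values. The function name parse_keyvals is made up for this example:

```python
import ast

def parse_keyvals(text):
    # Parse "dict(name=..., age=...)" into a syntax tree without executing it.
    call = ast.parse("dict({})".format(text), mode="eval").body
    # literal_eval accepts only literals (strings, numbers, ...) and raises ValueError otherwise.
    return {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}

test_string = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
print(parse_keyvals(test_string))
```

The input still has to be valid Python keyword-argument syntax, so this shares that limitation with the eval version.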
Edit: since the csv module doesn't deal as desired with quotes inside fields, it takes a bit more work to implement this functionality:
import re
quoted = re.compile(r'"[^"]*"')
class QuoteSaver(object):
    def __init__(self):
        self.saver = dict()
        self.reverser = dict()
    def preserve(self, mo):
        s = mo.group()
        if s not in self.saver:
            self.saver[s] = '"%d"' % len(self.saver)
            self.reverser[self.saver[s]] = s
        return self.saver[s]
    def expand(self, mo):
        return self.reverser[mo.group()]
x = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
qs = QuoteSaver()
y = quoted.sub(qs.preserve, x)
kvs_strings = y.split(',')
kvs_pairs = [kv.split('=') for kv in kvs_strings]
kvs_restored = [(k, quoted.sub(qs.expand, v)) for k, v in kvs_pairs]
def converter(v):
    if v.startswith('"'): return v.strip('"')
    try: return int(v)
    except ValueError: return float(v)
thedict = dict((k.strip(), converter(v)) for k, v in kvs_restored)
for k in thedict:
    print("%-8s %s" % (k, thedict[k]))
print(thedict)
I'm emitting thedict twice to show exactly how and why it differs from the required result; the output is:
age 34
location US
name John Smith
avatar :,=)
height 173.2
{'age': 34, 'location': 'US', 'name': 'John Smith', 'avatar': ':,=)',
'height': 173.19999999999999}
As you see, the output for the floating point value is as requested when directly emitted with print, but it isn't and cannot be (since there IS no floating point value that would display 173.2 in such a case!-) when the print is applied to the whole dict (because that inevitably uses repr on the keys and values -- and the repr of 173.2 has that form, given the usual issues about how floating point values are stored in binary, not in decimal, etc, etc). You might define a dict subclass which overrides __str__ to specialcase floating-point values, I guess, if that's indeed a requirement.
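The two renderings of the float can be reproduced explicitly with format specifiers. This is a sketch under the assumption that the older repr used 17 significant digits and str used 12; newer Python versions (2.7 and 3.1 onward) print the shortest string that round-trips, so repr itself shows 173.2 there:

```python
height = 173.2

# 17 significant digits expose the nearest binary value actually stored.
print('%.17g' % height)  # prints: 173.19999999999999

# 12 significant digits round it back to the decimal that was typed.
print('%.12g' % height)  # prints: 173.2
```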
But, I hope this distraction doesn't interfere with the core idea -- as long as the doublequotes are properly balanced (and there are no doublequotes-inside-doublequotes), this code does perform the required task of preserving "special characters" (commas and equal signs, in this case) from being taken in their normal sense when they're inside double quotes, even if the double quotes start inside a "field" rather than at the beginning of the field (csv only deals with the latter condition). Insert a few intermediate prints if the way the code works is not obvious -- first it changes all "double quoted fields" into a specially simple form ("0", "1" and so on), while separately recording what the actual contents corresponding to those simple forms are; at the end, the simple forms are changed back into the original contents. Double-quote stripping (for strings) and transformation of the unquoted strings into integers or floats is finally handled by the simple converter function.
Here is a more verbose approach to the problem using pyparsing. Note the parse actions
which do the automatic conversion of types from strings to ints or floats. Also, the
QuotedString class implicitly strips the quotation marks from the quoted value. Finally,
the Dict class takes each 'key = val' group in the comma-delimited list, and assigns
results names using the key and value tokens.
from pyparsing import *
key = Word(alphas)
EQ = Suppress('=')
real = Regex(r'[+-]?\d+\.\d+').setParseAction(lambda t:float(t[0]))
integer = Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
qs = QuotedString('"')
value = real | integer | qs
dictstring = Dict(delimitedList(Group(key + EQ + value)))
Now to parse your original text string, storing the results in dd. Pyparsing returns an
object of type ParseResults, but this class has many dict-like features (support for keys(),
items(), in, etc.), or can emit a true Python dict by calling asDict(). Calling dump()
shows all of the tokens in the original parsed list, plus all of the named items. The last
two examples show how to access named items within a ParseResults as if they were attributes of
a Python object.
text = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
dd = dictstring.parseString(text)
print(dd.keys())
print(dd.items())
print(dd.dump())
print(dd.asDict())
print(dd.name)
print(dd.avatar)
Prints:
['age', 'location', 'name', 'avatar', 'height']
[('age', 34), ('location', 'US'), ('name', 'John Smith'), ('avatar', ':,=)'), ('height', 173.19999999999999)]
[['name', 'John Smith'], ['age', 34], ['height', 173.19999999999999], ['location', 'US'], ['avatar', ':,=)']]
- age: 34
- avatar: :,=)
- height: 173.2
- location: US
- name: John Smith
{'age': 34, 'height': 173.19999999999999, 'location': 'US', 'avatar': ':,=)', 'name': 'John Smith'}
John Smith
:,=)
The following code produces the correct behavior, but is a bit long! I've added a space in the avatar to show that it deals well with commas, spaces and equal signs inside the string. Any suggestions to shorten it?
import hashlib
string = 'name="John Smith", age=34, height=173.2, location="US", avatar=":, =)"'
strings = {}
def simplify(value):
    # try int first, then fall back to float
    try:
        return int(value)
    except ValueError:
        return float(value)
while True:
    try:
        p1 = string.index('"')
        p2 = string.index('"', p1 + 1)
        substring = string[p1+1:p2]
        # swap each quoted value for its MD5 hex digest as a placeholder
        key = hashlib.md5(substring.encode('utf-8')).hexdigest()
        strings[key] = substring
        string = string[:p1] + key + string[p2+1:]
    except ValueError:
        # index() found no more quotes
        break
d = {}
for pair in string.split(', '):
    key, value = pair.split('=')
    if value in strings:
        d[key] = strings[value]
    else:
        d[key] = simplify(value)
print(d)
Here is an approach with eval. I consider it unreliable, but it works for your example.
>>> import re
>>>
>>> s='name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
>>>
>>> eval("{"+re.sub('(\w+)=("[^"]+"|[\d.]+)','"\\1":\\2',s)+"}")
{'age': 34, 'location': 'US', 'name': 'John Smith', 'avatar': ':,=)', 'height': 173.19999999999999}
>>>
Update:
Better to use the one pointed out by Chris Lutz in the comment. I believe it's more reliable, because it should work even if there are (single or double) quotes in the dict values.
Here's a somewhat more robust version of the regexp solution:
import re
keyval_re = re.compile(r'''
\s* # Leading whitespace is ok.
(?P<key>\w+)\s*=\s*( # Search for a key followed by..
(?P<str>"[^"]*"|\'[^\']*\')| # a quoted string; or
(?P<float>\d+\.\d+)| # a float; or
(?P<int>\d+) # an int.
)\s*,?\s* # Handle comma & trailing whitespace.
|(?P<garbage>.+) # Complain if we get anything else!
''', re.VERBOSE)
def handle_keyval(match):
    if match.group('garbage'):
        raise ValueError("Parse error: unable to parse: %r" %
                         match.group('garbage'))
    key = match.group('key')
    if match.group('str') is not None:
        return (key, match.group('str')[1:-1]) # strip quotes
    elif match.group('float') is not None:
        return (key, float(match.group('float')))
    elif match.group('int') is not None:
        return (key, int(match.group('int')))
It automatically converts floats and ints to the right type, handles single and double quotes, handles extraneous whitespace in various locations, and complains if a badly formatted string is supplied:
>>> s='name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
>>> print(dict(handle_keyval(m) for m in keyval_re.finditer(s)))
{'age': 34, 'location': 'US', 'name': 'John Smith', 'avatar': ':,=)', 'height': 173.19999999999999}
Do it step by step:
d = {}
mystring = 'name="John Smith", age=34, height=173.2, location="US", avatar=":,=)"'
s = mystring.split(", ")
for item in s:
    # split at the first '=' only, so '=' inside a value is kept
    i = item.split("=", 1)
    d[i[0]] = i[-1]
print(d)
I think you just need to set maxsplit=1. For instance, the following should work:
string = 'name="John Smith", age=34, height=173.2, location="US", avatar=":, =)"'
newDict = dict(map(lambda z: z.split("=", 1), string.split(", ")))
Edit (see comment):
I didn't notice that ", " was a value under avatar, the best approach would be to escape ", " wherever you are generating data. Even better would be something like JSON ;). However, as an alternative to regexp, you could try using shlex, which I think produces cleaner looking code.
import shlex
string = 'name="John Smith", age=34, height=173.2, location="US", avatar=":, =)"'
lex = shlex.shlex ( string )
lex.whitespace += "," # Default whitespace doesn't include commas
lex.wordchars += "." # Word char should include . to catch decimal
words = [ x for x in iter( lex.get_token, '' ) ]
newDict = dict ( zip( words[0::3], words[2::3]) )
Always comma separated? Use the CSV module to split the line into parts (not checked):
import csv
import cStringIO
parts=csv.reader(cStringIO.StringIO(<string to parse>)).next()
