Regex How to match Empty - python

I have log structure looks like
a b c|
so for example:
Mozilla 5.0 white|
should be matched/extracted to sth like
a: Mozilla, b: 5.0, c: white
but there is an entry in my log is:
iOS|
which can be explained as
a:iOS, b:null, c:null
I am using python3 re, doing match with named group ?P
is there any way to achieve this?

>>> m = re.match(r"(?P<a>[^\s]+)(\s+(?P<b>[^\s]+))?(\s+(?P<c>[^\s]+))?\s*\|")
>>> m.groups()
('iOS', None, None)
>>> m.groupdict()
{'c': None, 'a': 'iOS', 'b': None}
>>> m = re.match(r"(?P<a>[^\s]+)(\s+(?P<b>[^\s]+))?(\s+(?P<c>[^\s]+))?\s*\|")
>>> m.groups()
('Mozilla', ' 5.0', ' white')
>>> m.groupdict()
{'c': 'white', 'a': 'Mozilla', 'b': '5.0'}
UPDATE:
I noticed that the previous version included spaces in the returned groups - I had factored the \s+ into the (?P<>...) to save a couple bytes, but it had that side effect. So I fixed that and also made it tolerant of spaces before the final '|'

You can put your patterns in a list like following :
>>> pattern = ['a', 'b', 'c']
Then use re.findall() to find all the relative parts, then use zip and dict to create the relative dictionary:
>>> s = "IOS|"
>>> dict(zip(pattern,re.findall('([^\s]+)?\s?([^\s]+)?\s?([^\s]+)?\|',s)[0]))
{'a': 'IOS', 'c': '', 'b': ''}
>>>
>>> s = "Mozilla 5.0 white|"
>>>
>>> dict(zip(pattern,re.findall('([^\s]+)?\s?([^\s]+)?\s?([^\s]+)?\|',s)[0]))
{'a': 'Mozilla', 'c': 'white', 'b': '5.0'}

Related

python regex: string with maximum one whitespace

Hello I would like to know how to create a regex pattern with a sting which might contain maximum one white space. More specificly:
s = "a b d d c"
pattern = "(?P<a>.*) +(?P<b>.*) +(?P<c>.*)"
print(re.match(pattern, s).groupdict())
returns:
{'a': 'a b d d', 'b': '', 'c': 'c'}
I would like to have:
{'a': 'a', 'b': 'b d d', 'c': 'c'}
Another option could be to use zip and a dict and generate the characters based on the length of the matches.
You can get the matches which contain at max one whitespace using a repeating pattern matching a non whitespace char \S and repeat 0+ times a space followed by a non whitespace char:
\S(?: \S)*
Regex demo | Python demo
For example:
import re
a=97
regex = r"\S(?: \S)*"
test_str = "a b d d c"
matches = re.findall(regex, test_str)
chars = list(map(chr, range(a, a+len(matches))))
print(dict(zip(chars, matches)))
Result
{'a': 'a', 'b': 'b d d', 'c': 'c'}
With the help of The fourth birds answer I managed to do it in a way I imagened it to be:
import re
s = "a b d d c"
pattern = "(?P<a>\S(?: \S)*) +(?P<b>\S(?: \S)*) +(?P<c>\S(?: \S)*)"
print(re.match(pattern, s).groupdict())
Looks like you just want to split your string with 2 or more spaces. You can do it this way:
s = "a b d d c"
re.split(r' {2,}', s)
will return you:
['a', 'b d d', 'c']
It's probably easier to use re.split, since the delimiter is known (2 or more spaces), but the patterns in-between are not. I'm sure someone better at regex than myself can work out the look-aheads, but by splitting on \s{2,}, you can greatly simplify the problem.
You can make your dictionary of named groups like so:
import re
s = "a b d d c"
x = dict(zip('abc', re.split('\s{2,}', s)))
x
{'a': 'a', 'b': 'b d d', 'c': 'c'}
Where the first arg in zip is the named groups. To extend this to more general names:
groups = ['group_1', 'another group', 'third_group']
x = dict(zip(groups, re.split('\s{2,}', s)))
{'group_1': 'a', 'another group': 'b d d', 'third_group': 'c'}
I found an other solution I even like better:
import re
s = "a b dll d c"
pattern = "(?P<a>(\S*[\t]?)*) +(?P<b>(\S*[\t ]?)*) +(?P<c>(\S*[\t ]?)*)"
print(re.match(pattern, s).groupdict())
here it's even possible to have more than one letter.

assigning a value from dictionary to equation

i want to assign a value to a string equation but i am stuck with the logic.
dic1 = {'d': '2', 'a': '1', 'c': '3', 'b': '2'}
equation_string = 'ab+cd'
i want to the output like:
'12+32' = 44
My Logic:
1 -> writing a for loop to assign the values to the string but i don't know how to skip '+' sign in the string.
for itr in range(0,len(equation_string)):
equation_String[itr] = dict1[equation_str[itr]]
In order to achieve it. You need to firstly replace the key in equation_string of dic1 with the corresponding value using string.replace(). When you made all the replacements within the string, execute the string expression using eval(). Below is the sample code:
>>> dic1 = {'d': '2', 'a': '1', 'c': '3', 'b': '2'}
>>> equation_string = 'ab+cd'
>>> for k, v in dic1.iteritems():
... equation_string = equation_string.replace(k, v)
...
>>> equation_string
'12+32' # Updated value of equation_string
>>> eval(equation_string)
44

How can I ignore a string in python regex group matching?

Say I have the following string
>>> mystr = 'A-ABd54-Bf657'
(a random string of dash-delimited character groups) and want to match the opening part, and the rest of the string, in separate groups. I can use
>>> re.match('(?P<a>[a-zA-Z0-9]+)-(?P<b>[a-zA-Z0-9-]+)', mystr)
This produces a groupdict() like this:
{'a': 'A', 'b': 'ABd54-Bf657'}
How can I get the same regex to match group b but separately match a specific suffix (or set of suffices) if it exists (they exist)? Ideally something like this
>>> myregex = <help me here>
>>> re.match(myregex, 'A-ABd54-Bf657').groupdict()
{'a': 'A', 'b': 'ABd54-Bf657', 'test': None}
>>> re.match(myregex, 'A-ABd54-Bf657-blah').groupdict()
{'a': 'A', 'b': 'ABd54-Bf657-blah', 'test': None}
>>> re.match(myregex, 'A-ABd54-Bf657-test').groupdict()
{'a': 'A', 'b': 'ABd54-Bf657', 'test': 'test'}
Thanks.
mystr = 'A-ABd54-Bf657'
re.match('(?P<a>[a-zA-Z0-9]+)-(?P<b>[a-zA-Z0-9-]+?)(?:-(?P<test>test))?$', mystr)
^ ^
The first indicated ? makes the + quantifier non-greedy, so that it consumes the minimum possible.
The second indicated ? makes the group optional.
The $ is necessary or else the non-greediness plus optionality will match nothing.

Python Parse JSON value with space delimited subfields

I need to parse JSON like this:
{
"entity": " a=123455 b=234234 c=S d=CO e=1 f=user1 timestamp=null",
"otherField": "text"
}
I want to get values for a, b, c, d, e, timestamp separately. Is there a better way than assigning the entity value to a string, then parsing with REGEX?
There is nothing to the JSON standard that parses that value for you, you'll have to do this in Python.
It could be easier to just split that string on whitespace, then on =:
entities = dict(keyvalue.split('=', 1) for keyvalue in data['entity'].split())
This results in:
>>> data = {'entity': " a=123455 b=234234 c=S d=CO e=1 f=user1 timestamp=null"}
>>> dict(keyvalue.split('=', 1) for keyvalue in data['entity'].split())
{'a': '123455', 'c': 'S', 'b': '234234', 'e': '1', 'd': 'CO', 'f': 'user1', 'timestamp': 'null'}
What about this:
>>> dic = dict(item.split("=") for item in s['entity'].strip().split(" "))
>>> dic
>>> {'a': '123455', 'c': 'S', 'b': '234234', 'e': '1', 'd': 'CO', 'f': 'user1', 'timestamp':'null'}
>>> dic['a']
'123455'
>>> dic['b']
'234234'
>>> dic['c']
'S'
>>> dic['d']
'CO'
>>>

Filling a python dictionary in for loop returns same values

For the needs of the project im iterating over some data and adding needed values to premade dictionary.
here is striped down example of code which represents my question:
class Parser:
def __init__(self):
self.records = [['value1','value2','value3'],
['value4','value5','value6'],
['value7','value8','value9']]
def get_parsed(self):
parsed = []
dic = {'key1': '',
'key2': '',
'key3': '',
}
for i in self.records:
dic['key1'] = i[0]
dic['key2'] = i[1]
dic['key3'] = i[2]
parsed.append(dic)
return parsed
What i expect to get is list of dicts like this:
[{'key1':'value1','key2':'value2','key3':'value3'},
{'key1':'value4','key2':'value5','key3':'value6'},
{'key1':'value7','key2':'value8','key3':'value9'}]
But what im getting is:
[{'key1':'value1','key2':'value2','key3':'value3'},
{'key1':'value1','key2':'value2','key3':'value3'},
{'key1':'value1','key2':'value2','key3':'value3'}]
Though if i move dictionary initialization into 'for' loop - im getting the desired result but i don't understand why is that happening?
EDIT:
The question is more "Why does it happens this way"
I did some testing in ipython and that's what i've got:
In [1]: d = {'a':'','b':'','c':''}
In [2]: d
Out[2]: {'a': '', 'b': '', 'c': ''}
In [3]: d['a'] = 'value'
In [4]: d['b'] = 'other_value'
In [5]: d['c'] = 'third_value'
In [6]: d
Out[6]: {'a': 'value', 'b': 'other_value', 'c': 'third_value'}
In [7]: d['a'] = 'inserting new value'
In [8]: d
Out[8]: {'a': 'inserting new value', 'b': 'other_value', 'c': 'third_value'}
So the value of the key could be updated and it changes, why doesn't it happen in FOR loop?
Because your dic is created outside the loop, you only create one dict. If you want three different dicts, you need to create three different dicts, so move the initial creation of dic inside the loop.
To answer your updated question, the issue is that although you think you are appending a new dict with each parsed.append(dic), you are just appending the same dict three times. Append doesn't copy the dict. So whenever you modify that dict, all the dicts in parse show the change, since they are all the same dict. This version of your second code example may be more illustrative:
>>> d = {'a': '', 'b': '', 'c': ''}
>>> stuff = []
>>> stuff.append(d)
>>> print stuff
[{'a': '', 'c': '', 'b': ''}]
>>> d['a'] = 'other'
>>> print stuff
[{'a': 'other', 'c': '', 'b': ''}]
>>> stuff.append(d)
>>> print stuff
[{'a': 'other', 'c': '', 'b': ''}, {'a': 'other', 'c': '', 'b': ''}]
>>> d['a'] = 'yet another'
>>> print stuff
[{'a': 'yet another', 'c': '', 'b': ''}, {'a': 'yet another', 'c': '', 'b': ''}]
Notice that changing the dict "works" in that it indeed changes the value, but regardless of that, the list still contains the same dict multiple times, so any changes you make overwrite whatever changes you made earlier. In the end, your list only contains the last version of the dict, because all earlier changes were overwritten in all dicts in the list.
You are modifying and inserting the same dictionary to your list three times. Create a new dictionary for each iteration of the loop:
for i in self.records:
dic = { 'key1': i[0], 'key2': i[1], 'key3': i[2] }
parsed.append(dic)

Categories

Resources