Using Variables instead of pattern in Regular Expression

I am fairly new to Python. I have searched several forums and have not quite found the answer.
I have a list defined and would like to search a line for occurrences of its items. Something like:
import re
list = ['a', 'b', 'c']
for xa in range(0, len(list)):
    m = re.search(r, list[xa], line):
    if m:
        print(m)
Is there any way to pass a variable into a regex?

Yes, you can pass the list item directly as the pattern:
for xa in range(0, len(lst)):
    m = re.search(lst[xa], line)
    if m:
        print(m.group())
Example:
>>> line = 'foo bar'
>>> import re
>>> lst = ['a', 'b', 'c']
>>> for xa in range(0, len(lst)):
...     m = re.search(lst[xa], line)
...     if m:
...         print(m.group())
...
a
b

You can build the variable into the pattern, for example:
import re
line = '1y2c3a'
lst = ['a', 'b', 'c']
for x in lst:
    # Raw string so that \d reaches the regex engine unchanged
    m = re.search(r'\d' + x, line)
    if m:
        print(m.group())
Output:
3a
2c
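
Both answers above search once per list item. As a rough sketch of an alternative (not something either answer proposed), the items can be joined into a single alternation pattern. The function name find_listed_items is made up for illustration; re.escape is used on the assumption that the items should be matched literally rather than as regex syntax.

```python
import re

def find_listed_items(items, line):
    """Return every occurrence in `line` of any string in `items`, in order.

    `find_listed_items` is a hypothetical helper name, not part of `re`.
    """
    if not items:
        return []
    # re.escape makes each item match literally, even if it contains
    # characters such as '.' or '*' that are special in a regex.
    pattern = '|'.join(re.escape(item) for item in items)
    return re.findall(pattern, line)

print(find_listed_items(['a', 'b', 'c'], 'foo bar'))  # ['b', 'a']
```

Unlike the loops above, this reports matches in the order they appear in the line, and reports every occurrence rather than only the first one per item.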

Related

All possible substring in Python

Can anyone help me find all the possible substrings of a string using Python?
E.g.:
string = 'abc'
Output:
a, b, c, ab, bc, abc
P.S.: I am a beginner and would appreciate a solution that is simple to understand.

You could do something like:
for length in range(len(string)):
    for index in range(len(string) - length):
        print(string[index:index+length+1])
Output:
a
b
c
ab
bc
abc

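Another way is to use itertools.combinations. Note that this produces subsequences rather than contiguous substrings, so the result includes 'ac', which was not in the requested output:
from itertools import combinations
s = 'abc'
[
    ''.join(x)
    for size in range(1, len(s) + 1)
    for x in combinations(s, size)
]
Out:
['a', 'b', 'c', 'ab', 'ac', 'bc', 'abc']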

Every substring is determined by a start index and an end index that is greater than the start index. You can use two for loops to get all such pairs of indices.
def all_substrings(s):
    all_subs = []
    for end in range(1, len(s) + 1):
        for start in range(end):
            all_subs.append(s[start:end])
    return all_subs
s = 'abc'
print(all_substrings(s)) # prints ['a', 'ab', 'b', 'abc', 'bc', 'c']
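
The same start/end idea can also be written with itertools.combinations applied to the indices instead of the characters. This is only a sketch; the name substrings_by_index is invented here. Choosing two distinct positions from 0..len(s) always gives start < end, so every slice is a contiguous, non-empty substring.

```python
from itertools import combinations

def substrings_by_index(s):
    """All contiguous substrings of s, one per (start, end) index pair.

    Hypothetical helper name, used only for this illustration.
    """
    # combinations yields pairs (i, j) with i < j, in lexicographic order.
    return [s[i:j] for i, j in combinations(range(len(s) + 1), 2)]

print(substrings_by_index('abc'))  # ['a', 'ab', 'abc', 'b', 'bc', 'c']
```

If the string has repeated characters, the list contains duplicates (as the loop versions do); wrapping the result in set() removes them at the cost of losing the order.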

You can do it like this:
def subString(s):
    for i in range(len(s)):
        for j in range(i+1, len(s)+1):
            print(s[i:j])
subString("aashu")
a
aa
aas
aash
aashu
a
as
ash
ashu
s
sh
shu
h
hu
u

How to iterate over a string in groups of n characters instead of one character at a time?

Suppose:
string = "abcdefgh"
If I do:
for i in string:
    print(i)
I get:
a
b
c
d
e
f
g
h
What I want is something like:
ab
bc
cd
de
ef
fg
or any other grouping we specify. Is it possible to write a function for this that takes the required grouping as a parameter? Thanks.

You can use zip():
>>> for i, j in zip(string, string[1:]):
...     print(i+j)
...
ab
bc
cd
de
ef
fg
gh
As a function:
def func(seq, n):
    # zip the sequence with copies of itself shifted by 1 .. n-1 positions
    return [''.join(item) for item in zip(*[seq[i:] for i in range(n)])]
Example:
>>> for item in func("abcdefgh", 3):
...     print(item)
...
abc
bcd
cde
def
efg
fgh

If s is the name of your string, this comprehension will do what you want:
[s[i:i+2] for i in range(0, len(s) - 1)]
Using this, you can easily print the strings on separate lines:
for substr in [s[i:i+2] for i in range(0, len(s) - 1)]:
    print(substr)
It can be generalised fairly easily:
def subgroups(s, n):
    # Stop at len(s) - n so that every group has exactly n characters
    return [s[i:i+n] for i in range(0, len(s) - n + 1)]
(This function can similarly be used to print the resulting substrings in any fashion you like.)

This works:
string = "abcdefgh"
i = 0
while i < len(string) - 1:
    print(string[i]+string[i+1])
    i += 1
Result:
ab
bc
cd
de
ef
fg
gh
If you don't want gh (it's missing in your example), change the while loop to: while i < len(string) - 2:.
Another way, which hasn't been posted yet, is via regex:
import re
print("\n".join(re.findall(r'(?=(\w\w))', 'abcdefgh')))
The lookahead assertion (?=...) matches without consuming any characters, which allows the matches to overlap.
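
The lookahead trick can be extended to groups of any size by building the pattern from a variable. The following is a sketch under a couple of assumptions: the function name overlapping_groups is invented, '.' is used instead of \w so that any character counts, and re.DOTALL is passed so that newlines count as well.

```python
import re

def overlapping_groups(text, n):
    """Return every run of n consecutive characters in text, overlapping.

    Hypothetical helper name, used only for this illustration.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # The lookahead consumes nothing, so the next match attempt starts
    # one character later and the captured groups overlap.
    pattern = r'(?=(.{%d}))' % n
    return re.findall(pattern, text, flags=re.DOTALL)

print(overlapping_groups("abcdefgh", 3))
# ['abc', 'bcd', 'cde', 'def', 'efg', 'fgh']
```

For plain strings the slicing approaches above are simpler; the regex form is mainly useful when the group has to satisfy a pattern, such as \w\w in the answer above.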

import re
def splittext(text, split_by):
    '''Split text into groups of split_by characters plus a final group holding any remainder. If no remainder is desired, the |.+ can be removed.'''
    return re.findall(r".{%d}|.+" % split_by, text)
ret = splittext("HELLOWORLD!", 2)
print("\n".join(ret))
Some sample output:
>>> a = "HELLOWORLD!"
>>> re.findall(r".{2}", a)
['HE', 'LL', 'OW', 'OR', 'LD']
>>> re.findall(r".{2}|.{1}", a)
['HE', 'LL', 'OW', 'OR', 'LD', '!']
>>> re.findall(r".{2}|.*", a)
['HE', 'LL', 'OW', 'OR', 'LD', '!', '']
>>> re.findall(r".{2}|.+", a)
['HE', 'LL', 'OW', 'OR', 'LD', '!']
>>> print("\n".join(_))
HE
LL
OW
OR
LD
!

how to turn a string of letters embedded in squared brackets into embedded lists

I'm trying to find a simple way to convert a string like this:
a = '[[a b] [c d]]'
into the corresponding nested list structure, where the letters are turned into strings:
a = [['a', 'b'], ['c', 'd']]
I tried to use
import ast
l = ast.literal_eval('[[a b] [c d]]')
l = [i.strip() for i in l]
as suggested elsewhere, but it doesn't work because the characters a, b, c, d are not within quotes.
in particular I'm looking for something that turns:
'[[X v] -s]'
into:
[['X', 'v'], '-s']

You can use a regex to find all items between brackets and then split each result:
>>> import re
>>> a = '[[a b] [c d]]'
>>> [i.split() for i in re.findall(r'\[([^\[\]]+)\]', a)]
[['a', 'b'], ['c', 'd']]
The regex r'\[([^\[\]]+)\]' matches anything between square brackets except square brackets, which in this case is 'a b' and 'c d'. A list comprehension then splits each match on whitespace.
Note that this regex only works for cases like this one, where every item sits inside an innermost pair of brackets. It flattens deeper nesting, and for an input such as '[[X v] -s]' it returns only [['X', 'v']] and drops '-s', so other cases need a different regex.
>>> a = '[[a b] [c d] [e g]]'
>>> [i.split() for i in re.findall(r'\[([^\[\]]+)\]', a)]
[['a', 'b'], ['c', 'd'], ['e', 'g']]

Use the isalpha method of strings to wrap every letter in double quotes:
a = '[[a b] [c d]]'
a = ''.join(map(lambda x: '"{}"'.format(x) if x.isalpha() else x, a))
Now a is:
'[["a" "b"] ["c" "d"]]'
And you can use json.loads (as @a_guest suggested):
json.loads(a.replace(' ', ','))

>>> import json
>>> a = '[[a b] [c d]]'
>>> a = ''.join(map(lambda x: '"{}"'.format(x) if x.isalpha() else x, a))
>>> a
'[["a" "b"] ["c" "d"]]'
>>> json.loads(a.replace(' ', ','))
[['a', 'b'], ['c', 'd']]
This will work with any degree of nested lists following the above pattern, e.g.
>>> a = '[[[a b] [c d]] [[e f] [g h]]]'
>>> ...
>>> json.loads(a.replace(' ', ','))
[[['a', 'b'], ['c', 'd']], [['e', 'f'], ['g', 'h']]]
For the specific example of '[[X v] -s]':
>>> import json
>>> a = '[[X v] -s]'
>>> a = ''.join(map(lambda x: '"{}"'.format(x) if x.isalpha() or x=='-' else x, a))
>>> json.loads(a.replace('[ [', '[[').replace('] ]', ']]').replace(' ', ',').replace('][', '],[').replace('""',''))
[['X', 'v'], '-s']
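
The chain of replace calls above has to be adjusted for each new kind of token. As a sketch of a more general alternative (not taken from any of the answers), the string can be tokenized with a regex and the nesting rebuilt with a stack. The name parse_brackets is hypothetical, and the code assumes tokens are separated by whitespace or brackets and that the whole input is one bracketed list.

```python
import re

def parse_brackets(text):
    """Parse a string such as '[[X v] -s]' into nested lists of strings.

    Hypothetical helper; assumes whitespace-separated tokens.
    """
    # A token is '[', ']', or a run of characters that are neither
    # whitespace nor brackets.
    tokens = re.findall(r'\[|\]|[^\s\[\]]+', text)
    stack = [[]]  # stack[0] collects the top-level result
    for token in tokens:
        if token == '[':
            stack.append([])            # open a new nested list
        elif token == ']':
            if len(stack) == 1:
                raise ValueError("unbalanced ']'")
            finished = stack.pop()      # close the current list
            stack[-1].append(finished)
        else:
            stack[-1].append(token)     # any other token is a plain string
    if len(stack) != 1:
        raise ValueError("unbalanced '['")
    if len(stack[0]) != 1 or not isinstance(stack[0][0], list):
        raise ValueError("expected exactly one top-level list")
    return stack[0][0]

print(parse_brackets('[[X v] -s]'))     # [['X', 'v'], '-s']
```

Because tokens are whole runs of characters, multi-letter items such as '-s' need no special casing.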

Split a string into sets of twos

I want to split a string into sets of twos, e.g.
['abcdefg']
to
['ab','cd','ef']
Here is what I have so far:
string = 'acabadcaa\ndarabr'
newString = []
for i in string:
    newString.append(string[i:i+2])

One option using regular expressions:
>>> import re
>>> re.findall(r'..', 'abcdefg')
['ab', 'cd', 'ef']
re.findall returns a list of all non-overlapping matches in a string. The pattern '..' matches any two consecutive characters; note that '.' does not match a newline unless the re.DOTALL flag is passed.

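def splitCount(s, count):
    # Take every count-th character starting at each offset, then zip the columns back together
    return [''.join(x) for x in zip(*[list(s[z::count]) for z in range(count)])]
splitCount('abcdefg', 2)  # ['ab', 'cd', 'ef']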

To split a string s into a list of substrings that are guaranteed to all have length n, dropping any shorter trailing fragment:
n = 2
s = 'abcdef'
lst = [s[i:i+n] for i in range(0, len(s) - len(s) % n, n)]
['ab', 'cd', 'ef']

Try this
s = "abcdefg"
newList = [s[i:i+2] for i in range(0,len(s)-1,2)]

This function will produce chunks of any size:
def chunk(s, chk):
    ln = len(s)
    return [s[i:i+chk] for i in range(0, ln - ln % chk, chk)]
In [2]: s = "abcdefg"
In [3]: chunk(s,2)
Out[3]: ['ab', 'cd', 'ef']
In [4]: chunk(s,3)
Out[4]: ['abc', 'def']
In [5]: chunk(s,5)
Out[5]: ['abcde']
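
All of the answers above drop a trailing fragment that is shorter than the chunk size, which is what the question asked for. If the leftover characters should be kept instead, a small variation works. This is only a sketch, and chunk_keep_remainder is a made-up name.

```python
def chunk_keep_remainder(s, n):
    """Split s into pieces of length n; the last piece may be shorter.

    Hypothetical helper name, used only for this illustration.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slicing past the end of a string is safe, so the final slice simply
    # contains whatever characters are left.
    return [s[i:i + n] for i in range(0, len(s), n)]

print(chunk_keep_remainder("abcdefg", 2))  # ['ab', 'cd', 'ef', 'g']
```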

split string on a number of different characters

I'd like to split a string using one or more separator characters.
E.g. "a b.c", split on " " and "." would give the list ["a", "b", "c"].
At the moment, I can't see anything in the standard library to do this, and my own attempts are a bit clumsy. E.g.
def my_split(string_L, split_chars):
    if isinstance(string_L, str):
        string_L = [string_L]
    try:
        split_char = split_chars[0]
    except IndexError:
        return string_L
    res = []
    for s in string_L:
        res.extend(s.split(split_char))
    return my_split(res, split_chars[1:])
print(my_split("a b.c", [' ', '.']))
Horrible! Any better suggestions?

>>> import re
>>> re.split('[ .]', 'a b.c')
['a', 'b', 'c']
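
When the separators come from a variable rather than being typed into the pattern, they need escaping first. A possible sketch follows; split_on_any is an invented name, and sorting longer separators first is an assumption made here so that a separator such as ', ' wins over ','.

```python
import re

def split_on_any(text, separators):
    """Split text on any of the given separator strings.

    Hypothetical helper name, used only for this illustration.
    """
    if not separators:
        raise ValueError("at least one separator is required")
    # Try longer separators first so they are not pre-empted by a shorter
    # separator that happens to be their prefix.
    ordered = sorted(separators, key=len, reverse=True)
    # re.escape keeps characters like '.' or '*' from acting as regex syntax.
    pattern = '|'.join(re.escape(sep) for sep in ordered)
    return re.split(pattern, text)

print(split_on_any("a b.c", [" ", "."]))  # ['a', 'b', 'c']
```

Two adjacent separators produce an empty string in the result, as with str.split when an explicit separator is given.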

This one replaces all of the separators with the first separator in the list, and then "splits" using that character.
def split(string, divs):
    for d in divs[1:]:
        string = string.replace(d, divs[0])
    return string.split(divs[0])
output:
>>> split("a b.c", " .")
['a', 'b', 'c']
>>> split("a b.c", ".")
['a b', 'c']
I do like that 're' solution though.

Solution without re:
from itertools import groupby
sep = ' .,'
s = 'a b.c,d'
print([''.join(g) for k, g in groupby(s, sep.__contains__) if not k])
An explanation is here https://stackoverflow.com/a/19211729/2468006

Not very fast but does the job:
def my_split(text, seps):
    for sep in seps:
        text = text.replace(sep, seps[0])
    return text.split(seps[0])
