Python: list and string matching - python

I have the following:
import re
temp = "aaaab123xyz#+"
lists = ["abc", "123.35", "xyz", "AND+"]
for list in lists:
    if re.match(list, temp, re.I):
        print("The %s is within %s." % (list, temp))
re.match only matches at the beginning of the string. How do I match a substring in the middle as well?

You can use re.search instead of re.match.
It also seems like you don't really need regular expressions here. Your regular expression 123.35 probably doesn't do what you expect, because an unescaped dot matches any character, not just a literal dot.
If this is the case then you can do simple string containment using x in s.
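To make the difference concrete, here is a minimal sketch (Python 3) reusing the strings from the question. The variable names are just for illustration, and re.escape is my addition for treating each item as literal text rather than as a pattern.

```python
import re

temp = "aaaab123xyz#+"
needles = ["abc", "123.35", "xyz", "AND+"]

# re.match only tries position 0; re.search scans the whole string.
# re.escape makes characters such as "." and "+" match literally.
at_start = [n for n in needles if re.match(re.escape(n), temp, re.I)]
anywhere = [n for n in needles if re.search(re.escape(n), temp, re.I)]

# An unescaped dot is a wildcard, so "123.35" also matches "123x35".
dot_is_wildcard = bool(re.search("123.35", "123x35"))
dot_is_literal = bool(re.search(re.escape("123.35"), "123x35"))

print(at_start)         # []
print(anywhere)         # ['xyz']
print(dot_is_wildcard)  # True
print(dot_is_literal)   # False
```

If the items are plain text anyway, "xyz" in temp gives the same answer without any escaping.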

Use re.search, or just use the in operator: if l in temp:
Note: the built-in type list should not be shadowed, so for l in lists: is better.

You can do this with a slightly more complex check using map and any.
>>> temp = "aaaab123xyz#+"
>>> lists = ["abc", "123.35", "xyz", "AND+"]
>>> any(map(lambda match: match in temp, lists))
True
>>> temp = 'fhgwghads'
>>> any(map(lambda match: match in temp, lists))
False
I'm not sure if this is faster than a compiled regexp.
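For what it's worth, the same check can be written with a generator expression instead of map and lambda. This is only a sketch in Python 3; lowercasing both sides is my assumption for mimicking the re.I flag from the question, and the names haystack, candidates and found are made up for the example.

```python
temp = "aaaab123xyz#+"
candidates = ["abc", "123.35", "xyz", "AND+"]

# Lowercase once so the comparison is case-insensitive.
haystack = temp.lower()

# any() stops at the first hit; the list shows which items matched.
has_any = any(c.lower() in haystack for c in candidates)
found = [c for c in candidates if c.lower() in haystack]

print(has_any)  # True
print(found)    # ['xyz']
```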

Related

Cleanest way to obtain a list of the numeric values in a string

What is the cleanest way to obtain a list of the numeric values in a string?
For example:
string = 'version_4.11.2-2-1.4'
array = [4, 11, 2, 2, 1, 4]
As you might understand, I need to compare versions.
By "cleanest", I mean as simple / short / readable as possible.
Also, if possible, then I prefer built-in functions over regexp (import re).
This is what I've got so far, but I feel that it is rather clumsy:
array = [int(n) for n in ''.join(c if c.isdigit() else ' ' for c in string).split()]
Strangely enough, I have not been able to find an answer on SO:
In this question, the input numeric values are assumed to be separated by white spaces
In this question, the input numeric values are assumed to be separated by white spaces
In this question, the user only asks for a single numeric value at the beginning of the string
In this question, the user only asks for a single numeric value of all the digits concatenated
Thanks
Just match on consecutive digits:
map(int, re.findall(r'\d+', versionstring))
It doesn't matter what's between the digits; \d+ matches as many digits as can be found in a row. This gives you the desired output in Python 2:
>>> import re
>>> versionstring = 'version_4.11.2-2-1.4'
>>> map(int, re.findall(r'\d+', versionstring))
[4, 11, 2, 2, 1, 4]
If you are using Python 3, map() gives you an iterable map object, so either call list() on that or use a list comprehension:
[int(d) for d in re.findall(r'\d+', versionstring)]
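Since the goal is to compare versions, one possible follow-up is to turn the digits into a tuple, because tuples compare element by element. A small Python 3 sketch; version_key is a hypothetical helper name and the second version string is invented for the demo.

```python
import re

def version_key(text):
    """Return every run of digits in text as a tuple of ints."""
    return tuple(int(d) for d in re.findall(r"\d+", text))

old = version_key("version_4.11.2-2-1.4")
new = version_key("version_4.12.0-1-1.0")  # made-up version for the demo

print(old)        # (4, 11, 2, 2, 1, 4)
print(old < new)  # True

# Comparing ints avoids the string pitfall where "4.9" sorts after "4.11".
print(version_key("4.9") < version_key("4.11"))  # True
print("4.9" < "4.11")                            # False
```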
I'd solve this with a regular expression, too.
I prefer re.finditer over re.findall for this task. re.findall returns a list, re.finditer returns an iterator, so with this solution you won't create a temporary list of strings:
>>> [int(x.group()) for x in re.finditer('\d+', string)]
[4, 11, 2, 2, 1, 4]
You are visiting every character and checking whether it is a digit, and if so adding it to a list. That gets slow for larger strings.
Let's say,
import re
string = 'version_4.11.2-2-1.4.9.7.5.43.2.57.9.5.3.46.8.5'
l = map(int, re.findall('\d+', string))
print(l)
Hopefully, this should work.
I'm not sure why the answer above uses the 'r' prefix.
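On the 'r' prefix: it marks a raw string literal, so backslashes reach the regex engine untouched. '\d' happens to work without it because Python does not treat \d as a string escape (recent Python 3 versions warn about it), but other sequences such as \b behave differently. A short Python 3 illustration; the sample text is made up.

```python
import re

# In a normal literal "\b" is one backspace character; in a raw literal
# it stays backslash + "b", which the regex engine reads as a word boundary.
plain = "\bcat\b"
raw = r"\bcat\b"

print(len("\b"), len(r"\b"))           # 1 2
print(re.findall(plain, "a cat sat"))  # []
print(re.findall(raw, "a cat sat"))    # ['cat']

# Doubling the backslash by hand is equivalent to the raw form.
print("\\d+" == r"\d+")                # True
```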
You can simply resolve this using regular expressions.
import re
string = 'version_4.11.2-2-1.4'
p=re.compile(r'\d+')
p.findall(string)
Regex is definitely the best way to go, as @MartijnPieters' answer clearly shows, but if you don't want to use it, you probably can't use a list comprehension. This is how you could do it, though:
def getnumbers(string):
    numberlist = []
    substring = ""
    for char in string:
        if char.isdigit():
            substring += char
        elif substring:
            numberlist.append(int(substring))
            substring = ""
    if substring:
        numberlist.append(int(substring))
    return numberlist
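As a possible alternative without re, itertools.groupby can split the string into runs of digits and non-digits, which does fit in a list comprehension. This is a sketch in Python 3, not something from the answers above; get_numbers is a hypothetical name and the input is assumed to contain only ASCII digits.

```python
from itertools import groupby

def get_numbers(text):
    """Collect each run of consecutive digits in text as an int."""
    # groupby yields (key, run) pairs; the key is True for runs of digits.
    return [
        int("".join(run))
        for is_digit, run in groupby(text, key=str.isdigit)
        if is_digit
    ]

print(get_numbers("version_4.11.2-2-1.4"))  # [4, 11, 2, 2, 1, 4]
print(get_numbers("no digits here"))        # []
```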

Check if a list has one or more strings that match a regex

I need to say
if <this list has a string in it that matches this regex>:
    do_stuff()
I found this powerful construct to extract matching strings from a list:
[m.group(1) for l in my_list for m in [my_regex.search(l)] if m]
...but this is hard to read and overkill. I don't want the list, I just want to know if such a list would have anything in it.
Is there a simpler-reading way to get that answer?
You can simply use any. Demo:
>>> lst = ['hello', '123', 'SO']
>>> any(re.search('\d', s) for s in lst)
True
>>> any(re.search('\d{4}', s) for s in lst)
False
Use re.match instead if you want to enforce matching from the start of the string.
Explanation:
any will check if there is any truthy value in an iterable. In the first example, we pass the contents of the following list (in the form of a generator):
>>> [re.search('\d', s) for s in lst]
[None, <_sre.SRE_Match object at 0x7f15ef317d30>, None]
which has one match-object which is truthy, while None will always evaluate to False in a boolean context. This is why any will return False for the second example:
>>> [re.search('\d{4}', s) for s in lst]
[None, None, None]
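Putting the explanation together, a small Python 3 sketch with a precompiled pattern might look like this. Because any() receives a generator, it stops at the first match instead of building the whole list. The names digit, four_digits and first are only for illustration, and the next() line is an optional extra for when you also want the matching string.

```python
import re

lst = ["hello", "123", "SO"]
digit = re.compile(r"\d")
four_digits = re.compile(r"\d{4}")

# True as soon as one string contains a digit; later items are not scanned.
has_digit = any(digit.search(s) for s in lst)
has_four = any(four_digits.search(s) for s in lst)

# next() with a default returns the first matching string, or None.
first = next((s for s in lst if digit.search(s)), None)

print(has_digit)  # True
print(has_four)   # False
print(first)      # 123
```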

trying to regex in python

Can anyone please help me understand this code snippet, from http://garethrees.org/2007/05/07/python-challenge/ Level2
>>> import urllib
>>> def get_challenge(s):
...     return urllib.urlopen('http://www.pythonchallenge.com/pc/' + s).read()
...
>>> src = get_challenge('def/ocr.html')
>>> import re
>>> text = re.compile('<!--((?:[^-]+|-[^-]|--[^>])*)-->', re.S).findall(src)[-1]
>>> counts = {}
>>> for c in text: counts[c] = counts.get(c, 0) + 1
>>> counts
In re.compile('<!--((?:[^-]+|-[^-]|--[^>])*)-->', re.S).findall(src)[-1], why do we have [-1]? What is its purpose? Is it converting the result to a list?
Yes. re.findall() returns a list of all the matches. Have a look at the documentation.
re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of
strings. The string is scanned left-to-right, and matches are returned
in the order found. If one or more groups are present in the pattern,
return a list of groups; this will be a list of tuples if the pattern
has more than one group. Empty matches are included in the result
unless they touch the beginning of another match.
Indexing the result with [-1] accesses the last element of the list.
For example:
>>> a = [1,2,3,4,5]
>>> a[-1]
5
And also:
>>> re.compile('.*?-').findall('-foo-bar-')[-1]
'bar-'
It's already a list. And if you have a list myList, myList[-1] returns the last element in that list.
Read this: https://docs.python.org/2/tutorial/introduction.html#lists.
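To see this without fetching the challenge page, here is a sketch in Python 3 that runs the same pattern on a small made-up HTML string (src below is invented, not the real page). With one group in the pattern, findall returns a list holding the group's text for each comment, and [-1] picks the last one.

```python
import re

# Invented stand-in for the downloaded page: two HTML comments.
src = "<html><!-- first note --><body></body><!--\nlast\nnote\n--></html>"

comment = re.compile(r"<!--((?:[^-]+|-[^-]|--[^>])*)-->", re.S)
comments = comment.findall(src)  # one string per comment (the group's text)
last = comments[-1]              # the final comment in the document

print(len(comments))  # 2
print(repr(last))     # '\nlast\nnote\n'
```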

str.startswith with a list of strings to test for

I'm trying to avoid using so many comparisons and simply use a list, but not sure how to use it with str.startswith:
if link.lower().startswith("js/") or link.lower().startswith("catalog/") or link.lower().startswith("script/") or link.lower().startswith("scripts/") or link.lower().startswith("katalog/"):
    # then "do something"
What I would like it to be is:
if link.lower().startswith() in ["js","catalog","script","scripts","katalog"]:
    # then "do something"
Is there a way to do this?
str.startswith allows you to supply a tuple of strings to test for:
if link.lower().startswith(("js", "catalog", "script", "katalog")):
From the docs:
str.startswith(prefix[, start[, end]])
Return True if string starts with the prefix, otherwise return False. prefix can also be a tuple of prefixes to look for.
Below is a demonstration:
>>> "abcde".startswith(("xyz", "abc"))
True
>>> prefixes = ["xyz", "abc"]
>>> "abcde".startswith(tuple(prefixes)) # You must use a tuple though
True
>>>
You can also use any() and map(), like so:
if any(map(l.startswith, x)):
    pass # Do something
Or alternatively, using a generator expression:
if any(l.startswith(s) for s in x):
    pass # Do something
You can also use next() to iterate over the list of patterns.
prefixes = ["xyz", "abc"]
my_string = "abcde"
next((True for s in prefixes if my_string.startswith(s)), False) # True
One way where next could be useful is that it can return the prefix itself. Try:
next((s for s in prefixes if my_string.startswith(s)), None) # 'abc'
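Applied to the link check from the question, the tuple form and the next() form could look roughly like this in Python 3. The prefixes are the ones from the question (with their trailing slashes); is_asset_link and matching_prefix are hypothetical helper names.

```python
PREFIXES = ("js/", "catalog/", "script/", "scripts/", "katalog/")

def is_asset_link(link):
    """True if link starts with any known prefix, ignoring case."""
    return link.lower().startswith(PREFIXES)

def matching_prefix(link):
    """Return the first prefix that link starts with, or None."""
    lowered = link.lower()
    return next((p for p in PREFIXES if lowered.startswith(p)), None)

print(is_asset_link("JS/app.js"))          # True
print(is_asset_link("images/logo.png"))    # False
print(matching_prefix("Scripts/main.js"))  # scripts/
```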

Find array item in a string

I know I can use string.find() to find a substring in a string.
But what is the easiest way to find out if one of the array items has a substring match in a string without using a loop?
Pseudocode:
string = 'I would like an apple.'
search = ['apple','orange', 'banana']
string.find(search) # == True
You could use a generator expression (which is still a loop under the hood):
any(x in string for x in search)
The generator expression is the part inside the parentheses. It creates an iterable that yields the value of x in string for each x in the list search. x in string in turn tells you whether string contains the substring x. Finally, the Python built-in any() iterates over the iterable it is passed and returns True if any of its items is truthy.
Alternatively, you could use a regular expression to avoid the loop:
import re
re.search("|".join(search), string)
I would go for the first solution, since regular expressions have pitfalls (escaping etc.).
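If you do go the regex route, the escaping pitfall can be handled with re.escape. The following is a sketch in Python 3; contains_any is a made-up helper name, and returning False for an empty list is my choice so that it behaves like any().

```python
import re

def contains_any(text, needles):
    """True if any needle occurs in text as a literal substring."""
    if not needles:
        return False  # an empty pattern would otherwise match everything
    pattern = "|".join(re.escape(n) for n in needles)
    return re.search(pattern, text) is not None

print(contains_any("I would like an apple.", ["apple", "orange", "banana"]))  # True

# Without escaping, the "." in "3.50" would match the "x" in "3x50".
print(contains_any("price: 3x50", ["3.50"]))    # False
print(bool(re.search("3.50", "price: 3x50")))   # True
```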
Strings in Python are sequences, and you can do a quick membership test by just asking if one string exists inside of another:
>>> mystr = "I'd like an apple"
>>> 'apple' in mystr
True
Sven got it right in his first answer above. To check if any of several strings exist in some other string, you'd do:
>>> ls = ['apple', 'orange']
>>> any(x in mystr for x in ls)
True
Worth noting for future reference is that the built-in 'all()' function would return true only if all items in 'ls' were members of 'mystr':
>>> ls = ['apple', 'orange']
>>> all(x in mystr for x in ls)
False
>>> ls = ['apple', 'like']
>>> all(x in mystr for x in ls)
True
A simpler way is:
import re
regx = re.compile('[ ,;:!?.:]')
string = 'I would like an apple.'
search = ['apple','orange', 'banana']
print any(x in regx.split(string) for x in search)
EDIT
Correction, after having read Sven's answer: evidently, the string does not need to be split. any(x in string for x in search) works perfectly well.
If you want no loop:
import re
regx = re.compile('[ ,;:!?.:]')
string = 'I would like an apple.'
search = ['apple','orange', 'banana']
print regx.split(string)
print set(regx.split(string)) & set(search)
result
set(['apple'])
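Note that the two approaches answer slightly different questions: x in string finds substrings, while splitting and intersecting sets only finds whole words. A Python 3 sketch with a made-up sentence to show the difference (the separator class is adapted from the code above):

```python
import re

text = "I would like a pineapple."
search = ["apple", "orange", "banana"]

# Substring test: "apple" is found inside "pineapple".
substring_hit = any(word in text for word in search)

# Whole-word test: split on separators, then intersect the two sets.
words = set(re.split(r"[ ,;:!?.]+", text))
whole_word_hits = words & set(search)

print(substring_hit)    # True
print(whole_word_hits)  # set()
```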
