trying to regex in python - python

Can anyone please help me understand this code snippet, from http://garethrees.org/2007/05/07/python-challenge/ Level2
>>> import urllib
>>> def get_challenge(s):
...     return urllib.urlopen('http://www.pythonchallenge.com/pc/' + s).read()
...
>>> src = get_challenge('def/ocr.html')
>>> import re
>>> text = re.compile('<!--((?:[^-]+|-[^-]|--[^>])*)-->', re.S).findall(src)[-1]
>>> counts = {}
>>> for c in text: counts[c] = counts.get(c, 0) + 1
>>> counts
In re.compile('<!--((?:[^-]+|-[^-]|--[^>])*)-->', re.S).findall(src)[-1], why do we have [-1] at the end? What is its purpose? Is it converting the result to a list?

Yes. re.findall() returns a list of all the matches. Have a look at the documentation.
re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of
strings. The string is scanned left-to-right, and matches are returned
in the order found. If one or more groups are present in the pattern,
return a list of groups; this will be a list of tuples if the pattern
has more than one group. Empty matches are included in the result
unless they touch the beginning of another match.
Indexing the result with [-1] accesses the last element of the list.
For example:
>>> a = [1,2,3,4,5]
>>> a[-1]
5
And also:
>>> re.compile('.*?-').findall('-foo-bar-')[-1]
'bar-'

It's already a list. And if you have a list myList, myList[-1] returns the last element in that list.
Read this: https://docs.python.org/2/tutorial/introduction.html#lists.
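To see both pieces in isolation, here is a minimal sketch that runs the same pattern on a small made-up HTML string instead of the downloaded page. The `sample` text is an assumption for illustration only; it is not the actual challenge page.

```python
import re

# Hypothetical stand-in for the downloaded page source.
sample = "<html><!-- first\ncomment --><body></body><!--%%$@_$^__#)^)-->"

# Same pattern as in the question: capture everything between <!-- and -->.
comment_re = re.compile('<!--((?:[^-]+|-[^-]|--[^>])*)-->', re.S)

matches = comment_re.findall(sample)  # a list with one entry per comment body
print(matches)      # [' first\ncomment ', '%%$@_$^__#)^)']
print(matches[-1])  # %%$@_$^__#)^)  (the last comment on the page)
```

So findall() already produces the list, and [-1] only picks the last comment out of it.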

Related

How to find starting index of a string if it contains any element from a list of substrings

Suppose I have the list of substrings below:
my_list = ["am", "is", "are"]
I want to search for the elements of this list in a string. If the string includes any item from the list, the starting index of that substring in the string should be printed.
The string is:
s = "I am a Python developer."
It is obvious that the string contains "am" and starting index of this substring in string is 2.
At first I thought of using:
if "am" in s:
    print(s.find("am"))
but that limits the search to a single element of the list. The string contains at most one item from the list.
Use a loop:
for item in my_list:
    print(s.find(item))
One approach is to search the string for the elements in your list, and return the index of the first match. We exit on the first match found (since you want any element, there's no need to test remaining substrings past that point).
def find_index_first_match(target, *args):
    for arg in args:
        search = target.find(arg)
        # If arg is not in target, search is -1.
        if search >= 0:
            return search
    # You could return -1 here so the return's more
    # consistent with str.find's behavior.
    return None
find_index_first_match(s, *my_list)
You can even split the string and search:
my_list = ["am", "is", "are"]
s = "I am a Python developer."
split_s = s.split()
for item in my_list:
    if item in split_s:
        print("Found this word in string s: "+item)
You can just iterate over your list:
for item in my_list:
    if item in s:
        print(s.find(item))
my_list = ["am", "is", "are"]
#s = "I am a Python developer."
s = "I am a Python developer and is are blah."
test=dict(enumerate(s.split(' '),1))
[k for k,v in test.items() if v in my_list]
Output:
[2, 7, 8]
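You can use enumerate to build a dictionary from the words of the string, then use a list comprehension with an if clause to collect the position of each word that appears in your other list. Note that these are 1-based word positions, not character indices.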
You could use this:
i = min(len(s.split(w,1)[0]) for w in my_list) # 2
This is how it works:
for w in my_list is a generator expression that goes through the keywords one by one.
s.split(w,1)[0] uses the keyword (w) to split the string and picks up the left part of the split (the prefix). It contains all the characters preceding the keyword (or the whole string if the keyword is not present).
len(...) gets the size of each prefix. This corresponds to the position (index) of the keyword.
min(...) selects the smallest prefix size, which corresponds to the index of the first keyword found in s. If none of the keywords is present, the result is len(s).
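The split-based expression can be wrapped in a small helper that makes the no-match case explicit. This is only a sketch: the name `first_keyword_index` is made up for illustration, and it assumes a non-empty list of non-empty keywords.

```python
def first_keyword_index(s, keywords):
    """Return the smallest index at which any keyword starts in s, or -1."""
    # len(prefix) is the keyword's index, or len(s) when the keyword is absent.
    i = min(len(s.split(w, 1)[0]) for w in keywords)
    return i if i < len(s) else -1

print(first_keyword_index("I am a Python developer.", ["am", "is", "are"]))  # 2
print(first_keyword_index("Python", ["am", "is", "are"]))                    # -1
```

Keep in mind that this matches substrings, so "is" would also be found inside a word such as "This".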
Another option would be to use a regular expression formed by combining your list of keywords:
import re
pattern = r'\b('+"|".join(my_list)+r')\b' # \b(am|is|are)\b
i = re.search(pattern,s).start() # 2
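Two details are worth guarding against in the regex version: re.search returns None when nothing matches, and keywords containing regex metacharacters should be escaped. A minimal sketch, where the helper name `first_word_index` is hypothetical and a non-empty keyword list is assumed:

```python
import re

def first_word_index(s, words):
    """Return the start index of the first whole-word match, or -1 if none."""
    # re.escape guards against words that contain regex metacharacters.
    pattern = r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b'
    match = re.search(pattern, s)
    return match.start() if match else -1

print(first_word_index("I am a Python developer.", ["am", "is", "are"]))  # 2
print(first_word_index("This island", ["am", "is", "are"]))               # -1
```

Because of the \b word boundaries, "is" inside "This" or "island" is not counted as a match.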

How to change the index of an element in a list/array to another position/index without deleting/changing the original element and its value

For example, let's say I have a list as below:
list = ['list4','this1','my3','is2'] or [1,6,'one','six']
Now I want to change the index of each element to match the number, or to whatever makes sense as I see fit (it needn't be a number). Basically, I want to move each element to whatever index I want, like so:
list = ['this1','is2','my3','list4'] or ['one',1,'six',6]
How do I do this, whether or not there are numbers?
Please help. Thanks in advance.
If you don't want to use regex and learn its mini-language, use this simpler method:
list1 = ['list4','this1', 'he5re', 'my3','is2']
def mySort(string):
    if any(char.isdigit() for char in string):  # Check if there's a number in the string
        # Collect the digits and return the first one (we are expecting only one number in the string)
        return [float(char) for char in string if char.isdigit()][0]
    return float('inf')  # No digit found: sort such strings last instead of returning None
list1.sort(key = mySort)
print(list1)
Inspired by this answer: https://stackoverflow.com/a/4289557/11101156
For the first one, it is easy:
>>> lst = ['list4','this1','my3','is2']
>>> lst = sorted(lst, key=lambda x:int(x[-1]))
>>> lst
['this1', 'is2', 'my3', 'list4']
But this assumes each item is a string whose last character is a digit, and it only works while the numeric part of each item is a single digit; otherwise it breaks. For the second list, you need to define what "as I see fit" means before it can be sorted by any logic.
If the trailing number can have several digits:
>>> import re
>>> lst = ['lis22t4','th2is21','my3','is2']
>>> sorted(lst, key=lambda x:int(re.search(r'\d+$', x).group(0)))
['is2', 'my3', 'lis22t4', 'th2is21']
But you can always do:
>>> lst = [1,6,'one','six']
>>> lst = [lst[2], lst[0], lst[3], lst[1]]
>>> lst
['one', 1, 'six', 6]
Also, don't use python built-ins as variable names. list is a bad variable name.
If you just want to move element in position 'y' to position 'x' of a list, you can try this one-liner, using pop and insert:
lst.insert(x, lst.pop(y))
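A short worked example of the pop/insert one-liner, using the list from the question (the values chosen for `x` and `y` are just for illustration):

```python
lst = ['list4', 'this1', 'my3', 'is2']

# Move the element at index y to index x. pop() removes and returns it,
# insert() puts the same object back at the new position.
y, x = 1, 0
lst.insert(x, lst.pop(y))
print(lst)  # ['this1', 'list4', 'my3', 'is2']
```

Note that x is interpreted against the list as it looks after the pop, which matters when x is greater than y.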
If you know the order in which you want the elements to appear, you can write simple code:
old_list= ['list4','this1','my3','is2']
order = [1, 3, 2, 0]
new_list = [old_list[idx] for idx in order]
If you can write your logic as a function, you can use sorted() and pass your function name as a key:
old_list= ['list4','this1','my3','is2']
def extract_number(string):
    digits = ''.join([c for c in string if c.isdigit()])
    return int(digits)
new_list = sorted(old_list, key = extract_number)
In this case the list is sorted by the number formed by joining all the digits found in each string.
a = [1,2,3,4]
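def rep(s, l, ab):
    idx = l.index(s)  # current position of the element
    q = s
    del l[idx]        # remove it from the old position
    l.insert(ab, q)   # and insert it at the new one
    return l
l = rep(a[0], a, 2)
print(l)
Hope you like this.
It's much simpler.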

How to match every word in the list having single sentence using python

How can I match case 1 below in Python? I want each and every word in the sentence to be matched against the list.
l1=['there is a list of contents available in the fields']
>>> 'there' in l1
False
>>> 'there is a list of contents available in the fields' in l1
True
Simple way
l1=['there is a list of contents available in the fields']
>>> 'there' in l1[0]
True
A better way is to iterate over all elements of the list.
l1=['there is a list of contents available in the fields']
print(bool([i for i in l1 if 'there' in i]))
If you just want to know whether any of the strings in the list contains a word, no matter which string it is, you can do this:
if any('there' in element for element in li):
    pass
Now, if you want to keep only the strings that contain the word, you can simply filter:
li = filter(lambda x: 'there' in x, li)
Or in Python 3:
li = list(filter(lambda x: 'there' in x, li))
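One caveat with the `in` test on a string is that it matches substrings, not whole words. If "each and every word" should be matched exactly, a sketch along these lines may be closer to what the question asks (the `words` set is introduced here for illustration):

```python
l1 = ['there is a list of contents available in the fields']

# Substring test: True for 'there', but also True for fragments such as 'her'.
print(any('her' in sentence for sentence in l1))  # True

# Whole-word test: split each sentence into its words first.
words = {word for sentence in l1 for word in sentence.split()}
print('there' in words)  # True
print('her' in words)    # False
```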

Get the first character of the first string in a list?

How would I get the first character from the first string in a list in Python?
It seems that I could use mylist[0][1:] but that does not give me the first character.
>>> mylist = []
>>> mylist.append("asdf")
>>> mylist.append("jkl;")
>>> mylist[0][1:]
'sdf'
You almost had it right. The simplest way is
mylist[0][0] # get the first character from the first item in the list
but
mylist[0][:1] # get up to the first character in the first item in the list
would also work.
You want to end after the first character (character zero), not start after the first character (character zero), which is what the code in your question means.
Get the first character of a bare python string:
>>> mystring = "hello"
>>> print(mystring[0])
h
>>> print(mystring[:1])
h
>>> print(mystring[3])
l
>>> print(mystring[-1])
o
>>> print(mystring[2:3])
l
>>> print(mystring[2:4])
ll
Get the first character from a string in the first position of a python list:
>>> myarray = []
>>> myarray.append("blah")
>>> myarray[0][:1]
'b'
>>> myarray[0][-1]
'h'
>>> myarray[0][1:3]
'la'
Numpy operations are very different than python list operations.
Python has list slicing, indexing and subsetting. Numpy has masking, slicing, subsetting, indexing.
These two videos cleared things up for me.
"Losing your Loops, Fast Numerical Computing with NumPy" by PyCon 2015:
https://youtu.be/EEUXKG97YRw?t=22m22s
"NumPy Beginner | SciPy 2016 Tutorial" by Alexandre Chabot LeClerc:
https://youtu.be/gtejJ3RCddE?t=1h24m54s
Indexing in Python starts from 0. You wrote [1:], which never returns the first character; it returns the rest of the string (everything except the first character).
If you have the following structure:
>>> mylist = ['base', 'sample', 'test']
And want to get the first character of the first string (item):
>>> mylist[0][0]
'b'
For the first character of every item:
>>> [x[0] for x in mylist]
['b', 's', 't']
If you have a text:
>>> text = 'base sample test'
>>> text.split()[0][0]
'b'
Try mylist[0][0]. This should return the first character.
If your list includes non-strings, e.g. mylist = [0, [1, 's'], 'string'], then the answers on here would not necessarily work. In that case, using next() to find the first string by checking for them via isinstance() would do the trick.
next(e for e in mylist if isinstance(e, str))[:1]
Note that ''[:1] returns '' while ''[0] raises IndexError, so depending on the use case, either could be useful.
The above results in StopIteration if there are no strings in mylist. In that case, one possible implementation is to set the default value to None and take the first character only if a string was found.
first = next((e for e in mylist if isinstance(e, str)), None)
first_char = first[0] if first else None

Python: list and string matching

I have following:
import re
temp = "aaaab123xyz#+"
lists = ["abc", "123.35", "xyz", "AND+"]
for list in lists:
    if re.match(list, temp, re.I):
        print "The %s is within %s." % (list,temp)
re.match only matches at the beginning of the string. How do I match a substring in the middle too?
You can use re.search instead of re.match.
It also seems like you don't really need regular expressions here. Your regular expression 123.35 probably doesn't do what you expect because the dot matches anything.
If this is the case then you can do simple string containment using x in s.
Use re.search, or just use in: if l in temp:
Note: the built-in type list should not be shadowed, so for l in lists: is better.
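The differences between these options can be checked directly. A small sketch using the string from the question (the test string "123x35" is made up to show the effect of the unescaped dot):

```python
import re

temp = "aaaab123xyz#+"

print(re.match("xyz", temp))            # None: match only tries position 0
print(re.search("xyz", temp).group(0))  # xyz: search scans the whole string

# An unescaped dot matches any character, so "123.35" also matches "123x35".
print(bool(re.search("123.35", "123x35")))             # True
print(bool(re.search(re.escape("123.35"), "123x35")))  # False

# Plain containment needs no regex and treats every character literally.
print("xyz" in temp)  # True
```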
You can do this with a slightly more complex check using map and any.
>>> temp = "aaaab123xyz#+"
>>> lists = ["abc", "123.35", "xyz", "AND+"]
>>> any(map(lambda match: match in temp, lists))
True
>>> temp = 'fhgwghads'
>>> any(map(lambda match: match in temp, lists))
False
I'm not sure if this is faster than a compiled regexp.
