Python string, re.match, loop - python

OK guys, I have 4 example strings:
I love #hacker,
I just scored 27 points in the Picking Cards challenge on #Hacker,
I just signed up for summer cup #hacker,
interesting talk by hari, co-founder of hacker,
I need to count how many times the word "hacker" appears (regardless of case).
import re
count = 0
res = re.match("hacker")
for res in example:
    count += 1
return count
Here is my code "so far"; I don't know how to work out the solution for this exercise.

You can use re.findall:
my_string = """I love #hacker, I just scored 27 points in the Picking Cards challenge on #Hacker, I just signed up for summer cup #hacker, interesting talk by hari, co-founder of hacker,"""
>>> import re
>>> len(re.findall("hacker",my_string.lower()))
4
re.findall gives you all matching substrings in the string, and len tells you how many there are.
str.lower() converts the string to lowercase, so "#Hacker" is counted too.
Instead of str.lower() you can also use the re.IGNORECASE flag:
>>> len(re.findall("hacker",my_string,re.IGNORECASE))
4
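Wrapping the findall approach in a function makes it reusable. This is a minimal sketch, assuming you want to count "hacker" only as a whole word (so "hackers" would not count); the function name count_word is hypothetical:

```python
import re

def count_word(text, word):
    """Count case-insensitive, whole-word occurrences of `word` in `text`."""
    # re.escape treats `word` literally even if it contains regex metacharacters;
    # the \b anchors stop 'hacker' from matching inside 'hackers'.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, re.IGNORECASE))

my_string = ("I love #hacker, I just scored 27 points in the Picking Cards "
             "challenge on #Hacker, I just signed up for summer cup #hacker, "
             "interesting talk by hari, co-founder of hacker,")
print(count_word(my_string, "hacker"))              # 4
print(count_word("hackers love hacker", "hacker"))  # 1
```

Drop the \b anchors if substrings such as "hackers" should be counted as well.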

Or, without regex, use str.count:
the_string = """I love #hacker, I just scored 27 points in the Picking Cards challenge on #Hacker, I just signed up for summer cup #hacker, interesting talk by hari, co-founder of hacker,"""
num = the_string.lower().count("hacker")

import re
string1 = "hello Hacker what are you doing hacker"
a = re.findall("hacker", string1.lower())
print(len(a))
Output:
>>>
2
>>>
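re.findall finds all non-overlapping matches of the pattern in the string.
Edit: I added string1.lower() too, as mentioned by Rawing.
Your code doesn't work because re.match() only checks for a match at the beginning of the string and returns at most one match, not all of them. (It is also called without the string to search.)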
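The difference between re.match, re.search and re.findall can be seen on a short sample string (the string s below is illustrative, not from the question):

```python
import re

s = "I love #hacker, co-founder of hacker"

# re.match only tries the pattern at position 0, so it fails here.
print(re.match("hacker", s))           # None
# re.search scans the whole string but stops at the first match.
print(re.search("hacker", s).start())  # 8
# re.findall returns every non-overlapping match.
print(re.findall("hacker", s))         # ['hacker', 'hacker']
```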

You can just use the count() method after splitting your string, so you don't need regex. If you want to match uppercase too, lowercase each word first. (In Python 3, map() returns an iterator, which has no count() method, so use a list comprehension.)
>>> l = 'this is a test and not a Test'
>>> [x.lower() for x in l.split()].count('test')
2
>>> l = 'this is a test and not a rtest'
>>> [x.lower() for x in l.split()].count('test')
1
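Splitting on whitespace compares whole tokens, so on the original sentences a token like "#hacker," would not equal "hacker". A sketch, assuming it is acceptable to strip punctuation from both ends of each token:

```python
import string

text = ("I love #hacker, I just scored 27 points in the Picking Cards "
        "challenge on #Hacker, I just signed up for summer cup #hacker, "
        "interesting talk by hari, co-founder of hacker,")

# Strip characters such as '#' and ',' from both ends, then lowercase.
tokens = [w.strip(string.punctuation).lower() for w in text.split()]
print(tokens.count("hacker"))  # 4
```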

Related

regex groups: How to get the desired output with a more specific match pattern?

Given the following input list of entries:
l = ["555-8396 Neu, Allison",
"Burns, C. Montgomery",
"555-5299 Putz, Lionel",
"555-7334 Simpson, Homer Jay"]
is expected to be transformed to:
Allison Neu 555-8396
C. Montgomery Burns
Lionel Putz 555-5299
Homer Jay Simpson 555-7334
I tried the following:
import re
for i in l:
    mo = re.search(r"([0-9]{3}-[0-9]{4})?\s*(\w*),\s*(\S.*$)", i)
    if mo:
        print("{} {} {}".format(mo.group(3), mo.group(2), mo.group(1)))
It produces the following incorrect output (note the "None" on the second line):
Allison Neu 555-8396
C. Montgomery Burns None
Lionel Putz 555-5299
Homer Jay Simpson 555-7334
However, the following solution from the e-book does give the desired output:
for i in l:
    mo = re.search(r"([0-9-]*)\s*([A-Za-z]+),\s+(.*)", i)
    print(mo.group(3) + " " + mo.group(2) + " " + mo.group(1))
In short, it boils down to the difference in the groups() output of the two regex searches:
>>> mo = re.search(r"([0-9]{3}-[0-9]{4})?\s*(\w*),\s*(\S.*$)", "Burns, C. Montgomery")
>>> mo.groups()
(None, 'Burns', 'C. Montgomery')
versus
>>> mo = re.search(r"([0-9-]*)\s*(\w*),\s*(\S.*$)", "Burns, C. Montgomery")
>>> mo.groups()
('', 'Burns', 'C. Montgomery')
None vs ''
I wanted a more accurate match of the phone number format, using [0-9]{3}-[0-9]{4} instead of [0-9-]*, which can match arbitrary combinations of digits and "-" (e.g. "0-1-2" or "1-23").
Why does "*" result in a different grouping than "?"?
Yes, it is trivial for me to take care of the "None" while printing out the result, but I am interested to know the reason for the difference in grouping results.
((?:[0-9]{3}-[0-9]{4})?)\s*(\w*),\s*(\S.*$)
Try this. See the demo:
https://regex101.com/r/Qx6ylw/1
In the book's example, the group itself was not optional; only its contents were. In your regex, the group itself is optional.
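Applied to the full list, that regex gives '' instead of None for a missing phone number. This sketch joins only the non-empty parts, which is one assumed way to avoid a trailing space on the "Burns" line:

```python
import re

l = ["555-8396 Neu, Allison",
     "Burns, C. Montgomery",
     "555-5299 Putz, Lionel",
     "555-7334 Simpson, Homer Jay"]

# The outer capturing group always participates; only its contents are
# optional, so group(1) is '' (not None) when there is no phone number.
pattern = re.compile(r"((?:[0-9]{3}-[0-9]{4})?)\s*(\w*),\s*(\S.*$)")

results = []
for entry in l:
    mo = pattern.search(entry)
    if mo:
        parts = (mo.group(3), mo.group(2), mo.group(1))
        line = " ".join(p for p in parts if p)  # skip the empty phone group
        results.append(line)
        print(line)
```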
To put in plain English what the regex demos are hinting at, and to answer your actual question:
([0-9-]*) matches 0 or more digits or - characters. When no telephone number is present, it matches 0 characters. Note the operative word: matching. An empty match is still a match, so mo.group(1) returns ''.
([0-9]{3}-[0-9]{4})? attempts to match a phone number in a specific format, but the whole group is optional. When the phone number is not present in the input, the group does not take part in the match at all, so mo.group(1) returns None.
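The distinction can be reduced to two tiny patterns run on input with no digits (the patterns below are simplified illustrations, not the ones from the question):

```python
import re

text = "Burns, C. Montgomery"  # no phone number present

# Optional group: the group is skipped entirely, so it has no value.
optional_group = re.match(r"(\d+)?", text).group(1)
# Group with optional contents: the group matched the empty string.
optional_contents = re.match(r"(\d*)", text).group(1)

print(optional_group)           # None
print(repr(optional_contents))  # ''
```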
With judicious whitespace trimming, a simple find-and-replace solution is:
Find: ^((?:\d+(?:-\d+)+)?)\s*([^,]*?)\s*,\s*(.*)
Replace: \3 \2 \1
https://regex101.com/r/oo0NWy/1
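The same find-and-replace can be run in Python with re.sub, where the replacement string refers to groups as \3 \2 \1. The rstrip() call is an added assumption to drop the trailing space left when group 1 is empty:

```python
import re

l = ["555-8396 Neu, Allison",
     "Burns, C. Montgomery",
     "555-5299 Putz, Lionel",
     "555-7334 Simpson, Homer Jay"]

pattern = r"^((?:\d+(?:-\d+)+)?)\s*([^,]*?)\s*,\s*(.*)"

converted = []
for entry in l:
    # Reorder the captured groups; group 1 is '' when there is no number.
    converted.append(re.sub(pattern, r"\3 \2 \1", entry).rstrip())

print("\n".join(converted))
```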
This code solves your problem:
import re
for i in l:
    mo = re.search(r"([0-9]{3}-[0-9]{4})?\s*(\w*),\s*(\S.*$)", i)
    if mo:
        if mo.group(1):
            print("{} {} {}".format(mo.group(3), mo.group(2), mo.group(1)))
        else:
            print("{} {}".format(mo.group(3), mo.group(2)))
Output:
Allison Neu 555-8396
C. Montgomery Burns
Lionel Putz 555-5299
Homer Jay Simpson 555-7334

Replace every space after x chars with a "\n"

The goal of the function is to split one string into multiple lines to make it more readable: replace the first space found after at least n characters (counted from the beginning of the string, or from the last "\n" inserted into the string).
Assumption: the string contains no \n.
Example
Marcus plays soccer in the afternoon
f(10) should result in
Marcus plays\nsoccer in\nthe afternoon
The first space in "Marcus plays soccer in the afternoon" is skipped because "Marcus" is only 6 chars long. We then put a \n after "plays" and start counting again. The space after "soccer" is therefore skipped, etc.
So far I tried:
import re
def replace_space_w_newline_every_n_chars(n, s):
    return re.sub("(?=.{"+str(n)+",})(\s)", "\\1\n", s, 0, re.DOTALL)
inspired by this
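Try replacing
(.{9}.*?)\s
with
$1\n
(\1\n in Python). That is n - 1 = 9 leading characters for n = 10, which is what produces the expected output in the example below.
Check it out here.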
Example:
>>> import re
>>> s = 'Marcus plays soccer in the afternoon'
>>> re.sub(r'(.{9}.*?)\s', r'\1\n', s)
'Marcus plays\nsoccer in\nthe afternoon'
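The answer's regex can be wrapped in the function from the question. This sketch assumes n >= 1, no "\n" in the input, and the n - 1 reading of "at least n characters" that matches the f(10) example:

```python
import re

def replace_space_w_newline_every_n_chars(n, s):
    # Consume n - 1 characters, then lazily up to the next whitespace,
    # and replace that whitespace with '\n'. re.sub resumes scanning right
    # after each replaced space, so counting restarts on every new line.
    return re.sub(r"(.{%d}.*?)\s" % (n - 1), r"\1\n", s)

print(repr(replace_space_w_newline_every_n_chars(10, "Marcus plays soccer in the afternoon")))
# 'Marcus plays\nsoccer in\nthe afternoon'
```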

Python, find words from array in string

How can I find the words from an array in my string?
I need a filter that finds the words I saved in my array within text that a user types into a text box on my website.
I need to have 30+ words in an array, list or something similar.
The user types text into the text box.
Then the script should find all of those words.
Something like a spam filter, I guess.
Thanks
import re
words = ['word1', 'word2', 'word4']
s = 'Word1 qwerty word2, word3 word44'
r = re.compile('|'.join([r'\b%s\b' % w for w in words]), flags=re.I)
r.findall(s)
>> ['Word1', 'word2']
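A variant of the same approach, sketched under the assumption that the word list may contain characters with special meaning in regex (such as "."): escape each word and wrap the alternation in one group so a single pair of \b anchors applies to all of them. The function name find_words is hypothetical:

```python
import re

def find_words(words, text):
    """Return the occurrences in `text` of any word in `words` (whole words, case-insensitive)."""
    # re.escape makes characters like '.' match literally instead of as regex syntax.
    alternation = "|".join(map(re.escape, words))
    pattern = re.compile(r"\b(?:%s)\b" % alternation, re.IGNORECASE)
    return pattern.findall(text)

print(find_words(['word1', 'word2', 'word4'], 'Word1 qwerty word2, word3 word44'))
# ['Word1', 'word2']
```

Compiling the pattern once and reusing it is worthwhile when the same 30+ word list is checked against many user inputs.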
Solution 1 uses the regex approach, which returns all instances of each keyword found in the data. Solution 2 returns the indexes of all instances of each keyword found in the data.
import re
dataString = '''Life morning don't were in multiply yielding multiply gathered from it. She'd of evening kind creature lesser years us every, without Abundantly fly land there there sixth creature it. All form every for a signs without very grass. Behold our bring can't one So itself fill bring together their rule from, let, given winged our. Creepeth Sixth earth saying also unto to his kind midst of. Living male without for fruitful earth open fruit for. Lesser beast replenish evening gathering.
Behold own, don't place, winged. After said without of divide female signs blessed subdue wherein all were meat shall that living his tree morning cattle divide cattle creeping rule morning. Light he which he sea from fill. Of shall shall. Creature blessed.
Our. Days under form stars so over shall which seed doesn't lesser rule waters. Saying whose. Seasons, place may brought over. All she'd thing male Stars their won't firmament above make earth to blessed set man shall two it abundantly in bring living green creepeth all air make stars under for let a great divided Void Wherein night light image fish one. Fowl, thing. Moved fruit i fill saw likeness seas Tree won't Don't moving days seed darkness.
'''
keyWords = ['Life', 'stars', 'seed', 'rule']
#---------------------- SOLUTION 1
print('Solution 1 output:')
for keyWord in keyWords:
    print(re.findall(keyWord, dataString))
#---------------------- SOLUTION 2
print('\nSolution 2 output:')
for keyWord in keyWords:
    indexes = []
    # str.find returns -1 once there is no further occurrence
    indexFound = dataString.find(keyWord)
    while indexFound != -1:
        indexes.append(indexFound)
        indexFound = dataString.find(keyWord, indexFound + 1)
    print(indexes)
Output:
Solution 1 output:
['Life']
['stars', 'stars']
['seed', 'seed']
['rule', 'rule', 'rule']
Solution 2 output:
[0]
[765, 1024]
[791, 1180]
[295, 663, 811]
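The indexes from Solution 2 can also be collected with re.finditer, whose match objects expose .start(). This sketch uses a short illustrative string rather than dataString; note that finditer reports non-overlapping matches only:

```python
import re

data = "rule the rules, then rule again"
keyword = "rule"

# Each match object's .start() is the index where that match begins.
indexes = [m.start() for m in re.finditer(re.escape(keyword), data)]
print(indexes)  # [0, 9, 21]
```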
Try
import string
words = ['word1', 'word2', 'word4']
s = 'word1 qwerty word2, word3 word44'
i = 0
for x in s.split():
    x = x.strip(string.punctuation)  # so 'word2,' matches 'word2'
    if x in words:
        print(x)
        i += 1
print("count is " + str(i))
output
word1
word2
count is 2

Match to second regular expression if first has no matches

I'm attempting to extract the text between HTML tags using regex in Python. The catch is that sometimes there are no HTML tags in the string, so I want my regex to match the entire string. So far, I've got the part that matches the inner text of the tag:
(?<=>).*(?=<\/)
This matches "Russia" in the tag below:
<a density="sparse" href="http://topics.bloomberg.com/russia/">Russia</a>
Alternatively, if there are no tags, the entire string should be matched:
Typhoon Vongfong prompted ANA to cancel 101 flights, affecting about 16,600 passengers, the airline said in a faxed statement. Japan Airlines halted 31 flights today and three tomorrow, it said by fax. The storm turned northeast after crossing Okinawa, Japan’s southernmost prefecture, with winds gusting to 75 knots (140 kilometers per hour), according to the U.S. Navy’s Joint Typhoon Warning Center.
In other words, when there are no tags I want it to return all the text in the string.
I've read a bit about regex conditionals online, but I can't seem to get them to work. If anyone can point me in the right direction, that would be great. Thanks in advance.
You could do this with a single regex. You don't need to go for any workaround.
>>> import re
>>> s='<a density="sparse" href="http://topics.bloomberg.com/russia/">Russia</a>'
>>> re.findall(r'(?<=>)[^<>]+(?=</)|^(?!.*?>.*?</).*', s, re.M)
['Russia']
>>> s='This is Russia Today'
>>> re.findall(r'(?<=>)[^<>]+(?=</)|^(?!.*?>.*?</).*', s, re.M)
['This is Russia Today']
Here is a work-around. Instead of adjusting the regex, we adjust the string:
>>> s='<a density="sparse" href="http://topics.bloomberg.com/russia/">Russia</a>'
>>> re.findall(r'(?<=>)[^<>]*(?=<\/)', s if '>' in s else '>%s</' % s)
['Russia']
>>> s='This is Russia Today'
>>> re.findall(r'(?<=>)[^<>]*(?=<\/)', s if '>' in s else '>%s</' % s)
['This is Russia Today']
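The title's idea, falling back to a second match when the first has none, can also be written directly in Python: run the tag pattern, and if it returns an empty list, use the whole string. A sketch; the function name inner_text_or_all is hypothetical:

```python
import re

TAG_TEXT = re.compile(r"(?<=>)[^<>]+(?=</)")

def inner_text_or_all(s):
    # An empty list is falsy, so `or` falls back to the whole string.
    return TAG_TEXT.findall(s) or [s]

print(inner_text_or_all('<a density="sparse" href="http://topics.bloomberg.com/russia/">Russia</a>'))
print(inner_text_or_all('This is Russia Today'))
```

Keeping the fallback outside the regex makes each pattern simpler to read and test on its own.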

Extracting sub-string after the first space in Python

I need help with regex or Python to extract a substring from a set of strings. The strings are alphanumeric. I just want the substring that starts after the first space and ends before the last space, as in the example below.
Example 1:
A:01 What is the date of the election ?
BK:02 How long is the river Nile ?
Results:
What is the date of the election
How long is the river Nile
While I am at it, is there an easy way to extract the part of a string before or after a certain character? For example, I want to extract the date or day from strings like the ones in Example 2.
Example 2:
Date:30/4/2013
Day:Tuesday
Results:
30/4/2013
Tuesday
I have actually read about regex but it's very alien to me. Thanks.
I recommend using split:
>>> s="A:01 What is the date of the election ?"
>>> " ".join(s.split()[1:-1])
'What is the date of the election'
>>> s="BK:02 How long is the river Nile ?"
>>> " ".join(s.split()[1:-1])
'How long is the river Nile'
>>> s="Date:30/4/2013"
>>> s.split(":", 1)[1]
'30/4/2013'
>>> s="Day:Tuesday"
>>> s.split(":", 1)[1]
'Tuesday'
>>> s="A:01 What is the date of the election ?"
>>> s.split(" ", 1)[1].rsplit(" ", 1)[0]
'What is the date of the election'
>>>
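Since the question also asks about regex, here is a regex sketch for both examples. The helper names middle_words and after_colon are hypothetical; both return None when the pattern does not match:

```python
import re

def middle_words(s):
    # ^\S+\s+ skips the first token (e.g. 'A:01'); the greedy (.*) then
    # backtracks so that \s+\S+$ consumes only the last token (e.g. '?').
    m = re.match(r"^\S+\s+(.*)\s+\S+$", s)
    return m.group(1) if m else None

def after_colon(s):
    # [^:]* stops at the first ':'; the group takes everything after it.
    m = re.match(r"[^:]*:(.*)", s)
    return m.group(1) if m else None

print(middle_words("A:01 What is the date of the election ?"))
print(after_colon("Date:30/4/2013"))
```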
There's no need to dig into regex if this is all you need; you can use str.partition:
s = "A:01 What is the date of the election ?"
before,sep,after = s.partition(' ') # could be, e.g., a ':' instead
If all you want is the last part, you can use _ as a placeholder for 'don't care':
_,_,theReallyAwesomeDay = s.partition(':')
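Applied to the Example 2 strings, str.partition splits at the first separator only and always returns a 3-tuple, even when the separator is missing:

```python
pairs = []
for s in ["Date:30/4/2013", "Day:Tuesday"]:
    label, sep, value = s.partition(":")
    pairs.append((label, value))
    print(label, "->", value)

# With no separator, the middle and last parts are empty strings (no exception).
missing = "NoColonHere".partition(":")
print(missing)  # ('NoColonHere', '', '')
```

str.rpartition works the same way but splits at the last occurrence of the separator.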
