Why does my regular expression not find names? [closed] - python

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 4 years ago.
I am learning Python and following along with sentdex's videos. I just got to regular expressions and copied the code he used. While the ages print out fine, when I try to print the names I just get '[]' as output.
import re
examplestring = ''' Jessica is 15 years old, and Daniel is 27 years old.
Edward is 97, and his grandfather, Oscar, is 102
'''
ages = re.findall(r'\d{1,3}',examplestring)
name = re.findall(r'[A-Z], [a-z]*',examplestring)
print(ages)
print(name)

There are several possible ways to match a name. In your case, if the name is something like Oscar, your regex should look like this:
Regex: [A-Z][a-z]+
There should be no comma and space in it; as CoryKramer mentioned, the regex would otherwise look for those literal characters.
[A-Z] means the first character is a capital letter.
[a-z]+ means that from the second letter onwards, all letters are lowercase.
I used + instead of *. The difference between + and * is:
+ means one or more times, so a word that is just O will not match; the name must be at least two characters long, like Os.
* means zero or more times, so a word that is just O will match; in fact any single capital letter will match. So if a name can be only one letter, use *; otherwise use +.
Example for *: https://regex101.com/r/n9HSIu/1
Example for +: https://regex101.com/r/hL4Pd8/1
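A quick way to see the difference between + and * is a string that contains a one-letter capitalized word. The sample text here is made up for illustration:

```python
import re

text = "Meet O and Oscar"  # hypothetical sample with a one-letter name

# + requires at least one lowercase letter after the capital, so "O" is skipped.
plus_names = re.findall(r'[A-Z][a-z]+', text)
# * allows zero lowercase letters, so the lone "O" is also returned.
star_names = re.findall(r'[A-Z][a-z]*', text)

print(plus_names)  # ['Meet', 'Oscar']
print(star_names)  # ['Meet', 'O', 'Oscar']
```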

The problem here is the comma (,) in your expression.
As written, it looks for a capital letter (A-Z) followed by a comma, then a space, then any number of lowercase letters, which nothing in your string satisfies.
To get the result you want, remove the comma and the space and use this instead:
name = re.findall(r'[A-Z][a-z]*',examplestring)
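Running the original and the corrected pattern side by side on the question's examplestring shows the difference:

```python
import re

examplestring = ''' Jessica is 15 years old, and Daniel is 27 years old.
Edward is 97, and his grandfather, Oscar, is 102
'''

ages = re.findall(r'\d{1,3}', examplestring)
# Original: a capital, then a literal ", ", then lowercase letters.
# No capital letter in the text is directly followed by a comma, so nothing matches.
broken = re.findall(r'[A-Z], [a-z]*', examplestring)
# Fixed: a capital letter followed by any number of lowercase letters.
names = re.findall(r'[A-Z][a-z]*', examplestring)

print(ages)    # ['15', '27', '97', '102']
print(broken)  # []
print(names)   # ['Jessica', 'Daniel', 'Edward', 'Oscar']
```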

Related

How do I search for passwords with minimum length of 8 allowing letters, numbers and symbols: ##$% with regex? [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 7 months ago.
What is wrong with the regular expression below? Why does it not match the password?
import re
pattern = re.compile ('^\w##$%{8,}')
password = '12345abcd##$%'
x = pattern.search(password)
print (x)
print (len(password))
You didn't escape the $, which has a special meaning in a regular expression (it anchors the end of the string), and you didn't put the allowed characters in square brackets, which is what lets any one of them match.
This modified version of the regex matches the password: ^[\w##\$%]{8,}
Escaping the $ character isn't really necessary within square brackets, so ^[\w##$%]{8,} will work as well.
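To see why the original never matches: it reads as one word character, two literal # characters, the end-of-string anchor $, then % eight or more times. Nothing can follow the end of the string, so the pattern cannot match any input. A side-by-side check (raw strings are used here so \w is passed to re unchanged):

```python
import re

password = '12345abcd##$%'

# Original: \w, '#', '#', then the $ anchor, then '%' 8+ times; nothing can follow $.
original = re.compile(r'^\w##$%{8,}')
# Fixed: 8 or more characters, each a word character, '#', '$' or '%'
# (inside [] the $ is a literal character, not an anchor).
fixed = re.compile(r'^[\w##$%]{8,}')

print(original.search(password))       # None
print(fixed.search(password).group())  # 12345abcd##$%
```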
I suggest checking your regular expressions here: https://regex101.com/r/ldvJLf/1. This site explains in detail the meaning of each element of the regular expression, so you can see directly what is wrong if things don't work as you expected.
Tip: check your regexes online at https://regexr.com/
I think you want:
pattern = re.compile(r'^[\w##$%]{8,}')
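One caveat with the ^...{8,} patterns in these answers: with search() and only a ^ anchor, the regex checks just a leading run of 8+ allowed characters, so a password with a disallowed character later on (e.g. a trailing !) still matches. A minimal sketch using re.fullmatch(), assuming the allowed set is exactly word characters plus #, $ and %. The helper name is_valid_password is hypothetical; note that \w also accepts the underscore and, for str patterns, non-ASCII letters:

```python
import re

def is_valid_password(password: str) -> bool:
    """Hypothetical helper: True only if the WHOLE string is 8+ allowed characters."""
    # fullmatch() anchors both ends, so every character must be in the class.
    return re.fullmatch(r'[\w#$%]{8,}', password) is not None

print(is_valid_password('12345abcd##$%'))   # True
print(is_valid_password('12345abcd##$%!'))  # False: '!' is not allowed
print(is_valid_password('ab#$%'))           # False: too short

# For comparison: ^ alone only checks a prefix, so the '!' goes unnoticed.
prefix_only = re.search(r'^[\w##$%]{8,}', '12345abcd##$%!') is not None
print(prefix_only)  # True
```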

How to use python to extract the mentions? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I'm writing Python code to extract the mentions from the text of a tweet.
The function takes the tweet text as its parameter and should return a list containing all of the mentions in the tweet, in the order they appear. Each mention in the returned list should have the initial mention symbol removed, and the list should contain every mention encountered, including repeats if a user is mentioned more than once within a tweet. Here are two examples:
>>> extract_mentions('#AndreaTantaros- You are a true journalistic\
professional. I so agree with what you say. Keep up the great\
work!#RepJohnLewis ')
['AndreaTantaros','RepJohnLewis']
>>> extract_mentions("#CPAC For all the closet #libertarians attending \
#CPAC2016 , I'll be there Thurs/Fri -- speaking Thurs. a.m. on the main\
stage. Look me up! #CPAC")
['CPAC','CPAC']
A mention begins with the '#' symbol and contains all alphanumeric characters up to (but not including) a space character, punctuation, or the end of the tweet.
How can I extract the mentions from the string? Also, I haven't learned about regex yet; are there any other ways?
You can use the following regular expression, which ignores email addresses:
(^|[^#\w])#(\w{1,15})
Example Code
import re
text = "#RayFranco is answering to #jjconti, this is a real '#username83' but this is an#email.com, and this is a #probablyfaketwitterusername"
result = re.findall(r"(^|[^#\w])#(\w{1,15})", text)
print(result)
This returns:
[('', 'RayFranco'), (' ', 'jjconti'), ("'", 'username83'), (' ', 'probablyfaketwi')]
Note that Twitter allows a maximum of 15 characters for usernames. From Twitter's documentation:
Your username cannot be longer than 15 characters. Your real name can
be longer (20 characters), but usernames are kept shorter for the sake
of ease. A username can only contain alphanumeric characters (letters
A-Z, numbers 0-9) with the exception of underscores, as noted above.
Check to make sure your desired username doesn't contain any symbols,
dashes, or spaces.
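Because the pattern above has two groups, findall() returns tuples. If you only want the names, one option is to replace the (^|[^#\w]) group with a negative lookbehind (?<![#\w]). This is a swapped-in technique, not the answer's original pattern; a minimal sketch on the same text:

```python
import re

text = ("#RayFranco is answering to #jjconti, this is a real '#username83' "
        "but this is an#email.com, and this is a #probablyfaketwitterusername")

# (?<![#\w]) asserts that the '#' is not preceded by '#' or a word character,
# so 'an#email.com' is skipped; only the name group is captured.
mentions = re.findall(r"(?<![#\w])#(\w{1,15})", text)
print(mentions)  # ['RayFranco', 'jjconti', 'username83', 'probablyfaketwi']
```

Like the original, it truncates names longer than 15 characters instead of skipping them.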
Use a regex:
import re
input_string = '#AndreaTantaros- You are a true journalistic professional. I so agree with what you say. Keep up the great work!#RepJohnLewis '
result = re.findall("#([a-zA-Z0-9]{1,15})", input_string)
Output: ['AndreaTantaros', 'RepJohnLewis']
If you want to remove email addresses first, do:
input_string = re.sub(r"\w+#\w+\.com", "", input_string)
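The question also asks for a way to do this without regex. Below is a minimal sketch using only string methods; the function name extract_mentions comes from the question, but this implementation is an illustration rather than something from the answers. The mention symbol is a parameter: the question's text uses '#', while on Twitter itself mentions start with '@' and '#' marks hashtags.

```python
def extract_mentions(tweet: str, symbol: str = '#') -> list:
    """Return every mention in tweet, in order, without the leading symbol.

    A mention is `symbol` followed by one or more alphanumeric characters;
    it ends at the first non-alphanumeric character or the end of the text.
    """
    mentions = []
    i = 0
    while i < len(tweet):
        if tweet[i] == symbol:
            j = i + 1
            # Consume the run of alphanumeric characters after the symbol.
            while j < len(tweet) and tweet[j].isalnum():
                j += 1
            if j > i + 1:  # ignore a bare symbol with nothing after it
                mentions.append(tweet[i + 1:j])
            i = j
        else:
            i += 1
    return mentions


print(extract_mentions('#AndreaTantaros- You are a true journalistic professional. '
                       'I so agree with what you say. Keep up the great work!#RepJohnLewis '))
# ['AndreaTantaros', 'RepJohnLewis']
print(extract_mentions('@CPAC see #CPAC2016 @CPAC', symbol='@'))  # ['CPAC', 'CPAC']
```

Unlike the regex answers, it doesn't look at the character before the symbol, so 'an#email.com' would yield 'email'; it also relies on str.isalnum(), which excludes underscores.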

Split a string into segments in python [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 6 years ago.
I'm trying to split a molecule, given as a string, into its individual atom components. Each atom starts at a capital letter and ends after its last digit.
For example, 'SO4' would become ['S', 'O4'].
And 'C6H12O6' would become ['C6', 'H12', 'O6'].
Pretty sure I need to use the regex module. This answer is close to what I'm looking for: Split a string at uppercase letters
Use re.findall() with the pattern:
[A-Z][a-z]?\d*
[A-Z] matches any uppercase character
[a-z]? matches zero or one lowercase character
\d* matches zero or more digits
Based on your examples this should work, although you may want to look for a dedicated library for this purpose.
Example:
>>> import re
>>> re.findall(r'[A-Z][a-z]?\d*', 'C6H12O6')
['C6', 'H12', 'O6']
>>> re.findall(r'[A-Z][a-z]?\d*', 'SO4')
['S', 'O4']
>>> re.findall(r'[A-Z][a-z]?\d*', 'HCl')
['H', 'Cl']
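If you also need the count for each element, the same pattern can be split into two groups (symbol, digits). A minimal sketch; the helper name atom_counts is hypothetical, and it assumes simple formulas with no parentheses, hydrates or charges (e.g. 'Ca(OH)2' is not handled):

```python
import re
from collections import Counter

def atom_counts(formula: str) -> dict:
    """Hypothetical helper: total atoms per element in a simple formula."""
    counts = Counter()
    # Same pattern as above, grouped: element symbol, then optional digits.
    for element, digits in re.findall(r'([A-Z][a-z]?)(\d*)', formula):
        # A missing number means one atom; repeated elements are summed.
        counts[element] += int(digits) if digits else 1
    return dict(counts)

print(atom_counts('C6H12O6'))  # {'C': 6, 'H': 12, 'O': 6}
print(atom_counts('CH3COOH'))  # {'C': 2, 'H': 4, 'O': 2}
```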

define a regular expression in python [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 6 years ago.
I am trying to use a regular expression in Python to match a 4-character string whose first character is a digit and whose other 3 characters are each either a digit or a capital letter.
Here are examples of strings that should match: 1CTT, 2IR8, 35TR, 4T1R
I tried many ways; here's the last code I tried:
exp=re.compile("[0-9]{1}([A-Z0-9]{3})")
Thank you for your help !
The last expression you tried looks correct and should match the provided test strings, though you don't need to specify {1}, and there is no need for a capturing group (the parentheses). With a group, findall() returns only the captured part ('CTT' instead of '1CTT'), so drop it:
>>> import re
>>> text = "text, 1CTT, 2IR8, 35TR, 4T1R, smth else"
>>> pattern = re.compile(r"[0-9][A-Z0-9]{3}")
>>> pattern.findall(text)
['1CTT', '2IR8', '35TR', '4T1R']
You might also need to add word boundaries (\b), so that a code embedded in a longer token such as 35TT35XYZ isn't matched (thanks to @Jon Clements):
>>> text = "text, 1CTT, 2IR8, 35TR, 4T1R, smth else, 35TT35XYZ"
>>> pattern = re.compile(r"\b[0-9][A-Z0-9]{3}\b")
>>> pattern.findall(text)
['1CTT', '2IR8', '35TR', '4T1R']
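To validate a single string rather than search inside a longer text, re.fullmatch() requires the whole string to match. The sketch also shows what the capturing group in the question's pattern does to findall(); the candidate strings are made up for illustration:

```python
import re

CODE = re.compile(r"[0-9][A-Z0-9]{3}")

# fullmatch() succeeds only if the entire string is one valid code.
candidates = ["1CTT", "35TR", "1ctt", "35TT35XYZ", "ABC1"]
results = {s: CODE.fullmatch(s) is not None for s in candidates}
print(results)
# {'1CTT': True, '35TR': True, '1ctt': False, '35TT35XYZ': False, 'ABC1': False}

# With a capturing group, findall() returns only the captured text.
grouped = re.findall(r"[0-9]{1}([A-Z0-9]{3})", "1CTT, 2IR8")
print(grouped)  # ['CTT', 'IR8']
```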

why this regular expression returns empty [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 8 years ago.
I have these strings:
Phone: 3396222
Phone: +33333388
I want to extract the numbers.
I tried this regular expression:
Phone:\s*(\d+\.\d+)
But I got an empty result
I am using scrapy so my code is like this: sel.xpath(..).re(..)
Please don't suggest using Python features other than regular expressions.
Your regular expression requires a dot (.) in the text, but your sample input has none.
Demo:
>>> import re
>>> re.search(r'Phone:\s*(\d+\.\d+)', 'Phone: 3396222') is None
True
>>> re.search(r'Phone:\s*(\d+\.\d+)', 'Phone: 339.6222').group(1)
'339.6222'
If you want both of your sample phone numbers to match, drop the \. (putting the dot into a character set together with \d instead) and add an optional + to the expression:
r'Phone:\s*(\+?[\d.]+)'
Demo:
>>> re.search(r'Phone:\s*(\+?[\d.]+)', 'Phone: 3396222').group(1)
'3396222'
>>> re.search(r'Phone:\s*(\+?[\d.]+)', 'Phone: +33333388').group(1)
'+33333388'
This pattern also allows for any number of dots in the number:
>>> re.search(r'Phone:\s*(\+?[\d.]+)', 'Phone: +333.333.88').group(1)
'+333.333.88'
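Since the input has one number per line, re.findall() with the same pattern collects all of them at once; because the pattern has a single group, it returns just the captured numbers. This sketch uses only the standard re module, with the two sample lines joined by a newline as an assumption about how the text arrives:

```python
import re

text = "Phone: 3396222\nPhone: +33333388"

# One group in the pattern, so findall() returns a list of the captured numbers.
numbers = re.findall(r'Phone:\s*(\+?[\d.]+)', text)
print(numbers)  # ['3396222', '+33333388']
```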
Your regex requires a dot (.). Make it optional:
Phone:\s*\+?(\d+\.?\d+)
The changes are the \+? before the group and the ? after \.; I added the optional \+ because one of your inputs starts with +.
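The two answers differ in where the optional + sits: inside or outside the capturing group. A quick comparison on the question's sample inputs:

```python
import re

samples = ['Phone: 3396222', 'Phone: +33333388']

# First answer: the optional + is inside the group, so it is kept.
inside = [re.search(r'Phone:\s*(\+?[\d.]+)', s).group(1) for s in samples]
# Second answer: the optional \+ is outside the group, so it is dropped.
outside = [re.search(r'Phone:\s*\+?(\d+\.?\d+)', s).group(1) for s in samples]

print(inside)   # ['3396222', '+33333388']
print(outside)  # ['3396222', '33333388']
```

The second pattern also needs at least two digits and allows at most one dot, whereas [\d.]+ accepts any run of digits and dots.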
