Regular expression issue in Python

Why is the unbounded * quantifier ("zero or more") in a regular expression treated differently in Python? Please tell me why I'm getting different output in case one than in case two.
CASE ONE:
import re
b = None
a = None
while a != 'chk':
    a = input()
    b = re.findall('[A-Z][a-z]{1,400}', a)
    if b != None:
        print(b, bool(b), type(b))
    if a == 'chk':
        break
output:
CAPITALLETTERSsmallletters
['Ssmallletters'] True <class 'list'>
CASE TWO:
import re
b = None
a = None
while a != 'chk':
    a = input()
    b = re.findall('[A-Z][a-z]*', a)
    if b != None:
        print(b, bool(b), type(b))
    if a == 'chk':
        break
output:
CAPITALLETTERSsmallletters
['C', 'A', 'P', 'I', 'T', 'A', 'L', 'L', 'E', 'T', 'T', 'E', 'R', 'Ssmallletters'] True <class 'list'>

CASE ONE:
The regular expression says:
Look for an uppercase letter followed by 1 to 400 lowercase letters.
This gives one hit, the one that is printed.
CASE TWO:
The regular expression says:
Look for one uppercase letter followed by zero or more lowercase letters.
Here each capital letter on its own is a hit, because zero lowercase letters is allowed, plus the same hit you had before.
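To see the difference directly, the two patterns from the question can be run side by side on the same input. The [a-z]+ pattern is added only for comparison, as the open-ended form of {1,400}; the last line illustrates that findall always returns a list (an empty one when nothing matches), never None.

```python
import re

text = "CAPITALLETTERSsmallletters"

# {1,400}: a capital must be followed by at least one lowercase letter
bounded = re.findall('[A-Z][a-z]{1,400}', text)
# +: the same minimum of one lowercase letter, with no upper limit
plus = re.findall('[A-Z][a-z]+', text)
# *: zero lowercase letters is also accepted, so every lone capital matches
star = re.findall('[A-Z][a-z]*', text)

print(bounded)  # ['Ssmallletters']
print(plus)     # ['Ssmallletters']
print(star)     # ['C', 'A', 'P', 'I', 'T', 'A', 'L', 'L', 'E', 'T', 'T', 'E', 'R', 'Ssmallletters']

# findall never returns None, so a check like `b != None` is always true
print(re.findall('[A-Z][a-z]+', 'no capitals here'))  # []
```

If a capital should only count when lowercase letters follow it, + (or {1,400}) is the quantifier to use; * deliberately accepts the empty case.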

Related

Why does Python return None in this instance?

I have this Python practice question: return True if a word is an isogram (a word with no repeating characters). It should also return True if the input is an empty string.
My answer didn't work out.
from string import ascii_lowercase
def is_isogram(iso):
    for x in iso:
        return False if (iso.count(x) > 1) and (x in ascii_lowercase) else True
#None
While another answered:
def is_isogram(word):
    word = str(word).lower()
    alphabet_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
    for i in word:
        if word.count(i) > 1 and i in alphabet_list:
            return False
    return True
#True
I'm not sure why the return value differs when the structure is only slightly different. Or is it how the return statement is defined?
I would use a set operation. Calling str.count repeatedly is expensive, because each call scans the whole string again.
If your string only has unique characters, then its length equals that of its set of characters.
def is_isogram(iso):
    return len(set(iso)) == len(iso)
print(is_isogram('abc'))
print(is_isogram('abac'))
print(is_isogram(''))
print(is_isogram(' '))
Output:
True
False
True
True
You can easily implement additional checks. For instance, convert to a single case (e.g. with iso.lower()) if case doesn't matter. If you want to exclude some characters (e.g. spaces), pre-filter them first: iso = [x for x in iso if x not in excluded_set].
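Combining both suggested checks with the set comparison, a sketch might look like this. The excluded parameter and its default of space and hyphen are illustrative assumptions, not part of the original answer.

```python
def is_isogram(word, excluded=frozenset(" -")):
    # Assumption: case does not matter, so normalize to lowercase first
    # Assumption: spaces and hyphens are ignored (the excluded set)
    letters = [c for c in word.lower() if c not in excluded]
    # No repeats means the set of letters is as long as the list of letters
    return len(set(letters)) == len(letters)


print(is_isogram("Dermatoglyphics"))  # True
print(is_isogram("moOse"))            # False
print(is_isogram("six-year-old"))     # True
print(is_isogram(""))                 # True
```

The empty string still returns True, because an empty list and an empty set both have length 0.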
I think the difference is that the other code loops over the letters in the word and returns False as soon as a repeated letter is found; only if it reaches the end of the word without finding one does it return True.
In your code, the return statement runs on the first pass through the for loop whatever the condition, so only the first letter is checked, not the rest of the word.
I tried your code, and I get True unless the first letter is repeated.
Edit: I didn't cover the None output. Someone else has already commented that it happens because you never enter your for loop: for an empty string the loop body never runs, so the function reaches its end without a return statement and returns None implicitly.
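Both behaviours are easy to reproduce side by side. The function names below are hypothetical, chosen only to tell the two versions apart; the fix is simply moving return True out of the loop, as in the other answer's code.

```python
from string import ascii_lowercase


def is_isogram_original(iso):
    # The question's version: the return executes on the first iteration,
    # so only the first character is ever examined
    for x in iso:
        return False if (iso.count(x) > 1) and (x in ascii_lowercase) else True
    # For '' the loop body never runs, so execution falls off the end and
    # the function returns None implicitly


def is_isogram_fixed(iso):
    for x in iso:
        if iso.count(x) > 1 and x in ascii_lowercase:
            return False  # a repeated letter: no need to look further
    return True  # reached only after every character passed (or there were none)


for word in ["", "abca", "abcb"]:
    print(repr(word), is_isogram_original(word), is_isogram_fixed(word))
# '' None True
# 'abca' False False
# 'abcb' True False
```

'abcb' shows the first-letter-only bug: the repeated b is never reached by the original version.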

Can you add characters from a string to a list?

I'm wondering if it's possible to take a string e.g. str(input()) and split it into individual chars, then add them to a list. I'm trying to make a simple script (something similar to a hangman game) and at the beginning I wrote this:
x=input('Choose word: ').lower()
letters=[]
letters.append(list(x))
print(letters)
But this code appends the whole list to a list, not the individual chars.
Edit: this outputs [['o', 'u', 't', 'p', 'u', 't']], meaning that the whole list got appended as one item, but I want it to output ['o', 'u', 't', 'p', 'u', 't']. How do I make it append the individual chars and not the whole list?
You are simply wrapping the char list in another list.
Try this one-liner instead:
print(list(x))
If you want to remove a character:
letters = list(x)
letters.remove('o')
print(letters)
Use the extend method instead of append.
#Extend
x=input('Choose word: ').lower()
letters=[]
letters.extend(list(x))
print(letters)
# ['p', 'y', 't', 'h', 'o', 'n']
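The difference between the two methods can be shown without input(); here a fixed word stands in for what the user types (an assumption for the sake of a repeatable example).

```python
word = "output"  # stands in for the input() call

nested = []
nested.append(list(word))  # append adds its argument as ONE element
print(nested)  # [['o', 'u', 't', 'p', 'u', 't']]

flat = []
flat.extend(word)  # extend iterates over its argument and adds each item
print(flat)  # ['o', 'u', 't', 'p', 'u', 't']
```

Because extend accepts any iterable, the list() call is not needed when passing a string.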
To remove a character while keeping its position as an empty string, call replace on each element in a list comprehension:
y=input("Choose a letter to remove: ").lower()
removed=[s.replace(y,'') for s in letters]
print(removed)
#['p', '', 't', 'h', 'o', 'n']
I hope this helps. If it's different from what you want, let me know. Otherwise, happy coding!
You don't need to create an empty list and then populate it with individual letters. Simply apply the list() function directly to the user input:
letters = list(input('Choose word: ').lower())
print(letters)
To add letters from another user input, use the same approach with the .extend() method:
letters.extend(input('Choose word: ').lower()) # No need to use list() here
A simple one-liner:
x = input().lower().split()
print(x)
Here we take the input, convert it to lowercase, and then call split, which splits the string on whitespace. Note that this gives a list of words, not of individual characters. You can split on any string you like by passing it as the argument to split, for example:
x = input().lower().split(',')
print(x)
This splits on ',', so you can enter the input in CSV format.
You can also use augmented assignment (+=) to extend the list with any iterable. (The plain + operator only concatenates a list with another list, so letters + 'abc' would raise a TypeError.)
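To make the difference between split() and list() concrete, here is a small comparison on fixed strings (chosen for illustration in place of input()):

```python
text = "Hello World"

words = text.lower().split()   # splits on runs of whitespace
fields = "a,b,c".split(",")    # splits on the given separator
chars = list(text.lower())     # one element per character, space included

print(words)   # ['hello', 'world']
print(fields)  # ['a', 'b', 'c']
print(chars)   # ['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']
print("output".split())  # ['output'], a single word stays in one piece
```

So split() answers "break this into words or fields", while list() answers the question as asked, "break this into characters".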
No need to use the list() function here, because the string is iterable:
letters = []
letters += input('Choose word: ').lower()
print(letters)
this outputs [['o', 'u', 't', 'p', 'u', 't']] meaning that the whole
list got appended as one item, but i want this to output ['o', 'u', 't', 'p', 'u', 't']
Based on your comment, you can use:
x = [*input('Choose word: ').lower()]
print(x)
# ['p', 'y', 't', 'h', 'o', 'n']

Python Function That Receives Letter, Returns (0-Based) Numerical Position Within Alphabet

I'm trying to create a Python function that receives a letter (a string with only one alphabetic character) and returns the 0-based numerical position of that letter in the alphabet. It should not be case-sensitive, and I can't use import.
So entering "a" should return
0
Entering "A" should also return
0
Entering "O" should return
14
And so on.
I had noticed this question but the top answer uses import and the second answer doesn't make any sense to me / doesn't work. I tried to apply the second answer like this:
letter = input("enter a letter")
def alphabet_position(letter):
return ord(letter) - 97
print((alphabet_position)(letter))
but I got a TypeError:
TypeError: ord() expected a character, but string of length 2 found
Just like the asker in the question I linked, I'm also trying to shift characters "x" steps back in the alphabet, but to do that I need to create this helper function first.
I'm thinking there must be a way to store the letters in two separate lists, one lowercase and one uppercase, and then check whether the string the user entered matches one of the items in those lists. Once we find the match, we return its (0-based) numerical position?
letter = input("enter a letter")
def alphabet_position(letter):
position = 0
#letter_position = index value that matches input
lower_case_list ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
'y', 'z']
upper_case_list ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
'Y','Z']
#if letter is in lower_case_list, return it's 0-based numerical position.
#else, if letter is in upper_case_list, return it's 0-based numerical position.
#else, print("Enter a valid letter")
return letter_position
Please help if you have any suggestions. Thank you.
It's probably simpler to convert uppercase or lowercase input to lowercase with the .lower() method and use the built-in string of letters (string.ascii_lowercase). You can find the index of an element with the .index() method.
import string
letter = input('enter a letter: ')
def alphabet_position(letter):
    letter = letter.lower()
    return list(string.ascii_lowercase).index(letter)
print(alphabet_position(letter))
alphabet_position expects an argument, so call it in the func_name(arg) form; the extra parentheses around the name in (alphabet_position)(letter) are not needed.
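The answer above relies on import string, which the question rules out. A minimal import-free sketch applies ord() to the lowercased letter, which is what the ord(letter) - 97 attempt in the question was aiming for. The explicit length check is an added assumption that turns a multi-character input (the cause of the "string of length 2" TypeError) into a readable error, and shift_back is a hypothetical helper name for the "x steps back" goal, not something from the question.

```python
def alphabet_position(letter):
    lower = letter.lower()
    # Assumed validation: accept exactly one ASCII letter; anything else
    # (for example a two-character string) is rejected with a clear message
    if len(lower) != 1 or not ('a' <= lower <= 'z'):
        raise ValueError(f"expected a single letter, got {letter!r}")
    # 'a'..'z' have consecutive code points, so subtracting ord('a') gives 0..25
    return ord(lower) - ord('a')


def shift_back(letter, steps):
    # Hypothetical helper: % 26 wraps around, so 'a' shifted back by 1 is 'z'
    return chr((alphabet_position(letter) - steps) % 26 + ord('a'))


print(alphabet_position("a"))  # 0
print(alphabet_position("A"))  # 0
print(alphabet_position("O"))  # 14
print(shift_back("a", 1))      # z
```

Using ord('a') instead of the literal 97 gives the same number but makes the intent readable.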
Another way you could do this is to use dictionary comprehension to create a dict of letter-position pairs, like so:
from string import ascii_lowercase as l
alphabet_lookup = {letter: pos for pos, letter in enumerate(l)}
and then
f = lambda letter: alphabet_lookup[letter.lower()]
is the desired function.
I suggest using a dictionary. It might be a large amount of code to do something relatively simple, but I find it easier to make sense of it this way (and if you are new to python it will help you learn). If you google python dictionaries you can learn lots more about them, but here is the basic concept:
Code for python version 3.X:
def alphabet_position(letter):
    alphabet_pos = {'A':0, 'a':0, 'B':1, 'b':1}
    pos = alphabet_pos[letter]
    return pos
letter = input('Enter a letter: ')
print(alphabet_position(letter))
Code for python version 2.7:
def alphabet_position(letter):
    alphabet_pos = {'A':0, 'a':0, 'B':1, 'b':1}
    pos = alphabet_pos[letter]
    return pos
letter = raw_input('Enter a letter: ')
print alphabet_position(letter)
If you run that and enter B, it prints 1, because it looks up the key 'B' in the alphabet_pos dictionary and returns the corresponding value. Notice that multiple keys can have the same value, so you can handle uppercase and lowercase in the same dictionary. I only did the letters A and B for the sake of time, so you can fill out the rest yourself ;)
I once had to enter every element on the periodic table and their corresponding atomic mass, and that took forever (felt much longer than it actually was).
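Filling out the rest of the dictionary does not have to be done by hand. A sketch, assuming Python 3, that builds all 52 entries in a loop; chr(ord('a') + i) is used instead of the string module so that no import is needed, per the question's constraint.

```python
# Build every letter -> position entry instead of typing them by hand.
# chr(ord('a') + i) avoids importing string, per the question's constraint.
alphabet_pos = {}
for i in range(26):
    alphabet_pos[chr(ord('a') + i)] = i  # lowercase key
    alphabet_pos[chr(ord('A') + i)] = i  # uppercase key, same value


def alphabet_position(letter):
    # Raises KeyError for anything that is not a single ASCII letter
    return alphabet_pos[letter]


print(alphabet_position('B'))  # 1
print(alphabet_position('z'))  # 25
print(len(alphabet_pos))       # 52
```

Building the dictionary once, outside the function, also avoids recreating it on every call.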

Create a python regular expression regex that will find all consonants in each word within a string that are not repeated one after another

For example, if the word 'Happy' is given, I only want 'H' and 'y'.
If 'accomplished' is given, I only want 'm', 'p', 'l', 's', 'h', 'd'.
I know that (\w)\1 will find repeated characters, and (?i)[b-df-hj-np-tv-z] will find all consonants, but how do I combine them?
You can use
(?=[b-df-hj-np-tv-xz])(.)(?!\1)(?<!\1\1)
which unfolds as
(?=[b-df-hj-np-tv-xz]) # Match only if the next character is a consonant
(.) # Match the consonant and capture it for subsequent usage
(?!\1) # Don't match if the next character is the same as the one we captured (avoids matching all but the last character of a cluster)
(?<!\1\1) # Don't match if the penultimate character was the same as the one we captured (to avoid matching the last character of a cluster)
Sadly, that last line is not allowed in re, as lookbehinds must have a fixed length. The third-party regex module¹ supports it, though:
In [1]: import regex
In [2]: s=r'(?=[b-df-hj-np-tv-xz])(.)(?!\1)(?<!\1\1)'
In [3]: regex.findall(s, 'happy')
Out[3]: ['h']
In [4]: regex.findall(s, 'accomplished')
Out[4]: ['m', 'p', 'l', 's', 'h', 'd']
¹ “intended eventually to replace Python’s current re module implementation”, according to its description on the Cheese Shop (PyPI).
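If installing the third-party module is not an option, one workaround with the standard re module is to let the regex find runs of identical characters and do the filtering in Python. This is a swapped-in technique, not the answer's pattern. It treats y as a consonant, as the question's 'Happy' example does, whereas the character class above leaves y out; lone_consonants is a hypothetical name.

```python
import re

CONSONANTS = set("bcdfghjklmnpqrstvwxyz")


def lone_consonants(word):
    # (.)\1* matches each maximal run of one repeated character, e.g. 'cc' or 'm'
    runs = [m.group(0) for m in re.finditer(r'(.)\1*', word)]
    # Keep single-character runs whose character is a consonant (any case)
    return [run for run in runs if len(run) == 1 and run.lower() in CONSONANTS]


print(lone_consonants("Happy"))         # ['H', 'y']
print(lone_consonants("accomplished"))  # ['m', 'p', 'l', 's', 'h', 'd']
```

The backreference \1 is case-sensitive, so a run such as 'Tt' would count as two separate runs here.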
from re import findall
string = "Happy you!"
res = []
for c in findall('[^aeiou]', string):
    if c not in res:
        res.append(c)
This filters out duplicates and uses the re module, as you required.
Here is a regex that can be used:
([^aeiou])\1+|([^aeiou\s])
You can then grab captured group #2
Explanation:
[^aeiou] # matches a consonant
([^aeiou]) # puts a consonant in captured group #1
([^aeiou])\1+ # matches repetitions of group #1
| # regex alternation (OR)
([^aeiou\s]) # matches a consonant and grabs it in captured group #2
Code:
>>> import re
>>> for m in re.finditer(r'([^aeiou])\1+|([^aeiou\s])', "accomplished"):
...     print(m.group(2))
...
None
m
p
l
s
h
d
Brute force (super slow) solution:
import re
expr = '(?<!b)b(?!b)|(?<!c)c(?!c)|(?<!d)d(?!d)|(?<!f)f(?!f)|(?<!g)g(?!g)|(?<!h)h(?!h)|(?<!j)j(?!j)|(?<!k)k(?!k)|(?<!l)l(?!l)|(?<!m)m(?!m)|(?<!n)n(?!n)|(?<!p)p(?!p)|(?<!q)q(?!q)|(?<!r)r(?!r)|(?<!s)s(?!s)|(?<!t)t(?!t)|(?<!v)v(?!v)|(?<!w)w(?!w)|(?<!x)x(?!x)|(?<!y)y(?!y)|(?<!z)z(?!z)'
print(re.findall(expr, 'happy'))
print(re.findall(expr, 'accomplished'))
print(re.findall(expr, 'happy accomplished'))
print(re.findall(expr, 'happy accccccompliiiiiiishedd'))
# Readable form of expr
# (?<!b)b(?!b)|
# (?<!c)c(?!c)|
# (?<!d)d(?!d)|
# (?<!f)f(?!f)|
# (?<!g)g(?!g)|
# (?<!h)h(?!h)|
# (?<!j)j(?!j)|
# (?<!k)k(?!k)|
# (?<!l)l(?!l)|
# (?<!m)m(?!m)|
# (?<!n)n(?!n)|
# (?<!p)p(?!p)|
# (?<!q)q(?!q)|
# (?<!r)r(?!r)|
# (?<!s)s(?!s)|
# (?<!t)t(?!t)|
# (?<!v)v(?!v)|
# (?<!w)w(?!w)|
# (?<!x)x(?!x)|
# (?<!y)y(?!y)|
# (?<!z)z(?!z)
Output:
['h', 'y']
['m', 'p', 'l', 's', 'h', 'd']
['h', 'y', 'm', 'p', 'l', 's', 'h', 'd']
['h', 'y', 'm', 'p', 'l', 's', 'h']
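Typing the alternation by hand is error-prone. The same pattern can be generated from a string of consonants; with the consonant order used above, the generated string is identical to the hand-written expr. Adding re.IGNORECASE (an addition, not in the original answer) also catches the capital H in the question's 'Happy'.

```python
import re

consonants = "bcdfghjklmnpqrstvwxyz"
# One alternative per consonant c: c not preceded by c and not followed by c
expr = "|".join(f"(?<!{c}){c}(?!{c})" for c in consonants)

print(re.findall(expr, "happy"))         # ['h', 'y']
print(re.findall(expr, "accomplished"))  # ['m', 'p', 'l', 's', 'h', 'd']
# IGNORECASE applies to the lookarounds too, so 'H' is matched as well
print(re.findall(expr, "Happy", flags=re.IGNORECASE))  # ['H', 'y']
```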

Regular Expressions, Python 3

I have trouble understanding this regular expression in Python:
re.findall(r'([a-z]+?)\w*', "Ham, spam, and, eggs")
I understand that:
[a-z] is a class that includes all the letters from a to z
+ says that it must appear at least once
? says that it can appear once or not at all
My output for ([a-z]+?) is:
['a', 'm', 's', 'p', 'a', 'm', 'a', 'n', 'd', 'e', 'g', 'g', 's']
Now the problems start:
if I test:
re.findall(r'([a-z]+?)\w', "Ham, spam, and, eggs")
My output is:
['a', 's', 'a', 'a', 'e', 'g'] # Why?
and if i test the full expression:
re.findall(r'([a-z]+?)\w*', "Ham, spam, and, eggs")
my output is:
['a', 's', 'a', 'e'] # Why?
Can somebody explain this to me, please?
You misunderstand the use of +?*: it means "at least once, non-greedy", i.e. with as few additional characters as needed to match. In this pattern it is therefore the same as [a-z] ("at least once and as few times as possible" is the same as, simply, "once", as long as nothing later in the pattern forces a longer match).
The other token in your pattern, \w, means any "word character"; for ASCII text it is equivalent to [A-Za-z0-9_] (in Python 3 it also matches other Unicode letters and digits).
Your first attempt, ([a-z]+?)\w, captures any single, lower-case letter that is followed by any other word character - hence ['a', 's', 'a', 'a', 'e', 'g']:
"Ham, spam, and, eggs"
# ^.  ^.^.  ^.   ^.^.
(Note: ^ is the captured character, . is the non-captured match.)
Your second attempt, ([a-z]+?)\w* captures any single, lower-case letter followed by as many other word characters as possible, hence only captures once per word (the first lower-case letter):
"Ham, spam, and, eggs"
# ^.  ^...  ^..  ^...
In both cases, as you have specified a capture group, findall only returns the characters within that group. If you remove the capture group parentheses, findall returns the whole match:
>>> re.findall(r'[a-z]+?\w*', "Ham, spam, and, eggs")
['am', 'spam', 'and', 'eggs']
* You have confused it with ? on its own, which does mean "zero or one times".
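The effect of +? is easiest to see next to plain +. This comparison runs on the question's string; the (?=,) lookahead in the last pattern is an illustrative choice, not part of the question, and shows that the lazy quantifier still grows when the rest of the pattern requires it.

```python
import re

text = "Ham, spam, and, eggs"

# Lazy +? with nothing after it stops after a single character
lazy = re.findall(r'[a-z]+?', text)
# Greedy + takes as many lowercase letters as it can
greedy = re.findall(r'[a-z]+', text)
# Lazy +? expands only as far as needed: here, up to a following comma
lazy_before_comma = re.findall(r'[a-z]+?(?=,)', text)

print(lazy)               # ['a', 'm', 's', 'p', 'a', 'm', 'a', 'n', 'd', 'e', 'g', 'g', 's']
print(greedy)             # ['am', 'spam', 'and', 'eggs']
print(lazy_before_comma)  # ['am', 'spam', 'and']
```

"eggs" is missing from the last result because no comma follows it, so the lookahead can never succeed there.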
I am going to take a stab although I am not sure if I am correct.
Does the "?" apply non-greedy 1 or more matching (+ sign) maybe?
So you do not match the "H" in Ham because it is upper case.
Next you we look at "a". Since its followed by a word character (\w) the matching captures the letter "a" there, and carries on to the "," where we start over.
Next letter it matches the s in spam and a following word characters captures the "s" and moves on to th next ",", and so on.
