Python RegEx with word boundaries

I am trying to write a login routine for a python script. In doing so, I find the need to pattern match the credentials on a whole word basis. I have attempted to RegEx this, but it is failing for reasons that are unclear to me, but I hope are obvious to someone here. The code and output:
import re
authentry = "testusertestpass"
username = "testuser"
password = "testpass"
combo = "r\'\\b"+username + password + "\\b\'"
testcred = re.search(combo, authentry)
print combo
print authentry
print testcred
r'\btestusertestpass\b'
testusertestpass
None
So my regex test appears, at least to me, to be properly formatted, and should be a direct match against the test string, but is not. Any ideas? Thanks so much for any insight!

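Try this; it may work. The r prefix and the quotes are part of Python's string-literal syntax, not of the pattern, so they must not appear inside the string passed to re.search: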
import re
authentry = "testusertestpass with another text"
username = "testuser"
password = "testpass"
combo = username + password + r'\b'
testcred = re.search(combo, authentry)
print combo
print authentry
print testcred
output:
testusertestpass\b
testusertestpass with another text
<_sre.SRE_Match object at 0x1b8a030>
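For a whole-word match on both sides, the pattern can keep both boundaries as long as the raw-string prefix stays in the source code rather than in the string. The following is a minimal sketch written for Python 3; the use of re.escape is an addition not present in the original posts, on the assumption that credentials may contain regex metacharacters.

```python
import re

authentry = "testusertestpass"
username = "testuser"
password = "testpass"

# r"..." is literal syntax: the pattern itself contains only \b, not r or quotes.
# re.escape (an addition for this sketch) neutralises any regex metacharacters.
combo = r"\b" + re.escape(username + password) + r"\b"

testcred = re.search(combo, authentry)
embedded = re.search(combo, "xtestusertestpassx")

print(combo)
print(testcred is not None)   # whole-word occurrence
print(embedded is not None)   # same text embedded inside a longer word
```

The second search is there to show that the boundaries reject the credentials when they are embedded in a longer run of word characters.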

Related

Is there an equivalent to RStudio's ".rs.askForPassword" in Python?

I want to connect to my database in Python, but I don't want to show my password to other users. In RStudio, .rs.askForPassword does exactly what I need. Here's an example:
library(RODBC)
conn = odbcConnect(dsn = "my_dsn",
                   uid = "name.last_name",
                   pwd = .rs.askForPassword("my.password"))
Is there any way to do this in Python?

You may use getpass()
import getpass
p = getpass.getpass(prompt='What is your favorite person? ')
if p.lower() == 'gf':
    print 'Right. Off you go.'
else:
    print 'Wrong!'
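Applied to the database case from the question, the idea might look like the sketch below (Python 3). The helper name ask_for_password and its reader parameter are hypothetical, introduced only so the function can be exercised without a terminal; some_driver.connect in the comment is likewise a placeholder, not a real API.

```python
import getpass


def ask_for_password(prompt="Password: ", reader=getpass.getpass):
    """Return a password typed by the user without echoing it.

    `reader` is a hypothetical hook for this sketch; by default it is
    getpass.getpass, which hides the typed characters.
    """
    return reader(prompt)


# Interactive use (not executed here because it waits for input):
#     pwd = ask_for_password("my.password: ")
#     conn = some_driver.connect(dsn="my_dsn", uid="name.last_name", pwd=pwd)
# `some_driver` stands in for whichever database module is in use.
```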

LDAP search with username as variable

I am using the Python-LDAP module and trying to make a query on the logged in user. The username will be passed into the query. When I simply type the username in as a string my results come out correctly.
But if I try to pass the (username) variable it returns
LDAPError - FILTER_ERROR: {'desc': u'Bad search filter'}
I've tried a number of different combinations but continue to get the same error returned. Any insight here would be great!
Edited for Minimal, Complete, and Verifiable example:
import ldap
LDAP_SERVER = "ldap://myldapserver.domain.ad:389"
username = r"domain\serviceAccount"
password = "Password"
l = ldap.initialize(LDAP_SERVER)
def login(username, password):
    try:
        l.simple_bind_s(username, password)
        base = "OU=Users,OU=Group,DC=domain,DC=ad"
        criteria = "(&(objectClass=user)(sAMAccountName=anActualUsername))" #WORKS
        criteria = '(&(objectClass=user)(sAMAccountName=%s))' % username #DOESNT WORK
        criteria = "(&(objectClass=user)" + "(sAMAccountName=" + username + "))" #DOESNT WORK
        attributes = ['displayName']
        result = l.search_s(base, ldap.SCOPE_SUBTREE, criteria, attributes)
        print result
    except ldap.INVALID_CREDENTIALS:
        return False
    return True
login(username,password)

Did you try encoding your string?
criteria = ('(&(objectClass=user)(sAMAccountName=%s))' % username).encode('utf8')

In the "WORKS" case, your filter string contains a simple name with no domain:
(&(objectClass=user)(sAMAccountName=bobsmith))
In the "DOESN'T WORK" case, you use a name with a domain:
(&(objectClass=user)(sAMAccountName=domain\serviceAccount))
The character \ is not allowed in a filter string unless it is escaped.
How to fix this depends upon the data present in your ldap server. Perhaps this:
criteria = '(&(objectClass=user)(sAMAccountName=%s))' % (
    username if '\\' not in username else username.split('\\')[1])
Or perhaps this:
criteria = '(&(objectClass=user)(sAMAccountName=%s))' % (
    ldap.filter.escape_filter_chars(username))

I needed to use ldap.filter.filter_format for proper character escaping.
import ldap.filter
criteria= ldap.filter.filter_format('(&(objectClass=user)(sAMAccountName=%s))', [username])
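To illustrate what escaping does to the filter value, here is a minimal pure-Python sketch of the RFC 4515 escaping rules (Python 3). The helper escape_filter_value is hypothetical and written only for illustration; in real code the python-ldap functions mentioned above are the better choice.

```python
def escape_filter_value(value):
    """Escape characters that are special inside an LDAP search filter.

    Hypothetical helper for illustration; it follows the RFC 4515
    convention of a backslash followed by the two-digit hex code.
    """
    replacements = [
        ("\\", r"\5c"),   # backslash first, so later escapes are not re-escaped
        ("*", r"\2a"),
        ("(", r"\28"),
        (")", r"\29"),
        ("\x00", r"\00"),
    ]
    for char, escaped in replacements:
        value = value.replace(char, escaped)
    return value


username = r"domain\serviceAccount"
criteria = "(&(objectClass=user)(sAMAccountName=%s))" % escape_filter_value(username)
print(criteria)
```

With the backslash escaped, the filter is syntactically valid. Whether it matches anything still depends on what is stored in sAMAccountName on the server.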

Try switching single quotes with double quotes.
criteria = "(&(objectClass=user)(sAMAccountName=anActualUsername))" #WORKS
criteria = '(&(objectClass=user)(sAMAccountName=%s))' % username #DOESNT WORK
Change the second criteria to this one (I tested it only with a literal string, not with %s):
criteria = "(&(objectClass=user)(sAMAccountName=%s))" % username #SHOULD WORK

Using regular expressions to match a word in Python

I am using PRAW to make a reddit bot that takes the comment author of someone who says "alot" and stores their username in a list. I am having trouble with the regular expression and with getting the string match to work. Here is my code.
#importing praw for reddit api and time to make intervals
import praw
import time
import re
username = "LewisTheRobot"
password =
r = praw.Reddit(user_agent = "Counts people who say alot")
word_to_match = ['\balot\b']
storage = []
r.login(username, password)
def run_bot():
    subreddit = r.get_subreddit("test")
    print("Grabbing subreddit")
    comments = subreddit.get_comments(limit=200)
    print("Grabbing comments")
    for comment in comments:
        comment_text = comment.body.lower()
        isMatch = any(string in comment_text for string in word_to_match)
        if comment.id not in storage and isMatch:
            print("Match found! Storing username: " + str(comment.author) + " into list.")
            storage.append(comment.author)
    print("There are currently: " + str(len(storage)) + " people who use 'alot' instead of ' a lot'.")
while True:
    run_bot()
    time.sleep(5)
The regular expression I am using is meant to match the word "alot" on its own rather than "alot" as part of another word, for example "zealot". Whenever I run this, it does not find a comment that I have made. Any suggestions?

You're checking with string operations, not RE ones, in
isMatch = any(string in comment_text for string in word_to_match)
The first in here checks for a substring -- nothing to do with REs.
Change this to
isMatch = any(re.search(string, comment_text) for string in word_to_match)
Moreover, you have an error in your initialization:
word_to_match = ['\balot\b']
'\b' is the character with code 0x08 (backspace). Always use raw string syntax for RE patterns, to avoid such traps:
word_to_match = [r'\balot\b']
Now you'll have a couple of characters, backslash then b, which RE will interpret to mean "word boundary".
There may be other bugs but I try not to look for more than two bugs per question...:-)
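Both fixes can be checked in isolation, without PRAW. The snippet below is a small sketch for Python 3; the sample comment strings are made up for illustration.

```python
import re

plain = '\balot\b'    # '\b' here is a single backspace character (0x08)
raw = r'\balot\b'     # backslash + b, which the re module reads as a word boundary

# Made-up sample comments for illustration.
comments = ["i like this alot", "what a zealot"]

# re.search applies the pattern; the `in` operator would only test for a substring.
matches = [text for text in comments if re.search(raw, text)]

print(len(plain), len(raw))
print(matches)
```

The length difference shows that the two literals are not the same string: the non-raw one contains two backspace characters, so it can never match ordinary comment text.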

How to manipulate a URL string in order to extract a single piece?

I'm new to programming and Python.
Background
My program accepts a url. I want to extract the username from the url.
The username is the subdomain.
If the subdomain is 'www', the username should be the main part of the domain. The rest of the domain should be discarded (e.g. '.com/', '.org/').
I've tried the following:
def get_username_from_url(url):
    if url.startswith(r'http://www.'):
        user = url.replace(r'http://www.', '', 1)
        user = user.split('.')[0]
        return user
    elif url.startswith(r'http://'):
        user = url.replace(r'http://', '', 1)
        user = user.split('.')[0]
        return user
easy_url = "http://www.httpwwwweirdusername.com/"
hard_url = "http://httpwwwweirdusername.blogger.com/"
print get_username_from_url(easy_url)
# output = httpwwwweirdusername (good! expected.)
print get_username_from_url(hard_url)
# output = weirdusername (bad! username should = httpwwwweirdusername)
I've tried many other combinations using strip(), split(), and replace().
Could you advise me on how to solve this relatively simple problem?

There is a module called urlparse that is specifically for the task:
>>> from urlparse import urlparse
>>> url = "http://httpwwwweirdusername.blogger.com/"
>>> urlparse(url).hostname.split('.')[0]
'httpwwwweirdusername'
For http://www.httpwwwweirdusername.com/ this would output www, which is not desired. One workaround is to take the first item of the split hostname that is not equal to www:
>>> from urlparse import urlparse
>>> url = "http://www.httpwwwweirdusername.com/"
>>> next(item for item in urlparse(url).hostname.split('.') if item != 'www')
'httpwwwweirdusername'
>>> url = "http://httpwwwweirdusername.blogger.com/"
>>> next(item for item in urlparse(url).hostname.split('.') if item != 'www')
'httpwwwweirdusername'
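On Python 3 the same function lives in urllib.parse. Wrapped in the question's get_username_from_url, the approach might look like the sketch below; returning None for input without a hostname is an assumption of this sketch, not something the original posts specify.

```python
from urllib.parse import urlparse  # Python 3 location of urlparse


def get_username_from_url(url):
    """Return the first hostname label that is not 'www', or None."""
    hostname = urlparse(url).hostname or ""
    # Skip empty labels and the 'www' prefix; None if nothing is left.
    return next(
        (label for label in hostname.split(".") if label and label != "www"),
        None,
    )


easy_url = "http://www.httpwwwweirdusername.com/"
hard_url = "http://httpwwwweirdusername.blogger.com/"
print(get_username_from_url(easy_url))
print(get_username_from_url(hard_url))
```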

It is also possible to do this with a regular expression (the regex could probably be made more accurate or efficient). Note the escaped dot after www, so that only a literal "www." prefix is skipped.
import re
url_pattern = re.compile(r'.*/(?:www\.)?(\w+)')
def get_username_from_url(url):
    match = re.match(url_pattern, url)
    if match:
        return match.group(1)
easy_url = "http://www.httpwwwweirdusername.com/"
hard_url = "http://httpwwwweirdusername.blogger.com/"
print get_username_from_url(easy_url)
print get_username_from_url(hard_url)
Which yields us:
httpwwwweirdusername
httpwwwweirdusername

Find strings that begin with a '#' and create link

I want to check whether a string (a tweet) begins with a '#' (i.e. is a hashtag) or not, and if so create a link.
Below is what I've tried so far but it doesn't work (error on the last line).
How can I fix this and will the code work for the purpose?
tag_regex = re.compile(r"""
[\b#\w\w+] # hashtag found!""", re.VERBOSE)
message = raw_message
for tag in tag_regex.findall(raw_message):
    message = message.replace(url, '' + message + '')

>>> msg = '#my_tag the rest of my tweet'
>>> re.sub(r'^#(\w+) (.*)', r'\2', msg)
'the rest of my tweet'
>>>
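To turn each hashtag into a link instead of removing it, re.sub can take a replacement function. This is a rough sketch for Python 3; the base URL and the anchor markup are assumptions, since the original posts do not say where the links should point.

```python
import re

# Hypothetical base URL; replace with wherever tag pages actually live.
TAG_URL = "https://example.com/tags/"

# '#' not preceded by a word character, followed by the tag name.
tag_regex = re.compile(r"(?<!\w)#(\w+)")


def link_hashtags(message):
    """Wrap every hashtag in the message in an HTML anchor."""
    def to_link(match):
        tag = match.group(1)
        return '<a href="%s%s">#%s</a>' % (TAG_URL, tag, tag)
    return tag_regex.sub(to_link, message)


print(link_hashtags("#my_tag the rest of my tweet"))
```

The lookbehind keeps text such as a#b from being treated as a tag. If the tweet text is untrusted, it should be HTML-escaped before the links are inserted.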
