Using regular expressions to match a word in Python - python

I am using PRAW to make a Reddit bot that finds comments containing the word "alot" and stores each comment author's username in a list. I am having trouble with the regular expression and with getting the match to work. Here is my code.
#importing praw for reddit api and time to make intervals
import praw
import time
import re
username = "LewisTheRobot"
password =
r = praw.Reddit(user_agent = "Counts people who say alot")
word_to_match = ['\balot\b']
storage = []
r.login(username, password)
def run_bot():
    subreddit = r.get_subreddit("test")
    print("Grabbing subreddit")
    comments = subreddit.get_comments(limit=200)
    print("Grabbing comments")
    for comment in comments:
        comment_text = comment.body.lower()
        isMatch = any(string in comment_text for string in word_to_match)
        if comment.id not in storage and isMatch:
            print("Match found! Storing username: " + str(comment.author) + " into list.")
            storage.append(comment.author)
    print("There are currently: " + str(len(storage)) + " people who use 'alot' instead of ' a lot'.")
while True:
    run_bot()
    time.sleep(5)
The regular expression I am using is meant to match the whole word "alot", not "alot" as part of a longer string such as "zealot". Whenever I run this, it does not find a comment that I have made. Any suggestions?

You're checking with string operations, not RE ones, in
isMatch = any(string in comment_text for string in word_to_match)
The first in here checks for a substring -- nothing to do with REs.
Change this to
isMatch = any(re.search(string, comment_text) for string in word_to_match)
Moreover, you have an error in your initialization:
word_to_match = ['\balot\b']
'\b' is the character with code 0x08 (backspace). Always use raw string syntax for RE patterns, to avoid such traps:
word_to_match = [r'\balot\b']
Now you'll have a couple of characters, backslash then b, which RE will interpret to mean "word boundary".
There may be other bugs but I try not to look for more than two bugs per question...:-)
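To make the two fixes above concrete, here is a minimal sketch that runs without PRAW. The helper name says_alot and the sample comments are made up for illustration; only the pattern and the re.search call come from the answer.

```python
import re

# Without the r prefix, '\b' is a single backspace character (0x08).
plain = '\balot\b'
# With the r prefix, the regex engine receives backslash + 'b' (a word boundary).
raw = r'\balot\b'

def says_alot(comment_text, patterns=(raw,)):
    """Return True if any pattern matches somewhere in the lower-cased text.

    Hypothetical helper standing in for the isMatch line in the question.
    """
    text = comment_text.lower()
    return any(re.search(p, text) for p in patterns)

print(len(plain), len(raw))               # 6 8
print(says_alot("I like this Alot!"))     # True
print(says_alot("He is a zealot."))       # False
```

The length difference shows why the non-raw pattern never matched: it contained two backspace characters rather than word-boundary markers.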

Related

Change direction of outlook sent email to be from right to left in python

I want to send an Arabic email from Python through win32com, and I want the text direction in Outlook to be right to left.
Is there a way to do that in code?
Here is my code:
import win32com.client as win32
from pathlib import Path
dataframe1 = dataframe.active
olapp = win32.Dispatch('Outlook.Application')
olns = olapp.GetNameSpace('MAPI')
arabic_msg_file = Path('arabic_body.txt')
mail_item = olapp.CreateItem(0)
mail_item.CC = 'sender@gamil.com'
mail_item.To = 'receiver@gamil.com'
mail_item.BodyFormat = 1
mail_item.Subject = 'subject'
mail_item.Body = arabic_msg_file.read_text(encoding='utf-8')
mail_item.Send()
To expand on what I suggested in the comments, here is one possible solution. I've replaced the reading of the text file with some dummy text lines.
import win32com.client as wc
ol = wc.gencache.EnsureDispatch('Outlook.Application')
text = 'Broad is the Gate and wide the Path\n'
text += 'That leads man to his daily bath.\n'
text += 'But \'ere you spend the shining hour,\n'
text += 'With plunge and spray, with sluice and shower,\n'
text += 'Remember, whereso\'er you be,\n'
text += 'To shut the door and turn the key!\n'
item = ol.CreateItem(wc.constants.olMailItem)#=0
item.To = 'xxx@yyy.com'
item.Subject = 'Test email formatting'
item.Body = text
item.BodyFormat = wc.constants.olFormatHTML #=2
item.Display()
#Either explicitly cast to the Word Document interface
doc = wc.CastTo(item.GetInspector.WordEditor,'_Document')
doc.Paragraphs.ReadingOrder = wc.constants.wdReadingOrderRtl #=0
#Or use the 'magic number' for the reading order
#doc = item.GetInspector.WordEditor
#doc.Paragraphs.ReadingOrder = 0
item.Send()
This is what arrives at the recipient's end: (screenshot not included)
Notes:
I haven't found a way to be able to Send without Display-ing first. Maybe some other SO member can suggest a way? Send() throws a The parameter is incorrect exception if Display() has not been called previously.
I am not a fan of 'magic numbers' replacing constants, hence I have used the gencache route for creating objects, and CastTo to generate the constants for the Word Document interface. If you want to stick with simple Dispatch() then I have included the magic numbers for the constants. Using the constants makes it easier to port VBA code to Python.
But an alternative and simpler approach is to turn the text into HTML first:
import win32com.client as wc
ol = wc.gencache.EnsureDispatch('Outlook.Application')
text = 'Broad is the Gate and wide the Path\n'
text += 'That leads man to his daily bath.\n'
text += 'But \'ere you spend the shining hour,\n'
text += 'With plunge and spray, with sluice and shower,\n'
text += 'Remember, whereso\'er you be,\n'
text += 'To shut the door and turn the key!\n'
item = ol.CreateItem(wc.constants.olMailItem)#=0
item.To = 'xxx@yyy.com'
item.Subject = 'Test email formatting'
item.BodyFormat = wc.constants.olFormatHTML #=2
item.HTMLBody = "<p dir=RTL style='text-align:right;direction:rtl'><span dir=LTR>" + text.replace('\n','<br>') + "</span></p>"
item.Send()
I am not familiar with how punctuation works in Arabic script, so you may need to play around with the HTML markup.
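As a rough sketch of the HTML approach, the wrapping step can be pulled into a small function that runs without Outlook. The name build_rtl_html is hypothetical, and unlike the markup above it omits the inner LTR span and escapes HTML special characters first; treat it as one possible variant, not the answer's exact markup.

```python
import html

def build_rtl_html(text):
    """Wrap plain text in a right-to-left HTML paragraph.

    Escapes HTML special characters, then turns newlines into <br> tags.
    Hypothetical helper; the markup is an assumption you may need to adjust.
    """
    escaped = html.escape(text)
    body = escaped.replace('\n', '<br>')
    return '<p dir="rtl" style="text-align:right;direction:rtl">' + body + '</p>'

print(build_rtl_html('مرحبا\nA & B'))
# <p dir="rtl" style="text-align:right;direction:rtl">مرحبا<br>A &amp; B</p>
```

The result could then be assigned to item.HTMLBody in place of the inline string concatenation.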
The plain-text body format does not support any formatting and offers no options for specifying the layout:
mail_item.BodyFormat = 1
mail_item.Body = arabic_msg_file.read_text(encoding='utf-8')
Note, you can use the Word object model for setting up the message body in the way you need. The WordEditor property of the Inspector class returns an instance of the Word Document class where you can set up the ParagraphFormat.Alignment property, for example:
Selection.ParagraphFormat.Alignment = wdAlignParagraphRight

How NOT to print emojis from comments or submission when using praw

I get error messages when I try to write out comments or submissions that contain emojis. How can I discard the emojis and output only letters and numbers?
I am using PRAW to scrape Reddit:
top_posts2 = page.top(limit = 25)
for post in top_posts2:
    outputFile.write(post.title)
    outputFile.write(' ')
    outputFile.write(str(post.score))
    outputFile.write('\n')
    outputFile.write(post.selftext)
    outputFile.write('\n')
    submissions = reddit.submission(id = post.id)
    comment_page = submissions.comments
    top_comment = comment_page[0] #by default, this will be the best comment of the post
    commentBody = top_comment.body
    outputFile.write(top_comment.body)
    outputFile.write('\n')
I want to output only letters and numbers, and maybe some (or all) special characters.
There are a couple of ways you can do this. I would recommend creating a "text cleaning" function:
def cleanText(text):
    new_text = ""
    for c in text: # for each character in the text
        if c.isalnum(): # check if it is either a letter or number (alphanumeric)
            new_text += c
    return new_text
or, if you want to include specific non-alphanumeric characters:
def cleanText(text):
    valid_symbols = "!@#$%^&*()" # <-- add whatever symbols you want here
    new_text = ""
    for c in text: # for each character in the text
        if c.isalnum() or c in valid_symbols: # check if alphanumeric or a valid symbol
            new_text += c
    return new_text
so then in your script you can do something like
commentBody = cleanText(top_comment.body)
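Note that cleanText as written also drops spaces and newlines, because ' '.isalnum() is False. A possible alternative, sketched here with a regular expression, keeps whitespace and a small set of punctuation. The name strip_symbols and the punctuation list are assumptions for illustration.

```python
import re

# Matches every character that is NOT a word character, whitespace,
# or one of the listed punctuation marks (. , ! ? ' " % -).
_UNWANTED = re.compile(r"[^\w\s.,!?'\"%-]")

def strip_symbols(text):
    """Remove emojis and other symbols while keeping words and spacing."""
    return _UNWANTED.sub("", text)

print(repr(strip_symbols("Great post 😀👍 100%!")))
# 'Great post  100%!'
```

If the errors come from the output file's encoding rather than from the emojis themselves, opening the file with encoding='utf-8' may make stripping unnecessary. That is a guess, since the question does not show the error message.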

LDAP search with username as variable

I am using the python-ldap module and trying to run a query for the logged-in user. The username will be passed into the query. When I simply type the username in as a string, my results come out correctly.
But if I try to pass the username variable, it returns
LDAPError - FILTER_ERROR: {'desc': u'Bad search filter'}
I've tried a number of different combinations but keep getting the same error. Any insight here would be great!
Edited for Minimal, Complete, and Verifiable example:
import ldap
LDAP_SERVER = "ldap://myldapserver.domain.ad:389"
username = r"domain\serviceAccount"
password = "Password"
l = ldap.initialize(LDAP_SERVER)
def login(username, password):
    try:
        l.simple_bind_s(username, password)
        base = "OU=Users,OU=Group,DC=domain,DC=ad"
        criteria = "(&(objectClass=user)(sAMAccountName=anActualUsername))" #WORKS
        criteria = '(&(objectClass=user)(sAMAccountName=%s))' % username #DOESNT WORK
        criteria = "(&(objectClass=user)" + "(sAMAccountName=" + username + "))" #DOESNT WORK
        attributes = ['displayName']
        result = l.search_s(base, ldap.SCOPE_SUBTREE, criteria, attributes)
        print result
    except ldap.INVALID_CREDENTIALS:
        return False
    return True
login(username,password)
Did you try encoding your string?
criteria = ('(&(objectClass=user)(sAMAccountName=%s))' % username).encode('utf8')
In the "WORKS" case, your filter string contains a simple name with no domain:
(&(objectClass=user)(sAMAccountName=bobsmith))
In the "DOESN'T WORK" case, you use a name with a domain:
(&(objectClass=user)(sAMAccountName=domain\serviceAccount))
The character \ is not allowed in a filter string unless it is escaped.
How to fix this depends upon the data present in your ldap server. Perhaps this:
criteria = '(&(objectClass=user)(sAMAccountName=%s))' % (
    username if '\\' not in username else username.split('\\')[1])
Or perhaps this:
criteria = '(&(objectClass=user)(sAMAccountName=%s))' % (
    ldap.filter.escape_filter_chars(username))
I needed to use ldap.filter.filter_format for proper character escaping.
import ldap.filter
criteria = ldap.filter.filter_format('(&(objectClass=user)(sAMAccountName=%s))', [username])
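To show what the escaping does to the filter string, here is a small pure-Python sketch that does not need an LDAP server. The function escape_filter_value is a hypothetical stand-in that applies the RFC 4515 escapes; in real code, the ldap.filter helpers mentioned above are the better choice.

```python
def escape_filter_value(value):
    """Escape characters that are special in an LDAP search filter (RFC 4515).

    Hypothetical stand-in for illustration only.
    """
    replacements = {
        '\\': r'\5c',
        '*': r'\2a',
        '(': r'\28',
        ')': r'\29',
        '\x00': r'\00',
    }
    # Replace character by character so an inserted backslash is never re-escaped.
    return ''.join(replacements.get(ch, ch) for ch in value)

username = r'domain\serviceAccount'
criteria = '(&(objectClass=user)(sAMAccountName=%s))' % escape_filter_value(username)
print(criteria)
# (&(objectClass=user)(sAMAccountName=domain\5cserviceAccount))
```

Escaping makes the filter syntactically valid. Whether it then finds the account still depends on how names are stored in the directory, as noted above.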
Try switching single quotes with double quotes.
criteria = "(&(objectClass=user)(sAMAccountName=anActualUsername))" #WORKS
criteria = '(&(objectClass=user)(sAMAccountName=%s))' % username #DOESNT WORK
Change the second criteria line to this one (I tested only with a literal string, not with %s):
criteria = "(&(objectClass=user)(sAMAccountName=%s))" % username #SHOULD WORK

Python RegEx with word boundaries

I am trying to write a login routine for a python script. In doing so, I find the need to pattern match the credentials on a whole word basis. I have attempted to RegEx this, but it is failing for reasons that are unclear to me, but I hope are obvious to someone here. The code and output:
import re
authentry = "testusertestpass"
username = "testuser"
password = "testpass"
combo = "r\'\\b"+username + password + "\\b\'"
testcred = re.search(combo, authentry)
print combo
print authentry
print testcred
r'\btestusertestpass\b'
testusertestpass
None
So my regex test appears, at least to me, to be properly formatted, and should be a direct match against the test string, but is not. Any ideas? Thanks so much for any insight!
Try this; it may work.
import re
authentry = "testusertestpass with another text"
username = "testuser"
password = "testpass"
combo = username + password + r'\b'
testcred = re.search(combo, authentry)
print combo
print authentry
print testcred
output:
testusertestpass\b
testusertestpass with another text
<_sre.SRE_Match object at 0x1b8a030>
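The pattern in the question fails because the characters r, ' and ' are part of the string itself, so the regex looks for a literal r' before the credentials. The r prefix only has meaning in Python source code. A minimal sketch with boundaries on both sides follows; the function name whole_word_match is made up, and re.escape is added as a precaution in case the credentials contain regex metacharacters.

```python
import re

def whole_word_match(username, password, authentry):
    """Return True if username+password appears in authentry as a whole word."""
    # re.escape guards against regex metacharacters in the credentials.
    pattern = r'\b' + re.escape(username + password) + r'\b'
    return re.search(pattern, authentry) is not None

print(whole_word_match("testuser", "testpass", "testusertestpass"))    # True
print(whole_word_match("testuser", "testpass", "xtestusertestpassx"))  # False
```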

Find strings that begins with a '#' and create link

I want to check whether a string (a tweet) begins with a '#' (i.e. is a hashtag) or not, and if so create a link.
Below is what I've tried so far but it doesn't work (error on the last line).
How can I fix this and will the code work for the purpose?
tag_regex = re.compile(r"""
[\b#\w\w+] # hashtag found!""", re.VERBOSE)
message = raw_message
for tag in tag_regex.findall(raw_message):
    message = message.replace(url, '' + message + '')
>>> msg = '#my_tag the rest of my tweet'
>>> re.sub(r'^#(\w+) (.*)', r'\2', msg)
'the rest of my tweet'
>>>
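The session above strips the leading tag. To turn hashtags into links instead, one possible sketch uses re.sub with a replacement function. The base URL and the names TAG_URL, HASHTAG_RE and link_hashtags are assumptions, since the link markup in the question was not preserved. Note that this version links every hashtag in the message, not only one at the start.

```python
import re

# Hypothetical base URL; substitute whatever page should list a tag.
TAG_URL = "https://example.com/tags/"

# '#' followed by word characters, not preceded by a word character
# (so 'abc#def' is left alone).
HASHTAG_RE = re.compile(r'(?<!\w)#(\w+)')

def link_hashtags(message):
    """Replace each hashtag in message with an HTML link to its tag page."""
    return HASHTAG_RE.sub(
        lambda m: '<a href="%s%s">#%s</a>' % (TAG_URL, m.group(1), m.group(1)),
        message,
    )

print(link_hashtags('#my_tag the rest of my tweet'))
# <a href="https://example.com/tags/my_tag">#my_tag</a> the rest of my tweet
```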
