The code works fine when the clipboard text contains no email address or phone number, i.e., when the expected result is "Nothing Found".
In every other case it fails with this error:
AttributeError: 'str' object has no attribute 'matches'
#! python3
# contactDetails.py - Finds email and phone number from a page
import pyperclip, re
phoneRegex = re.compile(r'(\+\d{2}-\d{10})') # Phone Number Regex
# email Regex
emailRegex = re.compile(r'''(
[a-zA-Z0-9._]+ # username
@ # @ symbol
[a-zA-Z0-9._]+ # domain name
(\.[a-zA-Z]{2,4}])# dot-something
)''', re.VERBOSE)
text = str(pyperclip.paste())
matches = []
for groups in phoneRegex.findall(text):
    phoneNum=phoneRegex.findall(text)
    matches.append(phoneNum)
for groups in emailRegex.findall(text):
    matches.append(groups[0])
if len(matches) >0:
    pyperclip.copy('\n'.matches)
    print('Copied to Clipboard:')
    print('\n'.join(matches))
else:
    print('Nothing Found')
As was mentioned in the comment by Wiktor Stribiżew, the problem is in this line
pyperclip.copy('\n'.matches)
In particular, it is here
'\n'.matches
The first item, '\n', is a string object, and strings have no attribute called matches. What you want is to call .join, as you did two lines later, i.e.
pyperclip.copy('\n'.join(matches))
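To see the fix in context, here is a minimal sketch that runs on a hard-coded sample string instead of the clipboard, so it does not need pyperclip. The names find_contacts and sample are made up for illustration. It also assumes two further changes that the answer above does not cover: the phone matches are added as strings rather than as a nested list (otherwise '\n'.join would raise a TypeError), and the stray ] after {2,4} in the email pattern is dropped, since it would require a literal ] after the top-level domain.

```python
import re

# Same patterns as in the question, written compactly; the stray ']' is removed (assumption).
phone_regex = re.compile(r'\+\d{2}-\d{10}')
email_regex = re.compile(r'[a-zA-Z0-9._]+@[a-zA-Z0-9._]+\.[a-zA-Z]{2,4}')


def find_contacts(text):
    """Return all phone numbers, then all email addresses, as a flat list of strings."""
    matches = []
    matches.extend(phone_regex.findall(text))   # strings, not a nested list
    matches.extend(email_regex.findall(text))
    return matches


# Hypothetical sample text standing in for pyperclip.paste()
sample = 'Call +91-9876543210 or write to jane.doe@example.com today.'
found = find_contacts(sample)

if found:
    # '\n'.join(found) is the call that replaces the failing '\n'.matches
    print('\n'.join(found))
else:
    print('Nothing Found')
```

In the real script, the joined string would then be passed to pyperclip.copy as shown in the answer.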
Related
I have the following code, which matches every rdar://problem reference (one or more) in commit_msg. I only want to match references at the beginning of the message; note that there can be more than one rdar at the beginning. How can I change the regex to do that?
# -*- coding: utf-8 -*-
import re
commit_msg = """
<rdar://problem/19391231> This is the subject line1
<rdar://problem/11121314> This is the subject line2
[Problem]
The Problem description
[Solution]
This is the Solutions section
[Recommended Tests]
This is the Recommended Tests <rdar://problem/12345678> Text
Change-Id: Ibbafa780adb2502d470f12d0280ddb0049c727c4
Reviewed-on: https://tech-gerrit.sd.company.com/17954
Tested-by: Username1 <username1@company.com>
Build-watchOS: service account <serviceaccount@company.com>
Reviewed-by: username2 <username2@company.com>
"""
m = re.findall("(?!.*(?:Revert|revert))[\S]*(?:rdar:\/\/problem\/)(\d{8,8})", commit_msg)
print m
CURRENT OUTPUT:-
['19391231', '11121314', '12345678']
EXPECTED OUTPUT:-
['19391231', '11121314']
Going off your conversation with @ShadowRanger below, how about this?
import re
commit_msg = """
<rdar://problem/19391231> This is the subject line1
<rdar://problem/11121314> This is the subject line2
[Problem]
The Problem description
[Solution]
This is the Solutions section
[Recommended Tests]
This is the Recommended Tests <rdar://problem/12345678> Text
Change-Id: Ibbafa780adb2502d470f12d0280ddb0049c727c4
Reviewed-on: https://tech-gerrit.sd.company.com/17954
Tested-by: Username1 <username1@company.com>
Build-watchOS: service account <serviceaccount@company.com>
Reviewed-by: username2 <username2@company.com>
"""
m = re.findall("(?!.*(?:Revert|revert))[\S]*(?:rdar:\/\/problem\/)(\d{8,8})", commit_msg.split('[')[0])
print m
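The same idea can be sketched for Python 3 as a small function. This is only an illustration under assumptions: the helper name leading_rdars is made up, the commit message is shortened, the pattern is simplified to the rdar part, and the approach relies on the first '[' marking the end of the subject lines, as in the answer above.

```python
import re

# Shortened, hypothetical version of the commit message from the question
commit_msg = """
<rdar://problem/19391231> This is the subject line1
<rdar://problem/11121314> This is the subject line2

[Problem]
The Problem description

[Recommended Tests]
This is the Recommended Tests <rdar://problem/12345678> Text
"""


def leading_rdars(message):
    """Return rdar IDs found before the first '[' section marker."""
    # Everything before the first '[' is treated as the subject block (assumption).
    header = message.split('[', 1)[0]
    return re.findall(r'rdar://problem/(\d{8})', header)


print(leading_rdars(commit_msg))
```

If a subject line could itself contain a '[', this split would cut it short, so the section marker may need to be matched more strictly in that case.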
I am using Python 3.5 on macOS Sierra. I am working through the course Automate the Boring Stuff with Python and am having a problem with pyperclip. The code (below) works when I copy only 4 lines of the PDF, but when I copy all of the text I get the error message below.
Could someone help me? Is it a problem with pyperclip? My code? My computer?
Error message:
Traceback (most recent call last):
  File "/Users/ericgolden/Documents/MyPythonScripts/phoneAndEmail.py", line 35, in <module>
    text = pyperclip.paste()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pyperclip/clipboards.py", line 22, in paste_osx
    return stdout.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 79: invalid continuation byte
Here is my code:
#! python3
import re, pyperclip
# Create a regex for phone numbers
phoneRegex = re.compile(r'''
# 415-555-000, 555-0000, (415) 555-0000, 555-000 ext 12345, ext. 12345, x12345
(
((\d\d\d) | (\(\d\d\d\)))? # area code optional
(\s|-) # first separator
\d\d\d # first 3 digits
- # separator
\d\d\d\d # last 4 digits
(((ext(\.)?\s)|x) # extension word-part optional
(\d{2,5}))? # extension number-part optional
)
''', re.VERBOSE)
# Create a regex for email addresses
emailRegex = re.compile(r'''
# some.+_things@(\d{2,5}))?.com
[a-zA-Z0-9_.+]+ # name part
@ # @ symbol
[a-zA-Z0-9_.+]+ # domain name part
''', re.VERBOSE)
# Get the text off the clipboard
text = pyperclip.paste()
# TODO: Extract the email/phone from this text
extractedPhone = phoneRegex.findall(text)
extractedEmail = emailRegex.findall(text)
allPhoneNumbers = []
for phoneNumber in extractedPhone:
    allPhoneNumbers.append(phoneNumber[0])
# TODO: Copy the extracted email/phone to the clipboard
results = '\n'.join(allPhoneNumbers) + '\n' + '\n'.join(extractedEmail)
pyperclip.copy(results)
Can you try changing the import to: import pyperclip, re ?
Also, here is just an example of my code if it helps, since I am using the same book.
#! python3
# phoneAndEmail.py - Finds phone numbers and email addresses on the clipboard.
import pyperclip, re
phoneRegex = re.compile(r'''(
(\d{3}|\(\d{3}\))? # area code
(\s|-|\.)? # separator
(\d{3}) # first 3 digits
(\s|-|\.) # separator
(\d{4}) # last 4 digits
(\s*(ext|x|ext.)\s*(\d{2,5}))? # extension
)''', re.VERBOSE)
# Create email regex.
emailRegex = re.compile(r'''(
[a-zA-Z0-9._%+-]+ # username
@ # @ symbol
[a-zA-Z0-9.-]+ # domain name
(\.[a-zA-Z]{2,4}) # dot-something
)''', re.VERBOSE)
# Find matches in clipboard text.
text = str(pyperclip.paste())
matches = []
for groups in phoneRegex.findall(text):
    phoneNum = '-'.join([groups[1], groups[3], groups[5]])
    if groups[8] != '':
        phoneNum += ' x' + groups[8]
    matches.append(phoneNum)
for groups in emailRegex.findall(text):
    matches.append(groups[0])
# Copy results to the clipboard.
if len(matches) > 0:
    pyperclip.copy('\n'.join(matches))
    print('Copied to clipboard:')
    print('\n'.join(matches))
else:
    print('No phone numbers or email addresses found.')
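The traceback in the question points at stdout.decode('utf-8') inside pyperclip, which suggests that the copied PDF text contains bytes that are not valid UTF-8. The sketch below does not touch pyperclip at all; it only illustrates, on hypothetical sample bytes, what that error means and how a lenient decode behaves. The helper name decode_clipboard_bytes and the sample data are assumptions for illustration.

```python
def decode_clipboard_bytes(raw, encoding='utf-8'):
    """Decode bytes, replacing undecodable bytes with U+FFFD instead of raising."""
    return raw.decode(encoding, errors='replace')


# Hypothetical clipboard content: 0xd5 is not followed by a valid continuation byte
sample = b'Call 415-555-0000 \xd5 today'

try:
    sample.decode('utf-8')          # strict decoding, as in the traceback
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

text = decode_clipboard_bytes(sample)
print(strict_ok)
print(text)
```

With lenient decoding the rest of the text stays usable for the phone and email regexes; whether that is acceptable depends on how much the replaced characters matter.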
I'm trying to create a 'check' system for a password generator that will advise whether three characters from the same character family appear in a row in a generated password. For example,
if the password is
y8kpBD8zcZLKRSh1j7vwCMDQ5orR8VEP
it will find 'ZLK' etc.
I first thought lowercase_repeat = re.compile("[a-z]{3}") would, for example, find three lowercase characters in a row, but I can't seem to understand how this works exactly.
The password generator is below:
import random
import re
generator = random.SystemRandom()
password_characters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ123456789!@#$%^&*()'
password = ''.join(generator.choice(password_characters) for _ in range(32))
print password
If you just want to check for specific character sets (e.g. all uppercase, all lowercase, digits, and non-alphanumerics), you can put one alternative per set inside a single non-capturing group. For example:
import re
pattern = '(?:[a-z]{3}|[A-Z]{3}|\d{3}|[\x20-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E]{3})'
password = 'y8kpBD8zcZLKRSh1j7vwCMDQ5orR8VEP!'
matches = re.search(pattern, password)
The variable matches is None if there are no matches, indicating the password passes.
The pattern [\x20-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E] is a (probably pretty gnarly) way to catch a set of all non-alnum ascii characters (hex codes). It represents the following set:
[space] ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
I pulled it out of an old project, so YMMV. I'm sure there might be a more succinct way to express it - indeed, you might prefer to explicitly specify a set; e.g: [!?#] etc.
Quick sanity-check:
import re
def check_password(password):
    pattern = '(?:[a-z]{3}|[A-Z]{3}|\d{3}|[\x20-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E]{3})'
    return re.search(pattern, password)
passwords = ['a', 'abc', 'ABC', 'aBc', '1bc', '123']
for password in passwords:
    if check_password(password):
        print 'password failed: ', password
    else:
        print 'password passed: ', password
Yields:
password passed: a
password failed: abc
password failed: ABC
password passed: aBc
password passed: 1bc
password failed: 123
Hope this helps :)
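As one possible way to express the special-character set more succinctly, here is a Python 3 sketch that builds it from string.punctuation plus a space, which covers the same ASCII characters as the hex ranges above. The names TRIPLE_RUN, has_triple_run and find_runs are made up for this example.

```python
import re
import string

# Character class body for ASCII punctuation plus space, escaped for use inside [...]
SPECIAL = re.escape(string.punctuation + ' ')

# Three in a row from any one family: lowercase, uppercase, digits, or special characters
TRIPLE_RUN = re.compile(r'[a-z]{3}|[A-Z]{3}|\d{3}|[' + SPECIAL + r']{3}')


def has_triple_run(password):
    """Return True if the password contains three same-family characters in a row."""
    return TRIPLE_RUN.search(password) is not None


def find_runs(password):
    """Return the non-overlapping three-character runs, left to right."""
    return TRIPLE_RUN.findall(password)


for candidate in ['a', 'abc', 'ABC', 'aBc', '1bc', '123', '!!!']:
    status = 'failed' if has_triple_run(candidate) else 'passed'
    print('password {}: {}'.format(status, candidate))

print(find_runs('y8kpBD8zcZLKRSh1j7vwCMDQ5orR8VEP'))
```

Note that findall reports non-overlapping runs, so a run of five uppercase letters such as ZLKRS is reported once as ZLK.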
I am using PRAW to make a reddit bot that takes the author of any comment containing "alot" and stores their username in a list. I am having trouble with the regular expression and with getting the string matching to work. Here is my code.
#importing praw for reddit api and time to make intervals
import praw
import time
import re
username = "LewisTheRobot"
password =
r = praw.Reddit(user_agent = "Counts people who say alot")
word_to_match = ['\balot\b']
storage = []
r.login(username, password)
def run_bot():
    subreddit = r.get_subreddit("test")
    print("Grabbing subreddit")
    comments = subreddit.get_comments(limit=200)
    print("Grabbing comments")
    for comment in comments:
        comment_text = comment.body.lower()
        isMatch = any(string in comment_text for string in word_to_match)
        if comment.id not in storage and isMatch:
            print("Match found! Storing username: " + str(comment.author) + " into list.")
            storage.append(comment.author)
    print("There are currently: " + str(len(storage)) + " people who use 'alot' instead of ' a lot'.")
while True:
    run_bot()
    time.sleep(5)
So the regular expression I am using is meant to match the whole word alot rather than alot as part of another word, for example zealot. Whenever I run this, it does not find a comment that I have made. Any suggestions?
You're checking with string operations, not RE ones, in
isMatch = any(string in comment_text for string in word_to_match)
The first in here checks for a substring -- nothing to do with REs.
Change this to
isMatch = any(re.search(string, comment_text) for string in word_to_match)
Moreover, you have an error in your initialization:
word_to_match = ['\balot\b']
'\b' is the character with code 0x08 (backspace). Always use raw string syntax for RE patterns, to avoid such traps:
word_to_match = [r'\balot\b']
Now you'll have a couple of characters, backslash then b, which RE will interpret to mean "word boundary".
There may be other bugs but I try not to look for more than two bugs per question...:-)
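Both points can be checked without PRAW. The sketch below uses two made-up comment strings; plain_pattern, raw_pattern and comments are names chosen for illustration only.

```python
import re

plain_pattern = '\balot\b'    # '\b' here is the backspace character (0x08)
raw_pattern = r'\balot\b'     # backslash + 'b', which re reads as a word boundary

# Hypothetical comment bodies, already lowercased as in the bot
comments = ['i like this alot', 'he is a zealot']

# Substring test from the question: looks for literal backspace characters, so nothing matches
substring_hits = [c for c in comments if plain_pattern in c]

# Regex search with the raw pattern: matches the whole word only
regex_hits = [c for c in comments if re.search(raw_pattern, c)]

print(len(plain_pattern), len(raw_pattern))
print(substring_hits)
print(regex_hits)
```

The length difference (6 versus 8 characters) makes the escaping problem visible, and the zealot comment is skipped only by the word-boundary version.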
I want to check whether a string (a tweet) begins with a '#' (i.e. is a hashtag) or not, and if so create a link.
Below is what I've tried so far but it doesn't work (error on the last line).
How can I fix this and will the code work for the purpose?
tag_regex = re.compile(r"""
[\b#\w\w+] # hashtag found!""", re.VERBOSE)
message = raw_message
for tag in tag_regex.findall(raw_message):
    message = message.replace(url, '' + message + '')
>>> msg = '#my_tag the rest of my tweet'
>>> re.sub('^#(\w+) (.*)', r'\2', msg)
'the rest of my tweet'
>>>
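Building on the same re.sub idea, here is a sketch that turns a leading hashtag into a link instead of removing it. The function name link_leading_hashtag, the base_url value and the HTML shape are all assumptions for illustration, since the question does not say what the link should look like.

```python
import re


def link_leading_hashtag(tweet, base_url='https://example.com/tags/'):
    """Wrap a hashtag at the very start of the tweet in an anchor tag.

    base_url is a placeholder; tweets that do not start with '#' are returned unchanged.
    """
    def make_link(match):
        tag = match.group(1)
        return '<a href="{0}{1}">#{1}</a>'.format(base_url, tag)

    # '^' anchors the match to the start, so hashtags later in the tweet are left alone
    return re.sub(r'^#(\w+)', make_link, tweet)


print(link_leading_hashtag('#my_tag the rest of my tweet'))
print(link_leading_hashtag('no tag here #later'))
```

To link every hashtag rather than only a leading one, the '^' anchor could be dropped, at the cost of also matching '#' inside URLs or other text.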