I have a Python script that scans large datasets for email addresses and extracts them. On my Mac it just prints every address to the terminal. The files are sometimes 1-2 GB, so a run can take a while and the output is enormous. How easy is it in Python to save the results to a file instead of printing them in the terminal?
I don't even need to see the output in the terminal.
Here is the script I am working with:
#!/usr/bin/env python
#
# Extracts email addresses from one or more plain text files.
#
# Notes:
# - Does not save to file (pipe the output to a file if you want it saved).
# - Does not check for duplicates (which can easily be done in the terminal).
#
from optparse import OptionParser
import os.path
import re

regex = re.compile(("([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
                    "{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
                    "\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"))

def file_to_str(filename):
    """Returns the contents of filename as a string."""
    with open(filename) as f:
        return f.read().lower() # Case is lowered to prevent regex mismatches.

def get_emails(s):
    """Returns an iterator of matched emails found in string s."""
    # Removing lines that start with '//' because the regular expression
    # mistakenly matches patterns like 'http://foo@bar.com' as '//foo@bar.com'.
    return (email[0] for email in re.findall(regex, s) if not email[0].startswith('//'))

if __name__ == '__main__':
    parser = OptionParser(usage="Usage: python %prog [FILE]...")
    # No options added yet. Add them here if you ever need them.
    options, args = parser.parse_args()
    if not args:
        parser.print_usage()
        exit(1)
    for arg in args:
        if os.path.isfile(arg):
            for email in get_emails(file_to_str(arg)):
                print email
        else:
            print '"{}" is not a file.'.format(arg)
            parser.print_usage()
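Instead of printing, write each address to a file. Open the file once, before the loops, and put the write call where print email is now:
with open('filename.txt', 'w') as f:
    # ... the existing loops go here, with print replaced by:
    f.write('{}\n'.format(email))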
First, you need to open a file:
file = open('output', 'w')
Then, instead of printing the email, write it to the file: file.write(email + '\n'). Call file.close() once all the emails have been written.
You can also just redirect the output of the program to a file at execution time, as jasonharper said.
Replace the print statement with a write. Your loop currently is:
for arg in args:
    if os.path.isfile(arg):
        for email in get_emails(file_to_str(arg)):
            print email
Replace it with:
for arg in args:
    if os.path.isfile(arg):
        for email in get_emails(file_to_str(arg)):
            with open(tempfile, 'a+') as writefile:
                writefile.write(email + '\n')
Here tempfile is the path of your output file. Because the file is opened in append mode, delete it before a re-run if you do not want the old results kept.
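For 1-2 GB inputs it may also help to avoid reading the whole file into memory. Here is a minimal Python 3 sketch, assuming no address is split across a line break: it scans the input line by line and writes each match straight to the output file. The name extract_to_file and the simplified EMAIL_RE pattern are illustrative assumptions, not part of the original script; substitute the script's own regex if you need its exact behaviour.

```python
import re

# Simplified pattern for illustration only (an assumption, not the script's regex).
EMAIL_RE = re.compile(r"[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}")


def extract_to_file(in_path, out_path):
    """Scan in_path line by line and write each match to out_path, one per line.

    Returns the number of addresses written.
    """
    count = 0
    # errors="replace" keeps the scan going if the dataset has undecodable bytes.
    with open(in_path, errors="replace") as src, open(out_path, "w") as dst:
        for line in src:
            # Lowercase each line, as the original script does for the whole file.
            for email in EMAIL_RE.findall(line.lower()):
                dst.write(email + "\n")
                count += 1
    return count
```

Calling extract_to_file('dump.txt', 'emails.txt') would leave one address per line in emails.txt; both file names are placeholders.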
Related
I need some Python code that opens a txt file, finds an email address, and copies it.
The email will look like this in the txt:
Email: teste1@gmail.com
password: teste1
I only need the email.
Any help would be appreciated, because I'm racking my brain here.
#selenium webdriver
I want to copy the email from the txt and paste it into a website to log in automatically. I know how to do the login part; the problem is getting the email out of the txt.
The txt I mean is a Notepad file.
I don't see a whole lot to go off here so I don't feel like writing anything. If you specify what you want more I can help. However, please look this over in the meantime:
https://www.w3schools.com/python/python_file_open.asp
I hope this helps. There are better ways of doing it but I don't know what level of Python you're at, so I will try to keep it simple.
with open(r"Downloads\test.txt", "r") as txt_reader:
    found_email = "Not Found"
    for line in txt_reader:
        if "email" in line.lower():
            # Split the line on ':', take the second piece, and strip surrounding whitespace
            found_email = line.split(":")[1].strip()
    print(found_email)
Here is the txt file:
Email: testemail@gmail.com
Password: pas5w0rd!
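The lookup above matches any line that merely contains the word "email", and split(":") would cut a value that itself contains a colon. A slightly stricter sketch, assuming the file uses one "Name: value" pair per line as in the sample; read_field is a hypothetical helper name:

```python
def read_field(path, field="email"):
    """Return the value after 'field:' on the first matching line, or None."""
    with open(path) as f:
        for line in f:
            # partition splits on the first ':' only, so values may contain ':'.
            key, sep, value = line.partition(":")
            if sep and key.strip().lower() == field:
                return value.strip()
    return None
```

The returned string can then be handed to whatever fills in the login form.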
I found another way to do this. First create a Python file with this code:
from optparse import OptionParser
import os.path
import re

regex = re.compile(("([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
                    "{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
                    "\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"))

def file_to_str(filename):
    """Returns the contents of filename as a string."""
    with open(filename) as f:
        return f.read().lower() # Case is lowered to prevent regex mismatches.

def get_emails(s):
    """Returns an iterator of matched emails found in string s."""
    # Removing lines that start with '//' because the regular expression
    # mistakenly matches patterns like 'http://foo@bar.com' as '//foo@bar.com'.
    return (email[0] for email in re.findall(regex, s) if not email[0].startswith('//'))

if __name__ == '__main__':
    parser = OptionParser(usage="Usage: python %prog [FILE]...")
    # No options added yet. Add them here if you ever need them.
    options, args = parser.parse_args()
    if not args:
        parser.print_usage()
        exit(1)
    for arg in args:
        if os.path.isfile(arg):
            for email in get_emails(file_to_str(arg)):
                print(email)
        else:
            print('"{}" is not a file.'.format(arg))
            parser.print_usage()
I can recommend this because it works for me.
I wrote a program that finds email addresses in a text file, and it works pretty well. After adding pyperclip, though, only the last email address ends up on the clipboard, and I want to copy all of the emails. How can I do this? If there is a better way to do the whole thing, please let me know as well. By the way, I also have another text file that contains emails. I used this gitlist to make this Python program: gitlist link
from optparse import OptionParser
import os.path
import re
import pyperclip

regex = re.compile(("([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
                    "{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
                    "\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"))

def file_to_str(filename):
    """Returns the contents of filename as a string."""
    with open(filename) as f:
        return f.read().lower() # Case is lowered to prevent regex mismatches.

def get_emails(s):
    """Returns an iterator of matched emails found in string s."""
    # Removing lines that start with '//' because the regular expression
    # mistakenly matches patterns like 'http://foo@bar.com' as '//foo@bar.com'.
    return (email[0] for email in re.findall(regex, s) if not email[0].startswith('//'))

if __name__ == '__main__':
    parser = OptionParser(usage="Usage: python %prog [FILE]...")
    # No options added yet. Add them here if you ever need them.
    options, args = parser.parse_args()
    if not args:
        parser.print_usage()
        exit(1)
    for arg in args:
        if os.path.isfile(arg):
            for email in get_emails(file_to_str(arg)):
                print(email)
                pyperclip.copy(email)
        else:
            print('"{}" is not a file.'.format(arg))
            parser.print_usage()
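Only the last address survives because each pyperclip.copy() call replaces whatever is on the clipboard. One possible fix, sketched here under the assumption that one address per line is an acceptable clipboard format, is to join all the addresses first and copy once. join_emails and copy_emails are hypothetical helper names; pyperclip is a third-party package and is imported inside the function so the joining step can be tried without it.

```python
def join_emails(emails):
    """Return all emails as one newline-separated string."""
    return "\n".join(emails)


def copy_emails(emails):
    """Copy every email to the clipboard in a single call and return the text."""
    import pyperclip  # third-party; install separately

    text = join_emails(emails)
    pyperclip.copy(text)  # one call, so nothing gets overwritten
    return text
```

In the script, the loop body would shrink to a single copy_emails(get_emails(file_to_str(arg))) call per file.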
I found a way to do this that does not need as many lines as yours. Try this:
with open(r"text.txt", "r") as txt_reader:
    found_email = "Not Found"
    for line in txt_reader:
        if "email" in line.lower():
            # Split the line on ':', take the second piece, and strip surrounding whitespace
            found_email = line.split(":")[1].strip()
    print(found_email)
For example, if your text.txt file looks like this:
Email: susantha@gmail.com
Password: pas5w0rd!
then this program will print the email address.
I have a script that extracts all email addresses from a .txt document and prints them.
I would like to save them to list.txt and, if possible, delete duplicates,
but it gives this error:
Traceback (most recent call last):
  File "mail.py", line 44, in <module>
    notepad.write(email.read())
AttributeError: 'str' object has no attribute 'read'
Script:
from optparse import OptionParser
import os.path
import re

regex = re.compile(("([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
                    "{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
                    "\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"))

def file_to_str(filename):
    """Returns the contents of filename as a string."""
    with open(filename) as f:
        return f.read().lower() # Case is lowered to prevent regex mismatches.

def get_emails(s):
    """Returns an iterator of matched emails found in string s."""
    # Removing lines that start with '//' because the regular expression
    # mistakenly matches patterns like 'http://foo@bar.com' as '//foo@bar.com'.
    return (email[0] for email in re.findall(regex, s) if not email[0].startswith('//'))

if __name__ == '__main__':
    parser = OptionParser(usage="Usage: python %prog [FILE]...")
    # No options added yet. Add them here if you ever need them.
    options, args = parser.parse_args()
    if not args:
        parser.print_usage()
        exit(1)
    for arg in args:
        if os.path.isfile(arg):
            for email in get_emails(file_to_str(arg)):
                #print email
                notepad = open("list.txt","wb")
                notepad.write(email.read())
                notepad.close()
        else:
            print '"{}" is not a file.'.format(arg)
            parser.print_usage()
When I remove .read(), list.txt contains only 1 email address, whereas
print email shows a couple of hundred. If I refresh list.txt while the
extraction is running, the email address changes, but there is only
ever 1.
This is because you have open() and close() within the loop, i.e. the file is written anew for each email and you end up with only the last address written. Also drop .read(): email is already a string, so write it directly, followed by a newline so that each address gets its own line. Change the loop to:
notepad = open("list.txt", "w")
for email in get_emails(file_to_str(arg)):
    #print email
    notepad.write(email + "\n")
notepad.close()
or even better:
with open("list.txt", "w") as notepad:
    for email in get_emails(file_to_str(arg)):
        #print email
        notepad.write(email + "\n")
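The question also asked about removing duplicates, which the loops above do not do. A minimal sketch of one way to handle it, assuming the set of distinct addresses fits in memory: remember what has been written in a set and skip repeats. write_unique is a hypothetical helper name, and the code is written for Python 3.

```python
def write_unique(emails, out_path):
    """Write each distinct email once, in first-seen order.

    Returns the number of distinct emails written.
    """
    seen = set()
    with open(out_path, "w") as out:
        for email in emails:
            if email not in seen:
                seen.add(email)
                out.write(email + "\n")
    return len(seen)
```

It accepts any iterable, so the generator returned by get_emails() can be passed in directly, for example write_unique(get_emails(file_to_str(arg)), "list.txt").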
I have tried many ways to save the output to a text file, but none of them work for me.
This is the code:
from optparse import OptionParser
import os.path
import re

regex = re.compile(("([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
                    "{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
                    "\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"))

def file_to_str(filename):
    """Returns the contents of filename as a string."""
    with open(filename) as f:
        return f.read().lower() # Case is lowered to prevent regex mismatches.

def get_emails(s):
    """Returns an iterator of matched emails found in string s."""
    # Removing lines that start with '//' because the regular expression
    # mistakenly matches patterns like 'http://foo@bar.com' as '//foo@bar.com'.
    return (email[0] for email in re.findall(regex, s) if not email[0].startswith('//'))

if __name__ == '__main__':
    parser = OptionParser(usage="Usage: python %prog [FILE]...")
    # No options added yet. Add them here if you ever need them.
    options, args = parser.parse_args()
    if not args:
        parser.print_usage()
        exit(1)
    for arg in args:
        if os.path.isfile(arg):
            for email in get_emails(file_to_str(arg)):
                print email
        else:
            print '"{}" is not a file.'.format(arg)
            parser.print_usage()
When you run the script, it prints the emails in the DOS window.
I want to save them to a text file instead.
You can replace the main loop with the code below, where output_path is the path of the file you want to write to.
file = open(output_path, 'w')
for arg in args:
    if os.path.isfile(arg):
        for email in get_emails(file_to_str(arg)):
            file.write(email + '\n')
    else:
        print '"{}" is not a file.'.format(arg)
file.close()
Your code already has a few print statements (you should use a logger instead), but instead of adding code to write to a file, why not just redirect the output?
$ python myscript.py >> output.txt
That will give you the exact same output without adding code. Note that >> appends to output.txt if the file already exists.
$ python your_script.py > path/to/output_file/file_name.txt
OR
$ python your_script.py >> path/to/output_file/file_name.txt
Either command sends the output of your print statements to file_name.txt. With > the file is overwritten on each run; with >> the new output is appended to it.
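If you would rather keep both behaviours inside the script, one option is an optional output flag that falls back to the terminal. This is only a sketch under assumptions: the -o/--output flag and the names build_parser and emit are hypothetical, and it uses argparse rather than the optparse module in the question.

```python
import argparse
import sys


def build_parser():
    """Command line: one or more input files plus an optional output path."""
    parser = argparse.ArgumentParser(description="Extract email addresses.")
    parser.add_argument("files", nargs="+", help="input text files")
    parser.add_argument("-o", "--output",
                        help="write results to this file instead of the terminal")
    return parser


def emit(emails, output=None):
    """Write emails to the given path, or to stdout when no path is given."""
    out = open(output, "w") if output else sys.stdout
    try:
        for email in emails:
            out.write(email + "\n")
    finally:
        # Close only files we opened ourselves, never the terminal stream.
        if out is not sys.stdout:
            out.close()
```

The main block would then call args = build_parser().parse_args() and pass args.output to emit(); shell redirection keeps working when the flag is left out.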
I want to filter a log file to keep all lines matching a certain pattern. I want to do this with Python.
Here's my first attempt:
#!/usr/bin/env python
from sys import argv

script, filename = argv
with open(filename) as f:
    for line in f:
        try:
            e = line.index("some_term_I_want_to_match")
        except:
            pass
        else:
            print(line)
How can I improve this to:
save the result to a new file of similar name (i.e., a different extension)
use regex to make it more flexible/powerful.
(I'm just learning Python. This question is as much about learning Python as it is about accomplishing this particular result.)
OK, here's what I came up with so far... But how do you do the equivalent of prepending an r as in the following line
re.compile(r"\s*")
where the string is not a string literal, as in the next line?
re.compile(a_string_variable)
Other than that, I think this updated version does the job:
#!/usr/bin/env python
from sys import argv
import re
import os
import argparse  # requires Python 2.7 or above

parser = argparse.ArgumentParser(description='filters a text file on the search phrase')
parser.add_argument('-s','--search', help='search phrase or keyword to match',required=True)
parser.add_argument('-f','--filename', help='input file name',required=True)
parser.add_argument('-v','--verbose', help='display output to the screen too', required=False, action="store_true")
args = parser.parse_args()
keyword = args.search
original_file = args.filename
verbose = args.verbose
base_file, ext = os.path.splitext(original_file)
new_file = base_file + ".filtered" + ext
regex_c = re.compile(keyword)
with open(original_file) as fi:
    with open(new_file, 'w') as fo:
        for line in fi:
            result = regex_c.search(line)
            if(result):
                fo.write(line)
                if(verbose):
                    print(line)
Can this be easily improved?
Well, you know, you have answered most of your questions yourself already :)
For regular expression matching, use the re module (the docs have pretty explanatory examples).
You have already used the open() function to open a file. Use the same function to open files for writing; just provide the corresponding mode parameter ("w" or "a", combined with "+" if you need it; see help(open) in the Python interactive shell). That's it.
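On the raw-string question: the r prefix only changes how a string literal in the source code is parsed. A string that arrives at run time, for example from argparse, already holds exactly the characters the user typed, so re.compile(a_string_variable) needs nothing extra. If the search term should instead be matched as plain text, re.escape() neutralises the regex metacharacters. A small sketch; compile_search and its literal parameter are hypothetical names:

```python
import re

# The r prefix is purely a source-code notation: both literals are the same string.
assert r"\s*" == "\\s*"


def compile_search(term, literal=False):
    """Compile term as a regex, or as plain text when literal is True."""
    return re.compile(re.escape(term) if literal else term)
```

With literal=True a term such as "a.b" matches only the text "a.b"; without it, the dot matches any character.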