My idea is to find every email in a sentence and replace each one with a different random email (anonymization). But I can't get the result I want: either every email is replaced with the same one, or I get an error (list index out of range).
input:
email = "daniel#hotmail.com sent it to ana#gmail.com"
output I want
email = "albert#hotmail.com sent it to john#gmail.com"
random_emails = ["albert", "john", "mary"]
def find_email(email: str):
    result = email
    i = 0
    email_address = r"\S+#"
    for text in email:
        result = re.sub(email_address, random_emails[i] + "#", result)
        i += 1
    return result
print(find_email(email))
I found a solution, but note that identical emails will be anonymized in the same way, and the list must contain at least as many names as there are emails in the sentence. Try this:
import re
email = "daniel#hotmail.com sent it to ana#gmail.com"
random_emails = ["albert", "john", "mary"]
def find_email(email: str):
    result = email
    i = 0
    email_address = r"\S+#"
    regex_matches = re.findall(email_address, email)
    for match in regex_matches:
        result = result.replace(match, random_emails[i] + "#")
        i += 1
    return result
print(find_email(email))
You don't need a for loop, and I think your regex can be improved:
import re
def find_email(email):
    result = email
    # Four groups: name 1, server 1 plus the text in between, name 2, server 2
    email_address = r"(\w+#)(\w+.* )(\w+#)(\w+.*)"
    a='AAAAA#'
    b='BBBBB#'
    result = re.sub(email_address, rf'{a}\2{b}\4', result)
    return result
email = "daniel#hotmail.com sent it to ana#gmail.com"
print(find_email(email))
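Explanation:
The pattern defines four capture groups:
group 1 = first email name, group 2 = first server plus the text in between, group 3 = second email name, group 4 = second server
The replacement keeps \2 and \4 as they are, so you only need to put whatever you want in place of groups 1 and 3. Note that this pattern assumes the sentence contains exactly two emails.
Example 2: your new routine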
import re
from random import randint
random_emails = ["albert", "john", "mary"]
def find_email(email):
    result = email
    email_address = r"(\w+#)(\w+.* )(\w+#)(\w+.*)"
    first = randint(0, 2)
    second = randint(0, 2)
    while first == second:
        second = randint(0, 2)
    result = re.sub(email_address, rf'{random_emails[first]}#\2{random_emails[second]}#\4', result)
    return result
email = "daniel#hotmail.com sent it to ana#gmail.com"
print(find_email(email))
I used random to generate a random index for picking names from the list.
The "while first == second:" loop just makes sure the first and second emails do not get the same name.
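Both answers above are tied to a fixed number of emails. A possible generalization, shown here only as a sketch, is to pass a function to re.sub so that each match is handled on its own. The names anonymize, REPLACEMENT_NAMES and swap are made up for this illustration, and the "#" separator simply follows the examples in the question.

```python
import random
import re

# Hypothetical pool of replacement names (same values as in the question).
REPLACEMENT_NAMES = ["albert", "john", "mary"]

def anonymize(text, names=REPLACEMENT_NAMES):
    """Replace the part before '#' in every address with a random name."""
    mapping = {}          # original name -> replacement, so repeats stay consistent
    pool = list(names)
    random.shuffle(pool)  # random order, and no name is handed out twice

    def swap(match):
        local = match.group(1)
        if local not in mapping:
            if not pool:
                raise ValueError("more distinct addresses than replacement names")
            mapping[local] = pool.pop()
        return mapping[local] + "#"

    # re.sub calls swap() once per match, left to right.
    return re.sub(r"(\S+)#", swap, text)

print(anonymize("daniel#hotmail.com sent it to ana#gmail.com"))
```

Because the replacement is computed per match, the same code handles one, two, or more addresses, and it raises a clear error instead of "list index out of range" when the pool runs out.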
I have a program which logs on to a specified Gmail account and gets all the emails in a selected inbox that were sent from an address you input at runtime.
I would like to grab all the links from each email and append them to a list, so that I can filter out the ones I don't need before writing them to another file. I was using a regex to do this, which requires me to convert the payload to a string. The problem is that the regex I am using doesn't work with findall(); it only works when I use search() (I am not too familiar with regexes). Is there a better way to extract all links from an email that doesn't involve messing around with regexes?
My code currently looks like this:
print(f'[{Mail.timestamp}] Scanning inbox')
sys.stdout.write(Style.RESET)
self.search_mail_status, self.amount_matching_criteria = self.login_session.search(Mail.CHARSET,search_criteria)
if self.amount_matching_criteria == 0 or self.amount_matching_criteria == '0':
    print(f'[{Mail.timestamp}] No mails from that email address could be found...')
    Mail.enter_to_continue()
    import main
    main.main_wrapper()
else:
    pattern = '(?P<url>https?://[^\s]+)'
    prog = re.compile(pattern)
    self.amount_matching_criteria = self.amount_matching_criteria[0]
    self.amount_matching_criteria_str = str(self.amount_matching_criteria)
    num_mails = re.search(r"\d.+",self.amount_matching_criteria_str)
    num_mails = ((num_mails.group())[:-1]).split(' ')
    sys.stdout.write(Style.GREEN)
    print(f'[{Mail.timestamp}] Status code of {self.search_mail_status}')
    sys.stdout.write(Style.RESET)
    sys.stdout.write(Style.YELLOW)
    print(f'[{Mail.timestamp}] Found {len(num_mails)} emails')
    sys.stdout.write(Style.RESET)
    num_mails = self.amount_matching_criteria.split()
    for message_num in num_mails:
        individual_response_code, individual_response_data = self.login_session.fetch(message_num, '(RFC822)')
        message = email.message_from_bytes(individual_response_data[0][1])
        if message.is_multipart():
            print('multipart')
            multipart_payload = message.get_payload()
            for sub_message in multipart_payload:
                string_payload = str(sub_message.get_payload())
                print(prog.search(string_payload).group("url"))
I ended up using this for loop with a recursive function and a regex to get the links. Before adding them to a set, I discard every link that does not contain the substring you can input earlier in the program.
for message_num in self.amount_matching_criteria.split():
    counter += 1
    _, self.individual_response_data = self.login_session.fetch(message_num, '(RFC822)')
    self.raw = email.message_from_bytes(self.individual_response_data[0][1])
    raw = self.raw
    self.scraped_email_value = email.message_from_bytes(Mail.scrape_email(raw))
    self.scraped_email_value = str(self.scraped_email_value)
    self.returned_links = prog.findall(self.scraped_email_value)
    for i in self.returned_links:
        if self.substring_filter in i:
            self.link_set.add(i)
    self.timestamp = time.strftime('%H:%M:%S')
    print(f'[{self.timestamp}] Links scraped: [{counter}/{len(num_mails)}]')
The function used:
def scrape_email(raw):
    if raw.is_multipart():
        return Mail.scrape_email(raw.get_payload(0))
    else:
        return raw.get_payload(None,True)
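The recursive function above only follows the first sub-part of each multipart message. As an alternative sketch, the standard library's Message.walk() visits every part, so links in both the plain-text and the HTML part can be collected. The names extract_links and URL_PATTERN are made up for this example, the sample message is invented test data, and the URL pattern is a simple approximation rather than a full URL grammar.

```python
import email
import re
from email.message import EmailMessage

# Simple approximation: stop at whitespace, quotes and angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")

def extract_links(raw_bytes):
    """Return every http(s) link found in the text parts of a raw message."""
    message = email.message_from_bytes(raw_bytes)
    links = []
    for part in message.walk():                 # visits nested parts too
        if part.get_content_maintype() != "text":
            continue                            # skip containers and attachments
        payload = part.get_payload(decode=True) # undoes base64 / quoted-printable
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        links.extend(URL_PATTERN.findall(payload.decode(charset, errors="replace")))
    return links

# Invented sample message with a plain-text and an HTML alternative.
sample = EmailMessage()
sample["Subject"] = "demo"
sample.set_content("See https://example.com/a and http://example.org/b?x=1")
sample.add_alternative('<a href="https://example.com/c">c</a>', subtype="html")

print(extract_links(sample.as_bytes()))
```

Filtering by a substring and adding to a set, as in the loop above, can then be applied to the returned list.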
I need to create a list/dataframe that has component IDs along with their descriptions. I have one list containing the component IDs and another list containing the component IDs with a description. Only components whose ID appears in both lists should be displayed, along with their description.
I have tried to use the component ID list to do an exact search in the component-and-description list, but I wasn't able to get the desired output.
desclist = ['R402 MSG ='4k2 1%'','R403 MSG ='100 1%'','R404 MSG ='4k 1%'']
component = ['R402','R403','R404']
combinedlist = []
while count<(len(component) - 1):
    while True:
        for c in desclist:
            if c in component[count]:
                combinedlist.append(c)
                print(comp[count]+ ' , ' + desclist[count])
    count = count + 1
This is not code I've actually run, but I believe it is similar to what I need. I'm aware there is no "loop until" construct in Python.
I expect the output to be something like:
R402 , MSG ='4k2 1%'
This will require me to remove everything before the equals in the description list.
This is a simple (easy to understand) way to accomplish what you need!
desclist = ['R402 MSG = Desc402','R403 MSG = Desc403',
            'R404 MSG = Desc404','R405 MSG = Desc405']
component = ['R402','R403','R404','R406']
combinedlist = []
for i in range(len(component)):
    found = False
    for j in range(len(desclist)):
        if str(component[i]) == str(desclist[j]).split(' ')[0]:
            found = True
            combinedlist.append(component[i] + ', ' + desclist[j].split(' ',1)[1])
            print(component[i], ',', desclist[j].split(' ',1)[1])
            #print('Comp : ', component[i], 'Desc : ', desclist[j].split(' ',1)[1])
            break
    if not found:
        print(component[i], ' not found in Description List')
print('Combined List : ', combinedlist)
Output:
R402 , MSG = Desc402
R403 , MSG = Desc403
R404 , MSG = Desc404
R406 not found in Description List
Combined List : ['R402, MSG = Desc402', 'R403, MSG = Desc403', 'R404, MSG = Desc404']
I have changed your description and component lists to cover the scenarios you may face. Also, your description list has extra quotes in each element. You would have to use escape characters (or a different outer quote style) if you want to keep these quotes in your list.
If you want to remove everything before the equals sign in the combined list, use either of the expressions below (which one fits depends on the elements in your description list):
desclist[j].split('=',1)[1]
desclist[j].rpartition('=')[2]
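For longer lists, the nested loops can be replaced by a dictionary lookup. The following is only a sketch of that idea using the sample data from this answer; the names descriptions and combined are made up here, and it assumes every entry has the form "ID MSG = text".

```python
desclist = ["R402 MSG = Desc402", "R403 MSG = Desc403",
            "R404 MSG = Desc404", "R405 MSG = Desc405"]
component = ["R402", "R403", "R404", "R406"]

# Build an ID -> description mapping once.
descriptions = {}
for entry in desclist:
    comp_id, _, rest = entry.partition(" ")                   # split off the ID
    descriptions[comp_id] = rest.partition("=")[2].strip()    # keep text after '='

# Keep only the components that also have a description.
combined = [(cid, descriptions[cid]) for cid in component if cid in descriptions]
print(combined)
# [('R402', 'Desc402'), ('R403', 'Desc403'), ('R404', 'Desc404')]
```

The list of tuples can be passed straight to pandas.DataFrame if a dataframe is needed.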
Try this,
>>> desclist = ['R402 MSG = "4k2 1%"','R403 MSG ="100 1%"','R404 MSG ="4k 1%"', 'R407 MSG ="4k 1%"']
# For testing, I have added 'R407 MSG ="4k 1%"'
>>> component = ['R402','R403','R404']
Output:
>>> from itertools import chain
>>> new_list = [[desc for desc in desclist if cid in desc] for cid in component]
>>> list(chain(*new_list))
['R402 MSG = "4k2 1%"', 'R403 MSG ="100 1%"', 'R404 MSG ="4k 1%"']
My task is to count the top senders and top receivers in a user's email.
The plan is to collect all the addresses, put them in a dictionary, count how often each one occurs, and print the result.
I tried this, but it doesn't work well with the INBOX label (10,000+ messages):
import base64
import email
import re
import operator
from googleapiclient import errors
from quickstart import service
def find(st):
    for i in range(0,len(st)):
        tmp = str(st[i])
        for j in range(0,len(tmp)):
            if tmp[j] == 'T' and tmp[j+1] == 'o' and tmp[j-1] == "'" and tmp[j+2] == "'":
                return i
    pass
def getTop(n):
    try:
        if n == 1:
            label_ids = "INBOX"
        else:
            label_ids = "SENT"
        user_id = "me"
        topers = service.users().labels().get(userId = user_id,id = label_ids).execute()
        count = topers['messagesTotal']
        print(count)
        topers = service.users().messages().list(userId = user_id, labelIds = label_ids).execute()
        arrId = []
        for i in range(0,count):
            arrId.append(topers['messages'][i]['id'])
        st = []
        for i in range(0,count):
            message = service.users().messages().get(userId=user_id,
                                                     id=arrId[i],
                                                     format = 'metadata').execute()
            head = message['payload']['headers']
            index = find(head)
            obval = head[index]['value']
            tmp = str(obval)
            tmp =tmp.split('<', 1)[-1]
            tmp = tmp.replace('>',"")
            st.append(tmp)
        cnt = 0
        mvalues = {}
        for mail in st:
            if not mail in mvalues:
                mvalues[mail] = 1
            else:
                mvalues[mail]+= 1
        sorted_values = sorted(mvalues.items(),key= operator.itemgetter(1))
        ln = len(sorted_values)
        for j in range(1,6):
            print(sorted_values[-j])
        pass
    except errors.HttpError as error:
        print('An error occurred: %s' % error)
My question is: what is the fastest and most correct way to get all these user emails?
If I have a lot of messages, looping and making a separate request for every message is probably not the best way. I have been trying to figure this out for about 4 days. Any help is appreciated.
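The counting part of the code above can be simplified independently of how the messages are fetched. The sketch below assumes each message's headers arrive as a list of {'name': ..., 'value': ...} dictionaries, as in message['payload']['headers'] above; the function name top_addresses and the sample data are made up for illustration. Keep in mind that messages().list() returns results one page at a time, so topers['messages'] does not contain all messagesTotal entries after a single call.

```python
from collections import Counter
from email.utils import getaddresses

def top_addresses(all_headers, header_name, n=5):
    """Count addresses found in one header (e.g. 'From' or 'To') across messages."""
    counts = Counter()
    wanted = header_name.lower()
    for headers in all_headers:
        for header in headers:
            if header["name"].lower() != wanted:
                continue
            # getaddresses handles 'Name <addr>' and comma-separated lists
            for _, address in getaddresses([header["value"]]):
                if address:
                    counts[address.lower()] += 1
    return counts.most_common(n)

# Invented sample data in the shape described above.
sample = [
    [{"name": "From", "value": "Ann <ann@example.com>"},
     {"name": "To", "value": "me@example.com"}],
    [{"name": "From", "value": "bob@example.com"}],
    [{"name": "From", "value": "ann@example.com"}],
]
print(top_addresses(sample, "From"))
# [('ann@example.com', 2), ('bob@example.com', 1)]
```

Counter.most_common replaces the manual dictionary, the sort, and the reverse indexing, and matching on the header's name avoids scanning the string form of each dictionary for 'To'.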
I am having issues with the API request to Flickr below. My function takes a list of 10 photo IDs as input. However, when I print the data returned by the function, I only get information for one photo ID. Looking at the function below, any ideas on what may be causing only one photo ID's data to be printed? Any help would be great.
for item in get_flickr_data(word)["photos"]["photo"]:
    photo_ids =item["id"].encode('utf-8')
    lst_photo_ids.append(photo_ids)
print lst_photo_ids
lst_photo_ids = ['34117701526', '33347528313', '34158745075', '33315997274', '33315996984', '34028007021', '33315995844', '33347512113', '33315784134', '34024299271']
def get_photo_data(lst_photo_ids):
    baseurl = "https://api.flickr.com/services/rest/"
    params_d = {}
    params_d["method"] = "flickr.photos.getInfo"
    params_d["format"] = "json"
    params_d["photo_id"] = photo_ids
    params_d["api_key"] = FLICKR_KEY
    unique_identifier = params_unique_combination(baseurl,params_d)
    if unique_identifier in CACHE_DICTION:
        flickr_data_diction = CACHE_DICTION[unique_identifier]
    else:
        resp = requests.get(baseurl,params_d)
        json_result_text = resp.text[14:-1]
        flickr_data_diction = json.loads(json_result_text)
        CACHE_DICTION[unique_identifier] = flickr_data_diction
        fileref = open(CACHE_FNAME,"w")
        fileref.write(json.dumps(CACHE_DICTION))
        fileref.close()
    return flickr_data_diction
print get_photo_data(photo_ids)
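One likely cause, judging only from the code shown: get_photo_data never reads its lst_photo_ids parameter. It uses the global photo_ids, which after the for loop holds just the last ID, and flickr.photos.getInfo describes a single photo per call. A minimal sketch of the fix is to call a per-photo function once for each ID. The names get_all_photo_data, fetch_one and fake_fetch are hypothetical; fake_fetch stands in for a version of get_photo_data that takes one photo ID, so the example runs without network access.

```python
def get_all_photo_data(photo_id_list, fetch_one):
    """Call fetch_one(photo_id) for every ID and collect the results by ID."""
    return {photo_id: fetch_one(photo_id) for photo_id in photo_id_list}

# Stand-in for a single-photo request (hypothetical, no network access).
def fake_fetch(photo_id):
    return {"photo": {"id": photo_id}}

lst_photo_ids = ['34117701526', '33347528313', '34158745075']
all_data = get_all_photo_data(lst_photo_ids, fake_fetch)
print(len(all_data))  # one entry per photo ID
```

In the real function, params_d["photo_id"] would be set from the function's own argument rather than from the global variable.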
I'm looking to return the n (most likely 10) most recent emails from an email account's inbox using IMAP.
So far I've cobbled together:
import imaplib
from email.parser import HeaderParser
M = imaplib.IMAP4_SSL('my.server')
user = 'username'
password = 'password'
M.login(user, password)
M.search(None, 'ALL')
for i in range (1,10):
    data = M.fetch(i, '(BODY[HEADER])')
    header_data = data[1][0][1]
    parser = HeaderParser()
    msg = parser.parsestr(header_data)
    print msg['subject']
This is returning email headers fine, but it seems to be a semi-random collection of emails that it gets, not the 10 most recent.
If it helps, I'm connecting to an Exchange 2010 server. Other approaches also welcome, IMAP just seemed the most appropriate given that I only wanted to read the emails not send any.
The sort command is available, but it is not guaranteed to be supported by the IMAP server. For example, Gmail does not support the SORT command.
To try the sort command, you would replace:
M.search(None, 'ALL')
with
M.sort(sort_criteria, 'UTF-8', 'ALL')
Here sort_criteria would be a string like:
sort_criteria = 'DATE' #Ascending, most recent email last
sort_criteria = 'REVERSE DATE' #Descending, most recent email first
sort_criteria = '[REVERSE] sort-key' #format for sorting
According to RFC 5256, these are the valid sort keys:
"ARRIVAL" / "CC" / "DATE" / "FROM" / "SIZE" / "SUBJECT" / "TO"
Notes:
1. The charset is required. Try US-ASCII or UTF-8; IMAP servers are not required to support any others.
2. The search criteria are also required. ALL is a valid search key, but there are many others. See more at http://www.networksorcery.com/enp/rfc/rfc3501.txt
The world of IMAP is wild and crazy. Good luck
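Since SORT is optional, one way to cope is to check the server's advertised capabilities and fall back to SEARCH. The sketch below assumes a logged-in imaplib connection with a mailbox already selected. The name newest_message_numbers is made up, and FakeConnection is an invented stand-in for a server without SORT so the example runs offline. The fallback relies on message sequence numbers, which follow the order of the mailbox and not necessarily the Date header.

```python
def newest_message_numbers(conn, n=10):
    """Return up to n message numbers, newest first."""
    if "SORT" in conn.capabilities:
        status, data = conn.sort("REVERSE DATE", "UTF-8", "ALL")
        if status == "OK":
            return data[0].split()[:n]
    # Fallback: highest sequence numbers are the last messages in the mailbox.
    status, data = conn.search(None, "ALL")
    return data[0].split()[-n:][::-1]

class FakeConnection:
    """Invented stand-in for an IMAP server that does not advertise SORT."""
    capabilities = ("IMAP4REV1",)

    def search(self, charset, *criteria):
        return "OK", [b"1 2 3 4 5"]

print(newest_message_numbers(FakeConnection(), n=3))
# [b'5', b'4', b'3']
```

With a real connection, each returned number can be passed to fetch() as in the question.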
This is the code to get the emailFrom, emailSubject, emailDate, emailContent etc..
import imaplib, email, os
user = "your#email.com"
password = "pass"
imap_url = "imap.gmail.com"
connection = imaplib.IMAP4_SSL(imap_url)
connection.login(user, password)
# SEARCH and FETCH are only valid once a mailbox has been selected
connection.select('INBOX')
result, data = connection.uid('search', None, "ALL")
if result == 'OK':
    for num in data[0].split():
        result, data = connection.uid('fetch', num, '(RFC822)')
        if result == 'OK':
            email_message = email.message_from_bytes(data[0][1])
            print('From:' + email_message['From'])
            print('To:' + email_message['To'])
            print('Date:' + email_message['Date'])
            print('Subject:' + str(email_message['Subject']))
            print('Content:' + str(email_message.get_payload()[0]))
connection.close()
connection.logout()
# get the most recent email
from imap_tools import MailBox
with MailBox('imap.mail.com').login('test#mail.com', 'password', 'INBOX') as mailbox:
    for msg in mailbox.fetch(limit=1, reverse=True):
        print(msg.date_str, msg.subject)
https://github.com/ikvk/imap_tools
This works for me:
import imaplib
from email.parser import HeaderParser
M = imaplib.IMAP4_SSL('my.server')
user = 'username'
password = 'password'
M.login(user, password)
# a mailbox must be selected before searching (defaults to INBOX)
M.select()
(retcode, messages) =M.search(None, 'ALL')
news_mail = get_mostnew_email(messages)
for i in news_mail :
    data = M.fetch(i, '(BODY[HEADER])')
    header_data = data[1][0][1]
    parser = HeaderParser()
    msg = parser.parsestr(header_data)
    print msg['subject']
And this is the function that picks the newest emails:
def get_mostnew_email(messages):
    """
    Getting in most recent emails using IMAP and Python
    :param messages:
    :return:
    """
    ids = messages[0] # data is a list.
    id_list = ids.split() # ids is a space separated string
    #latest_ten_email_id = id_list # get all
    latest_ten_email_id = id_list[-10:] # get the latest 10
    keys = map(int, latest_ten_email_id)
    news_keys = sorted(keys, reverse=True)
    str_keys = [str(e) for e in news_keys]
    return str_keys
Workaround for Gmail. Since IMAP.sort('DATE','UTF-8','ALL') does not work for Gmail, we can put each message and its date into a list and sort the list in reverse order of date. A counter can then be used to stop after the first n mails. This method will take a few minutes longer if there are hundreds of mails, because every message is fetched.
import datetime
import email
M.login(user,password)
rv,data= M.search(None,'ALL')
if rv=='OK':
    msg_list=[]
    for num in data[0].split():
        rv,data=M.fetch(num,'(RFC822)')
        if rv=='OK':
            msg_object={}
            msg_object_copy={}
            msg=email.message_from_bytes(data[0][1])
            msg_date=""
            for val in msg['Date'].split(' '):
                if(len(val)==1):
                    val="0"+val
                # to pad the single date with 0
                msg_date=msg_date+val+" "
            msg_date=msg_date[:-1]
            # to remove the last space
            msg_object['date']= datetime.datetime.strptime(msg_date,"%a, %d %b %Y %H:%M:%S %z")
            # to convert string to date time object for sorting the list
            msg_object['msg']=msg
            msg_object_copy=msg_object.copy()
            msg_list.append(msg_object_copy)
    msg_list.sort(reverse=True,key=lambda r:r['date'])
    # sorts by datetime so latest mails are parsed first
    count=0
    for msg_obj in msg_list:
        count=count+1
        if count>n:
            # stop once n messages have been handled
            break
        msg=msg_obj['msg']
        # do things with the message
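The manual zero-padding and strptime format can be avoided: the standard library's email.utils.parsedate_to_datetime parses RFC 2822 date headers directly. The sketch below shows only the sorting step; the name newest_first and the two raw sample messages are made up for illustration. It assumes every message has a Date header with an explicit UTC offset, since mixing dates with and without timezone information would make the comparison fail.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

def newest_first(messages, n):
    """Return the n messages with the most recent Date header, newest first."""
    dated = [(parsedate_to_datetime(m["Date"]), m) for m in messages]
    dated.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in dated[:n]]

# Invented sample messages; note the single-digit day needs no padding.
raw = [
    "Date: Mon, 3 Jun 2019 09:05:00 +0000\nSubject: older\n\nbody",
    "Date: Tue, 4 Jun 2019 18:30:00 +0200\nSubject: newer\n\nbody",
]
msgs = [message_from_string(r) for r in raw]
print([m["Subject"] for m in newest_first(msgs, 1)])
# ['newer']
```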
To get the latest mail:
This will return all the mail numbers contained inside the 2nd return value which is a list containing a bytes object:
imap.search(None, "ALL")[1][0]
This will split the bytes object of which the last element can be taken by accessing the negative index:
imap.search(None, "ALL")[1][0].split()[-1]
You may use the mail number to access the corresponding mail.