Python: email get_payload decode fails when hitting equal sign?

Running into strangeness with get_payload: it seems to fail when it sees an equal sign in the message it's decoding. Here's code that reproduces the error:
import email
data = file('testmessage.txt').read()
msg = email.message_from_string( data )
payload = msg.get_payload(decode=True)
print payload
And here's a sample message: test message.
The message is printed only up to the first "="; the rest is omitted. Does anybody know what's going on?
The same script with "decode=False" returns the full message, so decoding appears to choke on the equal sign.
This is under Python 2.5.

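You have a line endings problem. The body of your test message uses bare carriage returns (\r) without newlines (\n). The body is presumably quoted-printable encoded (that is what get_payload(decode=True) undoes here), and in quoted-printable an "=" at the end of a line marks a soft line break. Python's decoder treats "=" followed by \r as the start of a soft line break and skips ahead to the next \n; with bare \r line endings there is no next \n, so apparently everything after the first such "=" is dropped. If you fix up the line endings before parsing the email, it all works: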
import email, re
data = file('testmessage.txt').read()
data = re.sub(r'\r(?!\n)', '\r\n', data) # Bare \r becomes \r\n
msg = email.message_from_string( data )
payload = msg.get_payload(decode=True)
print payload
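A minimal Python 3 version of the same fix, as a sketch: the sample message below is made up (it is not the asker's file) and assumes a quoted-printable body with bare-\r line endings, including one soft line break and one =3D escape. In Python 3, get_payload(decode=True) returns bytes.

```python
import email
import re


def decode_body(raw: str) -> bytes:
    """Parse a raw message and return its decoded body as bytes."""
    # Turn every bare \r (one not followed by \n) into \r\n, as in the fix above
    fixed = re.sub(r'\r(?!\n)', '\r\n', raw)
    msg = email.message_from_string(fixed)
    # decode=True undoes the Content-Transfer-Encoding (quoted-printable here)
    return msg.get_payload(decode=True)


# Made-up sample: quoted-printable body with bare-\r line endings
sample = (
    "Content-Type: text/plain\r\n"
    "Content-Transfer-Encoding: quoted-printable\r\n"
    "\r\n"
    "This line is wrapped with a soft line break=\r"
    " and continues here. 2 + 2 =3D 4\r"
)

print(decode_body(sample))
```

After normalization the soft line break "=\r\n" is removed and "=3D" becomes "=", so the whole body survives decoding.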

Related

How to keep line breaks in a request body sent to a Spring Boot REST controller

I'm quite confused.
I'm sending a string via an HTTP POST request to a Spring Boot REST controller.
The string contains line breaks, represented by '\n'.
The controller consumes 'text/plain'.
The client is a Python program that reads the string from a field of a JSON object, which is read from a text file.
The json object in the text file looks like this:
{"content": "Multi \n line \n string"}
However, by the time they arrive at the REST controller, the newline characters have been escaped to '\\n', so they display as a literal '\n' inside the string.
I also tried sending a string that is not read from the text file; instead, I define it directly in a variable containing newlines:
import requests
# url is the controller's endpoint (not shown)
payload = "new \n line \n string"
headers = {'content-type': 'text/plain'}
r = requests.post(url, data=payload, headers=headers)
But '\n' is again escaped to '\\n', so it becomes a literal part of the string and displays as "new \n line \n string" instead of
"new
line
string".
If I send the string via Postman instead (with the text/plain header set), the newlines are correctly received as '\r\n' in the string that arrives at the controller.
So my specific question is: how do I send the string I read from the JSON field from Python to the REST controller so that Java/Spring Boot does not escape the '\n' character but interprets it as a newline?
Is there a way to send the string correctly from Python (my preferred option), or do I have to do some conversion/decoding on the Spring server side?
If the conversion has to happen on the Spring side, how can I make sure that a server I have no control over will interpret the newline characters correctly when my Python client posts to it?
Is there a standard for this, or does it depend on custom conversion on the server side?
The controller looks like this:
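import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
class SmoeController {
    // the path must be given as value = ... when other attributes are also set
    @PostMapping(value = "/some-endpoint", consumes = "text/plain", produces = "text/plain")
    String newEmployee(@RequestBody String text) {
        System.out.println(text);
        return text;
    }
}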
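Try splitting the received string on line breaks, e.g. in the controller: String[] lines = text.split("\\r?\\n"); The regex \r?\n matches both the \r\n line breaks Postman sends and the bare \n the Python client sends.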
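To narrow down where the escaping happens, it can help to check on the client side whether the string really contains newline characters or a literal backslash followed by n. A minimal stdlib-only sketch (no server involved; the JSON line is the one from the question). If content holds real newlines, a text/plain body carries them as raw newline bytes, so a '\\n' seen later would most likely come from how the text is serialized or displayed afterwards, not from this step.

```python
import json

# The line from the question's text file; inside a JSON string, \n is an escape sequence
raw_json = '{"content": "Multi \\n line \\n string"}'
content = json.loads(raw_json)["content"]

# json.loads turns each two-character escape into a real newline character
print(repr(content))  # 'Multi \n line \n string'
has_real_newline = "\n" in content
has_literal_backslash_n = "\\n" in content
print(has_real_newline, has_literal_backslash_n)  # True False

# This is what a text/plain body would contain on the wire (UTF-8 bytes)
body = content.encode("utf-8")

# Serializing the string as JSON again turns each newline back into backslash-n
reencoded = json.dumps(content)
print(reencoded)  # "Multi \n line \n string"
```

If the server shows backslash-n, one thing worth checking is whether the text passed through a JSON serialization step like the last one on its way there.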

Python Gmail API base64 decode: strange characters in email body

I'm using the Gmail API to retrieve emails from my inbox:
import base64

query = 'to:me after:{}'.format(weekStartDate)
unreadEmailsQuery = service.users().messages().list(userId='me', q=query).execute()
# For Each Email
for message in unreadEmailsQuery['messages']:
    result = service.users().messages().get(id=message['id'], userId='me').execute()
    email_content = ''
    if 'data' in result['payload']['body'].keys():
        email_content += result['payload']['body']['data']
    else:
        for part in result['payload']['parts']:
            email_content = part['body']['data'] + email_content
    test = bytes(str(email_content), encoding='utf-8')
    print(base64.decodebytes(test))
This prints simple plain-text messages correctly:
b'Got another one with me
But it prints HTML messages like this:
b'<body\x03B\x83B\x83B\x83B\x88\x08\x0f\x1bY]\x18H\x1a\x1d\x1d\x1c\x0bY\\]Z]\x8fH\x90\xdb\
I can see that it's okay until the first >; from then on, the string is printed incorrectly, and I'm not sure why.
I am trying to extract words out of my email so that I can train a classifier but I am stuck.
Any help would be greatly appreciated.
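I needed to use URL-safe base64 decoding. The Gmail API returns body data as base64url (the URL-safe alphabet from RFC 4648, which uses - and _ instead of + and /). base64.decodebytes silently discards characters outside the standard alphabet, so from the first - or _ onward the bits are misaligned and the output turns to garbage. In the HTML sample, the > of <body> falls exactly where base64url produces a -, which likely explains why the output breaks right there.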
I managed to get this working by changing the last line:
print(base64.decodebytes(test))
to:
print(base64.urlsafe_b64decode(test))
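A minimal sketch of the decoding step on its own, assuming data is a string shaped like the Gmail API's body['data'] field (the helper name decode_gmail_body is hypothetical). It re-adds '=' padding defensively in case a payload arrives without it, and decodes each part separately rather than concatenating the encoded strings first, since two independent base64 strings joined together are not in general valid base64.

```python
import base64


def decode_gmail_body(data: str) -> str:
    """Decode a base64url string such as a Gmail API body['data'] value (hypothetical helper)."""
    # Re-add '=' padding in case it was stripped; adds nothing if already padded
    padded = data + "=" * (-len(data) % 4)
    # urlsafe_b64decode expects '-' and '_' where standard base64 uses '+' and '/'
    return base64.urlsafe_b64decode(padded).decode("utf-8")


html = "<p>Got another one</p>"
raw = html.encode("utf-8")

# The '>' of "<p>" encodes to '+' in standard base64 but to '-' in base64url
standard = base64.b64encode(raw).decode("ascii")
urlsafe = base64.urlsafe_b64encode(raw).decode("ascii")
print(standard[:4], urlsafe[:4])  # PHA+ PHA-

print(decode_gmail_body(urlsafe))  # <p>Got another one</p>

# Decode each MIME part on its own instead of concatenating the encoded strings
parts = [urlsafe, base64.urlsafe_b64encode(b"second part").decode("ascii")]
print([decode_gmail_body(p) for p in parts])
```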

Mail is delivered without a subject when sent from a function in Python

I have some sample code like the following:
import smtplib

def send_mail(PASS,FAIL):
    me = "XXXX"
    you = "YYYY"
    print "Start of program"
    server = smtplib.SMTP('ZZZ', 25)
    total_testcase = "15/12"
    print total_testcase
    message = """From: From Person <XXXX>
    To: To Person <YYYY>
    Subject: mail testing
    %s
    """ %total_testcase
    print message
    server.sendmail(me, you, message)

send_mail(8,9)
When I send the email, it is delivered without the subject.
But if I run the same code directly instead of inside a function, it is delivered fine, with the subject. Is there anything I am missing with the function call? Please suggest.
The issue you're having is with the triple-quoted multi-line string. When you put it in your function, you're indenting all of its lines so that they line up with the rest of the code. However, this results in unnecessary (and inappropriate) spaces at the start of each line of the message after the first.
Leading spaces in the headers of an SMTP message indicate that the previous header should be continued. This means that all of your first three lines are combined into the From header.
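The folding is easy to observe with Python's own email parser, which applies the same continuation-line rule. A small Python 3 sketch that only parses strings (nothing is sent; the addresses are the question's placeholders):

```python
import email
import textwrap

# The message as the question's function builds it: every line after the first
# carries the four spaces of function-body indentation
indented = """From: From Person <XXXX>
    To: To Person <YYYY>
    Subject: mail testing
    15/12
    """

parsed = email.message_from_string(indented)
print(repr(parsed["Subject"]))  # None: no Subject header was found
print(repr(parsed["From"]))     # the To and Subject lines are folded into From

# With the indentation removed, each line is its own header again;
# the empty line separates the headers from the body
flush = textwrap.dedent("""\
    From: From Person <XXXX>
    To: To Person <YYYY>
    Subject: mail testing

    15/12
    """)
fixed = email.message_from_string(flush)
print(fixed["Subject"])  # mail testing
```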
You can fix this either by leaving out the leading spaces:
def send_mail(PASS,FAIL):
    #...
    # no indentation inside the string; the empty line separates headers from body
    message = """From: From Person <XXXX>
To: To Person <YYYY>
Subject: mail testing

%s
""" % total_testcase
    #...
Or by using \n instead of real newlines in your string:
message = "From: From Person <XXXX>\nTo: To Person <YYYY>\nSubject: mail testing\n\n%s" % total_testcase
Or finally, you could keep the current code for the generation of the message, but strip out the leading whitespace afterwards:
def send_mail(PASS,FAIL):
    #...
    # the empty line separates headers from body
    message = """From: From Person <XXXX>
    To: To Person <YYYY>
    Subject: mail testing

    %s
    """ % total_testcase
    # strip exactly four leading spaces from every indented line
    message = "\n".join(line if not line.startswith(" ") else line[4:]
                        for line in message.splitlines())
    #...
This last option is a bit fragile, as it may strip out desired whitespace from lines in your total_testcase string (if it had multiple lines), not only the spaces added due to the multi-line string. It also will break if you're using tabs for indentation, or really anything other than four spaces. I'm not sure I'd actually recommend this approach.
A better version of the last approach is to use the textwrap.dedent function from the standard library. It removes any indentation that is present at the start of every line in a string (but only the indentation that is common to all lines). This does require a small change to how you were creating message, as you need the first line to have the same leading spaces as all the rest (you'll also need to avoid adding any newlines without indentation in the extra text that comes from total_testcase).
Here's the code:
import textwrap

def send_mail(PASS,FAIL):
    #...
    # backslash after the quotes on the first line avoids an empty line at the start;
    # the empty line before %s separates headers from body
    message = """\
    From: From Person <XXXX>
    To: To Person <YYYY>
    Subject: mail testing

    %s
    """ % total_testcase
    message = textwrap.dedent(message)
    #...
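On Python 3, a different technique sidesteps the indentation problem entirely: build the message with email.message.EmailMessage instead of formatting the headers by hand, so the library writes each header on its own line and puts a blank line before the body. A minimal sketch; the example.com addresses are hypothetical placeholders, "ZZZ" is the question's placeholder host, and send_mail is defined but not called here.

```python
import smtplib
from email.message import EmailMessage


def build_message(total_testcase: str) -> EmailMessage:
    msg = EmailMessage()
    # Hypothetical placeholder addresses; substitute real ones
    msg["From"] = "From Person <from@example.com>"
    msg["To"] = "To Person <to@example.com>"
    msg["Subject"] = "mail testing"
    # When serialized, the headers are followed by a blank line and then this body
    msg.set_content(total_testcase)
    return msg


def send_mail(msg: EmailMessage, host: str = "ZZZ", port: int = 25) -> None:
    # send_message takes the sender and recipients from the From/To headers
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)


msg = build_message("15/12")
print(msg.as_string())
```

Because no header text is typed inside an indented string literal, there is nothing for the function's indentation to leak into.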

curl POST request fails in the presence of special characters

OK, I know there are many questions on this topic already, but reading them hasn't helped me solve my problem.
I have " hello'© " on my webpage. My objective is to get this content as JSON, strip the "hello", and write the remaining content, i.e. "'©", back to the page.
I am using a curl POST request to write back to the webpage. My code for getting the JSON is as follows:
import base64, json, urllib2
request = urllib2.Request("http://XXXXXXXX.json")
user = 'xxx'
# b64encode, unlike encodestring, does not append a newline that would break the header
base64string = base64.b64encode('%s:%s' % (xxx, xxx))
request.add_header("Authorization", "Basic %s" % base64string)
result = urllib2.urlopen(request) #send URL request
newjson = json.loads(result.read().decode('utf-8'))
At this point, the decoded text is a unicode string. I discovered that my curl POST request works only with percent-encoding (like "%A3" for £).
What is the best way to do this? The code I wrote is as follows:
encode_dict = {'!':'%21',
    '"':'%22',
    '#':'%23',
    '$':'%24',
    '&':'%26',
    '*':'%2A',
    '+':'%2B',
    '@':'%40',
    '^':'%5E',
    '`':'%60',
    '©':'%A9',
    '®':'%AE',
    '™':'%99',
    '£':'%A3'
    }
for letter in text1:
    print (letter)
    for keyz, valz in encode_dict.iteritems():
        if letter == keyz:
            print(text1.replace(letter, valz))
path = "xxxx"
# -H marks the header; without it curl treats 'Content-Type: text/html' as a URL
subprocess.Popen(['curl', '-u', 'xxx:xxx', '-H', 'Content-Type: text/html', '-X', 'POST', '--data', "text=" + text1, path])
This code gives me the warning "UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal" on the line if letter == keyz.
Is there a better way to do this?
The problem was with the encoding. result.read() returns a byte string, which needs to be decoded to unicode with the decode() function before parsing. Then I replaced all non-ASCII characters with XML character references (e.g. © becomes &#169;) by encoding the unicode string as ASCII with encode('ascii', 'xmlcharrefreplace').
newjson = json.loads(result.read().decode('utf-8').encode("ascii","xmlcharrefreplace"))
Also, learning unicode basics helped me a great deal! This is an excellent tutorial.
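For the percent-encoding itself, the standard library can replace the hand-written table. A minimal Python 3 sketch (Python 2 has roughly corresponding functions in urllib.quote and urllib.urlencode, without the encoding argument); it only builds strings and does not call curl. Alternatively, curl can do the encoding itself with its --data-urlencode option.

```python
from urllib.parse import quote, urlencode

text = "'©"

# Percent-encode everything outside the unreserved set; UTF-8 is the default encoding
print(quote(text, safe=""))  # %27%C2%A9

# The hand-written table in the question used Latin-1 byte values (e.g. %A3 for £);
# quote() takes an encoding argument for that
print(quote("£", safe="", encoding="latin-1"))  # %A3

# Build an application/x-www-form-urlencoded body such as "text=..." in one call
print(urlencode({"text": text}))  # text=%27%C2%A9

# The accepted answer's approach: XML character references for non-ASCII characters
print(text.encode("ascii", "xmlcharrefreplace"))  # b"'&#169;"
```

Which of the two encodings the receiving page expects (UTF-8 or Latin-1 bytes) depends on the server, so that choice is an assumption to check.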

Microsoft Translator gives HTTP Error 400 only with certain strings

I am translating a large number of strings and using urllib2 to send requests to the API. My program runs fine, but I always get HTTP Error 400 when translating certain particular strings. Each request I make is identical except for the text parameter, so I think the text must somehow be making the request malformed. Here are two strings that I know always cause this error:
@monaeltahawy hatla2eh fel jym bytmrn wla 3arf en fe 7aga bt7sl :d
and
@yaratambash ta3aly a3zemek 3l fetar
for example.
I know for certain that the error isn't caused by the "@" character, or by the fact that "@" is at the front of the string; the API has processed strings with these attributes just fine before.
Nor is it the nonsense words in these strings, because the API has processed nonsense words fine before as well: it just returns the same string that I sent to it.
Here is the code where the error seems to be coming from:
tweet = tweet.encode("utf-8")
to = "en"
translate_params = { 'text' : tweet, 'to' : to }
request = urllib2.Request('http://api.microsofttranslator.com/v2/Http.svc/Translate?' + urllib.urlencode(translate_params))
request.add_header('Authorization', 'Bearer '+ self.access_token)
response = urllib2.urlopen(request)
# Removes XML tags to return only the translated text
response_text = ET.fromstring(response.read())
response_text = ET.tostring(response_text, encoding = 'utf8', method = 'text')
return response_text
I am running Python 2.7 in Eclipse 4.3.2.
Any insight or suggestions would be very much appreciated.
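When debugging a 400 like this, it can help to look at the exact query string being sent. A minimal Python 3 sketch with urllib.parse (in Python 2.7, urllib.urlencode plays the same role; the quote_via argument exists only in Python 3.5+). It uses the first string from the question without its leading handle, which the asker already ruled out; it only builds the query string, does not contact the API, and does not by itself explain the error.

```python
from urllib.parse import quote, urlencode

params = {"text": "hatla2eh fel jym bytmrn wla 3arf en fe 7aga bt7sl :d", "to": "en"}

# Default: quote_plus, so spaces become '+' and ':' becomes %3A
default_query = urlencode(params)
print(default_query)

# Alternative: percent-encode spaces as %20 instead of '+'
percent20_query = urlencode(params, quote_via=quote)
print(percent20_query)
```

Sending the %20 form is one cheap experiment, on the unverified assumption that the server treats '+' in the query string differently.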
