Issue inserting XML element with ElementTree when characters are too long

I'm having an issue with inserting text into a specific element in an XML tree.
My goal is to take an image, convert it to base64, then insert the base64 string into an element.
Below is my current code:
with open("t.png", "rb") as imageFile:
    str = base64.b64encode(imageFile.read())
tree=ElementTree()
tree = ET.parse('image-template.xml')
root = tree.getroot()
for z in root.iter('body'):
    z.text=(str)
tree.write('new_branding.xml')
If I insert a shorter string, the code seems to work properly. When I try inserting the long base64 string, I get the following error:
"cannot serialize %r (type %s)" % (text, type(text).__name__)
Is there something I need to add to my for loop to insert longer strings?

As it stands, your str variable is actually of type bytes. You can test this by putting a print(type(str)) statement after you declare str. It will print <class 'bytes'>. ElementTree can only serialize text, which is why tree.write() fails; the length of the value has nothing to do with it.
To get it working, first change the name of your str variable so that you aren't overwriting the built-in str type.
If you change it to, say, imageString, you can then declare it as imageString = base64.b64encode(imageFile.read()).decode('ascii'). Note that calling str() on a bytes object in Python 3 does not decode it: it returns the printable representation, b'...' prefix and quotes included, and that is what would end up in the XML. Decoding gives the plain base64 text.
Anyway, that should work.
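A minimal sketch of the fix, assuming Python 3. The in-memory bytes and the one-element template are stand-ins for t.png and image-template.xml (whose real layout is not shown in the question), so the example runs without any files:

```python
import base64
import xml.etree.ElementTree as ET

# Stand-in for the bytes read from t.png (assumption: any bytes behave the same).
image_bytes = b"\x89PNG\r\n\x1a\n" + bytes(range(256))

# b64encode returns bytes; decode to str before assigning it to .text.
image_string = base64.b64encode(image_bytes).decode("ascii")

# Stand-in for image-template.xml (hypothetical layout with a <body> element).
root = ET.fromstring("<template><body/></template>")
for body in root.iter("body"):
    body.text = image_string

# Serializing now succeeds because the element text is a str.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text[:60])
```

With real files, the same two changes apply: decode the base64 result, and avoid naming the variable str.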

Related

How to keep encoded value from Python3 encode() method into a string

I have this code :
toto = 'récépissé.pdf'.encode('utf-8')
print(toto)
=> b'r\xc3\xa9c\xc3\xa9piss\xc3\xa9.pdf'
Of course it returns a bytes object, but I still want a string with the content of the encoded result.
I want a string like this:
'r\xc3\xa9c\xc3\xa9piss\xc3\xa9.pdf'
If I try to get a string by using str() or decode(), it reverts to the initial value.
PS: the final purpose is to pass string data in a header for the Dropbox API.

Probably this is what you are looking for:
toto = 'récépissé.pdf'.encode('utf-8')
print(str(toto).split("'")[1])
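Two alternative sketches, since "a string with the content of the encoded result" can mean two different things. The first keeps the escaped text that print() shows; the second maps every byte to one character. Which of the two a given HTTP API expects in a header is an assumption to check against that API's documentation:

```python
toto = 'récépissé.pdf'.encode('utf-8')

# Option 1: the escaped text as shown by print(), without the b'...' wrapper.
# Slicing does not depend on which quote character repr() picks.
escaped = repr(toto)[2:-1]

# Option 2: one character per byte (latin-1 maps bytes 0-255 to code points 0-255).
per_byte = toto.decode('latin-1')

print(escaped)
print(len(toto), len(per_byte), len(escaped))
```

Option 2 is reversible with per_byte.encode('latin-1'), which gives back the original UTF-8 bytes.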

hex header of file, magic numbers, python

I have a program in Python which analyses file headers and decides which file type it is. (https://github.com/LeoGSA/Browser-Cache-Grabber)
The problem is the following:
I read first 24 bytes of a file:
with open (from_folder+"/"+i, "rb") as myfile:
    header=str(myfile.read(24))
then I look for pattern in it:
if y[1] in header:
    shutil.move (from_folder+"/"+i,to_folder+y[2]+i+y[3])
where y = ['/video', r'\x47\x40\x00', '/video/', '.ts']
y[1] is the pattern and = r'\x47\x40\x00'
The file contains these bytes (the original post showed them in a hex-editor screenshot).
The program does NOT find this pattern (r'\x47\x40\x00') in the file header.
So I tried to print header, and the first two bytes came out as plain characters.
You see? Python shows them as 'G@' instead of '\x47\x40',
and if I search for 'G@'+r'\x00' in header, everything is OK. It finds it.
Question: What am I doing wrong? I want to look for r'\x47\x40\x00' and find it, not for some strange 'G@'+r'\x00'.
OR
Why does Python show the first two bytes as 'G@' and not as '\x47\x40', though it shows the rest of the header in hex? Is there a way to fix it?

with open (from_folder+"/"+i, "rb") as myfile:
    header=myfile.read(24)
header = str(binascii.hexlify(header))[2:-1]
The result I get is:
4740001b0000b00d0001c100000001efff3690e23dffffff
And I can work with it.
P.S. But anyway, if anybody can explain what the problem was with the first 2 bytes, I would be grateful.

In Python 3 you'll get bytes from a binary read, rather than a string.
No need to convert it to a string by str.
Print will try to convert bytes to something human readable.
If you don't want that, convert your bytes to e.g. hex representations of the integer values of the bytes by:
aBytes = b'\x00\x47\x40\x00\x13\x00\x00\xb0'
print (aBytes)
print (''.join ([hex (aByte) for aByte in aBytes]))
Output as redirected from the console:
b'\x00G@\x00\x13\x00\x00\xb0'
0x00x470x400x00x130x00x00xb0
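You can search in aBytes directly with the in operator, but only with a bytes pattern such as b'\x47\x40\x00'. Your pattern r'\x47\x40\x00' is a raw str holding twelve literal characters (backslash, x, 4, 7, and so on), and str(header) is the printable representation of the bytes, in which printable values such as 0x47 and 0x40 show up as G and @. That is why the text search misses.
If you do want to apply a string search on r'\x00\x47\x40', build a string with a fixed-length hex escape for every byte:
aBytes = b'\x00\x47\x40\x00\x13\x00\x00\xb0'
print (aBytes)
print (r'\x'.join ([''] + ['%0.2x'%aByte for aByte in aBytes]))
Which will give you:
b'\x00G@\x00\x13\x00\x00\xb0'
\x00\x47\x40\x00\x13\x00\x00\xb0
So there's a number of separate issues at play here:
print (like str) shows bytes in a human-readable form, so the printable bytes 0x47 and 0x40 appear as G and @ while the others stay as \x escapes.
A raw str pattern never matches that representation. Either search the bytes object with a bytes pattern, or convert the header to a string of fixed-length hex representations, as shown.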
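For comparison, a short sketch (assuming Python 3) of searching the raw header with a bytes pattern, which the in operator accepts on a bytes object. The header value is the 24-byte hex dump quoted in the question; the variable names are illustrative:

```python
# The 24 header bytes from the question, rebuilt from their hex dump.
header = bytes.fromhex("4740001b0000b00d0001c100000001efff3690e23dffffff")

# A bytes pattern (note the b prefix and no r prefix).
pattern = b"\x47\x40\x00"

found = pattern in header               # substring search on bytes
at_start = header.startswith(pattern)   # magic numbers usually sit at offset 0

# The raw str pattern fails against the printable representation,
# because 0x47 and 0x40 are rendered there as 'G' and '@'.
raw_pattern_found = r"\x47\x40\x00" in str(header)

print(found, at_start, raw_pattern_found)
```

Keeping both the header and the pattern as bytes avoids the str() conversion entirely, so no hex formatting step is needed.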

Python: String indices must be integers

I get this string from stdin.
{u'trades': [Custom(time=1418854520, sn=47998, timestamp=1418854517,
price=322, amount=0.269664, tid=48106793, type=u'ask',
start=1418847319, end=1418847320), Custom(time=1418854520, sn=47997,
timestamp=1418854517, price=322, amount=0.1, tid=48106794,
type=u'ask', start=1418847319, end=1418847320),
Custom(time=1418854520, sn=47996, timestamp=1418854517, price=321.596,
amount=0.011, tid=48106795, type=u'ask', start=1418847319,
end=1418847320)]}
My program fails when I try to access jsonload["trades"]. If I use jsonload[0] I only receive one character: {.
I checked that reading the text from stdin is not the problem, but I don't know whether the problem is the format of the data I receive (I used the Incursion library) or my Python code. I have tried many combinations of json.load/json.loads and json.dump/json.dumps, without success.
inputdata = sys.stdin.read()
jsondump = json.dumps(inputdata)
jsonload = json.loads(jsondump)
print jsonload
print type(jsonload) # return me "<type 'unicode'>"
print repr(jsonload) # return me same but with u" ..same string.... "
for row in jsonload["trades"]: # error here: TypeError: string indices must be integers

You read input data into a string. This is then turned into a JSON encoded string by json.dumps. You then turn it back into a plain string using json.loads. You have not interpreted the original data as JSON at any point.
Try just converting the input data from json:
inputdata = sys.stdin.read()
jsonload = json.loads(inputdata)
However this will not work because you have not got valid JSON data in your snippet. It looks like serialized python code. You can check the input data using http://jsonlint.com
The use of u'trades' shows me that you have a unicode python string. The JSON equivalent would be "trades". To convert the python code you can eval it, but this is a dangerous operation if the data comes from an untrusted source.
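The round trip described above can be sketched as follows (Python 3 syntax). The JSON text is a hypothetical, valid stand-in for the input, not the data from the question:

```python
import json

# Hypothetical valid JSON text standing in for what arrives on stdin.
text = '{"trades": [{"price": 322, "type": "ask"}]}'

# dumps() wraps the text in a JSON string; loads() unwraps it again.
# The result is the same str as before, so indexing it with "trades" fails.
round_trip = json.loads(json.dumps(text))

# Parsing the text directly yields a dict.
parsed = json.loads(text)

print(type(round_trip).__name__, type(parsed).__name__)
print(parsed["trades"][0]["price"])
```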

How to Read Reg_Binary type values in string format from Registry in python

from winreg import *
import binascii
aReg = ConnectRegistry(None,HKEY_CURRENT_USER)
aKey = OpenKey(aReg, r"Software\Microsoft\Windows\CurrentVersion\Explorer\ComDlg32\CIDSizeMRU")
for i in range(1024):
    try:
        name,value,type = EnumValue(aKey,i)
        print(value)
        print("\n")
    except EnvironmentError:
        break
CloseKey(aKey)
The OUTPUT looks like this (shown for only one value):
b'v\x00l\x00c\x00.\x00e\x00x\x00e\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0
0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf0\x00\x00\x00\x85\x00\x00\x00\x10\x03\x00\x00<\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf8\x00\x00\x00\xa4\x00\x00\x00i\x03\x00\x00\x84\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00'
How can I convert it to a string or text?

These registry entries begin with the name of a recently used program file. I found a way to extract that name, but the entries end with some more data, and I don't know what that data is.
The characters are separated by null bytes (b'\x00'), which is how ASCII text looks when stored as UTF-16. So we have to extract every second byte (I suppose the data is in value):
value[::2]
Then we need to find the first null byte to terminate the string there (otherwise the decoding fails, because of the other data at the end):
value[::2].find(b'\x00')
The found index can be used to get the part before it, and then we call the decode() method to make a string out of the bytes object (decode() defaults to utf-8 encoding):
value[::2][:value[::2].find(b'\x00')].decode()
Using your example, it would look like the following. The input is your example OUTPUT. After applying value[::2] it looks like this:
b'vlc.exe\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf0\x00\x85\x00\x10\x00<\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf8\x00\xa4\x00i\x00\x84\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00'
And in the end you will get this:
'vlc.exe'
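An alternative sketch, assuming the value starts with null-terminated UTF-16-LE text (the usual string encoding in the Windows registry). Unlike value[::2], it also handles names with non-ASCII characters. The helper name first_utf16_string and the sample bytes are illustrative, not part of winreg:

```python
def first_utf16_string(value: bytes) -> str:
    """Return the text before the first UTF-16 null terminator in value."""
    # Step through 2-byte code units so that the zero high byte of an ASCII
    # character (e.g. 'v' is stored as 76 00) is not taken for the terminator.
    for i in range(0, len(value) - 1, 2):
        if value[i:i + 2] == b"\x00\x00":
            return value[:i].decode("utf-16-le")
    # No terminator found: decode the whole even-length prefix.
    return value[:len(value) - len(value) % 2].decode("utf-16-le")


# Stand-in for one registry value: a name, null padding, then trailing binary data.
sample = "vlc.exe".encode("utf-16-le") + b"\x00" * 16 + b"\xf0\x00\x00\x00\x85\x00\x00\x00"
name = first_utf16_string(sample)
print(name)
```

In the loop from the question, this would be called as first_utf16_string(value) for each enumerated value.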

How do you convert string to bytes

I am using python v2.7.3 and am trying to get a conversion to work but am having some issues.
This is code that works the way I would like it to:
testString = "\x00\x13\xA2\x00\x40\xAA\x15\x47"
print 'Test String:',testString
This produces the following result
Test String: ¢@ªG
Now I load the same string as above along with some other data:
\x00\x13\xA2\x00\x40\xAA\x15\x47123456
into a SQLite3 database and then pull it from the database as such:
cur.execute('select datafield from databasetable')
rows = cur.fetchall()
if len(rows) == 0:
    print 'Sorry Found Nothing!'
else:
    print row[0][:32]
This however produces the following result:
\x00\x13\xA2\x00\x40\xAA\x15\x47
I can not figure out how to convert the database stored string to the bytes string, if that is what it is, as the first snippet of code does. I actually need it to load into a variable in that format so I can pass it to a function for further processing.
The following I have tried:
print "My Addy:",bytes(row[0][:32])
print '{0}'.format(row[0][:32])
...
They all produce the same results...
Please
First, can anyone tell me what format the first results are in? I think it's bytes format but am not sure.
Second, How can I convert the database stored text into
Any help and I would be eternally grateful.
Thanks in advance,
Ed

The problem is that you're not storing the value in the database properly. You want to store a sequence of bytes, but you're storing an escaped version of those bytes instead.
When entering string literals into a programming language, you can use escape codes in your source code to access non-printing characters. That's what you've done in your first example:
testString = "\x00\x13\xA2\x00\x40\xAA\x15\x47"
print 'Test String:',testString
But this is processing done by the Python interpreter as it's reading through your program and executing it.
Change the database column to a binary blob instead of a string, then go back to the code you're using to store the bytes in SQLite3, and have it store the actual bytes ('ABC', 3 bytes) instead of an escaped string ('\x41\x42\x43', 12 characters).
If you truly need to store the escaped string in SQLite3 and convert it at run-time, you might be able to use ast.literal_eval() to evaluate it as a string literal.
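A minimal sketch of both options, written for Python 3 rather than the 2.7.3 used in the question (an assumption). The table and column names come from the question; the in-memory database and the variable names are illustrative:

```python
import sqlite3

# The escaped text as it currently sits in the database: 32 characters.
escaped = r"\x00\x13\xA2\x00\x40\xAA\x15\x47"

# Run-time conversion: interpret the \xNN escapes, then map each resulting
# character (code points 0-255) back to a single byte.
raw = escaped.encode("ascii").decode("unicode_escape").encode("latin-1")

# Preferred fix: store the real bytes in a BLOB column instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE databasetable (datafield BLOB)")
conn.execute("INSERT INTO databasetable (datafield) VALUES (?)", (raw,))
stored = conn.execute("SELECT datafield FROM databasetable").fetchone()[0]
conn.close()

print(len(escaped), len(raw), stored == raw)
```

Passing the value as a query parameter lets sqlite3 store the bytes unchanged, so they come back as bytes with no conversion step.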
