json.loads with a unicode string - python

I've got a large JSON-formatted string that I'm trying to convert into a Python dictionary, but all of the keys and values are unicode, so they have a leading u in the string. When I call json.loads(), it fails with ValueError: Expecting property name: line 1 column 2 (char 1) because of the u.
I have:
x = "{u'abc': [{u'xyz': u'XYZ'}, {u'lmno': u'LMNO'}], u'def': u'DEF'}"
json.loads(x)  # --> ValueError
I want:
x = "{u'abc': [{u'xyz': u'XYZ'}, {u'lmno': u'LMNO'}], u'def': u'DEF'}"
z = x.strip_unicode()
r = json.loads(z)
# r = {'abc': [{'xyz':'XYZ'}, {'lmno': 'LMNO'}], 'def': 'DEF'}
So is there something like strip_unicode, or maybe a different function in json that can handle the leading u?
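One possible approach, sketched under the assumption that the string is really the repr of a Python dict (single quotes, u prefixes) rather than JSON: the standard library's ast.literal_eval parses Python literals, including the u prefix, without executing code. The sample string is the one from the question; the variable name as_json is only illustrative.

```python
import ast
import json

# The string from the question: a Python dict repr, not valid JSON.
x = "{u'abc': [{u'xyz': u'XYZ'}, {u'lmno': u'LMNO'}], u'def': u'DEF'}"

# literal_eval accepts only Python literals, so it does not run arbitrary code.
r = ast.literal_eval(x)

# If real JSON text is needed afterwards, json.dumps can produce it.
as_json = json.dumps(r, sort_keys=True)
print(as_json)
```

If the data source can be changed, having it emit JSON with json.dumps in the first place is probably the cleaner fix.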

Related

Unknown pdf encoding from JSON response

I have an API that returns a PDF inside a JSON response, but the data comes back as a long string of integers like the following:
[{"status":"SUCCESS"},{"data":"37,80,68,70,45,49,46,52,10,37,-45,-21,-23,-31,10,49,32,48,32,111,98,106,10,60,60,47,84,105,116,108,101,32,40,49,49,32,67,83,45,73,73,32,32,83,117,98,106,101,99,116,105,118,101,32,81,46,...
...,1,32,49,55,10,47,82,111,111,116,32,56,32,48,32,82,10,47,73,110,102,111,32,49,32,48,32,82,62,62,10,115,116,97,114,116,120,114,101,102,10,54,55,54,56,53,10,37,37,69,79,70"}
My questions are:
What is this encoding?
How can I convert this into a PDF using Python?
P.S: Here is the endpoint to get the full response.
The beginning of the data is a hint that you actually have a list of the byte values of the PDF file: it starts with the byte values of '%PDF-1.4'.
So you must first extract that curious string:
data = json_data[1]['data']
to have:
"37,80,68,70,45,49,46,52,10,37,-45,-21,-23,-31,10,49,32,48,32,111,98,106,10,60,60,47,84,105,116,108,101,32,40,49,49,32,67,83,45,73,73,32,32,83,117,98,106,101,99,116,105,118,101,32,81,46, ..."
Convert it to a list of ints first, then to a byte string (the expression i if i >= 0 else i + 256 maps the negative, signed values back into the 0-255 range):
intlist = [int(i) for i in data.split(",")]
b = bytes(i if i >=0 else i+256 for i in intlist)
to get b'%PDF-1.4\n%\xd3\xeb\xe9\xe1\n1 0 obj\n<</Title (11 CS-II Subjective Q...'
And finally save that to a file:
with open('file.pdf', 'wb') as fd:
    fd.write(b)
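The steps above can be put together roughly as follows. This is a minimal sketch: the helper name signed_csv_to_bytes and the shortened response_text are stand-ins invented for illustration (only the first few byte values from the question are used), and it assumes every value lies between -128 and 255.

```python
import json

def signed_csv_to_bytes(csv_text):
    """Turn a comma-separated string of signed byte values into bytes."""
    # value % 256 maps -128..-1 onto 128..255 and leaves 0..255 unchanged.
    return bytes(int(part) % 256 for part in csv_text.split(","))

# Shortened stand-in for the API response (hypothetical, truncated data).
response_text = (
    '[{"status":"SUCCESS"},'
    '{"data":"37,80,68,70,45,49,46,52,10,37,-45,-21,-23,-31,10"}]'
)
json_data = json.loads(response_text)
pdf_bytes = signed_csv_to_bytes(json_data[1]["data"])
print(pdf_bytes)
```

With the full response, pdf_bytes could then be written out with open('file.pdf', 'wb') as in the answer.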

Python Print Hex variable

I have a hex variable that I want to print as hex:
data = '\x99\x02'
print (data)
Result is: ™
I want Python to print 0x9902.
Thank you for your help
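Try this. Note that it works on a raw string, where data holds the literal text \x99\x02 rather than the two characters that escape sequence stands for: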
data = r'\x99\x02'
a, b = [ x for x in data.split(r'\x') if x]
d = int(a+b, base=16)
print('%#x'%d)
You have to convert every char to its number with ord(char), format every number as a two-digit hex value with '{:02x}'.format(), join these values into one string, and prepend '0x'.
data = '\x99\x02'
print('0x' + ''.join('{:02x}'.format(ord(char)) for char in data))
EDIT: The same, but the string is first converted to bytes using encode('raw_unicode_escape'):
data = '\x99\x02'
print('0x' + ''.join('{:02x}'.format(code) for code in data.encode('raw_unicode_escape')))
And if you already have bytes, you don't have to encode():
data = b'\x99\x02'
print('0x' + ''.join('{:02x}'.format(code) for code in data))
BTW: In a similar way you can convert to binary using {:08b}:
data = '\x99\x02'
print(''.join('{:08b}'.format(code) for code in data.encode('raw_unicode_escape')))
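As a shorter variant of the conversions above, bytes objects have a built-in hex() method (Python 3.5+). The sketch below wraps it in a helper; the name to_hex_literal is made up for illustration, and the str branch assumes every character has a code point below 256.

```python
def to_hex_literal(data):
    """Format bytes, or a str with code points below 256, like 0x9902."""
    if isinstance(data, str):
        # latin-1 maps code points 0..255 one-to-one onto byte values.
        data = data.encode("latin-1")
    return "0x" + data.hex()

print(to_hex_literal("\x99\x02"))
print(to_hex_literal(b"\x99\x02"))
```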

error in splitting a string using "re.findall"

I need to split a string into three values (x, y, z). The string looks something like this: (48,25,19)
I used re.findall and it works fine, but sometimes it produces this error:
plane_X, plane_Y, plane_Z = re.findall("\d+.\d+", planepos)
ValueError: not enough values to unpack (expected 3, got 0)
This is the code:
def read_data():
    # reading from file
    file = open("D:/Cs/Grad/Tests/airplane test/Reading/Positions/PlanePos.txt", "r")
    planepos = file.readline()
    file.close()
    file = open("D:/Cs/Grad/Tests/airplane test/Reading/Positions/AirportPosition.txt", "r")
    airportpos = file.readline()
    file.close()
    # ==================================================================
    # splitting and getting numbers
    plane_X, plane_Y, plane_Z = re.findall("\d+\.\d+", planepos)
    airport_X, airport_Y, airport_Z = re.findall("\d+\.\d+", airportpos)
    return plane_X,plane_Y,plane_Z,airport_X,airport_Y,airport_Z
What I need is to split the string (48,25,19) into x=48, y=25, z=19.
If someone knows a better way to do this, or how to solve this error, it would be appreciated.
Your regex only works for numbers with a decimal point and not for integers, hence the error. You can instead strip the string of parentheses and white spaces, then split the string by commas, and map the resulting sequence of strings to the float constructor:
x, y, z = map(float, planepos.strip('() \n').split(','))
You can use ast.literal_eval which safely evaluates your string:
import ast
s = '(48,25,19)'
x, y, z = ast.literal_eval(s)
# x => 48
# y => 25
# z => 19
If your numbers are integers, you can use the regex:
re.findall(r"\d+","(48,25,19)")
['48', '25', '19']
If there are mixed numbers:
re.findall(r"\d+(?:\.\d+)?","(48.2,25,19.1)")
['48.2', '25', '19.1']
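Building on the mixed-number regex, a small helper can check the count before unpacking, so a malformed or empty line produces a readable error instead of the bare unpacking failure. This is only a sketch: the name parse_position is invented here, and it assumes unsigned numbers as in the question.

```python
import re

# Matches integers and decimals, e.g. 48 or 48.2 (unsigned, as in the question).
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def parse_position(line):
    """Return (x, y, z) as floats, or raise ValueError naming the bad line."""
    numbers = NUMBER.findall(line)
    if len(numbers) != 3:
        raise ValueError("expected 3 numbers, got %d in %r" % (len(numbers), line))
    x, y, z = map(float, numbers)
    return x, y, z

print(parse_position("(48,25,19)\n"))
```

An empty line, which is what readline() returns at end of file, would then fail with a message that shows the offending input.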

base64decode a string like "b'Mw=='" (containing literal b' substring)

I encoded a comma-delimited list of ids (e.g. "1,2,3") to base64, and the data returned from the form looks like x below.
I tried decoding and encoding and all sorts of things, but nothing seems to return the original string.
x = "b'Mw=='"
base64.b64decode(x)
# b'l\xcc'
x.decode()
# AttributeError: 'str' object has no attribute 'decode'
y = x.encode('utf-8')
print(y)
# b"b'Mw=='"
What am I missing?
If you have b'...' in your data, that's the repr()esentation of a bytestring.
If you can't get your data source to fix their content (it should just be Mw==: what they're giving you isn't valid base64 encoding!), you can use ast.literal_eval() to read it into a bytestring:
>>> import ast, base64
>>> x = "b'Mw=='"
>>> base64.b64decode(ast.literal_eval(x))
b'3'
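A small wrapper along the same lines might look like the sketch below. The function name decode_repr_base64 is hypothetical, and it assumes the decoded payload is UTF-8 text, as with the comma-delimited ids in the question.

```python
import ast
import base64

def decode_repr_base64(text):
    """Decode base64 that may arrive wrapped as the repr of a bytes object."""
    if text.startswith(("b'", 'b"')):
        # literal_eval turns the text "b'Mw=='" into the bytes b'Mw=='.
        payload = ast.literal_eval(text)
    else:
        payload = text
    return base64.b64decode(payload).decode("utf-8")

print(decode_repr_base64("b'Mw=='"))
```

The wrapped form typically appears when str() is called on the bytes returned by base64.b64encode; calling .decode('ascii') on the encoded value before sending it would avoid the problem at the source.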

Unpacking a struct ending with an ASCIIZ string

I am trying to use struct.unpack() to take apart a data record that ends with an ASCII string.
The record (it happens to be a TomTom ov2 record) has this format (stored little-endian):
1 byte
4 byte int for total record size (including this field)
4 byte int
4 byte int
variable-length string, null-terminated
unpack() requires that the string's length be included in the format you pass it. I can use the second field and the known size of the rest of the record -- 13 bytes -- to get the string length:
str_len = struct.unpack("<xi", record[:5])[0] - 13
fmt = "<biii{0}s".format(str_len)
then proceed with the full unpacking, but since the string is null-terminated, I really wish unpack() would do it for me. It'd also be nice to have this should I run across a struct that doesn't include its own size.
How can I make that happen?
I made two new functions that should be useable as drop-in replacements for the standard pack and unpack functions. They both support the 'z' character to pack/unpack an ASCIIZ string. There are no restrictions to the location or number of occurrences of the 'z' character in the format string:
import struct

def unpack(format, buffer):
    # Replace each 'z' with '<n>sx': an n-byte string plus a pad byte for the NUL
    while True:
        pos = format.find('z')
        if pos < 0:
            break
        asciiz_start = struct.calcsize(format[:pos])
        # b'\0' so that the search also works on Python 3 bytes
        asciiz_len = buffer[asciiz_start:].find(b'\0')
        format = '%s%dsx%s' % (format[:pos], asciiz_len, format[pos+1:])
    return struct.unpack(format, buffer)

def pack(format, *args):
    new_format = ''
    arg_number = 0
    for c in format:
        if c == 'z':
            # One extra byte: struct pads the string with a trailing NUL
            new_format += '%ds' % (len(args[arg_number]) + 1)
            arg_number += 1
        else:
            new_format += c
            if c in 'cbB?hHiIlLqQfdspP':
                arg_number += 1
    return struct.pack(new_format, *args)
Here's an example of how to use them (on Python 3 the strings must be bytes):
>>> from struct_z import pack, unpack
>>> line = pack('<izizi', 1, b'Hello', 2, b' world!', 3)
>>> print(line.hex())
0100000048656c6c6f000200000020776f726c64210003000000
>>> print(unpack('<izizi', line))
(1, b'Hello', 2, b' world!', 3)
The size-less record is fairly easy to handle, actually, since struct.calcsize() will tell you the length it expects. You can use that and the actual length of the data to construct a new format string for unpack() that includes the correct string length.
This function is just a wrapper for unpack(), allowing a new format character in the last position that will drop the terminal NUL:
import struct

def unpack_with_final_asciiz(fmt, dat):
    """
    Unpack binary data, handling a null-terminated string at the end
    (and only at the end) automatically.
    The first argument, fmt, is a struct.unpack() format string with the
    following modifications:
    If fmt's last character is 'z', the returned string will drop the NUL.
    If it is 's' with no length, the string including NUL will be returned.
    If it is 's' with a length, behavior is identical to normal unpack().
    """
    # Just pass on if no special behavior is required
    if fmt[-1] not in ('z', 's') or (fmt[-1] == 's' and fmt[-2].isdigit()):
        return struct.unpack(fmt, dat)
    # Use format string to get size of contained string and rest of record
    non_str_len = struct.calcsize(fmt[:-1])
    str_len = len(dat) - non_str_len
    # Set up new format string
    # If passed 'z', treat terminating NUL as a "pad byte"
    if fmt[-1] == 'z':
        str_fmt = "{0}sx".format(str_len - 1)
    else:
        str_fmt = "{0}s".format(str_len)
    new_fmt = fmt[:-1] + str_fmt
    return struct.unpack(new_fmt, dat)
>>> dat = b'\x02\x1e\x00\x00\x00z\x8eJ\x00\xb1\x7f\x03\x00Down by the river\x00'
>>> unpack_with_final_asciiz("<biiiz", dat)
(2, 30, 4886138, 229297, b'Down by the river')
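For comparison, the same record can also be taken apart without building a new format string: unpack the fixed-size fields with a precompiled struct.Struct, then slice the string up to its first NUL. This is only a sketch of that alternative, not the method from either answer; HEADER and unpack_record are names made up here, and the field layout is the one described in the question.

```python
import struct

HEADER = struct.Struct("<biii")  # fixed-size part of the record: 13 bytes

def unpack_record(record):
    """Unpack the fixed fields, then read the string up to its first NUL."""
    fields = HEADER.unpack_from(record, 0)
    # bytes.index raises ValueError if the terminating NUL is missing.
    end = record.index(b"\0", HEADER.size)
    return fields + (record[HEADER.size:end],)

dat = b'\x02\x1e\x00\x00\x00z\x8eJ\x00\xb1\x7f\x03\x00Down by the river\x00'
print(unpack_record(dat))
```

Because the NUL is located by searching, this variant does not depend on the record's own size field or on the string being the last thing in the buffer.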
