Can't decode JSON string coming from Python in PHP - python

I have the following Python code:
array_to_return = dict()
response_json_object = json.loads(responsestring)
for section in response_json_object:
    if section["requestMethod"] == "getPlayerResources":
        array_to_return["resource_list"] = json.dumps(section["responseData"]["resources"])
        break
array_to_return["requests_duration"] = time.time() - requests_start_time
array_to_return["python_duration"] = time.time() - python_start_time
Which returns the following content into a PHP script:
{'resource_list': '{"aaa": 120, "bbb": 20, "ccc": 2138, "ddd": 8}', 'requests_duration': '7.30', 'python_duration': 41.0}
I'm then trying to decode this string and convert it into something usable in PHP. My code is the following:
$cmd = "$python $pyscript";
exec("$cmd", $output);
echo 'output: ';
var_dump($output[0]);
$json_output = json_decode($output[0], true);
echo 'json_output: ';
var_dump($json_output, json_last_error_msg());
$output[0] is a string but json_last_error_msg() returns Syntax Error
I'm well aware that my string is not a valid Json string, but how can I convert it properly (either in Python or in PHP)? I probably do something wrong in my Python script...
UPDATE 1:
I actually found out that responsestring is a valid JSON string (with double quotes) but json.loads switches the double to single quotes; thus response_json_object has single quotes.
If I comment out the line with json.loads, I get an error:
TypeError: 'int' object is not subscriptable
UPDATE 2:
I managed to get around it by removing the associative list in Python, not exactly what I was hoping for but this works for now...
array_to_return = json.dumps(section["responseData"]["resources"])
#No longer using the following
#array_to_return["requests_duration"] = time.time() - requests_start_time
#array_to_return["python_duration"] = time.time() - python_start_time
If a working solution with associative list is suggested, I will accept that one.

JSON strings must be delimited by double quotes ("); single quotes (') are not valid string delimiters in JSON.
Your JSON should look like this:
{
"resource_list": "{\"aaa\": 120, \"bbb\": 20, \"ccc\": 2138, \"ddd\": 8}",
"requests_duration": "7.30",
"python_duration": 41.0
}

Instead of calling json.dumps on individual values of array_to_return, call json.dumps once on the whole dictionary:
import json
import time
array_to_return = dict()
response_json_object = json.loads(responsestring)
for section in response_json_object:
    if section["requestMethod"] == "getPlayerResources":
        # store the resources dict itself, not a JSON-encoded string of it
        array_to_return["resource_list"] = section["responseData"]["resources"]
        break
array_to_return["requests_duration"] = time.time() - requests_start_time
array_to_return["python_duration"] = time.time() - python_start_time
# serialize the whole dictionary once and print it, so PHP's exec() captures valid JSON
print(json.dumps(array_to_return))
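To see why PHP's json_decode() rejected the original output: printing a dict writes Python's repr (single quotes), while json.dumps() writes JSON (double quotes). A minimal sketch; the values below are made up for illustration:

```python
import json

# hypothetical sample data shaped like array_to_return
array_to_return = {
    "resource_list": {"aaa": 120, "bbb": 20},
    "python_duration": 41.0,
}

# str()/print() of a dict gives Python's repr: single-quoted keys, not JSON
as_repr = str(array_to_return)
# json.dumps() gives a real JSON document with double-quoted keys
as_json = json.dumps(array_to_return)

print(as_repr)  # {'resource_list': {'aaa': 120, 'bbb': 20}, 'python_duration': 41.0}
print(as_json)  # {"resource_list": {"aaa": 120, "bbb": 20}, "python_duration": 41.0}

# the JSON text round-trips; the nested dict stays an object, not a string
assert json.loads(as_json) == array_to_return
```

On the PHP side, json_decode($output[0], true) should then return a nested associative array, so resource_list no longer needs a second decode.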

Related

MD5 Mismatch between Python and PHP

I am trying to compare MD5 hashes between PHP and Python. Our server works fine with PHP clients, but when we try to do the same in Python, we always get an invalid response from the server.
I have the following piece of code in Python:
import hashlib
keyString = '96f6e3a1c4748b81e41ac58dcf6ecfa0'
decodeString = ''
length = len(keyString)
for i in range(0, length, 2):
    subString1 = keyString[i:(i + 2)]
    decodeString += chr(int(subString1, 16))
print(hashlib.md5(decodeString.encode("utf-8")).hexdigest())
Produces: 5a9536a1490714cb77a02080f902be4c
Now, the same concept in PHP:
$serverRandom = "96f6e3a1c4748b81e41ac58dcf6ecfa0";
$length = strlen($serverRandom);
$server_rand_code = '';
for($i = 0; $i < $length; $i += 2)
{
    $server_rand_code .= chr(hexdec(substr($serverRandom, $i, 2)));
}
echo 'SERVER CODE: '.md5($server_rand_code).'<br/>';
Produces: b761f889707191e6b96954c0da4800ee
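I tried checking the encoding, but no luck: the two MD5 outputs don't match at all. Any help?
It looks like your method of generating the byte string is incorrect, so the input to hashlib.md5 is wrong. chr() produces Unicode code points, and encoding them as UTF-8 turns every code point from 0x80 upward into two bytes: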
print(decodeString.encode('utf-8'))
# b'\xc2\x96\xc3\xb6\xc3\xa3\xc2\xa1\xc3\x84t\xc2\x8b\xc2\x81\xc3\xa4\x1a\xc3\x85\xc2\x8d\xc3\x8fn\xc3\x8f\xc2\xa0'
The easiest way to interpret the string as a hex string of bytes is to use binascii.unhexlify, or bytes.fromhex:
import binascii
decodeString = binascii.unhexlify(keyString)
decodeString2 = bytes.fromhex(keyString)
print(decodeString)
# b'\x96\xf6\xe3\xa1\xc4t\x8b\x81\xe4\x1a\xc5\x8d\xcfn\xcf\xa0'
print(decodeString == decodeString2)
# True
You can now directly use the resulting bytes object in hashlib.md5:
import hashlib
result = hashlib.md5(decodeString)
print(result.hexdigest())
# b761f889707191e6b96954c0da4800ee
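If you'd rather keep the original loop, an alternative not mentioned in the answer is to encode with 'latin-1' instead of 'utf-8': Latin-1 maps each code point from 0 to 255 to exactly one byte with the same value, so the result matches bytes.fromhex. A minimal sketch:

```python
import hashlib

keyString = '96f6e3a1c4748b81e41ac58dcf6ecfa0'

# the original loop: each pair of hex digits becomes one code point in 0..255
decodeString = ''
for i in range(0, len(keyString), 2):
    decodeString += chr(int(keyString[i:i + 2], 16))

# latin-1 encodes every code point below 256 as a single byte of the same value
via_latin1 = decodeString.encode('latin-1')
via_fromhex = bytes.fromhex(keyString)

assert via_latin1 == via_fromhex
print(hashlib.md5(via_latin1).hexdigest() == hashlib.md5(via_fromhex).hexdigest())  # True
```

bytes.fromhex is still the simpler choice; this only shows that the UTF-8 encoding step was where the bytes diverged.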

Error in JSON file reading : write() argument must be str, not generator

I'm reading content from a JSON file and appending it to a text file. When I run this code, I get the following error, and I'm not able to fix it:
write() argument must be str, not generator
with open('stackExchangeAPI.json','r') as json_file:
    tags_list = []
    data = json.load(json_file)
    for i in range(0,99):
        for j in range(0,99):
            tags = data[i]["items"][j]["tags"]
            with open('tags.txt','a+') as tags_file:
                tags_file.seek(0)
                d = tags_file.read(100)
                if len(d) > 0 :
                    tags_file.write("\n")
                tags_file.write(f'{tags[i]}' for i in range(0,(len(tags)-1)))
The error comes from the last line: tags_file.write(f'......).
Can someone please help me rectify this?
You're passing a generator expression to write(), but write() only accepts a string. Try changing the last line to:
[tags_file.write(f'{tags[i]}') for i in range(0,(len(tags)-1))]
As the error says, you are trying to write a generator; you must first convert it to a string, for example with join:
out = ''.join(f'{tags[i]}' for i in range(0,(len(tags)-1)))
tags_file.write(out)
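Two more things in the question's last line are worth checking: range(0, len(tags)-1) stops one short, so the last tag of each item is never written, and ''.join() runs the tags together with no separator. A minimal sketch, assuming you want every tag of an item written space-separated on its own line (the helper name append_tags and the temporary file are hypothetical):

```python
import os
import tempfile

def append_tags(path, tags):
    """Append one line holding all of `tags`, separated by spaces (hypothetical helper)."""
    with open(path, 'a') as tags_file:
        # join() builds a single str, which is what write() expects
        tags_file.write(' '.join(tags) + '\n')

# usage with made-up tags, written to a temporary file
path = os.path.join(tempfile.mkdtemp(), 'tags.txt')
append_tags(path, ['python', 'json', 'file-io'])
append_tags(path, ['php'])
with open(path) as f:
    print(f.read())
```

Writing the newline after each line, rather than before it, also removes the need for the seek()/read() check.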

Is there a better way to unpack a binary string in Python

At the moment I have a byte stream of a string that is received by my Python code and must be converted into a string. For now I managed to extract each character, convert them and append them to a string individually. The code looks something like this:
import struct
# The byte stream is received and stored in byte_stream
text = ''
i = 0
while i < len(byte_stream):
    text = text + struct.unpack('c', byte_stream[i])[0]
    i += 1
print(text)
But that surely cannot be the most efficient way. Is there a more elegant way to achieve the same result?
From the question "Convert bytes to a Python string":
byte_stream = [112, 52, 52]
''.join(map(chr, byte_stream))
# 'p44'
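If byte_stream is an actual bytes object (as it would be when read from a socket in Python 3), you don't need struct or chr at all: bytes.decode() converts it in one call. A minimal sketch, assuming the data is UTF-8 (or plain ASCII) encoded:

```python
# hypothetical received data; in Python 3, iterating over bytes yields ints
byte_stream = b'p44'

# decode in one step, assuming the sender used UTF-8 (ASCII is a subset)
text = byte_stream.decode('utf-8')
print(text)  # p44

# the chr-based approach agrees for ASCII, since each byte is already an int
assert ''.join(map(chr, byte_stream)) == text
```

chr() maps each byte to the code point with the same value, so the two approaches only agree for bytes below 128; for multi-byte UTF-8 text, decode() is the correct one.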

Decoding JSON array with unicode character

I have the following JSON array:
[u'steve#gmail.com']
"u" is apparently the unicode character, and it was automatically created by Python. Now, I want to bring this back into Objective-C and decode it into an array using this:
+(NSMutableArray*)arrayFromJSON:(NSString*)json
{
    if(!json) return nil;
    NSData *jsonData = [json dataUsingEncoding:NSUTF8StringEncoding];
    //I've also tried NSUnicodeStringEncoding here, same thing
    NSError *e;
    NSMutableArray *result= [NSJSONSerialization JSONObjectWithData:jsonData options:NSJSONReadingMutableContainers error:&e];
    if (e != nil) {
        NSLog(@"Error:%@", e.description);
        return nil;
    }
    return result;
}
However, I get an error: (Cocoa error 3840.) (Invalid value around character 1.)
How do I remedy this?
Edit: Here's how I bring the entity from Python back into objective-c:
First I convert the entity to a dictionary:
def to_dict(self):
    return dict((p, unicode(getattr(self, p))) for p in self.properties()
                if getattr(self, p) is not None)
I add this dictionary to a list, set the value of my responseDict['entityList'] to this list, then self.response.out.write(json.dumps(responseDict))
However the result I get back still has that 'u' character.
[u'steve#gmail.com'] is the Python repr of the decoded array; it is not valid JSON.
The valid JSON string data would be just ["steve#gmail.com"].
Dump the data from python back into a JSON string by doing:
import json
python_data = [u'steve#gmail.com']
json_string = json.dumps(python_data)
The u prefix on Python string literals indicates that those strings are unicode rather than byte strings (str), which are the default in Python 2.x.
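One likely reason the u prefix survives json.dumps (an assumption, since the model class isn't shown): to_dict calls unicode() on every property, so a list-valued property is turned into the string "[u'...']" before json.dumps ever sees it. Python 3's str() has the same effect, minus the u. A minimal sketch with a made-up property value:

```python
import json

emails = ['steve#gmail.com']  # hypothetical list-valued property

# converting the list to a string first embeds its repr inside the JSON
bad = json.dumps({'entityList': str(emails)})
# passing the list itself lets json.dumps emit a real JSON array
good = json.dumps({'entityList': emails})

print(bad)   # {"entityList": "['steve#gmail.com']"}
print(good)  # {"entityList": ["steve#gmail.com"]}
```

If that is the cause, leaving list values unconverted in to_dict (and only calling unicode() on values json.dumps can't serialize) should give Objective-C a proper JSON array.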

Retrieving JSON objects from a text file (using Python)

I have thousands of text files containing multiple JSON objects, but unfortunately there is no delimiter between the objects. Objects are stored as dictionaries and some of their fields are themselves objects. Each object might have a variable number of nested objects. Concretely, an object might look like this:
{field1: {}, field2: "some value", field3: {}, ...}
and hundreds of such objects are concatenated without a delimiter in a text file. This means that I can neither use json.load() nor json.loads().
Any suggestions on how I can solve this problem? Is there a known parser for this?
This decodes your "list" of JSON Objects from a string:
from json import JSONDecoder
def loads_invalid_obj_list(s):
    decoder = JSONDecoder()
    s_len = len(s)
    objs = []
    end = 0
    while end != s_len:
        obj, end = decoder.raw_decode(s, idx=end)
        objs.append(obj)
    return objs
The bonus here is that you work with the parser rather than around it, so it keeps telling you exactly where it found an error.
Examples
>>> loads_invalid_obj_list('{}{}')
[{}, {}]
>>> loads_invalid_obj_list('{}{\n}{')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "decode.py", line 9, in loads_invalid_obj_list
obj, end = decoder.raw_decode(s, idx=end)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 376, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting object: line 2 column 2 (char 5)
Clean Solution (added later)
import json
import re
#shameless copy paste from json/decoder.py
FLAGS = re.VERBOSE | re.MULTILINE | re.DOTALL
WHITESPACE = re.compile(r'[ \t\n\r]*', FLAGS)
class ConcatJSONDecoder(json.JSONDecoder):
    def decode(self, s, _w=WHITESPACE.match):
        s_len = len(s)
        objs = []
        end = 0
        while end != s_len:
            obj, end = self.raw_decode(s, idx=_w(s, end).end())
            end = _w(s, end).end()
            objs.append(obj)
        return objs
Examples
>>> print json.loads('{}', cls=ConcatJSONDecoder)
[{}]
>>> print json.load(open('file'), cls=ConcatJSONDecoder)
[{}]
>>> print json.loads('{}{} {', cls=ConcatJSONDecoder)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 339, in loads
return cls(encoding=encoding, **kw).decode(s)
File "decode.py", line 15, in decode
obj, end = self.raw_decode(s, idx=_w(s, end).end())
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 376, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting object: line 1 column 5 (char 5)
Sebastian Blask has the right idea, but there's no reason to use regexes for such a simple change.
objs = json.loads("[%s]"%(open('your_file.name').read().replace('}{', '},{')))
Or, more legibly
raw_objs_string = open('your_file.name').read() #read in raw data
raw_objs_string = raw_objs_string.replace('}{', '},{') #insert a comma between each object
objs_string = '[%s]'%(raw_objs_string) #wrap in a list, to make valid json
objs = json.loads(objs_string) #parse json
How about something like this:
import re
import json
jsonstr = open('test.json').read()
p = re.compile(r'}\s*{')
jsonstr = p.sub('}\n{', jsonstr)
jsonarr = jsonstr.split('\n')
for jsonstr in jsonarr:
    jsonobj = json.loads(jsonstr)
    print(json.dumps(jsonobj))
Solution
As far as I know, }{ cannot appear in valid JSON except inside string values, so as long as none of your strings contain }{, the following should be safe for recovering the separate objects that were concatenated (txt is the content of your file). It doesn't require any import (not even the re module):
retrieved_strings = list(map(lambda x: '{' + x + '}', txt.strip()[1:-1].split('}{')))
or, if you prefer list comprehensions (as David Zwicker mentioned in the comments), you can write it like this:
retrieved_strings = ['{' + x + '}' for x in txt.strip()[1:-1].split('}{')]
This results in retrieved_strings being a list of strings, each containing a separate JSON object. See proof here: http://ideone.com/Purpb
Example
The following string:
'{field1:"a",field2:"b"}{field1:"c",field2:"d"}{field1:"e",field2:"f"}'
will be turned into:
['{field1:"a",field2:"b"}', '{field1:"c",field2:"d"}', '{field1:"e",field2:"f"}']
as proven in the example I mentioned.
Why don't you load the file as string, replace all }{ with },{ and surround the whole thing with []? Something like:
import re
objs_string = '[' + re.sub(r'\}\s*\{', '}, {', string_read_from_a_file) + ']'
Or use a simple string replace if you are sure there is never whitespace between } and {.
If you expect }{ to occur inside strings as well, you could split on }{ and try to parse each fragment with json.loads. If you get an error, the fragment wasn't complete, so you append the next fragment to it (restoring the }{ you split on) and try again, and so forth.
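The split-and-retry idea can be sketched like this. The function name is hypothetical, and it assumes the objects are directly adjacent, with no whitespace between } and {:

```python
import json

def split_concatenated(txt):
    """Parse '{...}{...}' by splitting on '}{' and re-joining fragments
    until each candidate parses (hypothetical helper)."""
    pieces = txt.strip().split('}{')
    objs = []
    buf = ''
    for k, piece in enumerate(pieces):
        # put back the braces that split() removed
        if k > 0:
            piece = '{' + piece
        if k < len(pieces) - 1:
            piece = piece + '}'
        buf += piece
        try:
            objs.append(json.loads(buf))
            buf = ''  # complete object: start a fresh candidate
        except ValueError:
            # incomplete: that '}{' was inside a string, so keep accumulating
            pass
    if buf:
        raise ValueError('input ends with an incomplete object: %r' % buf)
    return objs

print(split_concatenated('{"a": "x}{y"}{"b": 2}'))  # [{'a': 'x}{y'}, {'b': 2}]
```

Because }{ can only occur inside a string in valid JSON, a fragment cut there is always an unterminated string and fails to parse, which is what triggers the re-joining.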
import json
file1 = open('filepath', 'r')
data = file1.readlines()
for line in data:
    values = json.loads(line)
    # Now you can access all the objects using values.get('key')
How about reading through the file, incrementing a counter every time you find a { and decrementing it when you come across a }? When the counter reaches 0, you have reached the end of an object, so send that text through json.loads and start counting again. Repeat until the end of the file.
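Counting braces as described goes wrong if a string value contains { or }, so this minimal sketch also tracks whether it is inside a string. The string tracking and the function name are additions for illustration, not part of the answer:

```python
import json

def split_by_brace_depth(txt):
    """Parse concatenated top-level JSON objects by counting braces (hypothetical helper)."""
    objs = []
    depth = 0
    start = None
    in_string = False
    escaped = False
    for pos, ch in enumerate(txt):
        if in_string:
            # inside a string: only watch for an unescaped closing quote
            if escaped:
                escaped = False
            elif ch == '\\':
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == '{':
            if depth == 0:
                start = pos  # beginning of a new top-level object
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth < 0:
                raise ValueError('unbalanced } at position %d' % pos)
            if depth == 0:
                # counter back to 0: one complete object, hand it to json.loads
                objs.append(json.loads(txt[start:pos + 1]))
    if depth != 0:
        raise ValueError('input ends inside an object')
    return objs

print(split_by_brace_depth('{"a": {"b": "}{"}} {"c": 1}\n'))  # [{'a': {'b': '}{'}}, {'c': 1}]
```

Whitespace or newlines between objects are simply skipped, since they occur at depth 0.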
Suppose you added a [ to the start of the text in the file and used a version of json.load() that, when it hit a { where it expected a comma (or hit the end of the file), returned the just-completed object?
To fix a file with that junk in it in place (you still need to wrap the result in [ and ] before parsing it as one array):
$ sed -i -e 's;}{;}, {;g' foo
Or do it on the fly in Python:
objs = json.loads('[' + junkJson.replace('}{', '}, {') + ']')
