I have an API that returns a PDF inside a JSON response, but the PDF arrives as a long string of comma-separated integers, like the following:
[{"status":"SUCCESS"},{"data":"37,80,68,70,45,49,46,52,10,37,-45,-21,-23,-31,10,49,32,48,32,111,98,106,10,60,60,47,84,105,116,108,101,32,40,49,49,32,67,83,45,73,73,32,32,83,117,98,106,101,99,116,105,118,101,32,81,46,...
...,1,32,49,55,10,47,82,111,111,116,32,56,32,48,32,82,10,47,73,110,102,111,32,49,32,48,32,82,62,62,10,115,116,97,114,116,120,114,101,102,10,54,55,54,56,53,10,37,37,69,79,70"}
My questions are:
What is this encoding?
How do I convert this into a PDF using Python?
P.S: Here is the endpoint to get the full response.
The beginning of data is a hint that you actually have a list of the bytes values of the PDF file: it starts with the byte values of '%PDF-1.4'.
So you must first extract that curious string:
data = json_data[1]['data']
to have:
"37,80,68,70,45,49,46,52,10,37,-45,-21,-23,-31,10,49,32,48,32,111,98,106,10,60,60,47,84,105,116,108,101,32,40,49,49,32,67,83,45,73,73,32,32,83,117,98,106,101,99,116,105,118,101,32,81,46, ..."
Convert it to a list of ints first, then to a byte string (the expression i if i >= 0 else i + 256 maps the negative, signed-byte values into the range 0-255):
intlist = [int(i) for i in data.split(",")]
b = bytes(i if i >=0 else i+256 for i in intlist)
to get b'%PDF-1.4\n%\xd3\xeb\xe9\xe1\n1 0 obj\n<</Title (11 CS-II Subjective Q...'
And finally save that to a file:
with open('file.pdf', 'wb') as fd:
    fd.write(b)
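Putting the steps together, a minimal sketch might look like the following. The function name `pdf_bytes_from_response` and the shortened sample payload are assumptions made for illustration; only the `[{"status": ...}, {"data": ...}]` layout comes from the question. Taking each value modulo 256 has the same effect as adding 256 to the negative values.

```python
import json


def pdf_bytes_from_response(response_text):
    """Turn the API's JSON text into the raw bytes of the PDF."""
    payload = json.loads(response_text)
    data = payload[1]["data"]  # comma-separated signed byte values
    # int(v) % 256 maps -128..-1 onto 128..255 and leaves 0..255 unchanged
    return bytes(int(v) % 256 for v in data.split(","))


# Hypothetical, shortened payload: only the first bytes of the example
sample = '[{"status":"SUCCESS"},{"data":"37,80,68,70,45,49,46,52,10,37,-45,-21,-23,-31,10"}]'
pdf = pdf_bytes_from_response(sample)
print(pdf)
```

The returned bytes can then be written to disk with a file opened in `'wb'` mode.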
I am trying to decode a Base64 encoded byte string to a valid HTTP URL. I have tried appending necessary padding (=). But it still does not seem to work.
I have tried the following code.
import base64
encoded = b"aHR0cHM6Ly9mb3Jtcy5nbGUvWU5ZXQ0d2NRWHVLNnNwdjU="
decoded = base64.b64decode(encoded)
print(decoded)
The string encoded is missing one character because of noise. Is there a way to detect the missing character and then perform the decode operation?
So, you have aHR0cHM6Ly9mb3Jtcy5nbGUvWU5ZXQ0d2NRWHVLNnNwdjU=, the base64 encoding of a URL with exactly one character missing.
For the missing character you have 64 choices, abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/ (the base64 alphabet), and 48 possible positions to put it in: -a-H-R-0-c-H-M-6-L-y-9-m-b-3-J-t-c-y-5-n-b-G-U-v-W-U-5-Z-X-Q-0-d-2-N-R-W-H-V-L-N-n-N-w-d-j-U-=- (each - marks a possible position).
So you have 64 * 48 = 3072 possible encoded strings. You can either generate them by hand or write some code to do it.
Once you generate them, you can decode the string to get the URL using some built-in libraries & check whether this URL is valid or not. If you also need to know whether this URL exists or not, you can make an HTTP request to the URL & check the response StatusCode.
Code:
package main

import (
	"encoding/base64"
	"fmt"
	"net/http"
)

func main() {
	encodedURL := "aHR0cHM6Ly9mb3Jtcy5nbGUvWU5ZXQ0d2NRWHVLNnNwdjU="
	// Standard base64 alphabet; it matches base64.StdEncoding used below.
	options := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/"
	length := len(encodedURL)
	for i := 0; i <= length; i++ {
		for idx := 0; idx < len(options); idx++ {
			// Insert one candidate character at position i.
			tempEncoded := encodedURL[:i] + options[idx:idx+1] + encodedURL[i:]
			decodedURL, err := base64.StdEncoding.DecodeString(tempEncoded)
			if err != nil {
				continue // not valid base64, skip this candidate
			}
			resp, err := http.Get(string(decodedURL))
			if err != nil {
				continue // not a usable or reachable URL
			}
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				fmt.Println("this URL is valid & exists: ", string(decodedURL))
			}
		}
	}
}
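The same brute-force search can be sketched in Python without sending any HTTP requests, by keeping only the candidates that decode to printable ASCII text starting with an expected prefix. The function name `candidate_urls`, the `prefix` parameter, and the printable-ASCII filter are assumptions made for illustration. Several candidates may survive the filter, so they would still need to be checked by hand or with a request.

```python
import base64
import binascii
import string

# Standard base64 alphabet (64 characters)
B64_ALPHABET = string.ascii_letters + string.digits + "+/"


def candidate_urls(encoded, prefix="https://"):
    """Yield decoded texts for every single-character insertion that
    decodes to printable ASCII starting with `prefix`."""
    body = encoded.rstrip("=")  # padding is recomputed per candidate
    for pos in range(len(body) + 1):
        for ch in B64_ALPHABET:
            candidate = body[:pos] + ch + body[pos:]
            candidate += "=" * (-len(candidate) % 4)
            try:
                text = base64.b64decode(candidate, validate=True).decode("ascii")
            except (binascii.Error, UnicodeDecodeError):
                continue  # not valid base64 or not ASCII text
            if text.startswith(prefix) and text.isprintable():
                yield text


# Self-check with a made-up URL: drop one character, then search for it
full = base64.b64encode(b"https://example.com/abc").decode("ascii")
damaged = full[:20] + full[21:]
recovered = set(candidate_urls(damaged))
print("https://example.com/abc" in recovered)
```

Calling `candidate_urls` on the damaged string from the question lists the plausible URLs to try.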
When the length of the unencoded input is not a multiple of three, the encoded output must have padding added so that its length is a multiple of four.
len(encoded) is 47 but it should be 48, so append another =:
encoded = b"aHR0cHM6Ly9mb3Jtcy5nbGUvWU5ZXQ0d2NRWHVLNnNwdjU=="
decoded = base64.b64decode(encoded)
print(decoded)
This decodes without an error, but the output is garbled after the point where the character is missing:
b'https://forms.gle/YNY]\r\x1d\xd8\xd4V\x1dR\xcd\x9c\xdc\x1d\x8d'
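The padding rule can be applied programmatically. The helper below is a minimal sketch; the name `pad_base64` is an assumption for illustration. It only repairs the padding and cannot restore a character lost from the middle of the string.

```python
import base64


def pad_base64(encoded):
    """Drop any existing '=' and append as many as are needed to make
    the length a multiple of four."""
    body = encoded.rstrip("=")
    # A body whose length is 1 modulo 4 is never valid base64,
    # no matter how much padding is added.
    return body + "=" * (-len(body) % 4)


encoded = "aHR0cHM6Ly9mb3Jtcy5nbGUvWU5ZXQ0d2NRWHVLNnNwdjU="
padded = pad_base64(encoded)
decoded = base64.b64decode(padded)
print(len(padded), decoded[:8])
```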
I have the following Python code:
array_to_return = dict()
response_json_object = json.loads(responsestring)
for section in response_json_object:
    if section["requestMethod"] == "getPlayerResources":
        array_to_return["resource_list"] = json.dumps(section["responseData"]["resources"])
        break
array_to_return["requests_duration"] = time.time() - requests_start_time
array_to_return["python_duration"] = time.time() - python_start_time
Which returns the following content into a PHP script:
{'resource_list': '{"aaa": 120, "bbb": 20, "ccc": 2138, "ddd": 8}', 'requests_duration': '7.30', 'python_duration': 41.0}
I'm then trying to decode this string and convert it into something usable in PHP. My code is the following:
$cmd = "$python $pyscript";
exec("$cmd", $output);
echo 'output: ';
var_dump($output[0]);
$json_output = json_decode($output[0], true);
echo 'json_output: ';
var_dump($json_output, json_last_error_msg());
$output[0] is a string but json_last_error_msg() returns Syntax Error
I'm well aware that my string is not a valid JSON string, but how can I convert it properly (either in Python or in PHP)? I'm probably doing something wrong in my Python script...
UPDATE 1:
I actually found out that responsestring is a valid JSON string (with double quotes) but json.loads switches the double to single quotes; thus response_json_object has single quotes.
If I comment out the line with json.loads, I get an error:
TypeError: 'int' object is not subscriptable
UPDATE 2:
I managed to get around it by removing the associative list in Python, not exactly what I was hoping for but this works for now...
array_to_return = json.dumps(section["responseData"]["resources"])
#No longer using the following
#array_to_return["requests_duration"] = time.time() - requests_start_time
#array_to_return["python_duration"] = time.time() - python_start_time
If a working solution with associative list is suggested, I will accept that one.
The ' character is not a legal string delimiter in JSON; strings must be enclosed in ".
Your json should look like this.
{
"resource_list": "{\"aaa\": 120, \"bbb\": 20, \"ccc\": 2138, \"ddd\": 8}",
"requests_duration": "7.30",
"python_duration": 41.0
}
Instead of calling json.dumps on individual values of array_to_return, call json.dumps once on the whole dictionary and print the result:
array_to_return = dict()
response_json_object = json.loads(responsestring)
for section in response_json_object:
    if section["requestMethod"] == "getPlayerResources":
        # store the resources as a plain dict, not as a JSON string
        array_to_return["resource_list"] = section["responseData"]["resources"]
        break
array_to_return["requests_duration"] = time.time() - requests_start_time
array_to_return["python_duration"] = time.time() - python_start_time
# print valid JSON (double quotes) so PHP's json_decode can parse it
print(json.dumps(array_to_return))
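The difference between printing a dictionary directly and printing its JSON serialization can be seen in a small, self-contained sketch. The function name `build_output` and the sample values are assumptions for illustration; the keys follow the question.

```python
import json


def build_output(resources, requests_duration, python_duration):
    """Serialize the whole result once, so nested data stays nested."""
    result = {
        "resource_list": resources,  # a plain dict, not a JSON string
        "requests_duration": requests_duration,
        "python_duration": python_duration,
    }
    return json.dumps(result)


# Hypothetical values standing in for the real API response and timings
line = build_output({"aaa": 120, "bbb": 20}, 7.3, 41.0)
print(line)  # double-quoted JSON, unlike the single quotes of print(dict)
```

On the PHP side, `json_decode($output[0], true)` would then yield a nested associative array, with `resource_list` already decoded.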
I have a JSON file from the Facebook's "Download your data" feature and instead of escaping Unicode characters as their codepoint number, it's escaped just as a sequence of UTF-8 bytes.
For example, the letter á (U+00E1) is escaped in the JSON file as \u00c3\u00a1 instead of \u00e1. 0xC3 0xA1 is UTF-8 encoding for U+00E1.
The json library in Python 3 decodes it as Ã¡, which corresponds to U+00C3 and U+00A1.
Is there a way to parse such a file correctly (so that I get the letter á) in Python?
It seems they encoded their Unicode string into bytes using utf-8 then transformed the bytes into JSON. This is very bad behaviour from them.
Python 3 example:
>>> '\u00c3\u00a1'.encode('latin1').decode('utf-8')
'á'
You need to parse the JSON and walk the entire data to fix it:
import json

def visit_list(l):
    return [visit(item) for item in l]

def visit_dict(d):
    return {visit(k): visit(v) for k, v in d.items()}

def visit_str(s):
    # undo the mis-decoding: code points 0-255 back to bytes, then UTF-8
    return s.encode('latin1').decode('utf-8')

def visit(node):
    funcs = {
        list: visit_list,
        dict: visit_dict,
        str: visit_str,
    }
    func = funcs.get(type(node))
    if func:
        return func(node)
    else:
        return node

# raw string, so the \u escapes reach the JSON parser as they would from a file
incorrect = r'{"foo": ["\u00c3\u00a1", 123, true]}'
correct_obj = visit(json.loads(incorrect))
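If the file might also contain strings that are already correct, the re-decoding step can fail with a `UnicodeEncodeError` or `UnicodeDecodeError`. The variant below is a sketch under that assumption: it leaves such strings unchanged. The names `fix_text` and `fix` and the sample document are hypothetical.

```python
import json


def fix_text(s):
    """Re-decode mis-decoded UTF-8; return s unchanged if that fails."""
    try:
        return s.encode("latin1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return s  # not of the expected broken form, keep as-is


def fix(node):
    """Walk lists and dicts recursively and repair every string."""
    if isinstance(node, str):
        return fix_text(node)
    if isinstance(node, list):
        return [fix(item) for item in node]
    if isinstance(node, dict):
        return {fix(k): fix(v) for k, v in node.items()}
    return node


# Hypothetical sample: "á" and "€", each escaped as its UTF-8 bytes
raw = r'{"name": "\u00c3\u00a1", "items": ["\u00e2\u0082\u00ac", 1, true]}'
fixed = fix(json.loads(raw))
print(fixed)
```

Note that a correct string which happens to look like valid mis-decoded UTF-8 would still be altered, so this is only safe when the whole file is known to use the broken escaping.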
I'm trying to get the message digest of a string on iOS. I have tried the nv-ios-digest third-party hash library, but without success.
Below is the function I'm using to get the Base64-encoded string of a message digest.
-(NSString*) sha1:(NSString*)input //sha1- Digest
{
    NSData *data = [input dataUsingEncoding:NSUTF8StringEncoding];
    uint8_t digest[CC_SHA1_DIGEST_LENGTH];
    CC_SHA1(data.bytes, data.length, digest);
    NSMutableString* output = [NSMutableString stringWithCapacity:CC_SHA1_DIGEST_LENGTH * 2];
    for(int i = 0; i < CC_SHA1_DIGEST_LENGTH; i++){
        [output appendFormat:@"%02x", digest[i]];//digest
    }
    return [NSString stringWithFormat:@"%@",[[[output description] dataUsingEncoding:NSUTF8StringEncoding]base64EncodedStringWithOptions:0]]; //base64 encoded
}
Here is my sample input string - '530279591878676249714013992002683ec3a85216db22238a12fcf11a07606ecbfb57b5'
When I use this string in either Java or Python, I get the same result: '5VNqZRB1JiRUieUj0DufgeUbuHQ='
But on iOS I get 'ZTU1MzZhNjUxMDc1MjYyNDU0ODllNTIzZDAzYjlmODFlNTFiYjg3NA=='
Here is the code I'm using in python:
import hashlib
import base64

def checkForDigestKey(somestring):
    msgDigest = hashlib.sha1()
    msgDigest.update(somestring)
    print base64.b64encode(msgDigest.digest())
Let me know if there is any way to get the same result on iOS.
You are producing a binary digest in Python, but a hexadecimal digest in iOS.
The digests are otherwise equal:
>>> # iOS-produced base64 value
...
>>> 'ZTU1MzZhNjUxMDc1MjYyNDU0ODllNTIzZDAzYjlmODFlNTFiYjg3NA=='.decode('base64')
'e5536a65107526245489e523d03b9f81e51bb874'
>>> # Python-produced base64 value
...
>>> '5VNqZRB1JiRUieUj0DufgeUbuHQ='.decode('base64')
'\xe5Sje\x10u&$T\x89\xe5#\xd0;\x9f\x81\xe5\x1b\xb8t'
>>> from binascii import hexlify
>>> # Python-produced value converted to a hex representation
...
>>> hexlify('5VNqZRB1JiRUieUj0DufgeUbuHQ='.decode('base64'))
'e5536a65107526245489e523d03b9f81e51bb874'
Use base64.b64encode(msgDigest.hexdigest()) in Python to produce the same value, or Base-64 encode the digest bytes instead of hexadecimal characters in iOS.
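The two encodings can be compared side by side in Python 3. This is a sketch: the function names and the `b"abc"` input are assumptions for illustration. It Base64-encodes the 20 raw digest bytes in one case and the 40 hexadecimal characters in the other, which is why the iOS result is twice as long.

```python
import base64
import hashlib


def b64_of_raw_digest(message):
    """Base64 of the 20 raw SHA-1 bytes (what the Python code produces)."""
    return base64.b64encode(hashlib.sha1(message).digest()).decode("ascii")


def b64_of_hex_digest(message):
    """Base64 of the 40-character hex string (what the iOS code produces)."""
    hex_digest = hashlib.sha1(message).hexdigest()
    return base64.b64encode(hex_digest.encode("ascii")).decode("ascii")


message = b"abc"  # hypothetical input
raw_form = b64_of_raw_digest(message)
hex_form = b64_of_hex_digest(message)
print(len(raw_form), len(hex_form))  # 28 56
```

Decoding `hex_form` gives the hex representation of the bytes obtained by decoding `raw_form`, so both carry the same digest.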