Facebook JSON badly encoded - python

I downloaded my Facebook Messenger data (in your Facebook account, go to Settings, then Your Facebook Information, then Download Your Information, and create a file with at least the Messages box checked) to do some statistics.
However, there is a small problem with the encoding. I'm not sure, but it looks like Facebook used a bad encoding for this data. When I open the file with a text editor I see something like Rados\u00c5\u0082aw. When I open it with Python (UTF-8) I get RadosÅ\x82aw. However, I should get Radosław.
My python script:
text = open(os.path.join(subdir, file), encoding='utf-8')
conversations.append(json.load(text))
I tried a few of the most common encodings. Example data:
{
  "sender_name": "Rados\u00c5\u0082aw",
  "timestamp": 1524558089,
  "content": "No to trzeba ostatnie treningi zrobi\u00c4\u0087 xD",
  "type": "Generic"
}

I can indeed confirm that the Facebook download data is incorrectly encoded; a Mojibake. The original data is UTF-8 encoded but was decoded as Latin-1 instead. I’ll make sure to file a bug report.
What this means is that any non-ASCII character in the string data was encoded twice: first to UTF-8, and then the UTF-8 bytes were escaped again as if they were Latin-1 data (an encoding which maps exactly 256 characters to the 256 possible byte values), using the \uHHHH JSON escape notation (a literal backslash, a literal lowercase letter u, followed by 4 hex digits, 0-9 and a-f). Because the second step only encoded byte values in the range 0-255, the result is a series of \u00HH sequences (a literal backslash, a literal lowercase letter u, two zero digits and two hex digits).
E.g. the Unicode character U+0142 LATIN SMALL LETTER L WITH STROKE in the name Radosław was encoded to the UTF-8 byte values C5 and 82 (in hex notation), and then encoded again to \u00c5\u0082.
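You can reproduce the double encoding in an interactive session (a quick illustration, not part of the repair itself):
>>> 'ł'.encode('utf8')                               # the correct UTF-8 bytes for U+0142
b'\xc5\x82'
>>> 'ł'.encode('utf8').decode('latin1')              # those bytes misread as Latin-1
'Å\x82'
>>> import json
>>> json.dumps('ł'.encode('utf8').decode('latin1'))  # and then escaped as \u00HH
'"\\u00c5\\u0082"'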
You can repair the damage in two ways:
Option 1: Decode the data as JSON, then re-encode any string values as Latin-1 binary data, and then decode again as UTF-8:
>>> import json
>>> data = r'"Rados\u00c5\u0082aw"'
>>> json.loads(data).encode('latin1').decode('utf8')
'Radosław'
This would require a full traversal of your data structure to find all those strings, of course.
Option 2: Load the whole JSON document as binary data, replace all \u00hh JSON sequences with the byte the last two hex digits represent, then decode as JSON:
import json
import re
from functools import partial

fix_mojibake_escapes = partial(
    re.compile(rb'\\u00([\da-f]{2})').sub,
    lambda m: bytes.fromhex(m[1].decode()),
)

with open(os.path.join(subdir, file), 'rb') as binary_data:
    repaired = fix_mojibake_escapes(binary_data.read())

data = json.loads(repaired)
(If you are using Python 3.5 or older, you'll have to decode the repaired bytes object from UTF-8, so use json.loads(repaired.decode())).
From your sample data this produces:
{'content': 'No to trzeba ostatnie treningi zrobić xD',
 'sender_name': 'Radosław',
 'timestamp': 1524558089,
 'type': 'Generic'}
The regular expression matches against all \u00HH sequences in the binary data and replaces those with the bytes they represent, so that the data can be decoded correctly as UTF-8. The second decoding is taken care of by the json.loads() function when given binary data.
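As a quick check, applying the repair to the sample value from the question (assuming the fix_mojibake_escapes function defined above):
>>> fix_mojibake_escapes(rb'"Rados\u00c5\u0082aw"')
b'"Rados\xc5\x82aw"'
>>> json.loads(fix_mojibake_escapes(rb'"Rados\u00c5\u0082aw"'))
'Radosław'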

Here is a command-line solution with jq and iconv. Tested on Linux.
cat message_1.json | jq . | iconv -f utf8 -t latin1 > m1.json

I would like to extend @Geekmoss' answer with the following recursive code snippet, which I used to decode my Facebook data.
import json

def parse_obj(obj):
    if isinstance(obj, str):
        return obj.encode('latin_1').decode('utf-8')

    if isinstance(obj, list):
        return [parse_obj(o) for o in obj]

    if isinstance(obj, dict):
        return {key: parse_obj(item) for key, item in obj.items()}

    return obj

decoded_data = parse_obj(json.loads(file))
I noticed this works better, because the Facebook data you download might contain lists of dicts, in which case those dicts would just be returned 'as is' by the identity branch at the end.

My solution for parsing objects uses the object_hook callback of the load/loads functions:
import json

def parse_obj(dct):
    for key in dct:
        dct[key] = dct[key].encode('latin_1').decode('utf-8')
    return dct
data = '{"msg": "Ahoj sv\u00c4\u009bte"}'

# String
json.loads(data)
# Out: {'msg': 'Ahoj svÄ\x9bte'}

json.loads(data, object_hook=parse_obj)
# Out: {'msg': 'Ahoj světe'}

# File
with open('/path/to/file.json') as f:
    json.load(f, object_hook=parse_obj)
    # Out: {'msg': 'Ahoj světe'}
Update:
The solution above does not work when a value is a list of strings, so here is the updated solution:
import json

def parse_obj(obj):
    for key in obj:
        if isinstance(obj[key], str):
            obj[key] = obj[key].encode('latin_1').decode('utf-8')
        elif isinstance(obj[key], list):
            obj[key] = list(map(lambda x: x if type(x) != str else x.encode('latin_1').decode('utf-8'), obj[key]))
    return obj
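For example, with a hypothetical payload containing a list of mojibake-encoded names (a quick check of the updated hook above):
data = '{"participants": ["Rados\u00c5\u0082aw", "Ji\u00c5\u0099\u00c3\u00ad"]}'
json.loads(data, object_hook=parse_obj)
# Out: {'participants': ['Radosław', 'Jiří']}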

Based on @Martijn Pieters' solution, I wrote something similar in Java.
public String getMessengerJson(Path path) throws IOException {
    String badlyEncoded = Files.readString(path, StandardCharsets.UTF_8);
    String unescaped = unescapeMessenger(badlyEncoded);
    byte[] bytes = unescaped.getBytes(StandardCharsets.ISO_8859_1);
    String fixed = new String(bytes, StandardCharsets.UTF_8);
    return fixed;
}
The unescape method is inspired by the org.apache.commons.lang.StringEscapeUtils.
private String unescapeMessenger(String str) {
    if (str == null) {
        return null;
    }
    try {
        StringWriter writer = new StringWriter(str.length());
        unescapeMessenger(writer, str);
        return writer.toString();
    } catch (IOException ioe) {
        // this should never ever happen while writing to a StringWriter
        throw new UnhandledException(ioe);
    }
}

private void unescapeMessenger(Writer out, String str) throws IOException {
    if (out == null) {
        throw new IllegalArgumentException("The Writer must not be null");
    }
    if (str == null) {
        return;
    }
    int sz = str.length();
    StrBuilder unicode = new StrBuilder(4);
    boolean hadSlash = false;
    boolean inUnicode = false;
    for (int i = 0; i < sz; i++) {
        char ch = str.charAt(i);
        if (inUnicode) {
            unicode.append(ch);
            if (unicode.length() == 4) {
                // unicode now contains the four hex digits
                // which represents our unicode character
                try {
                    int value = Integer.parseInt(unicode.toString(), 16);
                    out.write((char) value);
                    unicode.setLength(0);
                    inUnicode = false;
                    hadSlash = false;
                } catch (NumberFormatException nfe) {
                    throw new NestableRuntimeException("Unable to parse unicode value: " + unicode, nfe);
                }
            }
            continue;
        }
        if (hadSlash) {
            hadSlash = false;
            if (ch == 'u') {
                inUnicode = true;
            } else {
                out.write("\\");
                out.write(ch);
            }
            continue;
        } else if (ch == '\\') {
            hadSlash = true;
            continue;
        }
        out.write(ch);
    }
    if (hadSlash) {
        // then we're in the weird case of a \ at the end of the
        // string, let's output it anyway.
        out.write('\\');
    }
}

Facebook programmers seem to have mixed up the concepts of Unicode encoding and escape sequences, probably while implementing their own ad-hoc serializer. Further details in Invalid Unicode encodings in Facebook data exports.
Try this:
import json
import io

class FacebookIO(io.FileIO):
    def read(self, size: int = -1) -> bytes:
        data: bytes = super(FacebookIO, self).readall()
        new_data: bytes = b''
        i: int = 0
        while i < len(data):
            # \u00c4\u0085
            # 0123456789ab
            if data[i:].startswith(b'\\u00'):
                u: int = 0
                new_char: bytes = b''
                while data[i+u:].startswith(b'\\u00'):
                    hex = int(bytes([data[i+u+4], data[i+u+5]]), 16)
                    new_char = b''.join([new_char, bytes([hex])])
                    u += 6

                char: str = new_char.decode('utf-8')
                new_chars: bytes = bytes(json.dumps(char).strip('"'), 'ascii')
                new_data += new_chars
                i += u
            else:
                new_data = b''.join([new_data, bytes([data[i]])])
                i += 1
        return new_data

if __name__ == '__main__':
    f = FacebookIO('data.json', 'rb')
    d = json.load(f)
    print(d)

This is @Geekmoss' answer, but adapted for Python 3:
import json
from pathlib import Path

def parse_facebook_json(json_file_path):
    def parse_obj(obj):
        for key in obj:
            if isinstance(obj[key], str):
                obj[key] = obj[key].encode('latin_1').decode('utf-8')
            elif isinstance(obj[key], list):
                obj[key] = list(map(lambda x: x if type(x) != str else x.encode('latin_1').decode('utf-8'), obj[key]))
        return obj

    with json_file_path.open('rb') as json_file:
        return json.load(json_file, object_hook=parse_obj)

# Usage
parse_facebook_json(Path("/.../message_1.json"))

Extending Martijn's solution #1, which I found can otherwise lead towards recursive object processing (it certainly led me there initially):
You can apply the fix to the whole serialized JSON object, if you don't ensure_ascii:
json.dumps(obj, ensure_ascii=False, indent=2).encode('latin-1').decode('utf-8')
then write it to file or something.
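For example, a complete end-to-end sketch (with hypothetical file names) that loads one exported file, fixes it, and writes it back out:
import json

with open('message_1.json', encoding='utf-8') as f:
    obj = json.load(f)

# re-serialize without ASCII escaping, then do the Latin-1 -> UTF-8 round trip on the whole text
fixed = json.dumps(obj, ensure_ascii=False, indent=2).encode('latin-1').decode('utf-8')

with open('message_1_fixed.json', 'w', encoding='utf-8') as f:
    f.write(fixed)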
PS: This should be a comment on @Martijn's answer: https://stackoverflow.com/a/50011987/1309932 (but I can't add comments).

This is my approach for Node 17.0.1, based on @hotigeftas' recursive code, using the iconv-lite package.
import fs from 'fs';
import iconv from 'iconv-lite';

function parseObject(object) {
  if (typeof object == 'string') {
    return iconv.decode(iconv.encode(object, 'latin1'), 'utf8');
  }
  if (typeof object == 'object') {
    for (let key in object) {
      object[key] = parseObject(object[key]);
    }
    return object;
  }
  return object;
}

// usage
let file = JSON.parse(fs.readFileSync(fileName));
file = parseObject(file);

Related

Translating an RSA decrypt function from Golang to Python

I need an equivalent of this Golang function in Python:
func RsaDecrypt(ciphertext []byte) ([]byte, error) {
    block, _ := pem.Decode(privateKey)
    if block == nil {
        return nil, errors.New("private key error!")
    }
    priv, err := x509.ParsePKCS1PrivateKey(block.Bytes)
    if err != nil {
        return nil, err
    }
    return rsa.DecryptPKCS1v15(rand.Reader, priv, ciphertext)
}
I'm a Python developer and do not understand how this works. I have created this function, but it does not behave the same:
import rsa

with open("rsa.key") as f:
    priv_key_pkcs1 = f.read()

priv_key = rsa.PrivateKey.load_pkcs1(priv_key_pkcs1)
line = '''
Lyzkh2pqrisgM_p32O6FmA8oDvzaimvrU9zyd0vyW6HBM2BznuHLbAYUMGp5oYgEHCxmZTWDs67Jt5AGulfn-LrcewCQi89wrb00ZvP69YdjwBe-7aoXBG4_zNMZ7ecLgd8WzUqBGGtVvUhCTVSBBi85mNMSCcgYHt__PFefRHZE09nHnEX25w6iR0ZZlQxuESBkuqTcs8qjUhs2Guin1xBMSWRINj4JDdCjIVHV4hdSjrINgFU-VF1sYFRibWcboYlXifROOxCF50MGtIBkcf7dnqsrR8HEXgZLnCyikhhlQAFoh2hsj4lPWNpWum-dBWj-B0b8P-hRmermDzcPqA==
'''
encrypted = line.decode('base64')
decrypted = rsa.decrypt(encrypted, priv_key)
print decrypted
Can anybody help me convert the Golang function to Python, or tell me where I went wrong in my current Python code?
You're using the wrong base64 decoder to decode your ciphertext. It's evident from the "-" and "_" characters present in your ciphertext that it was encoded using the URL-safe variant of base64. To decode this you should use the base64 module, e.g.
import base64
line = '''
Lyzkh2pqrisgM_p32O6FmA8oDvzaimvrU9zyd0vyW6HBM2BznuHLbAYUMGp5oYgEHCxmZTWDs67Jt5AGulfn-LrcewCQi89wrb00ZvP69YdjwBe-7aoXBG4_zNMZ7ecLgd8WzUqBGGtVvUhCTVSBBi85mNMSCcgYHt__PFefRHZE09nHnEX25w6iR0ZZlQxuESBkuqTcs8qjUhs2Guin1xBMSWRINj4JDdCjIVHV4hdSjrINgFU-VF1sYFRibWcboYlXifROOxCF50MGtIBkcf7dnqsrR8HEXgZLnCyikhhlQAFoh2hsj4lPWNpWum-dBWj-B0b8P-hRmermDzcPqA==
'''
encrypted = base64.urlsafe_b64decode(line)
print len(encrypted)
print encrypted.encode('hex')
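The snippet above targets Python 2; a rough Python 3 equivalent of the same decode-and-inspect step (a sketch, reusing the line variable from the question) would be:
import base64

encrypted = base64.urlsafe_b64decode(line.strip())
print(len(encrypted))    # length of the raw ciphertext in bytes
print(encrypted.hex())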

Hex to string, the python way, in powershell

A bit of a weird question perhaps, but I'm trying to replicate a Python example where they create an HMAC SHA256 hash from a series of parameters.
I've run into a problem where I'm supposed to translate an API key from hex to ASCII and use it as the secret, but I just can't get the output to match Python's.
>>> import hmac
>>> import hashlib
>>> apiKey = "76b02c0c4543a85e45552466694cf677937833c9cce87f0a628248af2d2c495b"
>>> apiKey.decode('hex')
'v\xb0,\x0cEC\xa8^EU$fiL\xf6w\x93x3\xc9\xcc\xe8\x7f\nb\x82H\xaf-,I['
If I've understood the material online this is supposed to represent the hex string in ascii characters.
Now to the powershell script:
$apikey = '76b02c0c4543a85e45552466694cf677937833c9cce87f0a628248af2d2c495b';
$hexstring = ""
for($i=0; $i -lt $apikey.Length; $i=$i+2){
    $hexelement = [string]$apikey[$i] + [string]$apikey[$i+1]
    $hexstring += [CHAR][BYTE][CONVERT]::toint16($hexelement,16)
}
That outputs the following:
v°,♀EC¨^EU$fiLöw?x3ÉÌè⌂
b?H¯-,I[
They are almost the same, but not quite and using them as secret in the HMAC generates different results. Any ideas?
Stating the obvious: the key in this example is not the real key.
Update:
They look more or less the same, but the encoding of the output is different. I also verified the hex-to-ASCII conversion with multiple online tools and the PowerShell version seems right.
Does anyone have an idea how to compare the two different outputs?
Update 2:
I converted each character to an integer and both Python and PowerShell generate the same numbers, so the content should be the same.
Attaching the scripts
Powershell:
Function generateToken {
    Param($apikey, $url, $httpMethod, $queryparameters=$false, $postData=$false)
    #$timestamp = [int]((Get-Date -UFormat %s).Replace(",", "."))
    $timestamp = "1446128942"
    $datastring = $httpMethod + $url
    if($queryparameters){ $datastring += $queryparameters }
    $datastring += $timestamp
    if($postData){ $datastring += $postData }
    $hmacsha = New-Object System.Security.Cryptography.HMACSHA256
    $apiAscii = HexToString -hexstring $apiKey
    $hmacsha.key = [Text.Encoding]::ASCII.GetBytes($apiAscii)
    $signature = $hmacsha.ComputeHash([Text.Encoding]::ASCII.GetBytes($datastring))
    $signature
}

Function HexToString {
    Param($hexstring)
    $asciistring = ""
    for($i=0; $i -lt $hexstring.Length; $i=$i+2){
        $hexelement = [string]$hexstring[$i] + [string]$hexstring[$i+1]
        $asciistring += [CHAR][BYTE][CONVERT]::toint16($hexelement,16)
    }
    $asciistring
}

Function TokenToHex {
    Param([array]$Token)
    $hexhash = ""
    Foreach($element in $Token){
        $hexhash += '{0:x}' -f $element
    }
    $hexhash
}

$apiEndpoint = "http://test.control.llnw.com/traffic-reporting-api/v1"
#what you see in Control on Edit My Profile page#
$apikey = '76b02c0c4543a85e45552466694cf677937833c9cce87f0a628248af2d2c495b';
$queryParameters = "shortname=bulkget&service=http&reportDuration=day&startDate=2012-01-01"
$postData = "{param1: 123, param2: 456}"
$token = generateToken -uri $apiEndpoint -httpMethod "GET" -queryparameters $queryParameters, postData=postData, -apiKey $apiKey
TokenToHex -Token $token
Python:
import hashlib
import hmac
import time
try: import simplejson as json
except ImportError: import json

class HMACSample:
    def generateSecurityToken(self, url, httpMethod, apiKey, queryParameters=None, postData=None):
        #timestamp = str(int(round(time.time()*1000)))
        timestamp = "1446128942"
        datastring = httpMethod + url
        if queryParameters != None : datastring += queryParameters
        datastring += timestamp
        if postData != None : datastring += postData
        token = hmac.new(apiKey.decode('hex'), msg=datastring, digestmod=hashlib.sha256).hexdigest()
        return token

if __name__ == '__main__':
    apiEndpoint = "http://test.control.llnw.com/traffic-reporting-api/v1"
    #what you see in Control on Edit My Profile page#
    apiKey = "76b02c0c4543a85e45552466694cf677937833c9cce87f0a628248af2d2c495b";
    queryParameters = "shortname=bulkget&service=http&reportDuration=day&startDate=2012-01-01"
    postData = "{param1: 123, param2: 456}"
    tool = HMACSample()
    hmac = tool.generateSecurityToken(url=apiEndpoint, httpMethod="GET", queryParameters=queryParameters, postData=postData, apiKey=apiKey)
    print json.dumps(hmac, indent=4)
apiKey with "test" instead of the converted hex to ASCII string outputs the same value which made me suspect that the conversion was the problem. Now I'm not sure what to believe anymore.
/Patrik
ASCII encoding supports characters in the code point range 0–127. Any character outside this range is encoded as byte 63, which corresponds to ?, if you decode the byte array back to a string. So, with your code, you ruin your key by applying ASCII encoding to it. But if what you want is a byte array, then why do you do Hex String -> ASCII String -> Byte Array instead of just Hex String -> Byte Array?
Here is PowerShell code which generates the same results as your Python code:
function GenerateToken {
    param($apikey, $url, $httpMethod, $queryparameters, $postData)
    $datastring = -join @(
        $httpMethod
        $url
        $queryparameters
        #[DateTimeOffset]::Now.ToUnixTimeSeconds()
        1446128942
        $postData
    )
    $hmacsha = New-Object System.Security.Cryptography.HMACSHA256
    $hmacsha.Key = @($apikey -split '(?<=\G..)(?=.)' | ForEach-Object {[byte]::Parse($_,'HexNumber')})
    [BitConverter]::ToString($hmacsha.ComputeHash([Text.Encoding]::UTF8.GetBytes($datastring))).Replace('-','').ToLower()
}
$apiEndpoint = "http://test.control.llnw.com/traffic-reporting-api/v1"
#what you see in Control on Edit My Profile page#
$apikey = '76b02c0c4543a85e45552466694cf677937833c9cce87f0a628248af2d2c495b';
$queryParameters = "shortname=bulkget&service=http&reportDuration=day&startDate=2012-01-01"
$postData = "{param1: 123, param2: 456}"
GenerateToken -url $apiEndpoint -httpMethod "GET" -queryparameters $queryParameters -postData $postData -apiKey $apiKey
I also fixed some other errors in your PowerShell code, in particular the arguments to the GenerateToken function call. And I changed ASCII to UTF8 for the $datastring encoding. UTF8 yields exactly the same bytes if all characters are in the ASCII range, so it does not matter in your case. But if you want to use characters outside the ASCII range in $datastring, then you should choose the same encoding as you use in Python, or you will not get the same results.
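To see concretely why the ASCII round trip ruins the key, here is a small Python illustration (a sketch, not part of the original scripts):
# every key byte above 0x7F becomes 0x3F ('?') after the ASCII round trip
key = bytes.fromhex("76b02c0c4543a85e45552466694cf677937833c9cce87f0a628248af2d2c495b")
mangled = key.decode("latin-1").encode("ascii", errors="replace")
print(key.hex())
print(mangled.hex())   # 3f wherever the original byte was outside 0-127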

How to encode text to base64 in python

I am trying to encode a text string to base64.
I tried doing this:
name = "your name"
print('encoding %s in base64 yields = %s\n'%(name,name.encode('base64','strict')))
But this gives me the following error:
LookupError: 'base64' is not a text encoding; use codecs.encode() to handle arbitrary codecs
How do I go about doing this? (using Python 3.4)
Remember to import base64 and that b64encode takes bytes as an argument.
import base64
b = base64.b64encode(bytes('your string', 'utf-8')) # bytes
base64_str = b.decode('utf-8') # convert bytes to string
It turns out that this is important enough to get its own module...
import base64
base64.b64encode(b'your name') # b'eW91ciBuYW1l'
base64.b64encode('your name'.encode('ascii')) # b'eW91ciBuYW1l'
For py3, base64 encode and decode string:
import base64

def b64e(s):
    return base64.b64encode(s.encode()).decode()

def b64d(s):
    return base64.b64decode(s).decode()
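For example, using the two helpers above:
>>> b64e('your name')
'eW91ciBuYW1l'
>>> b64d('eW91ciBuYW1l')
'your name'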
1) This works without imports in Python 2:
>>>
>>> 'Some text'.encode('base64')
'U29tZSB0ZXh0\n'
>>>
>>> 'U29tZSB0ZXh0\n'.decode('base64')
'Some text'
>>>
>>> 'U29tZSB0ZXh0'.decode('base64')
'Some text'
>>>
(although this doesn't work in Python 3)
2) In Python 3 you'd have to import base64 and do base64.b64decode('...')
- will work in Python 2 too.
For compatibility with both py2 and py3:
import six
import base64

def b64encode(source):
    if six.PY3:
        source = source.encode('utf-8')
    content = base64.b64encode(source).decode('utf-8')
    return content
It looks like it's essential to call the decode() function to get actual string data even after calling base64.b64decode on a base64-encoded string, because it always returns bytes literals.
import base64
conv_bytes = bytes('your string', 'utf-8')
print(conv_bytes) # b'your string'
encoded_str = base64.b64encode(conv_bytes)
print(encoded_str) # b'eW91ciBzdHJpbmc='
print(base64.b64decode(encoded_str)) # b'your string'
print(base64.b64decode(encoded_str).decode()) # your string
Whilst you can of course use the base64 module, you can also use the codecs module (referred to in your error message) for binary encodings (meaning non-standard and non-text encodings).
For example:
import codecs
my_bytes = b"Hello World!"
codecs.encode(my_bytes, "base64")
codecs.encode(my_bytes, "hex")
codecs.encode(my_bytes, "zip")
codecs.encode(my_bytes, "bz2")
This can come in useful for large data as you can chain them to get compressed and json-serializable values:
my_large_bytes = my_bytes * 10000
codecs.decode(
    codecs.encode(
        codecs.encode(
            my_large_bytes,
            "zip"
        ),
        "base64"),
    "utf8"
)
Refs:
https://docs.python.org/3/library/codecs.html#binary-transforms
https://docs.python.org/3/library/codecs.html#standard-encodings
https://docs.python.org/3/library/codecs.html#text-encodings
Use the below code:
import base64

#Taking input through the terminal.
welcomeInput = raw_input("Enter 1 to convert String to Base64, 2 to convert Base64 to String: ")

if(int(welcomeInput)==1 or int(welcomeInput)==2):
    #Code to Convert String to Base 64.
    if int(welcomeInput)==1:
        inputString = raw_input("Enter the String to be converted to Base64:")
        base64Value = base64.b64encode(inputString.encode())
        print "Base64 Value = " + base64Value
    #Code to Convert Base 64 to String.
    elif int(welcomeInput)==2:
        inputString = raw_input("Enter the Base64 value to be converted to String:")
        stringValue = base64.b64decode(inputString).decode('utf-8')
        print "Base64 Value = " + stringValue
else:
    print "Please enter a valid value."
Base64 encoding is a process of converting binary data to an ASCII string format by converting that binary data into a 6-bit character representation. The Base64 method of encoding is used when binary data, such as images or video, is transmitted over systems that are designed to transmit data in a plain-text (ASCII) format.
Follow this link for further details about how base64 encoding works.
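To make the 6-bit grouping described above concrete, here is a small worked example (three input bytes become four base64 characters):
import base64

data = b"Man"                                   # 0x4D 0x61 0x6E
bits = ''.join(f"{byte:08b}" for byte in data)
print(bits)                                     # 010011010110000101101110
print([bits[i:i+6] for i in range(0, 24, 6)])   # ['010011', '010110', '000101', '101110'] -> 19, 22, 5, 46
print(base64.b64encode(data))                   # b'TWFu'  (alphabet indices 19, 22, 5, 46 are T, W, F, u)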
For those who want to implement base64 encoding from scratch for the sake of understanding, here's the code that encodes the string to base64.
encoder.py
#!/usr/bin/env python3.10

class Base64Encoder:
    #base64Encoding maps integer to the encoded text; since it's a list, the index acts as the key
    base64Encoding: list = None

    #data must be of type str or bytes
    def encode(data) -> str:
        if not isinstance(data, str) and not isinstance(data, bytes):
            raise AttributeError(f"Expected {type('')} or {type(b'')} but found {type(data)}")
        if isinstance(data, str):
            data = data.encode("ascii")
        if Base64Encoder.base64Encoding == None:
            #constructing base64Encoding
            Base64Encoder.base64Encoding = list()
            #mapping A-Z
            for key in range(0, 26):
                Base64Encoder.base64Encoding.append(chr(key + 65))
            #mapping a-z
            for key in range(0, 26):
                Base64Encoder.base64Encoding.append(chr(key + 97))
            #mapping 0-9
            for key in range(0, 10):
                Base64Encoder.base64Encoding.append(chr(key + 48))
            #mapping +
            Base64Encoder.base64Encoding.append('+')
            #mapping /
            Base64Encoder.base64Encoding.append('/')
        if len(data) == 0:
            return ""
        length = len(data)
        bytes_to_append = -(length % 3) + (3 if length % 3 != 0 else 0)
        binary_list = []
        for s in data:
            ascii_value = s
            binary = f"{ascii_value:08b}"
            for bit in binary:
                binary_list.append(bit)
        length = len(binary_list)
        bits_to_append = -(length % 6) + (6 if length % 6 != 0 else 0)
        binary_list.extend([0] * bits_to_append)
        base64 = []
        value = 0
        for index, bit in enumerate(reversed(binary_list)):
            #converting block of 6 bits to integer value
            value += (2 ** (index % 6) if bit == '1' else 0)
            if (index + 1) % 6 == 0:
                base64.append(Base64Encoder.base64Encoding[value])
                #resetting value
                value = 0
        #padding if there are missing bytes, and returning the result
        return ''.join(reversed(base64)) + ''.join(['='] * bytes_to_append)
testEncoder.py
#!/usr/bin/env python3.10
from encoder import Base64Encoder

if __name__ == "__main__":
    print(Base64Encoder.encode("Hello"))
    print(Base64Encoder.encode("1 2 10 13 -7"))
    print(Base64Encoder.encode("A"))
    with open("image.jpg", "rb") as file_data:
        print(Base64Encoder.encode(file_data.read()))
Output:
$ ./testEncoder.py
SGVsbG8=
MSAyIDEwIDEzIC03
QQ==

Can Python read from a Windows Powershell namedpipe?

I have the following named pipe created in Windows Powershell.
# .NET 3.5 is required to use the System.IO.Pipes namespace
[reflection.Assembly]::LoadWithPartialName("system.core") | Out-Null
$pipeName = "pipename"
$pipeDir = [System.IO.Pipes.PipeDirection]::InOut
$pipe = New-Object system.IO.Pipes.NamedPipeServerStream( $pipeName, $pipeDir )
Now, what I need is a Python code snippet to read from the named pipe created above. Can Python do that?
Thanks in advance!
Courtesy: http://jonathonreinhart.blogspot.com/2012/12/named-pipes-between-c-and-python.html
Here's the C# Code
using System;
using System.IO;
using System.IO.Pipes;
using System.Text;

class PipeServer
{
    static void Main()
    {
        var server = new NamedPipeServerStream("NPtest");

        Console.WriteLine("Waiting for connection...");
        server.WaitForConnection();

        Console.WriteLine("Connected.");
        var br = new BinaryReader(server);
        var bw = new BinaryWriter(server);

        while (true)
        {
            try
            {
                var len = (int)br.ReadUInt32();           // Read string length
                var str = new string(br.ReadChars(len));  // Read string
                Console.WriteLine("Read: \"{0}\"", str);

                //str = new string(str.Reverse().ToArray()); // Aravind's edit: since Reverse() is not working, might require some import. Felt it as irrelevant

                var buf = Encoding.ASCII.GetBytes(str);   // Get ASCII byte array
                bw.Write((uint)buf.Length);               // Write string length
                bw.Write(buf);                            // Write string
                Console.WriteLine("Wrote: \"{0}\"", str);
            }
            catch (EndOfStreamException)
            {
                break;  // When client disconnects
            }
        }
    }
}
And here's the Python code:
import time
import struct

f = open(r'\\.\pipe\NPtest', 'r+b', 0)
i = 1

while True:
    s = 'Message[{0}]'.format(i)
    i += 1

    f.write(struct.pack('I', len(s)) + s)  # Write str length and str
    f.seek(0)                               # EDIT: This is also necessary
    print 'Wrote:', s

    n = struct.unpack('I', f.read(4))[0]    # Read str length
    s = f.read(n)                           # Read str
    f.seek(0)                               # Important!!!
    print 'Read:', s

    time.sleep(2)
Convert the C# code into a .ps1 file.
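The Python client above targets Python 2. A rough Python 3 adaptation (an untested sketch: strings must be encoded to bytes before writing, and print needs parentheses) would look like this:
import struct
import time

f = open(r'\\.\pipe\NPtest', 'r+b', 0)
i = 1

while True:
    s = 'Message[{0}]'.format(i).encode('ascii')
    i += 1
    f.write(struct.pack('I', len(s)) + s)   # write length prefix, then payload
    f.seek(0)                               # still needed, as in the original example
    print('Wrote:', s)
    n = struct.unpack('I', f.read(4))[0]    # read length prefix
    print('Read:', f.read(n))
    f.seek(0)
    time.sleep(2)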

Python message partition error

I'm receiving the following message through TCP:
{"message": "Start", "client": "134.106.74.21", "type": 1009}<EOM>
but when I try to partition it with
msg.partition( "<EOM>" )
I get the following tuple:
('{\x00\x00\x00"\x00\x00\x00m\x00\x00\x00e\x00\x00\x00s\x00\x00\x00s\x00\x00\x00a\x00
\x00\x00g\x00\x00\x00e\x00\x00\x00"\x00\x00\x00:\x00\x00\x00 \x00\x00\x00"\x00\x00\x00#
\x00\x00\x00B\x00\x00\x00E\x00\x00\x00G\x00\x00\x00I\x00\x00\x00N\x00\x00\x00;\x00\x00
\x00A\x00\x00\x00l\x00\x00\x00l\x00\x00\x00;\x00\x00\x000\x00\x00\x00;\x00\x00\x001\x00\x00
\x00;\x00\x00\x000\x00\x00\x00;\x00\x00\x001\x00\x00\x003\x00\x00\x004\x00\x00\x00.\x00\x00
\x001\x00\x00\x000\x00\x00\x006\x00\x00\x00.\x00\x00\x007\x00\x00\x004\x00\x00\x00.\x00\x00
\x001\x00\x00\x002\x00\x00\x005\x00\x00\x00:\x00\x00\x003\x00\x00\x000\x00\x00\x000\x00\x00
\x000\x00\x00\x000\x00\x00\x00;\x00\x00\x00#\x00\x00\x00E\x00\x00\x00N\x00\x00\x00D\x00\x00
\x00"\x00\x00\x00,\x00\x00\x00 \x00\x00\x00"\x00\x00\x00c\x00\x00\x00l\x00\x00\x00i\x00\x00
\x00e\x00\x00\x00n\x00\x00\x00t\x00\x00\x00"\x00\x00\x00:\x00\x00\x00 \x00\x00\x00"\x00
\x00\x001\x00\x00\x003\x00\x00\x004\x00\x00\x00.\x00\x00\x001\x00\x00\x000\x00\x00\x006
\x00\x00\x00.\x00\x00\x007\x00\x00\x004\x00\x00\x00.\x00\x00\x001\x00\x00\x002\x00\x00
\x005\x00\x00\x00"\x00\x00\x00,\x00\x00\x00 \x00\x00\x00"\x00\x00\x00t\x00\x00\x00y\x00
\x00\x00p\x00\x00\x00e\x00\x00\x00"\x00\x00\x00:\x00\x00\x00 \x00\x00\x002\x00\x00\x000
\x00\x00\x000\x00\x00\x005\x00\x00\x00}\x00\x00\x00<\x00\x00\x00E\x00\x00\x00O\x00\x00\x00M
\x00\x00\x00>\x00\x00\x00{"message": "Start", "client": "134.106.74.21", "type": 1009}',
'', '')
Updated
try:
    #Check if there are messages; if not, an exception is thrown, otherwise continue
    ans = self.request.recv( 20480 )
    if( ans ):
        recv = self.getMessage( recv + ans )
    else:
        #Master client disconnected
        break
except:
    ...

def getMessage( self, msg ):
    print( "masg:" + msg );
    aSplit = msg.partition( "<EOM>" )
    while( aSplit[ 1 ] == "<EOM>" ):
        self.recvMessageHandler( json.loads( aSplit[ 0 ] ) )
        #Get the new message, if any
        msg = aSplit[ 3 ]
        aSplit = msg.partition( "<EOM>" )
    return msg;
The problem occurs when I try to concatenate the two strings:
recv + ans
If you print msg.encode("hex") then you will likely see that this is exactly what is in the string.
In any case, you may have noticed that every 4th byte of the result is one of the characters that you expected. This suggests that you have a UCS4 Unicode string that you are not handling properly.
Did you receive UCS4 encoded bytes? If so then you should be stuffing them into a unicode string u"".append(stuff). But if you are receiving UCS4-encoded bytes and you have any influence over the sender, you really should get things changed to transmit and receive UTF-8 encoded strings since that is more normal over network connections.
Are you sure that the 5 literal bytes < E O M > are indeed the delimiter that you need to use for partitioning. Or is it supposed to be the single byte ASCII code named EOM? Or is it a UCS4 encoded u"<EOM>" ?
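To check the UCS4 theory quickly, here is a hedged illustration using just the first bytes of the dump above: if the data really is UTF-32-LE (one useful byte followed by three NULs), decoding it as such recovers the expected text, including the <EOM> delimiter.
raw = b'{\x00\x00\x00"\x00\x00\x00m\x00\x00\x00e\x00\x00\x00'   # first 16 bytes of the dump
print(raw.decode('utf-32-le'))   # '{"me'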
