I'm trying to use a database from another program in a PHP-based website tool. Apparently the original was built in Python; it puts some of its data into a Python tuple and serializes it to store it as a blob in the SQL table.
I'm not a Python programmer, so I'm not sure how to even see what is in this blob, but I do know that some of the 'type' indicators for the data field are stored in there, and I want to extract them and anything else useful.
Is there any way to 'unserialize' a Python tuple in PHP?
The blob data turned out to be a pickled tuple (part of the reason I despise python - both data types that only python can read! Python programmers: 'standardized conventions? Who needs standardized conventions?!?!')
I came up with a kludgy way to 'unpickle' the data and JSON-serialize it from the command line. To get the binary blob into a command-line argument, I base64-encode it. It's janky, but it works for what I need:
/**
 * use a python exec call to 'unpickle' the blob_data
 * to get the binary blob into a command line argument, base64 encode it
 * to get the data back out of python, json serialize it
 * @param string $blob binary blob data
 * @return mixed
 */
public static function unpickle($blob) {
    $cmd = sprintf("import pickle; import base64; import json; print(json.dumps(pickle.loads(base64.b64decode('%s'))))", base64_encode($blob));
    $pcmd = sprintf("python -c \"%s\"", $cmd);
    $result = exec($pcmd);
    $resdec = json_decode($result);
    return $resdec;
}
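For anyone who wants to see what the one-liner inside that function does, here is a minimal Python sketch of the same base64, unpickle, JSON chain. The sample tuple is made up for illustration; the real blob would come from the database. Two things worth knowing: JSON has no tuple type, so the tuple comes back as an array, and pickle.loads can run arbitrary code, so only unpickle blobs you trust.

```python
import base64
import json
import pickle

# Hypothetical stand-in for the blob stored in the SQL table.
sample = ("int", "varchar", {"nullable": True})
blob = pickle.dumps(sample)

# What the PHP side does: base64-encode the binary blob so it
# survives being passed as a command-line argument.
encoded = base64.b64encode(blob).decode("ascii")

# What the python -c one-liner does: decode, unpickle, dump as JSON.
as_json = json.dumps(pickle.loads(base64.b64decode(encoded)))
print(as_json)  # ["int", "varchar", {"nullable": true}]
```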
After playing with this concept a little more, I gave myself a few more alternatives. First, I turned the command-line version above into a more functional Python script:
unpickle.py:
#!/usr/bin/env python3
import base64
import json
import pickle
import select
import sys


def isBase64(s):
    """Return True if s round-trips through base64 unchanged."""
    try:
        return s == base64.b64encode(base64.b64decode(s)).decode('ascii')
    except Exception:
        return False


bblob = None
if (len(sys.argv) > 1) and isBase64(sys.argv[1]):
    # blob passed as a base64 string on the command line
    bblob = base64.b64decode(sys.argv[1])
elif select.select([sys.stdin, ], [], [], 0.0)[0]:
    # raw blob piped in on stdin
    try:
        with open(0, 'rb') as f:
            bblob = f.read()
    except Exception as e:
        # report the read error on stderr and exit non-zero
        sys.exit('could not read stdin: {}'.format(e))

if bblob is not None:
    unpik = pickle.loads(bblob)
    jsout = json.dumps(unpik)
    print(jsout)
This script lets you either pass the pickled blob as a base64-encoded string on the command line or pipe the raw blob data into the script. Both variations output JSON if the data is valid and formatted properly. If it is not, nothing is printed to stdout, so the PHP side ends up with null.
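One caveat with the JSON step: json.dumps raises a TypeError when the unpickled data contains types JSON cannot represent, such as bytes or sets. Below is a rough sketch of one way around that, using the default= hook of json.dumps. The helper names to_jsonable and unpickle_to_json are made up for this example and are not part of unpickle.py, and the conversion rules are only one possible choice.

```python
import json
import pickle


def to_jsonable(obj):
    """Fallback for objects json.dumps cannot encode on its own."""
    if isinstance(obj, (bytes, bytearray)):
        return obj.decode("utf-8", errors="replace")
    if isinstance(obj, (set, frozenset)):
        return sorted(obj, key=repr)
    # Last resort: keep a readable representation instead of failing.
    return repr(obj)


def unpickle_to_json(blob):
    """Unpickle a trusted blob and serialize the result as JSON."""
    return json.dumps(pickle.loads(blob), default=to_jsonable)


# Made-up sample containing values plain json.dumps would reject.
blob = pickle.dumps(("name", b"raw", {1, 2}))
print(unpickle_to_json(blob))  # ["name", "raw", [1, 2]]
```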
You can convert this to a self-contained binary to drop onto systems without Python using pyinstaller -F if need be. To cover systems that have the pyinstaller binary, systems that have the Python script, and systems that have only Python, I created the following static methods in my Laravel model. (I'll eventually move them into a service module.)
/**
 * call either a pyinstaller binary or python script with raw blob data to be unpickled
 *
 * @param string $b binary data of blob
 * @return false|mixed
 */
public static function unpickle($b)
{
    $cmd = base_path(env('UNPICKLE_BINARY', 'bin/unpickle'));
    $cwd = dirname($cmd); // resolve the working directory before $cmd may gain a python prefix
    if (!(is_file($cmd) && is_executable($cmd))) { // make sure unpickle cmd exists
        // check for UNPICKLE_BINARY with .py after and python binary
        $pyExe = env('PYTHON_EXE', '/usr/bin/python');
        if (is_file($cmd.".py") && (is_file($pyExe) && is_executable($pyExe))) {
            $cmd = sprintf("%s %s.py", $pyExe, $cmd);
        } else {
            return static::unpyckle($b); // try direct python call
        }
    }
    $descriptorspec = [
        ["pipe", "r"], // stdin
        ["pipe", "w"], // stdout
        ["pipe", "w"]  // stderr
    ];
    $env = [];
    $process = proc_open($cmd, $descriptorspec, $pipes, $cwd, $env);
    if (is_resource($process)) {
        fwrite($pipes[0], $b);
        fclose($pipes[0]);
        $output = stream_get_contents($pipes[1]);
        fclose($pipes[1]);
        fclose($pipes[2]);
        $return_value = proc_close($process);
        if (static::isJson($output)) {
            return json_decode($output);
        }
        return false;
    }
    return false;
}
/**
 * use a python exec call to 'unpickle' the blob_data
 * to get the binary blob into a command line argument, base64 encode it
 * to get the data back out of python, json serialize it
 * @param string $blob binary blob data
 * @return mixed
 */
public static function unpyckle($blob) {
    $pyExe = env('PYTHON_EXE', '/usr/bin/python');
    if (!(is_file($pyExe) && is_executable($pyExe))) {
        // leading backslash so the global Exception class resolves inside a namespaced model
        throw new \Exception('python executable not found!');
    }
    $bblob = base64_encode($blob);
    $cmd = sprintf("import pickle; import base64; import json; print(json.dumps(pickle.loads(base64.b64decode('%s'))))", $bblob);
    $pcmd = sprintf("%s -c \"%s\"", $pyExe, $cmd);
    $result = exec($pcmd);
    $resdec = json_decode($result);
    return $resdec;
}
/**
 * try to detect if a string is a json string
 *
 * @param $str
 * @return bool
 */
public static function isJson($str) {
    if (is_string($str) && !empty($str)) {
        json_decode($str);
        return (json_last_error() == JSON_ERROR_NONE);
    }
    return false;
}
example .env values:
UNPICKLE_BINARY=bin/unpickle
PYTHON_EXE=/usr/bin/python3
Basically, this shows three different ways to call Python to do essentially the same thing.
I have been working on a problem for a while now which I cannot seem to resolve so I need some help! The problem is that I am writing a program in C# but I require a function from a Python file I created. This in itself is no problem:
...Usual Stuff
using IronPython.Hosting;
using IronPython.Runtime;
using Microsoft.Scripting;
using Microsoft.Scripting.Hosting;
namespace Program
{
    public partial class Form1 : Form
    {
        Microsoft.Scripting.Hosting.ScriptEngine py;
        Microsoft.Scripting.Hosting.ScriptScope s;
        public Form1()
        {
            InitializeComponent();
            py = Python.CreateEngine(); // allow us to run ironpython programs
            s = py.CreateScope(); // you need this to get the variables
        }
        private void doPython()
        {
            //Step 1:
            //Creating a new script runtime
            var ironPythonRuntime = Python.CreateRuntime();
            //Step 2:
            //Load the Iron Python file/script into the memory
            //Should be resolved at runtime
            dynamic loadIPython = ironPythonRuntime.UseFile("first.py");
            //Step 3:
            //Invoke the method and print the result
            double n = loadIPython.add(100, 200);
            numericUpDown1.Value = (decimal)n;
        }
    }
}
However, this requires the file 'first.py' to be wherever the program is once compiled. So if I wanted to share my program, I would have to send both the executable and the Python files, which is very inconvenient. One way I thought to resolve this is by adding the 'first.py' file to the resources and running it from there... but I don't know how to do this or even if it is possible.
Naturally the above code will not work for this, as the .UseFile method takes a string argument, not a byte[]. Does anyone know how I may progress?
Let's start with the simplest thing that could possibly work. You've got some code that looks a little like the following:
// ...
py = Python.CreateEngine(); // allow us to run ironpython programs
s = py.CreateScope(); // you need this to get the variables
var ironPythonRuntime = Python.CreateRuntime();
var x = py.CreateScriptSourceFromFile("SomeCode.py");
x.Execute(s);
var myFoo = s.GetVariable("myFoo");
var n = (double)myFoo.add(100, 200);
// ...
and we'd like to replace the line var x = py.CreateScriptSourceFromFile(... with something else. If we could get the embedded resource as a string, we could use ScriptEngine.CreateScriptSourceFromString().
Cribbing this fine answer, we can get something that looks a bit like this:
string pySrc;
var resourceName = "ConsoleApplication1.SomeCode.py";
using (var stream = System.Reflection.Assembly.GetExecutingAssembly()
                          .GetManifestResourceStream(resourceName))
using (var reader = new System.IO.StreamReader(stream))
{
    pySrc = reader.ReadToEnd();
}
var x = py.CreateScriptSourceFromString(pySrc);
I got a task to add a digital signature and the signer's public key to an XML file. The XML file in question would look like this:
<data>
  <head>
    <field1>foo</field1>
    <fieldn>bar</fieldn>
  </head>
  <headSigner>John Doe</headSigner>
  <signerPublicKey>dfgdgd...sdfgdgdsg</signerPublicKey>
  <headSignature>sdafa...sfsafsasdfsafasd</headSignature>
</data>
Is this common or even feasible? I can write something in Python or Powershell that would:
1) Write the head XML and dump it to a file.
2) Run gpg to sign the file with the --clear-sign flag.
3) Parse the signed file that gpg makes for the signature string.
4) Add that string to the corresponding element in the XML.
Is there an easier, or standard, way to do this? Maybe a Python or PowerShell module that's already set up for that?
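For reference, steps 3 and 4 of the list above could be sketched in Python roughly as follows. This assumes gpg has already produced the clear-signed text (steps 1 and 2 are not shown), and the sample input is made up rather than real gpg output. The helper names extract_signature and add_signature are hypothetical. Keep in mind that the signature covers the exact bytes that were signed, so re-serializing the head element differently later would break verification.

```python
import xml.etree.ElementTree as ET

BEGIN = "-----BEGIN PGP SIGNATURE-----"
END = "-----END PGP SIGNATURE-----"


def extract_signature(clearsigned):
    """Return the armored signature body from clear-signed text."""
    lines = clearsigned.splitlines()
    body = lines[lines.index(BEGIN) + 1:lines.index(END)]
    # Armor headers (if any) end at the first blank line.
    if "" in body:
        body = body[body.index("") + 1:]
    return "".join(body)


def add_signature(xml_text, signature):
    """Store the signature string in the <headSignature> element."""
    root = ET.fromstring(xml_text)
    node = root.find("headSignature")
    if node is None:
        node = ET.SubElement(root, "headSignature")
    node.text = signature
    return ET.tostring(root, encoding="unicode")


# Made-up sample; a real one would be the file gpg writes.
signed = "\n".join([
    "-----BEGIN PGP SIGNED MESSAGE-----",
    "Hash: SHA256",
    "",
    "<head><field1>foo</field1></head>",
    BEGIN,
    "",
    "iQEzBAEBCAAdFiEE",
    "=abcd",
    END,
])
doc = "<data><head><field1>foo</field1></head><headSignature/></data>"
signed_doc = add_signature(doc, extract_signature(signed))
print(signed_doc)
```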
Use a C# console app to sign the XML file using a certificate, like this:
using System;
using System.Xml;
using System.Security.Cryptography.X509Certificates;
using System.Security.Cryptography;
using System.Security.Cryptography.Xml;
var certStore = new X509Store(StoreLocation.CurrentUser);
certStore.Open(OpenFlags.ReadOnly);
// get cert required
var certificateThumbPrint = "33eeededldodoijdlnkddzippy2e";
var certCollection = certStore.Certificates.Find(X509FindType.FindByThumbprint, certificateThumbPrint, true);
var myCert = certCollection[0];
// I can get the correct certificate but the following line throws "Invalid provider type specified." error
var SigningKey = myCert.GetRSAPrivateKey();
XmlDocument xmlDoc = new XmlDocument();
// PreserveWhitespace only affects loading if it is set before Load
xmlDoc.PreserveWhitespace = true;
xmlDoc.Load(args[0]);
Sign the file using this:
private static void SignXml2(XmlDocument xmlDoc, X509Certificate2 cert, string file1)
{
    // Create a SignedXml object.
    SignedXml signedXml = new SignedXml(xmlDoc);
    // Add the key to the SignedXml document.
    signedXml.SigningKey = cert.PrivateKey;
    // Create a reference to be signed.
    Reference reference = new Reference();
    reference.Uri = "";
    // Add an enveloped transformation to the reference.
    var env = new XmlDsigEnvelopedSignatureTransform();
    reference.AddTransform(env);
    // Include the public key of the certificate in the assertion.
    signedXml.KeyInfo = new KeyInfo();
    signedXml.KeyInfo.AddClause(new KeyInfoX509Data(cert, X509IncludeOption.WholeChain));
    // Add the reference to the SignedXml object.
    signedXml.AddReference(reference);
    // Compute the signature.
    signedXml.ComputeSignature();
    // Get the XML representation of the signature and save
    // it to an XmlElement object.
    XmlElement xmlDigitalSignature = signedXml.GetXml();
    // Append the element to the XML document.
    xmlDoc.DocumentElement.AppendChild(xmlDoc.ImportNode(xmlDigitalSignature, true));
    xmlDoc.PreserveWhitespace = true;
    xmlDoc.Save(file1);
}
I have a Unity3D application that requests a JSON string containing an image name and its hash from my Django web server. The Unity app then checks whether the hash of its existing image matches the one in the JSON. My problem is that the hash computed in Unity differs from the one computed in Python. I also tried hashing a string on both sides, and that returns the same hash value.
Python Hash:
>>> image_file = open('C:/image.png').read()
>>> hashlib.md5(image_file).hexdigest()
'658e8dc0bf8b9a09b36994abf9242099'
Unity3d Hash:
public static string ComputeHash()
{
    // Form hash
    System.Security.Cryptography.MD5 h = System.Security.Cryptography.MD5.Create();
    var myImage = File.OpenRead(PathByPlatform("image.png"));
    var data = h.ComputeHash(myImage);
    System.Text.StringBuilder sb = new System.Text.StringBuilder();
    for (int i = 0; i < data.Length; ++i)
    {
        sb.Append(data[i].ToString("x2"));
    }
    return sb.ToString();
    //This function returns
    //fac7f19792a696d81be77aca7dd499d0
}
Did you try open('C:/image.png', "rb").read() in order to read the file in binary mode?
Reading files without the "b" changes line-ending characters on Windows from CR/LF to LF, which affects the hash (at least for Python 2).
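A small sketch of hashing the raw bytes, in case it helps. The helper name md5_of_file and the sample bytes are made up for this example; the temporary file only stands in for image.png so the snippet runs on its own.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Hash a file's raw bytes, reading it in binary mode."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Made-up bytes including CR/LF, which text mode may rewrite on Windows.
payload = b"\x89PNG\r\n\x1a\nfake image data"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    file_hash = md5_of_file(path)
finally:
    os.remove(path)

# The file hash matches the hash of the original bytes.
print(file_hash == hashlib.md5(payload).hexdigest())  # True
```

Reading in chunks also keeps memory use flat for large images.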
I have 5gb of data serialized with apache thrift and a .thrift file with the formatting of the data. I have tried using thriftpy and the official thrift package but I can't wrap my head around how to open the files.
The data is the expanded dataset from http://www.iesl.cs.umass.edu/data/wiki-links
A description of the data format can be found here https://code.google.com/p/wiki-link/wiki/ExpandedDataset
The Scala setup can be found in the ThriftSerializerFactory.scala file. Since the naming of most things is consistent throughout the Thrift libraries, you can more or less model your Python code after the Scala example:
package edu.umass.cs.iesl.wikilink.expanded.process
import org.apache.thrift.protocol.TBinaryProtocol
import org.apache.thrift.transport.TIOStreamTransport
import java.io.File
import java.io.BufferedOutputStream
import java.io.FileOutputStream
import java.io.BufferedInputStream
import java.io.FileInputStream
import java.util.zip.{GZIPOutputStream, GZIPInputStream}
object ThriftSerializerFactory {
  def getWriter(f: File) = {
    val stream = new BufferedOutputStream(new GZIPOutputStream(new FileOutputStream(f)), 2048)
    val protocol = new TBinaryProtocol(new TIOStreamTransport(stream))
    (stream, protocol)
  }
  def getReader(f: File) = {
    val stream = new BufferedInputStream(new GZIPInputStream(new FileInputStream(f)), 2048)
    val protocol = new TBinaryProtocol(new TIOStreamTransport(stream))
    (stream, protocol)
  }
}
You basically set up a stream transport and the binary protocol. If you leave the data compressed, you will have to add the gzip piece to the puzzle, but once the data are decompressed this should not be needed anymore.
The code in WikiLinkItemIterator.scala shows how to read the data files using the factory class above.
class PerFileWebpageIterator(f: File) extends Iterator[WikiLinkItem] {
  var done = false
  val (stream, proto) = ThriftSerializerFactory.getReader(f)
  private var _next: Option[WikiLinkItem] = getNext()
  private def getNext(): Option[WikiLinkItem] = try {
    Some(WikiLinkItem.decode(proto))
  } catch {case _: TTransportException => {done = true; stream.close(); None}}
  def hasNext(): Boolean = !done && (_next != None || {_next = getNext(); _next != None})
  def next(): WikiLinkItem = if (hasNext()) _next match {
    case Some(wli) => {_next = None; wli}
    case None => {throw new Exception("Next on empty iterator.")}
  } else throw new Exception("Next on empty iterator.")
}
Steps to implement:
1) Implement a Thrift protocol stack factory like the one above (a recommendable pattern, BTW).
2) Instantiate the root element of each record, in our case a WikiLinkItem.
3) Call instance.read(proto) to read one record of data.
I'm new to programming in languages more suited to the web, but I have programmed in VBA for Excel.
What I would like to do is:
pass a list (in python) to a casper.js script.
Inside the casperjs script I would like to iterate over the python object (a list of search terms)
In the casper script I would like to query google for search terms
Once queried I would like to store the results of these queries in an array, which I concatenate together while iterating over the python object.
Then once I have searched for all the search-terms and found results I would like to return the RESULTS array to python, so I can further manipulate the data.
QUESTION --> I'm not sure how to write the python function to pass an object to casper.
QUESTION --> I'm also not sure how to write the casper function to pass a JavaScript object back to python.
Here is my python code.
import os
import subprocess
scriptType = 'casperScript.js'
APP_ROOT = os.path.dirname(os.path.realpath(__file__))
PHANTOM = r'\casperjs\bin\casperjs'  # raw string so \b is not read as a backspace
SCRIPT = os.path.join(APP_ROOT, 'test.js')
params = [PHANTOM, SCRIPT]
subprocess.check_output(params)
js CODE
var casper = require('casper').create();
casper.start('http://google.com/', function() {
this.echo(this.getTitle());
});
casper.run();
Could you use JSON to send the data to the script and then decode it when you get it back?
Python:
payload = json.dumps(stuff)  # Turn object into string to pass to js
Load a json file into python:
with open(location + '/module_status.json') as data_file:
    data = json.load(data_file)
Deserialize a json string to an object in python
Javascript:
arr = JSON.parse(json) //Turn a json string to a js array
json = JSON.stringify(arr) //Turn an array to a json string ready to send to python
You can use two temporary files, one for input and the other for output of the casperjs script. woverton's answer is ok, but lacks a little detail. It is better to explicitly dump your JSON into a file than trying to parse the console messages from casperjs as they can be interleaved with debug strings and such.
In python:
import tempfile
import json
import os
import subprocess
APP_ROOT = os.path.dirname(os.path.realpath(__file__))
PHANTOM = r'\casperjs\bin\casperjs'  # raw string so \b is not read as a backspace
SCRIPT = os.path.join(APP_ROOT, 'test.js')
input = tempfile.NamedTemporaryFile(mode="w", delete=False)
output = tempfile.NamedTemporaryFile(mode="r", delete=False)
yourObj = {"someKey": "someData"}
yourJSON = json.dumps(yourObj)
input.file.write(yourJSON)
# you need to close the temporary input and output file because casperjs does operations on them
input.file.close()
output.file.close()
print("yourJSON", yourJSON)
# pass only file names (input must still be bound here, so it is not reset to None)
params = [PHANTOM, SCRIPT, input.name, output.name]
subprocess.check_output(params)
# you need to open the temporary output file again
output = open(output.name, "r")
yourReturnedJSON = json.load(output)
print("returned", yourReturnedJSON)
output.close()
output = None
Because both temporary files are created with delete=False, they are not removed automatically. Call os.remove() on both file names once you are done with them.
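Since delete=False leaves the files on disk, one option is to wrap the round trip in try/finally so cleanup always happens. The sketch below is only an illustration: round_trip is a made-up helper, and a tiny Python child process stands in for the casperjs script so the example runs without casperjs installed. In real use the command would be [PHANTOM, SCRIPT].

```python
import json
import os
import subprocess
import sys
import tempfile

# Stand-in for the casperjs script: reads JSON from the first file
# argument, adds a property, and writes it to the second file argument.
CHILD = (
    "import json, sys\n"
    "with open(sys.argv[1]) as f:\n"
    "    data = json.load(f)\n"
    "data['casper'] = 'done'\n"
    "with open(sys.argv[2], 'w') as f:\n"
    "    json.dump(data, f)\n"
)


def round_trip(obj, command):
    """Write obj as JSON to a temp file, run command on it, read the reply."""
    in_fd, in_path = tempfile.mkstemp(suffix=".json")
    out_fd, out_path = tempfile.mkstemp(suffix=".json")
    os.close(out_fd)  # the child process writes this file itself
    try:
        with os.fdopen(in_fd, "w") as handle:
            json.dump(obj, handle)
        # Pass only the file names, as in the answer above.
        subprocess.check_call(command + [in_path, out_path])
        with open(out_path) as handle:
            return json.load(handle)
    finally:
        # Remove both files whether or not the child succeeded.
        os.remove(in_path)
        os.remove(out_path)


result = round_trip({"someKey": "someData"}, [sys.executable, "-c", CHILD])
print(result)
```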
In casperjs:
var casper = require('casper').create();
var fs = require("fs");
var input = casper.cli.raw.get(0);
var output = casper.cli.raw.get(1);
input = JSON.parse(fs.read(input));
input.casper = "done"; // add another property
input = JSON.stringify(input);
fs.write(output, input, "w"); // input written to output
casper.exit();
The casperjs script isn't doing anything useful. It just writes the input file to the output file with an added property.