Is it possible to minify python code like javascript?

Python is a scripting language, and it is hard to protect Python code from being copied. 100% protection isn't required, but I'd at least like to slow down those who have bad intentions. Is it possible to minify/uglify Python code the way JavaScript front-end code is minified today?
EDIT: The Python code will be used on a Raspberry Pi, not a server. On a Raspberry Pi, anyone can take out the SD card and gain access to the Python code.

What about starting off with only distributing the .pyc files? These are the files created by the Python interpreter for performance reasons--their load times are faster than those of .py files--but to the casual user they are difficult to decipher.
python -m compileall .
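If you want to ship a tree that contains only byte code, a rough sketch like the one below (my own addition, with "myproject" as a placeholder path) byte-compiles in the legacy layout so each .pyc sits next to where its .py was, then deletes the sources. Test the result carefully before distributing it.
import compileall
import pathlib

# equivalent to: python -m compileall -b myproject
compileall.compile_dir("myproject", legacy=True)

# then drop the sources (keep a copy elsewhere, obviously)
for src in pathlib.Path("myproject").rglob("*.py"):
    src.unlink()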
Ramp up the security by using Cython to compile your Python source. To "cythonize" your code, run Cython + GCC on each module. The __init__.py files must be left intact to keep module imports working. A silly Hello World example:
$ cython helloworld.py -o helloworld.c
$ gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python3.7 -o helloworld.so helloworld.c
YMMV using this approach; I've run into various gotchas using different modules.
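If you have more than a handful of modules, a setup.py along these lines (a sketch; the module paths are placeholders) lets Cython's own build machinery drive the compilation instead of invoking cython and gcc by hand:
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="myproject",
    ext_modules=cythonize(
        ["myproject/helloworld.py"],  # list every module you want compiled
        compiler_directives={"language_level": "3"},
    ),
)
Build the extensions in place with python setup.py build_ext --inplace.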

I will answer my own question.
I found the following software tools that can do the job. I have not tried them, so I cannot comment on how effective they are. Comments are welcomed on their effectiveness.
https://liftoff.github.io/pyminifier/
https://mnfy.readthedocs.io/en/latest/

Sure, you could uglify it, but since Python relies on indentation for syntax, you couldn't do the equivalent minification (which in JS relies largely on removing all whitespace).
Beside the point, but JS is minified to make it download faster, not to obfuscate it.

Python is usually executed server-side. While it's sometimes fun to intentionally obfuscate code (look into Perl obfuscation ;), it should never be necessary for server-side code.
If you're trying to hide your Python from someone who already has access to the directories and files it is stored in, you have bigger problems than code obfuscation.

Nuitka (nuitka.net) is a great way to convert your Python code to compiled object code. This makes reverse engineering and exposing your algorithms extremely hard. Nuitka can also produce a standalone executable that is very portable.
While this may be a way to preserve trade secrets, it comes with some hard limitations.
a) some Python libs are already binary distros that are difficult to bundle into a standalone exe (e.g. xgboost, pytorch).
b) wide pip distribution of a binary package is an exercise in deep frustration because the package is linked to the CPython lib; manylinux and universal builds are a vast wasteland waiting to be mapped and documented.
As for the downvotes, please consider that 1) not all Python runs on servers -- some runs on the edge, 2) non-open-source authors need to protect their intellectual property, and 3) smaller always makes for faster installs.
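For reference, a basic standalone build looks something like the following (a sketch; yourprogram.py is a placeholder and the exact flags vary between Nuitka versions):
python -m nuitka --standalone --follow-imports yourprogram.py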

Related

Packing Python with a C++ app that embeds Python using pybind11

I have a C++ app where I'm trying to use pybind11 in order to support a scripting system in my app that would allow users to write their own scripts. I'm attempting to use Python in much the same way that many people use Lua as a scripting language to extend their app's functionality.
There is one big difference I've found in regards to Lua vs Python/pybind11: with Lua I can statically link the scripting language into my executable or even package a single shared library, but with Python/pybind11 it seems that I am reliant on whether or not the end user has Python already installed on their system.
Is that correct?
Is there a way I can tell pybind11 to statically link the Python libraries?
It also seems that pybind11 will search the path for the python executable, and then assume the shared libraries are in the same folder. Is there a way I can distribute the shared libraries and then tell my embedded interpreter to use those shared libraries?
My ultimate goal is that users can install my C++ app on their machine, and my app will have a built-in Python interpreter that can execute the scripts, regardless if Python is actually installed on the machine. Ideally I would do this with static linking, but dynamically linking and distributing the required libraries is also acceptable.
Pybind11 does not search for Python executable or anything like that. You link against libpythonX.Y.so.Z, and it works just like with any other shared library. So you can link against your Python library that you distribute. You only need to make sure your executable will find your library at run time. Again, this is no different from distributing any other shared library. Use rpath, or use LD_LIBRARY_PATH in a wrapper script, or whatever. There is a ton of questions and answers about it right here on SO (first hit).
You can link a static Python library, if you have one that is usable. On my system (Ubuntu 20 x64), trying to link against libpython3.8.a produces a ton of relocation errors. The system library only works when the entire executable is built statically, i.e. with g++ -static ... If you need a dynamic executable, you need to build your own static Python library with -fPIC. For the record, here is the command I used to link against the system library.
g++ -static -pthread -I/usr/include/python3.8 example.cpp -lpython3.8 -ldl -lutil -lexpat -lz
Last but not least, if your users want to write Python scripts, they probably have Python installed.

How to use Nuitka to compile jsonpickle (a Python lib) to C/C++?

I'm using jsonpickle in my program, but found it to be a performance bottleneck. So I'm trying to see if I can compile it to C using Nuitka, then use the C version in my program (through some wrappers perhaps).
But to be honest, I'm new to Nuitka, so I don't even know if this is a legit use case. Can someone give me some hints?
Note: this question is not about how to make a program faster. I'm building a library, not an application, so certain approaches like PyPy won't work. I'm also aware of Cython and am investigating it too, but this question is not about Cython either.
Figured it out myself. Just git clone the repo, go into the directory, and run
python -m nuitka --module jsonpickle --include-package=jsonpickle
Nuitka will generate a .so file, and you can import and use it just like the original package.
This method should be universal and could apply to any package and not just jsonpickle.
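As a quick sanity check (my own sketch, assuming the generated jsonpickle.so shadows the source package on sys.path):
import jsonpickle  # resolves to the compiled jsonpickle.so

data = {"answer": 42}
frozen = jsonpickle.encode(data)
assert jsonpickle.decode(frozen) == data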
Do note that Nuitka is meant for building applications and is not well suited to building a library. The main pain point is that it doesn't support cross-compilation.

Python's freeze.py doesn't install on Windows

I have been looking for the freeze.py utility which is supposed to come bundled with Python 3 in a Python 3.3 Windows install (albeit with distribute and pip installed) and haven't found it. The utility can be downloaded directly out of the Python svn repository here, but I'm wondering: does freeze come with a standard Windows Python 3 install?
It looks like Windows binary installations of Python don't come with the freeze tool. And there's apparently a good reason for this. According to the freeze README in the source tree:
Under Windows 95 or NT, you must use the -p option and point it to the top of the Python source tree.
If you read the whole section, it comes down to this: On Windows, freeze only works if you've built Python from source, and have the resulting tree sitting around to be used for freezing. So, there's no good reason to give you freeze in binary installations.
Meanwhile, I probably should have asked this in the first place, but… are you sure you want freeze in the first place?
The freeze utility is very out of date (you might have guessed that from the README talking about requiring VC++ 5.0, Windows 95 or NT 4.0, etc.). It also never worked that well on Windows (as you can tell from the documentation describing it as a utility "… to compile executables for Unix systems"). And there's just a lot of things it can't handle, or handles badly. At this point it should probably be considered more as example code than as a useful tool.
There are a number of third-party alternatives out there: cx_freeze, py2exe, PyInstaller, etc. If you search PyPI for "freeze" (and other terms that seem reasonable), you will find a bunch of these alternatives. If your goal is to create a standalone executable out of your Python script (which, btw, freeze can never do on Windows anyway), experiment with a few of these and pick the one you like best.
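For example, with PyInstaller the whole process can be as short as the following (yourscript.py is a placeholder; options vary by version):
pyinstaller --onefile yourscript.py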
If your goal is something different, the right tool will be different—you might be better off using venv or just zipping up a user site-packages directory or creating a local PyPI server.
In the comments, you said:
What I was actually looking for is a tool to convert Python code to C code. Apparently, that's impossible.
It's not impossible, it's just not what freeze (or its successors/competitors) does. Cython compiles almost a strict superset of Python to C code, although it's C code that uses Python runtime objects (except where you explicitly statically declare variables and functions with C types). If C++ is an acceptable alternative to C, Shed Skin compiles a restricted subset of Python 2.6 (using native C++ objects, and using type inference so you don't have to statically declare your types).
The question is why you want to compile Python code to C.
If you're looking to optimize some slow code, Cython is great at speeding up small pieces of bottleneck code. It takes a bit of effort (deciding what to move to Cython, what static type declarations to put in, etc.), but the curve of payoff to effort is pretty solid. Shed Skin takes a lot less effort—if it works, it just speeds up everything, automatically—but it also means you can't write a lot of idiomatic Python code in the first place. But really, before looking at either, you should consider PyPy, a complete implementation of Python 2.7.3 (and hopefully 3.3 soon) in a JIT-compiling interpreter, that often offers similar speedups, with pretty much no tradeoffs at all. Or, alternatively, you may just need to rewrite slow code to take advantage of already-optimized libraries (numpy instead of mapping over lists, itertools instead of explicit loops, lxml instead of html.parse, …).
If you're looking to write Python code that can interact directly with C code, without all the headaches of ctypes (or manually building Python bindings), Cython scores again. Cython code can effectively natively call both Python code and C code, and the compiler makes it all work like magic.
If you're looking to get C code that you can read, maintain, and improve on… there, you're out of luck. And this one may actually be impossible. Idiomatic Python code is just so different from idiomatic C code that it's hard to imagine how you could translate one into the other.
If you're wondering what the underlying problem is:
As far as I can tell, freeze makes a lot of assumptions about how things are laid out. It should be enough to have any Python installation that can build C extension modules and embedding apps, but it's not, because freeze goes under the covers and expects that building to work in specific ways. A standard binary installation on almost every *nix platform ends up looking like what freeze expects,* but a standard binary installation on Windows looks completely different.
It's not impossible to hack things up using Windows symlinks (at least if you have Vista or later and a drive with a modern version of NTFS) to get everything organized the way freeze expects (I found a blog where someone did that with 2.7.1…), but really, I don't think it's worth trying. It will be a lot of work (especially if you're just learning this stuff), and there's no guarantee you won't immediately run into another problem.
* This isn't actually true. On a Mac, both Apple's pre-installed Python and the binary installers at python.org actually give you the files organized as a Mac framework—but they provide a bunch of symlinks that simulate the traditional layout, which is good enough. On most linux distros, and many other platforms, the binary python package doesn't include any of the development files at all—but once you install an add-on binary package named something like python-devel, then you've got the right layout. Anyway, none of this matters to you, because if you wanted to learn about dpkg dependencies or framework builds you wouldn't be using Windows, right?

Why are there no Makefiles for automation in Python projects?

As a long time Python programmer, I wonder, if a central aspect of Python culture eluded me a long time: What do we do instead of Makefiles?
Most Ruby projects I've seen (not just Rails) use Rake; shortly after Node.js became popular, there was Cake. In many other (compiled and non-compiled) languages there are classic Makefiles.
But in Python, no one seems to need such infrastructure. I randomly picked Python projects on GitHub, and they had no automation besides the installation provided by setup.py.
What's the reason behind this?
Is there nothing to automate? Do most programmers prefer to run style checks, tests, etc. manually?
Some examples:
dependencies sets up a virtualenv and installs the dependencies
check calls the pep8 and pylint commandlinetools.
the test task depends on dependencies enables the virtualenv, starts selenium-server for the integration tests, and calls nosetest
the coffeescript task compiles all coffeescripts to minified javascript
the runserver task depends on dependencies and coffeescript
the deploy task depends on check and test and deploys the project.
the docs task calls sphinx with the appropriate arguments
Some of them are just one or two-liners, but IMHO, they add up. Due to the Makefile, I don't have to remember them.
To clarify: I'm not looking for a Python equivalent of Rake. I'm happy with Paver. I'm looking for the reasons.
Actually, automation is useful to Python developers too!
Invoke is probably the closest tool to what you have in mind, for automation of common repetitive Python tasks: https://github.com/pyinvoke/invoke
With invoke, you can create a tasks.py like this one (borrowed from the invoke docs)
from invoke import run, task

@task
def clean(docs=False, bytecode=False, extra=''):
    patterns = ['build']
    if docs:
        patterns.append('docs/_build')
    if bytecode:
        patterns.append('**/*.pyc')
    if extra:
        patterns.append(extra)
    for pattern in patterns:
        run("rm -rf %s" % pattern)

@task
def build(docs=False):
    run("python setup.py build")
    if docs:
        run("sphinx-build docs docs/_build")
You can then run the tasks at the command line, for example:
$ invoke clean
$ invoke build --docs
Another option is to simply use a Makefile. For example, a Python project's Makefile could look like this:
docs:
	$(MAKE) -C docs clean
	$(MAKE) -C docs html
	open docs/_build/html/index.html

release: clean
	python setup.py sdist upload

sdist: clean
	python setup.py sdist
	ls -l dist
Setuptools can automate a lot of things, and for things that aren't built-in, it's easily extensible.
To run unittests, you can use the setup.py test command after having added a test_suite argument to the setup() call. (documentation)
Dependencies (even if not available on PyPI) can be handled by adding a install_requires/extras_require/dependency_links argument to the setup() call. (documentation)
To create a .deb package, you can use the stdeb module.
For everything else, you can add custom setup.py commands.
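Putting the first two points together, a minimal setup.py might look like this (a sketch with placeholder names, not a complete example):
from setuptools import setup, find_packages

setup(
    name="myproject",
    version="0.1",
    packages=find_packages(),
    test_suite="tests",                  # enables `python setup.py test`
    install_requires=["requests>=2.0"],  # installed automatically as dependencies
    extras_require={"docs": ["sphinx"]}, # optional extras, e.g. pip install myproject[docs]
)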
But I agree with S.Lott: most of the tasks you'd wish to automate (except dependency handling maybe, which is the only one I find really useful) are tasks you don't run every day, so there wouldn't be any real productivity improvement from automating them.
There are a number of options for automation in Python. I don't think there is a culture against automation; there is just not one dominant way of doing it. The common denominator is distutils.
The one closest to your description is buildout. This is mostly used in the Zope/Plone world.
I myself use a combination of the following: Distribute, pip, and Fabric. I mostly develop using Django, which has manage.py for automation commands.
Packaging is also being actively worked on in Python 3.3.
Any decent test tool has a way of running the entire suite in a single command, and nothing is stopping you from using rake, make, or anything else, really.
There is little reason to invent a new way of doing things when existing methods work perfectly well - why re-invent something just because YOU didn't invent it? (NIH).
The make utility is an optimization tool which reduces the time spent building a software image. The reduction in time is obtained when all of the intermediate materials from a previous build are still available, and only a small change has been made to the inputs (such as source code). In this situation, make is able to perform an "incremental build": rebuild only a subset of the intermediate pieces that are impacted by the change to the inputs.
When a complete build takes place, all that make effectively does is to execute a set of scripting steps. These same steps could just be deposited into a flat script. The -n option of make will in fact print these steps, which makes this possible.
A Makefile isn't "automation"; it's "automation with a view toward optimized incremental rebuilds." Anything scripted with any scripting tool is automation.
So, why would Python projects eschew tools like make? Probably because Python projects don't struggle with long build times that they are eager to optimize. And, also, the compilation of a .py to a .pyc file does not have the same web of dependencies as a .c to a .o.
A C source file can #include hundreds of dependent files; a one-character change in any one of these files can mean that the source file must be recompiled. A properly written Makefile will detect when that is or is not the case.
A big C or C++ project without an incremental build system would mean that a developer has to wait hours for an executable image to pop out for testing. Fast, incremental builds are essential.
In the case of Python, probably all you have to worry about is when a .py file is newer than its corresponding .pyc, which can be handled by simple scripting: loop over all the files, and recompile anything newer than its byte code. Moreover, compilation is optional in the first place!
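For illustration, that "simple scripting" might look like this (my own sketch, not from the original answer; "src" is a placeholder directory):
import os
import py_compile
from importlib.util import cache_from_source

for root, _dirs, files in os.walk("src"):
    for name in files:
        if name.endswith(".py"):
            source = os.path.join(root, name)
            cached = cache_from_source(source)  # matching __pycache__/*.pyc path
            if not os.path.exists(cached) or os.path.getmtime(cached) < os.path.getmtime(source):
                py_compile.compile(source)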
So the reason Python projects tend not to use make is that their need to perform incremental rebuild optimization is low, and they use other tools for automation; tools that are more familiar to Python programmers, like Python itself.
The original PEP where this was raised can be found here. Distutils has become the standard method for distributing and installing Python modules.
Why? It just happens that python is a wonderful language to perform the installation of Python modules with.
Here are a few examples of Makefile usage with Python:
https://blog.horejsek.com/makefile-with-python/
https://krzysztofzuraw.com/blog/2016/makefiles-in-python-projects.html
I think most people simply aren't aware of the "Makefile for Python" approach. It could be useful, but its "sexiness ratio" is too small for it to spread rapidly (just my personal point of view).
Is there nothing to automate?
Not really. All but two of the examples are one-line commands.
tl;dr Very little of this is really interesting or complex. Very little of this seems to benefit from "automation".
Due to documentation, I don't have to remember the commands to do this.
Do most programmers prefer to run stylechecks, tests, etc. manually?
Yes.
generating documentation:
the docs task calls sphinx with the appropriate arguments
It's one line of code. Automation doesn't help much.
sphinx-build -b html source build/html. That's a script. Written in Python.
We do this rarely. A few times a week. After "significant" changes.
running stylechecks (Pylint, Pyflakes and the pep8-cmdtool).
check calls the pep8 and pylint commandlinetools
We don't do this. We use unit testing instead of pylint.
You could automate that three-step process.
But I can see how SCons or make might help someone here.
tests
There might be space for "automation" here. It's two lines: the non-Django unit tests (python test/main.py) and the Django tests (manage.py test). Automation could be applied to run both lines.
We do this dozens of times each day. We never knew we needed "automation".
dependencies sets up a virtualenv and installs the dependencies
Done so rarely that a simple list of steps is all that we've ever needed. We track our dependencies very, very carefully, so there are never any surprises.
We don't do this.
the test task depends on dependencies enables the virtualenv, starts selenium-server for the integration tests, and calls nosetest
The start server & run nosetest as a two-step "automation" makes some sense. It saves you from entering the two shell commands to run both steps.
the coffeescript task compiles all coffeescripts to minified javascript
This is something that's very rare for us. I suppose it's a good example of something to be automated. Automating the one-line script could be helpful.
I can see how SCons or make might help someone here.
the runserver task depends on dependencies and coffeescript
Except. The dependencies change so rarely that this seems like overkill. I suppose it can be a good idea if you're not tracking dependencies well in the first place.
the deploy task depends on check and test and deploys the project.
It's an svn co and python setup.py install on the server, followed by a bunch of customer-specific copies from the subversion area to the customer /www area. That's a script. Written in Python.
It's not a general make or SCons kind of thing. It has only one actor (a sysadmin) and one use case. We wouldn't ever mingle deployment with other development, QA or test tasks.

Disassembling with python - no easy solution?

I'm trying to create a python script that will disassemble a binary (a Windows exe to be precise) and analyze its code.
I need the ability to take a certain buffer, and extract some sort of struct containing information about the instructions in it.
I've worked with libdisasm in C before, and I found its interface quite intuitive and comfortable.
The problem is, its Python interface is available only through SWIG, and I can't get it to compile properly under Windows.
On the availability front, diStorm provides a nice out-of-the-box interface, but it provides only the mnemonic of each instruction, not a binary struct with enumerations defining the instruction type and so on.
This is quite inconvenient for my purpose and would require a lot of what I see as wasted time wrapping the interface to make it fit my needs.
I've also looked at BeaEngine, which does in fact provide the output I need, a struct with binary info concerning each instruction, but its interface is really odd and counter-intuitive, and it crashes pretty much instantly when provided with wrong arguments.
The CTypes sort of ultimate-death-to-your-python crashes.
So, I'd be happy to hear about other solutions, which are a little less time consuming than messing around with djgcc or mingw to make SWIGed libdisasm, or writing an OOP wrapper for diStorm.
If anyone has some guidance as to how to compile SWIGed libdisasm, or better yet, a compiled binary (pyd or dll+py), I'd love to hear/have it. :)
Thanks ahead.
Well, after much meddling around, I managed to compile SWIGed libdisasm!
Unfortunately, it seems to crash python on incorrect (and sometimes correct) usage.
How I did it:
I compiled libdisasm.lib using Visual Studio 6; the only things you need for this are the source code from whichever libdisasm release you use, plus stdint.h and inttypes.h (the Visual C++-compatible versions, google them).
I SWIGed the given libdisasm_oop.i file with the following command line
swig -python -shadow -o x86disasm_wrap.c -outdir . libdisasm_oop.i
Used Cygwin to run ./configure in the libdisasm root dir. The only real thing you get from this is config.h
I then created a new DLL project, added x86disasm_wrap.c to it, added the c:\PythonXX\libs and c:\PythonXX\Include folders to the corresponding variables, and set the Release configuration (important: either do this or #undef _DEBUG before including Python.h).
Also, there is a chance you'll need to fix the path to config.h.
Compiled the DLL project, and named the output _x86disasm.dll.
Place that in the same folder as the SWIG generated x86disasm.py and you're done.
Any suggestions for other, less crashy disasm libs for python?
You might try using ctypes to interface directly with libdisasm instead of going through a SWIG layer. It may take more development time, but AFAIK you should be able to access the underlying functionality using ctypes.
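A bare-bones ctypes sketch might look like the following; note that the library name and the function used here are hypothetical placeholders rather than libdisasm's real API, so check the headers for the actual names and signatures:
import ctypes

lib = ctypes.CDLL("./libdisasm.dll")  # placeholder library name

# declare argument/return types before calling -- disasm_buffer is a made-up name
lib.disasm_buffer.argtypes = [ctypes.c_char_p, ctypes.c_size_t]
lib.disasm_buffer.restype = ctypes.c_int

code = b"\x90\x90\xc3"  # nop; nop; ret
count = lib.disasm_buffer(code, len(code))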
I recommend you look at Pym's disassembly library which is also the backend for Pym's online disassembler.
You can use the distorm library: https://code.google.com/p/distorm/
Here's another build: http://breakingcode.wordpress.com/2009/08/31/using-distorm-with-python-2-6-and-python-3-x-revisited/
There's also BeaEngine: http://www.beaengine.org/
Here's a Windows installer for BeaEngine: http://breakingcode.wordpress.com/2012/04/08/quickpost-installer-for-beaenginepython/
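If I remember the distorm3 Python bindings correctly, basic usage looks roughly like this (a sketch, not verified against the current release):
import distorm3

code = b"\x55\x8b\xec\x5d\xc3"  # push ebp; mov ebp, esp; pop ebp; ret
for offset, size, instruction, hexdump in distorm3.Decode(0x0, code, distorm3.Decode32Bits):
    print("%08x %-16s %s" % (offset, hexdump, instruction))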
