Compiling Python extensions with different Visual Studio version - python

According to the Python documentation, when compiling a Python extension on Windows, "you should use the same version of VC++ that was used to build Python itself". The explanation usually given is that the mismatch in VC runtime version will cause problems. However, it is possible to compile extensions using newer Visual Studio versions that appear to work just fine.
What are the cases where the different runtimes would cause problems? The most information I've seen on this topic was this thread on the python-dev mailing list. Is there a (hopefully small) set of use cases that lead to problematic behavior, or is it just a matter of luck that I haven't run into any trouble yet?

That mailing list thread is the most comprehensive list that I've seen of cases where mismatched C runtimes cause problems. The general problem is that the runtimes share nothing with each other: each has its own separate state, and anything one runtime exposes externally can't be handed to the other runtime by your own code. The former means that each runtime has its own errno, and the latter means that you can't use FILE * objects opened with one runtime with the file I/O functions of the other.
Enumerating all the possible problems would mean enumerating the entire visible state (including indirectly visible state) of the runtimes, and then enumerating every value they can generate and receive that might be incompatible.
Somewhat offsetting this, though, is Microsoft's promise that object files (.OBJ) compiled with one version of the Microsoft C/C++ compiler should be compatible with subsequent versions of the compiler. This means, for example, that two different runtimes won't use a completely different set of values for errno (e.g. ENOENT is always 2), because those values will appear as constants in object files.
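To follow the documentation's advice you first need to know which compiler built your interpreter. A minimal sketch using the standard library's platform.python_compiler(); the helper name built_with_msvc is only illustrative, and the exact string format depends on the build:

```python
import platform


def built_with_msvc():
    """Return (compiler_string, is_msvc) for the running interpreter."""
    # On official Windows builds the string starts with "MSC v." followed by
    # the MSVC compiler version; on other platforms it names GCC, Clang, etc.
    compiler = platform.python_compiler()
    return compiler, compiler.startswith("MSC")


compiler, is_msvc = built_with_msvc()
print(compiler)
print("built with MSVC" if is_msvc else "not built with MSVC")
```

If the interpreter was built with MSVC, the version number in that string can be compared against the compiler you plan to use for the extension.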

Related

PyInstaller ImportError DLL not found when testing EXE on other computer

I built an EXE file from a Python script using PyInstaller, using
pyinstaller --onefile myscript.py
Packages I used:
pandas, numpy, imutils, opencv, logging, os, random, json, string, csv, datetime, uuid
The EXE runs fine on my PC. However, when I try it on another PC I get the error shown in this screenshot: https://www.screencast.com/t/msZrURL4v
Any idea what the problem is?
The error you post just says "I was looking for one specific DLL and did not find it".
Rather than installing other packages and extensions that might, or might not, be or somehow contain the right DLL, you now need to determine exactly what it is that isn't to be found.
I can suggest three complementary methods, none absolutely certain to pinpoint the exact problem (of course the voodoo method of "install some package at random and see whether it fixes it" might also work, and often does -- but that's magic, not computer science):
the quickest: check the pyimod03_importers.py file at line 714, see what it was doing when the exception was thrown. Due to Windows' library loading strategies, you might be handed a red herring, with a file reported not to be there when it actually is, because it relies on a second missing file whose name you won't be told.
the easiest: use a tool like Dependency Walker (DEPENDS.EXE) to inspect the OMR.EXE file. This is almost guaranteed not to work in this case, because the needed imports might be specified in Python format, not in any form that DEPENDS.EXE will recognize.
the most comprehensive, but least easy: use a tool like SysInternals' PROCMON, set up the filters to exclude the background noise of Windows' idle state - there will be an awful lot of that - and then fake running OMR.EXE; exclude the additional noise generated by that. You'll need around forty filters. Finally, run OMR.EXE. Near the end, you will see a series of attempts to load SOMETHING.DLL, all failed; the first is where the DLL is supposed to be (according to either Python or OMR), the others are all suitable alternatives.
Then:
if the DLL is one of yours, find out how to pack them with the EXE bundle.
if it is not, you need to reliably assess where it can be found.
It might well be that the suggestion you were given - install MSVC redistributable that-version-or-other - was absolutely correct. Libraries with names like MSVCnn... belong to that package. MSO... files usually belong to Microsoft Office redistributables. MSJET... files are found in several Microsoft packages, for example the .NET redistributable.
otherwise, Google and possibly MSDN Search Engine are your friends.
From past experience, I suggest setting up a virtual machine for testing, then seeing what packages are needed. This is because the first DLL crash will hide any subsequent ones, and you might need to repeat the above steps several times. The fact that the first library you need is supplied by the NETFX64 package and the second by the Microsoft Office runtime might be true, but when you find out that the second library is needed, you might also find out that the MSO runtime would have supplied the first also; so at that point, and not before, you discover that the NETFX64 package wasn't really needed, and can simplify your installation requirements to the MSO runtime alone.
Boiling down the requirements to a short list might be a lengthy task and you will want to restart the machine from scratch more than once. With a VM, that is easy to do.
(I've kept referring to the MSO runtime because I figure that your program will process a checkbox answer form, and will likely need, or believe it needs, some scanner recognition features, which the MSO runtime supplies. If that is so, they'll probably come last).
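Once you have a shortlist of suspect DLLs, a quick scripted check on the target machine can tell you which of them the system can locate at all. This is only a rough sketch: it relies on ctypes.util.find_library, which on Windows searches the directories on PATH, so it does not reproduce the full Windows loader search order and is no substitute for PROCMON. The library names below are hypothetical placeholders, not the ones from the screenshot.

```python
import ctypes.util


def find_missing_libraries(names):
    """Return the names that find_library() cannot locate on this machine."""
    return [name for name in names if ctypes.util.find_library(name) is None]


# Hypothetical candidates; replace them with the DLL names you suspect.
candidates = ["msvcp140", "vcruntime140"]
print(find_missing_libraries(candidates))
```

Running the same script on the PC where the EXE works and on the one where it fails gives you a first diff to investigate.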

How to change the stack size of subprocess in Python

Here's my somewhat complicated setup that began to cause a stack overflow exception a couple of days ago:
On my windows-based continuous integration platform I have got a Jenkins job that starts a Python script.
This Python script runs a cmake command, an msbuild call and then executes the newly compiled gtest-based test framework.
The msbuild produces a dll and the gtest executable. The executable itself then loads the dll in order to test it.
A couple of days ago I made some changes in the source code of the dll that alter the memory footprint of some of my structures (basically just array lengths). It's plain C code. Now some of the tests exit with a stack-overflow exception.
I admit I'm putting some data structures on the stack that don't necessarily have to be there but it's the best I've got for information hiding in C (better than using static global variables).
if(myCondition)
{
int hugeBuffer[20000];
...
}
Apart from that there is no recursion or anything fancy going on that could be a legit source of trouble. Large chunks of data are always passed by reference (pointer).
Anyway, the stack overflow exception doesn't occur on my local machine running the gtest executable directly from Visual Studio unless I significantly reduce the reserved stack memory in the linker settings.
Then in debug mode I clearly run into a point where the stack just overflows at the beginning of a function.
Unfortunately I couldn't find any way of debugging how full the stack is. In VS I've only got the call stack window which doesn't show the current "fill level" of the stack.
So although you guys might kill me for this, I'm guessing I really just don't have enough stack memory available when running the Jenkins job.
So I'm wondering what step actually defines the amount of stack memory available for my DLL code. It's clearly less than the 10MB I have in Visual Studio on my local machine.
In the msbuild step there is no STACK parameter used for the linker so I'm guessing the exe header should contain the same value as in Visual Studio (10MB?).
The Python script runs a subprocess.call which could ignore the value set by the linker and overwrite it. I could neither find any information on that nor on how to change the stack memory allocated. I don't even know whether it spawns a thread or a process which may also affect the stack size.
The DLL loading mechanism in windows is also somewhat mysterious to me but I'm guessing the dll uses the same stack as the executable using it. I'm using the LoadLibrary() macro from WinBase.h.
By sheer luck I found out that although the same CMakeLists.txt is used for creating the projects (locally and on Jenkins), the resulting generated projects differ.
My local projects (and the Visual Studio solution I manually created through the GUI on the server for error finding) had 10MB of stack reserved, whereas the Python-based call to CMake was only using the default value of 1MB, which is not enough for the project.
This strange behavior of CMake can be compensated for by adding this line to CMakeLists.txt:
# Set linker to use 10MB of stack memory for all gtest executables, the default of 1MB is not enough!
SET( CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} /STACK:\"10000000\"")
sorry to bother you :)
I'm glad it's over
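For anyone hitting the same thing: subprocess.call starts a new process, and on Windows the main thread's stack reserve is taken from the executable's PE header, so the quickest check is to read that value from the built .exe on both machines. A minimal sketch, assuming the standard PE layout (e_lfanew at offset 0x3C, SizeOfStackReserve at offset 72 of the optional header); the function names are my own:

```python
import struct


def read_stack_reserve(image):
    """Return SizeOfStackReserve (in bytes) from the raw bytes of a PE file."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ signature")
    # e_lfanew at offset 0x3C holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # The optional header follows the 4-byte signature and 20-byte COFF header.
    opt = pe_offset + 4 + 20
    (magic,) = struct.unpack_from("<H", image, opt)
    if magic == 0x10B:    # PE32: the field is 4 bytes wide
        (reserve,) = struct.unpack_from("<I", image, opt + 72)
    elif magic == 0x20B:  # PE32+: the field is 8 bytes wide
        (reserve,) = struct.unpack_from("<Q", image, opt + 72)
    else:
        raise ValueError("unknown optional header magic: %#x" % magic)
    return reserve


def stack_reserve_of_file(path):
    """Read a PE file from disk and return its stack reserve size in bytes."""
    with open(path, "rb") as f:
        return read_stack_reserve(f.read())
```

Calling stack_reserve_of_file on the gtest executable from the Jenkins workspace and on the local one would have shown the 1MB versus 10MB difference directly.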

When using the Python Interpreter, is the compiler used at all?

In Google's Python Class it reads
Python is a dynamic, interpreted (bytecode-compiled) language
I know what an interpreter is and know what bytecode is but the two together seem not to fit. After doing some reading it became a bit clearer that basically Python source code is automatically compiled before it is interpreted; but some new questions emerged.
When using the Python Interpreter does no compilation happen? If it does, when? For example if you're just typing code at the command line and it gets run each time you hit enter, when does a compiler have the opportunity to do its work?
Also, in the linked-to question above, @delnan gives a pretty broad definition of a compiler
A compiler is, more generally, a program that converts a program in
one programming language into a program in another programming
language...JIT compilers compile to native machine code at runtime
I guess my question is: what's the difference between an interpreter and automatic compiler? To refine the question a bit, if Python is compiled, why not compile all the way to machine code (or assembly, since I know writing compilers that can produce pure machine code is difficult)?
Perhaps it is best to forget semantics and just try to learn what CPython is actually doing. When you invoke the CPython binary, it does a number of things. Generally speaking, you can expect it to translate the code you've written into a sequence of bytecode instructions. This is the "compiling" stage that people will sometimes reference for Python code. These instructions are a more compact and efficient way to tell the interpreter what to do than your hand-written code. Frequently, Python will cache this bytecode for reuse in .pyc files (only regenerating it if the associated .py file is newer). You can think of Python bytecode as the set of instructions that the Python virtual machine can run -- in a lot of ways, it's not really all that different from what you get for Java. When people speak of compiled languages (e.g. C), the compiler's job is to translate your code into a set of instructions that will run directly on your computer's hardware. Implementations like CPython and Java have an extra level of indirection (the virtual machine). The virtual machine runs directly on the computer's hardware and is responsible for interpreting the bytecode.
Compared to standard "compiled" languages (e.g. C, Fortran), this stage is really light-weight -- and Python doesn't do a lot of the checking that "traditional" compilers will do (e.g. typechecking). It pretty much only checks the syntax and does a few very simple optimizations using the peephole optimizer.
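You can watch this compile step yourself with the built-in compile() function and the dis module. A small sketch; the source string and the "<demo>" file name are arbitrary, and the disassembly listing differs between CPython versions, so none is shown here. At the interactive prompt the same thing happens for each complete statement you enter: it is compiled to bytecode first and then executed.

```python
import dis

source = "x = 1 + 2"

# compile() performs the source-to-bytecode step described above;
# "exec" mode compiles a sequence of statements, as for a module.
code = compile(source, "<demo>", "exec")
print(type(code).__name__)  # code

# Show the bytecode instructions held by the code object.
dis.dis(code)

# The virtual machine then executes the already-compiled code object.
namespace = {}
exec(code, namespace)
print(namespace["x"])  # 3
```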

Python portability issues

Basically, I am a Java programmer who wants to learn the Python language. I want to understand why some Python libraries are distributed in a non-portable manner.
Let me explain my thoughts. If someone creates a regular library using Java he prepares 1 (one) JAR file which can be used on different platforms:
my-great-lib-1.2.4.jar
I can use this lib (the same file) on any version of Windows or Linux.
In contrast to Java, python libraries may look like this:
bsdiff4-1.1.4.win-amd64-py2.5.exe
bsdiff4-1.1.4.win-amd64-py2.6.exe
bsdiff4-1.1.4.win-amd64-py2.7.exe
bsdiff4-1.1.4.win-amd64-py3.2.exe
bsdiff4-1.1.4.win-amd64-py3.3.exe
bsdiff4-1.1.4.win32-py2.5.exe
bsdiff4-1.1.4.win32-py2.6.exe
bsdiff4-1.1.4.win32-py2.7.exe
bsdiff4-1.1.4.win32-py3.2.exe
bsdiff4-1.1.4.win32-py3.3.exe
See full list on page.
It looks very strange to me. Even 32-bit and 64-bit platforms require different installers. Installers! Why do I need an installer in order to use one library? Moreover, the installers listed are only for Windows. Each of them is bound to a particular Python version. Where is the portability?
Could anyone explain the necessity of the 10 different files above?
In general, Python libraries are portable across platforms. Problems appear between different major Python versions (3 introduced some big changes from 2, but 2.7 is backwards compatible with 2.6) or when you use C code to optimize CPU-intensive code. On Linux, compiling it yourself is not a problem: when you call pip install package, it will do it for you. The problem is on Windows, where it is much more difficult to compile a C program, especially because not everybody has a compiler. So, on Windows, for packages that need something in C, you usually get an installer.
Also, installers are used because they set up everything nicely, look in the registry for the appropriate place to put everything, offer a standard way to uninstall them (the ones from Christoph Gohlke's site can be removed using Add/Remove programs in Control Panel) and because that's the standard on Windows: most programs on Windows are installed via an exe, because it doesn't have a standard and widespread package manager.
All these libraries are then portable: you can use them from any platform, but installing them is what differs.
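The two things that vary across those installer names, the platform/bitness and the Python version, can both be read from the running interpreter. A rough sketch; build_tag is a made-up helper that only imitates the naming pattern of the files listed in the question and is not how any packaging tool actually computes its tags:

```python
import struct
import sys
import sysconfig


def build_tag():
    """Combine platform and Python version, e.g. 'win-amd64-py2.7'."""
    platform_name = sysconfig.get_platform()       # e.g. 'win-amd64', 'win32'
    py_version = "py%d.%d" % sys.version_info[:2]  # e.g. 'py2.7'
    return "%s-%s" % (platform_name, py_version)


# Pointer size tells you whether this interpreter is a 32-bit or 64-bit build.
pointer_bits = struct.calcsize("P") * 8

print(build_tag())
print("%d-bit interpreter" % pointer_bits)
```

A compiled C extension has to match both values, which is why one pure-Python source tree can turn into ten Windows installers.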
There are many complications. In Java, where your code is compiled to byte-code that is then interpreted by the JVM, the underlying computer architecture does not play much of a role as long as your code is interpreted correctly by the JVM. In fact, that is one of the primary reasons Java got so popular: your code only has to worry about being handled correctly by the JVM.
However, in Python the situation is different. I will try to summarize some of the reasons that I think are important in the following lines:
The language itself is evolving (although it has been around for a long time, if you think about it!) and changes are happening inside the language. New features are added and sometimes even some remodeling of the language is done (Python 2.x to Python 3.x).
Python relies heavily on its C extensions, and so do the applications written in Python. If you write a Python program and have some CPU-intensive code, you can choose to write it in C. This also adds the necessity of creating a number of libraries for the various distributions.
For one, Python versions jump around. In Python 3, some builtins were renamed. For example:
raw_input()
changed to:
input()
Also, a lot of the standard library has changed, even in the alpha of 3.4. As for the 32/64 bit question, I cannot fully answer. I know that certain platforms have trouble when trying to run a mix of 32-bit and 64-bit code, and that may be the point there.

Compiling Python to native code? [duplicate]

Possible Duplicate:
Is it feasible to compile Python to machine code?
Is it possible to compile Python code (plus its dependencies, plus the interpreter library) into a single, native Windows executable (with nothing else bundled along with it) from a Python file? (Kind of like how the GNU compiler for Java compiles Java into a native (humongous) executable, which contains everything in true machine code.)
If so, how would I go about doing this?
(Specifically, py2exe does not do what I want -- it includes the libraries inside a separate ZIP file, and it includes the interpreter as a separate DLL.)
Note 1:
To emphasize, I'm not asking for a "self-extracting archive", an "executable packer", or some other way of 'cheating' by bundling the files inside an exe -- I'm looking for something that genuinely converts Python into a native executable, like what GCJ does for Java.
Note 2:
Only if the above isn't possible:
Is it possible to at least generate a single executable from Python code, containing the interpreter bundled along with all the library dependencies, such that the resulting executable does not need to self-extract onto the target disk before running?
In this scenario, the 'compilation' requirement is relaxed: it doesn't matter if the code is actually compiled into machine code (it could simply be embedded as a text resource into the target executable), but the result must nevertheless be a single exe file [and nothing else] that can run standalone, specifically without needing to unpack/install anything onto the target disk before running.
Shed Skin can compile Python to C++, but only a restricted subset of it. Some aspects of Python are very difficult to compile to native code.
The short answer is no, and that is going to go for almost any language: any program you write is going to depend on some external libraries even if just the Windows system DLLs.
If you wrote a C program and compiled it with Microsoft's compiler you would still need the C runtime libraries to be installed. Chances are they already will be on most systems but it isn't guaranteed. Likewise even if you managed to compile a C Python interpreter statically linked to its libraries you still have to get the C runtime from somewhere.
What I suspect you are really asking is whether you can compile to a single .exe that depends only on libraries which you have a reasonable expectation of already being installed. So it all depends on what you are willing to consider part of the base system. Can you assume .NET Framework 4 or Silverlight are installed? If so, you might want to look at IronPython.
Likewise, PyPy can be built with either the Visual Studio toolchain or MinGW, but I'm pretty sure in both cases you'll still need some external libraries at runtime.
