I would like to port a semi-HPC code, scriptable with Python, to the Xeon Phi to try out the performance increase; it cannot be run in offload mode (data transfers would be prohibitive), so the whole code must run on the co-processor.
Can someone knowledgeable confirm that it means I will have to "cross-compile" all the libraries (including Python) for the Xeon Phi arch, have those libs mounted over NFS on the Xeon Phi, and then execute it all there?
For cross-compilation: what is the target arch? Of course, for the numerics the Xeon Phi target is a must because of the extended intrinsics, but for e.g. Python, would the binaries and libs be binary-compatible with amd64? That would make things much easier, with essentially only some flags changing for the number-crunching parts.
UPDATE: For the record, we got very bad support from Intel on their forums; after realizing the poor technical state of the software stack (Yocto could not compile, and so on) and how little documentation there was, we abandoned this path. Goodbye, Xeon Phi.
Why not first port it away from Python (which is bytecode for a virtual machine -- a software emulation of a CPU -- that then has to be translated and executed on the actual hardware CPU)? Porting to C++ or similar, compiled for the target platform, produces machine code that runs natively on the target. That should improve execution speed, so you may not even need a Xeon Phi.
Since the Yocto Linux distribution can run on both machines, I'm assuming it would have no trouble compiling and running any language that ordinary developers with a Linux system would use. Am I right in making this assumption?
It says, on the Intel page, that compatible languages are:
C/C++, Python, Node.js, HTML5, JavaScript
Shouldn't these languages be compatible on a Linux system? Just install the compiler on Linux and you should be fine, no?
The only explanation that comes to mind is that these languages have libraries specifically written to interact with Arduino hardware.
If this is the case, which languages are strongest in terms of resources, libraries, compatibility, etc.?
Also, please, correct me if I said anything marginally wrong. Thanks for any help, hugely appreciated.
I believe you are referring to the documentation for the IoT Developer Kit. The IoT devkit is a solution comprising various hardware and software options for creating IoT projects with Intel's maker boards, such as the Intel Edison and Intel Galileo. It includes a set of I/O and sensor libraries, specifically libmraa and UPM, currently available for C/C++, Python and JavaScript.
Libmraa provides APIs to interface with the I/O on the board. With board detection done at runtime, you can create portable code that works across multiple platforms.
UPM is more of a high-level repository of sensor drivers that use mraa. You can find code samples for the various sensors currently supported, which helps speed up development.
Recently, Java was also added to the list of supported languages; you can find samples in the repository.
I am targeting an embedded platform with linux_rt, and would like to compile CPython. I am not asking whether Python is appropriate for realtime, or about its latency. I AM asking about compiling under platform constraints.
I would like an interpreter embedded in a C shared library, but will also accept an executable binary if need be.
Any C compiling I've done has been for mainstream OS deployment, and I usually just hit make install. I'm not afraid to get a little dirty, but I am afraid of long-term maintenance and repeatability.
To avoid as much memory overhead as possible, are there any compiler configurations that can be changed from the defaults? Can I easily strip out sections of the standard library I know will not be needed?
The target platform has a 600 MHz Celeron and 256 MB of RAM. The required firmware is built for a 2.6 kernel (might be 2.4). The default OS image uses BusyBox, and most standard system libraries are minimally available. The root filesystem is around 100 MB (flash), although I will have an external memory card mounted and can extend the root onto it.
Python should have 70% CPU and 128 MB of RAM at most times, although I can imagine sloppy execution of the interpreter at times, and on RT Linux that could start to add up. Just trying to take precautions before I dive in.
Looking for simple do's and don'ts. References to similar projects would be great, but I really want to stick with CPython where possible.
I do not have the target platform in the shop yet, so I cannot post any tests. Will have the unit in 2 weeks and will update this post, at that time, if needed.
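To make the question concrete, the kind of build configuration I have in mind looks something like this (the target triple, install prefix and pruned directories are guesses on my part, and older CPython versions may need extra patches to cross-compile at all):

```shell
# Sketch of a size-conscious cross-build of CPython -- adjust the triple,
# prefix and CPython version directory for your actual toolchain.
./configure \
    --host=i586-linux-gnu --build=x86_64-linux-gnu \
    --prefix=/opt/python-embedded \
    --enable-shared \
    CFLAGS="-Os" LDFLAGS="-s"
make && make install
# Afterwards, prune stdlib pieces known to be unneeded, e.g.:
rm -rf /opt/python-embedded/lib/python2.7/{test,idlelib,lib-tk,distutils}
```

`--enable-shared` gives the libpython shared object for embedding; `-Os` optimizes for size and `-s` strips symbols from the binaries.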
Make a VM with the target configuration to help you get started, using VirtualBox or QEMU. If you don't have a root FS, one place to start is Tiny Core, which is very small and configurable, but can also run on your laptop -- http://www.linuxjournal.com/article/11023
I am running a program that does simple data processing:
parses text
populates dictionaries
calculates some functions over the resulting data
The program only uses CPU, RAM, and HDD:
run from Windows command line
input/output to the local hard drive
nothing displayed on or printed to screen
no networking
The same program is run on:
desktop: Windows 7, i7-930 CPU overclocked @ 3.6 GHz (with matching memory speed), Intel X-25M SSD
laptop: Windows XP, Intel Core2 Duo T9300 @ 2.5 GHz, 7200 rpm HDD
The CPU frequency is 1.44 times higher, and the HDD has a 4-times-higher benchmark score (PassMark Disk Mark). I found the program runs only around 1.66 times faster on the desktop, so apparently the CPU is the bottleneck.
It seems there's only about a 15% benefit from the i7 core vs. the Core2 Duo architecture (most of the performance boost comes from the raw CPU frequency). Is there anything I can do in the code to increase the benefit of the newer architecture?
EDIT: forgot to mention that I use ActivePython 3.1.2 if that matters.
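For reference, the workload is essentially of this shape (a toy stand-in, not the actual program):

```python
from collections import defaultdict

def process(lines):
    """Parse text, populate a dictionary, then compute over the result."""
    counts = defaultdict(int)
    for line in lines:            # parse text
        for word in line.split():
            counts[word] += 1     # populate dictionary
    # calculate a simple function over the resulting data
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

freqs = process(["a b a", "b c"])
```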
The increasing performance of hardware in most cases automatically benefits user applications -- but not always. The much-maligned GIL means that you may not be able to take advantage of multiple cores with CPython unless you design your program to do so via the various multiprocessing modules / libraries.
SO discussion on the same : Does python support multiprocessor/multicore programming?
A related collation of solutions on python wiki: http://wiki.python.org/moin/ParallelProcessing
Split your processing into multiple threads. Your particular i7 should be able to support up to 8 threads in parallel.
Consider repeating the test on a regular HDD -- that SSD could well be responsible for a substantial part of the performance difference, depending on caches and the nature of the data.
I wrote a very short program that parses a "program" using Python and converts it to assembler, allowing me to compile my little programming language to an executable.
You can read my blog for more information here http://spiceycurry.blogspot.com/2010/05/simple-compilable-programming-language.html
My question is: where can I find more kernel commands so that I can further expand on the script in the above blog?
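For context, the heart of my translator is this kind of thing (a simplified toy version, not the actual code from the blog):

```python
def translate(source):
    """Translate a toy language (one command per line) into NASM-style
    32-bit Linux assembly; only an 'exit <code>' command is shown here."""
    asm = ["section .text", "global _start", "_start:"]
    for line in source.splitlines():
        op, arg = line.split()
        if op == "exit":
            asm += ["    mov eax, 1",           # sys_exit syscall number
                    "    mov ebx, %s" % arg,    # exit status
                    "    int 0x80"]             # invoke the kernel
    return "\n".join(asm)

print(translate("exit 7"))
```

The emitted text can then be assembled with `nasm -f elf` and linked with `ld` to get an executable.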
I'd rather recommend using LLVM:
It allows you not to bother with low-level details like register allocation (you provide only SSA form).
It does optimizations for you. It can be faster than a hand-written and well-optimized compiler, as the LLVM pipeline in GHC shows (at the beginning, before much optimization effort, it had equal or better performance than the mature native code generator).
It is cross-platform -- you don't tie yourself to, for example, x86.
I'm not quite sure what you mean by 'kernel' commands. If you mean opcodes:
There are the Intel manuals
There is a Wikipedia page listing all of the documented mnemonics (though not the opcodes, and not always a description)
There is the NASM manual
However, ARM or PowerPC have totally different opcodes.
If you mean the operating system's syscalls (system calls), then:
You can just use the C library. It exists on every operating system and is cross-platform.
You can use syscalls directly. However, they are harder to use, may be slower (libc may do additional buffering) and are not cross-platform (Linux syscalls on x86 -- may not be up to date).
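The buffering point is easy to see even from Python: `os.write` goes almost straight to the write(2) syscall on a raw file descriptor, while file objects buffer in user space (a small demo, not a benchmark):

```python
import os

# Unbuffered: each os.write() is one write(2) syscall on the raw fd.
r, w = os.pipe()
os.write(w, b"hi")      # lands in the pipe immediately
os.close(w)
data = os.read(r, 16)
os.close(r)

# Buffered: a file object accumulates small writes in a userspace buffer
# and issues fewer, larger write(2) calls on flush/close.
with open(os.devnull, "w") as f:
    for _ in range(1000):
        f.write("x")    # mostly a memcpy into the buffer, not a syscall
```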
Is Python generally slower on Windows vs. a *nix machine? Python seems to blaze on my Mac OS X machine, whereas it seems to run slower on my Windows Vista machine. The machines are similar in processing power, and the Vista machine has 1 GB more memory.
I particularly notice this in Mercurial but I figure this may simply be how Mercurial is packaged on windows.
I wanted to follow up on this, and I found something that I believe is 'my answer'. It appears that Windows (Vista, which is where I notice this) is not as fast at handling files. This was mentioned by tony-p-lee.
I found this comparison of Ubuntu vs Vista vs Win7. Their results are interesting and, as they say, you need to take them with a grain of salt. But I think the results lead me to the cause. Python, which I feel was indirectly tested, is about equivalent, if not a tad faster, on Windows. See the section "Richards benchmark".
Here was their graph for file transfers (source: tuxradar.com).
I think this specifically helps address the question, because Hg is really just a series of file reads, copies and general file handling. It's likely this is causing the delay.
http://www.tuxradar.com/content/benchmarked-ubuntu-vs-vista-vs-windows-7
No real numbers here, but it certainly feels like start-up time is slower on Windows platforms. I regularly switch between Ubuntu at home and Windows 7 at work, and it's an order of magnitude faster starting up on Ubuntu, despite my work machine being at least 4x the speed.
As for runtime performance, it feels about the same for "quiet" applications. Any GUI operations using Tk on Windows are definitely slower. Console applications on Windows are slower too, but this is most likely due to slow rendering in the Windows cmd window rather than Python itself running slowly.
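Start-up cost is easy to measure yourself; a rough stdlib-only sketch (the numbers will vary wildly by machine and antivirus setup):

```python
import subprocess
import sys
import time

start = time.time()
# Spawn a fresh interpreter that does nothing, to isolate startup cost
# from the cost of the program it would run.
subprocess.check_call([sys.executable, "-c", "pass"])
elapsed = time.time() - start
print("interpreter startup took %.3f s" % elapsed)
```

Running it a few times and taking the minimum filters out cold-cache outliers.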
Maybe Python depends more on opening a lot of files (importing different modules), and Windows doesn't handle file opens as efficiently as Linux. Or maybe Linux simply has more utilities that depend on Python, so Python scripts/modules are more likely to already be in the system cache.
I run Python locally on Windows XP and 7, as well as OS X on my MacBook. I've seen no noticeable performance differences in the command-line interpreter; wx widget apps run the same, and Django apps also perform virtually identically.
One thing I noticed at work was that the Kaspersky virus scanner tended to slow the Python interpreter WAY down. It would take 3-5 seconds for the Python prompt to appear and 7-10 seconds for Django's test server to fully load. Properly disabling its active scanning brought the start-up times back to 0 seconds.
With the os and network libraries, I can confirm slower performance on Windows, at least for versions <= 2.6.
I wrote a CLI podcast-fetcher script which ran great on Ubuntu, but then wouldn't download anything faster than about 80 kB/s (where ~1.6 MB/s is my usual max) on either XP or 7.
I could partially correct this by tweaking the buffer size for download streams, but there was definitely a major bottleneck on Windows, either over the network or IO, that simply wasn't a problem on Linux.
Based on this, it seems that system and OS-interfacing tasks are better optimized for *nixes than they are for Windows.
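For the record, the buffer-size tweak amounts to reading the stream in larger chunks. A generic sketch (in-memory streams stand in for the network stream here, and the 256 KB default is my arbitrary choice):

```python
import io

def copy_stream(src, dst, bufsize=256 * 1024):
    """Copy src to dst in bufsize-byte chunks; a larger bufsize means
    fewer read/write calls and less per-call overhead."""
    while True:
        chunk = src.read(bufsize)
        if not chunk:       # empty read means end of stream
            break
        dst.write(chunk)

# Demo with in-memory streams:
src = io.BytesIO(b"x" * 1000000)
dst = io.BytesIO()
copy_stream(src, dst)
```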
Interestingly, I ran a direct comparison of a popular Python app on a Windows 10 x64 machine (low-powered, admittedly) and an Ubuntu 14.04 VM running on the same machine.
I have not tested load speeds etc., but am just looking at processor usage between the two. To make the test fair, both were fresh installs; I duplicated a part of my media library and applied the same config in both scenarios. Each test was run independently.
On Windows, Python was using 20% of my processor power and triggered System Compressed Memory to run up to 40% (this is an old machine with 6 GB of RAM).
With the VM on Ubuntu (linked to my Windows file system), the processor usage is about 5%, with compressed memory down to about 20%.
This is a huge difference. My trigger for running this test was that the app using Python was running my CPU up to 100% and failing to operate. I have now been running it in the VM for two weeks, and my processor usage is down to 65-70% on average. So on both short- and long-term tests, and taking into account the overhead of running a VM and a second operating system, this Python app is significantly faster on Linux. I can also confirm that the Python app responds better, as does everything else on my machine.
Now this could be very application specific, but it is at minimum interesting.
The PC is an old AMD II X2 X265 processor, 6 GB of RAM, and an SSD (which Python ran from, although the VM used a regular 5200 rpm HD that also gets used for a ton of other stuff, including recording from 2 CCTV cameras).