Does Intel vs. AMD matter for running Python?

I do a lot of coding in Python (Anaconda install, v3.6). I don't compile anything; I just run machine learning models (mainly scikit-learn and TensorFlow). Are there any issues with running these on a workstation with an AMD chipset? I've only used Intel before and want to make sure I don't buy the wrong thing.
If it matters, it is the AMD Ryzen 7 1700 processor.

Are you asking about compatibility or performance?
Both AMD and Intel market CPU products compatible with the x86(_64) architecture, and these are functionally compatible with all software written for it. That is, they will run it with high probability (there may always be issues when changing hardware, even while staying with the same vendor, as there are too many variables to account for).
Both Intel and AMD offer a huge number of products with widely varying levels of marketed performance. The performance of any application is determined not only by the chosen CPU vendor, but by a huge number of other factors, such as the amount and speed of memory, the disk, and not least the architecture of the application itself. In the end, only real-world measurements decide, but some estimates can be made by looking at relevant benchmarks and understanding the underlying principles of computer performance.
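One practical note for NumPy/scikit-learn workloads: the linear-algebra backend your NumPy build is linked against (MKL, OpenBLAS, etc.) often matters as much as the CPU brand, so it is worth checking what your install actually uses and timing a representative operation on any machine you are considering. A rough sketch (the output of np.show_config() varies between NumPy versions, and the timing is only indicative, not a proper benchmark):

    import time
    import numpy as np

    # Show which BLAS/LAPACK backend this NumPy build is linked against
    # (MKL, OpenBLAS, ...); the output format differs between NumPy versions.
    np.show_config()

    # Crude matrix-multiplication timing; run the same script on the machines
    # you are comparing.
    n = 2000
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)

    start = time.perf_counter()
    a @ b
    elapsed = time.perf_counter() - start
    print(f"{n}x{n} matmul: {elapsed:.2f} s (~{2 * n**3 / elapsed / 1e9:.1f} GFLOP/s)")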

Related

Automatic GPU offloading in python

I have written a piece of scientific code in python, mainly using the numpy library (especially Fast Fourier Transforms), and a bit of Cython. Nothing in CUDA or anything GPU related that I am aware of. There is no graphic interface, everything runs in the terminal (I'm using WSL2 on Windows). The whole code is mostly about number crunching, nothing fancy at all.
When I run my program, I see that CPU usage is ~ 100% (to be expected of course), but GPU usage also rises, to around 5%.
Is it possible that part of the work gets offloaded automatically to the GPU? How else can I explain this small but consistent increase in GPU usage?
Thanks for the help
No, there is no automatic offloading in NumPy, at least not with the standard NumPy implementation. Note that some specific FFT libraries can use the GPU, but standard NumPy uses its own FFT implementation, PocketFFT (based on FFTPACK), which does not use the GPU. Cython does not perform any automatic, implicit GPU offloading either. The code needs to do that explicitly/manually.
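For contrast, explicit offloading would look roughly like the sketch below. It assumes the optional CuPy package and a CUDA-capable GPU, neither of which the question's plain NumPy/Cython code uses, and for small arrays the transfer cost usually outweighs any gain:

    import numpy as np
    import cupy as cp  # optional third-party package, needs a CUDA-capable GPU

    x = np.random.rand(1_000_000)

    # Explicit offload: nothing like this happens automatically in NumPy/Cython.
    x_gpu = cp.asarray(x)        # copy host -> device (this transfer costs time)
    y_gpu = cp.fft.fft(x_gpu)    # FFT computed on the GPU
    y = cp.asnumpy(y_gpu)        # copy result device -> host

    # Same computation on the CPU with plain NumPy (PocketFFT)
    y_cpu = np.fft.fft(x)
    print(np.allclose(y, y_cpu))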
No GPU offloading is performed automatically because GPUs are not faster than CPUs for all tasks, and offloading data to the GPU is expensive, especially with small arrays (due to the relatively high latency of the PCI bus and of kernel calls in that case). Moreover, this is hard to do efficiently even in cases where the GPU could theoretically be faster.
The 5% GPU usage is relative to the current frequency of the GPU, which is often configured to scale adaptively. For example, my discrete Nv-1660S GPU is currently running at 300 MHz, while it can automatically reach 1.785 GHz. Actually using 5% of a GPU running at 17% of its maximum frequency for the 2D rendering of a terminal is entirely possible. On my machine, printing lines in a for loop at 10 FPS in a Windows terminal takes 6% of my GPU, still running at a low frequency (0-1% without running anything).
If you want to check the frequency and load of your GPU, there are plenty of tools for that, from vendor tools often installed with the driver to software like GPU-Z on Windows. For NVIDIA GPUs, you can list the processes currently using your GPU with nvidia-smi (it should be rocm-smi on AMD GPUs).
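If you would rather query the load and clock from Python instead of a separate tool, the optional pynvml bindings (the nvidia-ml-py package, an extra dependency) expose the same counters that nvidia-smi reads; a rough sketch for NVIDIA GPUs:

    import pynvml  # optional package: pip install nvidia-ml-py

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    clock = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_GRAPHICS)
    max_clock = pynvml.nvmlDeviceGetMaxClockInfo(handle, pynvml.NVML_CLOCK_GRAPHICS)

    print(f"GPU utilization: {util.gpu}%")
    print(f"Graphics clock:  {clock} / {max_clock} MHz")

    pynvml.nvmlShutdown()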

Macbook m1 and python libraries [closed]

Is the new MacBook M1 suitable for Data Science?
Do Data Science Python libraries such as pandas, numpy, sklearn, etc. work on the MacBook M1 (Apple Silicon) chip, and how fast are they compared to the previous generation of Intel-based MacBooks?
This GitHub repository has lots of useful information about the Apple M1 chip and data science in Python https://github.com/neurolabusc/AppleSiliconForNeuroimaging. I have included representative quotes below. This repository focuses on software for brain imaging analysis, but the takeaways are broad.
Updated on 27 September 2021.
TL;DR
Unless you are a developer, I would strongly discourage scientists from purchasing an Apple Silicon computer in the short term. Productive work will require core tools to be ported. In the longer term, this architecture could have a profound impact on science. In particular if Apple develops servers that exploit the remarkable power efficiency of their CPUs (competing with AWS Graviton) and leverage the Metal language and GPUs for compute tasks (competing with NVidia's Tesla products and CUDA language).
Limitations facing Apple Silicon
The infrastructure scientists depend on is not yet available for this architecture. Here are some of the short term limitations:
Native R can use the unstable R-devel 4.1. However, RStudio will require gdb.
Julia does not yet natively support Apple Silicon.
Python natively supports Apple Silicon. However, some modules have issues or are slow. See the NiBabel section below.
Scientific modules of Python, R, and Julia require a Fortran compiler, which is currently only available in experimental form.
While Apple's C Clang compiler generates fast native code, many scientific tools will need to wait until gcc and gFortran compilers are available.
Tools like VirtualBox, VMware Fusion, Boot Camp and Parallels do not yet support Apple Silicon. Many users rely on these tools for using Windows and Linux programs on their macOS computers.
Docker can support Apple Silicon. However, attempts to run Intel-based containers on Apple Silicon machines can crash as QEMU sometimes fails to run the container. These containers are popular with many neuroimaging tools.
Homebrew 3.0 supports Apple Silicon. However, many homebrew components do not support Apple Silicon.
MATLAB is used by many scientific tools, including SPM. While Matlab works in translation, it is not yet available natively (and mex files will need to be recompiled).
FSL and AFNI do not yet natively support this architecture. While code may work in translation, creating some native tools must wait for compilers and libraries to be updated. This will likely require months.
The current generation M1 only has four high-performance cores. Most neuroimaging pipelines combine sequential tasks that only require a single core (where the M1 excels) as well as parallel tasks. Those parallel tasks could exploit a CPU with more cores (as shown in the pigz and niimath tests below). Bear in mind that this mixture of serial and parallel code faces Amdahl's law, with diminishing returns for extra cores.
The current generation M1 has a maximum of 16 GB of RAM. Neuroimaging datasets often have large memory demands (especially multi-band accelerated functional, resting-state and diffusion datasets).
In general, the M1 and Intel-based Macs have identical OpenGL compatibility, with the M1 providing better performance than previous integrated solutions. However, there are corner cases that may break OpenGL tools. Here I describe four limitations. First, OpenGL geometry shaders are not supported (there is no Metal equivalent). Second, the new retina displays support wide color with 16 bitsPerSample, which can cause issues for code that assumes 32-bit RGBA textures (such as the text in this Apple example code). Third, textures can be handled differently. Fourth, use of the GL_INT_2_10_10_10_REV data type will cripple performance (tested on macOS 11.2). This is unfortunate, as Apple advocated for this datatype once upon a time. In this case, code must be changed to use the less compact GL_HALF_FLOAT, which is natively supported by the M1 GPU. This impacts neuroimaging scientists visualizing DTI tractography, where GPU resources can be overwhelmed.
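One practical check worth adding (my own note, not a quote from the linked repository): you can verify from within Python whether a given interpreter and its compiled modules are running natively on Apple Silicon or as an Intel build under Rosetta 2 translation, for example:

    import platform
    import numpy as np

    # 'arm64'  -> the interpreter is a native Apple Silicon build
    # 'x86_64' -> an Intel build running under Rosetta 2 translation
    print(platform.machine())

    # Compiled extensions such as numpy must match the interpreter's
    # architecture, so if this import succeeded they run the same way.
    print(np.__version__)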

Running Python on Xeon Phi

I would like to port a semi-HPC code scriptable with Python to the Xeon Phi, to try out the performance increase. It cannot be run in offload mode (data transfers would be prohibitive); the whole code must run on the co-processor.
Can someone knowledgeable confirm that it means I will have to "cross-compile" all the libraries (including Python) for the Xeon Phi arch, have those libs mounted over NFS on the Xeon Phi, and then execute it all there?
For cross-compilation: what is the target arch? Of course, for numerics the xeon-phi is a must due to extended intrinsics, but for e.g. Python, would the binaries and libs be binary-compatible with amd64? That would make it much easier, essentially only changing some flags for the number-crunching parts.
UPDATE: For the record, we had very bad support from Intel on the forums; after realizing the poor technical state of the software stack (Yocto could not compile, and so on) and how sparse the documentation was, we abandoned this path. Goodbye, Xeon Phi.
Why not first port it away from Python? Python is bytecode for a virtual machine (a software emulation of a CPU), which then has to be translated and executed on the actual hardware CPU. You could port it to C++ or similar, which, when compiled for the target platform, produces machine code that runs natively on the target. That should improve execution speed, so you might not even need a Xeon Phi.

Installing PyPy from source on low RAM devices

I have a little bit of a wreck of a computer: 7+ years old, Intel Celeron 430 @ 1.78 GHz, 448 MB of RAM, Lord only knows what motherboard graphics chip, etc., running Lubuntu 14.04 LTS 32-bit. I want to install PyPy, but due to my Linux version I need to build it from source, and their install guide basically suggests that I need much more RAM than I have...
I have tons of hard drive space, which could be used as a pagefile. Is there any way to give the build enough memory to finish (either up to ~1.6 GB and use the low-RAM tweak in their guide, or 2-3 GB to do so without the tweak) by making it use a page file?
Also, don't worry about the speed of the process, as long as it doesn't exceed much more than 24 hours to build it...
With only 400 MB of RAM and a big pagefile, it is very likely never to finish in "just" one day (but I don't actually know). You need to build on some other machine and then copy the binary, or take the binary we provide and then add the exact same version of all missing libraries...

Python execution speed: laptop vs desktop

I am running a program that does simple data processing:
parses text
populates dictionaries
calculates some functions over the resulting data
The program only uses CPU, RAM, and HDD:
run from Windows command line
input/output to the local hard drive
nothing displayed on or printed to screen
no networking
The same program is run on:
desktop: Windows 7, i7-930 CPU overclocked @ 3.6 GHz (with matching memory speed), Intel X-25M SSD
laptop: Windows XP, Intel Core2 Duo T9300 @ 2.5 GHz, 7200 rpm HDD
The desktop CPU frequency is 1.44 times higher, and its disk has a 4 times higher benchmark score (PassMark Disk Mark). I found the program runs just about 1.66 times faster on the desktop. So apparently, the CPU is the bottleneck.
It seems there's only about a 15% benefit from the Core i7 vs. Core 2 Duo architecture (most of the performance boost is due to the raw CPU frequency). Is there anything I can do in the code to increase the benefit of the newer architecture?
EDIT: I forgot to mention that I use ActivePython 3.1.2, if that matters.
Increasing hardware performance in most cases automatically benefits user applications. However, the much-maligned GIL means that you may not be able to take advantage of multiple cores with CPython unless you design your program to use them via the multiprocessing module or similar libraries.
SO discussion on the same topic: Does python support multiprocessor/multicore programming?
A related collation of solutions on python wiki: http://wiki.python.org/moin/ParallelProcessing
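As a rough illustration of that multiprocessing route, here is a minimal sketch that spreads a CPU-bound parsing step across cores with the standard-library multiprocessing module; parse_file and the input file names are hypothetical stand-ins for the question's actual parsing and dictionary-building code:

    from multiprocessing import Pool
    from collections import Counter

    # parse_file and the file names below are hypothetical placeholders for the
    # question's actual text-parsing and dictionary-building code.
    def parse_file(path):
        counts = Counter()
        with open(path, encoding="utf-8") as f:
            for line in f:
                counts.update(line.split())
        return counts

    if __name__ == "__main__":
        files = ["input1.txt", "input2.txt", "input3.txt", "input4.txt"]
        total = Counter()
        pool = Pool()  # one worker process per CPU core by default
        try:
            for partial in pool.map(parse_file, files):
                total.update(partial)  # merge the per-file dictionaries
        finally:
            pool.close()
            pool.join()
        print(total.most_common(10))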
Split your processing across multiple worker processes (plain threads won't help CPU-bound CPython code because of the GIL mentioned above). Your particular i7 can run up to 8 hardware threads in parallel.
Consider repeating the comparison with a regular HDD in the desktop: that SSD could well account for a substantial part of the performance difference, depending on caching and the nature of the data.
