I've trained my model locally and now I want to use it in my Kubernetes cluster. Unfortunately, all the Docker images for PyTorch are 5+ GB because they contain the scripts for training, which I won't need now. I've created my own image which is only 3.5 GB, but that's still huge. Is there a slim PyTorch version for predictions? If not, which parts of the package can I safely remove, and how?
There is no easy answer for the Python version of PyTorch, unfortunately (or at least none I'm aware of).
Python, in general, is not well-suited for Docker deployments, as it carries over all of its dependencies (even if you don't need all of their functionality; imports are often at the top of the file, which makes the removal you mention infeasible for projects of PyTorch's size and complexity).
There is a way out though...
TorchScript
Given your trained model, you can convert it to a traced/scripted version (see here).
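For example, a minimal sketch; the toy model and example input below are stand-ins for your own:

```python
import torch
import torch.nn as nn

# Stand-in for your trained model; any nn.Module is handled the same way.
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))
model.eval()

# An example input with the shape your model expects at inference time.
example_input = torch.randn(1, 10)

# Tracing records the operations executed for this input; use torch.jit.script
# instead if forward() contains data-dependent control flow.
traced = torch.jit.trace(model, example_input)

# The saved archive can be loaded without a Python installation,
# e.g. from C++ via torch::jit::load("model.pt").
traced.save("model.pt")
```

After you manage that: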
Inference in other languages
Write your inference code in another language, either Java or C++ (see here for more info).
I have only used C++, but Java might be an easier route, I think.
Results
I managed to get PyTorch for CPU inference down to roughly ~32MB; GPU would weigh more and be far more complex, and would probably need ~1GB for the cuDNN dependency alone.
C++ way
Please note that the torchlambda project is not currently maintained and I'm its creator; hopefully it gives you some tips at least.
See:
Dockerfile for the image build
CMake used for building
Docs for more info about compilation options etc.
C++ inference code
Additional notes:
It also uses AWS SDKs and you would have to remove them from at least these files
You don't need static compilation; it helps reach the lowest image size I could come up with, but it isn't strictly necessary (it saves an additional ~100MB or so)
Final
Try Java first, as its packaging is probably saner (although the final image would probably be a little bigger)
The C++ way has not been tested with the newest PyTorch version and might be subject to change with basically any release
In general it takes A LOT of time and debugging, unfortunately.
Related
Hello, I know that the key to analyzing data and working with artificial intelligence is to use the GPU rather than the CPU. The problem is that I don't know how to use it with Python in Visual Studio Code. I use Ubuntu, and I already have NVIDIA installed.
You have to use libraries that are designed to work with GPUs.
You can use Numba to compile Python code directly to binary with CUDA/ROC support, but I don't really know how limiting it is (a minimal sketch follows after these options).
Another way is to call APIs that are designed for parallel computing, such as OpenCL; there is a PyOpenCL binding for this.
A somewhat limited but sometimes valid way: OpenGL/DirectX compute shaders. They are extremely easy, but not so fast if you need to transfer data back and forth in small batches.
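As a rough sketch of the Numba route (assuming a CUDA-capable GPU and the numba package installed; the kernel below is just an illustrative element-wise add):

```python
import numpy as np
from numba import cuda

# An element-wise kernel compiled by Numba to run on the GPU.
@cuda.jit
def add_kernel(x, y, out):
    i = cuda.grid(1)
    if i < x.size:
        out[i] = x[i] + y[i]

n = 1000000
x = np.arange(n, dtype=np.float32)
y = 2 * x
out = np.empty_like(x)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block

# Host arrays are copied to and from the device automatically.
add_kernel[blocks, threads_per_block](x, y, out)

print(out[:5])  # [ 0.  3.  6.  9. 12.]
```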
I would like to train tensorflow models with the python API but use the trained graphs for inference in Matlab. I searched for possibilities to do this, but I can't seem to figure it out.
Does anybody have a good idea how to do this? Do I have to compile the model with bazel? Do I do it with tensorflow serving? Do I load the metagraph in a C++ function that I include in Matlab?
Please keep in mind that I'm an engineer and don't have extensive programming knowledge :)
In case someone lands here with a similar question, I'd like to suggest tensorflow.m - a Matlab package I am currently writing (available on GitHub).
Although still in development, simple functionality like importing a frozen graph and running an inference is already possible (see the examples) - I believe this is what you were looking for?
The advantage is that you don't need any expensive toolbox nor a Python/Tensorflow installation on your machine. I'd be glad if the package can be of use for someone looking for similar solutions; even more so, in case you extend/implement something and open a PR.
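For completeness, a rough sketch of producing such a frozen graph on the Python side with the TensorFlow 1.x API (the trivial graph below stands in for your trained model, and the exact module paths have moved around between releases):

```python
import tensorflow as tf

# A trivial graph standing in for your trained model.
x = tf.placeholder(tf.float32, shape=[None, 3], name="input")
w = tf.Variable(tf.ones([3, 1]), name="w")
y = tf.identity(tf.matmul(x, w), name="output")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Bake the variable values into constants so the graph is self-contained.
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), ["output"])

# The resulting .pb file is what an external runtime (e.g. the Matlab
# package above) would import for inference.
tf.train.write_graph(frozen, ".", "frozen_model.pb", as_text=False)
```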
I'm curious about what tf.contrib is, and why the code would be included in TensorFlow, but not in the main repository.
Furthermore, looking at the example here (from the tensorflow master branch), I want to find the source for tf.contrib.layers.sparse_column_with_hash_bucket.
These seem like some cool routines, but I wanted to make sure they properly use queues, etc., for pre-fetching/pre-processing examples before actually using them in a production setting.
It appears to be documented here, but that page is from the tflearn project, and tf.contrib.layers.sparse_column_with_hash_bucket doesn't seem to be in that repository either.
In general, tf.contrib contains contributed code. It is meant to contain features and contributions that eventually should get merged into core TensorFlow, but whose interfaces may still change, or which require some testing to see whether they can find broader acceptance.
The code in tf.contrib isn't supported by the TensorFlow team. It is included in the hope that it is helpful, but it might change or be removed at any time; there are no guarantees.
The source of tf.contrib.layers.sparse_column_with_hash_bucket can be found at
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/layers/python/layers/feature_column.py#L365
I'm implementing a real-time LMS algorithm, and numpy.dot takes more time than my sampling time, so I need NumPy to be faster (my arrays are 1-D and 100 elements long).
I've read about building NumPy with ATLAS and such, but I've never done such a thing and spent all day trying to do it, with zero success...
Can someone explain why there aren't builds with ATLAS included? Can anyone provide me with one? Is there any other way to speed up dot product?
I've tried numba, and scipy.linalg.gemm_dot but none of them seemed to speed things up.
My system is Windows 8.1 with an Intel processor.
If you download the official binaries, they should come linked with ATLAS. If you want to make sure, check the output of np.show_config(). The problem is that ATLAS (Automatically Tuned Linear Algebra System) checks many different combinations and algorithms, and keeps the best at compile time. So, when you run a precompiled ATLAS, you are running it optimised for a computer different than yours.
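For example, to see which BLAS implementation your NumPy build is linked against:

```python
import numpy as np

# Prints the BLAS/LAPACK libraries NumPy was built against;
# look for "atlas", "mkl" or "openblas" entries in the output.
np.show_config()
```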
So, your options to improve dot are:
Compile ATLAS yourself. On Windows it may be a bit challenging, but it is doable. Note: you must use the same compiler used to compile Python. That is, if you decide to go for MinGW, you need to get Python compiled with MinGW, or build it yourself.
Try Christoph Gohlke's NumPy builds. They are linked against MKL, which is much faster than ATLAS (and does all the optimisations at run time).
Try Continuum Analytics' Conda with Accelerate (also linked against MKL). It costs money unless you are an academic. On Linux, Conda is slower than the system Python because they have to use an old compiler for compatibility purposes; I don't know if that is the case on Windows.
Use Linux. Your Python life will be much easier, and setting up the system to compile things is very easy. Setting up Cython is simple too, and then you can compile your whole algorithm and probably get a further speed-up.
The note regarding Cython is valid for Windows too; it is just more difficult to get it working. I tried a few years ago (when I used Windows) and failed after a few days; I don't know if the situation has improved.
Alternative:
You are doing the dot product of two vectors, so np.dot is probably not the most efficient way. I would give spelling it out in plain Python a shot, (vec1 * vec2).sum() (an expression Numba can actually optimise well), or use numexpr:
ne.evaluate("sum(vec1 * vec2)")
Numexpr will also parallelise the expression automatically.
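A self-contained comparison of the three variants (assuming numexpr is installed):

```python
import numpy as np
import numexpr as ne

vec1 = np.random.rand(100)
vec2 = np.random.rand(100)

d1 = np.dot(vec1, vec2)               # baseline
d2 = (vec1 * vec2).sum()              # plain NumPy, easy for Numba to optimise
d3 = ne.evaluate("sum(vec1 * vec2)")  # numexpr, parallelised automatically

# All three compute the same dot product.
assert np.allclose(d1, d2) and np.allclose(d1, d3)
```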
I would like to know if there are any documented performance differences between a Python interpreter installed from an RPM (or via yum) and a Python interpreter compiled from source (with well-chosen compilation flags set beforehand).
I am using a Red Hat 6.3 machine as a Django/Apache/mod_wsgi production server. I have already properly compiled everything in different setups and in different orders, and I usually keep the build/dev dependencies on such machines. For various ego-related (and more or less practical) reasons, I would like to use Python 2.7.3. By default, Red Hat comes with Python 2.6.6. I think I could go with it, but it would hurt me somehow (I would have to drop and find replacements for a few libraries, and my ego).
However, besides my ego and dependencies, I would like to know what would be the impact in terms of performance for a Django server.
If you compile with the exact same flags that were used to compile the RPM version, you will get a binary that's exactly as fast. And you can get those flags by looking at the RPM's spec file.
However, you can sometimes do better than the pre-built version. For example, you can let the compiler optimize for your specific CPU, instead of for "general 386 compatible" (or whatever the RPM was optimized for). Of course if you don't know what you're doing (or are doing it on purpose), it's always possible to build something slower than the pre-built version, too.
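One quick way to see what flags an existing interpreter was built with, so you can compare them against the RPM spec file or your own ./configure run, is via distutils (this inspects the running interpreter, not the RPM itself):

```python
from distutils import sysconfig

# Compiler flags the running interpreter was built with.
print(sysconfig.get_config_var("CFLAGS"))
print(sysconfig.get_config_var("OPT"))
```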
Meanwhile, 2.7.3 is faster in a few areas than 2.6.6. Most of them usually won't affect you, but if they do, they'll probably be a big win.
Finally, for the vast majority of Python code, the speed of the Python interpreter itself isn't relevant to your overall performance or scalability. (And when it is, you probably want to try PyPy, Jython, or IronPython to replace CPython.) This is especially true for a WSGI service. If you're not doing anything slow, Apache will probably be the bottleneck. If you are doing anything slow, it's probably something I/O bound and well outside of Python's control (like reading files).
Ultimately, the only way you can know how much gain you get is by trying it both ways and performance testing. But if you just want a rule of thumb, I'd say expect a 0% gain, and be pleasantly surprised if you get lucky.