I'm trying to start work with this: https://github.com/deepsound-project/samplernn-pytorch
I've installed all the library dependencies through the Anaconda console, but I'm not sure how to run the Python training scripts.
I guess I just need general help with getting an RNN from a Git repository working in Python? I've found a lot of tutorials that work from Jupyter notebooks or even from scratch, but I can't find any that work from plain Python code files.
I'm sorry if my terminology is backward; I'm an architect who is attempting coding, not a software engineer.
There are instructions for getting the SampleRNN implementation working in a terminal on the repository's page. All of the commands listed there are for calling the Python scripts from a terminal, not from a Jupyter notebook. If you've installed all the correct dependencies, then in theory all you should need to do is run those commands to try it out.
FYI, it took me a while to find a combination of parameters with which this model would train without running into memory errors, though I was working with my own dataset, not the one provided. It's also very intensive: the default training length is 1000 epochs, which even on my relatively capable GPU was prohibitively long, so you might want to reduce that value considerably just to reach the end of a training cycle, unless you have a sweet setup :)
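For reference, everything on that page is driven from the command line, not from a notebook. A typical training invocation looks roughly like the line below; the flag names are copied from the repository's README as I remember it, so treat them as assumptions and verify them against your copy (or `python train.py --help`):

```
python train.py --exp TEST --frame_sizes 16 4 --n_rnn 2 --dataset piano3
```

The epoch limit mentioned above is one of these training parameters, so lowering it is just a matter of passing a smaller value on the same command line.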
Related
I've trained my model locally and now I want to use it in my Kubernetes cluster. Unfortunately, all the Docker images for PyTorch are 5+ GB because they contain the scripts for training, which I won't need now. I've created my own image, which is only 3.5 GB, but that's still huge. Is there a slim PyTorch version for predictions? If not, which parts of the package can I safely remove, and how?
There's no easy answer for the Python version of PyTorch, unfortunately (or at least none I'm aware of).
Python, in general, is not well suited to slim Docker deployments, as it carries over all of its dependencies: even if you don't need all of their functionality, the imports are often at the top of each file, which makes the removal you mention infeasible for a project of PyTorch's size and complexity.
There is a way out though...
TorchScript
Given your trained model, you can convert it to a traced or scripted version (see here).
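Here is a minimal sketch of the tracing step, assuming a small feed-forward network stands in for your trained model (substitute your own model and input shape):

```python
import torch
import torch.nn as nn

# Stand-in for your trained model; replace with your own network.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 1))
model.eval()

example_input = torch.rand(1, 16)               # dummy input of the right shape
traced = torch.jit.trace(model, example_input)  # records the forward pass as a graph
traced.save("model.pt")                         # archive loadable without the Python source
```

If your model has data-dependent control flow, use `torch.jit.script` instead of `torch.jit.trace`. The saved `model.pt` is what the non-Python runtime will load. After you manage that: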
Inference in other languages
Write your inference code in another language, either Java or C++ (see here for more info).
I have only used C++, but you might get there more easily with Java, I think.
Results
I managed to get PyTorch CPU inference down to roughly 32 MB. GPU inference would weigh more and be far more complex, and would probably need around 1 GB for the cuDNN dependency alone.
C++ way
Please note that the torchlambda project (I'm its creator) is not currently maintained; hopefully it gives you some tips at least.
See:
Dockerfile for the image build
CMake used for building
Docs for more info about compilation options etc.
C++ inference code
Additional notes:
It also uses the AWS SDKs, and you would have to remove them from at least these files.
You don't need static compilation. It helped me reach the lowest image size I could come up with, but it isn't strictly necessary (it saves an additional ~100 MB or so).
Final
Try Java first, as its packaging is probably saner (although the final image would probably be a little bigger).
The C++ route is untested against the newest PyTorch version and might change with basically any release.
In general it takes A LOT of time and debugging, unfortunately.
I am a computational chemist and use Python code (through Jupyter notebooks) to analyze my systems. Today, while doing Principal Component Analysis and trying to plot some results, I got a MemoryError.
I tried to find the cause by googling; the suggestions were to check whether I was running a 32-bit version of Python, but I am sure that this is not my problem (besides, I am using a PC with Linux). Another suggestion was to delete something, so I deleted some bigger files that were no longer needed; it didn't help.
Then I found some more suggestions, but they were aimed at other people's specific tasks and did not match mine.
In particular, the MemoryError occurs when using the MDAnalysis package for PCA and plotting with matplotlib inline.
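For context, here is a minimal sketch of the kind of MDAnalysis PCA workflow described; the file names and the atom selection are hypothetical placeholders, and choosing a smaller selection or fewer components keeps the in-memory arrays smaller:

```python
import MDAnalysis as mda
from MDAnalysis.analysis import pca

# Hypothetical topology/trajectory files; substitute your own.
u = mda.Universe("system.psf", "trajectory.dcd")

pc = pca.PCA(u, select="backbone").run()            # fit PCA over the trajectory
backbone = u.select_atoms("backbone")
projected = pc.transform(backbone, n_components=2)  # keep only 2 PCs to limit memory
print(projected.shape)                              # (n_frames, 2)
```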
I am learning Python for data science, but my problem is that I still don't understand the difference between Spyder and Jupyter!
I would like your help understanding the difference; I would appreciate it.
Here's just a basic summary of the two tools.
Jupyter is a very popular application used for data analysis. It grew out of the IPython ("interactive Python") notebook. You can run each block of code separately: for example, I can plot a graph using matplotlib, then create a new block of code and plot another graph. There are also handy magic commands like %timeit that measure the speed of your code.
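For instance, in a notebook cell (%timeit is standard IPython magic, not tied to any particular package):

```python
# In a Jupyter cell: %timeit repeats the statement and reports the average runtime
import math
%timeit math.sqrt(2.0)
```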
Spyder is an Integrated Development Environment (IDE) for Python, like Atom or Visual Studio. I use VS Code and suggest you install it as well; it's easy to learn and get running, and there are tons of helpful YouTube videos thanks to its popularity.
I prefer to use Jupyter notebooks to analyze data, whether in pandas DataFrames or plots. When I'm developing a program or implementing new code on data I've already analyzed, I use a text editor like VS Code.
There's a lot more to it, but I think that's all you need to know for now. As you gain more experience you'll learn more about the tools and find your preferences. If you want to know more, there's a ton of information about them online from people who can probably explain this much better than I can.
I hope your journey into data science goes well! Just be patient and remember struggling is part of learning. Good luck!
Spyder pros:
Code completion
Code cells: you can split a script into runnable code cells in Spyder (see the sketch below).
Scientific libraries
PDB debugger
Help feature
Spyder cons:
Limited to Python only.
The layout is poor and not customizable.
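As mentioned in the pros above, Spyder treats comments starting with `# %%` as cell delimiters, so a plain script can be run chunk by chunk, notebook-style:

```python
# %% Load data ("# %%" starts a runnable cell in Spyder)
import numpy as np
data = np.random.rand(1000)

# %% Summarize (run this cell on its own once the data exists)
print(data.mean(), data.std())
```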
Jupyter pros:
Easy to learn
Secure and free server: the Jupyter server can be used free of charge.
Keyboard shortcuts make it easy and fast.
Share Notebook
Jupyter cons:
Not recommended for running long, asynchronous tasks.
No IDE integration, no linting, and no code-style adjustment.
Read more detail at https://ssiddique.info/pycharm-vs-spyder-vs-jupyter.html
I have a neural network implemented with NumPy (Python 2.7) and a faster machine to test it on. Lately my code freezes on that machine, but if I test it on my notebook (less CPU, RAM, etc.) it runs without problems (only slower).
What could be the problem? I thought it was my code, but since it works on a slower PC, I think that machine has a problem.
edit: Also, sometimes it works without problems.
edit 2: Both PCs run Ubuntu 16.04.
edit 3: It happens even with the same input and parameters.
If it doesn't always occur and is confined to one machine, it could very well be a hardware problem.
The trouble with hardware problems is that they are often hard to test, because they generally leave little evidence in the way of log files.
Try testing the RAM.
If that doesn't turn up errors, try logging the CPU temperatures to check that it doesn't get too hot.
Also, log the different voltages. It could be that the power supply is on the way out.
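Here is a minimal sketch of the temperature-logging suggestion, assuming the psutil package is installed (`pip install psutil`); note that psutil does not expose voltages, so for those use lm-sensors or BIOS monitoring instead:

```python
import time
import psutil  # assumption: psutil is installed; sensor support is Linux-only

# Log CPU temperatures once a minute; inspect the file after the next freeze.
with open("temperature_log.txt", "a") as log:
    while True:
        temps = psutil.sensors_temperatures()  # empty dict if no sensors are exposed
        log.write(time.strftime("%H:%M:%S") + " " + repr(temps) + "\n")
        log.flush()
        time.sleep(60)
```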
Try compiling the code on the same machine where it freezes. Each machine (more precisely, each microprocessor) has a different instruction set, and flaws in an instruction set may be patched over by microcode. This could be where the problem lies.
I am running a Python script using Spyder 2.3.9. I have a fairly large script, and when running it with 300x600 iterations (a loop inside another loop), everything appears to work fine and takes approximately 40 minutes. But when I increase the count to 500x600 iterations, after 2 hours the output yields:
It seems the kernel died unexpectedly. Use 'Restart kernel' to continue using this console.
I've been trying to go through the code but don't see anything that might be causing this in particular. I am using Python 2.7.12 64bits, Qt 4.8.7, PyQt4 (API v2) 4.11.4. (Anaconda2-4.0.0-MacOSX-x86_64)
I'm not entirely sure what additional information is pertinent, but if you have any suggestions or questions, I'd be happy to read them.
https://github.com/spyder-ide/spyder/issues/3114
It seems this issue has already been opened on the project's GitHub repository (linked above) and, given the repository's track record, should be addressed soon.
Some possible solutions:
It may be helpful, if possible, to modify your script for faster convergence; very often, for most practical purposes, the incremental value of iterations past a certain point is negligible (see the sketch after this list).
An upgrade or downgrade of the Spyder environment may help.
Check your local firewall for blocked connections to 127.0.0.1 from pythonw.exe.
If nothing works, try using Spyder on Ubuntu.
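On the first point, here is a hypothetical sketch of an early-exit convergence check; the `step` function and the tolerance are placeholders for your own computation:

```python
import numpy as np

def step(i, j):
    # Placeholder for the real per-iteration computation (hypothetical);
    # it decays with i so the demo converges quickly.
    return j + 2.0 ** (-i)

tol = 1e-6   # convergence tolerance; an assumption, tune it for your problem
prev = None
for i in range(500):                                     # outer loop
    result = np.array([step(i, j) for j in range(600)])  # inner loop
    if prev is not None and np.max(np.abs(result - prev)) < tol:
        print("Converged after", i + 1, "outer iterations")
        break   # skip the remaining iterations once results stop changing
    prev = result
```

Cutting the loop short this way saves time and also keeps fewer intermediate results alive, which matters if the kernel is dying from memory exhaustion.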