I use Python for prototyping in machine learning, but I have often been frustrated by the slow interpreter. Is there a language that is good for prototyping (with enough libraries, along the lines of sklearn, numpy, scipy) but at the same time fast and powerful?
What I am looking for is something I can prototype in and also deploy to production. What do people commonly use?
As far as I know, Python is as good as it gets if you want a real language with lots of libraries.
MATLAB is probably the most popular commercial solution for prototyping. It has numerous built-ins and is easy to work with. In terms of prototyping performance, MATLAB is currently king, second only to compiled languages (C, Fortran, C++, ...) for production use. It is not a general-purpose programming language, though, so I guess this isn't what you are looking for.
Python is pretty much as good as it gets for the sort of prototyping you describe. However, I have to ask, if you're frustrated with its speed as a numeric language: how are you writing your code? The way to do this in Python is with NumPy, a package for numerical computing in which the underlying operations on arrays (matrices) are performed by compiled C code. It does mean learning how to express your computations as array operations, however, so if you're not used to linear algebra/matrix manipulation it might take a bit of getting used to. It's basically a MATLAB-like environment.
My experience: if you write your Python code with lots of explicit loops and element-wise operations, it is slow and ugly. Once you learn the equivalent NumPy/SciPy way, the speed gains are phenomenal (and what you write is much closer to the mathematical expression, too).
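As a rough illustration (the dot product here is just a stand-in for whatever computation you are vectorizing):

    import numpy as np

    n = 1000000
    a = np.random.rand(n)
    b = np.random.rand(n)

    # Loop version: every iteration goes through the Python interpreter.
    total = 0.0
    for i in range(n):
        total += a[i] * b[i]

    # Vectorized version: the same reduction runs in compiled C inside NumPy.
    total_fast = np.dot(a, b)

The vectorized line is typically orders of magnitude faster, and it reads like the underlying maths.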
You can use R from within Python via RPy (or its successor rpy2). This way you can call R functionality from a Python program for further use.
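A minimal sketch with rpy2 (assuming rpy2 and an R installation are available):

    import rpy2.robjects as robjects

    # Evaluate R code directly from Python.
    r_mean = robjects.r('mean(c(1, 2, 3, 4))')
    print(r_mean[0])  # 2.5

    # Call an R function on data passed from Python.
    r_sum = robjects.r['sum'](robjects.FloatVector([1.0, 2.0, 3.0]))
    print(r_sum[0])  # 6.0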
Depending on what you want to do, you can also have a look at OpenCV's Python bindings, which expose lower-level machine learning tools (SVMs, ...).
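For instance, a rough sketch of training an SVM through the cv2.ml interface of OpenCV 3+ (older 2.x releases expose a different SVM API, so treat this as illustrative only):

    import cv2
    import numpy as np

    # Toy training data: two linearly separable 2-D classes.
    samples = np.array([[1, 1], [2, 2], [8, 8], [9, 9]], dtype=np.float32)
    labels = np.array([0, 0, 1, 1], dtype=np.int32)

    svm = cv2.ml.SVM_create()
    svm.setType(cv2.ml.SVM_C_SVC)
    svm.setKernel(cv2.ml.SVM_LINEAR)
    svm.train(samples, cv2.ml.ROW_SAMPLE, labels)

    _, predictions = svm.predict(np.array([[1.5, 1.5]], dtype=np.float32))
    print(predictions)  # predicted class label(s) for the query point(s)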
I'm interested to learn if there's a general consensus around using one language or environment for building physics-based computational models of batteries?
The modelling typically involves mathematically representing electrochemical, mechanical and thermal phenomena, solving partial differential equations and outputting plots of different variables in two and three dimensions.
So far, I've seen various academic research groups using MATLAB, but from other questions here, I can see that Fortran and Python have been suggested for relatively generic physics modelling. (See here: https://goo.gl/3ACddi)
I have a preference for a free (as in beer & speech) environment, wherever possible, but I recognise that some proprietary environments may have built-in toolboxes that are useful. Additionally, I would like the environment to allow the code to be easily parallelized so that it can run across many cores.
This is a broad question, but I'll share what I've experienced so far; maybe it's of some use. Keep in mind that this is all my personal opinion.
MATLAB: It's widely used in academic environments. One reason is that MathWorks follows a smart business strategy: educational licenses are very cheap compared to the retail price, so many students and professors get used to MATLAB even if there might be something better for them out there.
MATLAB has the advantage of being very easy to code in. It will often take you only a short time to get a first prototype of your code running. This comes at the expense of performance (compared to C/C++ and Python, which are often a bit faster than MATLAB). One of the downsides is that MATLAB was not meant to compete with C/C++ and the like. You don't even have namespaces in MATLAB, so writing frameworks in it is a whole lot more tiresome (if not impossible) and less efficient than writing one in C/C++. For instance, if you create a function in your workspace called max which does absolutely nothing, you have no way to call MATLAB's built-in max function as long as yours is in the workspace.
C++: I'm studying engineering, and here C++ is the favourite choice when it comes to physical simulations. Compared to other languages it's really fast, and since the programmer is responsible for memory management, he or she can squeeze out the last bit of performance by writing efficient, case-specific code for handling memory. There's also a ton of open-source libraries out there, for example Eigen, a library for matrix and vector computations.
C: Some people (hello, Linus) are convinced that C++ is not a good language and prefer plain C, since it is a bit faster and the library "bloat" (in C++ coming from the standard library, Boost and the like) is smaller. Other arguments against C++ are that it seduces the programmer into creating classes for every little thing and into using polymorphism out of laziness. Both can have a negative impact on performance, but whether that makes it worth refusing to work with C++ at all is up to you to decide. As a side note: the complete Linux kernel is written in C, not C++, and many tools like Git are also written in plain C.
Python: Another language suitable for rapid prototyping, since you don't need to compile much and the syntax is optimized to be easy and intuitive to use. A debugger is often unnecessary since you can simply use the interpreter to inspect variables and their values, much like in MATLAB. But unlike MATLAB, Python also lets you create objects with methods and everything else you would in C++. (I know that MATLAB recently added classes, but I wouldn't call it equivalent to C++/Python.) Python is also widely used for academic purposes; there are open-source libraries for machine learning, artificial intelligence and just about everything else. There are also libraries that let you use fractions without approximation, i.e. 1/6 is stored as two integers, numerator and denominator, rather than as a double. In the open-source community, people are putting great effort into porting many of MATLAB's features over to Python, which is why you'll find many open-source enthusiasts using it.
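(Exact rational arithmetic is in fact in Python's standard library, in the fractions module; a small example:)

    from fractions import Fraction

    # 1/6 is stored exactly as a numerator and a denominator, not as a double.
    x = Fraction(1, 6)
    y = Fraction(1, 3)

    print(x + y)                                    # 1/2, exact
    print(float(x + y))                             # 0.5
    print(Fraction(1, 10) * 3 == Fraction(3, 10))   # True: no rounding error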
You see, some languages are good for rapid prototyping, i.e. for scenarios where you want a quick proof of concept. MATLAB is useful there since you don't have to compile anything and you can quickly visualize results; Python is also worth noting for rapid prototyping. But once you need to deploy the code on actual hardware, or want to sell a finished product with a user interface and everything, you'd probably go with something like C/C++ or Python, but not MATLAB.
I'm currently investigating topics for graduate studies in Computer Science and I've found a relatively large area of interest, Distributed Computing, that I'd like to get more information on. There are a handful of other questions [1,2,3] on StackOverflow that address similar matters, but not necessarily the question I'm going to ask, specifically related to the languages I'm looking for.
I've searched the web and found plenty of papers, articles, and even courses, such as this course from Rutgers, describing the theory and mechanics behind Distributed Computing. Unfortunately, most of these papers and courses I've found are fairly limited on describing the actual concepts of Distributed Computing in code. I'm looking for websites that can give me an introduction to the programming parts of Distributed Computing. (Preferably in C or Python.)
As a side note, I'd like to mention that this may even be more specifically towards how Parallel Computing fits into the field of Distributed Computing. (I haven't taken a course in either yet!)
Disclaimer: I am a developer of SCOOP.
It really depends on your personality. If you prefer getting theoretical information before diving in, you should read some books or get familiar with the underlying technologies first. A list of books covering a good part of the subject would be:
Parallel Programming for Multicore and Cluster Systems by Thomas Rauber and Gudula Rünger (Springer-Verlag)
Principles of Parallel Programming by Calvin Lin and Lawrence Snyder (Addison-Wesley)
Patterns for Parallel Programming by Timothy G. Mattson et al. (Addison-Wesley)
Technologies you may want to get acquainted with are the MPI standard (for multi-computer parallelism) and OpenMP (for single-computer parallelism), as well as the pretty good multiprocessing module that is built into Python.
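A minimal sketch of the multiprocessing module (the work function is just a placeholder for a CPU-bound task):

    from multiprocessing import Pool

    def work(x):
        # Placeholder for a CPU-bound task.
        return x * x

    if __name__ == '__main__':
        # Spread the calls over a pool of worker processes on one machine.
        with Pool(processes=4) as pool:
            results = pool.map(work, range(10))
        print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]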
If you prefer getting your hands dirty first, you should begin with task-based frameworks, which provide simple and user-friendly APIs. Both of these points were a primary focus while creating SCOOP. You can try it with pip install -U scoop. On Windows, you may wish to install PyZMQ first using their executable installers. You can check the provided examples and play with the various parameters to easily see what causes performance to degrade or improve. I encourage you to compare it with alternatives such as Celery for similar work, or Gevent for a coroutine framework. If you feel adventurous, don't be shy about testing Python's built-in coroutine functionality and plugging it into various networking stacks.
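Usage looks something along these lines (a rough sketch of the futures-based API; the work function is a placeholder, and the script is launched with python -m scoop script.py rather than plain python):

    from scoop import futures

    def work(x):
        # Placeholder for a task to be distributed across SCOOP workers.
        return x * x

    if __name__ == '__main__':
        # futures.map spreads the calls over all available workers,
        # whether they live on one machine or on a whole cluster.
        results = list(futures.map(work, range(100)))
        print(sum(results))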
Using a task-based framework spares you the burden of implementation details such as load balancing and serialization, which are non-trivial and can take a long time to debug and get working, while still providing the desired level of understanding of distributed systems. A bonus with open-source software: you can read the code to understand the under-the-hood mechanics.
I have had good experiences using the built-in packages for Python on a single machine. A friend of mine has had great success using IPython on a machine with 128 cores.
There are different kinds of distributed computing: on clusters, on clouds, or across arbitrary machines on the internet as in Folding@home (which even included PS3s!). Don't forget about GPUs as well!
Some Python Links:
Various Python libraries
IPython
Python and Parallel Computing presentation
Is there any documentation on how Python's string functions are implemented in Python?
I understand that str is a built-in type, and so its methods are implemented in C.
But isn't there code for it anyways? How about in Pypy? From what I've read so far, they've re-implemented a lot of built-in modules in Python itself.
Example Question: How is the split method of strings implemented? (Without writing my own implementation of it)
EDIT: I am not looking for an implementation written in C (which is the default implementation in the source code of Python/CPython).
This doesn't really answer the question posed, it's just a bit too long to be a comment.
Some quick digging through the source shows that PyPy has two implementations of split(), this high-level, readable version and this lower-level version, which appears to be the implementation of split() in RPython itself.
Neither of these implementations are equivalent to CPython's split() method (most obviously they do not handle the special case CPython does where the sep is not supplied). However, if you are merely interested in the basic algorithm used rather than the details, PyPy's implementations could be a guide (at a quick glance, it looks to be doing basically the same thing as both CPython and Jython).
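For reference, a deliberately simplified pure-Python sketch of the basic algorithm; this is not PyPy's or CPython's actual code, and it ignores maxsplit and the whitespace-splitting default mentioned above:

    def simple_split(s, sep):
        # Simplified sketch: scan for each occurrence of sep and slice the
        # pieces out. Real implementations also handle maxsplit, reject an
        # empty separator, and special-case sep=None (split on whitespace).
        if not sep:
            raise ValueError("empty separator")
        parts = []
        start = 0
        while True:
            idx = s.find(sep, start)
            if idx < 0:
                parts.append(s[start:])
                return parts
            parts.append(s[start:idx])
            start = idx + len(sep)

    print(simple_split("a,b,,c", ","))   # ['a', 'b', '', 'c']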
As a general resource, though, there's no reason to think that PyPy's implementation of all string functions would mirror the algorithms used in CPython -- PyPy is, after all, intended as an optimized version of Python running in a JIT, and this may have significant impact on what the most reasonable implementation of a method is (especially string functions, which frequently can be performance bottlenecks and which the implementors of an "optimized" runtime therefore have incentive to optimize).
Thinking about the more general question, there's very little incentive for the CPython developers to maintain a separate set of pure-Python implementations of the low-level library already maintained in C. There is too much risk that the mirror implementations would grow stale or drift from what is actually being done, which could ultimately be harmful for people trying to understand the inner workings of Python without reading the C code.
I need to build a heavy-duty molecular dynamics simulator. I am wondering if Python + NumPy is a good choice. This will be used in production, so I wanted to start with a good language. I am also wondering if I should rather start with a functional language, e.g. Scala. Do we have enough library support for scientific computation in Scala? Or is there any other language/paradigm combination you think is good, and why? If you have actually built something in the past and are talking from experience, please mention it, as it will help me collect data points.
thanks much!
The high performing MD implementations tend to be decidedly imperative (as opposed to functional) with big arrays of data trumping object-oriented design. I've worked with LAMMPS, and while it has its warts, it does get the job done. A perhaps more appealing option is HOOMD, which has been optimized from the beginning for Nvidia GPUs with CUDA. HOOMD doesn't have all the features of LAMMPS, but the interface seems a bit nicer (it's scriptable from Python) and it's very high performance.
I've actually implemented my own MD code a couple times (Java and Scala) using a high level object oriented design, and have found disappointing performance compared to the popular MD implementations that are heavily tuned and use C++/CUDA. These days, it seems few scientists write their own MD implementations, but it is useful to be able to modify existing ones.
I believe that most highly performant MD codes are written in native languages like Fortran, C or C++. Modern GPU programming techniques are also finding favour more recently.
A language like Python would allow for much more rapid development than native code. The flip side of that is that the performance is typically worse than for compiled native code.
A question for you. Why are you writing your own MD code? There are many many libraries out there. Can't you find one to suit your needs?
Why would you do this? There are many good, freely available molecular dynamics packages out there you could use: LAMMPS, Gromacs, NAMD and HALMD all come immediately to mind (along with less freely available ones like CHARMM, AMBER, etc.). Modifying any of these to suit your purpose is going to be vastly easier than writing your own, and any of these packages, with thousands of users and dozens of contributors, is going to be better than whatever you'd write yourself.
Python + numpy is going to be fine for prototyping, but it's going to be vastly slower (yes, even with numpy linked against fast libraries) than the C/C++/Fortran that all the others use. Unless you're using a GPU, in which case all the hard work is done in kernels written in C/C++ anyway.
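To give an idea of what the prototyping side looks like, here is a deliberately simplified velocity-Verlet stepper in NumPy; the force function is a placeholder, and a real MD code would add pair potentials, neighbour lists, periodic boundaries, thermostats and so on:

    import numpy as np

    def forces(pos):
        # Placeholder: a simple harmonic pull toward the origin. A real MD
        # code would evaluate pair potentials (e.g. Lennard-Jones) here.
        return -pos

    def velocity_verlet(pos, vel, dt, mass, n_steps):
        f = forces(pos)
        for _ in range(n_steps):
            vel += 0.5 * dt * f / mass   # half-kick
            pos += dt * vel              # drift
            f = forces(pos)
            vel += 0.5 * dt * f / mass   # half-kick
        return pos, vel

    rng = np.random.default_rng(0)
    pos = rng.standard_normal((100, 3))   # 100 particles in 3-D
    vel = np.zeros((100, 3))
    pos, vel = velocity_verlet(pos, vel, dt=0.01, mass=1.0, n_steps=1000)
    print(pos.shape, vel.shape)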
Another alternative if you want to use Python is to take a look at OpenMM:
https://simtk.org/home/openmm
It's a molecular dynamics API that has many of the basic elements you need (integrators, thermostats, barostats, etc.) and supports running on the CPU via OpenCL and on the GPU via CUDA and OpenCL. It has a Python wrapper that I've used before, which basically mimics the underlying C API calls. It's been incorporated into Gromacs and MDLab, so you have some examples of how to integrate it if you're really dead set on building something from (semi-)scratch.
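Driving OpenMM from Python looks roughly like the sketch below (the PDB file and force-field names are placeholders; newer OpenMM releases expose the same classes under the openmm package rather than simtk.openmm):

    from simtk.openmm.app import PDBFile, ForceField, Simulation, NoCutoff
    from simtk.openmm import LangevinIntegrator
    from simtk.unit import kelvin, picosecond, picoseconds

    # 'input.pdb' and the force-field XML files are placeholders.
    pdb = PDBFile('input.pdb')
    forcefield = ForceField('amber99sb.xml', 'tip3p.xml')
    system = forcefield.createSystem(pdb.topology, nonbondedMethod=NoCutoff)
    integrator = LangevinIntegrator(300*kelvin, 1/picosecond, 0.002*picoseconds)

    simulation = Simulation(pdb.topology, system, integrator)
    simulation.context.setPositions(pdb.positions)
    simulation.minimizeEnergy()
    simulation.step(1000)   # run 1000 MD steps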
However as others have said, I highly recommend taking a look at NAMD, Gromacs, HOOMD, LAMMPS, DL_POLY, etc to see if it fits your needs before you embark on re-inventing the wheel.
I'm just being curious, but I would like to know whether Python can be implemented in assembly, and if not, why it has not been done to help with speed issues. Forgive my naivete in matters of programming languages.
The main implementation is written in C, and that's compiled to machine code (i.e. assembly made readable for the CPU). So writing it in assembly is certainly possible, and if it's possible for a compiler, it's possible for humans, in theory. In practice, it is not even remotely practical. Not only is asm even more low-level than C (increasing development time significantly, perhaps even exponentially with project size), it's also highly platform-specific, so each port takes a huge amount of work (and maintenance is multiplied by the number of supported platforms, quite a few in the case of CPython).
Apart from that, it's highly questionable whether this would give a notable speed bonus. Writing code closer to the metal doesn't magically make it faster (the contrary can be the case: you'd be hard-pressed to find a programmer who can consistently write better assembly than the four or five well-known C compilers). And much of Python's slowness comes from the many, many abstractions and indirections the language consists of, not from a sloppy implementation of them.
A more promising approach (which is indeed followed by several alternative implementations) is a clever just-in-time compiler (JIT), which preserves all the dynamism but exploits the fact that most Python programs make little use of it, by recognizing the most common paths at runtime and optimizing for those. Such complex programs are, again, not written in asm.
Native code isn't a magic make-it-go-faster operation. The language semantics really dictate quite a bit about how fast (or not) a language is. (For instance, Erlang compiled to native code via HiPE is still fairly slow.)