Acquiring basic skills working with visualizing/analyzing large data sets [closed] - python

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I'm looking for a way to learn to be comfortable with large data sets. I'm a university student, so everything I do is of "nice" size and complexity. Working on a research project with a professor this semester, and I've had to visualize relationships between a somewhat large (in my experience) data set. It was a 15 MB CSV file.
I wrote most of my data wrangling in Python, visualized using GNUPlot.
Are there any accessible books or websites on the subject out there? Bonus points for using Python, more bonus points for a more "basic" visualization system than relying on gnuplot. Cairo or something, I suppose.
Looking for something that takes me from data mining, to processing, to visualization.
EDIT: I'm more looking for something that will teach me the "big ideas". I can write the code myself, but looking for techniques people use to deal with large data sets. I mean, my 15 MB is small enough where I can put everything I would ever need into memory and just start crunching. What do people do to visualize 5 GB data sets?

I'd say the most basic skill is a good grounding in math and statistics. This can help
you assess and pick from the variety of techniques for filtering data, and
reducing its volume and dimensionality while keeping its integrity. The last
thing you'd want to do is make something pretty that shows patterns or
relationships which aren't really there.
Specialized math
To tackle some types of problems you'll need to learn some math to understand how particular algorithms work and what effect they'll have on your data. There are various algorithms for clustering data, dimensionality reduction, natural
language processing, etc. You may never use many of these, depending on the type of data you wish to analyze, but there are abundant resources on the Internet
(and Stack Exchange sites) should you need help.
For an introductory overview of data mining techniques, Witten's Data Mining is good. I have the 1st edition, and it explains concepts in plain language with a bit of math thrown in. I recommend it because it provides a good overview and it's not too expensive -- as you read more into the field you'll notice many of the books are quite expensive. The only drawback is a number of pages dedicated to using WEKA, an Java data mining package, which might not be too helpful as you're using Python (but is open source, so you may be able to glean some ideas from the source code. I also found Introduction to Machine Learning to provide a good overview, also reasonably priced, with a bit more math.
Tools
For creating visualizations of your own invention, on a single machine, I think the basics should get you started: Python, Numpy, Scipy, Matplotlib, and a
good graphics library you have experience with, like PIL or
Pycairo. With these you can crunch numbers, plot them on graphs, and pretty things up via custom drawing routines.
When you want to create moving, interactive visualizations, tools like the
Java-based Processing library make this easy. There
are even ways of writing Processing sketches in
Python via Jython, in case you don't want to write Java.
There are many more tools out there, should you need them, like OpenCV (computer vision,
machine learning), Orange (data mining,
analysis, viz), and NLTK (natural language, text
analysis).
Presentation principles and techniques
Books by folks in the field like Edward
Tufte and references like
Information
Graphics
can help you get a good overview of the ways of creating visualizations and
presenting them effectively.
Resources to find Viz examples
Websites like Flowing Data, Infosthetics, Visual Complexity and Information is
Beautiful show recent, interesting
visualizations from across the web. You can also look through the many compiled lists of of visualization sites out there on the Internet. Start with these as a seed and start navigating around, I'm sure you'll find a lot of useful sites and inspiring examples.
(This was originally going to be a comment, but grew too long)

Check out Information is beautiful. It is not a technical book but it might give you a couple of ideas for visualising data.
And maybe have a look at the first 3 chapters of Principles of Data Mining, it goes through some concepts of visualizing data in data mining context, I found some parts of it useful during university.
Hope this helps

If you are looking for visualization rather than data mining and analysis, The Visual Display of Quantitative Information by Edward Tufte is considered one of the best books in the field.

I like the book Data Analysis with Open Source Tools by Janert. It is a pretty broad survey of data analysis methods, focusing on how to understand the system that produced the data, rather than on sophisticated statistical methods. One caveat: while the mathematics used isn't especially advanced, I do think you will need to be comfortable with mathematical arguments to gain much from the book.

Related

Best language for Molecular Dynamics Simulator, to be run in production. (Python+Numpy?) [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
I need to build a heavy duty molecular dynamics simulator. I am wondering if python+numpy is a good choice. This will be used in production, so I wanted to start with a good language. I am wondering if I should rather start with a functional language like eg.scala. Do we have enough library support for scientific computation in scala? Or any other language/paradigm combination you think is good - and why. If you had actually built something in the past and are talking from experience, please mention it as it will help me with collecting data points.
thanks much!
The high performing MD implementations tend to be decidedly imperative (as opposed to functional) with big arrays of data trumping object-oriented design. I've worked with LAMMPS, and while it has its warts, it does get the job done. A perhaps more appealing option is HOOMD, which has been optimized from the beginning for Nvidia GPUs with CUDA. HOOMD doesn't have all the features of LAMMPS, but the interface seems a bit nicer (it's scriptable from Python) and it's very high performance.
I've actually implemented my own MD code a couple times (Java and Scala) using a high level object oriented design, and have found disappointing performance compared to the popular MD implementations that are heavily tuned and use C++/CUDA. These days, it seems few scientists write their own MD implementations, but it is useful to be able to modify existing ones.
I believe that most highly performant MD codes are written in native languages like Fortran, C or C++. Modern GPU programming techniques are also finding favour more recently.
A language like Python would allow for much more rapid development that native code. The flip side of that is that the performance is typically worse than for compiled native code.
A question for you. Why are you writing your own MD code? There are many many libraries out there. Can't you find one to suit your needs?
Why would you do this? There are many good, freely available, molecular dynamics packages out there you could use: LAMMPS, Gromacs, NAMD, HALMD all come immediately to mind (along with less freely available ones like CHARMM, AMBER, etc.) Modifying any of these to suit your purpose is going to be vastly easier than writing your own, and any of these packages, with thousands of users and dozens of contributors, are going to be better than whatever you'd write yourself.
Python+numpy is going to be fine for prototyping, but it's going to be vastly slower (yes, even with numpy linked against fast libraries) than C/C++/Fortran, which is what all the others use. Unless you're using GPU, in which case all the hard work is done in kernels written in C/C++ anyway.
Another alternative if you want to use Python is to take a look at OpenMM:
https://simtk.org/home/openmm
It's a Molecular Dynamics API that has many of the basic elements that you need (integrators, thermostats, barostats, etc) and supports running on the CPU via OpenCL and GPU via CUDA and OpenCL. It has a python wrapper that I've used before and basically mimics the underlying c-api calls. It's been incorporated into Gromacs, and MDLab, so you have some examples of how to integrate it if you're really dead set on building something from (semi) scratch
However as others have said, I highly recommend taking a look at NAMD, Gromacs, HOOMD, LAMMPS, DL_POLY, etc to see if it fits your needs before you embark on re-inventing the wheel.

How to use python, PyLab, NumPy, etc for my Physics lab class over excel

I took a scientific programming course this semester that I really enjoyed and experimented with a lot. We used python, and all the related modules. I am taking a physics lab next semester and I just wanted to hear from some of you how python can help me in ways that excel can't or in ways that are better than excel's capabilities. I use Mathematica for symbolic stuff so I would use python for data purposes.
Off the top of my head, here are the related things I can do:
All of the things you would expect in a intro course (loops, arrays, slicing arrays, etc).
Reading data from a text file.
Plotting scatter, line, and bar graphs.
Learning how to plot linear regression but haven't totally figured it out.
I have done 7 of the problems on Project Euler (nothing to brag about, but it might give you a better idea of where I stand in skills).
Looking forward to hearing from some of you. You don't have to explain how to use the things you mention, I could look up the documentation.
The paper Python all a scientist needs comes to mind. I hope you can make the needed transformations from Biology to Physics.
Scipy will also be useful to you, as it includes many more advanced analysis tools. For example, Scipy includes a linear regression, and gets more interesting from there. Along with the other tools you mentioned, you'll probably find most of your needs covered.
Other notes on tool selection:
Mathematica is a great tool, if you can afford it. I've played around with the other options, like Sympy, and sadly, they don't come close to being as useful as Mathematica.
I can't imagine using Excel for any serious scientific work. If you're planning to continue forward using the tools that you learn in class, you might as well start with tools that offer you that potential.
Don't reject Excel outright. It's still great for doing simple data analysis and plotting. Excel also has the considerable advantage of being installed on most engineer and scientist's computers, making it a lot easier to share your work with colleagues.
That said, I do use Python when Excel just won't cut it; times when I've had to:
color the points in a scatter plot based on a third column
plot a field of vectors
extract a few values from each of several thousand data files to do statistical process control
generate dozens of scatter plots over different dimensions of a large data set to find which variables are important
solve a nonlinear equation at several intermediate points of a calculation, not just as the final result.
accept variable length input from a user to define a problem
VBA in Excel can do a lot of those things too, but it becomes painful fast in such a primitive language. I dream that Microsoft will make IronPython a first-class scripting language in the next version of Excel. Until then, you might want to try Resolver One
I can recall 2 presentations by Jan Martinek on EuroScipy 2008, he's PhD candidate and presented some fun experiments with Physics in the background. Abstracts are here and I'm sure he would't mind to share more if you contact him directly. Also, take a look at other presentation from EuroScipy, there are some more Physics-related ones.

Programming a Self Learning Music Maker [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
Improve this question
I want to learn how to program a music application that will analyze songs.
How would I get started in this and is there a library for analyzing soundwaves?
I know C, C++, Java, Python, some assembly, and some Perl.
Related question: Algorithm for music imitation
Composition and analysis of music by computer is a huge field. There are two basic areas in this type of work, which overlap somewhat.
Algorithmic composition is concerned with the generation of music. This can be based on statistical approaches such as Markov chaining, mathematical models employing fractal or chaotic processes, or leveraging techniques from AI such as expert systems, neural networks and genetic algorithms.
Music information retrieval is concerned with identifying common grammars, commonalities and similarity metrics between pieces of music, and identifying uniqueness (sometimes called acoustic fingerprinting).
Many, many libraries, tools and specialised programming languages exist which can help with different parts of these problems. Here's a list of music-related programs and libraries for Python. There is a lot of technology available; you should be able to find something that will do the brunt of the work for you. Reimplementing a 'musical parser' through very low-level frequency analysis tools such as Fourier Transforms, as other answers have suggested, while possible, will be quite difficult and is almost certainly unnecessary.
For further advice and specific questions, the International Society for Music Information Retrieval has a mailing list which you would probably find very helpful.
Once you get past the FFT stuff that Lennart mentioned, you might want to have a look at Markov chains for analyzing intervals between notes, and aggregated patterns.
This is kind of treaded ground, but Markov chains have been used in the past to build a kind of statistical model of melodies from various songs which can be used to generate new melodies. Markov chains can do the same with written english sentences. For an example of how that looks, have a play with the megahal chatterbot to see how markov chains can produce mangled output that statistically looks like its input (in megahal's case, it looks like english sentences)
You could concievably mash up the top 100, and have a markov chain generator blast out the next big hit.
On the other hand, you may want to consider the possibility that it is not any quality of the music itself that makes a song popular. Or perhaps it is a quality of music issue combined with marketing.
To analyze soundwaves you need some sort of fourier transformation (fft), so you can split the song up into it's frequencies and how they change over time. There exists fft support in numpy, I haven't used it, so I don't know if it's any good. But it would be a great place to start.
After that you then need to make some sort of statistical analysis on frequencies and patterns, and then I no longer have any clue what I'm talking about.
Cool stuff though, go for it!
You may like to start by looking at the MIDI format, it's reasonable simple compared to the compressed formats, and you can generate some nice things in it.
Depends what you want to do really.
There's the Echo Nest remix API that lets you analyze and manipulate music in Python. Some examples here: Where's the pow and here: You make me quantized miss lizzie. There's a nifty tutorial here: An overview of the Echo Nest API

Sentiment analysis for Twitter in Python [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
Improve this question
I'm looking for an open source implementation, preferably in python, of Textual Sentiment Analysis (http://en.wikipedia.org/wiki/Sentiment_analysis). Is anyone familiar with such open source implementation I can use?
I'm writing an application that searches twitter for some search term, say "youtube", and counts "happy" tweets vs. "sad" tweets.
I'm using Google's appengine, so it's in python. I'd like to be able to classify the returned search results from twitter and I'd like to do that in python.
I haven't been able to find such sentiment analyzer so far, specifically not in python.
Are you familiar with such open source implementation I can use? Preferably this is already in python, but if not, hopefully I can translate it to python.
Note, the texts I'm analyzing are VERY short, they are tweets. So ideally, this classifier is optimized for such short texts.
BTW, twitter does support the ":)" and ":(" operators in search, which aim to do just this, but unfortunately, the classification provided by them isn't that great, so I figured I might give this a try myself.
Thanks!
BTW, an early demo is here and the code I have so far is here and I'd love to opensource it with any interested developer.
Good luck with that.
Sentiment is enormously contextual, and tweeting culture makes the problem worse because you aren't given the context for most tweets. The whole point of twitter is that you can leverage the huge amount of shared "real world" context to pack meaningful communication in a very short message.
If they say the video is bad, does that mean bad, or bad?
A linguistics professor was lecturing
to her class one day. "In English,"
she said, "A double negative forms a
positive. In some languages, though,
such as Russian, a double negative is
still a negative. However, there is no
language wherein a double positive can
form a negative."
A voice from the back of the room
piped up, "Yeah . . .right."
With most of these kinds of applications, you'll have to roll much of your own code for a statistical classification task. As Lucka suggested, NLTK is the perfect tool for natural language manipulation in Python, so long as your goal doesn't interfere with the non commercial nature of its license. However, I would suggest other software packages for modeling. I haven't found many strong advanced machine learning models available for Python, so I'm going to suggest some standalone binaries that easily cooperate with it.
You may be interested in The Toolkit for Advanced Discriminative Modeling, which can be easily interfaced with Python. This has been used for classification tasks in various areas of natural language processing. You also have a pick of a number of different models. I'd suggest starting with Maximum Entropy classification so long as you're already familiar with implementing a Naive Bayes classifier. If not, you may want to look into it and code one up to really get a decent understanding of statistical classification as a machine learning task.
The University of Texas at Austin computational linguistics groups have held classes where most of the projects coming out of them have used this great tool. You can look at the course page for Computational Linguistics II to get an idea of how to make it work and what previous applications it has served.
Another great tool which works in the same vein is Mallet. The difference between Mallet is that there's a bit more documentation and some more models available, such as decision trees, and it's in Java, which, in my opinion, makes it a little slower. Weka is a whole suite of different machine learning models in one big package that includes some graphical stuff, but it's really mostly meant for pedagogical purposes, and isn't really something I'd put into production.
Good luck with your task. The real difficult part will probably be the amount of knowledge engineering required up front for you to classify the 'seed set' off of which your model will learn. It needs to be pretty sizeable, depending on whether you're doing binary classification (happy vs sad) or a whole range of emotions (which will require even more). Make sure to hold out some of this engineered data for testing, or run some tenfold or remove-one tests to make sure you're actually doing a good job predicting before you put it out there. And most of all, have fun! This is the best part of NLP and AI, in my opinion.
Thanks everyone for your suggestions, they were indeed very useful!
I ended up using a Naive Bayesian classifier, which I borrowed from here.
I started by feeding it with a list of good/bad keywords and then added a "learn" feature by employing user feedback. It turned out to work pretty nice.
The full details of my work as in a blog post.
Again, your help was very useful, so thank you!
I have constructed a word list labeled with sentiment. You can access it from here:
http://www2.compute.dtu.dk/pubdb/views/edoc_download.php/6010/zip/imm6010.zip
You will find a short Python program on my blog:
http://finnaarupnielsen.wordpress.com/2011/06/20/simplest-sentiment-analysis-in-python-with-af/
This post displays how to use the word list with single sentences as well as with Twitter.
Word lists approaches have their limitations. You will find a investigation of the limitations of my word list in the article "A new ANEW: Evaluation of a word list for sentiment analysis in microblogs". That article is available from my homepage.
Please note a unicode(s, 'utf-8') is missing from the code (for paedagogic reasons).
A lot of research papers indicate that a good starting point for sentiment analysis is looking at adjectives, e.g., are they positive adjectives or negative adjectives. For a short block of text this is pretty much your only option... There are papers that look at entire documents, or sentence level analysis, but as you say tweets are quite short... There is no real magic approach to understanding the sentiment of a sentence, so I think your best bet would be hunting down one of these research papers and trying to get their data-set of positively/negatively oriented adjectives.
Now, this having been said, sentiment is domain specific, and you might find it difficult to get a high-level of accuracy with a general purpose data-set.
Good luck.
I think you may find it difficult to find what you're after. The closest thing that I know of is LingPipe, which has some sentiment analysis functionality and is available under a limited kind of open-source licence, but is written in Java.
Also, sentiment analysis systems are usually developed by training a system on product/movie review data which is significantly different from the average tweet. They are going to be optimised for text with several sentences, all about the same topic. I suspect you would do better coming up with a rule-based system yourself, perhaps based on a lexicon of sentiment terms like the one the University of Pittsburgh provide.
Check out We Feel Fine for an implementation of similar idea with a really beautiful interface (and twitrratr).
Take a look at Twitter sentiment analysis tool. It's written in python, and it uses Naive Bayes classifier with semi-supervised machine learning. The source can be found here.
Maybe TextBlob (based on NLTK and pattern) is the right sentiment analysis tool for you.
I came across Natural Language Toolkit a while ago. You could probably use it as a starting point. It also has a lot of modules and addons, so maybe they already have something similar.
Somewhat wacky thought: you could try using the Twitter API to download a large set of tweets, and then classifying a subset of that set using emoticons: one positive group for ":)", ":]", ":D", etc, and another negative group with ":(", etc.
Once you have that crude classification, you could search for more clues with frequency or ngram analysis or something along those lines.
It may seem silly, but serious research has been done on this (search for "sentiment analysis" and emoticon). Worth a look.
There's a Twitter Sentiment API by TweetFeel that does advanced linguistic analysis of tweets, and can retrieve positive/negative tweets. See http://www.webservius.com/corp/docs/tweetfeel_sentiment.htm
For those interested in coding Twitter Sentiment Analyis from scratch, there is a Coursera course "Data Science" with python code on GitHub (as part of assignment 1 - link). The sentiments are part of the AFINN-111.
You can find working solutions, for example here. In addition to the AFINN-111 sentiment list, there is a simple implementation of builing a dynamic term list based on frequency of terms in tweets that have a pos/neg score (see here).

What is MATLAB good for? Why is it so used by universities? When is it better than Python? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I've been recently asked to learn some MATLAB basics for a class.
What does make it so cool for researchers and people that works in university?
I saw it's cool to work with matrices and plotting things... (things that can be done easily in Python using some libraries).
Writing a function or parsing a file is just painful. I'm still at the start, what am I missing?
In the "real" world, what should I think to use it for? When should it can do better than Python? For better I mean: easy way to write something performing.
UPDATE 1: One of the things I'd like to know the most is "Am I missing something?" :D
UPDATE 2: Thank you for your answers. My question is not about buy or not to buy MATLAB. The university has the possibility to give me a copy of an old version of MATLAB (MATLAB 5 I guess) for free, without breaking the license. I'm interested in its capabilities and if it deserves a deeper study (I won't need anything more than basic MATLAB in oder to pass the exam :P ) it will really be better than Python for a specific kind of task in the real world.
Adam is only partially right. Many, if not most, mathematicians will never touch it. If there is a computer tool used at all, it's going to be something like Mathematica or Maple. Engineering departments, on the other hand, often rely on it and there are definitely useful things for some applied mathematicians. It's also used heavily in industry in some areas.
Something you have to realize about MATLAB is that it started off as a wrapper on Fortran libraries for linear algebra. For a long time, it had an attitude that "all the world is an array of doubles (floats)". As a language, it has grown very organically, and there are some flaws that are very much baked in, if you look at it just as a programming language.
However, if you look at it as an environment for doing certain types of research in, it has some real strengths. It's about as good as it gets for doing floating point linear algebra. The notation is simple and powerful, the implementation fast and trusted. It is very good at generating plots and other interactive tasks. There are a large number of `toolboxes' with good code for particular tasks, that are affordable. There is a large community of users that share numerical codes (Python + NumPy has nothing in the same league, at least yet)
Python, warts and all, is a much better programming language (as are many others). However, it's a decade or so behind in terms of the tools.
The key point is that the majority of people who use MATLAB are not programmers really, and don't want to be.
It's a lousy choice for a general programming language; it's quirky, slow for many tasks (you need to vectorize things to get efficient codes), and not easy to integrate with the outside world. On the other hand, for the things it is good at, it is very very good. Very few things compare. There's a company with reasonable support and who knows how many man-years put into it. This can matter in industry.
Strictly looking at your Python vs. MATLAB comparison, they are mostly different tools for different jobs. In the areas where they do overlap a bit, it's hard to say what the better route to go is (depends a lot on what you're trying to do). But mostly Python isn't all that good at MATLAB's core strengths, and vice versa.
Most of answers do not get the point.
There is ONE reason matlab is so good and so widely used:
EXTREMELY FAST CODING
I am a computer vision phD student and have been using matlab for 4 years, before my phD I was using different languages including C++, java, php, python... Most of the computer vision researchers are using exclusively matlab.
1) Researchers need fast prototyping
In research environment, we have (hopefully) often new ideas, and we want to test them really quick to see if it's worth keeping on in that direction. And most often only a tiny sub-part of what we code will be useful.
Matlab is often slower at execution time, but we don't care much. Because we don't know in advance what method is going to be successful, we have to try many things, so our bottle neck is programming time, because our code will most often run a few times to get the results to publish, and that's all.
So let's see how matlab can help.
2) Everything I need is already there
Matlab has really a lot of functions that I need, so that I don't have to reinvent them all the time:
change the index of a matrix to 2d coordinate: ind2sub extract all patches of an image: im2col; compute a histogram of an image: hist(Im(:)); find the unique elements in a list unique(list); add a vector to all vectors of a matrix bsxfun(#plus,M,V); convolution on n-dimensional arrays convn(A); calculate the computation time of a sub part of the code: tic; %%code; toc; graphical interface to crop an image: imcrop(im);
The list could be very long...
And they are very easy to find by using the help.
The closest to that is python...But It's just a pain in python, I have to go to google each time to look for the name of the function I need, and then I need to add packages, and the packages are not compatible one with another, the format of the matrix change, the convolution function only handle doubles but does not make an error when I give it char, just give a wrong output... no
3) IDE
An example: I launch a script. It produces an error because of a matrix. I can still execute code with the command line. I visualize it doing: imagesc(matrix). I see that the last line of the matrix is weird. I fix the bug. All variables are still set. I select the remaining of the code, press F9 to execute the selection, and everything goes on. Debuging becomes fast, thanks to that.
Matlab underlines some of my errors before execution. So I can quickly see the problems. It proposes some way to make my code faster.
There is an awesome profiler included in the IDE. KCahcegrind is such a pain to use compared to that.
python's IDEs are awefull. python without ipython is not usable. I never manage to debug, using ipython.
+autocompletion, help for function arguments,...
4) Concise code
To normalize all the columns of a matrix ( which I need all the time), I do:
bsxfun(#times,A,1./sqrt(sum(A.^2)))
To remove from a matrix all colums with small sum:
A(:,sum(A)<e)=[]
To do the computation on the GPU:
gpuX = gpuarray(X);
%%% code normally and everything is done on GPU
To paralize my code:
parfor n=1:100
%%% code normally and everything is multi-threaded
What language can beat that?
And of course, I rarely need to make loops, everything is included in functions, which make the code way easier to read, and no headache with indices. So I can focus, on what I want to program, not how to program it.
5) Plotting tools
Matlab is famous for its plotting tools. They are very helpful.
Python's plotting tools have much less features. But there is one thing super annoying. You can plot figures only once per script??? if I have along script I cannot display stuffs at each step ---> useless.
6) Documentation
Everything is very quick to access, everything is crystal clear, function names are well chosen.
With python, I always need to google stuff, look in forums or stackoverflow.... complete time hog.
PS: Finally, what I hate with matlab: its price
I've been using matlab for many years in my research. It's great for linear algebra and has a large set of well-written toolboxes. The most recent versions are starting to push it into being closer to a general-purpose language (better optimizers, a much better object model, richer scoping rules, etc.).
This past summer, I had a job where I used Python + numpy instead of Matlab. I enjoyed the change of pace. It's a "real" language (and all that entails), and it has some great numeric features like broadcasting arrays. I also really like the ipython environment.
Here are some things that I prefer about Matlab:
consistency: MathWorks has spent a lot of effort making the toolboxes look and work like each other. They haven't done a perfect job, but it's one of the best I've seen for a codebase that's decades old.
documentation: I find it very frustrating to figure out some things in numpy and/or python because the documentation quality is spotty: some things are documented very well, some not at all. It's often most frustrating when I see things that appear to mimic Matlab, but don't quite work the same. Being able to grab the source is invaluable (to be fair, most of the Matlab toolboxes ship with source too)
compactness: for what I do, Matlab's syntax is often more compact (but not always)
momentum: I have too much Matlab code to change now
If I didn't have such a large existing codebase, I'd seriously consider switching to Python + numpy.
Hold everything. When's the last time you programed your calculator to play tetris? Did you actually think you could write anything you want in those 128k of RAM? Likely not. MATLAB is not for programming unless you're dealing with huge matrices. It's the graphing calculator you whip out when you've got Megabytes to Gigabytes of data to crunch and/or plot. Learn just basic stuff, but also don't kill yourself trying to make Python be a graphing calculator.
You'll quickly get a feel for when you want to crunch, plot or explore in MATLAB and when you want to have all that Python offers. Lots of engineers turn to pre and post processing in Python or Perl. Occasionally even just calling out to MATLAB for the hard bits.
They are such completely different tools that you should learn their basic strengths first without trying to replace one with the other. Granted for saving money I'd either use Octave or skimp on ease and learn to work with sparse matrices in Perl or Python.
MATLAB is great for doing array manipulation, doing specialized math functions, and for creating nice plots quick.
I'd probably only use it for large programs if I could use a lot of array/matrix manipulation.
You don't have to worry about the IDE as much as in more formal packages, so it's easier for students without a lot of programming experience to pick up.
MATLAB is a popular and widely adapted piece of a
sophisticated software package. It'd be a mistake to think
it's merely a math software since it has a wide range of
"toolboxes". I recently used Matplotlib to plot some data
from a database and it did the job without needing all the
bells and whistles of MATLAB. However, it may not be proper
to compare Python and MATLAB in every situation. As with
everything else the decision depends on what you need to do.
I used MATLAB in undergrad for control systems design and
simulation and also for image processing in grad school. For
these fields MATLAB makes the most sense because of the
powerful control and image processing toolboxes. As everyone
mentioned, array operations, which are used in every MATLAB
script you'd need to write, are very easy with MATLAB.
Another nice thing about MATLAB is that it's very easy and
fast to do prototyping and trying out ideas using the built
in toolbox functions. For instance, it takes no effort to
import an image and compute it's histogram or do some simple
processing on it. One disadvantage of MATLAB could be it's
speed because of its interpreted nature. However, if one
really needs speed than he can choose to implement the
tested logic in C/C++, etc.
For further comparison with Python, I can say that MATLAB
provides a full package for you to do your work without the
need of looking around for external libraries and
implementing extra functions.
One last point about MATLAB which I see is not mentioned in
the answers here is that it has a very powerful visual
modeling/simulation environment called Simulink. It's
easier to design and simulate larger systems with Simulink.
Finally, again, it all depends on the problem you need to
solve. If your problem domain can make use of one of
MATLAB's toolboxes and you have access to MATLAB then you
can be sure that you'll have the right tool to solve it.
MATLAB, as mentioned by others, is great at matrix manipulation, and was originally built as an extension of the well-known BLAS and LAPACK libraries used for linear algebra. It interfaces well with other languages like Java, and is well favored by engineering and scientific companies for its well developed and documented libraries. From what I know of Python and NumPy, while they share many of the fundamental capabilities of MATLAB, they don't have the full breadth and depth of capabilities with their libraries.
Personally, I use MATLAB because that's what I learned in my internship, that's what I used in grad school, and that's what I used in my first job. I don't have anything against Python (or any other language). It's just what I'm used too.
Also, there is another free version in addition to scilab mentioned by #Jim C from gnu called Octave.
Personally, I tend to think of Matlab as an interactive matrix calculator and plotting tool with a few scripting capabilities, rather than as a full-fledged programming language like Python or C. The reason for its success is that matrix stuff and plotting work out of the box, and you can do a few very specific things in it with virtually no actual programming knowledge. The language is, as you point out, extremely frustrating to use for more general-purpose tasks, such as even the simplest string processing. Its syntax is quirky, and it wasn't created with the abstractions necessary for projects of more than 100 lines or so in mind.
I think the reason why people try to use Matlab as a serious programming language is that most engineers (there are exceptions; my degree is in biomedical engineering and I like programming) are horrible programmers and hate to program. They're taught Matlab in college mostly for the matrix math, and they learn some rudimentary programming as part of learning Matlab, and just assume that Matlab is good enough. I can't think of anyone I know who knows any language besides Matlab, but still uses Matlab for anything other than a few pure number crunching applications.
The most likely reason that it's used so much in universities is that the mathematics faculty are used to it, understand it, and know how to incorporate it into their curriculum.
Between matplotlib+pylab and NumPy I don't think there's much actual difference between Matlab and python other than cultural inertia as suggested by #Adam Bellaire.
I believe you have a very good point and it's one that has been raised in the company where I work. The company is limited in it's ability to apply matlab because of the licensing costs involved. One developer proved that Python was a very suitable replacement but it fell on ignorant ears because to the owners of those ears...
No-one in the company knew Python although many of us wanted to use it.
MatLab has a name, a company, and task force behind it to solve any problems.
There were some (but not a lot) of legacy MatLab projects that would need to be re-written.
If it's worth £10,000 (??) it's gotta be worth it!!
I'm with you here. Python is a very good replacement for MatLab.
I should point out that I've been told the company uses maybe 5% to 10% of MatLabs capabilities and that is the basis for my agreement with the original poster
MATLAB is a fantastic tool for
prototyping
engineering simulation and
fast visualization of data
You can really play with, visualize and test your ideas on a data set very effectively. It should not be regarded as an alternative to other software languages used for product development. I highly recommend it for the above tasks, though it is expensive - free alternatives like Octave and Python are catching up.
Seems to be pure inertia. Where it is in use, everyone is too busy to learn IDL or numpy in sufficient detail to switch, and don't want to rewrite good working programs. Luckily that's not strictly true, but true enough in enough places that Matlab will be around a long time. Like Fortran (in active use where i work!)
The main reason it is useful in industry is the plug-ins built on top of the core functionality. Almost all active Matlab development for the last few years has focused on these.
Unfortunately, you won't have much opportunity to use these in an academic environment.
I know this question is old, and therefore may no longer be
watched, but I felt it was necessary to comment. As an
aerospace engineer at Georgia Tech, I can say, with no
qualms, that MATLAB is awesome. You can have it quickly
interface with your Excel spreadsheets to pull in data about
how high and fast rockets are flying, how the wind affects
those same rockets, and how different engines matter. Beyond
rocketry, similar concepts come into play for cars, trucks,
aircraft, spacecraft, and even athletics. You can pull in
large amounts of data, manipulate all of it, and make sure
your results are as they should be. In the event something is
off, you can add a line break where an error occurs to debug
your program without having to recompile every time you want
to run your program. Is it slower than some other programs?
Well, technically. I'm sure if you want to do the number
crunching it's great for on an NVIDIA graphics processor, it
would probably be faster, but it requires a lot more effort
with harder debugging.
As a general programming language, MATLAB is weak. It's not
meant to work against Python, Java, ActionScript, C/C++ or
any other general purpose language. It's meant for the
engineering and mathematics niche the name implies, and it
does so fantastically.
MATLAB WAS a wrapper around commonly available libraries.
And in many cases it still is. When you get to larger
datasets, it has many additional optimizations, including
examining and special casing common problems (reducing to
sparse matrices where useful, for example), and handling
edge cases. Often, you can submit a problem in a standard
form to a general function, and it will determine the best
underlying algorithm to use based on your data. For small
N, all algorithms are fast, but MATLAB makes determining the
optimal algorithm a non-issue.
This is written by someone who hates MATLAB, and has tried
to replace it due to integration issues. From your
question, you mention getting MATLAB 5 and using it for a
course. At that level, you might want to look at
Octave, an open source implementation with the same
syntax. I'm guessing it is up to MATLAB 5 levels by now (I
only play around with it). That should allow you to "pass
your exam". For bare MATLAB functionality it seems to be
close. It is lacking in the toolbox support (which, again,
mostly serves to reformulate the function calls to forms
familiar to engineers in the field and selects the right
underlying algorithm to use).
One reason MATLAB is popular with universities is the same reason a lot of things are popular with universities: there's a lot of professors familiar with it, and it's fairly robust.
I've spoken to a lot of folks who are especially interested in MATLAB's nascent ability to tap into the GPU instead of working serially. Having used Python in grad school, I kind of wish I had the licks to work with MATLAB in that case. It sure would make vector space calculations a breeze.
It's been some time since I've used Matlab, but from memory it does provide (albeit with extra plugins) the ability to generate source to allow you to realise your algorithm on a DSP.
Since python is a general purpose programming language there is no reason why you couldn't do everything in python that you can do in matlab. However, matlab does provide a number of other tools - eg. a very broad array of dsp features, a broad array of S and Z domain features.
All of these could be hand coded in python (since it's a general purpose language), but if all you're after is the results perhaps spending the money on Matlab is the cheaper option?
These features have also been tuned for performance. eg. The documentation for Numpy specifies that their Fourier transform is optimised for power of 2 point data sets. As I understand Matlab has been written to use the most efficient Fourier transform to suit the size of the data set, not just power of 2.
edit: Oh, and in Matlab you can produce some sensational looking plots very easily, which is important when you're presenting your data. Again, certainly not impossible using other tools.
I think you answered your own question when you noted that Matlab is "cool to work with matrixes and plotting things". Any application that requires a lot of matrix maths and visualisation will probably be easiest to do in Matlab.
That said, Matlab's syntax feels awkward and shows the language's age. In contrast, Python is a much nicer general purpose programming language and, with the right libraries can do much of what Matlab does. However, Matlab is always going to have a more concise syntax than Python for vector and matrix manipulation.
If much of your programming involves these sorts of manipulations, such as in signal processing and some statistical techniques, then Matlab will be a better choice.
First Mover Advantage. Matlab has been around since the late 1970s. Python came along more recently, and the libraries that make it suitable for Matlab type tasks came along even more recently. People are used to Matlab, so they use it.
Matlab is good at doing number crunching. Also Matrix and matrix manipulation. It has many helpful built in libraries(depends on the what version) I think it is easier to use than python if you are going to be calculating equations.

Categories

Resources