Why does Python print only one error message?

A lot of the languages I know (like C++, C, and Rust) print multiple error messages at a time. So why does Python print only one error message?

First off, I assume we are talking about syntax errors, i.e. those that can (and should) be detected and reported by the compiler.
It is primarily a design choice. Python is built on the notion that everything should happen at runtime, and the compiler is deliberately kept as simple as possible.
Simple and easy to understand, or complex and sophisticated:
Simply put, you have a choice between either using a very simple compiler that is easy to understand and maintain, or you have a complex piece of machinery with sophisticated program analysis and optimisations.
Languages like C, C++, and Rust draw their strength from heavily optimising code during compilation and thus took the second route, with highly complex and extremely sophisticated compilers. Dealing with syntax errors is one of their less impressive feats.
Python, on the other hand, went the other way. Indeed, it is in general impossible for a Python compiler to predict what exactly a piece of Python code is doing without actually running it, which precludes all the interesting opportunities for optimisation in the first place; a sophisticated compiler would therefore not really make sense, anyway. Keeping Python's compiler simple and focusing on runtime optimisations instead is thus the right choice. But it comes with the downside that the compiler simply bails out whenever it discovers an error.
To give a bit more context...
1. Error Recovery
Detecting and recovering from a syntax error in a compiler is hard.
Compilers are usually very good at translating (syntactically) correct programs fast and efficiently to machine code that represents the original program. However, if there is a syntax error, it is often impossible for the compiler to guess the original intention of the programmer, and it is therefore not clear what to do with an erroneous piece of code.
Here is a very simple example:
pen color("red")
Obviously, there is something wrong here, but without further context it is impossible to tell whether the original intent of this line was pen = color("red"), pencolor("red"), pen.color("red"), or something else entirely.
If a compiler wants to continue looking at the rest of the program (and thus discover potentially more syntax errors), it needs a strategy for how to cope with such situations and recover so it can move on: it needs an error recovery strategy. This might be something as simple as just skipping the entire line or individual tokens, but there is no clear-cut "correct" solution to this.
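To make the idea concrete, here is a toy sketch of such a "skip to the next line" recovery strategy, using Python's own compile() to check each line in isolation. This is only an illustration under the assumption that a line is a reasonable synchronisation unit; real compilers need something smarter, since statements can span lines:

# Toy "panic mode" recovery: treat each line as a synchronisation unit,
# record the error, and keep going instead of stopping at the first one.
def check_lines(source):
    errors = []
    for lineno, line in enumerate(source.splitlines(), 1):
        try:
            compile(line, "<line>", "exec")      # parse this line in isolation
        except SyntaxError as exc:
            errors.append((lineno, exc.msg))     # record the error and move on
    return errors

print(check_lines("pen color('red')\nx = = 1\ny = 2\n"))
# reports errors on lines 1 and 2 (exact messages vary by Python version)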
2. Python's Compiler
Python compiles your program one symbol at a time.
Python's current compiler works by looking at one symbol at a time (it uses what is called an LL(1) parser). This makes it extremely simple to build the compiler for Python automatically, and it is quite fast and efficient. But it means there are situations where, despite an "obvious" syntax error, Python happily moves on compiling the program until it is really lost.
Take a look at this example:
x = foo(
y = bar()
if x > y:
As humans, we quickly see the missing closing parenthesis in line 1. However, from the compiler's perspective, this looks rather like a call with a named argument, something like this:
x = foo(y = bar() if x > y else 0)
Accordingly, Python will only notice that something is wrong when it hits the colon in line 3—the first symbol that does not work with its "assumption". But at that point, it is extremely hard to figure out what to do with this piece of code, and how to correctly recover: do you just skip the colon in this case? Or should you go back and correct something earlier on—and if so, how far back are you going?
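You can watch this behaviour with compile(). Depending on your Python version, the error is reported at the colon on line 3 (the old LL(1) parser) or traced back to the unclosed parenthesis on line 1 (the PEG parser used since 3.9), so treat the exact output as version-dependent:

src = "x = foo(\ny = bar()\nif x > y:\n    pass\n"
try:
    compile(src, "<example>", "exec")
except SyntaxError as exc:
    # e.g. line 3 "invalid syntax" on older versions,
    # or line 1 "'(' was never closed" on newer ones
    print(exc.lineno, exc.msg)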
3. Follow-up Errors
Error recovery can create "ghost" errors.
In the first example above, the compiler could just skip the entire line and move on without any issue. But there are situations where the choice of how to recover from a syntax error influences (potentially) everything that follows, as in this example:
deffoo(x):
The intent behind this could be either def foo(x): or simply a call deffoo(x). But this distinction determines how the compiler will look at the code that follows, and whether it reports an indentation error, or perhaps a return outside a function, etc.
The danger of error recovery is that the compiler's guess might actually be wrong, which could lead to a whole series of reported follow-up errors—which might not even be true errors, but rather ghosts created by the compiler's wrong decision.
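You can see Python's actual, non-recovering choice for this snippet with compile(): it commits to the "call" reading and stops at the stray colon on line 1 rather than guessing and risking ghost errors (the exact message text varies by version):

try:
    compile("deffoo(x):\n    return x * 2\n", "<example>", "exec")
except SyntaxError as exc:
    print(exc.lineno, exc.msg)   # line 1: the colon after the "call" deffoo(x)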
Bottom line: getting error recovery and error reporting right is extremely hard. Python's choice to report only the first syntax error it encounters is thus sensible and works just fine for most users and situations.
I have actually written a parser with more sophisticated error detection, which can list all the errors it discovers in a Python program. But in my experience, too many of the additional errors beyond the first one are just rubbish, so I have always stuck to displaying only the first error in a program.

It is difficult to answer the exact why: I cannot look into the heads of the developers of CPython, Jython, PyPy, or the others.
Many errors will only be seen at run time, as Python is an interpreted language without static typing.
However, each file is compiled to bytecode if there is no syntax error.
So for syntax errors I cannot give you the reason, as reporting more than one should have been technically possible. However, I have never had issues with this, as I use tools like pylint or flake8 to check the code for me.
These tools can detect multiple errors and give a lot of warnings about coding style as well.
So I cannot tell you the why, but only what to do to get multiple errors in one go.
These tools can be configured to just display certain kinds of errors.
To install one or the other tool, just type:
pip install flake8
or
pip install pylint
Then type just flake8 or pylint in the directory where all your code is located, or type flake8 <filename> or pylint <filename> to check just one file.
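For example, given a file with several independent problems (the file name is made up here, and the exact messages and codes vary by tool and version), a single flake8 or pylint run reports all of them at once, whereas running the file would only surface the first failure reached:

# bad.py -- several independent problems in one file
import os                  # unused import: would be reported (e.g. F401)

def greet(name):
    print(nmae)            # typo for `name`: undefined name (e.g. F821)

Running flake8 bad.py (or pylint bad.py) should list both findings in one go.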
Please note that many IDEs, for example Microsoft Visual Studio Code and PyCharm, can be configured to run these tools automatically for you and flag any issues even before you execute your code.

C and C++ use a compiler: the code is compiled first and then executed. Python, however, is an interpreted language, which means the interpreter reads each statement and then executes it. When the interpreter sees an error in one line of the program, it stops and displays that error. Other interpreted languages, like JS, behave the same way.
I hope this solves your problem, but if you want to read more you can google "interpreted and compiled languages" or see this

Related

How to view the implementation of Python's built-in functions in PyCharm?

When I try to view the built-in function all() in PyCharm, all I see is "pass" in the function body. How can I view the actual implementation, so that I can know what exactly the built-in function is doing?
def all(*args, **kwargs): # real signature unknown
    """
    Return True if bool(x) is True for all values x in the iterable.
    If the iterable is empty, return True.
    """
    pass
Assuming you’re using the usual CPython interpreter, all is a builtin function object, which just has a pointer to a compiled function statically linked into the interpreter (or libpython). Showing you the x86_64 machine code at that address probably wouldn’t be very useful to the vast majority of people.
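You can confirm from within Python that there is no Python-level source behind all; for instance, inspect refuses to fetch source for built-ins:

import inspect

print(all)                    # <built-in function all>
print(all.__module__)         # builtins
try:
    inspect.getsource(all)    # there is no Python source to fetch
except TypeError as exc:
    print(exc)                # built-ins are compiled C functions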
Try running your code in PyPy instead of CPython. Many things that are builtins in CPython are plain old Python code in PyPy.1 Of course that isn’t always an option (e.g., PyPy doesn’t support 3.7 features yet, there are a few third-party extension modules that are still way too slow to use, it’s harder to build yourself if you’re on some uncommon platform…), so let’s go back to CPython.
The actual C source for that function isn't too hard to find online. It's in bltinmodule.c. But, unlike the source code of the Python modules in your standard library, you probably don't have these files around. Even if you do have them, the only way to connect the binary to the source is through the debugging information emitted when CPython was compiled from that source, which you probably didn't do yourself. But if you're thinking that sounds like a great idea—it is. Build CPython yourself (you may want a Py_DEBUG build), and then you can just run it in your C source debugger/IDE and let it handle all the clumsy bits.
But if that sounds more scary than helpful, even though you can read basic C code and would like to find it…
How did I know where to find that code on GitHub? Well, I know where the repo is; I know the basic organization of the source into Python, Objects, Modules, etc.; I know how module names usually map to C source file names; I know that builtins is special in a few ways…
That’s all pretty simple stuff. Couldn’t you just program all that knowledge into a script, which you could then build a PyCharm plugin out of?
You can do the first 50% or so in a quick evening hack, and such things litter the shores of GitHub. But actually doing it right requires handling a ton of special cases, parsing some ugly C code, etc. And for anyone capable of writing such a thing, it’s easier to just lldb Python than to write it.
1. Also, even the things that are builtins are written in a mix of Python and a Python subset called RPython, which you might find easier to understand than C—then again, it’s often even harder to find that source, and the multiple levels that all look like Python can be hard to keep straight.

Are there Python static analysis/validation tools?

I've never been a huge Python fan. I learned it for a course where the teacher was really into it, but his enthusiasm never quite made it to the rest of our class, it seems: as soon as we had the chance, we all jumped ship to C#/Java.
Anyway. It wasn't a conclusive experience, and what annoyed me most about the language was that to find out whether Python code will work, you actually have to execute it, and risk dying halfway through because of something stupid like a typo in a variable name (throwing a NameError). Stuff that compilers for compiled languages catch at the very first glance, but that Python won't bother to complain about until it's too late. (I know you can always die halfway through a test with compiled programs too, but at least it won't be from a typo.)
I'm not really giving it a second chance yet, but for the sake of the next students: are there Python static analysis or validation tools out there that would catch most of the errors (I understand you can't catch them all) that compilers catch at compile time?
"but that Python won't bother to complain about until it's too late"
It's not that the message comes too late. It's that you're waiting too long to use Python. Don't type a mountain of code and then complain that one small piece is bad.
Use Unit Testing (see the sketch after this list). Write less code before running a test.
Use Python interactively to experiment. You can do most statistical processing from the >>> prompt.
Don't write long, main-program-like scripts. Write short scripts -- in small pieces -- and test the small pieces.
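As a minimal sketch of the unit-testing advice above (the slope function here is just an invented stand-in for a "small piece"):

import unittest

def slope(p, q):
    """A small piece, written and tested before moving on."""
    return (q[1] - p[1]) / (q[0] - p[0])

class TestSlope(unittest.TestCase):
    def test_slope(self):
        self.assertEqual(slope((0, 0), (2, 2)), 1.0)

if __name__ == "__main__":
    unittest.main()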
Take a look at the following programs:
pylint
pyflakes
pychecker
In addition to the ones mentioned by ars.
Try PyDev; it has static code analysis built-in. Or PIDA, which has a couple of different static analysis tools available.
Or if you are looking for a standalone library, try Rope

How can I make sure all my Python code "compiles"?

My background is C and C++. I like Python a lot, but there's one aspect of it (and other interpreted languages I guess) that is really hard to work with when you're used to compiled languages.
When I've written something in Python and get to the point where I can run it, there's still no guarantee that no language-specific errors remain. For me that means I can't rely solely on my runtime defences (rigorous testing of input, asserts, etc.) to avoid crashes, because in six months, when some otherwise nice code finally gets run, it might crack due to some stupid typo.
Clearly a system should be tested enough to make sure all code has been run, but most of the time I use Python for in-house scripts and small tools, which of course never get the QA attention they need. Also, some code is so simple that (if your background is C/C++) you know it will work fine as long as it compiles (e.g. getter methods inside classes, usually a simple return of a member variable).
So, my question is the obvious - is there any way (with a special tool or something) I can make sure all the code in my Python script will "compile" and run?
Look at PyChecker and PyLint.
Here's example output from pylint, resulting from the trivial program in foo.py:
print a
************* Module foo
C: 1: Black listed name "foo"
C: 1: Missing docstring
E: 1: Undefined variable 'a'
...
|error |1 |1 |= |
As you can see, it detects the undefined variable, which py_compile won't (deliberately).
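For contrast, a quick sketch of the py_compile behaviour mentioned above: if foo.py contained print(a) (the Python 3 spelling), byte-compiling it still succeeds, because only syntax is checked at that stage:

import py_compile

# Byte-compiles without complaint: `a` being undefined is a runtime matter.
# Only a syntax error would make this raise (with doraise=True).
py_compile.compile("foo.py", doraise=True)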
Trivial example of why tests aren't good enough, even if they cover "every line":
bar = "Foo"
foo = "Bar"
def baz(X):
return bar if X else fo0
print baz(input("True or False: "))
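For instance, this single test executes every line of the snippet, yet never evaluates the misspelled fo0, so it passes despite the bug:

def test_baz():
    assert baz(True) == "Foo"   # the `else fo0` branch is never evaluated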
EDIT: PyChecker handles the ternary for me:
Processing ternary...
True or False: True
Foo
Warnings...
ternary.py:6: No global (fo0) found
ternary.py:8: Using input() is a security problem, consider using raw_input()
Others have mentioned tools like PyLint, which are pretty good, but the long and the short of it is that it's simply not possible to do 100%. In fact, you might not even want to. Part of the benefit of Python's dynamism is that you can do crazy things like insert names into the local scope through a dictionary access.
What it comes down to is that if you want a way to catch type errors at compile time, you shouldn't use Python. A language choice always involves a set of trade-offs. If you choose Python over C, just be aware that you're trading a strong type system for faster development, better string manipulation, etc.
I think what you are looking for is test line coverage. You want to add tests to your script that make sure all of your lines of code, or as many as you have time for, get tested. Testing is a great deal of work, but if you want the kind of assurance you are asking for, there is no free lunch, sorry :(.
If you are using Eclipse with PyDev as an IDE, it can flag many typos for you with red squigglies immediately, and has PyLint integration too. For example:
foo = 5
print food
will be flagged as "Undefined variable: food". Of course this is not always accurate (perhaps food was defined earlier using setattr or other exotic techniques), but it works well most of the time.
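For example, this sort of dynamic definition is perfectly legal Python, but a static analyser has little hope of seeing it:

# Legal at runtime, invisible to most static analysis:
globals()["food"] = "spam"   # defines `food` without a normal assignment
print(food)                  # runs fine, yet may still be flagged as undefined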
In general, you can only statically analyze your code to the extent that your code is actually static; the more dynamic your code is, the more you really do need automated testing.
Your code actually gets compiled when you run it, the Python runtime will complain if there is a syntax error in the code. Compared to statically compiled languages like C/C++ or Java, it does not check whether variable names and types are correct – for that you need to actually run the code (e.g. with automated tests).
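A quick sketch of that split between compile time and run time, using the compile() built-in:

source = "print(undefined_name)"
code = compile(source, "<example>", "exec")   # succeeds: the syntax is fine
try:
    exec(code)                                # fails: names are resolved now
except NameError as exc:
    print(exc)                                # name 'undefined_name' is not defined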

Partial evaluation for parsing

I'm working on a macro system for Python (as discussed here) and one of the things I've been considering are units of measure. Although units of measure could be implemented without macros or via static macros (e.g. defining all your units ahead of time), I'm toying around with the idea of allowing syntax to be extended dynamically at runtime.
To do this, I'm considering using a sort of partial evaluation on the code at compile-time. If parsing fails for a given expression, due to a macro for its syntax not being available, the compiler halts evaluation of the function/block and generates the code it already has with a stub where the unknown expression is. When this stub is hit at runtime, the function is recompiled against the current macro set. If this compilation fails, a parse error would be thrown because execution can't continue. If the compilation succeeds, the new function replaces the old one and execution continues.
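For concreteness, here is a minimal, hypothetical sketch of the stub-and-recompile mechanism, with "macros" reduced to plain source-to-source rewrites. All the names here (deferred, macros, METER) are invented for illustration; a real implementation would work on the parse level rather than on strings:

import re

macros = []   # source-to-source rewrites, registered at runtime

def deferred(source):
    """Stub: keep source that does not parse yet; compile it on first call."""
    def stub():
        text = source
        for rewrite in macros:
            text = rewrite(text)
        code = compile(text, "<deferred>", "exec")   # SyntaxError if still unknown
        exec(code, {"METER": 1.0})
    return stub

distance = deferred("print(5 m)")    # `5 m` is not valid Python: defer it

# Later, a "macro" for the unit syntax is registered...
macros.append(lambda s: re.sub(r"(\d+) m\b", r"\1 * METER", s))
distance()                           # ...so this now compiles and prints 5.0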
The biggest issue I see is that you can't find parse errors until the affected code is run. However, this wouldn't affect many cases, e.g. group operators like [], {}, (), and `` still need to be paired (requirement of my tokenizer/list parser), and top-level syntax like classes and functions wouldn't be affected since their "runtime" is really load time, where the syntax is evaluated and their objects are generated.
Aside from the implementation difficulty and the problem I described above, what problems are there with this idea?
Here are a few possible problems:
You may find it difficult to provide the user with helpful error messages in case of a problem. This seems likely, as any compilation-time syntax error could be just a syntax extension.
Performance hit.
I was trying to find some discussion of the pluses, minuses, and/or implementation of dynamic parsing in Perl 6, but I couldn't find anything appropriate. However, you may find this quote from Niklaus Wirth (designer of Pascal and other languages) interesting:
The phantasies of computer scientists in the 1960s knew no bounds. Spurned by the success of automatic syntax analysis and parser generation, some proposed the idea of the flexible, or at least extensible language. The notion was that a program would be preceded by syntactic rules which would then guide the general parser while parsing the subsequent program. A step further: The syntax rules would not only precede the program, but they could be interspersed anywhere throughout the text. For example, if someone wished to use a particularly fancy private form of for statement, he could do so elegantly, even specifying different variants for the same concept in different sections of the same program. The concept that languages serve to communicate between humans had been completely blended out, as apparently everyone could now define his own language on the fly. The high hopes, however, were soon damped by the difficulties encountered when trying to specify, what these private constructions should mean. As a consequence, the intriguing idea of extensible languages faded away rather quickly.
Edit: Here's Perl 6's Synopsis 6: Subroutines, unfortunately in markup form because I couldn't find an updated, formatted version; search within for "macro". Unfortunately, it's not too interesting, but you may find some things relevant, like Perl 6's one-pass parsing rule, or its syntax for abstract syntax trees. The approach Perl 6 takes is that a macro is a function that executes immediately after its arguments are parsed and returns either an AST or a string; Perl 6 continues parsing as if the source actually contained the return value. There is mention of generation of error messages, but they make it seem like if macros return ASTs, you can do alright.
Pushing this one step further, you could do "lazy" parsing and always only parse enough to evaluate the next statement. Like some kind of just-in-time parser. Then syntax errors could become normal runtime errors that just raise a normal Exception that could be handled by surrounding code:
def fun():
    not implemented yet
try:
    fun()
except:
    pass
That would be an interesting effect, but if it's useful or desirable is a different question. Generally it's good to know about errors even if you don't call the code at the moment.
Macros would not be evaluated until control reaches them, and naturally the parser would already know all previous definitions. Also, the macro definition could maybe even use variables and data that the program has calculated so far (like adding some syntax for all elements in a previously calculated list). But it is probably a bad idea to start writing self-modifying programs for things that could usually be done just as well directly in the language. This could get confusing...
In any case you should make sure to parse code only once, and if it is executed a second time, use the already-parsed expression, so that this doesn't lead to performance problems.
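A sketch of that parse-once caching, here simply keyed on the source text itself:

_code_cache = {}

def compile_once(source, filename="<macro>"):
    """Compile each distinct source string once; reuse the code object after."""
    try:
        return _code_cache[source]
    except KeyError:
        code = _code_cache[source] = compile(source, filename, "exec")
        return code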
Here are some ideas from my master's thesis, which may or may not be helpful.
The thesis was about robust parsing of natural language.
The main idea: given a context-free grammar for a language, try to parse a given text (or, in your case, a Python program). If parsing failed, you will have a partially generated parse tree. Use the tree structure to suggest new grammar rules that will better cover the parsed text.
I could send you my thesis, but unless you read Hebrew this will probably not be useful.
In a nutshell:
I used a bottom-up chart parser. This type of parser generates edges for productions from the grammar. Each edge is marked with the part of the tree that was consumed. Each edge gets a score according to how close it was to full coverage, for example:
S -> NP . VP
Has a score of one half (We succeeded in covering the NP but not the VP).
The highest-scored edges suggest a new rule (such as X->NP).
In general, a chart parser is less efficient than a common LALR or LL parser (the types usually used for programming languages) - O(n^3) instead of O(n) complexity, but then again you are trying something more complicated than just parsing an existing language.
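In code, the scoring idea might look like this toy function, where rhs is a rule's right-hand side and dot is the position the parse has reached:

def edge_score(rhs, dot):
    """Fraction of the rule body covered by the partial parse."""
    return dot / len(rhs)

print(edge_score(["NP", "VP"], 1))   # 0.5, as in  S -> NP . VP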
If you can do something with the idea, I can send you further details.
I believe looking at natural language parsers may give you some other ideas.
Another thing I've considered is making this the default behaviour across the board, but allowing languages (meaning a set of macros to parse a given language) to throw a parse error at compile time. Python 2.5 in my system, for example, would do this.
Instead of the stub idea, simply recompile functions that couldn't be handled completely at compile-time when they're executed. This will also make self-modifying code easier, as you can modify the code and recompile it at runtime.
You'll probably need to delimit the bits of input text with unknown syntax, so that the rest of the syntax tree can be resolved, apart from some character sequences nodes which will be expanded later. Depending on your top level syntax, that may be fine.
You may find that the parsing algorithm and the lexer and the interface between them all need updating, which might rule out most compiler creation tools.
(The more usual approach is to use string constants for this purpose, which can be passed to a little interpreter at run time.)
I don't think your approach would work very well. Let's take a simple example written in pseudo-code:
define some syntax M1 with definition D1
if _whatever_:
    define M1 to do D2
else:
    define M1 to do D3
code that uses M1
So there is one example where, if you allow syntax redefinition at runtime, you have a problem (since by your approach the code that uses M1 would be compiled with definition D1). Note that verifying whether syntax redefinition occurs is undecidable. An over-approximation could be computed by some kind of typing system or some other kind of static analysis, but Python is not well known for this :D.
Another thing that bothers me is that your solution does not 'feel' right. I find it evil to store source code you can't parse just because you may be able to parse it at runtime.
Another example that jumps to mind is this:
...function definition fun1 that calls fun2...
define M1 (at runtime)
use M1
...function definition for fun2
Technically, when you use M1, you cannot parse it, so you need to keep the rest of the program (including the function definition of fun2) in source code. When you run the entire program, you'll hit a call to fun2 that you cannot resolve, even though it's defined.

When to use the Python debugger

Since Python is a dynamic, interpreted language you don't have to compile your code before running it. Hence, it's very easy to simply write your code, run it, see what problems occur, and fix them. Using hotkeys or macros can make this incredibly quick.
So, because it's so easy to immediately see the output of your program and any errors that may occur, I haven't used a debugger tool yet. What situations may call for using a real debugger vs. the method I currently use?
I'd like to know before I get into a situation and get frustrated because I don't know how to fix the problem.
In 30 years of programming I've used a debugger exactly 4 times. All four times were to read the core file produced from a C program crashing to locate the traceback information that's buried in there.
I don't think debuggers help much, even in compiled languages. Many people like debuggers, there are some reasons for using them, I'm sure, or people wouldn't lavish such love and care on them.
Here's the point -- software is knowledge capture.
Yes, it does have to run. More importantly, however, software has meaning.
This is not an indictment of your use of a debugger. However, I find that the folks who rely on debugging will sometimes produce really odd-looking code and won't have a good justification for what it means. They can only say "it may be a hack, but it works."
My suggestion on debuggers is "don't bother".
"But, what if I'm totally stumped?" you ask, "should I learn the debugger then?" Totally stumped by what? The language? Python's too simple for utter befuddlement. Some library? Perhaps.
Here's what you do -- with or without a debugger.
You have the source, read it.
You write small tests to exercise the library. Using the interactive shell, if possible. [All the really good libraries seem to show their features using the interactive Python mode -- I strive for this level of tight, clear simplicity.]
You have the source, add print functions.
I use pdb for basic python debugging. Some of the situations I use it are:
When you have a loop iterating over 100,000 entries and want to break at a specific point, conditional breakpoints become really helpful (see the sketch below).
Trace the control flow of someone else's code.
It's always better to use a debugger than to litter the code with prints.
Normally there is more than one potential point of failure behind a bug, and not all of them are obvious at first glance. So you look in the obvious places, and if nothing is wrong there, you move on and add some more prints... A debugger can save you time here: you don't need to keep adding prints and rerunning.
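A small, self-contained illustration of the conditional-break idea (entries and process are stand-ins for real code):

import pdb

entries = range(200000)        # stand-in for real data
def process(entry):            # stand-in for real work
    return entry * 2

for i, entry in enumerate(entries):
    if i == 99999:             # stop only at the iteration you care about
        pdb.set_trace()        # drops into the debugger right here
    process(entry)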
Usually when the error is buried in some function, but I don't know exactly what or where. Either I insert dozens of log.debug() calls and then have to take them back out, or just put in:
import pdb
pdb.set_trace()
and then run the program. The debugger will launch when it reaches that point and give me a full REPL to poke around in.
Any time you want to inspect the contents of variables that may have caused the error. The only way you can do that is to stop execution and take a look at the stack.
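pdb's post-mortem mode does exactly that: run it after an exception and you get a prompt in the frame where things went wrong (buggy here is a made-up failing function):

import pdb, sys

def buggy():
    return {}["missing"]       # raises KeyError

try:
    buggy()
except Exception:
    pdb.post_mortem(sys.exc_info()[2])   # inspect the stack at the failure point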
PyDev in Eclipse is a pretty good IDE, if you are looking for one.
I find it very useful to drop into a debugger in a failing test case.
I add import pdb; pdb.set_trace() just before the failure point of the test. The test runs, building up a potentially quite large context (e.g. importing a database fixture or constructing an HTTP request). When the test reaches the pdb.set_trace() line, it drops into the interactive debugger and I can inspect the context in which the failure occurs with the usual pdb commands looking for clues as to the cause.
You might want to take a look at this other SO post:
Why is debugging better in an IDE?
It's directly relevant to what you're asking about.
