Is Python interpreted or compiled? [duplicate] - python

From my understanding:
An interpreted language is a high-level language run and executed by an interpreter (a program which converts the high-level language to machine code and then executing) on the go; it processes the program a little at a time.
A compiled language is a high-level language whose code is first converted to machine-code by a compiler (a program which converts the high-level language to machine code) and then executed by an executor (another program for running the code).
Correct me if my definitions are wrong.
Now coming back to Python, I am bit confused about this. Everywhere you learn that Python is an interpreted language, but it's interpreted to some intermediate code (like byte-code or IL) and not to the machine code. So which program then executes the IM code? Please help me understand how a Python script is handled and run.

First off, interpreted/compiled is not a property of the language but a property of the implementation. For most languages, most if not all implementations fall in one category, so one might save a few words saying the language is interpreted/compiled too, but it's still an important distinction, both because it aids understanding and because there are quite a few languages with usable implementations of both kinds (mostly in the realm of functional languages, see Haskell and ML). In addition, there are C interpreters and projects that attempt to compile a subset of Python to C or C++ code (and subsequently to machine code).
Second, compilation is not restricted to ahead-of-time compilation to native machine code. A compiler is, more generally, a program that converts a program in one programming language into a program in another programming language (arguably, you can even have a compiler with the same input and output language if significant transformations are applied). And JIT compilers compile to native machine code at runtime, which can give speed very close to or even better than ahead of time compilation (depending on the benchmark and the quality of the implementations compared).
But to stop nitpicking and answer the question you meant to ask: Practically (read: using a somewhat popular and mature implementation), Python is compiled. Not compiled to machine code ahead of time (i.e. "compiled" by the restricted and wrong, but alas common definition), "only" compiled to bytecode, but it's still compilation with at least some of the benefits. For example, the statement a = b.c() is compiled to a byte stream which, when "disassembled", looks somewhat like load 0 (b); load_str 'c'; get_attr; call_function 0; store 1 (a). This is a simplification, it's actually less readable and a bit more low-level - you can experiment with the standard library dis module and see what the real deal looks like. Interpreting this is faster than interpreting from a higher-level representation.
That bytecode is either interpreted (note that there's a difference, both in theory and in practical performance, between interpreting directly and first compiling to some intermediate representation and interpret that), as with the reference implementation (CPython), or both interpreted and compiled to optimized machine code at runtime, as with PyPy.

The CPU can only understand machine code indeed. For interpreted programs, the ultimate goal of an interpreter is to "interpret" the program code into machine code. However, usually a modern interpreted language does not interpret human code directly because it is too inefficient.
The Python interpreter first reads the human code and optimizes it to some intermediate code before interpreting it into machine code. That's why you always need another program to run a Python script, unlike in C++ where you can run the compiled executable of your code directly. For example, c:\Python27\python.exe or /usr/bin/python.

The answer depends on what implementation of python is being used. If you are using lets say CPython (The Standard implementation of python) or Jython (Targeted for integration with java programming language)it is first translated into bytecode, and depending on the implementation of python you are using, this bycode is directed to the corresponding virtual machine for interpretation. PVM (Python Virtual Machine) for CPython and JVM (Java Virtual Machine) for Jython.
But lets say you are using PyPy which is another standard CPython implementation. It would use a Just-In-Time Compiler.

According to the official Python site, it's interpreted.
https://www.python.org/doc/essays/blurb/
Python is an interpreted, object-oriented, high-level programming language...
...
Since there is no compilation step ...
...
The Python interpreter and the extensive standard library are available...
...
Instead, when the interpreter discovers an error, it raises an
exception. When the program doesn't catch the exception, the
interpreter prints a stack trace.

Yes, it is both compiled and interpreted language. Then why we generally call it as interpreted language?
see how it is both- compiled and interpreted?
First of all I want to tell that you will like my answer more if you are from the Java world.
In the Java the source code first gets converted to the byte code through javac compiler then directed to the JVM(responsible for generating the native code for execution purpose). Now I want to show you that we call the Java as compiled language because we can see that it really compiles the source code and gives the .class file(nothing but bytecode) through:
javac Hello.java -------> produces Hello.class file
java Hello -------->Directing bytecode to JVM for execution purpose
The same thing happens with python i.e. first the source code gets converted to the bytecode through the compiler then directed to the PVM(responsible for generating the native code for execution purpose). Now I want to show you that we usually call the Python as an interpreted language because the compilation happens behind the scene
and when we run the python code through:
python Hello.py -------> directly excutes the code and we can see the output provied that code is syntactically correct
# python Hello.py it looks like it directly executes but really it first generates the bytecode that is interpreted by the interpreter to produce the native code for the execution purpose.
CPython- Takes the responsibility of both compilation and interpretation.
Look into the below lines if you need more detail:
As I mentioned that CPython compiles the source code but actual compilation happens with the help of cython then interpretation happens with the help of CPython
Now let's talk a little bit about the role of Just-In-Time compiler in Java and Python
In JVM the Java Interpreter exists which interprets the bytecode line by line to get the native machine code for execution purpose but when Java bytecode is executed by an interpreter, the execution will always be slower. So what is the solution? the solution is Just-In-Time compiler which produces the native code which can be executed much more quickly than that could be interpreted. Some JVM vendors use Java Interpreter and some use Just-In-Time compiler. Reference: click here
In python to get around the interpreter to achieve the fast execution use another python implementation(PyPy) instead of CPython.
click here for other implementation of python including PyPy.

Its a big confusion for people who just started working in python and the answers here are a little difficult to comprehend so i'll make it easier.
When we instruct Python to run our script, there are a few steps that Python carries out before our code actually starts crunching away:
It is compiled to bytecode.
Then it is routed to virtual machine.
When we execute some source code, Python compiles it into byte code. Compilation is a translation step, and the byte code is a low-level platform-independent representation of source code.
Note that the Python byte code is not binary machine code (e.g., instructions for an Intel chip).
Actually, Python translate each statement of the source code into byte code instructions by decomposing them into individual steps. The byte code translation is performed to speed execution.
Byte code can be run much more quickly than the original source code statements. It has.pyc extension and it will be written if it can write to our machine.
So, next time we run the same program, Python will load the .pyc file and skip the compilation step unless it's been changed. Python automatically checks the timestamps of source and byte code files to know when it must recompile. If we resave the source code, byte code is automatically created again the next time the program is run.
If Python cannot write the byte code files to our machine, our program still works. The byte code is generated in memory and simply discarded on program exit. But because .pyc files speed startup time, we may want to make sure it has been written for larger programs.
Let's summarize what happens behind the scenes.
When Python executes a program, Python reads the .py into memory, and parses it in order to get a bytecode, then goes on to execute. For each module that is imported by the program, Python first checks to see whether there is a precompiled bytecode version, in a .pyo or .pyc, that has a timestamp which corresponds to its .py file. Python uses the bytecode version if any. Otherwise, it parses the module's .py file, saves it into a .pyc file, and uses the bytecode it just created.
Byte code files are also one way of shipping Python codes. Python will still run a program if all it can find are.pyc files, even if the original .py source files are not there.
Python Virtual Machine (PVM)
Once our program has been compiled into byte code, it is shipped off for execution to Python Virtual Machine (PVM). The PVM is not a separate program. It need not be installed by itself. Actually, the PVM is just a big loop that iterates through our byte code instruction, one by one, to carry out their operations. The PVM is the runtime engine of Python. It's always present as part of the Python system. It's the component that truly runs our scripts. Technically it's just the last step of what is called the Python interpreter.

If ( You know Java ) {
Python code is converted into bytecode like java does.
That bytecode is executed again everytime you try to access it.
} else {
Python code is initially traslated into something called bytecode that is quite
close to machine language but not actual machine code
so each time we access or run it that bytecode is executed again
}

It really depends on the implementation of the language being used! There is a common step in any implementation, though: your code is first compiled (translated) to intermediate code - something between your code and machine (binary) code - called bytecode (stored into .pyc files). Note that this is a one-time step that will not be repeated unless you modify your code.
And that bytecode is executed every time you are running the program. How? Well, when we run the program, this bytecode (inside a .pyc file) is passed as input to a Virtual Machine (VM)1 - the runtime engine allowing our programs to be executed - that executes it.
Depending on the language implementation, the VM will either interpret the bytecode (in the case of CPython2 implementation) or JIT-compile3 it (in the case of PyPy4 implementation).
Notes:
1 an emulation of a computer system
2 a bytecode interpreter; the reference implementation of the language, written in C and Python - most widely used
3 compilation that is being done during the execution of a program (at runtime)
4 a bytecode JIT compiler; an alternative implementation to CPython, written in RPython (Restricted Python) - often runs faster than CPython

Almost, we can say Python is interpreted language. But we are using some part of one time compilation process in python to convert complete source code into byte-code like java language.

For newbies
Python automatically compiles your script to compiled code, so called byte code, before running it.
Running a script is not considered an import and no .pyc will be created.
For example, if you have a script file abc.py that imports another module xyz.py, when you run abc.py, xyz.pyc will be created since xyz is imported, but no abc.pyc file will be created since abc.py isn’t being imported.

Python(the interpreter) is compiled.
Proof: It won't even compile your code if it contains syntax error.
Example 1:
print("This should print")
a = 9/0
Output:
This should print
Traceback (most recent call last):
File "p.py", line 2, in <module>
a = 9/0
ZeroDivisionError: integer division or modulo by zero
Code gets compiled successfully. First line gets executed (print) second line throws ZeroDivisionError (run time error) .
Example 2:
print("This should not print")
/0
Output:
File "p.py", line 2
/0
^
SyntaxError: invalid syntax
Conclusion: If your code file contains SyntaxError nothing will execute as compilation fails.

As sone one already said, "interpreted/compiled is not a property of the language but a property of the implementation." Python can be used in interpretation mode as well as compilation mode. When you run python code directly from terminal or cmd then the python interpreter starts. Now if you write any command then this command will be directly interpreted. If you use a file containing Python code and running it in IDE or using a command prompt it will be compiled first the whole code will be converted to byte code and then it will run. So it depends on how we use it.

The python code you write is compiled into python bytecode, which creates file with extension .pyc. If compiles, again question is, why not compiled language.
Note that this isn't compilation in the traditional sense of the word. Typically, we’d say that compilation is taking a high-level language and converting it to machine code. But it is a compilation of sorts. Compiled in to intermediate code not into machine code (Hope you got it Now).
Back to the execution process, your bytecode, present in pyc file, created in compilation step, is then executed by appropriate virtual machines, in our case, the CPython VM
The time-stamp (called as magic number) is used to validate whether .py file is changed or not, depending on that new pyc file is created. If pyc is of current code then it simply skips compilation step.

Seems to be a case of semantics. I think most of us infer that the usual result of compiling is machine-code. With that in mind I say to myself that python is not compiled. I would be wrong though because compile really means convert to a lower level so converting from source to byte-code is also compiling.

In my opinion Python is put into an interpreter categoty because its designed to be capable of fully processing ( from python code to execution in cpu) individual python statement. I.e. you write one statement and you can execute it and if no errors then get the corresponding result.
Having an intermediate code (like bytecode) i believe doesnt make difference to categorize it overall as compiler. Though this component (intermediate code generation) is typically been part of compiler but it can also be used in interpreters. See wiki definition of interpreter https://en.m.wikipedia.org/wiki/Interpreter_(computing). Its a crucial piece to add efficiency in terms of execution speed. With cache its even more powerful so that if you havent changed code in current program scope you skip heavy processing steps like lexical, semantic analysis and even some of code optimization.

Related

Are syntax errors in Python found at 'compile time' or 'runtime'? [duplicate]

From my understanding:
An interpreted language is a high-level language run and executed by an interpreter (a program which converts the high-level language to machine code and then executing) on the go; it processes the program a little at a time.
A compiled language is a high-level language whose code is first converted to machine-code by a compiler (a program which converts the high-level language to machine code) and then executed by an executor (another program for running the code).
Correct me if my definitions are wrong.
Now coming back to Python, I am bit confused about this. Everywhere you learn that Python is an interpreted language, but it's interpreted to some intermediate code (like byte-code or IL) and not to the machine code. So which program then executes the IM code? Please help me understand how a Python script is handled and run.
First off, interpreted/compiled is not a property of the language but a property of the implementation. For most languages, most if not all implementations fall in one category, so one might save a few words saying the language is interpreted/compiled too, but it's still an important distinction, both because it aids understanding and because there are quite a few languages with usable implementations of both kinds (mostly in the realm of functional languages, see Haskell and ML). In addition, there are C interpreters and projects that attempt to compile a subset of Python to C or C++ code (and subsequently to machine code).
Second, compilation is not restricted to ahead-of-time compilation to native machine code. A compiler is, more generally, a program that converts a program in one programming language into a program in another programming language (arguably, you can even have a compiler with the same input and output language if significant transformations are applied). And JIT compilers compile to native machine code at runtime, which can give speed very close to or even better than ahead of time compilation (depending on the benchmark and the quality of the implementations compared).
But to stop nitpicking and answer the question you meant to ask: Practically (read: using a somewhat popular and mature implementation), Python is compiled. Not compiled to machine code ahead of time (i.e. "compiled" by the restricted and wrong, but alas common definition), "only" compiled to bytecode, but it's still compilation with at least some of the benefits. For example, the statement a = b.c() is compiled to a byte stream which, when "disassembled", looks somewhat like load 0 (b); load_str 'c'; get_attr; call_function 0; store 1 (a). This is a simplification, it's actually less readable and a bit more low-level - you can experiment with the standard library dis module and see what the real deal looks like. Interpreting this is faster than interpreting from a higher-level representation.
That bytecode is either interpreted (note that there's a difference, both in theory and in practical performance, between interpreting directly and first compiling to some intermediate representation and interpret that), as with the reference implementation (CPython), or both interpreted and compiled to optimized machine code at runtime, as with PyPy.
The CPU can only understand machine code indeed. For interpreted programs, the ultimate goal of an interpreter is to "interpret" the program code into machine code. However, usually a modern interpreted language does not interpret human code directly because it is too inefficient.
The Python interpreter first reads the human code and optimizes it to some intermediate code before interpreting it into machine code. That's why you always need another program to run a Python script, unlike in C++ where you can run the compiled executable of your code directly. For example, c:\Python27\python.exe or /usr/bin/python.
The answer depends on what implementation of python is being used. If you are using lets say CPython (The Standard implementation of python) or Jython (Targeted for integration with java programming language)it is first translated into bytecode, and depending on the implementation of python you are using, this bycode is directed to the corresponding virtual machine for interpretation. PVM (Python Virtual Machine) for CPython and JVM (Java Virtual Machine) for Jython.
But lets say you are using PyPy which is another standard CPython implementation. It would use a Just-In-Time Compiler.
Yes, it is both compiled and interpreted language. Then why we generally call it as interpreted language?
see how it is both- compiled and interpreted?
First of all I want to tell that you will like my answer more if you are from the Java world.
In the Java the source code first gets converted to the byte code through javac compiler then directed to the JVM(responsible for generating the native code for execution purpose). Now I want to show you that we call the Java as compiled language because we can see that it really compiles the source code and gives the .class file(nothing but bytecode) through:
javac Hello.java -------> produces Hello.class file
java Hello -------->Directing bytecode to JVM for execution purpose
The same thing happens with python i.e. first the source code gets converted to the bytecode through the compiler then directed to the PVM(responsible for generating the native code for execution purpose). Now I want to show you that we usually call the Python as an interpreted language because the compilation happens behind the scene
and when we run the python code through:
python Hello.py -------> directly excutes the code and we can see the output provied that code is syntactically correct
# python Hello.py it looks like it directly executes but really it first generates the bytecode that is interpreted by the interpreter to produce the native code for the execution purpose.
CPython- Takes the responsibility of both compilation and interpretation.
Look into the below lines if you need more detail:
As I mentioned that CPython compiles the source code but actual compilation happens with the help of cython then interpretation happens with the help of CPython
Now let's talk a little bit about the role of Just-In-Time compiler in Java and Python
In JVM the Java Interpreter exists which interprets the bytecode line by line to get the native machine code for execution purpose but when Java bytecode is executed by an interpreter, the execution will always be slower. So what is the solution? the solution is Just-In-Time compiler which produces the native code which can be executed much more quickly than that could be interpreted. Some JVM vendors use Java Interpreter and some use Just-In-Time compiler. Reference: click here
In python to get around the interpreter to achieve the fast execution use another python implementation(PyPy) instead of CPython.
click here for other implementation of python including PyPy.
According to the official Python site, it's interpreted.
https://www.python.org/doc/essays/blurb/
Python is an interpreted, object-oriented, high-level programming language...
...
Since there is no compilation step ...
...
The Python interpreter and the extensive standard library are available...
...
Instead, when the interpreter discovers an error, it raises an
exception. When the program doesn't catch the exception, the
interpreter prints a stack trace.
Its a big confusion for people who just started working in python and the answers here are a little difficult to comprehend so i'll make it easier.
When we instruct Python to run our script, there are a few steps that Python carries out before our code actually starts crunching away:
It is compiled to bytecode.
Then it is routed to virtual machine.
When we execute some source code, Python compiles it into byte code. Compilation is a translation step, and the byte code is a low-level platform-independent representation of source code.
Note that the Python byte code is not binary machine code (e.g., instructions for an Intel chip).
Actually, Python translate each statement of the source code into byte code instructions by decomposing them into individual steps. The byte code translation is performed to speed execution.
Byte code can be run much more quickly than the original source code statements. It has.pyc extension and it will be written if it can write to our machine.
So, next time we run the same program, Python will load the .pyc file and skip the compilation step unless it's been changed. Python automatically checks the timestamps of source and byte code files to know when it must recompile. If we resave the source code, byte code is automatically created again the next time the program is run.
If Python cannot write the byte code files to our machine, our program still works. The byte code is generated in memory and simply discarded on program exit. But because .pyc files speed startup time, we may want to make sure it has been written for larger programs.
Let's summarize what happens behind the scenes.
When Python executes a program, Python reads the .py into memory, and parses it in order to get a bytecode, then goes on to execute. For each module that is imported by the program, Python first checks to see whether there is a precompiled bytecode version, in a .pyo or .pyc, that has a timestamp which corresponds to its .py file. Python uses the bytecode version if any. Otherwise, it parses the module's .py file, saves it into a .pyc file, and uses the bytecode it just created.
Byte code files are also one way of shipping Python codes. Python will still run a program if all it can find are.pyc files, even if the original .py source files are not there.
Python Virtual Machine (PVM)
Once our program has been compiled into byte code, it is shipped off for execution to Python Virtual Machine (PVM). The PVM is not a separate program. It need not be installed by itself. Actually, the PVM is just a big loop that iterates through our byte code instruction, one by one, to carry out their operations. The PVM is the runtime engine of Python. It's always present as part of the Python system. It's the component that truly runs our scripts. Technically it's just the last step of what is called the Python interpreter.
If ( You know Java ) {
Python code is converted into bytecode like java does.
That bytecode is executed again everytime you try to access it.
} else {
Python code is initially traslated into something called bytecode that is quite
close to machine language but not actual machine code
so each time we access or run it that bytecode is executed again
}
It really depends on the implementation of the language being used! There is a common step in any implementation, though: your code is first compiled (translated) to intermediate code - something between your code and machine (binary) code - called bytecode (stored into .pyc files). Note that this is a one-time step that will not be repeated unless you modify your code.
And that bytecode is executed every time you are running the program. How? Well, when we run the program, this bytecode (inside a .pyc file) is passed as input to a Virtual Machine (VM)1 - the runtime engine allowing our programs to be executed - that executes it.
Depending on the language implementation, the VM will either interpret the bytecode (in the case of CPython2 implementation) or JIT-compile3 it (in the case of PyPy4 implementation).
Notes:
1 an emulation of a computer system
2 a bytecode interpreter; the reference implementation of the language, written in C and Python - most widely used
3 compilation that is being done during the execution of a program (at runtime)
4 a bytecode JIT compiler; an alternative implementation to CPython, written in RPython (Restricted Python) - often runs faster than CPython
Almost, we can say Python is interpreted language. But we are using some part of one time compilation process in python to convert complete source code into byte-code like java language.
For newbies
Python automatically compiles your script to compiled code, so called byte code, before running it.
Running a script is not considered an import and no .pyc will be created.
For example, if you have a script file abc.py that imports another module xyz.py, when you run abc.py, xyz.pyc will be created since xyz is imported, but no abc.pyc file will be created since abc.py isn’t being imported.
Python(the interpreter) is compiled.
Proof: It won't even compile your code if it contains syntax error.
Example 1:
print("This should print")
a = 9/0
Output:
This should print
Traceback (most recent call last):
File "p.py", line 2, in <module>
a = 9/0
ZeroDivisionError: integer division or modulo by zero
Code gets compiled successfully. First line gets executed (print) second line throws ZeroDivisionError (run time error) .
Example 2:
print("This should not print")
/0
Output:
File "p.py", line 2
/0
^
SyntaxError: invalid syntax
Conclusion: If your code file contains SyntaxError nothing will execute as compilation fails.
As sone one already said, "interpreted/compiled is not a property of the language but a property of the implementation." Python can be used in interpretation mode as well as compilation mode. When you run python code directly from terminal or cmd then the python interpreter starts. Now if you write any command then this command will be directly interpreted. If you use a file containing Python code and running it in IDE or using a command prompt it will be compiled first the whole code will be converted to byte code and then it will run. So it depends on how we use it.
The python code you write is compiled into python bytecode, which creates file with extension .pyc. If compiles, again question is, why not compiled language.
Note that this isn't compilation in the traditional sense of the word. Typically, we’d say that compilation is taking a high-level language and converting it to machine code. But it is a compilation of sorts. Compiled in to intermediate code not into machine code (Hope you got it Now).
Back to the execution process, your bytecode, present in pyc file, created in compilation step, is then executed by appropriate virtual machines, in our case, the CPython VM
The time-stamp (called as magic number) is used to validate whether .py file is changed or not, depending on that new pyc file is created. If pyc is of current code then it simply skips compilation step.
Seems to be a case of semantics. I think most of us infer that the usual result of compiling is machine-code. With that in mind I say to myself that python is not compiled. I would be wrong though because compile really means convert to a lower level so converting from source to byte-code is also compiling.
In my opinion Python is put into an interpreter categoty because its designed to be capable of fully processing ( from python code to execution in cpu) individual python statement. I.e. you write one statement and you can execute it and if no errors then get the corresponding result.
Having an intermediate code (like bytecode) i believe doesnt make difference to categorize it overall as compiler. Though this component (intermediate code generation) is typically been part of compiler but it can also be used in interpreters. See wiki definition of interpreter https://en.m.wikipedia.org/wiki/Interpreter_(computing). Its a crucial piece to add efficiency in terms of execution speed. With cache its even more powerful so that if you havent changed code in current program scope you skip heavy processing steps like lexical, semantic analysis and even some of code optimization.

What happens when you run a python program?

I'm just trying to get an understanding of compilers and interpreters with python. I don't fully get it yet so I might use some terms incorrectly.
My understanding right now:
CPython is both the compiler (to bytecode) for python as well as a vm where that bytecode is interpreted and run as machine code
so when you run a .py file CPython compiles your code into bytecode.
that bytecode is then converted to machine code in the python vm (which is also cpython?)

When using the Python Interpreter, is the compiler used at all?

In Google's Python Class it reads
Python is a dynamic, interpreted (bytecode-compiled) language
I know what an interpreter is and know what bytecode is but the two together seem not to fit. After doing some reading it became a bit clearer that basically Python source code is automatically compiled before it is interpreted; but some new questions emerged.
When using the Python Interpreter does no compilation happen? If it does, when? For example if you're just typing code at the command line and it gets run each time you hit enter, when does a compiler have the opportunity to do its work?
Also in the linked to question above, #delnan gives a pretty broad definition of a compiler
A compiler is, more generally, a program that converts a program in
one programming language into a program in another programming
language...JIT compilers compile to native machine code at runtime
I guess my question is: what's the difference between an interpreter and automatic compiler? To refine the question a bit, if Python is compiled, why not compile all the way to machine code (or assembly, since I know writing compilers that can produce pure machine code is difficult)?
Perhaps it is best to forget semantics and just try to learn what Cpython is actually doing. When you invoke the Cpython binary, it does a number of things. Generally speaking, you can expect it to translate the code you've written into a sequence of bytecode instructions. This is the "compiling" stage that people will sometimes reference for python code. These are a more compact and efficient way to tell the interpreter what to do than your hand-written code. Frequently, python will cache these files for reuse in .pyc files (only re-generating if the associated .py file is newer). You can think of python bytecode as the set of instructions that the python virtual machine can run -- In a lot of ways, it's not really all that different than what you get for Java. When people speak of compiled languages (e.g. C), the compiler's job is to translate your code into a set of instructions that will run directly on your computer's hardware. Languages like Cpython and Java have an extra level of indirection (e.g. the Virtual Machine). The Virtual Machine runs directly on the computer's hardware and is responsible for interpreting the domain specific language.
Compared to standard "compiled" languages (e.g. C, Fortran), this stage is really light-weight -- and python doesn't do a lot of the checking that "traditional" compilers will do (e.g. typechecking). It pretty much only checks the syntax and does a few very simple optimizations using the peephole optimizer.

Python vs Cpython

What's all this fuss about Python and CPython (Jython,IronPython), I don't get it:
python.org mentions that CPython is:
The "traditional" implementation of Python (nicknamed CPython)
yet another Stack Overflow question mentions that:
CPython is the default byte-code interpreter of Python, which is written in C.
Honestly I don't get what both of those explanations practically mean but what I thought was that, if I use CPython does that mean when I run a sample python code, it compiles it to C language and then executes it as if it were C code
So what exactly is CPython and how does it differ when compared with python and should I probably use CPython over Python and if so what are its advantages?
So what is CPython?
CPython is the original Python implementation. It is the implementation you download from Python.org. People call it CPython to distinguish it from other, later, Python implementations, and to distinguish the implementation of the language engine from the Python programming language itself.
The latter part is where your confusion comes from; you need to keep Python-the-language separate from whatever runs the Python code.
CPython happens to be implemented in C. That is just an implementation detail, really. CPython compiles your Python code into bytecode (transparently) and interprets that bytecode in a evaluation loop.
CPython is also the first to implement new features; Python-the-language development uses CPython as the base; other implementations follow.
What about Jython, etc.?
Jython, IronPython and PyPy are the current "other" implementations of the Python programming language; these are implemented in Java, C# and RPython (a subset of Python), respectively. Jython compiles your Python code to Java bytecode, so your Python code can run on the JVM. IronPython lets you run Python on the Microsoft CLR. And PyPy, being implemented in (a subset of) Python, lets you run Python code faster than CPython, which rightly should blow your mind. :-)
Actually compiling to C
So CPython does not translate your Python code to C by itself. Instead, it runs an interpreter loop. There is a project that does translate Python-ish code to C, and that is called Cython. Cython adds a few extensions to the Python language, and lets you compile your code to C extensions, code that plugs into the CPython interpreter.
You need to distinguish between a language and an implementation. Python is a language,
According to Wikipedia, "A programming language is a notation for writing programs, which are specifications of a computation or algorithm". This means that it's simply the rules and syntax for writing code. Separately we have a programming language implementation which in most cases, is the actual interpreter or compiler.
Python is a language.
CPython is the implementation of Python in C. Jython is the implementation in Java, and so on.
To sum up: You are already using CPython (if you downloaded from here).
Even I had the same problem understanding how are CPython, JPython, IronPython, PyPy are different from each other.
So, I am willing to clear three things before I begin to explain:
Python: It is a language, it only states/describes how to convey/express yourself to the interpreter (the program which accepts your python code).
Implementation: It is all about how the interpreter was written, specifically, in what language and what it ends up doing.
Bytecode: It is the code that is processed by a program, usually referred to as a virtual machine, rather than by the "real" computer machine, the hardware processor.
CPython is the implementation, which was
written in C language. It ends up producing bytecode (stack-machine
based instruction set) which is Python specific and then executes it.
The reason to convert Python code to a bytecode is because it's easier to
implement an interpreter if it looks like machine instructions. But,
it isn't necessary to produce some bytecode prior to execution of the
Python code (but CPython does produce).
If you want to look at CPython's bytecode then you can. Here's how you can:
>>> def f(x, y): # line 1
... print("Hello") # line 2
... if x: # line 3
... y += x # line 4
... print(x, y) # line 5
... return x+y # line 6
... # line 7
>>> import dis # line 8
>>> dis.dis(f) # line 9
2 0 LOAD_GLOBAL 0 (print)
2 LOAD_CONST 1 ('Hello')
4 CALL_FUNCTION 1
6 POP_TOP
3 8 LOAD_FAST 0 (x)
10 POP_JUMP_IF_FALSE 20
4 12 LOAD_FAST 1 (y)
14 LOAD_FAST 0 (x)
16 INPLACE_ADD
18 STORE_FAST 1 (y)
5 >> 20 LOAD_GLOBAL 0 (print)
22 LOAD_FAST 0 (x)
24 LOAD_FAST 1 (y)
26 CALL_FUNCTION 2
28 POP_TOP
6 30 LOAD_FAST 0 (x)
32 LOAD_FAST 1 (y)
34 BINARY_ADD
36 RETURN_VALUE
Now, let's have a look at the above code. Lines 1 to 6 are a function definition. In line 8, we import the 'dis' module which can be used to view the intermediate Python bytecode (or you can say, disassembler for Python bytecode) that is generated by CPython (interpreter).
NOTE: I got the link to this code from #python IRC channel: https://gist.github.com/nedbat/e89fa710db0edfb9057dc8d18d979f9c
And then, there is Jython, which is written in Java and ends up producing Java byte code. The Java byte code runs on Java Runtime Environment, which is an implementation of Java Virtual Machine (JVM). If this is confusing then I suspect that you have no clue how Java works. In layman terms, Java (the language, not the compiler) code is taken by the Java compiler and outputs a file (which is Java byte code) that can be run only using a JRE. This is done so that, once the Java code is compiled then it can be ported to other machines in Java byte code format, which can be only run by JRE. If this is still confusing then you may want to have a look at this web page.
Here, you may ask if the CPython's bytecode is portable like Jython, I suspect not. The bytecode produced in CPython implementation was specific to that interpreter for making it easy for further execution of code (I also suspect that, such intermediate bytecode production, just for the ease the of processing is done in many other interpreters).
So, in Jython, when you compile your Python code, you end up with Java byte code, which can be run on a JVM.
Similarly, IronPython (written in C# language) compiles down your Python code to Common Language Runtime (CLR), which is a similar technology as compared to JVM, developed by Microsoft.
This article thoroughly explains the difference between different implementations of Python. Like the article puts it:
The first thing to realize is that ‘Python’ is an interface. There’s a
specification of what Python should do and how it should behave (as
with any interface). And there are multiple implementations (as with
any interface).
The second thing to realize is that ‘interpreted’ and ‘compiled’ are
properties of an implementation, not an interface.
Python is a language: a set of rules that can be used to write programs. There are several implementaions of this language.
No matter what implementation you take, they do pretty much the same thing: take the text of your program and interpret it, executing its instructions. None of them compile your code into C or any other language.
CPython is the original implementation, written in C. (The "C" part in "CPython" refers to the language that was used to write Python interpreter itself.)
Jython is the same language (Python), but implemented using Java.
IronPython interpreter was written in C#.
There's also PyPy - a Python interpreter written in Python. Make your pick :)
As python is open source that's why we can customize python as per our requirements. After customization, we can name that version as we want. That's why multiple flavours of python are available. Each flavour is a customized version of python to fulfill a special requirement. It is similar to the fact that there are multiple flavours of UNIX like, Ubuntu, Linux, RedHat Linux etc. Below are some of the flavours of python :
CPython
Default implementation of python programming language which we download from python.org, provided by python software foundation. It is written in C and python. It does not allow us to write any C code, only python code are allowed. CPython can be called as both an interpreter and a compiler as here our python code first gets compiled to python bytecode then the bytecode gets interpreted into platform-specific operations by PVM or Python Virtual Machine. Keep in mind interpreters have language syntaxes predefined that's why it does not need to translate into low level machine code. Here Interpreter just executes bytecode on the fly during runtime and results in platform-specific operations.
Old versions of JavaScript, Ruby, Php were fully interpreted languages as their interpreters would directly translate each line source code into platform-specific operations, no bytecode was involved. Bytecode is there in Java, Python, C++ (.net), C# to decouple the language from execution environment, i.e. for portability, write once, run anywhere. Since 2008, Google Chrome's V8 JavaScript engine has come up with Just-In-Time Compiler for JavaScript. It executes JavaScript code line-by-line like an interpreter to reduce start up time, but if encounters with a hot section with repeatedly executed line of code then optimizes that code using baseline or optimizing compiler.
Cython
Cython is a programming language which is a superset of python and C. It is written in C and python. It is designed to give C-like performance with python syntax and optional C-syntax. Cython is a compiled language as it generates C code and gets compiled by C compiler. We can write similar code in Cython as in default python or CPython, the differences are :
Cython allows us to write optional additional C code and,
In Cython, our python code gets translated into C code internally so that it can get compiled by C compiler. Although Cython results in considerably faster execution, but falls short of original C language execution. This is because Cython has to make calls to the CPython interpreter and CPython standard libraries to understand the written CPython code
JPython / Jython
Java implementation of python programming language. It is written in Java and python. Here our python code first gets compiled to Java bytecode and then that bytecode gets interpreted to platform-specific operations by JVM or Java Virtual Machine. This is similar to how Java code gets executed : Java code first gets compiled to intermediate bytecode then that bytecode gets interpreted to platform-specific operations by JVM
PyPy
RPython implementation of python programming language. It is written in a restricted subset of python called Restricted Python (RPython). PyPy runs faster than CPython because to interpret bytecode, PyPy has a Just-in-time Compiler (Interpreter + Compiler) while CPython has an Interpreter. So JIT Compiler in PyPy can execute Python bytecode line-by-line like an Interpreter to reduce start up time, but if encounters with a hot section with repeatedly executed line of code then optimizes that code using Baseline or Optimizing Compiler.
JIT Compiler in a Nutshell: Compiler in Python translates our high level source code into bytecode and to execute bytecode, some implementations have normal Interpreter, some have Just-in-time Compiler. To execute a loop which runs say, million times i.e. a very hot code, initially Interpreter will run it for some time and Monitor of JIT Compiler will watch that code. Then when it gets repeated some times i.e. the code becomes warm* then JIT compiler will send that code to Baseline Compiler which will make some assumptions on variable types etc. based on data gathered by Monitor while watching the code. From next iterations if assumptions turn out to be valid, then no need to retranslate bytecode into machine code, i.e. steps can be skipped for faster execution. If the code repeats a lot of times i.e. code becomes very hot then JIT compiler will send that code to Optimizing Compiler which will make more assumptions and will skip more steps for very fast execution.
JIT Compiler Drawbacks: initial slower execution when the code gets analysed, and if assumptions turn out to be false then optimized compiled code gets thrown out i.e. Deoptimization or Bailing out happens which can make code execution slower, although JIT compilers has limit for optimization/deoptimization cycle. After certain number of deoptimization happens, JIT Compiler just does not try to optimize anymore. Whereas normal Interpreter, in each iteration, repeatedly translates bytecode into machine code thereby taking more time to complete a loop which runs say, million times
IronPython
C# implementation of python, targeting the .NET framework
Ruby Python
works with Ruby platform
Anaconda Python
Distribution of python and R programming languages for scientific computing like, data science, machine learning, artificial intelligence, deep learning, handling large volume of data etc. Numerous number of libraries like, scikit-learn, tensorflow, pytorch, numba, pandas, jupyter, numpy, matplotlib etc. are available with this package
Stackless
Python for Concurrency
To test speed of each implementation, we write a program to call integrate_f 500 times using an N value of 50,000, and record the execution time over several runs. Below table shows the benchmark results :
Implementation
Execution Time (seconds)
Speed Up
CPython
9.25
CPython + Cython
0.21
44x
PyPy
0.57
16x
implementation means what language was used to implement Python and not how python Code would be implemented. The advantage of using CPython is the availability of C Run-time as well as easy integration with C/C++.
So CPython was originally implemented using C. There were other forks to the original implementation which enabled Python to lever-edge Java (JYthon) or .NET Runtime (IronPython).
Based on which Implementation you use, library availability might vary, for example Ctypes is not available in Jython, so any library which uses ctypes would not work in Jython. Similarly, if you want to use a Java Class, you cannot directly do so from CPython. You either need a glue (JEPP) or need to use Jython (The Java Implementation of Python)
You should know that CPython doesn't really support multithreading (it does, but not optimal) because of the Global Interpreter Lock. It also has no Optimisation mechanisms for recursion, and has many other limitations that other implementations and libraries try to fill.
You should take a look at this page on the python wiki.
Look at the code snippets on this page, it'll give you a good idea of what an interpreter is.
The original, and standard, implementation of Python is usually called CPython when
you want to contrast it with the other options (and just plain “Python” otherwise). This
name comes from the fact that it is coded in portable ANSI C language code. This is
the Python that you fetch from http://www.python.org, get with the ActivePython and
Enthought distributions, and have automatically on most Linux and Mac OS X machines.
If you’ve found a preinstalled version of Python on your machine, it’s probably
CPython, unless your company or organization is using Python in more specialized
ways.
Unless you want to script Java or .NET applications with Python or find the benefits
of Stackless or PyPy compelling, you probably want to use the standard CPython system.
Because it is the reference implementation of the language, it tends to run the
fastest, be the most complete, and be more up-to-date and robust than the alternative
systems.
A programming language implementation is a system for executing computer programs.
There are two general approaches to programming language implementation:
Interpretation: An interpreter takes as input a program in some language, and performs the actions written in that language on some machine.
Compilation: A compiler takes as input a program in some language, and translates that program into some other language, which may serve as input to another interpreter or another compiler.
Python is an interpreted high-level programming language created by Guido van Rossum in 1991.
CPython is reference version of the Python computing language, which is written in C created by Guido van Rossum too.
Other list of Python Implementations
Source
Cpython is the default implementation of Python and the one which we get onto our system when we download Python from its official website.
Cpython compiles the python source code file with .py extension into an intermediate bytecode which is usually given the .pyc extension, and gets executed by the Cpython Virtual Machine. This implementation of Python provides maximum compatibility with the Python packages and C extension modules.
There are many other Python implementations such as IronPython, Jython, PyPy, CPython, Stackless Python and many more.

Is it easy to fully decompile python compiled(*.pyc) files?

I wanted to know that how much easy is to decompile python byte code. I have made an application in python whose source I want to be secure. I am using py2exe which basically relies on python compiled files.
Does that secure the code?
Depends on your definition of "fully" (in "fully decompile")...;-). You won't easily get back the original Python source -- but getting the bytecode is easy, and standard library module dis exists exactly to make bytecode easily readable (though it's still bytecode, not full Python source code;-).
Compiling .pyc does not secure the code. They are easily read. See How do I protect Python code?
If you search online you can find decompilers for Python bytecode: there's a free version for downloading but which only handles bytecode up to Python 2.3, and an online service which will decompile up to version 2.6.
There don't appear to be any decompilers yet for more recent versions of Python bytecode, but that's almost certainly just because nobody has felt the need to write one rather than any fundamental difficulty with the bytecode itself.
Some people have tried to protect Python bytecode by modifying the interpreter: there's no particular reason why you can't compile your own interpreter with the different values used for the bytecode: that will prevent simple examination of the code with import dis, but won't stand up long to any determined attack and it all costs money that code be better put into improving the program itself.
In short, if you want to protect your program then use the law to do it: use an appropriate software license and prosecute those who ignore it. Code is expensive to write, but the end result is rarely the valuable part of a software package: data is much more valuable.
Now we can say: VERY EASY decompile, and NOT secure.
For I have use uncompyle6 to decompile my (latest 3.8.0) Python code:
uncompyle6 utils.cpython-38.pyc > utils.py
and decompile effect is NEARLY PERFECT.
origin python vs decompiled python:
-> ALMOST same py code except no comments
Not forget, other can use unpy2exe to
Extract .pyc files from executables created with py2exe
so your method not secure.

Categories

Resources