Accessing a variable from several threads in Python

What guarantees does Python provide for the execution of this code when multiple threads are involved:
a=b
where a and b are integer variables (i.e. instances of class int) that live in the global scope.
Q1: I don't think this statement is necessarily translated into something like "mov [a], [b]" (in x86). When writing x86 assembly by hand there is the possibility to specify "lock mov [a], [b]". Of course, that is far too low-level, and a scripting language like Python, I bet, adds a lot of instructions on top.
But still, languages like C++ have the volatile keyword, which prevents the compiler from optimizing a variable into a CPU register.
How do you specify that in Python? Is it possible at all?
Q2: Even if some form of volatile behavior can be achieved, there is another problem with this line of code: how do you insert a memory fence? For people who work mostly with Python or Bash rather than compiled languages, the issue is that threads may run on separate CPU cores, and if you are unlucky you may read stale values from another core's cache. In a compiled language that translates to x86 or ARM instructions, you are ultimately responsible for that; in Python, I have no idea who is responsible.

You cannot make any general statements about the execution of a program written in Python, as it depends too much on the implementation details of the interpreter being used (CPython, Jython, IronPython, etc.) and the libraries the program uses. As far as I know, the Python language itself barely makes any promises at this level, and doesn't expose any relevant mechanics, except for high-level concepts such as threads, processes, mutexes, etc.
If you're concerned about the execution of your program at this level, then don't use Python alone. Instead, implement the critical parts of your codebase in C or C++ and use Python bindings to interact with those components. That way, you can manage memory and execution at the appropriate level, in languages better suited to it, while still using Python to tie most of the codebase together.

Related

How to insert a memory fence and specify that memory is volatile in a Python program?

I am using the Python language, and I use CPU threads via the threading.Thread wrapper. In some way, the Python interpreter converts my code into .pyc bytecode with its JIT. (Please provide a reference to a Python bytecode standard; as far as I know, no such standard exists, just as no standard exists for the language itself.)
Then these virtual instructions are executed. The real instructions for Intel CPUs are x86/x64 instructions, and for ARM CPUs they are AArch64/AArch32 instructions.
My problem: I want to perform an action within the Python programming language that enforces an ordering constraint between memory operations.
What I want to know:
Q1: How can I emit an instruction such as
mfence, if the Python program is running on an x86/x64 CPU,
or something like atomic_thread_fence() in LLVM IR?
Q2: How can I specify that some memory is volatile and should not be kept in a CPU register for optimization purposes?
CPython does not have a JIT - though it may do one day.
So your code is only converted into bytecode, which will be interpreted, and not into actual Intel/etc. machine code.
Additionally, Python has what's known as the GIL - Global Interpreter Lock - meaning that even if you have multiple Python threads in a process, only one of them can be interpreting code at once - though this may also change one day. Threads were frequently useful for doing I/O, because I/O operations are able to happen at the same time, but these days asyncio is a good competitor for doing that.
So, in response to Q1, it doesn't make any sense to "put an mfence in Python code" (or the like).
Instead, what you probably want to do, if you want to enforce ordering constraints between one bit of code being executed and another, is use more high-level strategies, such as Python threading.Lock, queue.Queue, or similar equivalents in the asyncio world.
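For instance, here is a minimal sketch of enforcing ordering with queue.Queue (the worker and results names are just illustrative): the queue does all the locking internally, so whatever a producer put()s before a consumer's get() is guaranteed to be visible to the consumer.
import threading
import queue

results = queue.Queue()

def worker(n):
    # put() hands the value to the queue, which does its own locking,
    # so any thread that later calls get() is guaranteed to see it
    results.put(n * n)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

while not results.empty():
    print(results.get())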
As for Q2, are you thinking of the C/C++ keyword volatile? This is frequently mistakenly thought of as a way to make memory access atomic for use in multithreading - it isn't. All that C/C++ volatile does is ensure that memory reads and writes happen exactly as specified rather than being possibly optimised out. What is your use case? There are all sorts of strategies one can use in Python to optimise code, but it's an impossible question to answer without knowing what you're actually trying to do.
Answers to comments
The CPU executes instructions, and somebody has to emit those instructions. I'm calling "a JIT" the part inside the Python interpreter that emits instructions at the end of the day.
CPython is an interpreter - it does not emit instructions. JITs do emit instructions, but as stated above, CPython does not have a JIT. When you run a Python script, CPython will compile your text-based .py file into bytecode, and then it will spend the rest of its time working through the bytecode and doing what the bytecode says. The only machine instructions being executed are those that are emitted by whoever compiled CPython.
If you compile a Python script to a .pyc and then execute that, CPython will do exactly the same, it will just skip the "compile your text-based .py file into bytecode" part as it's already done it - the result of that step is stored in the .pyc file.
I was a bit vague with the naming. Do you mean that in Python, an instruction is re-executed each time the interpreter meets it?
A real CPU executing machine code will "re-execute" each instruction as it reads it, sure. CPython will do the same thing with Python bytecode. (CPython will execute a whole bunch of real machine code instructions each time it reads one Python instruction.)
Thanks, I have found this note in https://docs.python.org/3/extending/extending.html: "CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once." OK, so when a Python thread goes into native code via C/C++ bindings, what will happen? Q1-A: can another Python thread execute during that time? Q1-B: if I create another thread inside the C++ code, what will happen?
Native code can release the GIL but must lock it again before returning to Python code.
Typically, native code that does some CPU-intensive work or does some I/O that requires waiting would release the GIL while it does that work or waits on that I/O. This allows Python code on another Python thread to run at the same time. But at no point does Python code run on two threads at once. That's why it makes no sense to put native ordering instructions and the like in Python code.
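A minimal sketch of that effect, using time.sleep() as a stand-in for blocking I/O (the function name is illustrative): four sleeping threads finish in about one second rather than four, because CPython releases the GIL while each thread waits.
import threading
import time

def wait_on_io():
    time.sleep(1)  # blocking call; CPython releases the GIL while waiting

start = time.perf_counter()
threads = [threading.Thread(target=wait_on_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"elapsed: {time.perf_counter() - start:.2f}s")  # ~1s, not ~4s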
Native code that needs to use native ordering instructions for its own purposes will obviously do that, but that is C/C++ code, not Python code. If you want to know how to put native ordering instructions in a C/C++ Python extension, then look up how to do it in any C/C++ code - the fact that it's a Python extension is irrelevant.
Basically, either write Python code and use high-level strategies like what I mentioned above, or write a native extension in C/C++ and do whatever you need to do in that language.
I need to learn more about the GIL; it seems there is a good study of the GIL by David Beazley: https://dabeaz.com/python/UnderstandingGIL.pdf. But @Keiji - you may be wrong about Q1: CPython threads seem to be real threads, and if the implementor of a C/C++ extension (which is what almost all libraries for Python are) decides to release the GIL, it is possible to do so... So Q1 still makes sense...
I've covered this above. Python code can't interact with native code in a way that would require putting native ordering instructions in Python code.
Back to the question - I mean volatile in the C++ sense of suppressing the compiler optimization that keeps a variable in a register. In C++ it guarantees neither atomicity nor a memory fence. So, regarding volatile: how can I specify it for an integer variable or a user-defined type?
If you want to make something in C/C++ be volatile, use the volatile keyword. If you're writing Python code, it doesn't make any sense to make something volatile.
About Python Threads:
Python threads are, first of all, tricky. The interpreter uses real POSIX/WinAPI threads: a threading.Thread is a real OS thread under the hood.
The thread execution model is quite specific and has been described as "cooperative multitasking" by one enthusiast (David Beazley, https://www.dabeaz.com/about.html).
As David Beazley explains in https://www.dabeaz.com/python/UnderstandingGIL.pdf:
When a thread blocks on I/O, it releases the global lock (the GIL); the release happens around the system call.
Next, the Python VM performs a periodic "tick" (check) while executing bytecode, so even a CPU-bound thread is interrupted regularly. (In older CPython this happened every 100 bytecode instructions; since CPython 3.2 it happens every 5 ms by default - see the sketch below.)
At each tick, the running thread releases the GIL, and all runnable threads compete to acquire it again.
There is no thread scheduler in the Python interpreter itself; scheduling is left to the operating system.
Multithreading in Python can, in fact, hurt performance for CPU-bound code.
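A minimal sketch of inspecting and tuning that check interval in modern CPython (3.2+):
import sys

print(sys.getswitchinterval())  # 0.005 by default: the GIL is re-arbitrated every 5 ms
sys.setswitchinterval(0.001)    # shorter interval: more responsive threads, more switching overhead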

Why are there no languages that are both interpreted and (really) compiled?

I am an (old) engineer not a programmer so forgive me for asking a naïve question.
My understanding is that to get really fast execution times for a program, it needs to be compiled to native machine code. And there are a relatively small number of languages still in use that do this (e.g. C and C++).
But I much prefer the syntax of Python over that of the C-derived compiled languages. However my understanding is that interpreted Python (and pseudo-compiled Python run on a virtual machine) cannot match the execution speed of a truly compiled language.
Is there some reason that a true native-code Python compiler cannot be developed?
[I am interested specifically in Python but I am not aware of any language that can be interpreted and also compiled to native machine code.]
The key distinction is a clean separation between compile time and run time. In Python, for instance, import happens at runtime, and can happen conditionally. And per the halting problem, that means a compiler cannot determine up front if a given import will happen. Yet, this affects the code that would need to be generated.
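A small illustration of that point: which of these modules gets imported cannot be known until the program actually runs on some platform.
import sys

if sys.platform == "win32":
    import msvcrt as console   # this module exists only on Windows
else:
    import termios as console  # this module exists only on POSIX

print(console.__name__)  # which module was imported is known only at run time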
As Bill the Lizard notes, if the language does have a clean distinction, an interpreter can still choose to ignore it. C's #include can happen before main, but that does not mean an interpreter must do so.
Outside the syntax, Python is also virtually uncompilable due to dynamic typing. In C, + has a very limited set of meanings - integer, floating point or pointer arithmetic - and the compiler knows the static type of the arguments. C++ has far more extensive overloading, but the same basic principle applies. Virtual functions allow some run-time flexibility, but only from a finite set of options, all compiled before main starts.
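In Python, by contrast, + is resolved at run time through the operands' __add__/__radd__ methods, so any user-defined class (Meters here is just an illustration) can give it a new meaning:
class Meters:
    def __init__(self, value):
        self.value = value
    def __add__(self, other):
        # '+' dispatches here at run time; no static type is visible to a compiler
        return Meters(self.value + other.value)

print((Meters(2) + Meters(3)).value)  # 5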
Also beyond syntax is the memory model: both C and C++ have a memory model (an improved derivative of Java's memory model) that makes threading quite efficient. (Unlike Java, not every object can be synchronized; you need special members.) As CPUs gain more and more cores, the advantages only continue to grow. Compilers can see pretty well where memory and CPU registers need to be brought into sync.

compiler vs interpreter ( on basis of construction and design )

After viewing lots of posts about the difference between compilers and interpreters, I'm still not able to figure out the difference in their construction and internal mechanism.
The most common difference I read was that a compiler produces a target program which is executable { meaning machine code as its output }, which can run on a system and then be fed with input.
Whereas an interpreter simply runs the input line by line { what exactly is happening here? } and produces the output.
My main doubts are :
1) A compiler consists of a lexical analyzer, a parser, an intermediate code generator and a code generator - but what are the parts of an interpreter?
2) Who provides the run-time support for interpreted languages? I mean, who manages the heap and the stacks for recursive functions?
3) This is specific to the Python language:
Python comprises a compiler stage and then an interpreter stage as well:
the compiler produces some bytecode, and then this bytecode is interpreted by its virtual machine.
if I were to design only the compiler for Python (Python -> bytecode):
a) would I have to manage memory { write code to manage the stack and heap } for it?
b) how would this compiler differ from a traditional compiler, or for that matter an interpreter?
I know this is a whole lot to ask here but I really want to understand these minute details.
I'm referring to the compiler book by Alfred V. Aho.
Based on the feedback and some further study I think I should modify my question
A compiler need not produce only machine code as its output
But one question is still bugging me
Let's say I want to design a (Python -> bytecode) compiler, and then the bytecode will be interpreted by the virtual machine (correct me if I'm wrong).
Then I'll have to write a lexical analyzer for Python, and then a parser which will generate some sort of abstract syntax tree. After this, do I have to generate some intermediate code (three-address code, as mentioned in the dragon book) or direct bytecode instructions (which I suppose will be given in the VM's documentation)?
Will I have to write code for handling the stack as well, to provide support for recursion and scope?
First off, "compiler" does not imply "outputs machine code". You can compile from any language to any other, be it a high-level programming language, some intermediate format, code for a virtual machine (bytecode) or code for a physical machine (machine code).
1. Like a compiler, an interpreter needs to read and understand the language it implements. Thus you have the same front-end code (though today's interpreters usually implement a far simpler language - the bytecode only - so these interpreters need only a very simple front end). Unlike a compiler, an interpreter's back end doesn't generate code, but executes it. Obviously, this is a different problem entirely, and hence an interpreter looks quite different from a compiler. It emulates a computer (often one that's far more high-level than real-life machines) instead of producing a representation of an equivalent program.
2. Assuming today's somewhat-high-level virtual machines, this is the job of the interpreter - there are dedicated instructions for, among other things, calling functions and creating objects, and garbage collection is baked into the VM. When you target lower-level machines (such as the x86 instruction set), many such details need to be baked into the generated code, be it directly (system calls or whatever) or via calls into the C standard library implementation.
3.
a) Probably not, as a VM dedicated to Python won't require it. Managing memory yourself would be very easy to get wrong, unnecessary, and arguably incompatible with Python's semantics, since it amounts to manual memory management. On the other hand, if you were to target something low-level like LLVM, you'd have to take special care - the details depend on the target language. That's part of why nobody does it.
b) It would be a perfectly fine compiler, and obviously not an interpreter. You'd probably have a simpler backend than a compiler targeting machine code, and due to the nature of the input language you wouldn't have quite as much analysis and optimization to do, but there's no fundamental difference.

python threading: memory model and visibility

Does python threading expose issues of memory visibility and statement reordering as Java does? Since I can't find any reference to a "Python Memory Model" or anything like that, despite the fact that lots of people are writing multithreaded Python code, I'm guessing that these gotchas don't exist here. No volatile keyword, for instance. But it doesn't seem to be stated explicitly anywhere that, for instance, a change in a variable in one thread is immediately visible to all other threads.
Maybe this stuff is all very obvious to Python programmers, but as a fearful Java programmer, I require a little extra reassurance :)
There is no formal model for Python's threading (hey, after all, there wasn't one for Java's for years... hopefully, one will also eventually be written for Python).
In practice, no Python implementation performs any advanced optimization such as statement reordering or temporarily treating shared variables as thread-local -- and you can count on these semantic constraints even though they are not formally assured.
CPython in particular, as @Rawheiser mentions, uses a global interpreter lock; other implementations (PyPy, IronPython, Jython, ...) do not (so they can use multiple cores effectively with a threading model, while CPython requires multiprocessing for the same purpose), so you should not count on the GIL if you want to write code that's portable across Python implementations. (In particular, you shouldn't count on the "atomicity" of operations that only happen to be atomic in CPython because of the GIL, such as dictionary accesses -- in other Python implementations, multiple threads might be modifying a dict at once and cause errors unless you protect the dict with a lock or the like.)
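For example, even in CPython under the GIL, a read-modify-write like counter += 1 is not atomic (it spans several bytecode instructions), so an explicit lock is the portable approach. A minimal sketch:
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:        # counter += 1 alone is a read, an add and a write,
            counter += 1  # so without the lock, updates can be lost

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # reliably 400000 with the lock; often less without it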

Why is (python|ruby) interpreted?

What are the technical reasons why languages like Python and Ruby are interpreted (out of the box) instead of compiled? It seems to me like it should not be too hard for people knowledgeable in this domain to make these languages not be interpreted like they are today, and we would see significant performance gains. So certainly I am missing something.
Several reasons:
faster development loop, write-test vs write-compile-link-test
easier to arrange for dynamic behavior (reflection, metaprogramming)
makes the whole system portable (just recompile the underlying C code and you are good to go on a new platform)
Think of what would happen if the system was not interpreted. Say you used translation-to-C as the mechanism. The compiled code would periodically have to check if it had been superseded by metaprogramming. A similar situation arises with eval()-type functions. In those cases, it would have to run the compiler again, an outrageously slow process, or it would have to also have the interpreter around at run-time anyway.
The only alternative here is a JIT compiler. These systems are highly complex and sophisticated and have even bigger run-time footprints than all the other alternatives. They start up very slowly, making them impractical for scripting. Ever seen a Java script? I haven't.
So, you have two choices:
all the disadvantages of both a compiler and an interpreter
just the disadvantages of an interpreter
It's not surprising that generally the primary implementation just goes with the second choice. It's quite possible that some day we may see secondary implementations like compilers appearing. Ruby 1.9 and Python have bytecode VMs; those are halfway there. A compiler might target just non-dynamic code, or it might have various levels of language support declarable as options. But since such a thing can't be the primary implementation, it represents a lot of work for a very marginal benefit. Ruby already has 200,000 lines of C in it...
I suppose I should add that one can always add a compiled C (or, with some effort, any other language) extension. So, say you have a slow numerical operation. If you add, say Array#newOp with a C implementation then you get the speedup, the program stays in Ruby (or whatever) and your environment gets a new instance method. Everybody wins! So this reduces the need for a problematic secondary implementation.
Exactly like (in the typical implementation of) Java or C#, Python gets first compiled into some form of bytecode, depending on the implementation (CPython uses a specialized form of its own, Jython uses JVM just like a typical Java, IronPython uses CLR just like a typical C#, and so forth) -- that bytecode then gets further processed for execution by a virtual machine (AKA interpreter), which may also generate machine code "just in time" -- known as JIT -- if and when warranted (CLR and JVM implementations often do, CPython's own virtual machine typically doesn't but can be made to do so e.g. with psyco or Unladen Swallow).
JIT may pay for itself for sufficiently long-running programs (if memory's way cheaper than CPU cycles), but it may not (due to slower startup times and larger memory footprint), especially when the types also have to be inferred or specialized as part of the code generation. Generating machine code without type inference or specialization is easy if that's what you want, e.g. freeze does it for you, but it really doesn't present the advantages that "machine code fetishists" attribute to it. E.g., you get an executable binary of 1.5 to 2 MB in lieu of a tiny "hello world" .pyc -- not much point!-). That executable is stand-alone and distributable as such, but it will only work on a very specific narrow range of operating systems and CPU architectures, so the tradeoffs are quite iffy in most cases. And, the time it takes to prepare the executable is quite long indeed, so it would be a crazy choice to make that mode of operation the default one.
Merely replacing an interpreter with a compiler won't give you as big a performance boost as you might think for a language like Python. When most of the time is actually spent doing symbolic lookups of object members in dictionaries, it doesn't really matter whether the call to the function performing such a lookup is interpreted or native machine code - the difference, while not quite negligible, will be dwarfed by the lookup overhead.
To really improve performance, you need optimizing compilers. And optimization techniques here are very different from what you have with C++, or even Java JIT - an optimizing compiler for a dynamically typed / duck typed language such as Python needs to do some very creative type inference (including probabilistic - i.e. "90% chance of it being T" and then generating efficient machine code for that case with a check/branch before it) and escape analysis. This is hard.
I think the biggest reason for the languages being interpreted is portability. As a programmer you can write code that will run in an interpreter not a specific OS. So your programs behave more uniformly across platforms (more so than compiled languages). Another advantage I can think of is it's easier to have a dynamic type system in an interpreted language. I think the creators of the language were thinking having a language where programmers can be more productive due to automatic memory management, dynamic type system and meta programming wins over any performance loss due to the language being interpreted. If you are concerned about performance you can always compile the language to native machine code employing a technique like JIT compilation.
Today, there is no longer a strong distinction between "compiled" and "interpreted" languages. Python is in fact compiled just as much as Java is, the only differences are:
The Python compiler is much faster than the Java compiler
Python automatically compiles source code as it is executed, there is no separate "compile" step required
Python bytecode is different from JVM bytecode
Python even has a function called compile() which is an interface to the compiler.
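For example, you can call that compiler yourself and inspect the resulting bytecode with the standard dis module:
import dis

code = compile("x = a + b", "<string>", "exec")  # source text -> code object (bytecode)
dis.dis(code)  # prints instructions like LOAD_NAME, an add instruction, STORE_NAME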
It sounds like the distinction you are making is between "dynamically typed" and "statically typed" languages. In dynamic languages such as Python, you can write code like:
def fn(x, y):
    return x.foo(y)
Notice that the types of x and y are not specified. At runtime, this function will look at x to see whether it has a member function named foo, and if so will call it with y. If not, it will throw a runtime error that indicates no such function was found. This sort of runtime lookup is much easier to represent using an intermediate representation like bytecode, where a runtime VM does the lookup instead of having to generate machine code to do the lookup itself (or, call a function to do the lookup which is what the bytecode will do anyway).
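To make that concrete with the fn above (Greeter is just an illustrative class): any object carrying a suitable foo works, and a missing one is only discovered when the call is attempted.
class Greeter:
    def foo(self, name):
        return "hello, " + name

print(fn(Greeter(), "world"))  # run-time lookup finds Greeter.foo and calls it
try:
    fn(42, "world")  # int has no 'foo'; the failure surfaces only at run time
except AttributeError as e:
    print(e)  # 'int' object has no attribute 'foo'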
Python has projects such as Psyco, PyPy, and Unladen Swallow that take various approaches to compiling Python object code into something closer to native code. There is active research in this area but there is not (as yet) a simple answer.
The effort required to create a good compiler to generate native code for a new language is staggering. Small research groups typically take 5 to 10 years (examples: SML/NJ, Haskell, Clean, Cecil, lcc, Objective Caml, MLton, and many others). And when the language in question requires type checking and other decisions to be made at run time, a compiler writer has to work much harder to get good native-code performance (for an excellent example, see work by Craig Chambers and later Urs Hoelzle on Self). The performance gains you might hope for are harder to realize than you might think. This phenomenon partly explains why so many dynamically typed languages are interpreted.
As noted, a decent interpreter is also instantly portable, while porting compilers to new machine architectures takes substantial effort (and is a problem I personally have been working on for over 20 years, with some time off for good behavior). So an interpreter is a way to reach a wide audience quickly.
Finally, although fast compilers and slow interpreters exist, it's usually easier to make the edit-translate-go cycle faster by using an interpreter. (For some nice examples of fast compilers see the aforementioned lcc as well as Ken Thompson's Go compiler; for an example of a relatively slow interpreter see GHCi.)
Well, isn't one of the strengths of these languages that they are so easily scriptable? They wouldn't be if they were compiled. And on the other hand, dynamic languages are easier to interpret than to compile.
In a compiled language, the loop you get into when making software is
Make a change
Compile changes
Test changes
goto 1
Interpreted languages tend to be faster to make stuff in because you get to cut out step two of that process (and when you're dealing with a large system where compile times can be upwards of two minutes, step two can add a significant amount of time).
This isn't necessarily the reason python|ruby designers thought of, but keep in mind that "How efficiently does the machine run this?" is only half the software development problem.
It also seems like it would be easier to compile code in a language that's interpreted naturally than it would be to add an interpreter to a language that's compiled by default.
REPL. Don't knock it 'till you've tried it. :)
By design.
The authors wanted something where they can write scripts into.
Python gets compiled the first time it is executed though
Compiling Ruby at least is notoriously hard. I'm working on one, and as part of that I wrote a blog post enumerating some of the issues here.
Specifically, Ruby is suffering from a very unclear (i.e. non-existent) boundary between the "read" and "execute" phase of the program that makes it hard to compile efficiently. You could just emulate what the interpreter does, but then you're not going to see much speed up, so it wouldn't be worth the effort. If you want to compile it efficiently you then face a lot of additional complications to handle the extreme level of dynamism in Ruby.
The good news is that there are techniques for overcoming this. Self, Smalltalk and Lisp/Scheme have dealt quite successfully with most of the same issues. But it takes time to sift through it all and figure out how to make it work with Ruby. It also doesn't help that Ruby has a very convoluted grammar.
Raw compute performance is probably not a goal of most interpreted languages. Interpreted languages are typically more concerned about programmer productivity than raw speed. In most cases these languages are plenty fast enough for the tasks the languages were designed to tackle.
Given that, and that just about the only advantages of a compiler are type checking (difficult to do in a dynamic language) and speed, there's not much incentive to write compilers for most interpreted languages.
