What is the Python equivalent of FreeRTOS StreamBuffer?

I need a structure equivalent to FreeRTOS's Streambuffer, but in Python. Ideally I could write something like:
myBuf=StreamBuffer(maxSize=2048)
and then be able to call:
read_bytes=myBuf.read(amount_to_read, timeout=300)
myBuf.write(b"fooo", timeout=20)
such that the operations block whenever I specify a (non-zero) timeout, until there are enough bytes to read or enough space to write, and return immediately with an error code or throw an exception if the timeout is zero (i.e. non-blocking mode) or exceeded.
I tried experimenting with io.BytesIO, but I don't know how to use it. There does not seem a way to specify a limit on its size or an ability to make it blocking.
Does anything standard exist, or does anyone know of any useful packages that provide this functionality in a simple/fast way?
Update:
From Here it seems I can use io.BufferedRWPair, but there are no examples of things such as setting timeouts or configuring blocking/non-blocking behavior. I also don't understand the warning that implies one should use io.BufferedRandom instead.
Update2:
I realize Python provides collections.queue which is pretty much analogous to FreeRTOS queues, but is there an equivalent for streams, so that one can just keep pushing bytes from one end and pop them off from the other?
Update 3: It seems an os.pipe or multiprocessing.Pipe is closest to what I'm looking for.
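Nothing in the standard library provides exactly this API, but for a single-process, multi-threaded case the semantics can be sketched on top of threading.Condition. The following is only a rough illustration of the interface described above (the class and method names mirror the question, not any standard module); timeouts are in seconds, and with wait_for a timeout of 0 effectively behaves as non-blocking:

import threading

class StreamBuffer:
    """Bounded, blocking byte buffer (illustrative sketch, not a standard class)."""
    def __init__(self, maxSize=2048):
        self._buf = bytearray()
        self._max = maxSize
        self._cond = threading.Condition()

    def write(self, data, timeout=None):
        # Block until all of `data` fits, or raise TimeoutError.
        with self._cond:
            if not self._cond.wait_for(
                    lambda: len(self._buf) + len(data) <= self._max, timeout):
                raise TimeoutError("no space to write")
            self._buf.extend(data)
            self._cond.notify_all()

    def read(self, amount, timeout=None):
        # Block until `amount` bytes are available, or raise TimeoutError.
        with self._cond:
            if not self._cond.wait_for(
                    lambda: len(self._buf) >= amount, timeout):
                raise TimeoutError("not enough bytes to read")
            data = bytes(self._buf[:amount])
            del self._buf[:amount]
            self._cond.notify_all()
            return data

# usage, mirroring the question:
# myBuf = StreamBuffer(maxSize=2048)
# myBuf.write(b"fooo", timeout=20)
# read_bytes = myBuf.read(4, timeout=300)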

Related

Python multiprocessing.Pool(): am I limited in what I can return?

I am using Python's multiprocessing pool. I have been told (although I have not experienced this myself, so I cannot post the code) that one cannot just "return" anything from a multiprocessing.Pool() worker back to the multiprocessing.Pool()'s main process. Words like "pickling" and "lock" were being thrown around, but I am not sure.
Is this correct, and if so, what are these limitations?
In my case, I have a function which generates a mutable class object and then returns it after it has done some work with it. I'd like to have 8 processes run this function, generate their own classes, and return each of them after they're done. Full code is NOT written yet, so I cannot post it.
Any issues I may run into?
My code is: res = pool.map(foo, list_of_parameters)
Q : "Is this correct, and if so, what are these limitations?"
It depends. It is correct in the sense that the SER/DES (serialise/deserialise) processing is the problem here, because a pair of disjoint processes has to "send" something in both directions (out: a task specification with its parameters, and back: the long-awaited result).
Early versions of the piece of the Python standard library responsible for doing this, the pickle module, were not able to SER-ialise some more complex types of objects, class instances being one such example.
Newer and newer versions keep evolving, sure, yet this SER/DES step remains one of the single points of failure (SPoF) that may prevent smooth code execution in some such cases.
Next are the cases that finish by throwing a MemoryError: they request so many memory allocations that the O/S simply rejects any new request, and the whole attempt to produce and send pickle.dumps( ... ) crashes unrecoverably.
Do we have any remedies available?
Well, maybe yes, maybe no - Mike McKerns' dill may help in some cases to better handle complex objects in SER/DES processing.
You may try import dill as pickle; pickle.dumps(...) and test whether your hot candidates for Class() instances get SER/DES-ed, i.e. whether they have a chance to pass through. If not, there is no way to use this low-hanging-fruit trick first.
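To make that probing step concrete, here is a small sketch; the Result class and the foo() worker are just placeholders standing in for whatever the real code will produce. Note that Pool itself always uses pickle internally, so a dill-only success means switching to a dill-based pool such as the multiprocess / pathos packages:

import multiprocessing as mp
import pickle

import dill   # optional fallback; pip install dill

class Result:                         # stand-in for the mutable class
    def __init__(self, payload):
        self.payload = payload

def foo(param):                       # worker: build an instance and return it
    return Result(payload=param * 2)  # must survive pickling on the way back

if __name__ == "__main__":
    probe = Result(42)                # cheap SER/DES check before a full Pool run
    try:
        pickle.dumps(probe)
        print("plain pickle is enough - Pool can return these")
    except Exception:
        dill.dumps(probe)             # if even this fails, slim down the object
        print("pickle failed; dill works only with a dill-based pool")

    with mp.Pool(8) as pool:
        res = pool.map(foo, range(8))   # list of Result instances, one per task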
Next, a less easy way would be to avoid the dependence on hardwired multiprocessing.Pool() instantiations and their (above-)limited SER/comms/DES methods, and to design your processing strategy as a distributed-computing system based on a communicating-agents paradigm.
That way you benefit from a right-sized, just-enough communication interchange between intelligent-enough agents that know (because you designed them to know it) what to tell one another, without sending any mastodon-sized BLOB(s) that accidentally crash the processing in one of the SPoF(s) you can neither prevent nor salvage after the fact.
There seem to be no better ways forward that I know about or can foresee, as of 2020-Q4, for doing this safely and smartly.

Is there a way to find out if Python threading locks are ever used by more than one thread?

I'm working on a personal project that has been refactored a number of times. It started off using multithreading, then parts of it used asyncio, and now it is back to being mainly single threaded.
As a result of all these changes I have a number of threading.Lock()'s in the code that I would like to remove and cleanup to prevent future issues.
How can I easily work out which locks are in use and hit by more than one thread during the runtime of the application?
If I were in the situation of having to find that out, I would replace the lock with a wrapper that does the counting (or prints something, raises an exception, etc.) for me whenever the undesired behavior happens. Python is hacky enough that you can simply create such a wrapper and overwrite the original threading.Lock to get the job done. That needs some careful implementation, e.g. catching every possible pathway to acquire and release the lock; a sketch is given below.
However, be careful: even so, you might not exercise every possible code path during a run, and so you can never be sure you have really removed all the "bugs".
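A rough sketch of such a counting wrapper (the class name is mine; it only covers acquire/release and the with-statement protocol, and a monkey-patch of threading.Lock will not reach locks created inside C extensions):

import threading

class CountingLock:
    """Stand-in for threading.Lock that records which threads ever acquired it."""
    def __init__(self):
        self._lock = threading.Lock()
        self._meta = threading.Lock()   # protects the bookkeeping set
        self.users = set()              # idents of threads that acquired the lock

    def acquire(self, blocking=True, timeout=-1):
        got = self._lock.acquire(blocking, timeout)
        if got:
            with self._meta:
                self.users.add(threading.get_ident())
        return got

    def release(self):
        self._lock.release()

    __enter__ = acquire

    def __exit__(self, *exc):
        self.release()

    def is_shared(self):
        return len(self.users) > 1      # True if more than one thread ever used it

# usage: swap CountingLock in for threading.Lock in the code under test
# (or monkey-patch: threading.Lock = CountingLock), run the application,
# then check each lock's is_shared() at shutdown.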

Zipkin for profiling the internals of a traditional program

I want to use zipkin to profile the internals of a traditional program.
I use the term "traditional", since AFAIK zipkin is for tracing in a microservice environment where one request gets computed by N sub-requests.
I would like to analyse the performance of my python program.
I would like to trace all Python method calls and all Linux syscalls that get made.
How can I trace the Python method calls and Linux syscalls to get the spans into Zipkin?
Even if it is not feasible, I am interested in how this could be done. I would like to learn how Zipkin works.
In zipkin lingo, what you are asking about is often called "local spans" or "local tracing", basically an operation that neither originated, nor resulted in a remote call.
I'm not aware of anything at the syscall level, but many tracers support explicit instrumentation of function calls.
For example, using py_zipkin:
from py_zipkin.zipkin import zipkin_span

@zipkin_span(service_name='my_service', span_name='some_function')
def some_function(a, b):
    return do_stuff(a, b)
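Assuming py_zipkin's usual pattern (worth double-checking against the version you install), a decorated function like the one above only emits anything when it runs inside a root span that has a transport handler attached, roughly like this; the collector URL and encoding are placeholders to adapt:

import requests
from py_zipkin.zipkin import zipkin_span

def http_transport(encoded_span):
    # ship the encoded spans to a local Zipkin collector (endpoint and encoding
    # depend on your Zipkin and py_zipkin versions - adjust as needed)
    requests.post(
        "http://localhost:9411/api/v1/spans",
        data=encoded_span,
        headers={"Content-Type": "application/x-thrift"},
    )

with zipkin_span(
    service_name="my_service",
    span_name="whole_program",
    transport_handler=http_transport,
    sample_rate=100.0,          # record 100% of runs while profiling
):
    some_function(1, 2)         # the decorated some_function becomes a child span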
Besides explicit instrumentation like this, one could also export data to Zipkin from elsewhere. For example, one could convert trace data produced by another tool into Zipkin's JSON format.
This probably doesn't answer your question, but I hope it helps.

Most pythonic way to call dependent methods

I have a class with a few methods - each one sets some internal state, and usually requires some other method to be called first to prepare the stage.
Typical invocation goes like this:
c = MyMysteryClass()
c.connectToServer()
c.downloadData()
c.computeResults()
In some cases only connectToServer() and downloadData() will be called (or even just connectToServer() alone).
The question is: how should those methods behave when they are called in wrong order (or, in other words, when the internal state is not yet ready for their task)?
I see two solutions:
They should throw an exception
They should call correct previous method internally
Currently I'm using second approach, as it allows me to write less code (I can just write c.computeResults() and know that two other methods will be called if necessary). Plus, when I call them multiple times, I don't have to keep track of what was already called and so I avoid multiple reconnecting or downloading.
On the other hand, first approach seems more predictable from the caller perspective, and possibly less error prone.
And of course, there is the possibility of a hybrid solution: throw an exception, and add another layer of methods with internal state detection and proper calling of the previous ones. But that seems to be a bit of an overkill.
Your suggestions?
They should throw an exception. As said in the Zen of Python: Explicit is better than implicit. And, for that matter, Errors should never pass silently. Unless explicitly silenced. If the methods are called out of order that's a programmer's mistake, and you shouldn't try to fix that by guessing what they mean. You might accidentally cover up an oversight in a way that looks like it works, but is not actually an accurate reflection of the programmer's intent. (That programmer may be future you.)
If these methods are usually called immediately one after another, you could consider collating them by adding a new method that simply calls them all in a row. That way you can use that method and not have to worry about getting it wrong.
Note that classes that handle internal state in this way are sometimes called for but are often not, in fact, necessary. Depending on your use case and the needs of the rest of your application, you may be better off doing this with functions and actually passing connection objects, etc. from one method to another, rather than using a class to store internal state. See for instance Stop Writing Classes. This is just something to consider and not an imperative; plenty of reasonable people disagree with the theory behind Stop Writing Classes.
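As a concrete illustration of the raise-an-exception approach (the method names come from the question; the internal attributes, the RuntimeError choice, and the convenience run() method are just one reasonable way to do it):

class MyMysteryClass:
    def __init__(self):
        self._conn = None
        self._data = None

    def connectToServer(self):
        self._conn = ...          # placeholder: open the real connection here

    def downloadData(self):
        if self._conn is None:
            raise RuntimeError("downloadData() called before connectToServer()")
        self._data = ...          # placeholder: fetch using self._conn

    def computeResults(self):
        if self._data is None:
            raise RuntimeError("computeResults() called before downloadData()")
        return ...                # placeholder: derive results from self._data

    def run(self):
        # the "collating" method suggested above: one call, correct order
        self.connectToServer()
        self.downloadData()
        return self.computeResults()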
You should raise exceptions. It is good programming practice to raise exceptions to make your code easier to understand, for the following reasons:
What you are describing fits the literal description of an "exception" -- it is an exception to normal proceedings.
If you build in some kind of workaround, you will likely end up with "spaghetti code" = BAD.
When you, or someone else, goes back and reads this code later, it will be difficult to understand if you do not provide the hint that executing these methods out of order is an exception.
Here's a good source:
http://jeffknupp.com/blog/2013/02/06/write-cleaner-python-use-exceptions/
As my CS professor always said "Good programmers can write code that computers can read, but great programmers write code that humans and computers can read".
I hope this helps.
If it's possible, you should make the dependencies explicit.
For your example:
c = MyMysteryClass()
connection = c.connectToServer()
data = c.downloadData(connection)
results = c.computeResults(data)
This way, even if you don't know how the library works, there's only one order the methods could be called in.
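For completeness, a sketch of what the class behind that calling convention might look like (the bodies are placeholders):

class MyMysteryClass:
    def connectToServer(self):
        connection = ...              # placeholder: open and return the connection
        return connection

    def downloadData(self, connection):
        data = ...                    # placeholder: fetch using the given connection
        return data

    def computeResults(self, data):
        results = ...                 # placeholder: derive results from the data
        return results

Because each step only accepts what the previous step returned, calling them out of order fails immediately with a TypeError instead of silently working on stale state.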

How do I dump an entire Python process for later debugging inspection?

I have a Python application in a strange state. I don't want to do live debugging of the process. Can I dump it to a file and examine its state later? I know I've restored corefiles of C programs in gdb later, but I don't know how to examine a Python application in a useful way from gdb.
(This is a variation on my question about debugging memleaks in a production system.)
There is no builtin way other than aborting (with os.abort(), causing the coredump if resource limits allow it) -- although you can certainly build your own 'dump' function that dumps relevant information about the data you care about. There are no ready-made tools for it.
As for handling the corefile of a Python process, the Python source has a gdbinit file that contains useful macros. It's still a lot more painful than somehow getting into the process itself (with pdb or the interactive interpreter) but it makes life a little easier.
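To make the aborting route concrete, a rough sketch (Unix-only; where the core file actually lands is controlled by the kernel's core_pattern setting):

import os
import resource

# allow this process to write a core file of unlimited size
resource.setrlimit(resource.RLIMIT_CORE,
                   (resource.RLIM_INFINITY, resource.RLIM_INFINITY))

# ... get the application into the "strange state" you want to capture ...

os.abort()   # raises SIGABRT and, limits permitting, dumps core for gdb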
If you only care about storing the traceback object (which is all you need to start a debugging session), you can use debuglater (a fork of pydump). It works with recent versions of Python and has an IPython/Jupyter integration.
If you want to store the entire session, look at dill. It has dump_session and load_session functions.
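A minimal sketch of the dill route (the file name is arbitrary):

import dill

# in the process that is in the strange state:
dill.dump_session("strange_state.pkl")

# later, in a fresh interpreter (ideally the same Python and dill versions):
# import dill
# dill.load_session("strange_state.pkl")
# ...module-level globals are restored and can be inspected interactively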
Here are two other relevant projects:
python-checkpointing2
pycrunch-trace
If you're looking for a language agnostic solution, you want to create a core dump file. Here's an example with Python.
Someone above said that there is no builtin way to perform this, but that's not entirely true. For an example, you could take a look at the Pylons debugging tools. When there is an exception, the exception handler saves the stack trace and prints a URL on the console that can be used to retrieve the debugging session over HTTP.
While they're probably keeping these sessions in memory, they're just python objects, so there's nothing to stop you from pickling a stack dump and restoring it later for inspection. It would mean some changes to the app, but it should be possible...
After some research, it turns out the relevant code is actually coming from Paste's EvalException module. You should be able to look there to figure out what you need.
It's also possible to write something that would dump all the data from the process, e.g.:
a pickler that ignores the objects it can't pickle (replacing them with something else) (e.g. Python: Pickling a dict with some unpicklable items) - a sketch of this idea follows the list;
a method that recursively converts everything into serializable stuff (e.g. this, except it needs a check for infinitely recursing objects and something to do with those; it could also try dir() and getattr() to process some of the unknown objects, e.g. extension classes).
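A rough sketch of the first idea - filtering out values that won't pickle rather than subclassing Pickler (the function name and placeholder text are mine):

import pickle

def dump_picklable(state, path):
    # keep only the values that survive a pickling attempt
    safe = {}
    for key, value in state.items():
        try:
            pickle.dumps(value)
            safe[key] = value
        except Exception:
            safe[key] = "<unpicklable: %s>" % type(value).__name__
    with open(path, "wb") as f:
        pickle.dump(safe, f)

# e.g. capture a module's globals, or locals() from a frame of interest:
# dump_picklable(vars(some_module), "state.pkl")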
But leaving a running process with manhole or pylons or something like that certainly seems more convenient when possible.
(Also, I wonder whether something more convenient has been written since this question was first asked.)
This answer suggests making your program core dump and then continuing execution on another sufficiently similar box.
