I am new to Python. I just want to know: is there any module in Python similar to Ruby's drb, where a client can use objects provided by the drb server?
This is generally called "object brokering" and a list of some Python packages in this area can be found by browsing the Object Brokering topic area of the Python Package Index here.
The oldest and most widely used of these is Pyro.
Pyro does what I think you're describing (although I've not used drb).
From the website:
Pyro is short for PYthon Remote Objects. It is an advanced and powerful Distributed Object Technology system written entirely in Python, that is designed to be very easy to use. Never worry about writing network communication code again, when using Pyro you just write your Python objects like you would normally. With only a few lines of extra code, Pyro takes care of the network communication between your objects once you split them over different machines on the network. All the gory socket programming details are taken care of, you just call a method on a remote object as if it were a local object!
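To give a flavour of the API, here is a minimal sketch using the Pyro4 incarnation of the library (I haven't matched it to a specific Pyro version; the class, method, and port-free URI handling here are just illustrative):

```python
# server.py -- expose an object with Pyro4 (a rough sketch)
import Pyro4

@Pyro4.expose
class Greeter(object):
    def hello(self, name):
        return "Hello, %s" % name

daemon = Pyro4.Daemon()            # network server that hosts Pyro objects
uri = daemon.register(Greeter())   # register an instance and get its URI
print("Object available at", uri)
daemon.requestLoop()               # serve remote calls until interrupted
```

```python
# client.py -- call the remote object as if it were local
import Pyro4

uri = input("Paste the URI printed by the server: ")
greeter = Pyro4.Proxy(uri)
print(greeter.hello("world"))
```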
The standard multiprocessing module might do what you want.
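If you want a rough idea of what that looks like, here is a minimal sketch using the manager machinery from multiprocessing to expose an object over the network (the class, port, and authkey are purely illustrative):

```python
# server.py -- serve a shared object to remote clients via a manager (sketch)
from multiprocessing.managers import BaseManager

class Calculator(object):
    def add(self, a, b):
        return a + b

class CalcManager(BaseManager):
    pass

calc = Calculator()
# Clients calling get_calculator() receive a proxy to this one shared instance.
CalcManager.register("get_calculator", callable=lambda: calc)

if __name__ == "__main__":
    manager = CalcManager(address=("", 50000), authkey=b"secret")
    server = manager.get_server()
    server.serve_forever()
```

```python
# client.py -- call methods on the remote object through a proxy
from multiprocessing.managers import BaseManager

class CalcManager(BaseManager):
    pass

CalcManager.register("get_calculator")

if __name__ == "__main__":
    manager = CalcManager(address=("localhost", 50000), authkey=b"secret")
    manager.connect()
    calc = manager.get_calculator()   # proxy object
    print(calc.add(2, 3))             # the method actually runs on the server
```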
I have no idea what drb is, but from the little information you have given,
it might be something like the Perspective Broker in Twisted
Introduction

Suppose you find yourself in control of both ends of the wire: you have two programs that need to talk to each other, and you get to use any protocol you want. If you can think of your problem in terms of objects that need to make method calls on each other, then chances are good that you can use twisted's Perspective Broker protocol rather than trying to shoehorn your needs into something like HTTP, or implementing yet another RPC mechanism.

The Perspective Broker system (abbreviated PB, spawning numerous sandwich-related puns) is based upon a few central concepts:

serialization: taking fairly arbitrary objects and types, turning them into a chunk of bytes, sending them over a wire, then reconstituting them on the other end. By keeping careful track of object ids, the serialized objects can contain references to other objects and the remote copy will still be useful.

remote method calls: doing something to a local object and causing a method to get run on a distant one. The local object is called a RemoteReference, and you do something by running its .callRemote method.
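To make that concrete, here is a minimal Perspective Broker sketch (the port number and method name are just illustrative):

```python
# pb_server.py -- expose a root object whose remote_* methods clients can call
from twisted.spread import pb
from twisted.internet import reactor

class Echoer(pb.Root):
    def remote_echo(self, text):
        # Methods named remote_<name> are reachable via callRemote("<name>", ...)
        return "echo: " + text

reactor.listenTCP(8789, pb.PBServerFactory(Echoer()))
reactor.run()
```

```python
# pb_client.py -- obtain a RemoteReference and invoke a remote method
from twisted.spread import pb
from twisted.internet import reactor

def done(result):
    print(result)
    reactor.stop()

factory = pb.PBClientFactory()
reactor.connectTCP("localhost", 8789, factory)
d = factory.getRootObject()                    # Deferred firing with a RemoteReference
d.addCallback(lambda root: root.callRemote("echo", "hello"))
d.addCallback(done)
reactor.run()
```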
Have you looked at execnet?
http://codespeak.net/execnet/
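execnet runs code in other Python interpreters (local subprocesses or remote ones over SSH) and exchanges data over channels; a minimal sketch:

```python
# Run a snippet in a freshly spawned local interpreter and talk to it over a channel.
import execnet

gw = execnet.makegateway()           # spawn a local subprocess interpreter
channel = gw.remote_exec("""
n = channel.receive()
channel.send(n + 1)
""")
channel.send(41)
print(channel.receive())             # -> 42
gw.exit()
```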
For parallel processing and distributed computing I use parallel python.
Recently I became interested in DIAMETER protocol defined by [RFC 6733][1]. Since I am learning Python, I thought that it might be interesting to see if I could use any DIAMETER Python library. I did find [one][2], but it appears to be no longer maintained. So I got the highly ambitious idea of trying to build one, at least something that is skeletal, that could be extended to have richer DIAMETER signaling capabilities.
Since I had also come across Twisted Matrix a while back, I tried to check its documentation to see if it has support for all the transport types that the DIAMETER protocol can run over, but apart from TCP, UDP (and also TLS), I didn't find any mention of the rest, i.e.
SCTP/IP
SCTP/UDP
DTLS/SCTP
So I was wondering if there is any other library that could be used, or should I expect to have to hand-roll this? Extending Twisted is beyond me at this stage.
[1]: https://www.rfc-editor.org/rfc/rfc6733
[2]: http://i1.dk/PythonDiameter/
I don't know if this one is still supported (last update in December 2014):
http://sourceforge.net/projects/pyprotosim/
It does RADIUS, Diameter, DHCP, LDAP and EAP calculations.
You haven't chosen the easiest protocol. A lot of providers have their own AVPs, and sometimes they even use standard numbers for theirs.
You can also write your own lib to parse DIAMETER, it's not that hard, you just need time (a lot) and good documentation (or experts).
If the one I did had not been developed during my work, I could have shared it, but I can't.
If you were going to roll your own, you can do this with Twisted by using the IFileDescriptor (and related) interface(s). Make an SCTP socket, wrap an IFileDescriptor around it that returns its fileno, then implement IReadDescriptor.doRead to call sctp_recvmsg and IWriteDescriptor.doWrite to call sctp_sendmsg. Now you have an SCTP transport. You can implement it to call methods on whatever SCTP protocol interface is appropriate to the protocol. I don't know enough about SCTP to say what methods that protocol interface should have, unfortunately.
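As a very rough sketch of that idea, assuming you already have a connected SCTP socket from some third-party binding (e.g. pysctp) and a hypothetical protocol object with a dataReceived method, the read side could look something like this:

```python
# Hook an already-connected SCTP socket into the Twisted reactor for reading.
# The socket and protocol objects are assumed to come from elsewhere.
from zope.interface import implementer
from twisted.internet import reactor
from twisted.internet.interfaces import IReadDescriptor

@implementer(IReadDescriptor)
class SCTPReadDescriptor(object):
    def __init__(self, sock, protocol):
        self._sock = sock          # connected SCTP socket
        self._protocol = protocol  # anything with a dataReceived(bytes) method

    def fileno(self):
        return self._sock.fileno()

    def logPrefix(self):
        return "SCTP"

    def doRead(self):
        # Called by the reactor whenever the descriptor becomes readable.
        data = self._sock.recv(65536)
        if data:
            self._protocol.dataReceived(data)

    def connectionLost(self, reason):
        self._sock.close()

# reactor.addReader(SCTPReadDescriptor(sctp_sock, my_protocol))
# reactor.run()
```

The write side would mirror this with an IWriteDescriptor whose doWrite sends queued data.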
I have a Python process serving as a WSGI Apache server. I have many copies of this process running on each of several machines. About 200 megabytes of my process is read-only Python data. I would like to place these data in a memory-mapped segment so that the processes could share a single copy of those data. Best would be to be able to attach to those data so they could be actual Python 2.7 data objects rather than parsing them out of something like pickle or DBM or SQLite.
Does anyone have sample code or pointers to a project that has done this to share?
This post by @modelnine on StackOverflow provides a really great, comprehensive answer to this question. As he mentioned, using threads rather than process-forking in your webserver can significantly lessen the impact of this. I ran into a similar problem trying to share extremely large NumPy arrays between CLI Python processes using some type of shared memory a couple of years ago, and we ended up using a combination of a sharedmem Python extension to share data between the workers (which proved to leak memory in certain cases, but is probably fixable). A read-only mmap() technique might work for you, but I'm not sure how to do that in pure Python (NumPy has a memory-mapping technique explained here). I've never found any clear and simple answers to this question, but hopefully this can point you in some new directions. Let us know what you end up doing!
It's difficult to share actual python objects because they are bound to the process address space. However, if you use mmap, you can create very usable shared objects. I'd create one process to pre-load the data, and the rest could use it. I found quite a good blog post that describes how it can be done: http://blog.schmichael.com/2011/05/15/sharing-python-data-between-processes-using-mmap/
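The core of that mmap approach is only a few lines; here is a minimal read-only sketch (the file name is illustrative, and the data still has to be laid out in some binary format you can parse):

```python
# Map a pre-built data file read-only; every worker process that maps the
# same file shares one physical copy via the OS page cache.
import mmap

with open("shared_data.bin", "rb") as f:
    buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Slices read straight from the mapping without copying the whole file
# into each process's private memory.
header = buf[:16]
```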
Since it's read-only data, you won't need to share any updates between processes (since there won't be any updates), so I propose you just keep a local copy of it in each process.
If memory constraints are an issue, you can have a look at using multiprocessing.Value or multiprocessing.Array without locks for this: https://docs.python.org/2/library/multiprocessing.html#shared-ctypes-objects
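A minimal sketch of the lock-free shared array idea (the type code and values are illustrative):

```python
# Share an array of doubles between a parent and its workers without a lock
# (safe here only because nothing writes to it after the workers start).
from multiprocessing import Process, Array

def worker(shared):
    # The child sees the same shared-memory buffer, not a pickled copy.
    print(shared[0], shared[1], shared[2])

if __name__ == "__main__":
    data = Array("d", [3.14, 2.71, 1.41], lock=False)
    workers = [Process(target=worker, args=(data,)) for _ in range(2)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
```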
Other than that, you'll have to rely on an external process and some serialising to get this done; I'd have a look at Redis or Memcached if I were you.
One possibility is to create a C or C++ extension that provides a Pythonic interface to your shared data. You could memory-map 200MB of raw data, and then have the C or C++ extension provide it to the WSGI service. That is, you could have regular (unshared) Python objects implemented in C, which fetch data from some kind of binary format in shared memory. I know this isn't exactly what you wanted, but this way the data would at least appear Pythonic to the WSGI app.
However, if your data consists of many, many very small objects, then it becomes important that even the "entrypoints" are located in the shared memory (otherwise they will waste too much memory). That is, you'd have to make sure that the PyObject* pointers that make up the interface to your data actually themselves point to the shared memory. I.e., the Python objects themselves would have to be in shared memory. As far as I can tell from the official docs, this isn't really supported. However, you could always try "handcrafting" Python objects in shared memory and see if it works. I'm guessing it would work, until the Python interpreter tries to free the memory. But in your case it won't, since it's long-lived and read-only.
I am currently working on a control system for Arduino-type devices using Twisted, and have a bit of a design issue.
Here is how things are currently: (sorry in advance, might be a bit long)
To handle different types of devices (each having different firmware and a different communication protocol), I have designed a "driver" system:
Each driver is made of:
a "hardware handler" class: a wrapper around Twisted's serial class with a few added helper methods
a custom serial protocol
While implementing drivers for RepRap 3D printers (also based on Arduino, also using a serial connection) with rather specific protocols (i.e. enqueue point, set temperature, etc.), I have started to wonder if I am placing the methods for handling those features (each having specific commands) in the right place.
This all leads me to my questions:
I am not quite sure about good practices as far as Twisted protocols go, but having looked through the documentation/code of quite a few of them, it seems they tend to have relatively few methods.
Is this always the case? Should the protocol only be used for very low-level functions and in/out formatting and communication?
Certain devices I want to manage have very clearly defined protocols (Makerbot etc.); should I consider general protocol specifications to be a different thing than the actual Twisted protocol classes I am creating?
Any advice, tips and pointers are more than welcome!
Thanks in advance.
I'll try my best at answering a, well, quite general question.
1) The interface that makes up a Twisted protocol has only 4 methods:
http://twistedmatrix.com/documents/11.0.0/api/twisted.internet.interfaces.IProtocol.html
So this will be where all the interaction between your protocol implementations and Twisted happens.
2) Besides instances of the protocol, there is of course the factory, which produces instances of your protocol (one for each new connection). So, for example, stuff that should be available to all connections (e.g. the current number of connected clients) naturally resides there.
3) Of course it might make sense to build small class hierarchies, where you derive from Protocol, implement stuff that is shared by all your subprotocols, and then only implement the subprotocol specifics again in a derived class.
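As a rough illustration of point 3 (the framing, commands, and class names here are just made up for the example):

```python
# A sketch of a small protocol hierarchy: shared line framing in a base class,
# device-specific commands and parsing in subclasses.
from twisted.protocols.basic import LineReceiver

class BaseDeviceProtocol(LineReceiver):
    """Low-level framing and bookkeeping shared by all device drivers."""

    def lineReceived(self, line):
        # Delegate device-specific parsing to the subclass.
        self.handle_response(line)

    def handle_response(self, line):
        raise NotImplementedError


class ReprapProtocol(BaseDeviceProtocol):
    """Adds RepRap-style higher-level commands on top of the base framing."""

    def set_temperature(self, celsius):
        self.sendLine(b"M104 S%d" % celsius)   # G-code-style command

    def handle_response(self, line):
        if line.startswith(b"ok"):
            pass  # command acknowledged; a real driver would dequeue the next one
```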
My question simply relates to the difference in performance between a socket in C and in Python. Since my Python build is CPython, I assume it's similar, but I'm curious if someone actually has "real" benchmarks, or at least an opinion that's evidence based.
My logic is as follows:

C socket much faster? Then write a C extension.
Not/barely a difference? Keep writing in Python and figure out how to obtain packet-level control (scapy? dpkt?).
I'm sure someone will want to know for either context or curiosity. I plan to build a sort of proxy for myself (not for internet browsing, anonymity, etc) and will bind the application I want to use with it to a specific port. Then, all packets on said port will be queued, address header modified, and then sent, etc, etc.
Thanks in advance.
In general, sockets in Python perform just fine. For example, the reference implementation of the BitTorrent tracker server is written in Python.
When doing networking operations, the speed of the network is usually the limiting factor. That is, any possible tiny difference in speed between C and Python's socket code is completely overshadowed by the fact that you're doing networking of some kind.
However, your description of what you want to do indicates that you want to inspect and modify individual IP packets. This is beyond the capabilities of Python's standard networking libraries, and is in any case a very OS-dependent operation. Rather than asking "which is faster?" you will need to first ask "is this possible?"
I would think C would be faster, but Python would be a lot easier to manage and use.
The difference would be so small that you wouldn't need it unless you were trying to send massive amounts of data (something silly like 1 million GB/second).
I'm writing a reasonably complex web application. The Python backend runs an algorithm whose state depends on data stored in several interrelated database tables which does not change often, plus user specific data which does change often. The algorithm's per-user state undergoes many small changes as a user works with the application. This algorithm is used often during each user's work to make certain important decisions.
For performance reasons, re-initializing the state on every request from the (semi-normalized) database data quickly becomes non-feasible. It would be highly preferable, for example, to cache the state's Python object in some way so that it can simply be used and/or updated whenever necessary. However, since this is a web application, there are several processes serving requests, so using a global variable is out of the question.
I've tried serializing the relevant object (via pickle) and saving the serialized data to the DB, and am now experimenting with caching the serialized data via memcached. However, this still has the significant overhead of serializing and deserializing the object often.
I've looked at shared memory solutions but the only relevant thing I've found is POSH. However POSH doesn't seem to be widely used and I don't feel easy integrating such an experimental component into my application.
I need some advice! This is my first shot at developing a web application, so I'm hoping this is a common enough issue that there are well-known solutions to such problems. At this point solutions which assume the Python back-end is running on a single server would be sufficient, but extra points for solutions which scale to multiple servers as well :)
Notes:
I have this application working, currently live and with active users. I started out without doing any premature optimization, and then optimized as needed. I've done the measuring and testing to make sure the above-mentioned issue is the actual bottleneck. I'm pretty sure I could squeeze more performance out of the current setup, but I wanted to ask if there's a better way.
The setup itself is still a work in progress; assume that the system's architecture can be whatever suits your solution.
Be cautious of premature optimization.
Addition: The "Python backend runs an algorithm whose state..." is the session in the web framework. That's it. Let the Django framework maintain session state in cache. Period.
"The algorithm's per-user state undergoes many small changes as a user works with the application." Most web frameworks offer a cached session object. Often it is very high performance. See Django's session documentation for this.
Advice. [Revised]
It appears you have something that works. Leverage it to learn your framework, learn the tools, and learn what knobs you can turn without breaking a sweat. Specifically, using session state.
Second, fiddle with caching, session management, and things that are easy to adjust, and see if you have enough speed. Find out whether the MySQL socket or named pipe is faster by trying them out. These are the no-programming optimizations.
Third, measure performance to find your actual bottleneck. Be prepared to provide (and defend) the measurements as fine-grained enough to be useful and stable enough to providing meaningful comparison of alternatives.
For example, show the performance difference between persistent sessions and cached sessions.
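For instance, with Django the cached-session approach might look something like this sketch (the view and the state kept in it are hypothetical placeholders):

```python
# settings.py -- store session data in the cache backend instead of the database
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
```

```python
# views.py -- a hypothetical view keeping per-user algorithm state in the session
from django.http import JsonResponse

def make_decision(request):
    state = request.session.get("algo_state")
    if state is None:
        state = {"counter": 0}            # placeholder for the real initial state
    state["counter"] += 1                 # placeholder for a real state change
    request.session["algo_state"] = state # reassign so the change is persisted
    return JsonResponse({"decision": state["counter"]})
```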
I think that the multiprocessing framework has what might be applicable here - namely the shared ctypes module.
Multiprocessing is fairly new to Python, so it might have some oddities. I am not quite sure whether the solution works with processes not spawned via multiprocessing.
I think you can give ZODB a shot.
"A major feature of ZODB is transparency. You do not need to write any code to explicitly read or write your objects to or from a database. You just put your persistent objects into a container that works just like a Python dictionary. Everything inside this dictionary is saved in the database. This dictionary is said to be the "root" of the database. It's like a magic bag; any Python object that you put inside it becomes persistent."
Initially it was an integral part of Zope, but lately a standalone package is also available.
It has the following limitation:
"Actually there are a few restrictions on what you can store in the ZODB. You can store any objects that can be "pickled" into a standard, cross-platform serial format. Objects like lists, dictionaries, and numbers can be pickled. Objects like files, sockets, and Python code objects, cannot be stored in the database because they cannot be pickled."
I have read it but haven't given it a shot myself though.
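A minimal sketch of that dictionary-style usage (the file name and keys are illustrative):

```python
# Persist plain Python objects by putting them in the ZODB root mapping.
from ZODB import DB
from ZODB.FileStorage import FileStorage
import transaction

storage = FileStorage("appdata.fs")    # on-disk storage file
db = DB(storage)
connection = db.open()
root = connection.root()               # behaves like a persistent dict

root["per_user_state"] = {"alice": {"score": 42}}
transaction.commit()                   # changes are transactional

connection.close()
db.close()
```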
Another possibility could be an in-memory SQLite db, which may speed up the process a bit (being an in-memory db), but you would still have to do the serialization stuff and all.
Note: an in-memory db is expensive on resources.
Here is a link: http://www.zope.org/Documentation/Articles/ZODB1
First of all, your approach is not a common web development practice. Even when multi-threading is used, web applications are designed to be able to run in multi-processing environments, for both scalability and easier deployment.
If you just need to initialize a large object and do not need to change it later, you can do it easily by using a global variable that is initialized while your WSGI application is being created, or when the module that contains the object is being loaded, etc.; multi-processing will do fine for you.
If you need to change the object and access it from every thread, you need to be sure your object is thread-safe; use locks to ensure that. And use a single server context, i.e. a single process. Any multi-threading Python server will serve you well; FCGI is also a good choice for this kind of design.
But if multiple threads are accessing and changing your object, the locks may have a really bad effect on your performance gain, which is likely to make all the benefits go away.
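A minimal sketch of the lock-guarded shared object described above (names are illustrative):

```python
# One shared, in-process object guarded by a lock, usable from any request
# thread of a single multi-threaded server process.
import threading

class SharedState(object):
    def __init__(self, initial=None):
        self._lock = threading.Lock()
        self._data = dict(initial or {})

    def update(self, key, value):
        with self._lock:               # serialize writes across request threads
            self._data[key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

# Created once at module import; every request thread uses the same instance.
shared_state = SharedState()
```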
This is Durus, a persistent object system for applications written in the Python programming language. Durus offers an easy way to use and maintain a consistent collection of object instances used by one or more processes. Access and change of persistent instances is managed through a cached Connection instance which includes commit() and abort() methods so that changes are transactional.
http://www.mems-exchange.org/software/durus/
I've used it before in some research code, where I wanted to persist the results of certain computations. I eventually switched to pytables as it met my needs better.
Another option is to review the requirement for state; it sounds like if the serialisation is the bottleneck then the object is very large. Do you really need an object that large?
I know that in Stack Overflow podcast 27 the Reddit guys discuss what they use for state, so that may be useful to listen to.