I have a few files that are ~64 GB in size that I would like to convert to HDF5 format. What would be the best approach for doing so? Reading line by line seems to take more than 4 hours, so I was thinking of using multiprocessing, but was hoping for some direction on the most efficient way to do this without resorting to Hadoop. Any help would be very much appreciated (and thank you in advance).
For this type of problem I typically turn away from Python. You're right that multiprocessing/parallelization is a good solution, but Python is not pleasant to work with in this area. Consider trying something on the JVM. I like Clojure's core.async, but there are also the peach ("parallel each") and Celluloid libraries for JRuby, which is much closer to Python.
The approach doesn't have to be as "heavy" as Hadoop, but I'd still use a similar map/reduce pattern over the files. Have one thread read line by line from the source file(s) and dispatch work to several other threads. (Using core.async I'd have multiple queues being consumed by different threads, each feeding a "finished" signal back to a watchdog thread.) In the end you should be able to squeeze a lot of performance out of your CPU. A Python sketch of the same pattern follows.
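Since the question is about Python, here's roughly what that producer/consumer pattern could look like without leaving Python. This is a minimal sketch, assuming h5py for output and whitespace-delimited numeric lines; the question names neither the input format nor the output layout, so treat the parser and dataset shape as placeholders:

```python
import multiprocessing as mp

import h5py       # assumed HDF5 library; the question doesn't name one
import numpy as np

CHUNK = 100000    # lines per work item; tune for your data


def parse_chunk(lines):
    # Hypothetical parser: assumes whitespace-separated numeric columns.
    return np.array([[float(x) for x in line.split()] for line in lines])


def chunks(path, size=CHUNK):
    # Producer: read the big file sequentially and yield batches of lines.
    with open(path) as f:
        buf = []
        for line in f:
            buf.append(line)
            if len(buf) == size:
                yield buf
                buf = []
        if buf:
            yield buf


if __name__ == "__main__":
    with mp.Pool() as pool, h5py.File("out.h5", "w") as out:
        dset = None
        # imap keeps results in source order; parsing happens in the workers,
        # while this process only reads lines and writes HDF5.
        for arr in pool.imap(parse_chunk, chunks("input.txt")):
            if dset is None:
                dset = out.create_dataset(
                    "data", shape=(0, arr.shape[1]),
                    maxshape=(None, arr.shape[1]), chunks=True)
            n = dset.shape[0]
            dset.resize(n + arr.shape[0], axis=0)
            dset[n:] = arr
```

The single writer avoids contention on the HDF5 file, which is not safe for concurrent writes by default, while the CPU-bound parsing is spread across all cores.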
I have a somewhat broad question. Does anyone know where I can find good examples of functional test frameworks in Python?
I'm working on functional tests in Python 2.5. I use the standard unittest framework, but maybe there are better options, ideally with support for nice HTML reporting.
Examples of how to structure tests would also be welcome.
Thanks
Unfortunately I don't have that much experience with testing in Python, but I found these links, which seem to contain a lot of useful information:
The sites below list a lot of different test frameworks. One of them, TestOOB, has HTML/XML reporting.
https://pythonhosted.org/testing/
http://pycheesecake.org/wiki/PythonTestingToolsTaxonomy
O'Reilly also has a free book on test-driven Python development, which can be found here:
http://chimera.labs.oreilly.com/books/1234000000754/index.html
This site also has some information and examples of different testing frameworks.
http://docs.python-guide.org/en/latest/writing/tests/
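On the "examples of test structure" part of the question, here's a minimal sketch using the standard unittest module you already have (it runs on old Python 2.x; the class, fixture, and behaviour under test are hypothetical placeholders):

```python
import unittest


class TestSession(unittest.TestCase):
    """One TestCase per feature; one test method per behaviour."""

    def setUp(self):
        # Fixture code runs before every test method.
        self.session = {"user": None}

    def test_login_sets_user(self):
        self.session["user"] = "alice"
        self.assertEqual(self.session["user"], "alice")

    def test_new_session_has_no_user(self):
        self.assertEqual(self.session["user"], None)


if __name__ == "__main__":
    unittest.main()
```

Third-party runners like the ones listed above can usually discover and run plain TestCase classes like this, so structuring tests this way keeps the HTML-reporting choice independent of the tests themselves.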
I'm a Python/web dev who wants to build rich desktop applications. After realizing that both Qt and Kivy want to push their own DSLs on me (not saying that's necessarily a bad thing, I just have an aversion to it), I thought I'd much rather work with the technologies I feel most comfortable with: HTML5/CSS/JS on the front end and a back end driven by something like Tornado or Node.js.
What options would I then have for the container which would run the front end? Everything just looks so bloated and unwieldy.
I've had some success running the Chromium Embedded Framework via its Python bindings. I'd like to experiment with a Gecko-based analogue, though.
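For reference, a minimal sketch of what that looks like with the cefpython3 package (assuming that's the bindings in question; the URL is a placeholder for wherever your Tornado app is serving):

```python
import sys

from cefpython3 import cefpython as cef


def main():
    sys.excepthook = cef.ExceptHook   # shuts down CEF cleanly on an error
    cef.Initialize()
    # Point the embedded Chromium window at the local web back end.
    cef.CreateBrowserSync(url="http://localhost:8888/",   # hypothetical Tornado app
                          window_title="My App")
    cef.MessageLoop()
    cef.Shutdown()


if __name__ == "__main__":
    main()
```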
I need a key-value database, like Redis or memcached, but on disk rather than in memory. After filling the database (which we do regularly, from scratch), I'd actually only need the get operation, but from many different processes (which is why Kyoto Cabinet and LevelDB do not work for me).
I need around 5 million keys and ~10-30 GB of data, which also rules out some other simple databases.
I can't find any information on whether RocksDB can handle multiple read-only clients; it's not straightforward to build on my OS, so I wanted to ask before doing that. If it can't, is there any database that would work? Preferably with an Ubuntu package and Python bindings ;-).
We're just using many, many small files at the moment, but that really hurts: we want easy backups, copying, etc. I also suspect it may cause slowdowns, but that doesn't matter much.
Yes, you should be able to run multiple read-only clients on a single RocksDB database. Just open the database with DB::OpenForReadOnly() call: https://github.com/facebook/rocksdb/blob/master/include/rocksdb/db.h#L108
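If you'd rather stay in Python, a sketch of the same thing with the third-party python-rocksdb bindings (an assumption on my part; the canonical interface is the C++ call above) would look something like this, with each reader process opening the database read-only:

```python
import rocksdb  # third-party python-rocksdb package (an assumption)

# Every reader process can do this against the same on-disk database.
db = rocksdb.DB("/data/kv.db", rocksdb.Options(), read_only=True)
value = db.get(b"some-key")   # returns None if the key is absent
```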
The simplest answer is probably Berkeley DB, and bindings are a part of the stdlib: https://docs.python.org/2/library/anydbm.html
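A minimal sketch with that stdlib module (Python 2, matching the linked docs; the filename and keys are placeholders):

```python
import anydbm

# Fill step: "n" always creates a new, empty database.
# Keys and values must be strings.
db = anydbm.open("store.db", "n")
db["key1"] = "value1"
db.close()

# Afterwards, any number of processes can open it read-only with "r".
db = anydbm.open("store.db", "r")
print db["key1"]
db.close()
```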
My application polls an API repeatedly and spawns processes to parse any new data resulting from these calls, conditionally making an API request based on those data. The speed of that turnaround is critical.
A large bottleneck seems to be the setup of the spawned processes themselves: a few module imports and normal instantiation code, which take up to 0.05 seconds on a middling Amazon setup†. It seems like it would be helpful to have a batch of processes with those imports and that init code already done††, waiting to process results. What is the best approach to creating and communicating with a pool (10-20?) of warm, reusable, and extremely lightweight processes in Python?
† - yes, I know throwing better hardware at the problem will help, and I'll do that too.
†† - yes, I know doing less will help, and I'm working on making the code as streamlined and minimal as possible.
Well, you're in for a learning curve here, but multiprocessing.Pool() will create a pool of however many processes you specify. Use the initializer= argument to specify a function each process will run at startup. Then there are several methods you can use to submit work items to the processes in the pool; read the docs, play with it, and ask questions if you get stuck.
One caution: "extremely lightweight processes" is impossible. By definition, processes are "heavy". How heavy is up to your operating system and has approximately nothing to do with the programming language you use. If you're looking for lightweight, you're looking for threads.
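A minimal sketch of the Pool-with-initializer approach (the imported module and work function are hypothetical stand-ins for your real setup code):

```python
import multiprocessing as mp


def init_worker():
    # One-time setup per worker: the expensive imports/instantiation happen
    # here, once per pool process, instead of once per spawned process.
    global parser
    import json           # stand-in for your heavy module imports
    parser = json.loads


def handle(payload):
    # Runs in a warm worker; `parser` was already set up by init_worker().
    return parser(payload)


if __name__ == "__main__":
    pool = mp.Pool(processes=10, initializer=init_worker)
    print(pool.map(handle, ['{"id": 1}', '{"id": 2}']))
    pool.close()
    pool.join()
```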
We handle huge data streams through our socket servers and need a non-blocking way to manage callbacks in order to prevent race conditions.
I recently came across functional reactive programming, and it seems to be just what we are looking for.
There are examples in Haskell (reactive-banana), ClojureScript, and JavaScript (Bacon.js), but none for Python. Are there any libraries written for Python enabling functional reactive programming? If there aren't any, where would be a good place to start? What are the possible challenges in writing one?
There's an official Microsoft work-in-progress implementation of Rx (Reactive Extensions) for Python called Rx.py.
This project targets Python 3.
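The project's name and API have shifted over the years; under the current RxPY (3.x) naming, a minimal sketch looks like this:

```python
import rx
from rx import operators as ops

# An observable pipeline: filter and transform values as they arrive,
# then react to each result in the subscriber.
rx.of(1, 2, 3, 4).pipe(
    ops.filter(lambda x: x % 2 == 0),
    ops.map(lambda x: x * 10),
).subscribe(lambda value: print(value))
```

In a socket-server setting, the same pipeline shape applies: the source observable wraps incoming messages instead of `rx.of(...)`, and the subscriber replaces ad-hoc callbacks.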
I just checked the Wikipedia article on reactive programming; it mentions three Python modules you could check out:
Trellis
Yoopf
Traits