Shared library loading time too long - python

I'm loading a shared library (*.so) with ctypes, but the load time is very long, which makes everything slow.
What technique can I use to improve performance?
My module always runs from the prompt and handles one command at a time.
$ ./myrunlib.py fileQuestion fileAnswer
# again
$ ./myrunlib.py fileQuestion fileAnswer
code:
from ctypes import *
drv = cdll.LoadLibrary('/usr/lib/libXPTO.so')

Either you've got a strange bug which makes your library load extremely slowly when used by a Python program (which I find rather unlikely), or the loading simply takes the time it takes (perhaps because the library does a large initialization task upon being loaded).
In the latter case your only option seems to be to avoid restarting your Python program. Let it run in a loop which reads all tasks from stdin (or any other pipe or socket, or maybe even from job files) instead of from the command line, as sketched below.
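A minimal sketch of that idea, assuming each stdin line carries a question file and an answer file, and that process_files is a hypothetical stand-in for whatever your script currently does with them:

#!/usr/bin/env python
import sys
from ctypes import cdll

drv = cdll.LoadLibrary('/usr/lib/libXPTO.so')  # loaded only once per process

def process_files(question_file, answer_file):
    # hypothetical placeholder: call into drv here, as your current script does
    pass

for line in sys.stdin:
    parts = line.split()
    if len(parts) == 2:
        process_files(parts[0], parts[1])

This way the .so is loaded once and every subsequent task only pays for the work itself.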

Related

Is it possible to initialise a module before running a python program?

I wrote a Python program which uses a module (pytesseract, specifically), and I notice it takes a few seconds to import the module when I run it. I am wondering if there is a way to initialise the module before running the main program in order to cut the duration of the actual program by a few seconds. Any suggestions?
One possible solution for slow startup time would be to split your program into two parts: one part that is always running as a daemon or service, and another that communicates with it to process individual tasks.
As a quick answer without more info, pytesseract also imports (if they are installed) PIL, numpy, and pandas. If you don't need these, you could uninstall them to reduce load time.
I presume that you need to start your application multiple times with different arguments and you don't want to waste time on imports every time, right?
You can wrap the actual code in a while True: loop and use input() to get new arguments, or read the arguments from a file, as in the sketch below.
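A rough sketch of that loop, assuming run_ocr stands in for whatever your program actually does with pytesseract:

import pytesseract            # slow import, paid only once
from PIL import Image

def run_ocr(path):
    # placeholder for the actual work
    print(pytesseract.image_to_string(Image.open(path)))

while True:
    arg = input('image path (blank to quit): ').strip()
    if not arg:
        break
    run_ocr(arg)

The expensive imports happen once, and each new argument is processed without restarting the interpreter.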

How do I speed up repeated calls to a ruby program (github's linguist) from python?

I'm using github's linguist to identify unknown source code files. Running this from the command line after a gem install github-linguist is insanely slow. I'm using python's subprocess module to make a command-line call on a stock Ubuntu 14 installation.
Running against an empty file: linguist __init__.py takes about 2 seconds (similar results for other files). I assume this is completely from the startup time of Ruby. As @MartinKonecny points out, it seems that it is the linguist program itself.
Is there some way to speed this process up -- or a way to bundle the calls together?
One possibility is to just adapt the linguist program (https://github.com/github/linguist/blob/master/bin/linguist) to take multiple paths on the command-line. It requires mucking with a bit of Ruby, sure, but it would make it possible to pass multiple files without the startup overhead of Linguist each time.
A script this simple could suffice:
require 'linguist/file_blob'

ARGV.each do |path|
  blob = Linguist::FileBlob.new(path, Dir.pwd)
  # print out blob.name, blob.language, blob.sloc, etc.
end
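From the Python side you could then hand a whole batch of files to that script in a single subprocess call, something like this (batch_linguist.rb is just a hypothetical name for the script above):

import subprocess

files = ['__init__.py', 'main.rb', 'util.c']  # example batch
output = subprocess.check_output(['ruby', 'batch_linguist.rb'] + files)
print(output.decode())

so the Ruby startup cost is paid once per batch instead of once per file.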

Optimal way of matlab to python communication

So I am working on a Matlab application that has to do some communication with a Python script. The script that is called is a simple piece of client software. As a side note, if it were possible to have a Matlab client and a Python server communicating, that would solve this issue completely, but I haven't found a way to make that work.
Anyhow, after searching the web I have found two ways to call Python scripts: either with the system() command or by editing the perl.m file to call Python scripts instead. Both ways are too slow, though (tic/toc puts them at > 20 ms, while the call must run in under 6 ms), as this call sits in a loop that is very time sensitive.
As a solution I figured I could instead save a file at a certain location and have my Python script continuously check for this file, executing the command when it finds it. After timing each of these steps and summing them up, I found this to be very much faster (almost 100x, so certainly fast enough), and I can't really believe that; or rather, I can't understand why calling Python scripts is so slow (not that I have more than a superficial knowledge of the subject). I also find this solution really messy and ugly, so I just wanted to check: first, is it a good idea, and second, is there a better one?
Finally, I realize that Python's time.time() and Matlab's tic/toc might not be precise enough to measure time correctly at that scale, which is another reason I ask.
Spinning up new instances of the Python interpreter takes a while. If you spin up the interpreter once, and reuse it, this cost is paid only once, rather than for every run.
This is normal (expected) behaviour, since startup includes large numbers of allocations and imports. For example, on my machine, the startup time is:
$ time python -c 'import sys'
real 0m0.034s
user 0m0.022s
sys 0m0.011s
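If you keep one Python process running as a small server, Matlab only pays the cost of a local socket round trip per request instead of a fresh interpreter startup. A minimal sketch under that assumption (handle_request is a placeholder for your actual client logic, and the port is arbitrary):

import socket

def handle_request(payload):
    # placeholder: run the real work here and return a reply
    return b'done: ' + payload

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(('127.0.0.1', 5005))
server.listen(1)

while True:
    conn, _ = server.accept()
    data = conn.recv(4096)
    conn.sendall(handle_request(data))
    conn.close()

On the Matlab side, a TCP client (e.g. tcpclient('127.0.0.1', 5005)) can send the request and read the reply, which should comfortably beat spawning a new interpreter on every iteration.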

Speed up feedparser

I'm using feedparser to print the top 5 Google news titles. I get all the information from the URL the same way as always.
import feedparser as fp

x = 'https://news.google.com/news/feeds?pz=1&cf=all&ned=us&hl=en&topic=t&output=rss'
feed = fp.parse(x)
My problem is that I'm running this script when I start a shell, so that ~2 second lag gets quite annoying. Is this time delay primarily from communications through the network, or is it from parsing the file?
If it's from parsing the file, is there a way to only take what I need (since that is very minimal in this case)?
If it's from the former possibility, is there any way to speed this process up?
I suppose that a few delays are adding up:
The Python interpreter needs a while to start and import the module
Network communication takes a bit
Parsing probably consumes only a little time, but it still contributes
I think there is no straightforward way of speeding things up, especially not the first point. My suggestion is that you have your feeds downloaded on a regular basis (you could set up a cron job or write a Python daemon) and stored somewhere on disk (e.g. a plain text file), so you just need to display them at your terminal's startup (echo would probably be the easiest and fastest).
I have personally had good experiences with feedparser. I use it to download ~100 feeds every half hour with a Python daemon.
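A minimal sketch of the fetch-and-cache part (to be run from cron or a daemon; the cache path is just an example):

import feedparser

URL = 'https://news.google.com/news/feeds?pz=1&cf=all&ned=us&hl=en&topic=t&output=rss'
CACHE = '/tmp/google_news_titles.txt'  # example location, adjust as needed

feed = feedparser.parse(URL)
with open(CACHE, 'w') as f:
    for entry in feed.entries[:5]:
        f.write(entry.title + '\n')

Your shell startup then only needs to cat (or echo) that file, which is effectively instant.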
Parsing in real time is not the best approach if you want faster results.
You can try doing it asynchronously with Celery or a similar solution. I like Celery; it offers many capabilities, such as scheduled (cron-like) tasks, asynchronous tasks, and more.

same python interpreter instance running multiple scripts simultaneously?

6-7 years ago I saw an initiative for running Python in a tight-resources environment by starting the interpreter only once while allowing several scripts to use it at the same time.
The idea was both to save the interpreter startup overhead and to save RAM.
Does something like that exist?
This question,
Python: Execute multiple Scripts simultaneously from same Interpreter
doesn't address concurrency; at least, the answers were about sequential running, but I need simultaneous execution :)
Ideas?
Yes and no. Python itself uses a Global Interpreter Lock (GIL), which you can read a lot about, if you care to. To make a long story short, however, it ensures the interpreter is basically single-threaded. You can create (and run) more than one thread in your Python program, but when/if they use the Python interpreter, only one can do so at a time. If, however, you have threads running mostly code from something like SciPy or NumPy (which is native code that doesn't get interpreted) then you can run several concurrently.
Most operating systems, however, have a Copy On Write mechanism for process memory pages, which means that (as long as the code isn't modified) most of the code used by the interpreter will be shared without any extra work on your part (or the interpreter's) at all. IOW, when you run two or more copies of the interpreter, the second and subsequent will share most of the memory (at least for executable code) with the first, so resource usage will not rise (anywhere close to) linearly as you run more instances. Startup time will also be substantially reduced -- the OS has to create a new page table mapping the memory pages to the new process, but does not need to reread those pages from disk or anything like that.
Python supports threading via the thread and threading modules (one is low-level, the other high-level; in Python 3 the low-level module is called _thread).
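For completeness, a tiny example with the high-level threading module, just to show several pieces of Python code running inside the same interpreter process:

import threading

def worker(name):
    # each thread runs this function concurrently within one interpreter
    print('running in', name)

threads = [threading.Thread(target=worker, args=('thread-%d' % i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

Keep in mind the GIL caveat from above: pure-Python bytecode in these threads does not execute in parallel, although threads blocked in I/O or in native code can.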
