I've read here about importing a module in python. There is an option to not import a whole module (e.g. sys) and to only import a part of it (e.g. sys.argv). Is that possible in C? Can I include only the implementation of printf or any other function instead of the whole stdio.h library?
I ask this because it seems very inefficient to include a whole file where I need only several lines of code.
I understand that there is a possibility that including only the function itself won't work because it depends on other functions, other includes, defines, and globals. I only ask in order to use this for whole code blocks that contain all the data that are needed in order to execute.
C does not have anything that is equivalent to, or even similar to Python's "from ... import" mechanism.
I ask this because it seems very inefficient to include a whole file where I need only several lines of code.
Actually, what normally happens when you #include a file is that you import the declarations for macros, or functions declared somewhere else. You don't import any executable code ... so the "unnecessary" inclusions have ZERO impact on runtime code size or efficiency.
If you use (i.e. "call") a macro, then that causes the macro body to expanded, which adds to the executable code size.
If you call a function whose declaration you have included, that will add the code ... for the call statement itself. The function does not expanded though. Instead, an "external reference" is added to your ".o" file, which the loader resolves when you create the executable from the ".o" files and the dependent libraries.
Python: "There is an option to not import a whole module" : I think you misunderstand what is going on here. When you specify the names to import, it means that only those names go into you namespace. The "whole" module is compiled, and any code outside functions is run, even when you specify just one name.
C: I am going to assume that you are using an operating system like UNIX/Linux/OS X or Windows (the following does not apply to embedded systems).
The closest C has to import is dynamic runtime linking. That is not part of standard C, it is defined by the operating system. So POSIX has one mechanism and Windows has another. Most people call these library files "DLLs", but strictly speaking that is a Microsoft term, they are "shared objects" (.so) on UNIX type systems.
When a process attaches to a DLL or .so then it is "mapped" into the virtual memory of the process. The detail here varies between operating systems, but essentially the code is split into "pages", the size of which varies, but 4kb for 32-bit systems and 16kb for 64-bit is typical. Only those pages that are required are loaded into memory. When a page is required then a so-called "page-fault" occurs and the operating system will get the page from either the executable file or the swap area (depending on the OS).
One of the advantages of this mechanism is that code pages can be shared between processes. So if you have 50 processes all using the same DLL (like the C run-time library, for example), then only one copy is actually loaded into memory. They all share the one set of pages (they can because they are read-only).
There is no sharing mechanism like that in Python - unless the module is itself written in C and is a DLL (.pyd).
All this occurs without the knowledge of the program.
EDIT: looking at other's answers I realise you might be thinking of the #include pre-processor directive to merge a header file into the source code. Assuming these are standard header files, then they make no difference to the size of your executable, they should be "idempotent". That is, they only contain information of use by the pre-processor, compiler, or linker. If there are definitions in the header file that are not used there should be no side effect.
Linking libraries (-l directive to the compiler) that are not used will make the executable larger, which makes the page tables larger, but aside from that if they are not used then they shouldn't make any significant difference. That is because of the on-demand page-loading described above (the concept was invented in the 1960s in Manchester UK).
Related
So you've got some legacy code lying around in a fairly hefty project. How can you find and delete dead functions?
I've seen these two references: Find unused code and Tool to find unused functions in php project, but they seem specific to C# and PHP, respectively.
Is there a Python tool that'll help you find functions that aren't referenced anywhere else in the source code (notwithstanding reflection/etc.)?
In Python you can find unused code by using dynamic or static code analyzers. Two examples for dynamic analyzers are coverage and figleaf. They have the drawback that you have to run all possible branches of your code in order to find unused parts, but they also have the advantage that you get very reliable results.
Alternatively, you can use static code analyzers that just look at your code, but don't actually run it. They run much faster, but due to Python's dynamic nature the results may contain false positives.
Two tools in this category are pyflakes and vulture. Pyflakes finds unused imports and unused local variables. Vulture finds all kinds of unused and unreachable code. (Full disclosure: I'm the maintainer of Vulture.)
The tools are available in the Python Package Index https://pypi.org/.
I'm not sure if this is helpful, but you might try using the coverage, figleaf or other similar modules, which record which parts of your source code is used as you actually run your scripts/application.
Because of the fairly strict way python code is presented, would it be that hard to build a list of functions based on a regex looking for def function_name(..) ?
And then search for each name and tot up how many times it features in the code. It wouldn't naturally take comments into account but as long as you're having a look at functions with less than two or three instances...
It's a bit Spartan but it sounds like a nice sleepy-weekend task =)
unless you know that your code uses reflection, as you said, I would go for a trivial grep. Do not underestimate the power of the asterisk in vim as well (performs a search of the word you have under your cursor in the file), albeit this is limited only to the file you are currently editing.
Another solution you could implement is to have a very good testsuite (seldomly happens, unfortunately) and then wrap the routine with a deprecation routine. if you get the deprecation output, it means that the routine was called, so it's still used somewhere. This works even for reflection behavior, but of course you can never be sure if you don't trigger the situation when your routine call is performed.
its not only searching function names, but also all the imported packages not in use.
you need to search the code for all the imported packages (including aliases) and search used functions, then create a list of the specific imports from each package (example instead of import os, replace with from os import listdir, getcwd,......)
I have an application that dynamically generates a lot of Python modules with class factories to eliminate a lot of redundant boilerplate that makes the code hard to debug across similar implementations and it works well except that the dynamic generation of the classes across the modules (hundreds of them) takes more time to load than simply importing from a file. So I would like to find a way to save the modules to a file after generation (unless reset) then load from those files to cut down on bootstrap time for the platform.
Does anyone know how I can save/export auto-generated Python modules to a file for re-import later. I already know that pickling and exporting as a JSON object won't work because they make use of thread locks and other dynamic state variables and the classes must be defined before they can be pickled. I need to save the actual class definitions, not instances. The classes are defined with the type() function.
If you have ideas of knowledge on how to do this I would really appreciate your input.
You’re basically asking how to write a compiler whose input is a module object and whose output is a .pyc file. (One plausible strategy is of course to generate a .py and then byte-compile that in the usual fashion; the following could even be adapted to do so.) It’s fairly easy to do this for simple cases: the .pyc format is very simple (but note the comments there), and the marshal module does all of the heavy lifting for it. One point of warning that might be obvious: if you’ve already evaluated, say, os.getcwd() when you generate the code, that’s not at all the same as evaluating it when loading it in a new process.
The “only” other task is constructing the code objects for the module and each class: this requires concatenating a large number of boring values from the dis module, and will fail if any object encountered is non-trivial. These might be global/static variables/constants or default argument values: if you can alter your generator to produce modules directly, you can probably wrap all of these (along with anything else you want to defer) in function calls by compiling something like
my_global=(lambda: open(os.devnull,'w'))()
so that you actually emit the function and then a call to it. If you can’t so alter it, you’ll have to have rules to recognize values that need to be constructed in this fashion so that you can replace them with such calls.
Another detail that may be important is closures: if your generator uses local functions/classes, you’ll need to create the cell objects, perhaps via “fake” closures of your own:
def cell(x): return (lambda: x).__closure__[0]
I have a C++ program (in fact it is a dll) which dynamically links to another shared library (python dll), this program has two occasions to be used.
In occasion A, the program will make function calls to that dynamically linked shared library while in occasion B, the program will not.
My question is if I build the program specially for occasion B without linking to the shared library, will I gain performance comparing to the case where I link the shared library without actually using it?
It's really dependent on several factors: What OS, what shared library, and what the application actually does. Possibly also how the shared library is built.
In general, it's not a particularly large penalty, since shared libraries are demand-loaded and use position independent addressing [PIC] (PC-relative, and similar). What this means is that the shared library is being loaded only when it actually is used, and that there's "no work" to load the library. This is something that OS designers and system architects think a lot about, because for many applications that are performance sensitive (for example compilers or web-services), a badly designed shared library system will make performance BAD.
It is of course possible to configure when building the shared library. At least the use of PIC aspect of this, so if the person/company configuring the build of the shared library "wants to", it could be badly configured and worse than zero effect.
To this you have to add any initialization that the shared library does. Well designed shared libraries does "on demand" or "lazy" initialization, in other words, doesn't do much initialization until it's actually required. Again, the details of exactly which library - including how that was configured when it was built - can make a huge difference here.
The only REAL way to tell, in any particular use-case, is to build "with" and "without" the extra shared library, and measure the actual performance.
I have a reporting function on my Web Application that has been subjected to Feature Creep -- instead of providing a PDF, it must now provide a .zip file that contains a variety of documents.
Generating the documents is fine. Adding documents to the Zipfile is the issue.
Until now, the various documents to be added to the archive have existed as a mix of cStringIO, StringIO or tempfile.SpooledTemporaryFile objects.
Digging into the zipfile library docs, it appears that the module's [write][1] function will only work with Strings or paths to physical files on the machine; it does not work with file-like objects.
Just to be clear: zipfile can read/write from a file-like object as the archive itself (zipfile.ZipFile), however when adding an element to the archive, the library only supports a pathname or raw string data.
I found an online blog posting suggesting a possible workaround, but I'm not eager to use it on my production machine. http://swl10.blogspot.com/2012/12/writing-stream-to-zipfile-in-python.html
Does anyone have other strategies for handling this? It looks like I have to either save everything to disk and take a hit on I/O , or handle everything as a string and take a hit on memory. Neither are ideal.
Use the solution you are referring to (using monkeypathing).
Regarding your concerns about monkeypathing thing not sounding solid enough: Do elaborate on how can be monkeypatched method used from other places.
Hooking in Python is no special magic. It means, that someone assigns alternative value/function to something, what is already defined. This has to be done by a line of code and has very limited scope.
In the example from the blog is the scope of monkeypatched os functions just the class ZipHooks.
Do not be afraid, that it would leak somewhere else without your knowledge or break complete system. Even other packages, importing your module with ZipHooks class would not have access to pathed stat and open unless they would use ZipHooks class or explicitly call stat_hook or open_hook from your package.
In compiled languages, the source code is turned into object code by the compiler and the different object files (if there are multiple files) are linked by the linker and loaded into the memory by the loader for execution.
If I have an application written using an interpreted language (for eg., ruby or python) and if the source code is split across files, when exactly are the files brought together. To put it other words when is the linking done? Do interpreted languages have Linkers and Loaders in the first place or the interpreter does everything?
I am really confused about this and not able to get my head around it!! Can anyone shine some light on this?!
An interpreted language is more or less a large configuration for an executable that is called interpreter. That executable (e. g. /usr/bin/python) is the program which actually runs. It then reads the script it shall execute (e. g. /home/alfe/bin/factorial.py) and executes it, in the simplest form line-by-line.
During that process it can encounter references to other files (other modules, e. g. /usr/python/lib/math.py) and then it will read and interpret those.
Many such languages have mechanisms built in to reduce the overhead of this process by creating byte-code versions of the scripts they interpreted. So there might well be a file /usr/python/lib/math.pyc for instance, which the interpreter put there after first processing and which it can faster read and interpret than the original /usr/python/lib/math.py. But this is not really part of the concept of interpreted languages¹.
Sometimes, a binary library is part of an interpreted language; depending on the sophistication of the interpreter it can link that library at runtime and then use it. This is most typical for the system modules and stuff which needs to be highly optimized.
But in general one can say that no binary machine code gets generated at all. And nothing is linked at the compile time. Actually, there is no real compile time, even though one could call that first processing of the input scripts a compile step.
Footnotes:
¹) The concept of interpreting scripts does encompass neither that "compiling" (pre-translating of the source into a faster-to-interpret form) nor that "caching" of this form by storing files like the .pyc files. WRT to your question concerning linking and splitting programs into several files or modules, these aspects of precompiling and caching are just technical details to speed up things. The concept itself is: read one line of the input script & execute it. Then read the next line and so on.
Well, in Python, modules are loaded and executed or parsed when the interpreter finds some method or indication to do so. There's no linking but there is loading of course (when the file is requested in the code).
Python do something clever to improve its performance. It compiles to bytecode (.pyc files) the first time it executes a file. This improves substantially the execution of the code next time the module is imported or executed.
So the behavior is more or less:
A file is executed
Inside the file, the interpreter finds a reference to another file
It parses it and potentially execute it. This means that every class, variable or method definition will become available in the runtime.
And this is how the process is done (very general). Of course, there are optimizations and caches to improve the performance.
Hope this helps!