Support for POSIX openat functions in python


There is a patch to add support for the POSIX openat functions (and other *at functions like fstatat) to the Python standard library. It is marked as closed with resolution "fixed", but the os, posix and platform modules do not currently include any of these functions.
These functions are the standard way of solving problems like this in C and other languages efficiently and without race conditions.
Are these included somewhere in the standard library currently? And if not, are there plans to include them in the future?

Yes, this is supported by passing the dir_fd argument to various functions in the standard os module. See for example os.open():
Open the file path and set various flags [...]
This function can support paths relative to directory descriptors with the dir_fd parameter.
If you want to use high-level file objects such as those returned by the builtin open() function, that function's documentation provides example code showing how to do this using the opener parameter to that function. Note that open() and os.open() are entirely different functions and should not be confused. Alternatively, you could open the file with os.open() and then pass the file descriptor number to os.fdopen() or to open().
It should also be noted that this currently only works on Unix; the portable and future-proof way to check for dir_fd support is to write code such as the following:
import os

if os.open in os.supports_dir_fd:
    pass  # Use dir_fd.
else:
    pass  # Don't.
On the other hand, I'm not entirely sure Windows even allows opening a directory in the first place. You certainly can't do it with _open()/_wopen(), which are documented to fail if "the given path is a directory." To be safe, I recommend only trying to open the directory after you check for dir_fd support.
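As a minimal sketch of the approach described above, the following combines the supports_dir_fd check with the opener parameter of the builtin open(). The helper name read_relative and the file name hello.txt are made up for illustration, and the fallback branch is only an assumption about what a caller might want on platforms without dir_fd support (it is not race-free).

```python
import os
import tempfile


def read_relative(dir_path, name):
    """Read a text file by name, relative to an open directory descriptor."""
    if os.open not in os.supports_dir_fd:
        # Hypothetical fallback for platforms without dir_fd support.
        with open(os.path.join(dir_path, name)) as f:
            return f.read()

    # Open the directory itself; later lookups are relative to this descriptor.
    dir_fd = os.open(dir_path, os.O_RDONLY)
    try:
        def opener(path, flags):
            # Called by open(); returns a raw file descriptor.
            return os.open(path, flags, dir_fd=dir_fd)

        with open(name, opener=opener) as f:
            return f.read()
    finally:
        os.close(dir_fd)


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "hello.txt"), "w") as f:
        f.write("hello")
    result = read_relative(tmp, "hello.txt")
    print(result)
```

Because the file is resolved relative to the already-open directory descriptor, renaming or replacing the directory path between the two calls does not change which directory is searched.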

Related

What does "i" represent in Python .pyi extension?

In Python, what does "i" represent in .pyi extension?
In PEP-484, it mentions .pyi is "a stub file" but no mnemonic help on the extension. So does the "i" mean "Include"? "Implementation"? "Interface"?

I think the i in .pyi stands for "Interface".
Definition for Interface in Java:
An interface in the Java programming language is an abstract type that
is used to specify a behaviour that classes must implement
From the Python typeshed GitHub repository:
Each Python module is represented by a .pyi "stub". This is a normal
Python file (i.e., it can be interpreted by Python 3), except all the
methods are empty.
In the mypy repository, "stub" files are explicitly described as public interfaces:
A stubs file only contains a description of the public interface of
the module without any implementations.
Because "Interfaces" do not exist in Python (see this SO question on the difference between an abstract class and an interface), I think the designers intended to dedicate a special extension to them.
A .pyi file is a "stub" file (definition from Martin Fowler):
Stubs: provide canned answers to calls made during the test, usually
not responding at all to anything outside what's programmed in for the
test.
But people are more familiar with interfaces than with "stub" files, so it was easier to choose .pyi rather than .pys to avoid unnecessary confusion.
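To make the "interface without implementation" idea concrete, here is a small sketch of what a stub might contain. The module name shapes and everything declared in it are hypothetical; the point is only that signatures and annotations are present while every body is the placeholder "...".

```python
# shapes.pyi: a hypothetical stub for a hypothetical module shapes.py.
# It declares the public interface only; there is no implementation.
from typing import List, Optional


class Circle:
    radius: float

    def __init__(self, radius: float) -> None: ...
    def area(self) -> float: ...


def largest(circles: List[Circle]) -> Optional[Circle]: ...
```

A type checker reads these declarations instead of the real module. The text is still valid Python syntax, but running it only defines empty functions that return None.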

Apparently PyCharm creates .pyi files for its own purposes:
The *.pyi files are used by PyCharm and other development tools to provide
more information, such as PEP 484 type hints, than it is able to glean from
introspection of extension types and methods. They are not intended to be
imported, executed or used for any other purpose other than providing info
to the tools. If you don't use a tool that makes use of .pyi files then
you can safely ignore this file.
See: https://www.python.org/dev/peps/pep-0484/
https://www.jetbrains.com/help/pycharm/2016.1/type-hinting-in-pycharm.html
This comment was found in: python27/Lib/site-packages/wx/core.pyi

The i in .pyi stands for ‘interface’.
The .pyi extension was first mentioned in this GitHub issue thread where JukkaL says:
I'd probably prefer an extension with just a single dot. It also needs to be something that is not in use (it should not be used by cython, etc.). .pys seems to be used in Windows (or was). Maybe .pyi, where i stands for an interface definition?

Another way to explain the contents of a module that Wing can't figure out is with a pyi Python Interface file. This file is merely a Python skeleton with the proper structure, call signature, and return values to correspond to the functions, attributes, classes, and methods specified in a module.

Extension for Python

The documentation says:
Such extension modules can do two things that can’t be done directly
in Python: they can implement new built-in object types, and they can
call C library functions and system calls.
Syscalls
I cannot see why "system calls" are special here. I know what a syscall is, but I don't see why it is special or why it cannot be done directly in Python.
In particular, we can use open in Python to open a file. There must be an underlying syscall to get a descriptor for the file (on Unix systems).
And that was just open. Besides that, we can use call(["ls", "-l"]), which must also use a syscall such as execve.
Functions
Why is calling a C library function special? After all:
ctypes is a foreign function library for Python. It provides C
compatible data types, and allows calling functions in DLLs or shared
libraries. It can be used to wrap these libraries in pure Python.

Essentially, system calls interact with the underlying system services (that is, the kernel on Linux). C functions, on the other hand, run exclusively in user space. In that sense, a system call is more "special".
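The ctypes route quoted in the question can be sketched as follows. This assumes a POSIX system, where ctypes.CDLL(None) exposes the symbols already loaded into the process (including the C library); on other platforms the sketch simply falls back to os.getpid(). getpid is used because it is a C library function that is a thin wrapper around a system call.

```python
import ctypes
import os

if os.name == "posix":
    # Assumption: dlopen(NULL) semantics, so libc symbols are visible.
    libc = ctypes.CDLL(None)
    libc.getpid.restype = ctypes.c_int
    pid_from_c = libc.getpid()  # C library call, which performs the syscall
else:
    # Fallback so the sketch still runs where CDLL(None) is unavailable.
    pid_from_c = os.getpid()

print(pid_from_c == os.getpid())
```

Both paths end in the same kernel service; the difference is only whether the call goes through a C extension module (os) or through ctypes.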

Does C have a "from-import"-like mechanism?

I've read here about importing a module in Python. There is an option to not import a whole module (e.g. sys) and to only import a part of it (e.g. sys.argv). Is that possible in C? Can I include only the implementation of printf or any other function instead of the whole stdio.h library?
I ask this because it seems very inefficient to include a whole file where I need only several lines of code.
I understand that there is a possibility that including only the function itself won't work because it depends on other functions, other includes, defines, and globals. I only ask in order to use this for whole code blocks that contain all the data that are needed in order to execute.

C does not have anything that is equivalent to, or even similar to Python's "from ... import" mechanism.
I ask this because it seems very inefficient to include a whole file where I need only several lines of code.
Actually, what normally happens when you #include a file is that you import the declarations for macros, or functions declared somewhere else. You don't import any executable code ... so the "unnecessary" inclusions have ZERO impact on runtime code size or efficiency.
If you use (i.e. "call") a macro, then that causes the macro body to be expanded, which adds to the executable code size.
If you call a function whose declaration you have included, that will add the code ... for the call statement itself. The function is not expanded, though. Instead, an "external reference" is added to your ".o" file, which the linker resolves when you create the executable from the ".o" files and the dependent libraries.

Python: "There is an option to not import a whole module": I think you misunderstand what is going on here. When you specify the names to import, it means that only those names go into your namespace. The "whole" module is compiled, and any code outside functions is run, even when you specify just one name.
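A quick way to see this is to import a single name from a throwaway module and then inspect what was actually loaded. The module name demo_partial_import and its contents are invented for this sketch.

```python
import os
import sys
import tempfile

MODULE_SOURCE = '''
LOADED = []
LOADED.append("top-level code ran")

def wanted():
    return "wanted"

def unwanted():
    return "unwanted"
'''

with tempfile.TemporaryDirectory() as tmp:
    # Write a throwaway module and make it importable.
    with open(os.path.join(tmp, "demo_partial_import.py"), "w") as f:
        f.write(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    try:
        from demo_partial_import import wanted
    finally:
        sys.path.remove(tmp)

module = sys.modules["demo_partial_import"]
top_level_ran = module.LOADED          # the module body executed in full
module_has_unwanted = hasattr(module, "unwanted")
name_bound_here = "unwanted" in dir()  # but the name was not bound locally

print(wanted(), top_level_ran, module_has_unwanted, name_bound_here)
```

The whole module object lives in sys.modules with every function defined; only the binding of names in the importing namespace is selective.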
C: I am going to assume that you are using an operating system like UNIX/Linux/OS X or Windows (the following does not apply to embedded systems).
The closest C has to import is dynamic runtime linking. That is not part of standard C, it is defined by the operating system. So POSIX has one mechanism and Windows has another. Most people call these library files "DLLs", but strictly speaking that is a Microsoft term, they are "shared objects" (.so) on UNIX type systems.
When a process attaches to a DLL or .so then it is "mapped" into the virtual memory of the process. The detail here varies between operating systems, but essentially the code is split into "pages", the size of which varies; 4 KB is typical, and some systems use larger pages such as 16 KB. Only those pages that are required are loaded into memory. When a page is required then a so-called "page-fault" occurs and the operating system will get the page from either the executable file or the swap area (depending on the OS).
One of the advantages of this mechanism is that code pages can be shared between processes. So if you have 50 processes all using the same DLL (like the C run-time library, for example), then only one copy is actually loaded into memory. They all share the one set of pages (they can because they are read-only).
There is no sharing mechanism like that in Python - unless the module is itself written in C and is a DLL (.pyd).
All this occurs without the knowledge of the program.
EDIT: looking at others' answers I realise you might be thinking of the #include pre-processor directive, which merges a header file into the source code. Assuming these are standard header files, they make no difference to the size of your executable; they should be "idempotent". That is, they only contain information of use to the pre-processor, compiler, or linker. If there are definitions in the header file that are not used there should be no side effect.
Linking libraries (-l directive to the compiler) that are not used will make the executable larger, which makes the page tables larger, but aside from that if they are not used then they shouldn't make any significant difference. That is because of the on-demand page-loading described above (the concept was invented in the 1960s in Manchester UK).

Reading the builtin python modules [duplicate]

Possible Duplicate:
How do I find the location of Python module sources?
I don't understand how to read the code in the builtin Python modules. I know how to find out what's in a module, for example:
import os
dir(os)
But when I try to look for example for the function listdir I cannot find a def listdir to read what it actually does.

One word: inspect.
The inspect module provides several useful functions to help get information about live objects such as modules, classes, methods, functions, tracebacks, frame objects, and code objects. For example, it can help you examine the contents of a class, retrieve the source code of a method, extract and format the argument list for a function, or get all the information you need to display a detailed traceback.
It's in the standard library, and the docs have examples. So, you just print(inspect.getsource(os)), or do inspect.getsourcefile(os), etc.
Note that some of the standard-library modules are written in C (or are even fake modules built into the interpreter), in which case getsourcefile returns nothing, but getfile will at least tell you it's a .so/.pyd/whatever, which you can use to look up the original C source in, say, a copy of the Python source code.
You can also just type help(os), and the FILE right at the top gives you the path (generally the same as getsourcefile for Python modules, the same as getfile otherwise).
And you can always go to the online source for the Python modules and C extension modules. Just change the "2.7" to "3.3", etc., in the URL to get different versions. (I believe if you remove the version entirely, you get the trunk code, currently corresponding to 3.4 pre-alpha, but don't quote me on that.)
The os.listdir function isn't actually defined directly in os; it's effectively imported via from <platform-specific-module> import *. You can trace it down through a few steps yourself, but it's usually going to be posix_listdir in posixmodule.c on most platforms. (Even on Windows: recent versions use the same file to define the posix module on non-Windows, and the nt and posix modules on Windows, and there's a bunch of #if defined(…) stuff in the code.)
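The steps above can be tried out with a short sketch like the following. It assumes a standard CPython installation where os itself is a Python module with a file on disk, while os.listdir comes from a C-implemented platform module.

```python
import inspect
import os

# os is written in Python, so there is a source file to point at.
os_source_file = inspect.getsourcefile(os)

# listdir is re-exported from the platform module (posix or nt).
listdir_module = os.listdir.__module__
is_builtin = inspect.isbuiltin(os.listdir)

# There is no Python source to retrieve for a C-implemented function.
try:
    inspect.getsource(os.listdir)
    has_python_source = True
except (TypeError, OSError):
    has_python_source = False

print(os_source_file)
print(listdir_module, is_builtin, has_python_source)
```

When has_python_source comes back False, the C source (posixmodule.c, as mentioned above) is the place to look.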

Is there a python-equivalent of the unix "file" utility?

I want to have different behavior in a python script, depending on the type of file. I cannot use the filename extension as it may not be present or misleading. I could call the file utility and parse the output, but I would rather use a python builtin for portability.
So is there anything in python that uses heuristics to deduce the type of the file from its contents?

python-magic
pymagic
Probably others as well. "magic" is the magic keyword to search for. ;-)
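For a rough idea of what "magic"-based detection means, here is a standard-library-only sketch that checks the leading bytes of the data against a few well-known file signatures. It is not the API of python-magic or pymagic; the function name sniff_type and the tiny signature table are made up for illustration, and real tools know far more formats.

```python
# A handful of well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
    (b"\x1f\x8b", "application/gzip"),
    (b"\x7fELF", "application/x-elf"),
]


def sniff_type(data):
    """Return a MIME-like label for the given leading bytes, or None."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None


def sniff_file(path):
    """Read just enough of a file to compare against the signatures."""
    with open(path, "rb") as f:
        return sniff_type(f.read(16))


print(sniff_type(b"%PDF-1.7\n"))
```

Since only the first few bytes are inspected, the file name and extension play no role in the result.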
