I want to have different behavior in a python script, depending on the type of file. I cannot use the filename extension as it may not be present or misleading. I could call the file utility and parse the output, but I would rather use a python builtin for portability.
So is there anything in python that uses heuristics to deduce the type of the file from its contents?
python-magic
pymagic
Probably others as well. "magic" is the magic keyword to search for. ;-)
Related
There is a patch to add support for the POSIX openat functions (and other *at functions like fstatat) to the python standard library that is marked as closed with resolution fixed, but the os, posix and platform modules do not currently include any of these methods.
These methods are the standard way of solving problems like this in C and other languages efficiently and without race conditions.
Are these included in the standard library currently somewhere? And if not, are there plans to include this in the future.
Yes, this is supported by passing the dir_fd argument to various functions in the standard os module. See for example os.open():
Open the file path and set various flags [...]
This function can support paths relative to directory descriptors with the dir_fd parameter.
If you want to use high-level file objects such as those returned by the builtin open() function, that function's documentation provides example code showing how to do this using the opener parameter to that function. Note that open() and os.open() are entirely different functions and should not be confused. Alternatively, you could open the file with os.open() and then pass the file descriptor number to os.fdopen() or to open().
It should also be noted that this currently only works on Unix; the portable and future-proof way to check for dir_fd support is to write code such as the following:
if os.open in os.supports_dir_fd:
# Use dir_fd.
else:
# Don't.
On the other hand, I'm not entirely sure Windows even allows opening a directory in the first place. You certainly can't do it with _open()/_wopen(), which are documented to fail if "the given path is a directory." To be safe, I recommend only trying to open the directory after you check for dir_fd support.
I understand that the normal way of using protobuf is to create the .proto and then compile it into the relevant class - Java, Python, etc. I have a requirement which might need to parse the .proto file in Python code. Has anyone tried creating own parser for the .proto file? Will it be recommended to always compile the class instead of directly parsing the .proto?
It probably won't help you directly, but yes, I've written my own parser (live demo, parser source). This code is C# hence why it probably won't help, but it clearly is possible. I started that branch 9 days ago, and now it is basically feature-complete including parser, generator, and an interactive web-site with syntax-error highlighting - so it isn't necessarily a huge amount of work.
However! You may find it easier just to shell execute "protoc" (available on maven). If you use the -oFILE / --descriptor_set_out=FILE switch (same thing, alternative syntax), then it parses the input .proto file and writes a file that is a serialized FileDescriptorSet from descriptor.proto. This means you can use your regular tools to generate code in your chosen language for descriptor.proto, then deserialize the file as a FileDescriptorSet instance. Once you've done that: you can just walk the object model to see the files, messages, enums, fields, etc. IIRC some protobuf implementations support working entirely from a descriptor (which is what protoc emits), without the codegen step.
In Python, what does "i" represent in .pyi extension?
In PEP-484, it mentions .pyi is "a stub file" but no mnemonic help on the extension. So does the "i" mean "Include"? "Implementation"? "Interface"?
I think the i in .pyi stands for "Interface"
Definition for Interface in Java:
An interface in the Java programming language is an abstract type that
is used to specify a behaviour that classes must implement
From Python typeshed github repository:
Each Python module is represented by a .pyi "stub". This is a normal
Python file (i.e., it can be interpreted by Python 3), except all the
methods are empty.
In 'Mypy' repository, they explicitly mention "stub" files as public interfaces:
A stubs file only contains a description of the public interface of
the module without any implementations.
Because "Interfaces" do not exist in Python (see this SO question between Abstract class and Interface) I think the designers intended to dedicate a special extension for it.
pyi implements "stub" file (definition from Martin Fowler)
Stubs: provide canned answers to calls made during the test, usually
not responding at all to anything outside what's programmed in for the
test.
But people are more familiar with Interfaces than "stub" files, therefore it was easier to choose .pyi rather than .pys to avoid unnecessary confusion.
Apparently PyCharm creates .pyi file for its own purposes:
The *.pyi files are used by PyCharm and other development tools to provide
more information, such as PEP 484 type hints, than it is able to glean from
introspection of extension types and methods. They are not intended to be
imported, executed or used for any other purpose other than providing info
to the tools. If you don't use use a tool that makes use of .pyi files then
you can safely ignore this file.
See: https://www.python.org/dev/peps/pep-0484/
https://www.jetbrains.com/help/pycharm/2016.1/type-hinting-in-pycharm.html
This comment was found in: python27/Lib/site-packages/wx/core.pyi
The i in .pyi stands for ‘interface’.
The .pyi extension was first mentioned in this GitHub issue thread where JukkaL says:
I'd probably prefer an extension with just a single dot. It also needs to be something that is not in use (it should not be used by cython, etc.). .pys seems to be used in Windows (or was). Maybe .pyi, where i stands for an interface definition?
Another way to explain the contents of a module that Wing can't figure out is with a pyi Python Interface file. This file is merely a Python skeleton with the proper structure, call signature, and return values to correspond to the functions, attributes, classes, and methods specified in a module.
Regarding syntax in Python
Why do we use open("file") to open but not "file".close() to close it?
Why isn't it "file".open() or inversely close("file")?
It's because open() is a function, and .close() is an object method. "file".open() doesn't make sense, because you're implying that the open() function is actually a class or instance method of the string "file". Not all strings are valid files or devices to be opened, so how the interpreter is supposed to handle "not a file-like device".open() would be ambiguous. We don't use "file".close() for the same reason.
close("file") would require a lookup of the file name, then another lookup to see if there are file handles, owned by the current process, attached to that file. That would be very inefficient and probably has hidden pitfalls that would make it unreliable (for example, what if it's not a file, but a TTY device instead?). It's much faster and simpler to just keep a reference to the opened file or device and close the file through that reference (also called a handle).
Many languages take this approach:
f = open("file") # open a file and return a file object or handle
# stuff...
close(f) # close the file, using the file handle or object as a reference
This looks similar to your close("file") construct, but don't be fooled: it's closing the file through a direct reference to it, not the file name as stored in a string.
The Python developers have chosen to do the same thing, but it looks different because they have implemented it with an object-oriented approach instead. Part of the reason for this is that Python file objects have a lot of methods available to them, such as read(), flush(), seek(), etc. If we used close(f), then we would have to either change all of the rest of the file object methods to functions, or let it be one random function that behaves differently from the rest for no good reason.
TL;DR
The design of open() and file.close() is consistent with OOP principals and good file reference practices. open() is a factory-like function that creates objects that reference files or other devices. Once the object is created, then all other operations on that object are done through class or instance methods.
Normally you shouldn't use "file".close() explicitly but use open(file) as contextmanager so the file handle is also closed if an exception happened. End of problem :-)
But to actually answer your question I assume the reason is that open supports many options and the returned class differs depending on these options (see also io module). So it would be simply much more complicated for the end-user to remember which class he wants and then use the "class".open with the right class itself. Note you can also pass "integer file descriptor of the file to be wrapped." to open, this would mean besides having a str.open() method you also get a int.open(). This would be really bad OO design but also confusing. I wouldn't care to guess what kind of questions would be asked on StackOverflow about that ("door".open(), (1).open())...
However I must admit that there is a pathlib.Path.open function. But if you have a Path it isn't ambiguous anymore.
As to a close() function: Each instance will have a close() method already and there are no differences between the different classes, so why create an additional function? There is simply no advantage.
Only slightly less new than you but I'll give this one a go, basically opening and closing are pretty different actions in a language like python. When you are opening the file what you are really doing is creating an object to be worked within your application that represents the file, so you create it with a function that informs the OS that the file has been opened and creates an object that python can use to read and write to the file. When it comes time to close the file what basically needs to be done is for your app to tell the OS that it is done with the file and dispose of the object that represented the file fro memory, and the easiest way to do that is with a method on the object itself. Also note that a syntax like "file".open would require the string type to include methods for opening files, which would be a very strange design and require a lot of extensions on the string type for anything else you wanted to implement with that syntax. close(file) would make a bit more sense but would still be a clunky way of releasing that object/letting the OS know the file was no longer open, and you would be passing a variable file representing the object created when you opened the file rather than a string pointing to the file's path.
In addition to what has been said, I quote the changelog in Python removing the built in file type. It (briefly) explains why the class constructor approach using the file type (available in Python 2) was removed in Python 3:
Removed the file type. Use open(). There are now several different kinds of streams that open can return in the io module.
Basically, while file("filename") would create an instance of file, open("filename") can return instances of different stream classes, depending on the mode.
https://docs.python.org/3.4/whatsnew/3.0.html#builtins
Python's raw_input() preserves all bash features like arrow keys and reverse searching...
But when I use Perl's <> to read from stdin,none of the features are supported any more...
What's the easiest way to do it as raw_input in Perl?
According to the Python docs,
If the readline module was loaded, then raw_input() will use it to provide elaborate line editing and history features.
As stated in comments to the original post, you need to use an appropriate module, such as Term::ReadLine or Term::ReadLine::Gnu, to access those features. This is no different than Python - if you want readline's features, you have to load a readline module, whether implicitly or explicitly.
But, yes, you will need to use your chosen readline module's input function instead of <> for any input which you want to have processed through readline. (Term::ReadLine::Perl includes a Term::ReadLine::Perl::Tied module which might override <> to run through readline without requiring additional code changes, but T::RL::P hasn't been updated since 2009 and seems to be undocumented, so I wouldn't recommend it unless you have plenty of time to figure out how to use it.)