Problem Statement
When the path contains %25, Flask seems to mutate the incoming path, treating %25 as % instead of preserving the original request path. Here are the request and the resulting values:
Request: GET http://localhost:5000/Files/dir %a/test %25a.txt
Flask request.base_url: http://localhost:5000/Files/dir%20%25a/test%20%25a.txt
Debug: 127.0.0.1 - - [14/Feb/2023 12:00:49] "GET /Files/dir%20%a/test%20%25a.txt HTTP/1.1" 200 -
Specifically, the file name test %25a.txt seems to be encoded as test%20%25a.txt instead of test%20%2525a.txt.
Environment
Python 3
Ubuntu 20.04
Flask 2.2.x
Things Tried
It looks like others have suggested that %25 is not allowed in URL paths (Ref: In URL `%` is replaced by `%25` when using `queryParams` while routing in Angular).
Help Needed
Is %25 indeed not allowed in the request path?
For apps that allow file names containing %25, what would be a good way to handle this?
https://www.rfc-editor.org/rfc/rfc7230 § 2.7 explains that the path is composed of pchars, which (roughly) are unreserved or pct-encoded characters. Your favorite character, %, definitely does not fall into this category or the similar sub-delims category:
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
So that leaves us with a percent-encoded %25, which the spec (RFC 3986 § 2.4) treats further:
Because the percent ("%") character serves as the indicator for percent-encoded octets, it must be percent-encoded as "%25" for that octet to be used as data within a URI. Implementations must not percent-encode or decode the same string more than once, as decoding an already decoded string might lead to misinterpreting a percent data octet as the beginning of a percent-encoding, ...
And that is where things went south for you.
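To see the RFC's warning in action, here is a minimal sketch using Python's standard urllib.parse with the file name from the question. It only models what happens to the string when it is decoded once versus twice; it is not a trace of Flask's or Werkzeug's actual code path.

```python
from urllib.parse import quote, unquote

filename = "test %25a.txt"       # the literal file name, percent sign included

# Encoding once: the "%" itself must travel as "%25".
on_the_wire = quote(filename)    # 'test%20%2525a.txt'

# Decoding once restores the original name.
once = unquote(on_the_wire)      # 'test %25a.txt'

# Decoding a second time misreads the data "%25" as an escape for "%".
twice = unquote(once)            # 'test %a.txt'

print(on_the_wire, once, twice, sep="\n")
```

If the name you see on the server looks like `twice` rather than `once`, some layer decoded the path more than once.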
Now, one can tilt at windmills until Don Quixote brings the cows home, but the fact of the matter is that software is made of bugs, and they can be hard to isolate and hard to get folks to fix.
The usual pragmatic approach to sending a "forbidden" character such as percent is to disguise it as it makes its way through a software stack. Here are two common techniques:
Pick a seldom-used character, perhaps ~ (tilde). Map percent to tilde and vice versa. Prohibit tilde in pathnames, or use percent-encoded %7E for it.
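Base64-encode the pathname, and decode it on the other end (preferably with the URL-safe alphabet, since standard Base64 output can itself contain / and +).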
This tends to leave your URLs a bit uglier and a bit less informative than they would otherwise have been.
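A minimal sketch of the tilde technique, assuming the "prohibit tilde" variant. The helper names disguise and reveal are hypothetical, and urllib.parse stands in for whatever client and server layers the path actually travels through (Python 3.7+ treats ~ as unreserved in quote()).

```python
from urllib.parse import quote, unquote

def disguise(name: str) -> str:
    """Sender side: hide every "%" as "~" (names containing "~" are prohibited)."""
    if "~" in name:
        raise ValueError("'~' is reserved by this convention")
    return name.replace("%", "~")

def reveal(name: str) -> str:
    """Receiver side: turn every "~" back into "%"."""
    return name.replace("~", "%")

sent = quote(disguise("test %25a.txt"))   # 'test%20~25a.txt'

# Even if some layer decodes twice, there is no "%25" left to misinterpret.
received = unquote(unquote(sent))         # 'test ~25a.txt'

print(reveal(received))                   # test %25a.txt
```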
Given a pathname p, either it contains a percent or it doesn't.
Prepend 0 if it doesn't, and now it survives untouched, in a form that can be grep'd.
Prepend 1 if it does, and then use base64 or whatever.
Strip the leading digit on the other end and process appropriately.
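Here is one way the prefix scheme might look in Python. It is a sketch under assumptions: the "1" branch uses URL-safe Base64 (so the encoded form contains no / or +, only letters, digits, -, _ and = padding), and encode_path and decode_path are hypothetical names.

```python
import base64

def encode_path(p: str) -> str:
    """Prefix "0" for names without "%", otherwise "1" plus URL-safe Base64."""
    if "%" not in p:
        return "0" + p                      # survives untouched and stays grep-able
    packed = base64.urlsafe_b64encode(p.encode("utf-8")).decode("ascii")
    return "1" + packed

def decode_path(token: str) -> str:
    """Strip the leading digit and undo whatever it says was done."""
    flag, body = token[:1], token[1:]
    if flag == "0":
        return body
    if flag == "1":
        return base64.urlsafe_b64decode(body.encode("ascii")).decode("utf-8")
    raise ValueError(f"unknown prefix in {token!r}")

for name in ["dir a/test a.txt", "dir %a/test %25a.txt"]:
    token = encode_path(name)
    assert decode_path(token) == name       # round trip is lossless
    print(token)
```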
Related
I have a URL whose extension needs to be in ASCII/UTF-8:
a='sAE3DSRAfv+HG='
I need to convert the above into this:
a='sAE3DSRAfv%2BHG%3D'
I searched but was not able to find how.
See the standard-library function urllib.parse.quote().
A very important requirement for a URL is its safe transmission: its meaning must not change between the moment you create it and the moment the intended receiver gets it. URL encoding was introduced to achieve this. See RFC 2396 (since superseded by RFC 3986).
A URL might contain non-ASCII characters, as in cafés or López, or symbols that have a different meaning in the context of a URL, such as #, which marks a fragment (a "bookmark"). To transmit such characters safely, the standards expect you to quote the URL at the point of origin, so that everyone else only ever sees the URL in quoted form.
Here is some sample usage:
>>> import urllib.parse
>>> a='sAE3DSRAfv+HG='
>>> urllib.parse.quote(a)
'sAE3DSRAfv%2BHG%3D'
>>>
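One detail worth knowing: quote() leaves / unencoded by default (its safe parameter defaults to "/"), and Base64-looking strings like this one can contain /. A short sketch with a hypothetical token that includes a slash:

```python
from urllib.parse import quote, unquote

token = "sAE3DSRAfv+HG/x="            # hypothetical value that also contains "/"

default = quote(token)                # 'sAE3DSRAfv%2BHG/x%3D'   ("/" kept, safe="/")
strict = quote(token, safe="")        # 'sAE3DSRAfv%2BHG%2Fx%3D' ("/" encoded as well)

assert unquote(strict) == token       # decoding restores the original value
print(default)
print(strict)
```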
I found a reference to a file in a log that had the following format:
\\?\C:\Path\path\file.log
I cannot find a reference explaining what the \\?\ sequence means. I believe the part between the backslashes refers to a hostname.
For instance, on my Windows computer, the following works just fine:
dir \\?\C:\
and also, just fine with same result:
dir \\.\C:\
Questions:
Is there a reference to what the question mark means in this particular path format?
What might generate a file path in such a format?
A long read, but worth reading if you are in this domain: http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx
Extract:
The Windows API has many functions that also have Unicode versions to
permit an extended-length path for a maximum total path length of
32,767 characters. This type of path is composed of components
separated by backslashes, each up to the value returned in the
lpMaximumComponentLength parameter of the GetVolumeInformation
function (this value is commonly 255 characters). To specify an
extended-length path, use the "\\?\" prefix. For example,
"\\?\D:\very long path".
and:
The "\\?\" prefix can also be used with paths constructed according to
the universal naming convention (UNC). To specify such a path using
UNC, use the "\\?\UNC\" prefix. For example, "\\?\UNC\server\share",
where "server" is the name of the computer and "share" is the name of
the shared folder. These prefixes are not used as part of the path
itself. They indicate that the path should be passed to the system
with minimal modification, which means that you cannot use forward
slashes to represent path separators, or a period to represent the
current directory, or double dots to represent the parent directory.
Because you cannot use the "\\?\" prefix with a relative path,
relative paths are always limited to a total of MAX_PATH characters.
The Windows API parses input strings for file I/O. Among other things, it translates / to \ as part of converting the name to an NT-style name, or interpreting the . and .. pseudo directories. With few exceptions, the Windows API also limits path names to 260 characters.
The documented purpose of the \\?\ prefix is:
For file I/O, the "\\?\" prefix to a path string tells the Windows APIs to disable all string parsing and to send the string that follows it straight to the file system.
Among other things, this allows using otherwise reserved names in paths (such as . or ..). Because no translation takes place, the system no longer has to maintain an internal buffer, and the arbitrary limit of 260 characters can also be lifted (as long as the underlying filesystem supports it). Note that lifting the limit is not the purpose of the \\?\ prefix but a corollary of it, even if the prefix is mostly used for that corollary.
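If you need to build such paths yourself, the quoted rules translate into a small helper. This is a sketch under assumptions: to_extended_length is a hypothetical name, and it uses Python's ntpath module (importable on any OS) to do the normalization that Windows skips once the prefix is present.

```python
import ntpath

PREFIX = "\\\\?\\"            # the four characters \\?\
UNC_PREFIX = "\\\\?\\UNC\\"   # the eight characters \\?\UNC\

def to_extended_length(path: str) -> str:
    """Return an extended-length form of an absolute Windows path."""
    if path.startswith(PREFIX):
        return path                       # already extended-length
    # After the prefix, Windows will not turn "/" into "\" or resolve "." and "..",
    # so do that here first.
    normalized = ntpath.normpath(path)
    if not ntpath.isabs(normalized):
        raise ValueError("the extended-length prefix requires an absolute path")
    if normalized.startswith("\\\\"):
        # \\server\share\... becomes \\?\UNC\server\share\...
        return UNC_PREFIX + normalized[2:]
    return PREFIX + normalized

print(to_extended_length("D:/very long path/sub/../file.txt"))
# \\?\D:\very long path\file.txt
```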
When using keras.model.load_weights (the weight file is saved in HDF5 format), I have come across situations where folder names starting with r or t cause the error: errno = 22, error message = 'invalid argument', flags = 0, o_flags = 0.
I want to know if there are some specified rules on the filenames which should be avoided and otherwise would lead to such reading error in python, or whether the situation I encountered is specific to Keras.
It would greatly help to debug this if you included examples of the file names that give you trouble. However, I have a good idea of what is probably happening here.
These problems seem to appear on folders whose names start with r or t. Because they are folders, in the full path name they are preceded by a \ character (for example "\thisFolder"). This is the case on Windows, which uses \ to separate path components, unlike *nix systems, which use the forward slash /.
Considering this, it seems likely that you are running into the fact that, in a normal Python string literal, \r and \t are escape sequences for carriage return and tab, respectively. If so, the string that reaches the file opener contains a control character where the backslash and letter should be, so the path no longer points at your folder and opening it fails.
I would not be surprised if you got the same errors on folders whose names begin with n or with other letters that form an escape sequence after a backslash (\n is a newline, \a a bell, \b a backspace, \f a form feed, \v a vertical tab, etc.).
To overcome this, escape your backslash characters before passing the string as a filename. In Python, an escaped backslash is written "\\". Alternatively, you can pass a raw string by adding the r prefix to your string, something like r"\a\raw\string". More information on escaping and raw strings can be found in this question and its answers.
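A quick way to see the problem, and the fixes, is to compare the strings directly. The path below is hypothetical; pathlib.PureWindowsPath is used only to show a third way out, since Windows also accepts / as a separator.

```python
from pathlib import PureWindowsPath

# Non-raw literal: "\t" becomes a TAB and "\r" a carriage return.
broken = "C:\temp\run1.h5"

# Fixes: double every backslash, or use a raw string.
escaped = "C:\\temp\\run1.h5"
raw = r"C:\temp\run1.h5"

# Alternative: write forward slashes and let pathlib render the Windows form.
from_slashes = str(PureWindowsPath("C:/temp/run1.h5"))

print(repr(broken))   # 'C:\temp\run1.h5'   <- \t and \r here are control characters
print(repr(raw))      # 'C:\\temp\\run1.h5' <- real backslashes
print(len(broken), len(raw), escaped == raw == from_slashes)  # 13 15 True
```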
I want to know if there are some specified rules on the filenames which should be avoided and otherwise would lead to such reading error in python,
As mentioned, you should avoid characters that have a special meaning after a backslash. I suggest you check here to see which escape sequences Python recognizes, so you can avoid such combinations (or simply use raw strings and forget about this problem).
I know that this is not something that should ever be done, but is there a way to use the slash character that normally separates directories within a filename in Linux?
The answer is that you can't, unless your filesystem has a bug. Here's why:
There is a system call for renaming your file defined in fs/namei.c called renameat:
SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
                int, newdfd, const char __user *, newname)
When the system call is invoked, it does a path lookup (do_path_lookup) on the name. Tracing further, we get to link_path_walk, which has this:
static int link_path_walk(const char *name, struct nameidata *nd)
{
    struct path next;
    int err;
    unsigned int lookup_flags = nd->flags;
    while (*name=='/')
        name++;
    if (!*name)
        return 0;
    ...
This code applies to every filesystem. What does this mean? The path walker treats every '/' as a separator between components (the excerpt skips leading slashes, and the rest of the function splits the remaining name at each further '/'), so if you try to pass an actual '/' character as part of a file name using traditional means, it will not do what you want. There is no way to escape the character. If a filesystem "supports" this, it is because it either:
uses a Unicode character (or something else) that looks like a slash but isn't, or
has a bug.
Furthermore, if you did go in and edit the on-disk bytes to put a slash character into a file name, bad things would happen, because you could never refer to this file by name :( Any time you did, Linux would assume you were referring to a nonexistent directory. Using the 'rm *' technique would not work either, since bash simply expands the glob to the file name. Even rm -rf wouldn't work, as a simple strace reveals what goes on under the hood (shortened):
$ ls testdir
myfile2 out
$ strace -vf rm -rf testdir
...
unlinkat(3, "myfile2", 0) = 0
unlinkat(3, "out", 0) = 0
fcntl(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)
close(3) = 0
unlinkat(AT_FDCWD, "testdir", AT_REMOVEDIR) = 0
...
Notice that these calls to unlinkat would fail because they need to refer to the files by name.
You could use a Unicode character that displays as / (for example the fraction slash), assuming your filesystem supports it.
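A small Python experiment in a throwaway temporary directory shows both halves of the argument above: a real / is always treated as a separator, while the fraction slash (U+2044) is just another character in the name. It assumes a filesystem and locale that accept UTF-8 names, as is typical on modern Linux.

```python
import os
import tempfile

FRACTION_SLASH = "\u2044"   # looks like "/" but is an ordinary name character

with tempfile.TemporaryDirectory() as root:
    # A real "/" asks for file "b.txt" inside a directory "a" that does not exist.
    try:
        open(os.path.join(root, "a/b.txt"), "w").close()
        slash_rejected = False
    except FileNotFoundError:
        slash_rejected = True

    # Once "a" exists, the same name quietly creates a nested file instead.
    os.mkdir(os.path.join(root, "a"))
    open(os.path.join(root, "a/b.txt"), "w").close()

    # The look-alike produces one file at the top level.
    open(os.path.join(root, "a" + FRACTION_SLASH + "b.txt"), "w").close()

    top = sorted(os.listdir(root))
    nested = os.listdir(os.path.join(root, "a"))

print(slash_rejected)   # True
print(ascii(top))       # ['a', 'a\u2044b.txt']
print(nested)           # ['b.txt']
```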
It depends on what filesystem you are using. For some of the more popular ones:
ext3: No
ext4: No
jfs: Yes
reiserfs: No
xfs: No
Only with an agreed-upon encoding. For example, you could agree that % will be encoded as %% and that %2F will mean a /. All the software that accessed this file would have to understand the encoding.
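A sketch of that convention in Python. encode_name and decode_name are hypothetical helpers; %% and %2F are exactly the two escapes proposed above, and the decoder rejects anything else so that malformed names are caught early.

```python
def encode_name(name: str) -> str:
    # Escape the escape character first, then the separator.
    return name.replace("%", "%%").replace("/", "%2F")

def decode_name(encoded: str) -> str:
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] != "%":
            out.append(encoded[i])
            i += 1
        elif encoded.startswith("%%", i):
            out.append("%")
            i += 2
        elif encoded.startswith("%2F", i):
            out.append("/")
            i += 3
        else:
            raise ValueError(f"invalid escape at position {i} in {encoded!r}")
    return "".join(out)

stored = encode_name("04/28/2012 50%.pdf")
print(stored)                 # 04%2F28%2F2012 50%%.pdf
print(decode_name(stored))    # 04/28/2012 50%.pdf
```

Scanning left to right (rather than chaining two replace() calls) is what keeps a literal "%2F" in the original name from being turned into a slash.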
The short answer is: No, you can't. It's a necessary prohibition because of how the directory structure is defined.
And, as mentioned, you can display a Unicode character that "looks like" a slash, but that's as far as you get.
In general it's a bad idea to try to use "bad" characters in a file name at all; even if you somehow manage it, it tends to make it hard to use the file later. The filesystem separator is flat-out not going to work at all, so you're going to need to pick an alternative method.
Have you considered URL-encoding the URL then using that as the filename? The result should be fine as a filename, and it's easy to reconstruct the name from the encoded version.
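In Python that might look like the following. Note the safe="" argument: urllib.parse.quote leaves / alone by default, which would defeat the purpose here. url_to_filename and filename_to_url are hypothetical names, and very long URLs can still exceed common per-name limits (often 255).

```python
from urllib.parse import quote, unquote

def url_to_filename(url: str) -> str:
    # safe="" makes quote() encode "/" too (its default is safe="/")
    return quote(url, safe="")

def filename_to_url(name: str) -> str:
    return unquote(name)

url = "https://example.com/a/b?q=1"
name = url_to_filename(url)
print(name)                      # https%3A%2F%2Fexample.com%2Fa%2Fb%3Fq%3D1
assert filename_to_url(name) == url
```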
Another option is to create an index: generate the output filename using whatever method you like (sequentially numbered names, SHA-1 hashes, whatever), then write a file with the generated filename/URL pairs. You can load that into a hash and use it to look up a filename from a URL, or vice versa with the reversed version of the hash, and you can write it out and reload it later if needed.
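A minimal sketch of the index idea, assuming SHA-1 names and a JSON file as the on-disk index (the answer leaves both choices open), with a Python dict playing the role of the hash. The function names and the index location are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def filename_for(url: str) -> str:
    # SHA-1 hex digest: 40 safe characters, and the same URL always gets the same name
    return hashlib.sha1(url.encode("utf-8")).hexdigest()

def build_index(urls):
    # filename -> URL
    return {filename_for(u): u for u in urls}

def save_index(index: dict, path: Path) -> None:
    path.write_text(json.dumps(index, indent=2), encoding="utf-8")

def load_index(path: Path) -> dict:
    return json.loads(path.read_text(encoding="utf-8"))

index = build_index(["https://example.com/a/b", "https://example.com/c?d=1"])
by_url = {url: name for name, url in index.items()}   # reversed: URL -> filename

with tempfile.TemporaryDirectory() as d:
    index_file = Path(d) / "index.json"                # hypothetical location
    save_index(index, index_file)
    reloaded = load_index(index_file)

print(by_url["https://example.com/a/b"])
```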
The short answer is: you must not. The long answer is that you probably can, depending on where you are viewing it from and which layer you are working in.
Since the question has the Unix tag, I am going to answer for Unix.
As mentioned in other answers, you must not use forward slashes in a filename.
However, on macOS you can create a file whose name shows a forward slash (/) by:
# avoid doing this at all costs
touch 'foo:bar'
When you view this filename from the Terminal, you will see it as foo:bar.
But if you view it in the Finder, you will see that the Finder has converted it to foo/bar.
The same thing works the other way round: if you create a file in the Finder with a forward slash in its name, like /foobar, a conversion is done in the background. As a result, you will see :foobar in the Terminal and /foobar in the Finder.
So : is valid in the Unix layer, but it is translated to or from / in the Mac layers such as the Finder window and the GUI. The colon : is used as the separator in HFS paths, and the slash / is used as the separator in POSIX paths.
So there is a two-way translation happening, depending on which “layer” you are working with.
See more details here: https://apple.stackexchange.com/a/283095/323181
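As a rough mental model only (not Apple's implementation), the display translation behaves like swapping the two characters between the layers. posix_to_finder and finder_to_posix are hypothetical names.

```python
# Swap ":" and "/" in one pass; the mapping is its own inverse.
_SWAP = str.maketrans(":/", "/:")

def posix_to_finder(name: str) -> str:
    """How a name created via Terminal/POSIX APIs would be shown in the Finder."""
    return name.translate(_SWAP)

def finder_to_posix(name: str) -> str:
    """How a name typed in the Finder would appear in Terminal."""
    return name.translate(_SWAP)

print(posix_to_finder("foo:bar"))   # foo/bar
print(finder_to_posix("/foobar"))   # :foobar
```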
You can have a filename with a / in Linux and Unix. This is a very old question, but surprisingly nobody has said it in the almost 10 years since it was asked.
Every Unix and Linux system has a root directory named /. A directory is just a special kind of file; symbolic links, character devices, etc. are also special kinds of files. See here for an in-depth discussion.
You can't create any other files with a /, but you certainly have one -- and a very important one at that.
After half an hour of searching Google, I am surprised I cannot find any way to create a file on Windows with slashes in the name. The customer demands that file names have the following structure:
04/28/2012 04:07 PM 6,781 12Q1_C125_G_04-17.pdf
So far I haven't found any way to encode the slashes so they become part of the file name instead of the path.
Any suggestions?
You can't.
The forward slash is one of the characters that are not allowed to be used in Windows file names, see
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx
The following fundamental rules enable applications to create and process valid names for files and directories, regardless of the file system:
Use a period to separate the base file name from the extension in the name of a directory or file.
Use a backslash (\) to separate the components of a path. The backslash divides the file name from the path to it, and one directory name from another directory name in a path. You cannot use a backslash in the name for the actual file or directory because it is a reserved character that separates the names into components.
Use a backslash as required as part of volume names, for example, the "C:\" in "C:\path\file" or the "\\server\share" in "\\server\share\path\file" for Universal Naming Convention (UNC) names. For more information about UNC names, see the Maximum Path Length Limitation section.
Do not assume case sensitivity. For example, consider the names OSCAR, Oscar, and oscar to be the same, even though some file systems (such as a POSIX-compliant file system) may consider them as different. Note that NTFS supports POSIX semantics for case sensitivity but this is not the default behavior. For more information, see CreateFile.
Volume designators (drive letters) are similarly case-insensitive. For example, "D:\" and "d:\" refer to the same volume.
Use any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:
The following reserved characters:
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
Integer value zero, sometimes referred to as the ASCII NUL character.
Characters whose integer representations are in the range from 1 through 31, except for alternate data streams where these characters are allowed. For more information about file streams, see File Streams.
Any other character that the target file system does not allow.
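The character rules above are mechanical enough to check in code. A minimal sketch: invalid_chars is a hypothetical helper, and it ignores the alternate-data-stream exception and the "any other character that the target file system does not allow" clause, which depend on the volume.

```python
# Characters the quoted rules reserve; NUL and 1-31 are covered by ord(ch) < 32.
RESERVED = set('<>:"/\\|?*')

def invalid_chars(name: str) -> list:
    """Return the characters in `name` that the quoted rules forbid, in order."""
    return [ch for ch in name if ch in RESERVED or ord(ch) < 32]

print(invalid_chars("04/28/2012 04:07 PM.pdf"))   # ['/', '/', ':']
print(invalid_chars("12Q1_C125_G_04-17.pdf"))      # []
print(invalid_chars("04\u221528\u22152012.pdf"))   # []  (U+2215 DIVISION SLASH)
```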
At least every Windows installation I've seen won't let you create files with slashes in their names.
Even if it were possible somehow, through some deep magic, it would probably break almost all applications, including Windows Explorer.
You could abuse Windows' Unicode capabilities, though.
Creating a file with ∕ in its name (this is not a forward slash; it is the "division slash", see http://www.fileformat.info/info/unicode/char/2215/index.htm) works just fine, for example.
Um... forward slash is not a legal character in a Windows file name?
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx