String "????" converts to "data" in Python

While writing a small Python script, I noticed that when the string "????" is passed as a command-line argument, it converts to "data" during program execution. Now, I am unsure whether this is a string or some other kind of data type. Finding information on this has been tricky, given the search terms.
Why does this happen and what does it mean?

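? is a shell wildcard character that matches any single character (similar to . in a regular expression). An unquoted ???? therefore expands to every filename in the current directory that is exactly four characters long. data is presumably the first such filename alphabetically in your directory; if there are others, they are passed to the script as additional arguments.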
See the output of
echo ????
If you want to pass ???? literally to the script, quote it.
python yourscript.py '????'
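The expansion happens in the shell, before Python starts, but it can be imitated with the standard fnmatch module to see what the script ends up receiving. This is only a sketch: the helper name expand_like_shell and the sample filenames are invented for illustration, and a real shell sorts according to the locale rather than by plain string order.

```python
import fnmatch


def expand_like_shell(pattern, filenames):
    """Roughly imitate unquoted shell globbing against a list of names."""
    matches = sorted(
        name
        for name in filenames
        # Hidden files only match when the pattern itself starts with a dot.
        if (pattern.startswith(".") or not name.startswith("."))
        and fnmatch.fnmatchcase(name, pattern)
    )
    # bash's default: a pattern that matches nothing is passed through as-is.
    return matches or [pattern]


names = ["data", "main.py", "logs", "a.txt", "notes"]  # hypothetical directory
print(expand_like_shell("????", names))  # ['data', 'logs']
print(expand_like_shell("??", names))    # ['??']
```

With quotes around the pattern, the shell skips this step entirely and the script receives the four question marks unchanged.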

Related

Why does python add additional backslashes to the path?

I have a text file with a path that goes like this:
r"\\user\data\t83\rf\Desktop\QA"
When I try to read this file and print a line, it returns the following string, and I'm unable to open the file from this location:
'r"\\\\user\\data\\t83\\rf\\Desktop\\QA"\n'

It seems you've got Python code in your text file. Either sanitize the file so that it contains only the actual path (not a Python string representation), fiddle with string replacement until you're satisfied, or evaluate the Python string.
Note that using eval() opens Pandora's box (it is as unsafe as it gets); it's safer to use ast.literal_eval() instead.
import ast
file_content = 'r"\\\\user\\data\\t83\\rf\\Desktop\\QA"\n'
print(eval(file_content)) # do not use this, it's only shown for the sake of completeness
print(ast.literal_eval(file_content))
Output:
\\user\data\t83\rf\Desktop\QA
\\user\data\t83\rf\Desktop\QA
Personally, I'd prefer to sanitize the file, so it only contains \\user\data\t83\rf\Desktop\QA

In a Python string literal, a backslash combines with the character that follows it to form an escape sequence such as \n (newline) or \t (tab). To represent a single literal backslash, write \\. This is why the representation of a string that contains backslashes shows each of them doubled.
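A small sketch of how escape sequences relate to the doubled backslashes seen in the output. The variable names are illustrative only, and the path is a shortened version of the one in the question.

```python
tab = "\t"         # one character: a tab
backslash = "\\"   # one character: a single backslash
raw = r"\t"        # two characters: a backslash followed by the letter t
print(len(tab), len(backslash), len(raw))  # 1 1 2

unc = r"\\user\data"
print(unc)         # \\user\data
print(repr(unc))   # '\\\\user\\data'
```

The string itself holds three backslashes; only its repr() shows six.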

remove shell escape characters from a string

I am creating a terminal based python application whereby the user drags and drops a csv file into the terminal to get the file path. The file path is therefore escaped.
How do I remove all instances of this?
For example, I have a file
thisisatestfile/\(2).csv
but when I drag it into terminal it appears as:
thisisatestfile\:\\\(2\).csv
I have a list of all the shell escape characters that I need to remove:
link to characters
I am not very good at regex so any help much appreciated!

I just implemented this with shlex.split. A raw string is used here so that Python itself does not try to interpret the backslash:
>>> import shlex
>>> shlex.split(r'thisisatestfile/\(2).csv')
['thisisatestfile/(2).csv']
Since this method is intended for taking a raw shell invocation and returning a list of args to be passed to, e.g., subprocess.Popen, it returns a list. If you know you only have a single string to process, just grab the first element of the returned list.
>>> shlex.split(r'thisisatestfile/\(2).csv')[0]
'thisisatestfile/(2).csv'
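Applied to the dragged-in path from the question, the same idea might be wrapped like this. The helper name unescape_dropped_path is hypothetical, and the sketch assumes a POSIX-style shell did the escaping and that the input holds a single path.

```python
import shlex


def unescape_dropped_path(raw):
    """Return the first shell word in `raw` with its escaping removed."""
    # Terminals often append a trailing space after a dropped path.
    words = shlex.split(raw.strip())
    if not words:
        raise ValueError("no path found in input")
    return words[0]


print(unescape_dropped_path(r"thisisatestfile\:\\\(2\).csv "))
# thisisatestfile:\(2).csv
print(unescape_dropped_path(r"/tmp/my\ report.csv"))
# /tmp/my report.csv
```

Note that shlex.split raises ValueError on input with unbalanced quotes, so user input that was typed by hand rather than dragged in may need extra handling.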

Why do some functions in Python change \ to \\

When I pass a file to shutil.copy as
shutil.copy(r'i:\myfile.txt', r'UNC to where I want it to go')
I get an error
No such file or directory 'i:\\myfile.txt'
I've experienced this problem before with the os module when I have a UNC path. Usually I just get frustrated enough that I forget using the os module and just put the file path into with open() or whatever I'm using it for.
It is my understanding that placing an r before '' is supposed to cause python to ignore escape characters and treat them as string literals, but the behavior I'm seeing leads me to believe that this is not the case. For some reason it takes the \ and changes it to \\.
I've seen this when using os.path.join, where the \\ at the beginning of the UNC path gets turned into \\\\.
What is the best way to pass a string literal to ensure that all escape characters are ignored and the string is preserved?

Your string is not being modified by Python. It's the representation of your string that's coming out differently.
When the error is printed, Python calls repr() to print the value. This function will
Return a string containing a printable representation of an object. For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval(), otherwise the representation is a string enclosed in angle brackets that contains the name of the type of the object together with additional information often including the name and address of the object. A class can control what this function returns for its instances by defining a __repr__() method.
This can be very nice when debugging: if I paste that string (quotes, escapes, and all) into the REPL I'll get the string in memory that you were working with. I can use this to interactively try your copy command, maybe tweaking the string a bit.
If you want to see your string in a printed form, you could do
import shutil
source_path = r'i:\myfile.txt'
target_path = r'UNC to where I want it to go'
print(f'Copying {source_path} to {target_path}...')
shutil.copy(source_path, target_path)
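To check that only the error message, and not the path, carries the doubled backslashes, one can compare the exception text with the filename stored on the exception. This is a sketch: the function name missing_file_report and the path are made up, and it assumes no file with that name exists.

```python
def missing_file_report(path):
    """Try to open `path`; return (error message, filename held by the error)."""
    try:
        with open(path):
            pass
    except OSError as exc:
        # str(exc) embeds repr(filename); exc.filename is the string itself.
        return str(exc), exc.filename
    return None, path


path = r"i:\no_such_dir\myfile.txt"  # hypothetical path that should not exist
message, filename = missing_file_report(path)
print(message)   # the path appears here with doubled backslashes
print(filename)  # the path appears here exactly as it was passed in
```

The second line printed matches the original string, which shows that the path was never altered and the real problem is simply that the file could not be found.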

Replace double backslash in string literal with single backslash

I'm trying to print a string that contains a double backslash (one to escape the other) such that only one of the backslashes is printed. I thought this would happen automatically, but I must be missing some detail.
I have this little snippet:
for path in self.tokenized:
    pdb.set_trace()
    print(self.tokenized[path])
When I debug with that pdb.set_trace() I can see that my strings have double backslashes, and then I enter continue to print the remainder and it prints that same thing.
> /home/kendall/Development/path-parser/tokenize_custom.py(82)print_tokens()
-> print(self.tokenized[path])
(Pdb) self.tokenized[path]
['c:', '\\home', '\\kendall', '\\Desktop', '\\home\\kendall\\Desktop']
(Pdb) c
['c:', '\\home', '\\kendall', '\\Desktop', '\\home\\kendall\\Desktop']
Note that I'm writing a parser that parses Windows file paths -- thus the backslashes.
This is what it looks like to run the program:
kendall@kendall-XPS-8500:~/Development/path-parser$ python main.py -f c:\\home\\kendall\\Desktop

The issue you are having is that you're printing a list, which only knows one way to stringify its contents: repr. repr is designed for debugging. Idiomatically, when possible (classes are a notable exception), it outputs a syntactically valid Python expression that can be fed directly into the interpreter to reproduce the original object, hence the escaped backslashes.
Instead, you need to loop through each list and print each string individually.
You can use str.join() to do this for you.
To get similar output, without the quotes and the doubled backslashes, you'd need to do something like:
print("[{0}]".format(", ".join(self.tokenized[path])))
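The difference can be tried outside the class with the list from the debugger session. The helper format_tokens and its quote parameter are illustrative additions, not part of the original program.

```python
tokens = ["c:", "\\home", "\\kendall", "\\Desktop", "\\home\\kendall\\Desktop"]


def format_tokens(items, quote=False):
    """Join the strings themselves rather than their repr() forms."""
    shown = ["'{0}'".format(item) for item in items] if quote else list(items)
    return "[{0}]".format(", ".join(shown))


print(tokens)                 # list printing uses repr(): backslashes doubled
print(format_tokens(tokens))  # [c:, \home, \kendall, \Desktop, \home\kendall\Desktop]
print(format_tokens(tokens, quote=True))
# ['c:', '\home', '\kendall', '\Desktop', '\home\kendall\Desktop']
```

Passing quote=True keeps the list-like look of the original output while still showing each backslash once.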

Passing a command line argument to Python whose string contains a metacharacter

I am attempting to pass a string as an input argument to a Python program from the command line, i.e. $ python parser_prog.py <pos1> <pos2> --opt1 --opt2, and interpreting the arguments using argparse. Of course, if a string contains any metacharacters, these are first interpreted by the shell, so it needs to be quoted.
This seems to work, strings are passed through literally, preserving the \*?! characters:
$ python parser_prog.py 'str\1*?' 'str2!'
However, when I attempt to pass through a '-' (hyphen) character, I cannot seem to mask it. It is interpreted as an invalid option.
$ python parser_prog.py 'str\1*?' '-str2!'
I have tried single and double quotes, is there a way to make sure Python interprets this as a raw string? (I'm not in the interpreter yet, this is on the shell command line, so I can't use pythonic expressions such as r'str1')
Thank you for any hints!

As you said yourself, Python only sees the strings after the shell has processed them. The command-line arguments '-f' and -f look identical to the called program, and there is no way to distinguish them. That said, argparse supports a -- argument to mark the end of the options; everything after it is treated as a positional argument.
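A minimal sketch of the -- separator with argparse. The parser below is a guess at what parser_prog.py might define: the argument names come from the question, but treating --opt1 and --opt2 as boolean flags is an assumption.

```python
import argparse


def build_parser():
    """Hypothetical stand-in for the parser in parser_prog.py."""
    parser = argparse.ArgumentParser(prog="parser_prog.py")
    parser.add_argument("pos1")
    parser.add_argument("pos2")
    parser.add_argument("--opt1", action="store_true")
    parser.add_argument("--opt2", action="store_true")
    return parser


# Equivalent to: python parser_prog.py --opt1 -- 'str\1*?' '-str2!'
args = build_parser().parse_args(["--opt1", "--", "str\\1*?", "-str2!"])
print(args.pos1, args.pos2, args.opt1, args.opt2)
# str\1*? -str2! True False
```

The quotes are still needed on the shell command line to protect *, ? and !, while -- is what stops argparse from reading -str2! as an option.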
