Following this question, I've settled on the Python code below to modify Windows shortcuts. It works for shortcuts with English-based names, but not for shortcuts with Unicode names.
How could this (or any other) snippet be modified to support Unicode?
import re, os, pythoncom
from win32com.shell import shell, shellcon
shortcut_path = os.path.join(path_to_shortcut, shortcut_filename)
shortcut = pythoncom.CoCreateInstance (shell.CLSID_ShellLink, None, pythoncom.CLSCTX_INPROC_SERVER, shell.IID_IShellLink)
persist_file = shortcut.QueryInterface (pythoncom.IID_IPersistFile)
persist_file.Load (shortcut_path)
destination1 = shortcut.GetPath(0)[0]
destination2 = os.path.join(destination_path, destination_filename)
shortcut.SetPath(destination2)
persist_file.Save(shortcut_path, 0)
Assume the following are unicode: path_to_shortcut, shortcut_filename, destination_path, destination_filename
Perhaps looking here may help: Python Unicode HOWTO
I'm guessing you'd need to be sure that each of those strings was properly encoded as Unicode and any changes need to preserve that encoding. That article should provide all the information you'll need.
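As a rough illustration of that advice in Python 3 terms (where text is already Unicode), a small helper can normalise each path component to str before the parts are joined. The helper name to_text and the sample values are hypothetical, and the pywin32 calls from the question are left out because they only run on Windows.

```python
import os
import sys


def to_text(value, encoding=None):
    """Return value as str, decoding bytes with the given (or filesystem) encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding or sys.getfilesystemencoding())
    return value


# Hypothetical inputs: one component is already text, the other arrives as UTF-8 bytes.
folder = to_text("shortcuts")
name = to_text("r\u00e9sum\u00e9.lnk".encode("utf-8"), "utf-8")

# Both parts are str now, so the join cannot mix bytes and text.
shortcut_path = os.path.join(folder, name)
```

Decoding once at the boundary keeps every later call working on text only, which is the property the answer above asks you to preserve.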
I'm using Python 3 (recently switched from Python 2). My code usually runs on Linux but also sometimes (not often) on Windows. According to Python 3 documentation for open(), the default encoding for a text file is from locale.getpreferredencoding() if the encoding arg is not supplied. I want this default value to be utf-8 for a project of mine, no matter what OS it's running on (currently, it's always UTF-8 for Linux, but not for Windows). The project has many many calls to open() and I don't want to add encoding='utf-8' to all of them. Thus, I want to change the locale's preferred encoding in Windows, as Python 3 sees it.
I found a previous question
"Changing the "locale preferred encoding"", which has an accepted answer, so I thought I was good to go. But unfortunately, neither of the suggested commands in that answer and its first comment work for me in Windows. Specifically, that accepted answer and its first comment suggest running chcp 65001 and set PYTHONIOENCODING=UTF-8, and I've tried both. Please see transcript below from my cmd window:
> py -i
Python 3.4.3 ...
>>> f = open('foo.txt', 'w')
>>> f.encoding
'cp1252'
>>> exit()
> chcp 65001
Active code page: 65001
> py -i
Python 3.4.3 ...
>>> f = open('foo.txt', 'w')
>>> f.encoding
'cp1252'
>>> exit()
> set PYTHONIOENCODING=UTF-8
> py -i
Python 3.4.3 ...
>>> f = open('foo.txt', 'w')
>>> f.encoding
'cp1252'
>>> exit()
Note that even after both suggested commands, my opened file's encoding is still cp1252 instead of the intended utf-8.
As of Python 3.5.1, this hack looks like this:
import _locale
_locale._getdefaultlocale = (lambda *args: ['en_US', 'utf8'])
All files opened thereafter will assume the default encoding to be utf8.
I know it's a really hacky workaround, but you could redefine the locale.getpreferredencoding() function like so:
import locale
def getpreferredencoding(do_setlocale=True):
    return "utf-8"
locale.getpreferredencoding = getpreferredencoding
If you run this early on, all files opened afterwards open in UTF-8 (at least in my testing on a Windows XP machine), and because this overrides the module function, it applies on all platforms.
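For comparison, here is a sketch of a different technique that does not patch the locale module at all: a project-level wrapper around open() built with functools.partial. This is not what the answers above do, and the name open_utf8 is an assumption made for illustration. It still means changing call sites, though only by swapping a name.

```python
import functools
import os
import tempfile

# Same signature as the builtin open(), but text files default to UTF-8.
open_utf8 = functools.partial(open, encoding="utf-8")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.txt")
    with open_utf8(path, "w") as f:
        f.write("caf\u00e9")
        used_encoding = f.encoding
    # Read the bytes back to confirm what was actually written.
    with open(path, "rb") as f:
        raw = f.read()

print(used_encoding)  # utf-8
print(raw)            # b'caf\xc3\xa9'
```

The wrapper is only suitable for text mode: passing a binary mode such as "rb" to it raises ValueError, because binary mode does not take an encoding argument.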
The locale can be set to UTF-8 globally in Windows, if you so desire, as follows:
Control panel -> Clock and Region -> Region -> Administrative -> Change system locale -> Check Beta: Use Unicode UTF-8 ...
After this, and a reboot, I confirmed that locale.getpreferredencoding() returns 'cp65001' (=UTF-8) and that functions like open default to UTF-8.
The post is old but the issue is still relevant (under Python 3.7 and Windows 10).
I've improved the solution as follows, making sure that only the encoding is overwritten and not the language/country part, and that this is done only under Windows:
import os
if os.name == "nt":
    import _locale
    _locale._gdl_bak = _locale._getdefaultlocale
    _locale._getdefaultlocale = (lambda *args: (_locale._gdl_bak()[0], 'utf8'))
Hope this helps...
As of Python 3.7, you may want to use UTF-8 mode, enabled by setting the PYTHONUTF8=1 environment variable or passing the -X utf8 flag to Python. Note that it switches a few more things to UTF-8 than just locale.getpreferredencoding, but that may well be a good thing. As of Python 3.15, UTF-8 mode is set to become the default.
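One way to see what UTF-8 mode changes is to start a child interpreter with -X utf8 and ask it what it reports. The sketch below assumes Python 3.7 or later; the helper name run_in_utf8_mode is made up for this example.

```python
import subprocess
import sys


def run_in_utf8_mode(code):
    """Run a snippet in a child interpreter started with -X utf8 and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-X", "utf8", "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# sys.flags.utf8_mode is 1 when UTF-8 mode is active.
flag = run_in_utf8_mode("import sys; print(sys.flags.utf8_mode)")
# The spelling of the name (UTF-8 vs utf-8) differs between Python versions.
preferred = run_in_utf8_mode(
    "import locale; print(locale.getpreferredencoding(False))"
)
print(flag, preferred)
```

Setting PYTHONUTF8=1 in the environment before starting Python should have the same effect as the flag, without touching the command line.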
I quite like using pathlib for path management in Python, but the drawback is that many functions, such as shutil.copy, shutil.move, and the builtin open, require a string rather than a PosixPath object, and fail with the error
TypeError: coercing to Unicode: need string or buffer, PosixPath found
The logical solution is of course to use str().
My question is how would it be possible (if it would be) to modify pathlib objects such that a call like open(pathlib.PosixPath) would work without the use of str().
The answer by @Navith is what you should do in Python 3.4. However, PEP 519 was proposed and accepted for Python 3.6 to address this valid concern.
This PEP proposes a protocol for classes which represent a file system path to be able to provide a str or bytes representation. Changes to Python's standard library are also proposed to utilize this protocol where appropriate to facilitate the use of path objects where historically only str and/or bytes file system paths are accepted.
So in Python 3.6 the standard library functions you refer to accept Path objects, and the answer to your question is: use Python 3.6.
Path objects have open, rmdir, chmod, ... methods that work the way you'd expect.
>>> import pathlib
>>> a_path = pathlib.Path("a.txt")
>>> a_txt = a_path.open("w", encoding="UTF-8")
>>> a_txt
<_io.TextIOWrapper name='a.txt' mode='w' encoding='UTF-8'>
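To make the PEP 519 behaviour concrete, here is a minimal sketch, assuming Python 3.6 or later, that passes Path objects straight to the builtin open() and to shutil.copy() without wrapping them in str(). The file names are arbitrary and live in a temporary directory.

```python
import os
import pathlib
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "a.txt"
    dst = pathlib.Path(tmp) / "b.txt"

    # The builtin open() accepts the Path object directly.
    with open(src, "w", encoding="utf-8") as f:
        f.write("hello")

    # shutil.copy() takes Path objects for both arguments.
    shutil.copy(src, dst)
    copied_text = dst.read_text(encoding="utf-8")

    # os.fspath() is the PEP 519 hook that returns the str form of the path.
    src_as_str = os.fspath(src)

print(copied_text)                # hello
print(type(src_as_str).__name__)  # str
```

If you write your own path-accepting functions, calling os.fspath() on the argument is the usual way to support both str and Path inputs.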
I'm a Python n00b. I have downloaded a URL-encoded file and I want to work with it on my Unix system (Ubuntu 14).
When I try and run some operations on my file, the system says that the file doesn't exist. How do I change my filename to a Unix-recognizable format?
Some of the files I have downloaded have spaces in their names, so they would have to be written with a backslash followed by a space. Below is a snippet of my code:
link = "http://www.stephaniequinn.com/Music/Scheherezade%20Theme.mp3"
output = open(link.split('/')[-1],'wb')
output.write(site.read())
output.close()
shutil.copy(link.split('/')[-1], tmp_dir)
The "link" you have actually is a URL. URLs are special and are not allowed to contain certain characters, such as spaces. These special characters can still be represented, but in an encoded form. The translation from special characters to this encoded form happens via a certain rule set, often known as "URL encoding". If interested, have a read over here: http://en.wikipedia.org/wiki/Percent-encoding
The encoding operation can be inverted, which is called decoding. The tool set with which you downloaded the files you mentioned most probably did the decoding already, for you. In your link example, there is only one special character in the URL, "%20", and this encodes a space. Your download tool set probably decoded this, and saved the file to your file system with the actual space character in the file name. That is, most likely you have a file in the file system with the following basename:
Scheherezade Theme.mp3
So, when you want to open that file from within Python, and all you have is the link, you first need to get the decoded variant of it. Python can decode URL-encoded strings with built-in tools. This is what you need:
>>> import urllib.parse
>>> url = "http://www.stephaniequinn.com/Music/Scheherezade%20Theme.mp3"
>>> urllib.parse.unquote(url)
'http://www.stephaniequinn.com/Music/Scheherezade Theme.mp3'
>>>
This assumes that you are using Python 3, and that your link object is a unicode object (type str in Python 3).
Starting off with the decoded URL, you can derive the filename. Your link.split('/')[-1] method might work in many cases, but J.F. Sebastian's answer provides a more reliable method.
To extract a filename from a URL:
#!/usr/bin/env python2
import os
import posixpath
import urllib
import urlparse

def url2filename(url):
    """Return basename corresponding to url.

    >>> url2filename('http://example.com/path/to/file?opt=1')
    'file'
    """
    urlpath = urlparse.urlsplit(url).path  # pylint: disable=E1103
    basename = posixpath.basename(urllib.unquote(urlpath))
    if os.path.basename(basename) != basename:
        raise ValueError  # refuse 'dir%5Cbasename.ext' on Windows
    return basename
Example:
>>> url2filename("http://www.stephaniequinn.com/Music/Scheherezade%20Theme.mp3")
'Scheherezade Theme.mp3'
You do not need to escape the space in the filename if you use it inside a Python script.
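The function above targets Python 2. A Python 3 port might look like the sketch below, assuming the only changes needed are the imports, since urlsplit and unquote both live in urllib.parse there. The error message is an addition of this sketch.

```python
import os
import posixpath
from urllib.parse import unquote, urlsplit


def url2filename(url):
    """Return the basename corresponding to url (Python 3 port)."""
    # Keep only the path component, dropping the query string and fragment.
    urlpath = urlsplit(url).path
    # Decode percent-escapes such as %20 before taking the last component.
    basename = posixpath.basename(unquote(urlpath))
    # Refuse names that the local OS would still treat as containing a directory.
    if os.path.basename(basename) != basename:
        raise ValueError("unsafe filename in URL: %r" % url)
    return basename


print(url2filename("http://www.stephaniequinn.com/Music/Scheherezade%20Theme.mp3"))
```

The returned name can be passed directly to open() or shutil.copy(); the space needs no escaping inside Python.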
See complete code example on how to download a file using Python (with a progress report).
I need to split a string using the delimiter "\".
The string can be in either of the following formats:
file://C:\Users\xyz\filename.txt
C:\Users\xyz\filename.txt
I need my script to give the output as "filename.txt"
I tried split('\\\\'), but it does not work. Which function is better to use?
Suppose your string is pathName, then you can use fileName = pathName.split('\\')[-1].
Try the following steps. Note that a backslash inside a string literal must be escaped (written as \\), which also avoids errors such as an invalid \x escape:
>>> file = 'file://C:\\Users\\xyz\\filename.txt'
>>> file.split('\\')[-1]
'filename.txt'
>>> file = 'C:\\Users\\xyz\\filename.txt'
>>> file.split('\\')[-1]
'filename.txt'
Two issues here.
Path splitting
You'd normally use os.path.split to work with paths:
>>> import os.path
>>> p=r'C:\Users\xyz\filename.txt'
>>> head, tail = os.path.split(p)
>>> head
'C:\\Users\\xyz'
>>> tail
'filename.txt'
Caveat: os.path works with the path format of the operating system it's used on. If you know you specifically want to work with Windows paths (even when your program is run on Linux or OSX), then instead of os.path you'd work with the ntpath module. See the note:
Note Since different operating systems have different path name conventions, there are several versions of this module in the standard library. The os.path module is always the path module suitable for the operating system Python is running on, and therefore usable for local paths. However, you can also import and use the individual modules if you want to manipulate a path that is always in one of the different formats. They all have the same interface:
posixpath for UNIX-style paths
ntpath for Windows paths
macpath for old-style MacOS paths
os2emxpath for OS/2 EMX paths
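As a small sketch of that caveat, ntpath can be imported directly so that Windows-style paths are split the same way on any operating system. pathlib.PureWindowsPath is shown alongside it as an equivalent that this answer does not mention.

```python
import ntpath
from pathlib import PureWindowsPath

path = r"C:\Users\xyz\filename.txt"

# ntpath applies Windows rules even when this runs on Linux or macOS.
head, tail = ntpath.split(path)
print(head)  # C:\Users\xyz
print(tail)  # filename.txt

# The odd file:// variant from the question still ends in the same component.
prefixed_name = ntpath.basename(r"file://C:\Users\xyz\filename.txt")
print(prefixed_name)  # filename.txt

# PureWindowsPath does no filesystem access, so it is also portable.
pure_name = PureWindowsPath(path).name
print(pure_name)  # filename.txt
```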
Format support
You have 2 formats to support:
1. file://C:\Users\xyz\filename.txt
2. C:\Users\xyz\filename.txt
2 is a normal Windows path, and 1 is... Frankly, I have no idea what that is. It kind of looks like a file URI, but uses Windows-style delimiters (backslashes). This is strange. When I open a PDF in Chrome on Windows the URI looks different:
file:///C:/Users/kos/Downloads/something.pdf
and I'll assume that's the format you're interested in. If not, then I can't vouch for what you're dealing with and you can make some educated guess on how to interpret it (drop the file:// prefix and treat it as a Windows path?).
You can split a URI into meaningful parts using the urlparse module (urllib.parse in Python 3), and once you've extracted the path part of the URI, you can just .split('/') it (URI grammar is simple enough to allow that). Here's what happens if you use this module on a file:// URI:
>>> import urlparse
>>> r = urlparse.urlparse(r'file:///C:/Users/xyz/filename.txt')
>>> r
ParseResult(scheme='file', netloc='', path='/C:/Users/xyz/filename.txt', params='', query='', fragment='')
>>> r.path
'/C:/Users/xyz/filename.txt'
>>> r.path.lstrip('/').split('/')
['C:', 'Users', 'xyz', 'filename.txt']
Please read this URI scheme description to get a better idea of what this format looks like and why there are three slashes after file:.
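In Python 3 the same parsing lives in urllib.parse. The sketch below assumes well-formed file:/// URIs with forward slashes, as in the Chrome example above; the function name filename_from_file_uri is hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_file_uri(uri):
    """Return the last path component of a file:// URI."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError("not a file URI: %r" % uri)
    # The URI path always uses '/', so posixpath is the right splitter here.
    return posixpath.basename(unquote(parts.path))


print(filename_from_file_uri("file:///C:/Users/kos/Downloads/something.pdf"))
# something.pdf
```

Percent-escapes are decoded before the split, so a URI ending in my%20file.txt yields 'my file.txt'.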
Let's say you want to start a python script with some parameters like
python myscript some arguments
I understand that the strings sys.argv[1] and sys.argv[2] will have the encoding specified in the terminal. Is there a way to get this information from within the Python script?
My goal is something like this:
terminal_encoding = some_way.to.GET_TERMINAL_ENCODING
some = sys.argv[1].decode(terminal_encoding)
arguments = sys.argv[2].decode(terminal_encoding)
sys.stdout.encoding will give you the encoding of standard output. sys.stdin.encoding will give you the encoding of standard input.
You can call locale.getdefaultlocale() and use the second part of the tuple.
See more here (Fedora wiki entry explaining the why's and how's of the default encoding in Python)
The function locale.getpreferredencoding() also seems to do the job.
It returns the Python encoding string which you can directly use like this:
>>> import locale
>>> s = b'123\n'
>>> enc = locale.getpreferredencoding()
>>> s.decode(enc)
'123\n'
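The sources mentioned in these answers can be collected in one place, as in the sketch below. It assumes Python 3, where sys.argv already holds str values and needs no manual decoding, so the encodings matter mainly for raw bytes. The function name describe_encodings is made up for illustration.

```python
import codecs
import locale
import sys


def describe_encodings():
    """Collect the encodings Python reports for the standard streams and the locale."""
    return {
        # The streams can be None or replaced objects, hence getattr with a default.
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "preferred": locale.getpreferredencoding(False),
    }


info = describe_encodings()
# Map the locale's spelling onto Python's canonical codec name.
codec_name = codecs.lookup(info["preferred"]).name
print(info, codec_name)
```

The values depend on the terminal and locale settings, so the output differs between machines.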