Python os.walk encode character error [duplicate]

I am getting the error:
'ascii' codec can't decode byte 0x8b in position 14: ordinal not in range(128)
when trying to do os.walk. The error occurs because some of the files in a directory have the 0x8b (non-utf8) character in them. The files come from a Windows system (hence the utf-16 filenames), but I have copied the files over to a Linux system and am using python 2.7 (running in Linux) to traverse the directories.
I have tried passing a unicode start path to os.walk, and all the files & dirs it generates are unicode names until it comes to a non-utf8 name, and then for some reason, it doesn't convert those names to unicode and then the code chokes on the utf-16 names. Is there any way to solve the problem short of manually finding and changing all the offending names?
If there is not a solution in python2.7, can a script be written in python3 to traverse the file tree and fix the bad filenames by converting them to utf-8 (by removing the non-utf8 chars)? N.B. there are many non-utf8 chars in the names besides 0x8b, so it would need to work in a general fashion.
UPDATE: The fact that 0x8b is still only a byte char (just not valid ascii) makes it even more puzzling. I have verified that there is a problem converting such a string to unicode, but that a unicode version can be created directly. To wit:
>>> test = 'a string \x8b with non-ascii'
>>> test
'a string \x8b with non-ascii'
>>> unicode(test)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 9: ordinal not in range(128)
>>>
>>> test2 = u'a string \x8b with non-ascii'
>>> test2
u'a string \x8b with non-ascii'
Here's a traceback of the error I am getting:
80. for root, dirs, files in os.walk(unicode(startpath)):
File "/usr/lib/python2.7/os.py" in walk
294. for x in walk(new_path, topdown, onerror, followlinks):
File "/usr/lib/python2.7/os.py" in walk
294. for x in walk(new_path, topdown, onerror, followlinks):
File "/usr/lib/python2.7/os.py" in walk
284. if isdir(join(top, name)):
File "/usr/lib/python2.7/posixpath.py" in join
71. path += '/' + b
Exception Type: UnicodeDecodeError at /admin/casebuilder/company/883/
Exception Value: 'ascii' codec can't decode byte 0x8b in position 14: ordinal not in range(128)
The root of the problem occurs in the list of files returned from listdir (on line 276 of os.walk):
names = listdir(top)
The names with chars > 128 are returned as non-unicode strings.

Right I just spent some time sorting through this error, and wordier answers here aren't getting at the underlying issue:
The problem is that if you pass a unicode string into os.walk(), os.listdir() returns unicode for every name it can decode but leaves undecodable names as byte strings. When os.walk then joins the unicode path with one of those byte strings, Python 2 tries to decode the bytes implicitly as ASCII (hence the 'ascii' decode error) and throws the exception.
The solution is to force the starting path you pass to os.walk to be a regular string, i.e. os.walk(str(somepath)). This means os.listdir returns regular byte strings and everything works the way it should.
You can reproduce this problem (and show that its solution works) trivially like this:
Go into bash in some directory and run touch $(echo -e "\x8b\x8bThis is a bad filename") which will make some test files.
Now run the following Python code (iPython Qt is handy for this) in the same directory:
import os
l = []
for root, dirs, filenames in os.walk(unicode('.')):
    l.extend([os.path.join(root, f) for f in filenames])
print l
And you'll get a UnicodeDecodeError.
Now try running:
import os
l = []
for root, dirs, filenames in os.walk('.'):
    l.extend([os.path.join(root, f) for f in filenames])
print l
No error and you get a print out!
Thus the safe way in Python 2.x is to make sure you only pass byte strings (str) to os.walk(). You absolutely should not pass unicode or things which might be unicode to it, because os.walk will then choke when an internal ascii conversion fails.

This problem stems from two fundamental problems. The first is the fact that the Python 2.x default encoding is 'ascii', while the default Linux encoding is 'utf8'. You can verify these encodings via:
sys.getdefaultencoding() #python
sys.getfilesystemencoding() #OS
When os module functions that return directory contents, namely os.walk and os.listdir, return a list containing both ascii-only and non-ascii filenames, the ascii-encoded filenames are converted automatically to unicode. The others are not. Therefore, the result is a list containing a mix of unicode and str objects. It is the str objects that can cause problems down the line. Since they are not ascii, python has no way of knowing what encoding to use, and therefore they can't be decoded automatically into unicode.
Therefore, when performing common operations such as os.path.join(dir, file), where dir is unicode and file is an encoded str, the call will fail if the file is not ascii-encoded (the default). The solution is to check each filename as soon as it is retrieved and decode the str (encoded) objects to unicode using the appropriate encoding.
That's the first problem and its solution. The second is a bit trickier. Since the files originally came from a Windows system, their filenames probably use an encoding called windows-1252. An easy means of checking is to call:
filename.decode('windows-1252')
If a valid unicode version results you probably have the correct encoding. You can further verify by calling print on the unicode version as well and see the correct filename rendered.
One last wrinkle. In a Linux system with files of Windows origin, it is possible or even probable to have a mix of windows-1252 and utf8 encodings. There are two means of dealing with this mixture. The first and preferable is to run:
$ convmv -f windows-1252 -t utf8 -r DIRECTORY --notest
where DIRECTORY is the one containing the files needing conversion. This command will convert any windows-1252 encoded filenames to utf8. It does a smart conversion, in that if a filename is already utf8 (or ascii), it will do nothing.
The alternative (if one cannot do this conversion for some reason) is to do something similar on the fly in python. To wit:
def decodeName(name):
    if isinstance(name, str):  # leave unicode ones alone
        try:
            name = name.decode('utf8')
        except UnicodeDecodeError:
            # not valid utf8, so assume windows-1252
            name = name.decode('windows-1252')
    return name
The function tries a utf8 decoding first. If it fails, then it falls back to the windows-1252 version. Use this function after an os call returning a list of files:
for root, dirs, files in os.walk(path):
    files = [decodeName(f) for f in files]
    # do something with the unicode filenames now
I personally found the entire subject of unicode and encoding very confusing, until I read this wonderful and simple tutorial:
http://farmdev.com/talks/unicode/
I highly recommend it for anyone struggling with unicode issues.
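For readers on Python 3, the same utf8-then-windows-1252 fallback could be sketched roughly as follows. This is only an illustration and not part of the answer above: the names decode_name and iter_decoded_names are made up here, and the errors="replace" step is an added assumption to cover the few bytes that windows-1252 leaves undefined.

```python
import os


def decode_name(raw):
    """Decode a raw filename: try UTF-8 first, then fall back to windows-1252."""
    if isinstance(raw, str):  # already text, leave it alone
        return raw
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: a few bytes (e.g. 0x81) are undefined in windows-1252,
        # so substitute U+FFFD for those rather than raising.
        return raw.decode("windows-1252", errors="replace")


def iter_decoded_names(top):
    """Walk `top` with a bytes path so os.walk yields undecoded names."""
    for root, dirs, files in os.walk(os.fsencode(top)):
        for raw in files:
            yield decode_name(root), decode_name(raw)
```

Passing a bytes path is what keeps os.walk from decoding the names itself, so every name goes through the same fallback.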

I can reproduce the os.listdir() behavior: os.listdir(unicode_name) returns undecodable entries as bytes on Python 2.7:
>>> import os
>>> os.listdir(u'.')
[u'abc', '<--\x8b-->']
Notice: the second name is a bytestring despite listdir()'s argument being a Unicode string.
A big question remains however - how can this be solved without resorting to this hack?
Python 3 handles bytes in filenames that are undecodable (using the filesystem's character encoding) via the surrogateescape error handler (os.fsencode/os.fsdecode). See PEP-383: Non-decodable Bytes in System Character Interfaces:
>>> os.listdir(u'.')
['abc', '<--\udc8b-->']
Notice: both strings are Unicode (Python 3), and the surrogateescape error handler was used for the second name. To get the original bytes back:
>>> os.fsencode('<--\udc8b-->')
b'<--\x8b-->'
In Python 2, use Unicode strings for filenames on Windows (Unicode API), OS X (utf-8 is enforced) and use bytestrings on Linux and other systems.
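The round trip can also be tried without touching the filesystem. The sketch below (Python 3) applies the surrogateescape handler by hand; the explicit "utf-8" codec is an assumption made so the result does not depend on the machine's filesystem encoding, whereas os.fsdecode and os.fsencode pick the codec themselves.

```python
raw = b"<--\x8b-->"

# Decoding with surrogateescape never fails: the undecodable byte 0x8b
# becomes the lone surrogate U+DC8B.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("utf-8", errors="surrogateescape")

print(repr(text))
print(restored == raw)
```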

\x8b is not a valid UTF-8 byte sequence on its own. os.path expects the filenames to be in utf-8. If you want to access invalid filenames, you have to pass os.walk a non-unicode startpath; this way the os module will not do the utf8 decoding. You would have to do it yourself and decide what to do with the filenames that contain incorrect characters.
I.e.:
for root, dirs, files in os.walk(startpath.encode('utf8')):

After examination of the source of the error, something happens within the C-code routine listdir which returns non-unicode filenames when they are not standard ascii. The only fix therefore is to do a forced decode of the directory list within os.walk, which requires a replacement of os.walk. This replacement function works:
import os

def asciisafewalk(top, topdown=True, onerror=None, followlinks=False):
    """
    duplicate of os.walk, except we do a forced decode after listdir
    """
    islink, join, isdir = os.path.islink, os.path.join, os.path.isdir
    try:
        names = os.listdir(top)
        # force non-ascii text out
        names = [name.decode('utf8','ignore') for name in names]
    except os.error as err:
        if onerror is not None:
            onerror(err)
        return
    dirs, nondirs = [], []
    for name in names:
        if isdir(join(top, name)):
            dirs.append(name)
        else:
            nondirs.append(name)
    if topdown:
        yield top, dirs, nondirs
    for name in dirs:
        new_path = join(top, name)
        if followlinks or not islink(new_path):
            for x in asciisafewalk(new_path, topdown, onerror, followlinks):
                yield x
    if not topdown:
        yield top, dirs, nondirs
By adding the line:
names = [name.decode('utf8','ignore') for name in names]
all the names are proper ascii & unicode, and everything works correctly.
A big question remains however - how can this be solved without resorting to this hack?
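On the question of doing the cleanup from Python 3 instead, one possible approach is to walk the tree with a bytes path and work out which names would change if the invalid bytes were dropped. The sketch below is hypothetical (sanitize and plan_renames are names invented here) and it only builds a plan; nothing is renamed.

```python
import os


def sanitize(raw_name):
    """Drop every byte that is not part of a valid UTF-8 sequence."""
    return raw_name.decode("utf-8", errors="ignore").encode("utf-8")


def plan_renames(top):
    """Return (old_path, new_path) byte-string pairs for names needing a fix."""
    plan = []
    # Walk bottom-up so entries are handled before their parent directory.
    for root, dirs, files in os.walk(os.fsencode(top), topdown=False):
        for raw in dirs + files:
            fixed = sanitize(raw)
            if fixed != raw:
                plan.append((os.path.join(root, raw), os.path.join(root, fixed)))
    return plan
```

Each pair could then be handed to os.rename, after checking that the new name is not empty and does not collide with an existing entry.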

I got this problem when using os.walk on some directories with Chinese (unicode) names. I implemented the walk function myself as follows, which worked fine with unicode dir/file names.
import os
ft = []  # collected (directory, filename, size) tuples
def walk(dir, cur):
    fl = os.listdir(dir)
    for f in fl:
        full_path = os.path.join(dir, f)
        if os.path.isdir(full_path):
            walk(full_path, cur)
        else:
            path, filename = full_path.rsplit('/', 1)
            ft.append((path, filename, os.path.getsize(full_path)))

Related

How to use .write() with characters from foreign languages (ã, à, ê, ó, ...)

I'm working on a small project in Python 3 where I have to scan a drive full of files and output a .txt file with the path of all of the files inside the drive. The problem is that some of the files are in Brazilian Portuguese which has "accented letters" such as "não", "você" and others and those special letters are being output wrongly in the final .txt.
The code is just these few lines below:
import glob
path = r'path/path'
files = [f for f in glob.glob(path + "**/**", recursive=True)]
with open("file.txt", 'w') as output:
    for row in files:
        output.write(str(row.encode('utf-8') )+ '\n')
An example of outputs
path\folder1\Treino_2.doc
path\folder1\Treino_1.doc
path\folder1\\xc3\x81gua de Produ\xc3\xa7\xc3\xa3o.doc
The last line shows how some of the outputs are wrong, since \xc3\x81gua de Produ\xc3\xa7\xc3\xa3o should be Água de Produção

Python files handle Unicode text (including Brazilian accented characters) directly. All you need to do is use the file in text mode, which is the default unless you explicitly ask open() to give you a binary file. "w" gives you a text file that's writable.
You may want to be explicit about the encoding, however, by using the encoding argument for the open() function:
with open("file.txt", "w", encoding="utf-8") as output:
    for row in files:
        output.write(row + "\n")
If you don't explicitly set the encoding, then a system-specific default is selected. Not all encodings can encode all possible Unicode codepoints. This happens on Windows more than on other operating systems, where the default ANSI codepage leads to "'charmap' codec can't encode character" errors, but it can happen on other operating systems as well if the current locale is configured to use a non-Unicode encoding.
Do not encode to bytes and then convert the resulting bytes object back to a string again with str(). That only makes a big mess with string representations and escapes and the b prefix there too:
>>> v = r"path\folder1\Água de Produção.doc"
>>> v.encode("utf8") # bytes are represented with the "b'...'" syntax
b'path\\folder1\\\xc3\x81gua de Produ\xc3\xa7\xc3\xa3o.doc'
>>> str(v.encode("utf8")) # converting back with `str()` includes that syntax
"b'path\\\\folder1\\\\\\xc3\\x81gua de Produ\\xc3\\xa7\\xc3\\xa3o.doc'"
See What does a b prefix before a python string mean? for more details as to what happens here.
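As a quick check of the advice above, here is a small self-contained sketch that writes accented paths in text mode and reads them back. The sample paths and the temporary output location are made up for the example.

```python
import os
import tempfile

paths = ["path/folder1/Treino_1.doc", "path/folder1/Água de Produção.doc"]

out_path = os.path.join(tempfile.mkdtemp(), "file.txt")

# Text mode with an explicit encoding: write str objects, never bytes.
with open(out_path, "w", encoding="utf-8") as output:
    for row in paths:
        output.write(row + "\n")

# Reading back with the same encoding returns the original text.
with open(out_path, encoding="utf-8") as saved:
    lines = saved.read().splitlines()

print(lines == paths)
```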

You probably just want to write the filename strings directly to the file, without first encoding them as UTF-8 yourself, since they are already decoded text strings. That is:
…
    for row in files:
        output.write(row + '\n')
Should do the right thing.
I say “probably” since filenames do not have to be valid UTF-8 in some operating systems (e.g. Linux!), and treating those as UTF-8 will fail. In that case your only recourse is to handle the filenames as raw byte sequences — however, this won’t ever happen in your code, since glob already returns strings rather than byte arrays, i.e. Python has already attempted to decode the byte sequences representing the filenames as UTF-8.
You can tell glob to handle arbitrary byte filenames (i.e. non-UTF-8) by passing the globbing pattern as a byte sequence. On Linux, the following works:
filename = b'\xbd\xb2=\xbc \xe2\x8c\x98'
with open(filename, 'w') as file:
    file.write('hi!\n')
import glob
print(glob.glob(b'*')[0])
# b'\xbd\xb2=\xbc \xe2\x8c\x98'
# BUT:
print(glob.glob('*')[0])
#---------------------------------------------------------------------------
#UnicodeEncodeError Traceback (most recent call last)
#<ipython-input-12-2bce790f5243> in <module>
#----> 1 print(glob.glob('*')[0])
#
#UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed
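If such names still have to end up in a text listing, one option is to choose an error handler for the output stream so that the escaped bytes are written as visible escapes instead of raising. A minimal sketch, using an in-memory stream and made-up sample names (the same errors argument is accepted by open()):

```python
import io

names = ["plain.txt", "bad\udc8bname.txt"]  # second name holds a surrogate escape

# errors="backslashreplace" writes unencodable characters as escapes
# instead of raising UnicodeEncodeError.
buffer = io.BytesIO()
output = io.TextIOWrapper(
    buffer, encoding="utf-8", errors="backslashreplace", newline="\n"
)
for name in names:
    output.write(name + "\n")
output.flush()

written = buffer.getvalue()
print(written)
```

The listing is then lossy for those names, so this suits reports rather than anything that must map back to the real files.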

zipfile.write() file with turkish chars in filename

On my system there are many Word documents and I want to zip them using the Python module zipfile.
I have found this solution to my problem, but on my system there are files which contain German umlauts and Turkish characters in their filename.
I have adapted the method from the solution like this, so it can process German umlauts in the filenames:
def zipdir(path, ziph):
    for root, dirs, files in os.walk(path):
        for file in files:
            current_file = os.path.join(root, file)
            print "Adding to archive -> file: "+str(current_file)
            try:
                #ziph.write(current_file.decode("cp1250")) #German umlauts ok, Turkish chars not ok
                ziph.write(current_file.encode("utf-8")) #both not ok
                #ziph.write(current_file.decode("utf-8")) #both not ok
            except Exception,ex:
                print "exception ---> "+str(ex)
                print repr(current_file)
                raise
Unfortunately my attempts to include logic for Turkish characters remained unsuccessful, leaving the problem that every time a filename contains a Turkish character the code prints an exception, for example like this:
exception ---> [Error 123] Die Syntax f³r den Dateinamen, Verzeichnisnamen oder
die Datentrõgerbezeichnung ist falsch: u'X:\\my\\path\\SomeTurk?shChar?shere.doc'
I have tried several string encode-decode stuff, but none of it was successful.
Can someone help me out here?
I edited the above code to include the changes mentioned in the comment.
The following errors are now shown:
...
Adding to archive -> file: X:\\my\path\blabla I blabla.doc
Adding to archive -> file: X:\my\path\bla bla³bla³bla³bla.doc
exception ---> 'ascii' codec can't decode byte 0xfc in position 24: ordinal not
in range(128)
'X:\\my\\path\\bla B\xfcbla\xfcbla\xfcbla.doc'
Traceback (most recent call last):
File "Backup.py", line 48, in <module>
zipdir('X:\\my\\path', zipf)
File "Backup.py", line 12, in zipdir
ziph.write(current_file.encode("utf-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 24: ordinal
not in range(128)
The ³ is actually a German ü.

If you do not need to inspect the ZIP file with any archiver later, you can always encode the filenames as base64 and restore them when extracting with Python.
To any archiver these filenames will look like gibberish but encoding will be preserved.
Anyway, to get a byte string (str in Python 2, a bytes object in Py3), you have to encode(), not decode().
encode() serializes the unicode() string to a byte string.
>>> u"\u0161blah".encode("utf-8")
'\xc5\xa1blah'
decode() returns from that to unicode():
>>> "\xc5\xa1blah".decode("utf-8")
u'\u0161blah'
Same goes for any other codepage.
Sorry for emphasizing that, but people sometimes get confused about encoding and decoding stuff.
If you need the files but aren't much concerned about preserving umlauts and other symbols, you can use:
u"üsdlakui".encode("ascii", "replace")
or:
u"üsdlakui".encode("ascii", "ignore")
This will replace unencodable characters with a placeholder or silently drop them instead of raising an encoding error.
That will fix things if the raised error is something like UnicodeDecodeError: Cannot decode character ...
But the problem will remain for filenames consisting only of non-latin characters.
Now something that might actually work:
Well,
'Sömethüng'.encode("utf-8")
is bound to raise an ASCII decode error in Python 2: the literal is a byte string (str), not unicode, so encode() first tries to decode it implicitly with the default ASCII codec, which fails on the non-ASCII bytes.
while:
# -*- coding: UTF-8 -*-
u'Sömethüng'.encode("utf-8")
or
# -*- coding: UTF-8 -*-
unicode('Sömethüng', 'utf-8').encode("utf-8")
with the encoding declared at the top of the file, and the file saved as UTF-8, should work.
Yes, your strings come from the OS (filenames) rather than from literals, but that is the problem from the beginning of the story.
Even if encoding passes right, there is the ZIP thing still to be solved.
By specification ZIP should store filenames using CP437, but this is rarely so.
Most archivers use the default OS encoding (MBCS in Python).
And most archivers don't support UTF-8. So, what I propose here should work, but not on all archivers.
To tell the ZIP archiver that the archive is using UTF-8 filenames, bit 11 (mask 0x800) of flag_bits should be set. As I said, some of them do not check that bit. This is a fairly recent addition to the ZIP spec (well, a few years old by now).
I won't write here whole code, just the part needed to understand the thing.
# -*- coding: utf-8 -*-
# Cannot hurt to have default encoding set to UTF-8 all the time. :D
import os, time, zipfile
zip = zipfile.ZipFile(...)
# Careful here, origname is the full path to the file you will store into ZIP
# filename is the filename under which the file will be stored in the ZIP
# It'll probably be better if filename is not a full path, but relative, not to introduce problems when extracting. You decide.
filename = origname = os.path.join(root, filename)
# Filenames from OS can be already UTF-8, but they can be a local codepage.
# I will use MBCS here to decode from it, so that we can encode to UTF-8 later.
# I recommend getting codepage from OS (from kernel32.dll on Windows) manually instead of using MBCS, but for now:
if isinstance(filename, str): filename = filename.decode("mbcs")
# Else, assume it is already a decoded unicode string.
# Prepare the filename for archive:
filename = os.path.normpath(os.path.splitdrive(filename)[1])
while filename[0] in (os.sep, os.altsep):
    filename = filename[1:]
filename = filename.replace(os.sep, "/")
filename = filename.encode("utf-8") # Get what we need
zinfo = zipfile.ZipInfo(filename, time.localtime(os.path.getmtime(origname))[0:6])
# Here you should set zinfo.external_attr to store Unix permission bits and set the zinfo.compression_type
# Both are optional and not a subject to your problem. But just as notice.
zinfo.flag_bits |= 0x800 # Set 11th bit to 1, announce the UTF-8 filenames.
f = open(origname, "rb")
zip.writestr(zinfo, f.read())
f.close()
I didn't test it, I just wrote the code, but this is the idea, even if a bug crept in somewhere.
If this doesn't work, I don't know what will.
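For comparison, on Python 3 the zipfile module takes str names and, as far as I can tell, sets the UTF-8 flag itself when a name is not pure ASCII. The sketch below is Python 3 only and is not the Python 2 code from this answer; it writes one entry to an in-memory archive (the entry name is made up) and checks the flag after reading it back.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    # A str name containing non-ASCII characters is stored as UTF-8.
    archive.writestr("Sömethüng.txt", b"data")

with zipfile.ZipFile(buffer) as archive:
    info = archive.infolist()[0]
    utf8_flag_set = bool(info.flag_bits & 0x800)

print(info.filename, utf8_flag_set)
```

Whether another archiver honours that flag is a separate matter, as noted above.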

python `os` returning files that `os` thinks doesn't exist

I have a collection of files from an older MAC OS file store. I know that there are filename / path name issues with the collection. The issue stems from the inclusion of a codepoint in the path that I think was rendered as a dash in the original OS, but windows struggles with the codepoint, and either includes a diacritic on the previous character, or replaces it with a ?
I'm trying to figure out a way to establishing a "truth" of the files structure, so I can be sure I'm accounting for every file.
I have explored the files with a few tools, and nothing has matching tallies. I believe the following demonstrates the problem.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
folder = "emails"
b = os.listdir(folder)
for f in b:
    print repr(f)
    print os.path.isfile(os.path.join(folder, f))
(I have to redact the actual filenames a little)
Results in:-
'file(1)'
True
'file(2)'
True
'file(3)?'
False
'file(4)'
True
The file name of interest is file(3)?, where the odd codepoint has been decoded as a ?, and which evaluates as not being a file (or even existing, via os.path.exists).
Note that print repr(string) shows that it's handling a properly UTF-8 encoded ?.
I can copy and paste the filename from the folder, and it appears as: file(3) (note the full stop).
I can paste the string into my editor (subl) and see that I now have an undisplayable codepoint glyph for the final codepoint:
a = "file(3)"
print a
print repr(a)
Gives me:
file(3)
'file(3)\xef\x80\xa9'
From this I can see that the odd code point is \xef\x80\xa9. Elsewhere in the set I also find the codepoint \xef\x80\xa8.
I must assume that os.listdir is not returning raw codepoint values but an (UTF-8?) encoded string, with a codepoint substitution, which means that when it tests for exists or isfile it is testing for the existence of the wrong filename, as the file with a substituted ? does not exist.
How do I work with these files safely? I have around 40 in a collection of around 700 files.

Try passing a unicode to os.listdir:
folder = u"emails"
b = os.listdir(folder)
Doing so will cause os.listdir to return a list of unicodes instead of strs.
Unfortunately, the more I think about this the less I understand about why this worked. Every filesystem ultimately stores its filenames in bytes using some encoding. HFS+ for instance stores filenames in UTF-16. So it would make sense if os.listdir could return those raw bytes most easily without adulteration. But instead, in this case, it looks like os.listdir can return unadulterated unicode, but not unadulterated bytes.
If someone could explain that mystery I would be most appreciative.
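One thing that can be checked directly is what the odd bytes from the question stand for. The sketch below (Python 3, using the byte sequence quoted in the question) decodes them and asks unicodedata for the character's category; a private-use code point has no standard glyph, which might be why different tools render it differently.

```python
import unicodedata

raw = b"file(3)\xef\x80\xa9"
text = raw.decode("utf-8")

last = text[-1]
print(hex(ord(last)))               # code point of the odd character
print(unicodedata.category(last))   # "Co" means private use
```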

Did the files come from Mac Roman encoding (presumably what classic Mac OS used), or the decomposed (NFD-style) normal form of UTF-8 that Mac OS X uses?
The concept of Unicode normal forms is one that every programmer ought to be familiar with... precious few are, though. I can't tell you what you need to know about this with regard to Python, though.
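As for the Python side, the standard unicodedata module can convert between normal forms. A minimal Python 3 sketch, using "é" as a made-up example of a character that has both a composed and a decomposed spelling:

```python
import unicodedata

composed = "\u00e9"      # "é" as one code point (NFC)
decomposed = "e\u0301"   # "e" followed by a combining acute accent (NFD)

# The two spellings look the same but are different code point sequences.
same_without_normalizing = composed == decomposed

# Normalizing both sides to one form makes them comparable.
same_as_nfc = unicodedata.normalize("NFC", decomposed) == composed
same_as_nfd = unicodedata.normalize("NFD", composed) == decomposed

print(same_without_normalizing, same_as_nfc, same_as_nfd)
```

Normalizing filenames to one form before comparing or tallying them avoids counting the same name twice.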

Python 2.7: Read file with Chinese characters

I am trying to analyze data within CSV files with Chinese characters in their names (E.g. "粗1 25g").
I am using Tkinter to choose the files like so:
selectedFiles = askopenfilenames(filetypes=[("xlsx","*"),("xls","*")]) # Utilize Tkinter dialog window to choose files
selectedFiles = master.tk.splitlist(selectedFiles) # Create list from files chosen
I have attempted to convert the filename to unicode in this way:
selectedFiles = [x.decode("utf-8") for x in selectedFiles]
Only to yield the error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb4 in position 0: ordinal not in range(128)
I have also tried converting the filenames as the files are created with the following:
titles = [x.encode('utf-8') for x in titles]
Only to receive the error:
IOError: [Errno 22] invalid mode ('wb') or filename: 'C:\...\\data_division_files\\\xe7\xb2\x971 25g.csv'
I have also tried combinations of the above methods to no avail.
What can I do to allow these files to be read in Python?
(This question, while related, has not been able to solve my problem: Obtain File size with os.path.getsize() in Python 2.7.5)

When you call decode on a unicode object, it first encodes it with sys.getdefaultencoding() so it can decode it for you. Which is why you get an error about ASCII even though you didn't ask for ASCII anywhere.
So, where are you getting a unicode object from? From askopenfilename. From a quick test, it looks like it always returns unicode values on Windows (presumably by getting the UTF-16 and decoding it), while on POSIX it returns some unicode and some str (I'd guess by leaving alone anything that fits into 7-bit ASCII, decoding anything else with your filesystem encoding). If you'd tried printing out the repr or type or anything of selectedFiles, the problem would have been obvious.
Meanwhile, the encode('utf-8') shouldn't cause any UnicodeErrors… but it's likely that your filesystem encoding isn't UTF-8 on Windows, so it will probably cause a lot of IOErrors with errno 2 (trying to open files that don't exist, or to create files in directories that don't exist), 22 (trying to open files with illegal file or directory names on Windows), etc. And it looks like that's exactly what you're seeing. And there's really no reason to do it; just pass the pathnames as-is to open and they'll be fine.
So, basically, if you removed all of your encode and decode calls, your code would probably just work.
However, there's an even easier solution: Just use askopenfile or asksaveasfile instead of askopenfilename or asksaveasfilename. Let Tk figure out how to use its pathnames and just hand you the file objects, instead of messing with the pathnames yourself.

Get properties of a file whose name contains special (non-ASCII) characters

I'm using python and having some trouble reading the properties of a file, when the filename includes non-ASCII characters.
One of the files for example is named:
0-Channel-https∺∯∯services.apps.microsoft.com∯browse∯6.2.9200-1∯615∯Channel.dat
When I run this:
list2 = os.listdir('C:\\Users\\James\\AppData\\Local\\Microsoft\\Windows Store\\Cache Medium IL\\0\\')
for data in list2:
    print os.path.getmtime(data) + '\n'
I get the error:
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: '0-Channel-https???services.apps.microsoft.com?browse?6.2.9200-1?615?Channel.dat'
I assume its caused by the special chars because the code works fine with other file names with only ASCII chars.
Does anyone know of a way to query the filesystem properties of a file named like this?

If this is python 2.x, it's an encoding issue. If you pass a unicode string to os.listdir such as u'C:\\my\\pathname', it will return unicode strings and they should have the non-ascii chars encoded correctly. See Unicode Filenames in the docs.
Quoting the doc:
os.listdir(), which returns filenames, raises an issue: should it return the Unicode version of filenames, or should it return 8-bit strings containing the encoded versions? os.listdir() will do both, depending on whether you provided the directory path as an 8-bit string or a Unicode string. If you pass a Unicode string as the path, filenames will be decoded using the filesystem’s encoding and a list of Unicode strings will be returned, while passing an 8-bit path will return the 8-bit versions of the filenames. For example, assuming the default filesystem encoding is UTF-8, running the following program:
This code should work:
directory_name = u'C:\\Users\\James\\AppData\\Local\\Microsoft\\Windows Store\\Cache Medium IL\\0\\'
list2 = os.listdir(directory_name)
for data in list2:
    print data, os.path.getmtime(os.path.join(directory_name, data))
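Note that the loop in the question also passes bare names to getmtime, so joining each name with the directory matters regardless of encoding. On Python 3, where os.listdir returns str for a str path, the same idea could be sketched as below; list_mtimes is a hypothetical helper name, not something from the os module.

```python
import os


def list_mtimes(directory):
    """Map each entry name in `directory` to its modification time."""
    mtimes = {}
    for name in os.listdir(directory):
        # Join with the directory; a bare name is resolved against the
        # current working directory instead.
        mtimes[name] = os.path.getmtime(os.path.join(directory, name))
    return mtimes
```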

As you are in windows you should try with ntpath module instead of os.path
from ntpath import getmtime
As I don't have windows I can't test it. Every os has a different path convention, so Python provides a specific module for the most common operating systems.
