Simple code:
import os
filenamelist = []
#path = "E:\blah\blah\blah"
path = "C:\Program Files\Console2"
for files in os.walk(path):
    filenamelist.append(files)
    print files
The above works. But when I set path= "E:\blah\blah\blah" the script runs but returns nothing.
1) C:\Users\guest>python "read files.py"
('C:\\Program Files\\Console2', [], ['console.chm', 'Console.exe', 'console.xml', 'ConsoleHook.dll', 'FreeImage.dll', 'FreeImagePlus.dll'])
2) C:\Users\guest>python "read files.py"
C:\Users\guest>
Any idea why os.walk() is having a difficult time with E:\? I can't get it to read anything on E:. I have an external drive mapped to E drive.
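That is probably because Python treats \ as an escape character in string literals, and your E: path happens to contain a real escape sequence: \b is a backspace, so "E:\blah\blah\blah" does not contain the path you typed. The C: path only works because \P and \C are not escape sequences.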
It might be solved in one of the following ways:
Raw string literals: r"E:\blah\blah\blah" (the backslashes are not treated as escape symbols).
Double-backslashes: "E:\\blah\\blah\\blah" (escape symbols are escaped by themselves).
Slashes "E:/blah/blah/blah" (this works on Windows too).
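A quick way to check this yourself, as a minimal sketch using the placeholder path from the question: the three spellings give the same path, while the unprotected literal silently loses characters.

```python
# The raw and doubled spellings denote exactly the same string.
raw = r"E:\blah\blah\blah"
doubled = "E:\\blah\\blah\\blah"
forward = "E:/blah/blah/blah"

# Written without protection, each "\b" is parsed as a backspace (0x08).
broken = "E:\blah\blah\blah"

print(raw == doubled)                      # True
print(forward.replace("/", "\\") == raw)   # True
print("\x08" in broken)                    # True
print(len(raw), len(broken))               # 17 14
```

Each "\b" pair collapses into one character, which is why the broken string is three characters shorter and names a directory that does not exist.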
I've been working on a program that reads a specific PDF and converts the data to an Excel file. The program itself already works, but while trying to refine some aspects I ran into a problem. The modules I'm working with expect directory paths with forward slashes separating the folders, such as:
"C:/Users/UserX"
While windows directories are divided by backslashes, such as:
"C:\Users\UserX"
I thought using a simple replace would work just fine:
directory.replace("\" ,"/")
But whenever I try to run the program, the \ isn't recognized as part of a string. Instead it shows up in orange in the IDE I'm working with (PyCharm). Is there any way to fix this? Or maybe another useful solution?
In general you should work with the os.path package here.
os.getcwd() gives you the current directory; os.path.join() lets you append subfolders as further arguments, with the filename last.
import os
path_to_file = os.path.join(os.getcwd(), "childFolder", filename)
In Python, the '\' character is represented by '\\':
directory.replace("\\" ,"/")
Just try adding another backslash.
First of all, you need to write "C:\Users\UserX" as a raw string. Use
directory=r"C:\Users\UserX"
Secondly, escape the backslash with a second backslash, and keep the result, since replace() returns a new string:
directory = directory.replace("\\" ,"/")
All of this is required because in Python the backslash (\) is the escape character in string literals.
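Put together, a minimal sketch of the two steps (the variable name converted is just for illustration). Note that in Python 3 the non-raw literal "C:\Users\UserX" is not merely wrong but a syntax error, because \U starts a Unicode escape.

```python
directory = r"C:\Users\UserX"   # raw string: backslashes are kept literally

# "\\" is a one-character string holding a single backslash.
# str.replace returns a new string, so the result has to be assigned.
converted = directory.replace("\\", "/")

print(len("\\"))   # 1
print(converted)   # C:/Users/UserX
```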
Try this:
import os
path = "C:\\temp\\myFolder\\example\\"
newPath = path.replace(os.sep, '/')  # os.sep is '\\' on Windows
print(newPath)
Output (on Windows): C:/temp/myFolder/example/
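If the conversion should give the same result on every platform, one alternative (a different technique from the replace() calls above, sketched here with the same example path) is pathlib's PureWindowsPath, which parses Windows paths on any operating system.

```python
from pathlib import PureWindowsPath

# PureWindowsPath only manipulates the string; it never touches the disk.
path = PureWindowsPath(r"C:\temp\myFolder\example")
new_path = path.as_posix()

print(new_path)  # C:/temp/myFolder/example
```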
Following code :
import os
def tema_get_file():
    logdir = 'T:\\'
    logfiles = sorted([f for f in os.listdir(logdir) if f.startswith('tms_int_calls-')])
    return logfiles[-1]
This runs fine, but I am trying to get logdir to run with a direct path :
\\servername\path\folder
The drive T is a mapped drive. Originally, the files are on the C Drive.
As soon as I do that, I get the error message :
WindowsError: [Error 3] The system cannot find the path specified:
'\servername\path\folder/.'
I've tried :
"\\servername\\path\\folder" , "\\servername\\path\\folder\\"
and
r"\\servername\path\folder" , r"\\servername\path\folder\"
and
"\\\\servername\\path\\folder" , "\\\\servername\\path\\folder\\"
For me both of the following work
os.listdir(r'\\server\folder')
os.listdir('\\\\server\\folder')
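The two spellings are the same string, which can be checked without a server at hand. A small sketch, using the placeholder names from above; it also shows the one thing a raw string cannot do, which is end in a single backslash (so r"\\servername\path\folder\" from the question is a syntax error).

```python
raw = r"\\server\folder"
doubled = "\\\\server\\folder"

print(raw == doubled)  # True
print(len(raw))        # 15

# A raw string cannot end in a single backslash, so append it separately.
with_trailing = raw + "\\"
print(with_trailing)   # \\server\folder\
```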
In my experience, os.listdir(myUNCpath) did not handle a Windows UNC path correctly unless the path string was defined by a literal like myUNCpath = "\\\\servername\\dir1\\dir2" or by a raw string like myUNCpath = r"\\servername\dir1\dir2". A string variable obtained some other way failed for me, and it looked as though the backslashes in it were being doubled.
But what can you do if the UNC path string is read from an ini file or some other config file?
There is no way to edit it as a literal, nor is it possible to turn it into a raw string by putting the r prefix in front of it.
As a workaround, I found that it is possible to split the whole UNC path string into its individual components (to get rid of those damned backslash characters) and to recompose it with the backslashes written as literals. Then the string works well. Incredible but true!
Here is my function to perform this workaround. The string returned by the function will work as expected if the path in the file is defined as
\\servername\dir1\dir2 (without extra backslashes added as escape characters)
...
myworkswellUNCPath = recomposeUNCpathstring(myUNCpath)
...
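def recomposeUNCpathstring(UNCstring):
    # Remove the double backslash and any surrounding whitespace.
    pathstring1 = UNCstring.replace("\\\\", "").strip()
    pathComponents = pathstring1.split("\\")
    # Rebuild the path, writing every backslash as a literal.
    pathstring = "\\\\" + pathComponents[0]
    for i in range(1, len(pathComponents)):  # include the last component
        pathstring = pathstring + "\\" + pathComponents[i]
    return pathstring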
Cheers
Stefan
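For what it's worth, escape sequences are only interpreted in string literals in source code. A string read from a file already contains exactly the characters stored there, and the doubled backslashes that show up when such a string is displayed come from repr(), not from the string itself. A minimal sketch, assuming a made-up ini layout (the section name paths and the key logdir are hypothetical):

```python
import configparser

# Hypothetical ini content, held in a raw string so it reads like the file would.
INI_TEXT = r"""
[paths]
logdir = \\servername\dir1\dir2
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)
unc_path = parser["paths"]["logdir"]

print(unc_path)        # \\servername\dir1\dir2
print(repr(unc_path))  # '\\\\servername\\dir1\\dir2'
print(unc_path == r"\\servername\dir1\dir2")  # True
```

If a path read this way still fails, stray whitespace or quote characters copied along from the file are worth ruling out first.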
I'm trying to find all *.txt files in a directory with glob(). In some cases, glob.glob('some\path\*.txt') returns an empty list, even though matching files exist in the given directory. This seems to happen especially if the path is all lower-case or numeric.
As a minimal example I have two folders a and A on my C: drive both holding one Test.txt file.
import glob
files1 = glob.glob('C:\a\*.txt')
files2 = glob.glob('C:\A\*.txt')
yields
files1 = []
files2 = ['C:\\A\\Test.txt']
If this is by design, are there other directory names that lead to such unexpected behaviour?
(I'm working on win 7, with Python 2.7.10 (32bit))
EDIT: (2019) Added an answer for Python 3 using pathlib.
The problem is that \a has a special meaning in string literals (bell char).
Just double backslashes when inserting paths in string literals (i.e. use "C:\\a\\*.txt").
Python is different from C because when you use backslash with a character that doesn't have a special meaning (e.g. "\s") Python keeps both the backslash and the letter (in C instead you would get just the "s").
This sometimes hides the issue because things just work anyway even with a single backslash (depending on what is the first letter of the directory name) ...
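The difference is easy to see side by side. The sketch below parses two literals from source text; parse_literal is a hypothetical helper that exists only because newer Python 3 releases emit a warning for unrecognized escapes such as \s, which the helper silences.

```python
import warnings

def parse_literal(source):
    """Evaluate a string literal given as source text, ignoring the
    warning that newer Pythons emit for unrecognized escapes."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return eval(source)

recognized = parse_literal(r'"C:\a"')     # \a is the bell character
unrecognized = parse_literal(r'"C:\s"')   # \s is not an escape: backslash kept

print(repr(recognized))    # 'C:\x07'
print(repr(unrecognized))  # 'C:\\s'
```

So a folder named a breaks while a folder named s appears to work, purely because of the first letter after the backslash.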
I personally avoid using double-backslashes in Windows and just use Python's handy raw-string format. Just change your code to the following and you won't have to escape the backslashes:
import glob
files1 = glob.glob(r'C:\a\*.txt')
files2 = glob.glob(r'C:\A\*.txt')
Notice the r at the beginning of the string.
As already mentioned, \a is a special character in Python. Here's a link to the list of escape sequences recognized in Python string literals:
https://docs.python.org/2/reference/lexical_analysis.html#string-literals
As my original answer attracted more views than expected and some time has passed, I wanted to add an answer that reliably solves this kind of problem and is also cross-platform compatible. It was written for Python 3 on Windows 10, but should also work on *nix systems.
from pathlib import Path
filepath = Path(r'C:\a')
filelist = list(filepath.glob('*.txt'))
--> [WindowsPath('C:/a/Test.txt')]
I like this solution better, as I can copy and paste paths directly from windows explorer, without the need to add or double backslashes etc.
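To try the pathlib approach without touching C:\, here is a self-contained sketch that recreates the question's layout (a lower-case folder a holding Test.txt) inside a temporary directory. The extra file notes.log is an assumption added to show that the pattern filters.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "a"               # lower-case name, as in the question
    folder.mkdir()
    (folder / "Test.txt").write_text("hello")
    (folder / "notes.log").write_text("ignored")

    # Path.glob yields Path objects; keep just the names for display.
    found = sorted(p.name for p in folder.glob("*.txt"))

print(found)  # ['Test.txt']
```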
I'm trying to use a python script to edit a large directory of .html files in a loop. I'm having trouble looping through the filenames using os.walk(). This chunk of code just turns the html files into strings that I can work with, but the script does not even enter the loop, as if the files don't exist. Basically it prints point1 but never reaches point2. The script ends without an error message. The directory is set up inside the folder called "amazon", and there is one level of 20 subfolders inside of it with 20 html files in each of those.
Oddly the code works perfectly on a neighboring directory that only contains .txt files, but it seems like it's not grabbing my .html files for some reason. Is there something I don't understand about the structure of the for root, dirs, filenames in os.walk() loop? This is my first time using os.walk, and I've looked at a number of other pages on this site to try to make it work.
import os
rootdir = 'C:\filepath\amazon'
print "point1"
for root, dirs, filenames in os.walk(rootdir):
    print "point2"
    for file in filenames:
        with open(os.path.join(root, file), 'r') as myfile:
            g = myfile.read()
        print g
Any help is much appreciated.
The backslash is used as an escape. Either double them, or use "raw strings" by putting a prefix "r" on it.
Example:
>>> 'C:\filepath\amazon'
'C:\x0cilepath\x07mazon'
>>> r'\x'
'\\x'
>>> '\x'
ValueError: invalid \x escape
Explanation: In Python, what does preceding a string literal with “r” mean?
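This also explains why the script ends without an error message: by default os.walk() ignores errors from listing a directory, so a mangled path simply yields nothing. Passing a callable as onerror makes the failure visible. A minimal sketch in Python 3; walk_or_report and the directory name are hypothetical.

```python
import os

def walk_or_report(rootdir):
    """Walk rootdir; return (file paths found, errors reported by os.walk)."""
    errors = []
    paths = []
    # onerror receives the OSError that os.walk would otherwise swallow.
    for root, dirs, filenames in os.walk(rootdir, onerror=errors.append):
        for name in filenames:
            paths.append(os.path.join(root, name))
    return paths, errors

# Hypothetical directory name that is assumed not to exist.
missing = os.path.join(os.getcwd(), "no-such-dir-for-demo")
paths, errors = walk_or_report(missing)

print(paths)        # []
print(len(errors))  # 1
```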
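You can avoid most explicit slash handling by using os.path.join. On Windows the drive needs its own separator, otherwise the result is the drive-relative path C:filepath\amazon:
rootdir = os.path.join('C:' + os.sep, 'filepath', 'amazon')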
Your problem is that you're using backslashes in your path:
>>> rootdir = 'C:\filepath\amazon'
>>> rootdir
'C:\x0cilepath\x07mazon'
>>> print(rootdir)
C:
ilepathmazon
Because Python strings use the backslash to escape special characters, in your rootdir the \f represents an ASCII Form Feed character, and the \a represents an ASCII Bell character.
You can either use a raw string (note the r before the apostrophe) to avoid this:
>>> rootdir = r'C:\filepath\amazon'
>>> rootdir
'C:\\filepath\\amazon'
>>> print(rootdir)
C:\filepath\amazon
... or just use regular slashes, which work fine on Windows anyway:
>>> rootdir = 'C:/filepath/amazon'
>>> rootdir
'C:/filepath/amazon'
>>> print(rootdir)
C:/filepath/amazon
As Huu Nguyen points out, it's considered good practice to construct paths using os.path.join() when possible. On Windows the drive letter needs its own separator, because os.path.join('C:', 'filepath') yields the drive-relative 'C:filepath':
>>> rootdir = os.path.join('C:' + os.sep, 'filepath', 'amazon')
>>> rootdir
'C:\\filepath\\amazon' # on Windows
>>> print(rootdir)
C:\filepath\amazon
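One caveat with os.path.join() and drive letters can be checked on any platform through ntpath, the module that os.path resolves to on Windows. A short sketch with the path from the question:

```python
import ntpath  # os.path as it behaves on Windows; importable everywhere

# A bare drive letter gets no separator: the result is relative to the
# current directory on drive C:.
drive_only = ntpath.join("C:", "filepath", "amazon")

# With the root included, the result is the absolute path.
with_root = ntpath.join("C:\\", "filepath", "amazon")

print(drive_only)  # C:filepath\amazon
print(with_root)   # C:\filepath\amazon
```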
I had an issue that sounds similar to this with os.walk. The escape character (\) added to filepaths on Mac due to spaces in the path was causing the problem.
For example, the path:
/Volumes/MacHD/My Folder/MyFiles/...
when accessed via Terminal is shown as:
/Volumes/MacHD/My\ Folder/MyFiles/...
The solution was to read the path into a string and then create a new string with the escape characters removed, e.g.:
import os
# Ask user for directory tree to scan for master files
masterpathraw = raw_input("Specify directory of master files:")
# Clear escape characters from the path
masterpath = masterpathraw.replace('\\', '')
# Provide this path to os.walk
for fullpath, _, filenames in os.walk(masterpath):
    # Do stuff
    pass
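A different technique from the replace() call above, offered as a sketch: shlex from the standard library parses text the way a POSIX shell would, so it removes the escapes and also copes with paths pasted in quotes. unescape_shell_path is a hypothetical helper name.

```python
import shlex

def unescape_shell_path(raw):
    """Return the path as a POSIX shell would read it."""
    parts = shlex.split(raw)
    # A properly escaped or quoted path is exactly one shell word;
    # anything else is returned unchanged.
    return parts[0] if len(parts) == 1 else raw

print(unescape_shell_path(r"/Volumes/MacHD/My\ Folder/MyFiles"))
# /Volumes/MacHD/My Folder/MyFiles
```

Unlike removing every backslash, this leaves a path with plain, unescaped spaces untouched.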
My script searches the directory that it's in, creates new directories named after the files it finds, and moves each file into its directory: John-doe-taxes.hrb -> John-doe/John-doe-taxes.hrb. It works fine until it runs into an umlaut character; then it creates the directory and returns an "Error 2" saying that it cannot find the file. I'm fairly new to programming, and the answers I've found have been to add a
coding: utf-8
line to the file, which I believe doesn't work because I'm not using umlauts in my code; I'm dealing with umlauts in file names. One thing I was curious about: does this problem occur only with umlauts, or with other special characters as well? This is the code I'm using. I appreciate any advice provided.
import os
import re
from os.path import dirname, abspath, join
dir = dirname(abspath(__file__))
(root, dirs, files) = os.walk(dir).next()
p = re.compile('(.*)-taxes-')
count = 0
for file in files:
    match = p.search(file)
    if match:
        count = count + 1
        print("Files processed: " + str(count))
        dir_name = match.group(1)
        full_dir = join(dir, dir_name)
        if not os.access(full_dir, os.F_OK):
            os.mkdir(full_dir)
        os.rename(join(dir, file), join(full_dir, file))
raw_input()
I think your problem is passing strs to os.rename that aren't in the system encoding. As long as the filenames only use ascii characters this will work, however outside that range you're likely to run into problems.
The best solution is probably to work in unicode. The filesystem functions should return unicode strings if you give them unicode arguments. open should work fine on windows with unicode filenames.
If you do:
dir = dirname(abspath(unicode(__file__)))
Then you should be working with unicode strings the whole way.
One thing to consider would be to use Python 3, where strings are Unicode by default. I'm not sure how much of the above code you would have to change for it to work, but there is a script (2to3) that has shipped with Python for converting Python 2 code to Python 3.
Sorry I can't help you with Python2, I had a similar problem and just transitioned my project to Python3--ended up just being a bit easier for me!
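For comparison, a rough Python 3 sketch of the same idea, not a drop-in port of the script above. It runs against a temporary directory, and both the pattern (everything before "-taxes" names the folder) and the sample file name are assumptions made for the demo. Because str is Unicode in Python 3, the umlauts need no special handling.

```python
import re
import tempfile
from pathlib import Path

# Assumed pattern: everything before "-taxes" names the target folder.
PATTERN = re.compile(r"(.*)-taxes")

def sort_into_folders(base):
    """Move each matching file in base into a folder named after its prefix."""
    moved = 0
    # Snapshot the listing first, since folders are created while looping.
    for entry in list(base.iterdir()):
        match = PATTERN.search(entry.name)
        if entry.is_file() and match:
            target_dir = base / match.group(1)
            target_dir.mkdir(exist_ok=True)
            entry.rename(target_dir / entry.name)
            moved += 1
    return moved

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "Jörg-Müller-taxes.hrb").write_text("demo")
    moved = sort_into_folders(base)
    landed = (base / "Jörg-Müller" / "Jörg-Müller-taxes.hrb").exists()

print(moved, landed)
```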