I'm trying to use a python script to edit a large directory of .html files in a loop. I'm having trouble looping through the filenames using os.walk(). This chunk of code just turns the html files into strings that I can work with, but the script does not even enter the loop, as if the files don't exist. Basically it prints point1 but never reaches point2. The script ends without an error message. The directory is set up inside the folder called "amazon", and there is one level of 20 subfolders inside of it with 20 html files in each of those.
Oddly the code works perfectly on a neighboring directory that only contains .txt files, but it seems like it's not grabbing my .html files for some reason. Is there something I don't understand about the structure of the for root, dirs, filenames in os.walk() loop? This is my first time using os.walk, and I've looked at a number of other pages on this site to try to make it work.
import os
rootdir = 'C:\filepath\amazon'
print "point1"
for root, dirs, filenames in os.walk(rootdir):
    print "point2"
    for file in filenames:
        with open(os.path.join(root, file), 'r') as myfile:
            g = myfile.read()
            print g
Any help is much appreciated.
The backslash is used as an escape. Either double them, or use "raw strings" by putting a prefix "r" on it.
Example:
>>> 'C:\filepath\amazon'
'C:\x0cilepath\x07mazon'
>>> r'\x'
'\\x'
>>> '\x'
ValueError: invalid \x escape
Explanation: In Python, what does preceding a string literal with “r” mean?
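To see which characters the escapes actually produced, here is a quick check in Python 3. The question uses Python 2 print statements, but \f and \a are escapes in both versions:

```python
# Compare an ordinary string literal with a raw string literal.
plain = 'C:\filepath\amazon'  # \f and \a are escape sequences here
raw = r'C:\filepath\amazon'   # raw string: every backslash is kept

# \f is a form feed (0x0C) and \a is a bell (0x07), so the letters
# "f" and "a" and both backslashes are gone from the plain literal.
print(repr(plain))           # 'C:\x0cilepath\x07mazon'
print(repr(raw))             # 'C:\\filepath\\amazon'
print(len(plain), len(raw))  # 16 18
```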
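You can avoid handling most of the slashes yourself by using os.path.join (the drive still needs its own separator, since on Windows os.path.join('C:', 'filepath') gives the drive-relative path C:filepath):
rootdir = os.path.join('C:\\', 'filepath', 'amazon')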
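On Windows, os.path.join does not add a separator after a bare drive such as 'C:'. The Python documentation notes that os.path.join("c:", "foo") means c:foo, relative to the current directory on drive C:. The ntpath module is the implementation os.path uses on Windows, and it can be imported on any OS, so the difference is easy to check:

```python
import ntpath  # Windows flavour of os.path; importable on any OS

# A bare 'C:' names the drive, not its root directory, so the result
# is relative to the current directory on drive C:.
print(ntpath.join('C:', 'filepath', 'amazon'))    # C:filepath\amazon

# Putting the root separator in the first component gives an absolute path.
print(ntpath.join('C:\\', 'filepath', 'amazon'))  # C:\filepath\amazon
```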
Your problem is that you're using backslashes in your path:
>>> rootdir = 'C:\filepath\amazon'
>>> rootdir
'C:\x0cilepath\x07mazon'
>>> print(rootdir)
C:
ilepathmazon
Because Python strings use the backslash to escape special characters, in your rootdir the \f represents an ASCII Form Feed character, and the \a represents an ASCII Bell character.
You can either use a raw string (note the r before the apostrophe) to avoid this:
>>> rootdir = r'C:\filepath\amazon'
>>> rootdir
'C:\\filepath\\amazon'
>>> print(rootdir)
C:\filepath\amazon
... or just use regular slashes, which work fine on Windows anyway:
>>> rootdir = 'C:/filepath/amazon'
>>> rootdir
'C:/filepath/amazon'
>>> print(rootdir)
C:/filepath/amazon
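As Huu Nguyen points out, it's considered good practice to construct paths using os.path.join() when possible ... that way you avoid most of the problem. The drive needs its own separator, though: on Windows, os.path.join('C:', 'filepath', 'amazon') returns the drive-relative path 'C:filepath\\amazon', because a bare 'C:' means "the current directory on drive C:".
>>> rootdir = os.path.join('C:\\', 'filepath', 'amazon')
>>> rootdir
'C:\\filepath\\amazon'
>>> print(rootdir)
C:\filepath\amazon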
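Putting the corrected path back into the loop from the question gives something like the following Python 3 sketch. The function name read_html_files, the .html filter, and reading the files as UTF-8 are assumptions, not part of the original question:

```python
import os

def read_html_files(rootdir):
    """Return a dict mapping each .html file path under rootdir to its text."""
    contents = {}
    for root, dirs, filenames in os.walk(rootdir):
        for name in filenames:
            # Skip anything that is not an .html file (case-insensitive).
            if not name.lower().endswith('.html'):
                continue
            path = os.path.join(root, name)
            # Assumption: the pages are UTF-8 encoded.
            with open(path, 'r', encoding='utf-8') as myfile:
                contents[path] = myfile.read()
    return contents

if __name__ == '__main__':
    # Raw string, so \f and \a stay as backslash + letter.
    pages = read_html_files(r'C:\filepath\amazon')
    print(len(pages), 'html files read')
```

If this prints 0, os.walk most likely did not find the directory: by default it reports no error in that case.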
I had an issue with os.walk that sounds similar to this. The problem was the escape character (\) that Terminal on a Mac adds before spaces in a path.
For example, the path:
/Volumes/MacHD/My Folder/MyFiles/...
when accessed via Terminal is shown as:
/Volumes/MacHD/My\ Folder/MyFiles/...
The solution was to read the path into a string and then create a new string with the escape characters removed, e.g.:
# Ask user for directory tree to scan for master files
masterpathraw = raw_input("Specify directory of master files:")
# Clear escape characters from the path
masterpath = masterpathraw.replace('\\', '')
# Provide this path to os.walk
for fullpath, _, filenames in os.walk(masterpath):
    pass  # Do stuff
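Stripping every backslash also removes any backslash that genuinely belongs to a name. A narrower alternative (swapped in here, not what this answer used) is shlex.split, which undoes POSIX shell escaping and quoting the way the shell itself would. unescape_shell_path is a hypothetical helper name:

```python
import shlex

def unescape_shell_path(text):
    """Undo shell-style escaping or quoting of a single pasted path."""
    parts = shlex.split(text.strip())
    if len(parts) != 1:
        raise ValueError('expected exactly one path, got %r' % (parts,))
    return parts[0]

pasted = '/Volumes/MacHD/My\\ Folder/MyFiles'  # as Terminal displays it
print(unescape_shell_path(pasted))  # /Volumes/MacHD/My Folder/MyFiles
```

This also accepts a path wrapped in quotes, such as '/Volumes/MacHD/My Folder', and rejects input that splits into more than one path.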
I need to build a full path in Python. I tried this:
filename= "myfile.odt"
subprocess.call(['C:\Program Files (x86)\LibreOffice 5\program\soffice.exe',
'--headless',
'--convert-to',
'pdf', '--outdir',
r'C:\Users\A\Desktop\Repo\',
r'C:\Users\A\Desktop\Repo\'+filename])
But I get this error
SyntaxError: EOL while scanning string literal.
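Try (the r prefix matters on Python 3, where \U in an ordinary string literal starts a Unicode escape and causes a SyntaxError):
import os
os.path.join(r'C:\Users\A\Desktop\Repo', filename)
The os module contains many useful functions for directory and path manipulation.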
The backslash character (\) has to be escaped in string literals.
This is wrong: '\'
This is correct: '\\' - this is a string containing one backslash
Therefore, this is wrong:
'C:\Program Files (x86)\LibreOffice 5\program\soffice.exe'
There is a trick!
String literals prefixed by r are meant for easier writing of regular expressions. One of their features is that backslash characters do not have to be escaped. So, this would be OK:
r'C:\Program Files (x86)\LibreOffice 5\program\soffice.exe'
However, that won't work for a string ending in a backslash:
r'\' - this is a syntax error
So, this is also wrong:
r'C:\Users\A\Desktop\Repo\'
So, I would do the following:
import os
import subprocess

filename = "myfile.odt"
soffice = 'C:\\Program Files (x86)\\LibreOffice 5\\program\\soffice.exe'
outdir = 'C:\\Users\\A\\Desktop\\Repo\\'
full_path = os.path.join(outdir, filename)
subprocess.call([soffice,
                 '--headless',
                 '--convert-to', 'pdf',
                 '--outdir', outdir,
                 full_path])
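Whether a literal is legal can be checked without running anything, by parsing candidate literals with ast.literal_eval. literal_value is a hypothetical helper written only for this illustration:

```python
import ast

def literal_value(source):
    """Evaluate a string literal given as source text; None if it is a syntax error."""
    try:
        return ast.literal_eval(source)
    except SyntaxError:
        return None

candidates = [
    r"r'C:\Users\A\Desktop\Repo\'",      # raw string ending in one backslash
    r"r'C:\Users\A\Desktop\Repo'",       # raw string without the trailing backslash
    r"'C:\\Users\\A\\Desktop\\Repo\\'",  # ordinary string, backslashes doubled
]
for source in candidates:
    value = literal_value(source)
    print(source, '->', 'SyntaxError' if value is None else repr(value))
```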
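The problem you have is that your raw string ends with a single backslash. For a reason I don't understand, this is not allowed. You can either end the raw string before the last backslash and add that backslash as an ordinary escaped string (doubling it inside the raw string, as in r'...\Repo\\', would keep both backslashes):
r'C:\Users\A\Desktop\Repo' + '\\' + filename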
or use os.path.join(), which is the preferred method:
os.path.join(r'C:\Users\A\Desktop\Repo', filename)
To build on what zanseb said: use os.path.join, but also keep in mind that \ is an escape character, so even a raw string literal can't end with a single \, as it would escape the closing quote.
import os
os.path.join(r'C:\Users\A\Desktop\Repo', filename)
To anyone else stumbling across this question: you can use the / operator to join a Path object and a str.
Use pathlib.Path for paths compatible with both Unix and Windows (you can use it the same way as I've used pathlib.PureWindowsPath).
The only reason I'm using pathlib.PureWindowsPath is that the question asked specifically about Windows paths.
For example:
import pathlib
# PureWindowsPath enforces Windows path style
# for paths that work on both Unix and Windows use pathlib.Path
base_dir = pathlib.PureWindowsPath(r'C:\Program Files (x86)\LibreOffice 5\program')
# elegant path concatenation
myfile = base_dir / "myfile.odt"
print(myfile)
# prints: C:\Program Files (x86)\LibreOffice 5\program\myfile.odt
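A sketch of the / operator with both pure path flavours. PurePosixPath and PureWindowsPath never touch the filesystem, so this runs on any OS; the Unix directory /opt/libreoffice/program is a made-up example:

```python
from pathlib import PurePosixPath, PureWindowsPath

win_dir = PureWindowsPath(r'C:\Program Files (x86)\LibreOffice 5\program')
posix_dir = PurePosixPath('/opt/libreoffice/program')  # hypothetical Unix path

# A str works on the right-hand side of / ...
print(win_dir / 'myfile.odt')    # C:\Program Files (x86)\LibreOffice 5\program\myfile.odt
print(posix_dir / 'myfile.odt')  # /opt/libreoffice/program/myfile.odt
# ... and on the left-hand side, as long as the other operand is a path.
print('C:\\' / PureWindowsPath('Users') / 'A')  # C:\Users\A
```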
Add the library to your code:
from pathlib import Path
When you want the current working directory (the path without a file name), use this method:
print("Directory Path:", Path().absolute())
Now you just need to add the file name to it, for example:
mylink = str(Path().absolute()) + "/" + "filename.etc"  # e.g. "hello.txt"
Normally you would add an "r" prefix to a Windows path literal, for example r"C:\...", but you do not need to do that here, because the path comes from Path() rather than from a string literal.
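The same thing can be written without string concatenation by using the / operator. Path.cwd() returns the current working directory (the same directory Path().absolute() gives), and hello.txt is just the example name from above:

```python
from pathlib import Path

here = Path.cwd()            # current working directory, absolute
mylink = here / 'hello.txt'  # joined with the separator for this OS
print(mylink)
print(str(mylink))           # plain str, for APIs that need one
```

Note that the current working directory is not necessarily the folder the script lives in; for that, Path(__file__).resolve().parent is the usual idiom.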
You can also simply put the strings together yourself. Personally I like this more (note that this inserts no separators, so dir and foldername must already end with one):
filename = r"{}{}{}".format(dir, foldername, filename)
I'm trying to find all *.txt files in a directory with glob(). In some cases, glob.glob('some\path\*.txt') returns an empty list, even though the given directories contain matching files. This happens especially if the path is all lower-case or numeric.
As a minimal example I have two folders a and A on my C: drive both holding one Test.txt file.
import glob
files1 = glob.glob('C:\a\*.txt')
files2 = glob.glob('C:\A\*.txt')
yields
files1 = []
files2 = ['C:\\A\\Test.txt']
If this is by design, is there any other directory name, that leads to such unexpected behaviour?
(I'm working on Windows 7 with Python 2.7.10, 32-bit.)
EDIT: (2019) Added an answer for Python 3 using pathlib.
The problem is that \a has a special meaning in string literals (bell char).
Just double backslashes when inserting paths in string literals (i.e. use "C:\\a\\*.txt").
Python is different from C because when you use backslash with a character that doesn't have a special meaning (e.g. "\s") Python keeps both the backslash and the letter (in C instead you would get just the "s").
This sometimes hides the issue because things just work anyway even with a single backslash (depending on what is the first letter of the directory name) ...
I personally avoid using double-backslashes in Windows and just use Python's handy raw-string format. Just change your code to the following and you won't have to escape the backslashes:
import glob
files1 = glob.glob(r'C:\a\*.txt')
files2 = glob.glob(r'C:\A\*.txt')
Notice the r at the beginning of the string.
As already mentioned, the \a is a special character in Python. Here's a link to a list of Python's string literals:
https://docs.python.org/2/reference/lexical_analysis.html#string-literals
As my original answer attracted more views than expected and some time has passed, I wanted to add an answer that reliably solves this kind of problem and is also cross-platform compatible. It's in Python 3 on Windows 10, but should also work on *nix systems.
from pathlib import Path
filepath = Path(r'C:\a')
filelist = list(filepath.glob('*.txt'))
--> [WindowsPath('C:/a/Test.txt')]
I like this solution better, as I can copy and paste paths directly from windows explorer, without the need to add or double backslashes etc.
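A self-contained version of the pathlib approach, using a temporary folder in place of C:\a so it can run anywhere (the folder and file names are stand-ins):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / 'a'  # stands in for C:\a
    folder.mkdir()
    (folder / 'Test.txt').write_text('hello')
    (folder / 'notes.md').write_text('not a .txt file')

    # glob() matches names in this folder only; rglob() would recurse.
    txt_files = sorted(p.name for p in folder.glob('*.txt'))

print(txt_files)  # ['Test.txt']
```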
The following piece of code works fine, reads all the text files in the specified directory:
files_ = glob.glob('D:\Test files\Case 1\*.txt')
But when I change the path to another directory, it gives me an empty list of files:
files_ = glob.glob('D:\Test files\Case 2\*.txt')
print files_ >> []
Both directories contain a couple of text files. Text file names and sizes are different though.
It's really weird and I couldn't think of anything to solve the problem. Has anyone faced such a problem?
You need to either escape your backslashes:
files_ = glob.glob('D:\\Test files\\Case 2\\*.txt')
Or specify that your string is a raw string (meaning backslashes should not be specially interpreted):
files_ = glob.glob(r'D:\Test files\Case 2\*.txt')
What happened to break your second glob is that \1 turned into the ASCII control character \x01. The error message contains a clue to that:
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'D:\\Test files\\B1\x01rgb/*.*'
Notice how a \1 turned into the literal \x01. The reason your first directory worked is that you basically got lucky and didn't accidentally specify any special characters:
In [27]: '\T'
Out[27]: '\\T'
In [28]: '\B'
Out[28]: '\\B'
In [29]: '\1'
Out[29]: '\x01'
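Since the symptom is a path that quietly names a different file, one way to catch it early is to scan the string for control characters before using it. find_control_chars is a hypothetical helper; it relies on the Unicode category 'Cc' (control characters) reported by unicodedata:

```python
import unicodedata

def find_control_chars(path):
    """Return (index, code point) for every control character in path."""
    return [(i, ord(ch)) for i, ch in enumerate(path)
            if unicodedata.category(ch) == 'Cc']

print(find_control_chars('C:\filepath\amazon'))   # [(2, 12), (10, 7)]
print(find_control_chars('E:\blah'))              # [(2, 8)]
print(find_control_chars('B1\1rgb'))              # [(2, 1)]
print(find_control_chars(r'C:\filepath\amazon'))  # []
```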
Simple code:
import os
filenamelist = []
#path = "E:\blah\blah\blah"
path = "C:\Program Files\Console2"
for files in os.walk(path):
    filenamelist.append(files)
    print files
The above works. But when I set path= "E:\blah\blah\blah" the script runs but returns nothing.
1) C:\Users\guest>python "read files.py"
('C:\\Program Files\\Console2', [], ['console.chm', 'Console.exe', 'console.xml', 'ConsoleHook.dll', 'FreeImage.dll', 'FreeImagePlus.dll'])
2) C:\Users\guest>python "read files.py"
C:\Users\guest>
Any idea why os.walk() is having a difficult time with E:\? I can't get it to read anything on E:. I have an external drive mapped to E drive.
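That could be because Python treats \ as an escape character, and your path contains a combination that really is an escape sequence: in "E:\blah\blah\blah", each \b is a backspace character, so the string no longer names a directory on E:.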
It might be solved in one of the following ways:
Raw string literals: r"E:\blah\blah\blah" (the backslashes are not treated as escape symbols).
Double-backslashes: "E:\\blah\\blah\\blah" (escape symbols are escaped by themselves).
Slashes "E:/blah/blah/blah" (this works on Windows too).
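This is also why the script finishes silently instead of failing: by default os.walk ignores errors from listing a directory, so a path that does not exist simply yields nothing. Passing an onerror callback makes such problems visible. walk_strict is a hypothetical name for this sketch:

```python
import os

def _reraise(error):
    # os.walk passes in the OSError raised while listing a directory.
    raise error

def walk_strict(top):
    """Like os.walk(top), but raise if a directory cannot be listed."""
    return os.walk(top, onerror=_reraise)

try:
    for root, dirs, files in walk_strict('E:/blah/blah/blah'):
        print(root, len(files))
except OSError as exc:
    print('cannot walk:', exc)
```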
My script searches the directory it's in, creates new directories using the file names it finds, and moves the files into them: John-doe-taxes.hrb -> John-doe/John-does-taxes.hrb. It works fine until it runs into an umlaut character; then it creates the directory and returns an "Error 2" saying that it cannot find the file. I'm fairly new to programming, and the answers I've found have been to add a
# coding: utf-8
line to the file, which I believe doesn't work because I'm not using umlauts in my code; I'm dealing with files whose names contain umlauts. One thing I was curious about: does this problem only occur with umlauts, or with other special characters as well? This is the code I'm using; I appreciate any advice.
import os
import re
from os.path import dirname, abspath, join

dir = dirname(abspath(__file__))
(root, dirs, files) = os.walk(dir).next()
p = re.compile('(.*)-taxes-')
count = 0
for file in files:
    match = p.search(file)
    if match:
        count = count + 1
        print("Files processed: " + str(count))
        dir_name = match.group(1)
        full_dir = join(dir, dir_name)
        if not os.access(full_dir, os.F_OK):
            os.mkdir(full_dir)
        os.rename(join(dir, file), join(full_dir, file))
raw_input()
I think your problem is passing strs to os.rename that aren't in the system encoding. As long as the filenames only use ascii characters this will work, however outside that range you're likely to run into problems.
The best solution is probably to work in unicode. The filesystem functions should return unicode strings if you give them unicode arguments, and open should work fine on Windows with unicode filenames.
If you do:
dir = dirname(abspath(unicode(__file__)))
Then you should be working with unicode strings the whole way.
One thing to consider would be to use Python 3, which uses Unicode strings by default. I'm not sure whether you would have to change anything in the above code for it to work, but there is a conversion script, 2to3, that helps transition Python 2 code to Python 3.
Sorry I can't help you with Python 2. I had a similar problem and just transitioned my project to Python 3; that ended up being a bit easier for me!
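For comparison, here is a Python 3 sketch of the question's script as a function. The name sort_tax_files is hypothetical, the regular expression is copied from the question, and os.makedirs(..., exist_ok=True) stands in for the os.access check. In Python 3, os.listdir() with a str argument returns str (Unicode) names, so names with umlauts need no special handling:

```python
import os
import re

PATTERN = re.compile(r'(.*)-taxes-')  # same pattern as in the question

def sort_tax_files(directory):
    """Move each '<name>-taxes-...' file in directory into a '<name>' subfolder."""
    moved = 0
    for name in os.listdir(directory):  # str in, str (Unicode) names out
        source = os.path.join(directory, name)
        match = PATTERN.search(name)
        # Skip non-matching names and anything that is not a regular file.
        if not match or not os.path.isfile(source):
            continue
        target_dir = os.path.join(directory, match.group(1))
        os.makedirs(target_dir, exist_ok=True)
        os.rename(source, os.path.join(target_dir, name))
        moved += 1
    return moved
```

From the script itself it could be called as sort_tax_files(os.path.dirname(os.path.abspath(__file__))).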