I was working on a small Python script that has to traverse directories containing many types of files, but I only want to open the text files. How can I do that? Below is my code.
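import os
import re

pat = re.compile(input("Enter the text you want to search for : "))
# A raw string cannot end in a single backslash, so the trailing one is dropped
root_dir = r'C:\Users\Python\Python_my_Scripts'
for dirpath, dirnames, filenames in os.walk(root_dir):
    for fname in filenames:
        fpath = os.path.join(dirpath, fname)
        try:
            IN = open(fpath, "r")
        except Exception as e:
            print(e)
        else:
            with IN:  # make sure the file is closed afterwards
                for line_num, line in enumerate(IN, 1):
                    # skip indented comment lines
                    if not re.search(r'^\s+#', line):
                        if pat.search(line):
                            print("{1:>2d} : {0}".format(fpath, line_num))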
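The code breaks inside the try statement whenever a directory contains a non-text file (the failure happens while reading lines in the else block, which the except clause does not cover).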
Use glob to get a list of filenames by pattern:
import glob
glob.glob('*.txt')
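glob.glob('*.txt') only looks in the current directory. To cover subdirectories the way os.walk does, here is a minimal sketch assuming Python 3.5+ (where glob supports `**` with recursive=True); find_text_files is a hypothetical helper name, not part of the question.

```python
import glob
import os


def find_text_files(root):
    """Return the paths of all *.txt files under root, including subdirectories."""
    # '**' together with recursive=True matches zero or more directory levels
    pattern = os.path.join(root, "**", "*.txt")
    return glob.glob(pattern, recursive=True)
```

Note that this filters by extension only: a text file named something.py is skipped, and a binary file that happens to end in .txt is still returned.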
Using python-magic, you can check the file type the same way you would with the file command, then check the output of magic.from_file to see whether the file is a text file.
>>> import magic
>>> magic.from_file("/bin/bash")
'ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=75a0ba19d5276d9eb81d6f8e9e2cb285da333296, stripped'
>>> magic.from_file("/etc/fstab")
'ASCII text'
>>> if 'text' in magic.from_file("/etc/fstab").lower():
...     print("a text file...")
...
a text file...
>>>
Iterate over the files with os.walk (or collect them with the glob module) and check whether each file is binary or text. For that, this question might help: How can I detect if a file is binary (non-text) in python?
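One common, imperfect heuristic for the binary check is to look for a NUL byte near the start of the file. Below is a minimal sketch combining it with os.walk; is_binary and search_text_files are hypothetical names, and decoding the surviving files as UTF-8 is an assumption (UTF-16 text contains NUL bytes and would be treated as binary).

```python
import os
import re


def is_binary(path, blocksize=1024):
    """Heuristic: treat a file as binary if its first block contains a NUL byte."""
    with open(path, "rb") as f:
        return b"\0" in f.read(blocksize)


def search_text_files(root, pattern):
    """Yield (path, line_number, line) for matching lines in non-binary files under root."""
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            if is_binary(path):
                continue  # skip files that look binary
            # errors="replace" keeps stray bytes in otherwise-text files from raising
            with open(path, encoding="utf-8", errors="replace") as f:
                for line_num, line in enumerate(f, 1):
                    if regex.search(line):
                        yield path, line_num, line.rstrip("\n")
```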
Related
My script searches the directory it's in, creates new directories named after the files it finds, and moves each file into its directory: John-doe-taxes.hrb -> John-doe/John-doe-taxes.hrb. It works fine until it runs into an umlaut character: it creates the directory and then returns "Error 2", saying that it cannot find the file. I'm fairly new to programming, and the answers I've found suggest adding a
# coding: utf-8
line to the file, which I believe doesn't help because I'm not using umlauts in my code; I'm dealing with files whose names contain umlauts. One thing I was curious about: does this problem occur only with umlauts, or with other special characters as well? This is the code I'm using; I appreciate any advice.
import os
import re
from os.path import dirname, abspath, join

dir = dirname(abspath(__file__))
# Only the top level of the script's directory: (root, dirs, files)
(root, dirs, files) = next(os.walk(dir))
p = re.compile('(.*)-taxes-')
count = 0
for file in files:
    match = p.search(file)
    if match:
        count = count + 1
        print("Files processed: " + str(count))
        # The part of the name before "-taxes-" becomes the directory name
        dir_name = match.group(1)
        full_dir = join(dir, dir_name)
        if not os.access(full_dir, os.F_OK):
            os.mkdir(full_dir)
        os.rename(join(dir, file), join(full_dir, file))
raw_input()  # keep the console window open
I think your problem is passing byte strings (Python 2 str) to os.rename that aren't in the system encoding. As long as the filenames use only ASCII characters this works, but outside that range you're likely to run into problems.
The best solution is probably to work in unicode. The filesystem functions should return unicode strings if you give them unicode arguments, and open should work fine on Windows with unicode filenames.
If you do:
dir = dirname(abspath(unicode(__file__)))
Then you should be working with unicode strings the whole way.
One thing to consider is switching to Python 3, where strings are Unicode by default. I'm not sure whether you would need to change anything in the above code for it to work, but there is a script, 2to3, that helps convert Python 2 code to Python 3.
Sorry I can't help you with Python 2. I had a similar problem and just moved my project to Python 3; it ended up being a bit easier for me!
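For comparison, a minimal Python 3 sketch of the same script, assuming Python 3.2+ (for os.makedirs(..., exist_ok=True)). In Python 3, os.walk called with a str path returns str (Unicode) names, so names with umlauts are passed to os.rename unchanged. organize is a hypothetical function name; the pattern is the one from the question.

```python
import os
import re

# Same pattern as the question's script
PATTERN = re.compile(r'(.*)-taxes-')


def organize(directory):
    """Move each '<name>-taxes-...' file into a '<name>' subdirectory; return the count."""
    count = 0
    # next(os.walk(...)) yields (root, dirs, files) for the top level only
    root, dirs, files = next(os.walk(directory))
    for name in files:
        match = PATTERN.search(name)
        if match:
            count += 1
            target_dir = os.path.join(root, match.group(1))
            os.makedirs(target_dir, exist_ok=True)
            os.rename(os.path.join(root, name), os.path.join(target_dir, name))
    return count
```

To mimic the original, call organize(os.path.dirname(os.path.abspath(__file__))) from the script and print the returned count.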
I am doing an assignment on text formatting and alignment (text wrapping), and I need to write my formatted string to a new file. But once I have written to the file (or think I've written), where does that file go? Does it actually create a file on my desktop, or am I being stupid?
This is my code:
txtFile = open("Output.txt", "w")
txtFile.write(string)
txtFile.close()
return txtFile
Cheers,
JT
The text is written to a file called "Output.txt" in your working directory (which is usually the directory from which the script has been executed).
To display the working directory, you can use:
>>> import os
>>> os.getcwd()
'/home/adam'
When you open a file without specifying a directory, the file is created in the Python process's current working directory.
Often that is the directory containing your script, but not always: it depends on how and from where the script was started.
The os module provides functions for checking and changing the working directory from within Python.
Most notably:
os.chdir(path)
os.fchdir(fd)
os.getcwd()
It will create a new file called "Output.txt" in the directory you executed your script from. This also means the write can fail if your user doesn't have write permission in that directory.
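If you want to control where the file ends up instead of relying on the working directory, you can build an explicit path. A minimal sketch; write_output and its directory parameter are hypothetical names, not from the question.

```python
import os


def write_output(text, filename="Output.txt", directory=None):
    """Write text to filename inside directory and return the file's absolute path.

    With directory=None the current working directory is used, which is where
    open("Output.txt", "w") would put the file anyway.
    """
    if directory is None:
        directory = os.getcwd()
    path = os.path.abspath(os.path.join(directory, filename))
    with open(path, "w") as f:
        f.write(text)
    return path
```

To place the file next to the script regardless of where it is launched from, pass directory=os.path.dirname(os.path.abspath(__file__)); printing the returned path shows exactly where the file went.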
So I'm writing a generic backup application with the os module and pickle, and so far I've tried the code below to see whether something is a file or a directory (based on its string input, not its physical contents).
import os, re

def test(path):
    # The dot before the extension is escaped so it matches only a literal "."
    prog = re.compile(r"^[-\w,\s]+\.[A-Za-z]{3}$")
    result = prog.match(path)
    if os.path.isfile(path) or result:
        print("is file")
    elif os.path.isdir(path):
        print("is directory")
    else:
        print("I dont know")
Problems
test("C:/treeOfFunFiles/")
is directory
test("/beach.jpg")
I dont know
test("beach.jpg")
I dont know
test("/directory/")
I dont know
Desired Output
test("C:/treeOfFunFiles/")
is directory
test("/beach.jpg")
is file
test("beach.jpg")
is file
test("/directory/")
is directory
Resources
Test filename with regular expression
Python RE library
Validating file types by regular expression
What regular expression should I use to tell the difference between what might be a file and what might be a directory? Or is there a different way to go about this?
The os module provides methods to check whether or not a path is a file or a directory. It is advisable to use this module over regular expressions.
>>> import os
>>> print os.path.isfile(r'/Users')
False
>>> print os.path.isdir(r'/Users')
True
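The os.path checks only work for paths that exist. For strings that may not exist yet (as in the question's desired output), one option is to ask the OS first and fall back to a textual guess. A minimal sketch; classify is a hypothetical name, and the fallback (trailing separator means directory, extension means file) is an assumption that can be wrong, since directories may contain dots and files need not have extensions.

```python
import os


def classify(path):
    """Return 'file', 'directory', or 'unknown' for a path string."""
    # Existing paths: ask the OS
    if os.path.isdir(path):
        return "directory"
    if os.path.isfile(path):
        return "file"
    # Non-existent paths: purely textual guess
    if path.endswith(("/", "\\")):
        return "directory"
    if os.path.splitext(path)[1]:
        return "file"
    return "unknown"
```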
This might help someone: I had the exact same need, and I used the following regular expressions to test whether an input string is a directory, a file, or neither:
for generic file:
^(\/+\w{0,}){0,}\.\w{1,}$
for generic directory:
^(\/+\w{0,}){0,}$
So the resulting Python function looks like this:
import os, re

def check_input(path):
    # Raw strings keep the backslashes intact for the regex engine
    check_file = re.compile(r"^(\/+\w{0,}){0,}\.\w{1,}$")
    check_directory = re.compile(r"^(\/+\w{0,}){0,}$")
    if check_file.match(path):
        print("It is a file.")
    elif check_directory.match(path):
        print("It is a directory")
    else:
        print("It is neither")
Example:
check_input("/foo/bar/file.xyz") prints -> It is a file.
check_input("/foo/bar/directory") prints -> It is a directory
check_input("Random gibberish") prints -> It is neither
This input check can later be reinforced with the os.path.isfile() and os.path.isdir() functions, as Mr. Squig kindly showed, but I'd bet this preliminary test may save you a few microseconds and boost your script's performance.
PS: While using this piece of code, I noticed I had missed a big use case: paths that contain special characters such as the widely used dash "-". To handle it, I replaced \w{0,}, which matches only word characters (letters, digits and underscore), with .{0,}, which matches any character. This is more of a workaround than a solution, but that's all I have for now.
In a character class, a hyphen meant as a literal "-" must be either the first or last character, or escaped as \-. So change "^[\w-,\s]+\.[A-Za-z]{3}$" to "^[-\w,\s]+\.[A-Za-z]{3}$", for instance.
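A small sketch of the difference, assuming Python 3's re module; looks_like_filename is a hypothetical helper name.

```python
import re

# Hyphen placed first, so it is a literal "-" rather than a range operator
FILENAME = re.compile(r"^[-\w,\s]+\.[A-Za-z]{3}$")


def looks_like_filename(name):
    return FILENAME.match(name) is not None


# With the hyphen between \w and ",", re tries to build a range
# from a character class, which it rejects at compile time.
try:
    re.compile(r"^[\w-,\s]+\.[A-Za-z]{3}$")
except re.error as exc:
    print("rejected:", exc)
```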
Otherwise, I think using regexes to determine whether something looks like a filename or a directory is pointless. For instance:
/dev/fd0 is neither a regular file nor a directory (it is a device)
~/comm.pipe could look like a file but is a named pipe
~/images/test is a symbolic link to a file called '~/images/holiday/photo1.jpg'
Have a look at the os.path module, whose functions ask the OS what something actually is.
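To ask the OS about the cases listed above (devices, named pipes, symbolic links), os.lstat together with the stat module can be used. A minimal, POSIX-oriented sketch; describe is a hypothetical name. Note that os.lstat does not expand "~", so paths like ~/comm.pipe would need os.path.expanduser first.

```python
import os
import stat


def describe(path):
    """Ask the OS what kind of filesystem object path is."""
    try:
        mode = os.lstat(path).st_mode  # lstat: do not follow symbolic links
    except FileNotFoundError:
        return "does not exist"
    if stat.S_ISLNK(mode):
        return "symbolic link -> " + os.path.realpath(path)
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISFIFO(mode):
        return "named pipe"
    if stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
        return "device"
    if stat.S_ISSOCK(mode):
        return "socket"
    return "other"
```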
I am running a find command for a particular file that I know exists. I would like to get the path to that file, because I don't want to assume that I know where the file is located. My understanding is that I need to redirect stdout, run the command and capture the output, re-hook-up standard output, then retrieve the results. The problem comes when I retrieve the results... I can't decipher them:
import os
import sys
from cStringIO import StringIO

stdout_backup = sys.stdout  # Backup standard output
stdout_output = StringIO()
sys.stdout = stdout_output  # Redirect standard output
os.system("find . -name 'foobar.ext' -print")  # Find a known file
sys.stdout = stdout_backup  # Re-hook-up standard output as top priority
paths_to_file = stdout_output.getvalue()  # Retrieve results
I find all the paths I could want; the trouble is that paths_to_file yields this:
Out[9]: '\n\x01\x1b[0;32m\x02In [\x01\x1b[1;32m\x027\x01\x1b[0;32m\x02]: \x01\x1b[0m\x02\n\x01\x1b[0;32m\x02In [\x01\x1b[1;32m\x028\x01\x1b[0;32m\x02]: \x01\x1b[0m\x02'
I have no idea what to do with this. What I wanted was something like what the print command provides:
./Work/Halpin Programs/Servers/selenium-server.jar
How do I make that output usable for opening a file? If I can get what the print command yields, I can open the file I want.
Please reorient the question if I am misguided. Thank you!
You cannot capture the output of a subprocess by changing sys.stdout. What you captured seems to be some ANSI escape sequences from your interactive Python interpreter (IPython?).
To get the output of an external command, you should use subprocess.check_output():
import subprocess
paths = subprocess.check_output(["find", ".", "-name", "foobar.ext"])
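check_output returns the raw output as one block (bytes on Python 3 unless told otherwise). A minimal sketch that decodes it and splits it into one path per line, ready to pass to open(); command_lines is a hypothetical helper name, and universal_newlines=True is used because it also works on Python versions older than 3.7, where text=True was added.

```python
import subprocess


def command_lines(args):
    """Run a command without a shell and return its standard output as a list of lines."""
    # universal_newlines=True decodes the output to str
    output = subprocess.check_output(args, universal_newlines=True)
    return output.splitlines()
```

For example, paths = command_lines(["find", ".", "-name", "foobar.ext"]) gives a list such as ["./Work/Halpin Programs/Servers/selenium-server.jar"], and open(paths[0]) then works directly. check_output raises CalledProcessError if the command exits with a non-zero status.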
In this particular case, I usually wouldn't call an external command at all, but rather use os.walk() to find the files from right within the Python process.
Edit: Here's how to use os.walk() to find the files:
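import fnmatch
import os

def find(path, pattern):
    # Walk the tree and yield every file name matching the shell-style pattern
    for root, dirs, files in os.walk(path):
        for match in fnmatch.filter(files, pattern):
            yield os.path.join(root, match)

paths = list(find(".", "foobar.ext"))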
I am trying to do $ mv <file> .. in a python script using subprocess.call(). I am able to do this on 'normal' filenames, but on certain filenames it does not work. I do not have control of the filenames that are given to the script. Here is an example:
My filename is "ITunes ES Film Metadata_10_LaunchTitles(4th Batch)_08_20_2010.XLS"
When I try the command directly at the Python prompt and drag the file into it, this is what I get:
>>> /Users/David/Desktop/itunes_finalize/TheInventionOfLying_CSP/ITunes\ ES\ Film\ Metadata_10_LaunchTitles\(4th\ Batch\)_08_20_2010.XLS
No such file or directory
How would I go about moving this file in a python script?
Update:
Thanks for the answers, this is how I ended up doing it:
import glob
import os
import shutil

for file in glob.glob(os.path.join(dir, '*.[xX][lL][sS]')):
    shutil.move(file, os.path.join(os.path.dirname(file), os.path.pardir))
subprocess is not the best way to go here. For example, what if you're on an operating system that isn't POSIX compliant?
Check out the shutil module.
>>> import shutil
>>> shutil.move(src, dest)
If finding the actual string for the filename is hard, you can use glob.glob to pattern-match what you want. For example, if you're running the script/prompt from the directory containing the .XLS file in question, you could do the following.
>>> import glob
>>> glob.glob('*ITunes*.XLS')
You'll get a list back with all the file strings that fit that pattern.
Rather than using subprocess and spawning a new process, use shutil.move() to just do it in Python. That way, the names won't be reinterpreted and there will be little chance for error.
Spaces, parens, etc. are the shell's problem. They don't require escaping in Python provided you don't pass them to a shell.
open('*WOW!* Rock&Roll(uptempo).mp3')
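The same applies to the original mv <file> .. task. A minimal sketch using shutil.move; move_to_parent is a hypothetical name, and it assumes the parent directory does not already contain a file with the same name (shutil.move raises an error in that case).

```python
import os
import shutil


def move_to_parent(path):
    """Move path up one level, like running `mv <file> ..` from the file's directory."""
    parent = os.path.dirname(os.path.dirname(os.path.abspath(path)))
    # The name goes straight to the OS: spaces and parentheses need no escaping
    return shutil.move(path, parent)
```

shutil.move returns the destination path, so the caller can see where the file ended up.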