File scanner in Python 3

I am learning Python at the moment and, in order to do something useful while learning, I have created a small plan:
Read a specific disc drive partition. Outcome: list of directories
Iterate over each file within each directory and its subdirectories. Outcome: list of files within directories
Read file information: extension. Outcome: file extension
Read file information: size. Outcome: size
Read file information: date created. Outcome: creation date
Read file information: date modified. Outcome: modification date
Read file information: owner. Outcome: ownership
For step 1, I have tried several approaches. First, with scandir:
import os

x = [f.name for f in os.scandir('my_path') if f.is_file()]
with open('write_to_file_path', 'w') as f:
    for row in x:
        print(row)
        f.write("%s\n" % row)
and this:
import os

rootDir = '/Users/Ivan/Desktop/revit dynamo/'
for dirName, subdirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)
However, I am having a hard time writing the results into a txt file.
May I ask what would be an ideal approach to audit specific directories, with all relevant information extracted and stored as a table in a txt file for now?
P.S.: my first question here, so please do not judge too strictly :)

Since you are learning Python 3, I would suggest, as an alternative to low-level path manipulation with os.path, trying pathlib (part of the standard library as of Python 3.4):
from pathlib import Path

p = Path(mydir)

# list mydir content
for child in p.iterdir():
    print(child)

# recursive iteration
for child in p.glob("**/*"):
    if child.is_dir():
        # do dir stuff
        pass
    else:
        print(child.suffix)   # extension
        print(child.owner())  # file owner
        child_info = child.stat()
        # file size, modification time
        print(child_info.st_size, child_info.st_mtime)
File creation time is platform-dependent, but this post presents some solutions.
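A rough sketch of that platform dependence (the helper below is my own illustration, not from the linked post): on Windows st_ctime is the creation time, macOS exposes st_birthtime, and on most Linux filesystems a true creation time simply isn't available.
import os
import platform

def creation_time(path):
    # hypothetical helper: best-effort creation time
    stat_result = os.stat(path)
    if platform.system() == 'Windows':
        # on Windows, st_ctime holds the creation time
        return stat_result.st_ctime
    try:
        # macOS (and some BSDs) expose the birth time
        return stat_result.st_birthtime
    except AttributeError:
        # most Linux filesystems don't expose a creation time;
        # fall back to the last metadata change time
        return stat_result.st_ctime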
The string of a Path can be accessed as str(p).
To write to a file using pathlib:
textfile = Path(myfilepath)

# create the file if it doesn't exist
textfile.touch()

# open the file, write a string, then close it
textfile.write_text(mystringtext)

# open the file with a context manager
with textfile.open('r') as f:
    f.read()
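To tie this back to your audit idea, here is a minimal sketch that walks a directory with pathlib and writes one tab-separated row per file; the paths and column choices are placeholders, and Path.owner() only works on Unix-like systems.
from pathlib import Path

root = Path('/Users/Ivan/Desktop/revit dynamo/')  # directory to audit
report = Path('audit.txt')                        # placeholder output file

with report.open('w') as out:
    out.write('path\textension\tsize\tmodified\towner\n')
    for child in root.glob('**/*'):
        if child.is_file():
            info = child.stat()
            out.write('%s\t%s\t%d\t%s\t%s\n' %
                      (child, child.suffix, info.st_size, info.st_mtime, child.owner()))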


How do I see if the contents of a csv file exist as files in another directory?

EDIT:
To better explain my dilemma, I have a csv file that lists a number of applications numbered XXXXXX. Each of these applications has a corresponding xml file that exists in another directory. I'm essentially attempting to write a script that:
unzips the folder that contains the xml files and the csv file;
parses the entries within the csv file and checks that each application listed in the csv file has a corresponding xml file;
outputs another CSV file that sets an application to true if its xml file exists.
So far I've written the script to unzip, but I'm having a hard time wrapping my head around steps 2 and 3.
from tkinter import Tk
from tkinter.filedialog import askdirectory
import zipfile
import os
import xml.etree.ElementTree as ET
import pandas as pd
from datetime import datetime

def unzipXML(root):
    print(f'({datetime.now().strftime("%b. %d - %H:%M:%S")}) Stage 1 of 5: Unzipping folder(s)...')
    # Get filepaths of .zip files
    zipPaths = []
    for filename in os.listdir(root):
        if filename.endswith(".zip"):
            zipPaths.append(root + "/" + filename)
    # Unzip all .zip files
    for path in zipPaths:
        with zipfile.ZipFile(path, 'r') as zipRef:
            zipRef.extractall(root)
    print(f'({datetime.now().strftime("%b. %d - %H:%M:%S")}) {len(zipPaths)} folder(s) unzipped successfully.')
Loop through the names in the csv, calling os.path.exists() on each one.
import csv
import os

with open("filenames.csv") as inf, open("apps.csv", "w") as outf:
    in_csv = csv.reader(inf)
    out_csv = csv.writer(outf)
    for row in in_csv:
        app_name = row[0]  # replace [0] with the correct field number for your CSV
        # directory_path is the folder that holds the xml files
        if os.path.exists(os.path.join(directory_path, app_name + ".xml")):
            out_csv.writerow([app_name, 'exists'])
        else:
            out_csv.writerow([app_name, 'notexists'])
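If the CSV is large, it may be faster to collect the stems of the existing .xml files into a set once instead of calling os.path.exists() per row. A rough sketch of that variation (directory_path and the column index are placeholders, and it writes True/False rather than exists/notexists):
import csv
import os

# collect the stems of all .xml files once, so each lookup is O(1)
existing = {os.path.splitext(name)[0]
            for name in os.listdir(directory_path)
            if name.endswith('.xml')}

with open("filenames.csv") as inf, open("apps.csv", "w", newline="") as outf:
    out_csv = csv.writer(outf)
    for row in csv.reader(inf):
        app_name = row[0]  # adjust the field index for your CSV
        out_csv.writerow([app_name, app_name in existing])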
I don't know if I understand your problem, but maybe this will help:
import glob

# Get the csv files from the current directory
List_Of_Files = glob.glob('./*.csv')
for file_name in List_Of_Files:
    if file_name == your_var:
        ...

Reading files from a folder using the os module

For a pattern recognition application, I want to read and operate on jpeg files from another folder using the os module.
I tried to use str(file) and file.encode('latin-1'), but they both give me errors.
I tried:
import os

allLines = []
path = 'results/'
fileList = os.listdir(path)
for file in fileList:
    file = open(os.path.join('results/' + str(file.encode('latin-1'))), 'r')
    allLines.append(file.read())
print(allLines)
but I get an error saying:
No such file or directory "results/b'thefilename"
when I expect a list with the desired file names that are accessible
If you can use Python 3.4 or newer, you can use the pathlib module to handle the paths.
from pathlib import Path

all_lines = []
path = Path('results/')
for file in path.iterdir():
    with file.open() as f:
        all_lines.append(f.read())
print(all_lines)
By using the with statement, you don't have to close the file descriptor by hand (which is what is currently missing in your code), even if an exception is raised at some point.
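Since the question mentions jpeg files, here is a small variation of the same idea, assuming the images should be read as binary data and only the .jpg files in results/ are wanted:
from pathlib import Path

all_data = []
for file in Path('results/').glob('*.jpg'):
    with file.open('rb') as f:  # binary mode, since jpeg is not text
        all_data.append(f.read())
print(len(all_data))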

How to open a file only using its extension?

I have a Python script which opens a specific text file located in a specific directory (the working directory) and performs some actions.
(Assume the directory contains at most one such .txt file.)
with open('TextFileName.txt', 'r') as f:
    for line in f:
        # perform some string manipulation and calculations
        # write some results to a different text file
        with open('results.txt', 'a') as r:
            r.write(someResults)
My question is how I can have the script locate the text (.txt) file in the directory and open it without explicitly providing its name (i.e. without giving the 'TextFileName.txt'). So, no arguments for which text file to open would be required for this script to run.
Is there a way to achieve this in Python?
You could use os.listdir to get the files in the current directory, and filter them by their extension:
import os

txt_files = [f for f in os.listdir('.') if f.endswith('.txt')]
if len(txt_files) != 1:
    raise ValueError('should be only one txt file in the current directory')
filename = txt_files[0]
You can also use glob, which is easier than os:
import glob

# wildcard to catch all the files ending with .txt, returned as a list
text_file = glob.glob('*.txt')
if len(text_file) != 1:
    raise ValueError('should be only one txt file in the current directory')
filename = text_file[0]
glob searches the current working directory (os.curdir).
You can change the working directory with
os.chdir(r'cur_working_directory')
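If you would rather not change the working directory, glob also accepts a path prefix in the pattern, so you can point it at the folder directly (the folder name below is a placeholder):
import glob
import os

folder = r'cur_working_directory'  # placeholder path
text_file = glob.glob(os.path.join(folder, '*.txt'))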
Since Python version 3.4, it is possible to use the great pathlib library. It offers a glob method which makes it easy to filter according to extensions:
from pathlib import Path

path = Path(".")  # current directory
extension = ".txt"
# next() with a default of None returns the first matching file, or None if there is none
file_with_extension = next(path.glob(f"*{extension}"), None)
if file_with_extension:
    with open(file_with_extension) as f:
        ...

I want to process every file inside a folder line by line and get a particular matching string

I am trying to process every file inside a folder line by line. I need to check for a particular string and write the result into an Excel sheet. Using my code, if I explicitly give the file name, the code works. If I try to get all the files, it throws an IOError. The code I wrote is below.
import os

def test_extract_programid():
    folder = 'C://Work//Scripts//CMDC_Analysis//logs'
    for filename in os.listdir(folder):
        print filename
        with open(filename, 'r') as fo:
            strings = ("/uri")
            <conditions>
            for line in fo:
                if strings in line:
                    <conditions>
I think the error is that the file is already open when the for loop starts, but I am not sure; printing the file name prints it correctly.
The error shown is IOError: [Errno 2] No such file or directory:
If your working directory is not the same as folder, then you need to give open the path to the file as well:
with open(folder+'/'+filename, 'r') as fo
Alternatively, you can use glob
import glob
for filename in glob.glob(folder+'/*'):
print filename
It can't open the path. You should do
for filename in os.listdir(folder):
    print folder + os.sep + filename
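Alternatively, os.path.join builds the path and picks the right separator for you; a small sketch of the same loop (not part of the original answer):
import os

for filename in os.listdir(folder):
    full_path = os.path.join(folder, filename)  # joins folder and filename with the right separator
    with open(full_path, 'r') as fo:
        pass  # process fo here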

Python: Files stop being opened at a certain point

I've written the following program in Python:
import re
import os
import string

folder = 'C:\Users\Jodie\Documents\Uni\Final Year Project\All Data'
folderlisting = os.listdir(folder)
for eachfolder in folderlisting:
    print eachfolder
    if os.path.isdir(folder + '\\' + eachfolder):
        filelisting = os.listdir('C:\Users\Jodie\Documents\Uni\Final Year Project\All Data\\' + eachfolder)
        print filelisting
        for eachfile in filelisting:
            if re.search('.genbank.txt$', eachfile):
                genbankfile = open(eachfile, 'r')
                print genbankfile
            if re.search('.alleles.txt$', eachfile):
                allelesfile = open(eachfile, 'r')
                print allelesfile
It looks through a lot of folders, and prints the following:
The name of each folder, without the path
A list of all files in each folder
Two specific files in each folder (Any files containing ".genbank.txt" and ".alleles.txt").
The code works until it reaches a certain directory, and then fails with the following error:
Traceback (most recent call last):
File "C:\Users\Jodie\Documents\Uni\Final Year Project\Altering Frequency Practice\Change_freq_data.py", line 16, in <module>
genbankfile = open(eachfile, 'r')
IOError: [Errno 2] No such file or directory: 'ABP1.genbank.txt'
The problem is:
That file most definitely exists, since the program lists it before it tries to open the file.
Even if I take that directory out of the original group of directories, the program throws up the same error for the next folder it iterates to. And the next, if that one is removed. And so on.
This makes me think that it's not the folder or any files in it, but some limitation of Python? I have no idea. It has stumped me.
Any help would be appreciated.
You should use os.walk() http://docs.python.org/library/os.html#os.walk
Also, you need to read the contents of the file; you don't want to print the file object. And you need to close the file when you're done, or use a context manager to close it for you.
It would look something like:
for root, dirs, files in os.walk(folder):
    for file_name in files:
        if re.search('.genbank.txt$', file_name) or \
           re.search('.alleles.txt$', file_name):
            with open(os.path.join(root, file_name), 'r') as f:
                print f.read()
Keep in mind this is not exactly what you're doing: this will walk the entire tree, whereas you may just want to walk a single level, like you are already doing.
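If you only want the top level, one common way (a sketch, not from the original answer) is to clear dirs in place so os.walk does not descend into subdirectories:
import os

for root, dirs, files in os.walk(folder):
    dirs[:] = []  # prune subdirectories: os.walk will not descend further
    for file_name in files:
        if file_name.endswith('.genbank.txt') or file_name.endswith('.alleles.txt'):
            pass  # handle the file here, as in the loop above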
