I wrote the following code to make an inventory of every file in a library. The idea is that the three columns hold information about each file:
1) complete path 2) name of the parent directory 3) filename.
import os
import openpyxl
def crearlista (*arg, **kw):
    inventario = openpyxl.Workbook(encoding = "Utf-8")
    sheet = inventario.active
    i = 1
    f = 1
    e = ""
    for dirpath, subdirs, files in os.walk(*arg, **kw):
        for name in subdirs:
            e = os.path.join (name)
        for name in files:
            sheet.cell(row=i, column=3).value = name
            sheet.cell(row=i, column=1).value = dirpath
            sheet.cell(row=i, column=2).value = e
            i = i + 1
    inventario.save("asd3.xlsx")
crearlista("//media//rayeus/Datos/Mis Documentos/Nueva carpeta/", topdown=False)
The problem is that it first iterates through the files in the first folder, and only after that starts filling the 'e' variable with the name of the first folder.
As a result, the names in the folder column start late, and each name is written as many times as there are files in the next folder, not as many times as there are files in THAT folder.
How can I solve this?
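One possible fix, sketched under the assumption that the "parent directory" of a file is simply the last component of the `dirpath` that `os.walk` reports: take the folder name from `dirpath` itself instead of from `subdirs`. The helper name `inventory_rows` is hypothetical, and the sketch only builds the rows; each tuple could then be written to the sheet, for example with openpyxl's `sheet.append(row)`.

```python
import os

def inventory_rows(root):
    """Yield (directory path, parent folder name, file name) for each file under root."""
    for dirpath, _subdirs, files in os.walk(root):
        # The folder that holds these files is dirpath itself,
        # so its name is the last component of that path.
        parent = os.path.basename(os.path.normpath(dirpath))
        for name in sorted(files):
            yield dirpath, parent, name
```

Because the folder name is computed from the same `dirpath` that produced the file list, the two columns cannot drift apart, whatever the value of `topdown`.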
Related
I'm working on a script to automatically remove files from various directories. At first I simply wrote an if statement for each file. That resulted in three separate statements, and while it works, it got me thinking that there has to be a way to use iteration to shorten the code.
Right now everything works up to the last chunk of code, which is supposed to remove the files from their respective folders.
import os
cwd = os.chdir('C:\\Users\\name\\Desktop\\C008_Start\\')
folder = os.listdir('C:\\Users\\name\\Desktop\\C008_Start')
#Define variables and paths
LMP = 'LMP data for CDRL 008'
WIP = 'WIP Consumption CDRL Financials'
WIPRename = 'WIP Consumption CDRL.xlsx'
JBL = 'JBLM - CDRL S006 CDL2'
xlsx = '.xlsx'
LMPPath = 'C:\\Users\\name\\Desktop\\Python\\lmp\\'
WIPPath = 'C:\\Users\\name\\Desktop\\Python\\wip\\'
JBLPath = 'C:\\Users\\name\\Desktop\\Python\\jbl\\'
#add .xlsx extension to file names
paths = [LMPPath,WIPPath,JBLPath]
filelist = [LMP, WIP, JBL, WIPRename]
filelist2 = []
for item in filelist:
    filelist2.append(item + xlsx)
#rename WIP file
if filelist2[1] in folder:
    os.rename(filelist2[1], WIPRename)
#remove files from locations
for item in filelist2:
    if item in paths:
        os.remove(paths + item)
    else:
        print(item + " not in: " + str(paths))
I am writing a "robot" that should walk through the contents of a folder, including the folders nested inside it, and save a list of the files it finds to Excel. Everything works until it reaches a folder located inside another folder, for example fixedtest --> test --> insidetest. When I step through in the debugger, the check "if not os.path.isdir(file):" evaluates to true for that nested folder, as if the folder were not a folder.
I'm a beginner with Python and the os library, so I may be missing how something works, but everything looks correct to me and I'm confused. What should I change to solve the problem?
import openpyxl
import os
def add_row(rowN, folderName, fileName, ext):
    sheet.cell(row=rowN, column=1).value = rowN
    sheet.cell(row=rowN, column=2).value = folderName
    sheet.cell(row=rowN, column=3).value = fileName
    sheet.cell(row=rowN, column=4).value = ext
path = os.path.abspath(os.getcwd())
folders = []
i = 0
wb = openpyxl.Workbook()
sheet = wb.active
sheet['A1'] = 'Number of row'
sheet['B1'] = 'Folder where file located'
sheet['C1'] = 'File name'
sheet['D1'] = 'File extension'
folders.append(path)
for folder in folders:
    try:
        for file in os.listdir(folder):
            if not os.path.isdir(file):
                i = i + 1
                add_row(i, os.path.basename(folder), os.path.splitext(file)[0],
                        os.path.splitext(file)[1])
            else:
                folders.append(file)
    except:
        print(folders)
wb.save("test.xlsx")
wb.close()
wb.save("test.xlsx")
wb.close()
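A likely cause, offered as a sketch rather than a confirmed diagnosis: `os.listdir(folder)` returns bare names, and `os.path.isdir(file)` resolves a bare name against the current working directory, not against `folder`. For anything below the first level the check therefore looks in the wrong place and reports "not a directory". Joining the name with its folder first avoids that. The helper `collect_rows` is a hypothetical name, and it returns plain tuples instead of writing to the workbook.

```python
import os

def collect_rows(root):
    """Return (folder name, file stem, extension) for every file under root."""
    rows = []
    folders = [root]
    for folder in folders:  # the list grows while we iterate over it
        for name in sorted(os.listdir(folder)):
            full_path = os.path.join(folder, name)  # resolve against its own folder
            if os.path.isdir(full_path):
                folders.append(full_path)
            else:
                stem, ext = os.path.splitext(name)
                rows.append((os.path.basename(folder), stem, ext))
    return rows
```

The same join is needed when a folder is queued for later: appending the full path, not the bare name, is what lets `os.listdir` find it on the next pass.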
import os, unicodecsv as csv
# open and store the csv file
IDs = {}
with open('labels.csv','rb') as csvfile:
    timeReader = csv.reader(csvfile, delimiter = ',')
    # build dictionary with associated IDs
    for row in timeReader:
        IDs[row[0]] = row[1]
# move files
path = 'train/'
tmpPath = 'train2/'
for oldname in os.listdir(path):
    # ignore files in path which aren't in the csv file
    if oldname in IDs:
        try:
            os.rename(os.path.join(path, oldname), os.path.join(tmpPath, IDs[oldname]))
        except:
            print 'File ' + oldname + ' could not be renamed to ' + IDs[oldname] + '!'
I am trying to sort my files according to this CSV file, but the file contains many ids with the same name. Is there a way to move files with the same name into one folder, or to add a number in front of a file name if a file with that name already exists in the directory?
Example:
id                   name
001232131hja1.jpg    golden_retreiver
0121221122ld.jpg     black_hound
0232113222kl.jpg     golden_retreiver
0213113jjdsh.jpg     alsetian
05hkhdsk1233a.jpg    black_hound
I actually want to move all the files whose id corresponds to golden_retreiver into one folder, and so on.
Based on what you describe, here is my approach:
import csv
import os
SOURCE_ROOT = 'train'
DEST_ROOT = 'train2'
with open('labels.csv') as infile:
    next(infile) # Skip the header row
    reader = csv.reader(infile)
    seen = set()
    for dogid, breed in reader:
        # Create a new directory if needed
        if breed not in seen:
            os.mkdir(os.path.join(DEST_ROOT, breed))
            seen.add(breed)
        src = os.path.join(SOURCE_ROOT, dogid + '.jpg')
        dest = os.path.join(DEST_ROOT, breed, dogid + '.jpg')
        try:
            os.rename(src, dest)
        except OSError as e:
            # OSError also covers WindowsError, so this works on any platform
            print(e)
Notes
For every line in the data file, I create the breed directory at the destination if it has not been created yet. I use the set seen to make sure that each directory is created only once.
After that, it is a trivial matter of moving the files into place.
One possible move error: the file does not exist in the source directory. In that case, the code just prints the error and carries on.
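The grouping step can also be sketched as a reusable function, assuming Python 3 and assuming the labels have already been read into a dictionary that maps each file name to its label. The name `move_by_label` and that dictionary shape are illustrative choices, not part of the answer above. `os.makedirs(..., exist_ok=True)` replaces the `seen` set, so a rerun does not fail on folders that already exist.

```python
import os
import shutil

def move_by_label(labels, src_root, dest_root):
    """Move each file named in `labels` into dest_root/<label>/; return the new paths."""
    moved = []
    for filename, label in sorted(labels.items()):
        src = os.path.join(src_root, filename)
        if not os.path.isfile(src):
            continue  # listed in the CSV but missing on disk
        label_dir = os.path.join(dest_root, label)
        os.makedirs(label_dir, exist_ok=True)  # safe if the folder already exists
        dest = os.path.join(label_dir, filename)
        shutil.move(src, dest)
        moved.append(dest)
    return moved
```

Because every file keeps its own unique id as its name, several files with the same label can share one folder without any renaming.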
I have a folder (Molecules) with many sdf files (M00001.sdf, M00002.sdf and so on) representing different molecules. I also have a csv file in which each row names a molecule (M00001, M00002 etc).
I'm writing code to pick out the files in the Molecules folder whose names appear as a row in the csv file.
First attempt
import os
path_to_files = '/path_to_folder/Molecules' # path to Molecules folder
for files in os.listdir(path_to_files):
    names = os.path.splitext(files)[0] # get the basename (molecule name)
    with open('molecules.csv') as ligs: # Open the csv file of molecules names
        for hits in ligs:
            if names == hits:
                print names, hits
            else:
                print 'File is not here'
However this returns nothing on the command line (literally nothing). What is wrong with this code?
I am not sure that this is the best way (I only know that the following code works for my data), but if your molecules.csv has the standard csv format, i.e. "molecule1,molecule2,molecule3 ...", you can try rearranging your code like this:
import os
import csv
path_to_files = '/path_to_folder/Molecules' # path to Molecules folder
for files in os.listdir(path_to_files):
    names = os.path.basename(files)
    names = names.replace(".sdf","")
    with open('molecules.csv','r') as ligs:
        content = csv.reader(ligs)
        for elem in content:
            for hits in elem:
                if names == hits:
                    print names, hits
                else:
                    print 'File is not here'
See "CSV File Reading and Writing" in the Python documentation for details on the csv module.
I solved the problem with a rather brute approach
import os
import csv
import shutil
path_to_files = None # path to Molecules folder
new_path = None # new folder to save files
os.mkdir(new_path) # create the folder to store the molecules
# read the molecule names, one per line, and close the file afterwards
with open('molecules.csv', 'r') as hits:
    ligands = []
    for line in hits:
        lig = line.rstrip('\n')
        ligands.append(lig)
for files in os.listdir(path_to_files):
    molecule_name = os.path.splitext(files)[0]
    full_name = '/' + molecule_name + '.sdf'
    old_file = path_to_files + full_name
    new_file = new_path + full_name
    if molecule_name in ligands:
        shutil.copy(old_file, new_file)
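One detail worth spelling out: a line read from a text file keeps its trailing newline, so in the first attempt `names == hits` compares 'M00001' with 'M00001\n' and never matches. The working version strips that newline. A compact sketch of the same idea, assuming Python 3 and a csv with one molecule name per line, uses a set for the lookup; the function name `copy_listed_molecules` is hypothetical.

```python
import os
import shutil

def copy_listed_molecules(csv_path, src_dir, dest_dir):
    """Copy every .sdf file whose stem is listed in csv_path; return the copied names."""
    with open(csv_path) as handle:
        # strip() drops the trailing newline that would break the comparison
        wanted = {line.strip() for line in handle if line.strip()}
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for filename in sorted(os.listdir(src_dir)):
        stem, ext = os.path.splitext(filename)
        if ext == ".sdf" and stem in wanted:
            shutil.copy(os.path.join(src_dir, filename),
                        os.path.join(dest_dir, filename))
            copied.append(filename)
    return copied
```

Reading the csv once into a set also avoids reopening the file for every entry in the folder.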
I'm trying to collect the path of every file under the Data folder into a list, files[]. I can read the files in a direct subfolder of Data, but my code does not go any deeper: it cannot reach a subfolder of a subfolder without my writing another loop. What I want is a loop that descends to any depth under Data so that I get all the file paths.
For example this is what I get:
['Data/DataReader.py', 'Data/DataReader - Copy.py', 'Data/Dat/DataReader.py', 'Data/fge/er.txt']
This is what I want, with the ability to go into even deeper folders:
['Data/DataReader.py', 'Data/DataReader - Copy.py', 'Data/Dat/DataReader.py', 'Data/fge/er.txt', 'Data/fge/Folder/dummy.png', 'Data/fge/Folder/AnotherFolder/data.dat']
This is my current code. What would I need to add or change?
import os
from os import walk
files = []
folders = []
for (dirname, dirpath, filename) in walk('Data'):
    folders.extend(dirpath)
    files.extend(filename)
    break
filecount = 0
for i in files:
    i = 'Data/' + i
    files[filecount] = i
    filecount += 1
foldercount = 0
for i in folders:
    i = 'Data/' + i
    folders[foldercount] = i
    foldercount += 1
subfolders = []
subf_files = []
for i in folders:
    for (dirname, dirpath, filename) in walk(i):
        subfolders.extend(dirpath)
        subf_files.extend(filename)
        break
    subf_files_count = 0
    for a in subf_files:
        a = i + '/'+a
        files = files
        files.append(a)
        print files
    subf_files = []
print files
print folders
Thanks a lot!
I don't understand what you are trying to do, especially why you break out of the walk after the first element:
import os
files = []
folders = []
for (path, dirnames, filenames) in os.walk('Data'):
    folders.extend(os.path.join(path, name) for name in dirnames)
    files.extend(os.path.join(path, name) for name in filenames)
print files
print folders
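For comparison, the same traversal can be sketched with `pathlib`, assuming Python 3; `all_files` is a hypothetical helper name. `Path.rglob("*")` descends to any depth, so no nested loops or `break` statements are needed.

```python
from pathlib import Path

def all_files(root):
    """Return the path of every file below root, at any depth, as sorted strings."""
    return sorted(p.as_posix() for p in Path(root).rglob("*") if p.is_file())
```

Called as `all_files('Data')`, this would return forward-slash paths such as 'Data/fge/Folder/dummy.png', matching the format shown in the question.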