Combining multiple CSV files from multiple subfolders in one folder - python

I'm trying to combine multiple CSV files located in a directory. Each of the files sits three subfolders down from the main folder (each subfolder contains another folder or a file), and I am unable to combine all of them. The best I can do is combine the ones in each bottom-most subfolder. I can get a list of every file I want to combine by scanning the tree, but I can't combine them. I've gone through several methods and tutorials and can't find a way to do this. The code I have is below:
import pandas as pd
import os
import glob

os.getcwd()
path_of_the_directory = 'C:\\Users\\user\\Downloads\\top_folder'
ext = ('.csv')
for files in os.listdir(path_of_the_directory):
    if files.endswith(ext):
        print(files)
    else:
        continue

def list_files(dir):
    r = []
    for root, dirs, files in os.walk(dir):
        for name in files:
            filepath = root + os.sep + name
            if filepath.endswith(".csv"):
                r.append(os.path.join(root, name))
    return r

print(r)
files = []
for file in r:
    #for dir, dir_name, file_list in os.walk(path):
    files.append(os.path.join(path,file))
combined_df = pd.concat([pd.read_csv(file) for file in files])
df = pd.concat([pd.read_csv(f) for f in files])
df.to_csv("merged.csv")
print(files)
list_files(data_dir)

data_dir = r'C:\\Users\\user\\Downloads\top_folder'
sub_folders = os.listdir(data_dir)
sub_folders
path = os.path.join(data_dir, sub_folders[2])
os.chdir(path)
files = glob.glob(path + ".\*\*.csv")
files
df = pd.concat([pd.read_csv(f) for f in chat_files])
df.to_csv("merged.csv")
Any help or direction would be extremely appreciated.
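One possible approach, as a rough sketch: collect every .csv file at any depth with pathlib's rglob, then hand the whole list to pd.concat in a single call. The function names find_csv_files and combine_csv_files are made up for this example, and the sketch assumes all the files share the same columns.

```python
from pathlib import Path


def find_csv_files(top_folder):
    """Return every .csv file below top_folder, at any depth, in sorted order."""
    return sorted(p for p in Path(top_folder).rglob("*.csv") if p.is_file())


def combine_csv_files(top_folder, output_path):
    """Concatenate all CSV files found below top_folder and write one merged CSV."""
    # pandas is imported here so that find_csv_files can be used without it
    import pandas as pd

    output_path = Path(output_path).resolve()
    # Skip the output file itself in case an earlier run left it inside top_folder
    csv_files = [p for p in find_csv_files(top_folder) if p.resolve() != output_path]
    if not csv_files:
        raise FileNotFoundError(f"no .csv files found under {top_folder}")
    # ignore_index=True gives the merged frame one continuous row index
    combined = pd.concat([pd.read_csv(p) for p in csv_files], ignore_index=True)
    # index=False keeps that row index out of the written file
    combined.to_csv(output_path, index=False)
    return combined
```

With the folder from the question, the call would be combine_csv_files(r'C:\Users\user\Downloads\top_folder', 'merged.csv'). Because rglob returns full paths, there is no need to os.chdir into each subfolder or to join the paths a second time.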

Related

How to traverse through all subfolders inside a folder for renaming using glob function? [duplicate]

I have a folder structure:
test (root-folder)
    t1.txt
    t2.txt
    A
        f.txt
    B
        f1.txt
    C
        f4.txt
I am using os.walk(path) to get all the files from the "test" folder. I would like to get all files except the folder "B" and the files inside it.
list1 = ['A', 'C']
result = [os.path.join(dp, f) for dp, dn, filenames in os.walk(path) for f in filenames if os.path.splitext(f)[1] == '.txt']
for items in result:
    for fname in list1:
        if fname in items.lower():
            result.remove(items)
print(result)
I tried it, but it only takes the files in A and C, not the files in the main folder. Can you help? Where am I going wrong?
Thank you
A possible solution is to use the glob library:
import glob
import os

dir_to_exclude = ['B']
# Paths are returned relative to the current directory, which must be the "test" folder
files = glob.glob('**/*.txt', recursive=True)
# os.sep splits correctly on any platform; the first component is the top-level folder
files_paths = [f for f in files if f.split(os.sep)[0] not in dir_to_exclude]
files_names = [os.path.basename(f) for f in files_paths]
print(f'List of file names with path: {files_paths}')
print(f'List of file names: {files_names}')
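I think this should work:
import os

file_paths = []
# Folder to skip; the trailing separator avoids also matching siblings such as "B2"
forbidden_path = os.path.join(path, 'B') + os.sep
for root, dirs, files in os.walk(path):
    for name in files:
        file_path = os.path.join(root, name)
        # Keep the file only if it is NOT inside the forbidden folder
        if forbidden_path not in file_path:
            if os.path.splitext(file_path)[1] == '.txt':
                file_paths += [file_path]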
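Another option, sketched below, is to stop os.walk from entering the unwanted folder at all instead of filtering the results afterwards. The code in the question removes items from result while looping over it, which makes the loop skip elements. The helper name list_files_excluding is invented for this example, and it skips every folder with an excluded name at any depth, not only the top-level one.

```python
import os


def list_files_excluding(top, excluded_dirs, extension=".txt"):
    """Walk top and return matching files, never descending into excluded_dirs."""
    excluded = set(excluded_dirs)
    matches = []
    for root, dirs, files in os.walk(top):
        # Assigning to dirs[:] edits the list in place, which tells os.walk
        # (in its default top-down mode) not to visit the removed folders
        dirs[:] = [d for d in dirs if d not in excluded]
        for name in files:
            if os.path.splitext(name)[1] == extension:
                matches.append(os.path.join(root, name))
    return matches
```

For the tree in the question, list_files_excluding(path, ['B']) would return t1.txt, t2.txt, A/f.txt and C/f4.txt with their full paths, and B is never visited.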

Create a list of all file names and their file extension in a directory

I am trying to create a dataset using pd.DataFrame to store the file name and file extension of all the files in my directory. I eventually want two columns named Name and Extension: Name should hold the file names, and Extension should hold the file type, such as xlsx or png.
I am new to Python and was only able to get this far. The code below gives me a list of file names, but I don't know how to incorporate the file extension part. Could anyone please help?
List = pd.DataFrame()
path = 'C:/Users/documnets/'
filelist = []
filepath = []
# r=root, d=directories, f = files
for subdir, dirs, files in os.walk(path):
    for file in files:
        filelist.append(file)
        filename, file_extension = os.path.splitext('/path/to/somefile.xlsx')
        filepath.append(file_extension)
List = pd.DataFrame(filelist, filepath)
Also, for this part: os.path.splitext('/path/to/somefile.xlsx'), can I leave what's in the parentheses as it is, or should I replace it with my directory path?
Thank you
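You can do this:
import os
import pandas as pd

path = 'C:/Users/documnets/'
filename = []
fileext = []
for file in os.listdir(path):
    # Skip subdirectories; only regular files have a meaningful extension
    if not os.path.isfile(os.path.join(path, file)):
        continue
    # splitext copes with names containing several dots or none at all
    name, ext = os.path.splitext(file)
    filename.append(name)
    fileext.append(ext.lstrip('.'))  # 'xlsx' rather than '.xlsx'
df = pd.DataFrame({"Name": filename, "Extension": fileext})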
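On the question about os.path.splitext('/path/to/somefile.xlsx'): that string is only a placeholder, so the call should receive each real file name from the loop instead. A minimal sketch that keeps the os.walk loop from the question, so files in subfolders are included too (the function name collect_names_and_extensions is made up for this example):

```python
import os


def collect_names_and_extensions(path):
    """Return one {"Name", "Extension"} record per file found below path."""
    records = []
    for subdir, dirs, files in os.walk(path):
        for file in files:
            # Pass the real file name, not a placeholder path, to splitext
            name, extension = os.path.splitext(file)
            # splitext keeps the leading dot ('.xlsx'); strip it to get 'xlsx'
            records.append({"Name": name, "Extension": extension.lstrip(".")})
    return records
```

Passing the returned list to pd.DataFrame gives a frame with the two columns Name and Extension. Note that splitext only splits off the last suffix, so archive.tar.gz becomes the name archive.tar with extension gz.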

Python: How to get the full path of a file in order to move it?

I had files that were in zips. I unzipped them with 7-Zip, so they are in folders named after the zip files.
Each of these folders has either a .otf or .ttf (some have both) that I want out of them and moved to another folder.
I have tried a few methods of getting the full path of the files but every one of them leaves out the folder that the file is actually in.
Here is my latest try:
import os
import shutil
from pathlib import Path
result = []
for root, dirs, files in os.walk("."):
    for d in dirs:
        continue
    for f in files:
        if f.endswith(".otf"):
            print(f)
            p = Path(f).absolute()
            parent_dir = p.parents[1]
            p.rename(parent_dir / p.name)
        elif f.endswith(".ttf"):
            print(f)
            p = Path(f).absolute()
            parent_dir = p.parents[1]
            p.rename(parent_dir / p.name)
        else:
            continue
Other attempts:
# parent_dir = Path(f).parents[1]
# shutil.move(f, parent_dir)
#print("OTF: " + f)
# fn = f
# f = f[:-4]
# f += '\\'
# f += fn
# result.append(os.path.realpath(f))
#os.path.relpath(os.path.join(root, f), "."))
I know this is something simple but I just can't figure it out. Thanks!
You should join the file name with the path name root:
import os
from pathlib import Path

for root, dirs, files in os.walk("."):
    for f in files:
        # endswith accepts a tuple, so one branch covers both font types
        if f.endswith((".otf", ".ttf")):
            # root is the folder that actually contains f
            p = Path(os.path.join(root, f)).absolute()
            parent_dir = p.parents[1]
            p.rename(parent_dir / p.name)
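for root, dirs, files in os.walk("."):
    for f in files:
        # Join with root first; abspath on the bare name would resolve against the current directory
        print(os.path.abspath(os.path.join(root, f)))
You can use os.path.abspath() to get the full path of a file, as long as the file name is joined with root first.
You would also still need to filter for the file types you want.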
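If the goal is to gather all the fonts in one separate folder, a pathlib-based sketch could look like this. The names move_fonts and FONT_SUFFIXES are invented for the example, and skipping a file when one with the same name already exists in the destination is an assumption about the behaviour you want.

```python
import shutil
from pathlib import Path

# Hypothetical constant: the suffixes treated as font files
FONT_SUFFIXES = {".otf", ".ttf"}


def move_fonts(source_root, destination):
    """Move every .otf/.ttf file found below source_root into destination."""
    source_root = Path(source_root)
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    # Build the full list first so that moving files does not disturb the traversal
    candidates = [
        p for p in source_root.rglob("*")
        if p.is_file() and p.suffix.lower() in FONT_SUFFIXES
    ]
    moved = []
    for font in candidates:
        target = destination / font.name
        # Skip fonts already in the destination and avoid overwriting same-named files
        if font.parent.resolve() == destination.resolve() or target.exists():
            continue
        shutil.move(str(font), str(target))
        moved.append(target)
    return moved
```

Every path produced by rglob already includes the folder the file sits in, which is the part that Path(f).absolute() loses when f is only the bare file name.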

Getting paths of each file of a directory into an Array in python

I'm trying to fill an array files[] with the path of every file in the Data folder, including files in subfolders at any depth. At the moment I can read the files in Data and in its direct subfolders, but the code does not go any deeper: it cannot reach a subfolder of a subfolder of Data without my writing another loop. What I want is a loop with unlimited depth, so that I get the paths of all the files under Data.
For example this is what I get:
['Data/DataReader.py', 'Data/DataReader - Copy.py', 'Data/Dat/DataReader.py', 'Data/fge/er.txt']
This is what I want but it can still go into deeper folders:
['Data/DataReader.py', 'Data/DataReader - Copy.py', 'Data/Dat/DataReader.py', 'Data/fge/er.txt', 'Data/fge/Folder/dummy.png', 'Data/fge/Folder/AnotherFolder/data.dat']
This is my current code. What would I need to add or change?
import os
from os import walk
files = []
folders = []
for (dirname, dirpath, filename) in walk('Data'):
    folders.extend(dirpath)
    files.extend(filename)
    break
filecount = 0
for i in files:
    i = 'Data/' + i
    files[filecount] = i
    filecount += 1
foldercount = 0
for i in folders:
    i = 'Data/' + i
    folders[foldercount] = i
    foldercount += 1
subfolders = []
subf_files = []
for i in folders:
    for (dirname, dirpath, filename) in walk(i):
        subfolders.extend(dirpath)
        subf_files.extend(filename)
        break
    subf_files_count = 0
    for a in subf_files:
        a = i + '/'+a
        files = files
        files.append(a)
        print(files)
    subf_files = []
print(files)
print(folders)
Thanks a lot!
I don't understand what you are trying to do, especially why you break out of the walk after the first iteration. os.walk already descends into every subfolder on its own:
import os
files = []
folders = []
for (path, dirnames, filenames) in os.walk('Data'):
    folders.extend(os.path.join(path, name) for name in dirnames)
    files.extend(os.path.join(path, name) for name in filenames)
print(files)
print(folders)
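On Python 3 the same listing can also be written with pathlib. This is only a sketch: list_all_files is an invented name, and as_posix() is used so the result has forward slashes like the expected output in the question, whatever the platform.

```python
from pathlib import Path


def list_all_files(top):
    """Return the path of every file below top, at any depth, with forward slashes."""
    top = Path(top)
    # rglob("*") yields files and folders at every level; keep only the files
    return sorted(p.as_posix() for p in top.rglob("*") if p.is_file())
```

Calling list_all_files('Data') from the folder that contains Data gives paths such as Data/fge/Folder/AnotherFolder/data.dat, however deep the tree goes.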
