read all files in sub folder with pandas

My notebook is in the home folder, which also contains another folder named "test". The test folder has 5 sub folders, and each of them contains a .shp file. I want to iterate over all sub folders within test and open every .shp file. It doesn't matter if the data gets overwritten on each iteration. For a single file I would do:
data = gpd.read_file("./test/folder1/file1.shp")
data.head()
How can I do so? I tried this
path = os.getcwd()
files = glob.glob(os.path.join(path + "/test/", "*.shp"))
print(files)
but this only goes one level deep.

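You can use os.walk from the os module to visit every sub folder. Shapefiles are read with geopandas (pandas has no read_file function), and a shapefile usually comes with companion files such as .dbf and .shx, so filter on the extension before reading:
import os
import geopandas as gpd
for root, dirs, files in os.walk("./test"):
    for name in files:
        if name.endswith(".shp"):
            fpath = os.path.join(root, name)
            data = gpd.read_file(fpath)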
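The same traversal can also be sketched with pathlib. This is a minimal sketch, not a definitive implementation: the helper name find_shapefiles is hypothetical, and the geopandas call is left as a comment so the snippet runs without that library installed.

```python
from pathlib import Path


def find_shapefiles(root):
    """Return every .shp file under root, at any depth, in sorted order."""
    # rglob("*.shp") is the recursive form of glob: it descends into
    # all sub folders of root.
    return sorted(Path(root).rglob("*.shp"))


if __name__ == "__main__":
    for shp_path in find_shapefiles("./test"):
        print(shp_path)
        # With geopandas installed, each file could be read here:
        # data = gpd.read_file(shp_path)
```

Collecting the paths first keeps the search separate from the reading step, which makes it easy to check which files were found before loading any of them.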

Alternatively, let glob search recursively: glob.glob(os.path.join(path, "test", "**", "*.shp"), recursive=True). The ** pattern matches any number of nested folders, so there is no need to call os.chdir first.

Related

Python loop through directories

I am trying to use python library os to loop through all my subdirectories in the root directory, and target specific file name and rename them.
Just to make it clear this is my tree structure
My python file is located at the root level.
What I am trying to do is target the directory 942ba, loop through all its sub directories, locate the file 000000, and rename it to 000000.csv.
The current code I have is as follows:
import os
root = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
for dirs, subdirs, files in os.walk(root):
    for f in files:
        print(dirs)
        if f == '000000':
            dirs = dirs.strip(root)
            f_new = f + '.csv'
            os.rename(os.path.join(r'{}'.format(dirs), f), os.path.join(r'{}'.format(dirs), f_new))
But this is not working: when I run the code, for some reason it strips the date from the sub directories.
Can anyone help me understand how to solve this issue?

A more compact way to iterate through the folders and select only the files you are looking for is below:
import os
source_folder = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
files = [os.path.normpath(os.path.join(root, f))
         for root, dirs, names in os.walk(source_folder)
         for f in names
         if '000000' in f and not f.endswith('.gz')]
for file in files:
    os.rename(file, f"{file}.csv")
The list comprehension stores the full path of each file you are looking for. You can change the condition inside the comprehension to anything you need. I use this snippet a lot to find only images of a certain type, or to drop unwanted files from the selection.
In the for loop, each file is renamed by appending the .csv extension.
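Since a bulk rename is hard to undo, it can help to preview the changes first. The following is a minimal sketch under stated assumptions: the function name add_csv_extension and its dry_run parameter are hypothetical, and the match is on the exact file name rather than a substring.

```python
import os


def add_csv_extension(source_folder, target_name="000000", dry_run=True):
    """Plan (and optionally perform) renaming target_name to target_name + '.csv'.

    Returns the list of (old_path, new_path) pairs. With dry_run=True
    nothing is renamed, so the pairs can be inspected first.
    """
    renames = []
    for root, _dirs, files in os.walk(source_folder):
        for name in files:
            if name == target_name:  # exact match, so '000000.gz' is skipped
                old_path = os.path.join(root, name)
                renames.append((old_path, old_path + ".csv"))
    # Rename only after the walk has finished, so the directory tree
    # is not modified while it is being traversed.
    if not dry_run:
        for old_path, new_path in renames:
            os.rename(old_path, new_path)
    return renames
```

Calling it once with the default dry_run=True and printing the result shows exactly which paths would change; a second call with dry_run=False performs the renames.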

I would use glob to find the files.
import os, glob
zdir = '942ba956-8967-4bec-9540-fbd97441d17f'
# '**' with recursive=True matches any number of nested sub directories
files = glob.glob('*{}/**/000000'.format(zdir), recursive=True)
for fly in files:
    os.rename(fly, '{}.csv'.format(fly))

I want to add all the names of the files in a specific folder to a list

I want to add the names of all the files in a specific folder to a list. How can I do that? The path is Dropbox -> a folder called 'UMM' -> a folder called '2018'. Could someone help me with the code for this? I have tried using os.walk(), but it doesn't seem to work.

You can use os.walk and append only the names found in files.
from os import walk
file_names = list()
path = 'path/of/folder'
for root, dirc, files in walk(path):
    for FileName in files:
        file_names.append(FileName)
print(file_names)
This appends the file names from all directories and sub directories of the specified path.

This will create a list of the entries in a folder (os.listdir returns the names of sub folders as well as files):
from os import listdir
# the path
path = ''
fileList = listdir(path)
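If only regular files directly inside the folder are wanted, and not the names of its sub folders, the entries can be filtered. A minimal sketch, where the function name list_file_names is hypothetical:

```python
import os


def list_file_names(path):
    """Return the sorted names of regular files directly inside path."""
    # os.scandir yields one entry per item in the folder; is_file()
    # filters out sub folders without descending into them.
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries if entry.is_file())
```

Unlike the os.walk version, this does not look inside sub folders, which matches the case where a single folder such as '2018' is the target.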

Rename files within a folder of a folder after their parent folder?

I have a batch of folders that are named by date. Each of them contains a folder holding files whose names are all the same.
Is there a way to rename the files so they become unique based on the directory structure they are held in (the parent folder and the first folder, which is named by date)?
\user\date\1_2_2019\ABC\0001.csv -> ABC_1_2_2019.csv
\user\date\1_3_2019\JKL\0001.csv -> JKL_1_3_2019.csv
\user\date\1_4_2019\XYZ\0001.csv -> XYZ_1_4_2019.csv
\user\date\1_5_2019\123\0001.csv -> 123_1_5_2019.csv
\user\date\1_6_2019\456\0001.csv -> 456_1_6_2019.csv
I know the basic Python code to get to all the files is this:
import os
for dirname, _, filenames in os.walk(r'\user\date'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
But is there Python code to rename all the files so that each name at the very least contains the date of its parent folder?
Thanks in advance!

Here is one way, using pathlib (Python 3.4+) and f-strings (Python 3.6+).
First set the path to the top-level directory, so that all the csv files can be found recursively and renamed with a simple for loop.
from pathlib import Path
# the raw string (r'...') keeps the Windows backslashes literal; on macOS, use forward slashes instead
top = Path(r'C:\Users\datanovice\Documents\Excels')
# list() takes a snapshot first, so renamed files are not picked up again by rglob
files = list(top.rglob('*.csv'))
for file in files:
    parent_1 = file.parent.name
    parent_2 = file.parent.parent.name
    file.rename(Path(file.parent, f"{parent_1}_{parent_2}{file.suffix}"))
    print(f"{file.name} --> {parent_1}_{parent_2}{file.suffix}")
# 1.csv --> ABC_1_2_2019.csv
# 1.csv --> ABC_2_2_2019.csv
Result, before renaming:
for f in top.rglob('*.csv'):
    print(f)
C:\Users\datanovice\Documents\Excels\1_2_2019\ABC\1.csv
C:\Users\datanovice\Documents\Excels\2_2_2019\ABC\1.csv
After renaming:
for f in top.rglob('*.csv'):
    print(f)
C:\Users\datanovice\Documents\Excels\1_2_2019\ABC\ABC_1_2_2019.csv
C:\Users\datanovice\Documents\Excels\2_2_2019\ABC\ABC_2_2_2019.csv
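The naming rule can be separated from the renaming itself, which makes it easy to check on a few paths before touching any files. This is a minimal sketch; the function name renamed_path is hypothetical, and it assumes the layout from the question (date folder, then named folder, then file).

```python
from pathlib import PurePath


def renamed_path(file_path):
    """Build '<folder>_<date folder><suffix>' next to the original file."""
    path = PurePath(file_path)
    folder = path.parent.name               # e.g. 'ABC'
    date_folder = path.parent.parent.name   # e.g. '1_2_2019'
    # with_name swaps only the final component and keeps the directory.
    return path.with_name(f"{folder}_{date_folder}{path.suffix}")
```

Because PurePath never touches the file system, the function can be tried on example strings first; an actual rename would then pass its result to Path.rename.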

Iterate over files located in different folders

I’d like to write a function to iterate over excel files that are in different folders. Parts of the path of each file are the same, for instance:
C:\Main\Division\Reports\Year\Data.xls
The only part of each path that changes is ‘Year’. The files all have the same name.
Is there a way to do this with a placeholder for Year? If not, what approach should I take?

You can use the os.listdir function:
import os
directory = r"C:\Main\Division\Reports"
for data in os.listdir(directory):
    # each entry of Reports is a Year folder
    file_name = os.path.join(directory, data, 'Data.xls')
    # do something

You could try os.walk:
import os
parent = r"C:\Main\Division\Reports"
for root, directory, files in os.walk(parent):
    print(root)
    print(directory)
    print(files)
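The placeholder idea from the question maps directly onto a glob pattern, where * stands for the one path component that changes. A minimal sketch, assuming the layout Reports/<Year>/Data.xls; the function name yearly_reports is hypothetical:

```python
from pathlib import Path


def yearly_reports(reports_dir, file_name="Data.xls"):
    """Return reports_dir/<any folder>/file_name for every match, sorted."""
    # '*' matches exactly one folder level (the year), so files that sit
    # deeper or shallower than that are not returned.
    return sorted(Path(reports_dir).glob(f"*/{file_name}"))
```

Each returned path keeps the year available as path.parent.name, which is handy when the year needs to be recorded alongside the data that is read.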

How to copy folder structure under another directory?

I have some questions about copying a folder structure. I need to convert pdf files to text files, and the pdf files are read from a folder structure like this:
D:/f/subfolder1/subfolder2/a.pdf
I would like to create the exact same folder structure under "D:/g/subfolder1/subfolder2/", but without the pdf file, since that is where the converted text file should go. So after the conversion function runs, the result is
D:/g/subfolder1/subfolder2/a.txt
I would also like to add an if statement to make sure that the same folder structure does not already exist under "D:/g/" before creating it.
Here is my current code. How can I create the same folder structure without the files?
Thank you!
import converter as c
import os
inputpath = 'D:/f/'
outputpath = 'D:/g/'
for root, dirs, files in os.walk(yourpath, topdown=False):
    for name in files:
        with open("D:/g/"+ ,mode="w") as newfile:
            newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))

For me the following works fine:
Iterate over the existing folders
Build the structure for the new folders based on the existing ones
Check whether the new folder structure already exists
If not, create the new folder without files
Code:
import os
inputpath = 'D:/f/'
outputpath = 'D:/g/'
for dirpath, dirnames, filenames in os.walk(inputpath):
    structure = os.path.join(outputpath, dirpath[len(inputpath):])
    if not os.path.isdir(structure):
        os.mkdir(structure)
    else:
        print("Folder already exists!")
Documentation:
os.walk
os.mkdir
os.path.isdir
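The steps above, plus the mapping from each pdf to its text file, can be sketched as follows. This is a sketch under stated assumptions rather than a definitive implementation: the function names mirror_tree and text_path_for are hypothetical, and os.makedirs with exist_ok=True stands in for the explicit isdir check.

```python
import os


def mirror_tree(input_path, output_path):
    """Recreate the folder structure of input_path under output_path, without files."""
    for dirpath, _dirnames, _filenames in os.walk(input_path):
        relative = os.path.relpath(dirpath, input_path)
        target = os.path.normpath(os.path.join(output_path, relative))
        # exist_ok=True makes the call a no-op for folders that already exist.
        os.makedirs(target, exist_ok=True)


def text_path_for(pdf_path, input_path, output_path):
    """Map a file under input_path to the matching .txt path under output_path."""
    relative = os.path.relpath(pdf_path, input_path)
    return os.path.join(output_path, os.path.splitext(relative)[0] + ".txt")
```

With these two helpers, the conversion loop can call mirror_tree once and then open text_path_for(...) in write mode for each pdf it converts.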

How about using shutil.copytree()?
import os
import shutil
def ig_f(dir, files):
    # ignore every regular file, so that only the directories are copied
    return [f for f in files if os.path.isfile(os.path.join(dir, f))]
shutil.copytree(inputpath, outputpath, ignore=ig_f)
The directory you want to create must not exist before calling this function. You can add a check for that.
Taken from shutil.copytree without files

A minor tweak to your code for skipping pdf files:
for root, dirs, files in os.walk('.', topdown=False):
    for name in files:
        if name.find(".pdf") >= 0: continue
        with open("D:/g/"+ ,mode="w") as newfile:
            newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))
