Writing a user specified folder structure to csv in Python

I am trying to write a python script that will take in a user specified directory and then write the contents of that directory to a csv file. The challenge I am facing is to write the folder structure to the csv in a certain format. The format I need is shown below:
I've tried using os.walk(dir) to list the directories and files but I'm having trouble writing to the csv in the above format.
I've also found some code that creates a nested dictionary out of a given folder structure but I'm finding it very difficult to navigate through this nested structure and write the rows in the way that I need.
If anyone has an easier approach to accomplishing this task it would be much appreciated.

OK, here is code that does the job:
import os
import csv

data = []
for root, dirs, files in os.walk('.'):
    for f in files:
        # one row per file: the path components followed by the file name
        row = root.split('/') + [f]
        data.append(row)

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for row in data:
        writer.writerow(row)
If you're on Windows, replace split('/') with split('\\').
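A slightly more portable variant of the same idea splits on os.sep instead of a hard-coded separator, so the same script works on Windows and Linux (a minimal sketch; the exact column layout may still need tweaking for the format in the question):
import csv
import os

with open('output.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    for root, dirs, files in os.walk('.'):
        for f in files:
            # split the path on the OS-specific separator and append the file name
            writer.writerow(root.split(os.sep) + [f])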

Related

Edit multiple text files, and save as new files

My first post on StackOverflow, so please be nice. In other words, a super beginner to Python.
So I want to read multiple files from a folder, divide the text and save the output as a new file. I currently have figured out this part of the code, but it only works on one file at a time. I have tried googling but can't figure out a way to use this code on multiple text files in a folder and save it as "output" + a number, for each file in the folder. Is this something that's doable?
with open("file_path") as fReader:
corpus = fReader.read()
loc = corpus.find("\n\n")
print(corpus[:loc], file=open("output.txt","a"))
Possibly work with a list, like:
from pathlib import Path

source_dir = Path("./")  # path to the directory holding the text files
files = [x for x in source_dir.iterdir() if x.is_file()]

for i in range(len(files)):
    file = Path(files[i])
    outfile = "output_" + str(i) + file.suffix
    with open(file) as fReader, open(outfile, "w") as fOut:
        corpus = fReader.read()
        loc = corpus.find("\n\n")
        # keep only the text before the first blank line
        fOut.write(corpus[:loc])
Welcome to the site. Yes, what you are asking is completely doable and you are on the right track. You will need to do a little research/practice with the os module, which is highly useful when working with files. The two functions you will want to look into are:
os.path.join()
os.listdir()
I would suggest you put two folders in the same directory as your Python file, one called data and the other called output to catch the results. Start by seeing if you can get the code to list all the files in your data directory, then keep building on that loop. Something like this should list all the files:
# folder file lister/test writer
import os

source_folder_name = 'data'    # the folder to be read; it is in the SAME directory as this file
output_folder_name = 'output'  # will be used later...

files = os.listdir(source_folder_name)

# get this working first
for f in files:
    print(f)

# make output file names and just write a 1-liner into each file...
for f in files:
    output_filename = f.split('.')[0]  # the part before the period
    output_filename += '_output.csv'
    output_path = os.path.join(output_folder_name, output_filename)
    with open(output_path, 'w') as writer:
        writer.write('some data')
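One small caveat: open() raises FileNotFoundError if the output folder does not exist yet, so it can help to create it before the loop (a one-line sketch using the folder name from above):
import os

os.makedirs('output', exist_ok=True)  # create the output folder if it is missing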

can't zip files from Jupyter Notebook

I am trying to zip files. I used the example from https://thispointer.com/python-how-to-create-a-zip-archive-from-multiple-files-or-directory/
from zipfile import ZipFile

with ZipFile('sample2.zip', 'w') as zipObj2:
    # Add multiple files to the zip
    zipObj2.write('sample_file.csv')
sample2.zip is created, but it is empty. Of course the csv file exists and is not empty.
I run this code from Jupyter Notebook
edit: I'm using relative paths -
input_dir = "../data/example/"
with zipfile.ZipFile(os.path.join(input_dir, 'f.zip'), 'a') as zipObj2:
    zipObj2.write(os.path.join(input_dir, 'f.tif'))
Did you try closing the zip file so it gets saved?
from zipfile import ZipFile

with ZipFile('sample2.zip', 'w') as zipObj2:
    zipObj2.write('sample_file.csv')
zipObj2.close()  # note: the with block already closes the archive when it exits
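A quick way to check whether anything actually ended up inside the archive (a minimal sketch, assuming the notebook's working directory is the folder that holds sample2.zip):
from zipfile import ZipFile

# print the entries stored inside the archive
with ZipFile('sample2.zip') as archive:
    print(archive.namelist())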
I'm a little confused by your question, but if I'm correct it sounds like you're trying to place multiple CSV files within a single zipped file? If so, this is what you're looking for:
import os
import pandas as pd

# collect the csv file names from the directory you wish to combine
files = [f for f in os.listdir("./your_directory") if f.endswith('.csv')]

# initialize an empty DataFrame
all_data = pd.DataFrame()

# iterate through the files and concatenate each one onto all_data
for file in files:
    df = pd.read_csv(os.path.join('./your_directory', file))
    all_data = pd.concat([all_data, df])
Then inspect the new DataFrame (all_data) to verify that the contents were transferred.
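If the goal really is to put several csv files into one zip archive rather than combining them into one DataFrame, a minimal sketch with zipfile might look like this (the folder ./your_directory mirrors the example above, and all_csvs.zip is just a placeholder name):
import os
from zipfile import ZipFile

# gather the csv files and add each one to a single archive
csv_files = [f for f in os.listdir("./your_directory") if f.endswith('.csv')]
with ZipFile('all_csvs.zip', 'w') as archive:
    for name in csv_files:
        # arcname keeps only the file name inside the archive
        archive.write(os.path.join('./your_directory', name), arcname=name)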

I have multiple txt files in 1 folder and I need to insert the txt data into a MySQL table using Python

I have multiple txt files in a folder. I need to insert the data from the txt files into a MySQL table.
I also need to sort the files by modified date before inserting the data into the SQL table named TAR.
Below is the content of one of the txt files. I also need to remove the first character in every line.
SSerial1234
CCustomer
IDivision
Nat22
nAembly
rA0
PFVT
fchassis1-card-linec
RUnk
TP
Oeka
[06/22/2020 10:11:50
]06/22/2020 10:27:22
My code currently just reads all the files in the folder and prints their contents. I'm not sure how to sort the files before reading them one by one.
Is there also a way to read only specific files (JPE*.log)?
import os

for path, dirs, files in os.walk(r"C:\TAR\TARS_Source"):
    for f in files:
        fileName = os.path.join(path, f)
        with open(fileName, "r") as myFile:
            print(myFile.read())
Use the glob.glob method to get all the files matching a wildcard pattern like the following...
import glob

files = glob.glob('./JPE*.log')
And you can use the following to sort the files:
sorted_files = sorted(files)
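Since the question asks for sorting by modified date rather than by name, a small tweak (a sketch, assuming the paths returned by glob still exist on disk) is to sort on each file's modification time:
import glob
import os

files = glob.glob('./JPE*.log')
# sort by last-modified time, oldest first
sorted_files = sorted(files, key=os.path.getmtime)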

Merge CSV files in ADLS2 that are prepared through DataBricks

While running Databricks code that prepares CSV files and loads them into ADLS2, the output is split into many CSV files as it is loaded into ADLS2.
Is there a way to merge these CSV files in ADLS2 through pyspark?
Thanks
Is there a way to merge these CSV files in ADLS2 through pyspark?
As far as I know, a Spark DataFrame does write the files out separately. Theoretically, you could use the spark.read.csv method, which accepts a list of path strings as a parameter.
>>> df = spark.read.csv('path')
Then use the df.toPandas().to_csv() method to write everything out through a pandas DataFrame. You could refer to some clues from this case: Azure Databricks: How to read part files and save it as one file to blob?.
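A minimal sketch of that approach (assuming the part files sit under a mounted folder such as /dbfs/mnt/test/ and the data is small enough to collect to the driver):
from pyspark.sql import SparkSession

# in a Databricks notebook `spark` already exists; this line is only needed elsewhere
spark = SparkSession.builder.getOrCreate()

# Spark reads the whole folder of part files through the DBFS path
df = spark.read.csv('/mnt/test/', header=True)

# collect to the driver and write a single csv through the local /dbfs mount
df.toPandas().to_csv('/dbfs/mnt/test/final.csv', index=False)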
However, I'm afraid this process may not cope with such high memory consumption. So I'd suggest just using the os package to do the merge job directly. I tested the two snippets of code below for your reference.
1st:
import os

path = '/dbfs/mnt/test/'
file_suffix = '.csv'

# list the folder and keep only the csv files
files = os.listdir(path)
filtered_files = [file for file in files if file.endswith(file_suffix)]
print(filtered_files)

with open(path + 'final.csv', 'w') as final_file:
    for file in filtered_files:
        with open(path + file) as f:
            lines = f.readlines()
            # drop each file's header line before appending
            final_file.writelines(lines[1:])
2nd:
import os

path = '/dbfs/mnt/test/'
file_suffix = '.csv'

# walk the folder tree so csv files in sub-folders are picked up as well
filtered_files = [os.path.join(root, name) for root, dirs, files in os.walk(top=path, topdown=False) for name in files if name.endswith(file_suffix)]
print(filtered_files)

with open(path + 'final2.csv', 'w') as final_file:
    for file in filtered_files:
        with open(file) as f:
            lines = f.readlines()
            # drop each file's header line before appending
            final_file.writelines(lines[1:])
The second snippet also handles a nested folder hierarchy.
In addition, here is a way to use an ADF copy activity to transfer multiple csv files into one file in ADLS gen2.
Please refer to this doc and configure the folder path in the ADLS gen2 source dataset, then set the copyBehavior property to MergeFiles. (Besides, you could use a wildcard file name like *.csv so files you don't want to touch in that folder are left out.)
Merges all files from the source folder to one file. If the file name
is specified, the merged file name is the specified name. Otherwise,
it's an autogenerated file name.

Executing the script one time in each folder

I need to make a script that executes a script one time in each folder of a directory.
Script in question:
f = open('OrderEXAMPLE.txt', 'r')
data = f.readlines()
mystr = ",".join([line.strip() for line in data])
with open('CSV.csv', 'w') as f2:
    f2.write(mystr)
This script changes a list of customer data into csv form.
Each order form has its own folder, so my initial thought was to put the same script into each folder. From there, write another script that executes each script simultaneously.
Folder structure is like so:
Order_forms
--Order_123
-----Order_form
--Order_124
-----Order_form
Amateur at python, so advice is needed and appreciated.
Just walk the directory structure with one script. This will write a separate CSV for each file with the name <original_filename>_CSV.csv. Without more clarity on the desired output or what the data looks like, I can't help much more, but you should be able to tweak this for whatever you need.
import os

parent_folder = 'Order_forms'

for root, dirs, files in os.walk(parent_folder):
    for f in files:
        with open(os.path.join(root, f), 'r') as f1:
            data = f1.readlines()
            mystr = ",".join([line.strip() for line in data])
            # write the csv next to the original file, inside the same order folder
            with open(os.path.join(root, '{}_CSV.csv'.format(f)), 'w') as f2:
                f2.write(mystr)
