I have the following .txt files (tens of thousands of them) in multiple directories, e.g.:
BaseDirectory\04_April\2019-04-14\UniqeDirectoryName1 (username)\345308457384745637.txt
BaseDirectory\04_April\2019-04-14\UniqeDirectoryName2 (username)\657453456456546543.txt
BaseDirectory\04_April\2019-04-14\UniqeDirectoryName3 (username)\234545743564356774.txt
BaseDirectory\05_May\2019-05-14\UniqeDirectoryName1 (username)\266434564564563565.txt
BaseDirectory\05_May\2019-05-14\UniqeDirectoryName2 (username)\934573845739632048.txt
BaseDirectory\05_May\2019-05-14\UniqeDirectoryName3 (username)\634534534535654501.txt
In other words, each date folder contains multiple directories, each of which in turn contains text files.
import os
import re
import csv
for path, subdirs, files in os.walk("E:\\BaseDir\\"):
    for name in files:
        file_fullinfo = os.path.join(path, name)
        path, filename = os.path.split(file_fullinfo)
        NoExtension = os.path.splitext(file_fullinfo)[0]
        file_noext = str(NoExtension)
        file_splitinfo = re.split(r'\\', file_noext)
        file_month = file_splitinfo[2]
        file_date = file_splitinfo[3]
        file_folder = re.sub(r'\([^)]*\)', '', file_splitinfo[4])  # strip the "(username)" part
        file_name = file_splitinfo[5]
        file_category = file_folder
My script generates the following:
['E:', 'BaseDirectory', '04_April', '2019-04-09', 'UniqeDirectoryName', '345308457384745637.txt', 'UniqeDirectoryName']
So far so good. Writing this to a single generic CSV file is also straightforward, but I want to create a new CSV file whenever the date changes, like this:
E:\BaseDir\2019-04-09.csv
file_folder, file_name, file_category
'UniqeDirectoryName', '543968732948754398','UniqeDirectoryName'
'UniqeDirectoryName', '345308457384745637','UniqeDirectoryName'
'UniqeDirectoryName', '324089734983987439','UniqeDirectoryName'
E:\BaseDir\2019-05-14.csv
file_folder, file_name, file_category
'UniqeDirectoryName', '543968732948754398','UniqeDirectoryName'
'UniqeDirectoryName', '345308457384745637','UniqeDirectoryName'
'UniqeDirectoryName', '324089734983987439','UniqeDirectoryName'
How can I accomplish this? I can't quite wrap my head around it; the struggle of being a Python noob is real. :)
If you can live without the first line being a header row, it can be achieved quite simply:
output_file_path = 'D:/output_files/' + file_date + '.csv'
with open(file=output_file_path, mode='a') as csv_file:  # open a csv file to write to in append mode
    csv_file.write("my data\n")
If you absolutely must have the header, then you can test whether the file exists first and write the header row only if it doesn't:
import os.path

output_file_path = 'D:/output_files/' + file_date + '.csv'
if not os.path.exists(output_file_path):  # write the header row only if the file doesn't exist yet
    with open(file=output_file_path, mode='a') as csv_file:
        csv_file.write("my header row\n")
with open(file=output_file_path, mode='a') as csv_file:  # open a csv file to write to in append mode
    csv_file.write("my data\n")
There are a few CSV files in different folders and subfolders. I need to separate each CSV file into incoming and outgoing traffic:
If source == ac:37:43:9b:92:24 && Receiver address == 8c:15:c7:3a:d0:1a, then those rows need to be written to .out.csv files.
If Transmitter address == 8c:15:c7:3a:d0:1a && Destination == ac:37:43:9b:92:24, then those rows need to be written to .in.csv files.
The output files (the rows separated into incoming and outgoing) have to get the same names as the input files (e.g. if the input file is aaa.csv, then the output files will be aaa.in.csv and aaa.out.csv).
And the output files need to be written into the same folders and subfolders as the input files.
I tried the code below, but it's not working.
I am new to programming, so I'm not sure whether this code is correct or wrong. Any help is greatly appreciated. Thanks.
import csv
import os
import subprocess

startdir = '.'
outdir = '.'
suffix = '.csv'

def decode_to_file(cmd, in_file, new_suffix):
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    fileName = outdir + '/' + in_file[len(startdir):-len(suffix)] + new_suffix
    os.makedirs(os.path.dirname(fileName), exist_ok=True)
    csv_writer = csv.writer(open(fileName, 'w'))
    for line_bytes in proc.stdout:
        line_str = line_bytes.decode('utf-8')
        csv_writer.writerow(line_str.strip().split(','))

for root, dirs, files in os.walk(startdir):
    for name in files:
        if not name.endswith(suffix):
            continue
        in_file = os.path.join(root, name)
        decode_to_file(
            cmd=[if source == ac:37:43:9b:92:24 && Receiver address == 8c:15:c7:3a:d0:1a],
            in_file=in_file,
            new_suffix='.out.csv'
        )
        decode_to_file(
            cmd=[if Transmitter address == 8c:15:c7:3a:d0:1a && Destination == ac:37:43:9b:92:24],
            in_file=in_file,
            new_suffix='.in.csv'
        )
You could make use of Python's csv library to process the rows, and glob.glob to walk over the files; os.path.splitext() can help with changing the file extension. For example:
import csv
import glob
import os

for filename in glob.glob('**/*.csv', recursive=True):
    basename, extension = os.path.splitext(filename)
    print(f"Processing - {filename}")

    with open(filename, encoding='utf-8') as f_input, \
         open(basename + '.in.csv', 'w', newline='', encoding='utf-8') as f_in, \
         open(basename + '.out.csv', 'w', newline='', encoding='utf-8') as f_out:

        csv_input = csv.reader(f_input)
        csv_in = csv.writer(f_in)
        csv_out = csv.writer(f_out)

        for row in csv_input:
            if row[3] == 'ac:37:43:9b:92:24' and row[4] == '8c:15:c7:3a:d0:1a':
                csv_out.writerow(row)
            if row[5] == '8c:15:c7:3a:d0:1a' and row[6] == 'ac:37:43:9b:92:24':
                csv_in.writerow(row)
This assumes that your CSV files are in a standard format, e.g. aaa,bbb,ccc,ddd. csv.reader() will read each line of the file and convert it into a list of values, automatically split on the commas. So the first value in each row is row[0].
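For instance, a quick way to see how csv.reader splits a line (a self-contained snippet, separate from the script above):

import csv
import io

# csv.reader accepts any iterable of lines, so an in-memory buffer works for a demo
sample = io.StringIO("aaa,bbb,ccc,ddd\n")
row = next(csv.reader(sample))
print(row)     # ['aaa', 'bbb', 'ccc', 'ddd']
print(row[0])  # 'aaa'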
I'm using the Sniffer class in the csv module to determine what the delimiter is in a CSV file, and it works on single files, but if I add a loop and point it at a folder containing the same CSVs, it throws this error:
File "delimiter.py", line 17, in read_csv_delimit
reader = csv.reader(csvfile, dialect)
TypeError: "delimiter" must be a 1-character string
The script looks like this:
#!/usr/local/bin/python3

import csv
import os

def read_csv_delimit(file_dir, csv_file):
    # Initialise list
    file_csv = []
    # Open csv & check delimiter
    with open(file_dir + "/" + csv_file, newline='', encoding="ISO-8859-1") as csvfile:
        dialect = csv.Sniffer().sniff(csvfile.read(1024))
        csvfile.seek(0)
        reader = csv.reader(csvfile, dialect)
        for item in reader:
            file_csv.append(item[0])
    #del file_csv[0]
    return file_csv

def split_path(full_path):
    #path = path.rstrip(os.sep)
    head, tail = os.path.split(full_path)
    return (head, tail)

machine_dir = input("Drop the folder here: ")
# Get list of machine csv
machines = os.listdir(machine_dir)
for machine in machines:
    print(machine)
    #file_dir, csv_file = split_path(csv_file)
    machine_list = read_csv_delimit(machine_dir, machine)
    print(machine_list)
Given the trace, it seems that your script does indeed pick up non-CSV files. You can use the glob module to fine-tune the search pattern so that only the files you want are picked up, but even a simple extension lookup should suffice:
target = input("Drop the folder here: ")
machine_list = [read_csv_delimit(target, m) for m in os.listdir(target) if m[-4:] == ".csv"]
print(machine_list)
Checking the entered directory for validity, though, is highly recommended, even if it's done with the simplest os.path.isdir(target).
I'd also recommend using the os.path facilities to build the path in the read_csv_delimit() function, e.g.:
with open(os.path.join(file_dir, csv_file), newline='', encoding = "ISO-8859-1") as csvfile:
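Putting the filtering and the validation together, a minimal sketch might look like this (it assumes the read_csv_delimit() from the script above is in scope; the os.path.splitext comparison is a slightly more robust variant of the m[-4:] slice):

import os

target = input("Drop the folder here: ")
if not os.path.isdir(target):
    # fail early on a bad path instead of crashing inside the loop
    raise SystemExit("Not a directory: " + target)

machine_list = [read_csv_delimit(target, m)
                for m in os.listdir(target)
                if os.path.splitext(m)[1].lower() == ".csv"]
print(machine_list)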
I have multiple directories, each of which contains any number of .xls files.
I'd like to take the files in any given directory and combine them into one .xls file, using the file names as the tab names.
For example, if there are the files NAME.xls, AGE.xls, LOCATION.xls, I'd like to combine them into a new file with the data from NAME.xls on a tab called NAME, the data from AGE.xls on a tab called AGE, and so on.
Each source .xls file only has one column of data with no headers.
This is what I have so far, and, well, it's not working:
Any help would be greatly appreciated (I'm fairly new to Python and I've never had to do anything like this before).
import glob
import os
from os import listdir
from os.path import isfile, join

import pandas as pd
import xlwt

wkbk = xlwt.Workbook()
xlsfiles = glob.glob(os.path.join(path, "*.xls"))
onlyfiles = [f for f in listdir(path) if isfile(join(path, f))]
tabNames = []

for OF in onlyfiles:
    if str(OF)[-4:] == ".xls":
        sheetName = str(OF)[:-4]
        tabNames.append(sheetName)
    else:
        pass

for TN in tabNames:
    outsheet = wkbk.add_sheet(str(TN))
    data = pd.read_excel(path + "\\" + TN + ".xls", sheet_name="data")
    data.to_excel(path + "\\" + "Combined" + ".xls", sheet_name=str(TN))
Here is a small helper function - it supports both .xls and .xlsx files:
import pandas as pd
try:
    from pathlib import Path
except ImportError:  # Python 2
    from pathlib2 import Path

def merge_excel_files(dir_name, out_filename='result.xlsx', **kwargs):
    p = Path(dir_name)
    with pd.ExcelWriter(out_filename) as xls:
        _ = [pd.read_excel(f, header=None, **kwargs)
               .to_excel(xls, sheet_name=f.stem, index=False, header=None)
             for f in p.glob('*.xls*')]
Usage:
merge_excel_files(r'D:\temp\xls_directory', 'd:/temp/out.xls')
merge_excel_files(r'D:\temp\xlsx_directory', 'd:/temp/out.xlsx')
Can you try:
import glob

import pandas as pd

path = 'YourPath\\ToYour\\Files\\'  # Note the \\ at the end

# Create a list with only .xls files
list_xls = glob.glob1(path, "*.xls")

# Create a writer for pandas
writer = pd.ExcelWriter(path + "Combined.xls", engine='xlwt')

# Loop over all the files
for xls_file in list_xls:
    # Read the xls file and the sheet named data
    df_data = pd.read_excel(io=path + xls_file, sheet_name="data")
    # Are the sheets containing the data in all your xls files named "data"?
    # Write the data into a sheet named after the file
    df_data.to_excel(writer, sheet_name=xls_file[:-4])

# Save and close your Combined.xls
writer.save()
writer.close()
Let me know if it works for you; I've never tried engine='xlwt', as I don't work with .xls files but .xlsx.
I have a CSV with two columns, Directory and File Name. Each row in the CSV shows which directory each file belongs in, like so:
Directory, File Name
DIR18, IMG_42.png
DIR12, IMG_16.png
DIR4, IMG_65.png
So far I have written code that grabs each directory and file name from the CSV, and then loops over all the files at the source, like so:
movePng.py
import shutil
import os
import csv
from collections import defaultdict

columns = defaultdict(list)  # each value in each column is appended to a list

with open('/User/Results.csv') as f:
    reader = csv.DictReader(f)
    for row in reader:
        for (k, v) in row.items():
            columns[k].append(v)

source = '/User/PNGItems'
files = os.listdir(source)

for f in files:
    pngName = f[:-4]
    for filename in columns['File Name']:
        fileName = filename[:-4]
        if pngName == fileName:
            # GET THIS POSITION IN columns['File Name'] for columns['Directory']
            shutil.move(f, source + '/' + DIRECTORY)
How do I get the index into columns['File Name'] and grab the corresponding directory out of columns['Directory']?
You should read the assignments into a dictionary and then query that:
folder_assignment_file = "folders.csv"
file_folder = dict()
with open(folder_assignment_file, "r") as fh:
    reader = csv.reader(fh)
    for folder, filename in reader:
        file_folder[filename] = folder
And then get the target folder like so: DIRECTORY = file_folder[fileName].
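Dropped into the question's loop, that lookup might look like this (a sketch reusing the question's variable names; it assumes the CSV values carry no stray whitespace):

import os
import shutil

for f in files:
    if f in file_folder:  # the dict built above maps file name -> folder
        directory = file_folder[f]
        shutil.move(os.path.join(source, f), os.path.join(source, directory, f))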
Some other hints:
filename and fileName are not good variable names; this will only lead to hard-to-find bugs, because Python is case sensitive.
Use os.path.splitext to split the extension off the file name (see the sketch after these hints).
If not all your files are in one folder, the glob module and os.walk might come in handy.
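A minimal sketch of the splitext hint (self-contained, with a hypothetical file name):

import os.path

# os.path.splitext splits the extension off cleanly, unlike slicing with [:-4],
# which silently breaks for extensions that aren't exactly three characters long
base, ext = os.path.splitext("IMG_42.png")
print(base)  # IMG_42
print(ext)   # .png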
Edit:
Creating the dict can be made even nicer like so:
with open(folder_assignment_file, "r") as fh:
    reader = csv.reader(fh)
    file_folders = {filename: folder for folder, filename in reader}
To solve this I used @Peter Wood's suggestion and it worked beautifully. I also had to modify the shutil.move call.
Here is the code below:
for f in files:
    pngName = f[:-4]
    for filename, directory in zip(columns['File Name'], columns['Directory']):
        fileName = filename[:-4]
        if pngName == fileName:
            directoryName = directory[1:]
            shutil.move(os.path.join(source, f), source + '/' + directoryName)
I have a text file that doesn't have a standard delimiter. I need to be able to check whether the current line is equal to a certain phrase and, if it is, the code should use a certain delimiter until another phrase is found. The delimiters used are ',', '-', ':' and '='.
Please help me out. :)
This is what my code looks like at the moment:
import csv
import glob
import os

directory = raw_input("INPUT Folder for Log Dump Files:")
output = raw_input("OUTPUT Folder for .csv files:")
txt_files = os.path.join(directory, '*.txt')

for txt_file in glob.glob(txt_files):
    with open(txt_file, "rb") as input_file:
        in_txt = csv.reader(input_file, delimiter=':')
        filename = os.path.splitext(os.path.basename(txt_file))[0] + '.csv'
        with open(os.path.join(output, filename), 'wb') as output_file:
            out_csv = csv.writer(output_file)
            out_csv.writerows(in_txt)
I cannot speak to the time efficiency of this method, but it might just get what you want done. The basic idea is to create a list to contain the lines of each text file, and then output the list to your new csv file. You save a 'delimiter' variable and then change it by checking each line as you go through the text files.
For example:
I created two text files on my Desktop. They read as follows:
delimiter_test_1.txt
test=delimiter=here
does-it-work
I'm:Not:Sure
delimiter_test_2.txt
This:File:Uses:Colons
Pretty:Much:The:Whole:Time
does-it-work
If-Written-Correctly-yes
I then ran this script on them:
import csv
import glob
import os

directory = raw_input("INPUT Folder for Log Dump Files:")
output = raw_input("OUTPUT Folder for .csv files:")
txt_files = os.path.join(directory, '*.txt')

delimiter = ':'
for txt_file in glob.glob(txt_files):
    SavingList = []
    with open(txt_file, 'r') as text:
        for line in text:
            if line == 'test=delimiter=here\n':
                delimiter = '='
            elif line == 'does-it-work\n':
                delimiter = '-'
            elif line == "I'm:Not:Sure":
                delimiter = ':'
            SavingList.append(line.split(delimiter))

    with open('%s.csv' % os.path.join(output, txt_file.split('.')[0]), 'wb') as output_file:
        writer = csv.writer(output_file)
        for m in xrange(len(SavingList)):
            writer.writerow(SavingList[m])
This produced two CSV files with the text split on the desired delimiters. Depending on how many different lines you have for changing the delimiter, you could set up a dictionary of those lines, so that your check becomes, for example:
if line in my_dictionary.keys():
    delimiter = my_dictionary[line]
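Here is a Python 3 sketch of that dictionary-based variant (the trigger lines mirror the sample files above; the file names and mapping are otherwise assumptions):

import csv

# Hypothetical mapping from trigger lines to the delimiter to use from then on
delimiter_triggers = {
    'test=delimiter=here\n': '=',
    'does-it-work\n': '-',
    "I'm:Not:Sure": ':',  # the last line of a file has no trailing newline
}

delimiter = ':'  # default until a trigger line is seen
rows = []
with open('delimiter_test_1.txt', 'r') as text:
    for line in text:
        if line in delimiter_triggers:
            delimiter = delimiter_triggers[line]
        rows.append(line.rstrip('\n').split(delimiter))

with open('delimiter_test_1.csv', 'w', newline='') as output_file:
    csv.writer(output_file).writerows(rows)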