Multiple editing of CSV files - python

I've hit a small snag operating on CSV files in Python (3.5). Previously I worked with single files without problems, but now I have >100 files in one folder.
So, my goal is:
1. Parse all *.csv files in the directory.
2. From each file, delete the first 6 rows; the files consist of the following data:
"nu(Ep), 2.6.8"
"Date: 2/10/16, 11:18:21 AM"
19
Ep,nu
0.0952645,0.123776,
0.119036,0.157720,
...
0.992060,0.374300,
3. Save each file separately (for example adding "_edited"), so that only the numbers remain.
4. As an option: some data is subdivided into two parts for one material, for example Ag(0-1)_s.csv and Ag(1-4)_s.csv (after steps 1-3 they should look like Ag(*)_edited.csv). How can I merge these two files, appending the data from (1-4) to the end of (0-1) and saving it in a third file?
My code so far is the following:
import os, sys
import csv
import re
import glob
import fileinput

def get_all_files(directory, extension='.csv'):
    dir_list = os.listdir(directory)
    csv_files = []
    for i in dir_list:
        if i.endswith(extension):
            csv_files.append(os.path.realpath(i))
    return csv_files

csv_files = get_all_files('/Directory/Path/Here')

# Here is the problem with csv's: I don't know how to scan the files
# which are in the list "csv_files".
for n in csv_files:
    #print(n)
    lines = []  # empty, because I don't know how to write it properly per file
    input = open(n, 'r')
    reader = csv.reader(n)
    temp = []
    for i in range(5):
        next(reader)
    # a for loop here regarding rows?
    # for row in n: ???
    #     ???
    input.close()
    # newfilename = "".join(n.split(".csv")) + "edited.csv"
    # newfilename can be used within open() below:
    with open(n + '_edited.csv', 'w') as nf:
        writer = csv.writer(nf)
        writer.writerows(lines)
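For the part the comments above mark as unknown, one way to fill in the loop body is to read all rows and keep everything after the first six. A minimal sketch, assuming the goal is simply a row-for-row copy minus the header:

```python
import csv

def strip_header(in_path, out_path, skip=6):
    # Copy a CSV, dropping the first `skip` rows.
    with open(in_path, 'r', newline='') as infile:
        rows = list(csv.reader(infile))[skip:]
    with open(out_path, 'w', newline='') as outfile:
        csv.writer(outfile).writerows(rows)
```

Calling strip_header(n, n.replace('.csv', '_edited.csv')) inside the loop over csv_files would produce one edited copy per input file.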

This is the fastest way I can think of. If you have a solid-state drive, you could throw multiprocessing at this for more of a performance boost.
import glob
import os

for fpath in glob.glob('path/to/directory/*.csv'):
    fname = os.path.basename(fpath).rsplit(os.path.extsep, 1)[0]
    with open(fpath) as infile, \
         open(os.path.join('path/to/dir', fname + "_edited" + os.path.extsep + 'csv'), 'w') as outfile:
        for _ in range(6):
            infile.readline()
        for line in infile:
            outfile.write(line)
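The optional merge step from the question works the same way: open the two parts, write one after the other into a third file. A sketch (the filenames below just illustrate the Ag(0-1)/Ag(1-4) naming from the question):

```python
def merge_files(first_path, second_path, out_path):
    # Append the contents of second_path to the end of first_path,
    # saving the result as a third file.
    with open(out_path, 'w') as outfile:
        for path in (first_path, second_path):
            with open(path) as infile:
                outfile.write(infile.read())
```

For example, merge_files('Ag(0-1)_s_edited.csv', 'Ag(1-4)_s_edited.csv', 'Ag_merged.csv').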

Related

remove the unwanted columns in data

I have 500 txt files in a folder
data example:
1-1,0-7,10.1023,
1-2,7-8,/,
1-3,8-9,A,
1-4,9-10,:,
1-5,10-23,1020940830716,
I would like to delete the last "," in each line, turning it into:
1-1,0-7,10.1023
1-2,7-8,/
1-3,8-9,A
1-4,9-10,:
1-5,10-23,1020940830716
How do I do that with a for loop, deleting them from all 500 files?
Try using this code:
for fname in filenames:
    with open(fname, 'r') as f:
        string = f.read().replace(',\n', '\n')
    with open(fname, 'w') as w:
        w.write(string)
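The snippet above assumes a filenames list already exists. One way to build it, assuming the 500 files all end in .txt and sit in one folder, is with glob:

```python
import glob
import os

def txt_files_in(folder):
    # Collect all .txt files in `folder` (non-recursive).
    return glob.glob(os.path.join(folder, '*.txt'))
```

filenames = txt_files_in('path/to/folder') would then feed the loop above.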
I usually do something like this:
Change the folder_path variable.
Change the filename_pattern variable. This is just an extra in case you have specific file patterns in your folder that you want to consider; you can simply set this variable to an empty string if it is irrelevant.
Also, the * matches anything following the pattern, i.e. Book1, Book2, etc. Before running the code, print(files) to make sure you have all of the correct files.
import glob
import os
import pandas as pd

# read files in
folder_path = 'Documents'
filename_pattern = 'Book'
files = glob.glob(f'{folder_path}/{filename_pattern}*.txt')

df = pd.concat([pd.read_csv(f, header=None)
                  .assign(filename=os.path.basename(f))
                for f in files])

# write files out
for file, data in df.groupby('filename'):
    data.iloc[:, :-2].to_csv(f'{folder_path}/{file}',
                             index=False,
                             header=False)

Python: concatenating text files

Using Python, I'm seeking to iteratively combine two sets of txt files to create a third set of txt files.
I have a directory of txt files in two categories:
text_[number].txt (eg: text_0.txt, text_1.txt, text_2.txt....text_20.txt)
comments_[number].txt (eg: comments_0.txt, comments_1.txt, comments_2.txt...comments_20.txt).
I'd like to iteratively combine the text_[number] files with the matching comments_[number] files into a new file category feedback_[number].txt. The script would combine text_0.txt and comments_0.txt into feedback_0.txt, and continue through each pair in the directory. The number of text and comments files will always match, but the total number of text and comment files is variable depending on preceding scripts.
I can combine two pairs using the code below with a list of file pairs:
filenames = ['text_0.txt', 'comments_0.txt']
with open("feedback_0.txt", "w") as outfile:
for filename in filenames:
with open(filename) as infile:
contents = infile.read()
outfile.write(contents)
However, I'm uncertain how to structure iteration for the rest of the files. I'm also curious how to generate lists from the contents of the file directory. Any advice or assistance on moving forward is greatly appreciated.
It would be far simpler (and possibly faster) to just fork a cat process:
import subprocess

n = ...  # number of files
for i in range(n):
    with open(f'feedback_{i}.txt', 'w') as f:
        subprocess.run(['cat', f'text_{i}.txt', f'comments_{i}.txt'], stdout=f)

Or, if you already have lists of the file names:

for text, comment, feedback in zip(text_files, comment_files, feedback_files):
    with open(feedback, 'w') as f:
        subprocess.run(['cat', text, comment], stdout=f)
Unless these are all extremely small files, the cost of reading and writing the bytes will outweigh the cost of forking a new process for each pair.
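If forking cat is not an option (for example, on Windows), shutil.copyfileobj streams the bytes in chunks from pure Python without reading whole files into memory. A sketch of the same pairing:

```python
import shutil

def concat_pair(text_path, comments_path, feedback_path):
    # Stream both inputs into the output file in chunks.
    with open(feedback_path, 'wb') as out:
        for path in (text_path, comments_path):
            with open(path, 'rb') as src:
                shutil.copyfileobj(src, out)
```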
Maybe not the most elegant, but...
length = 10
txt = [f"text_{n}.txt" for n in range(length)]
com = [f"comments_{n}.txt" for n in range(length)]
feed = [f"feedback_{n}.txt" for n in range(length)]

for f, t, c in zip(feed, txt, com):
    with open(f, "w") as outfile:
        with open(t) as infile1:
            contents = infile1.read()
            outfile.write(contents)
        with open(c) as infile2:
            contents = infile2.read()
            outfile.write(contents)
There are many ways to achieve this, but I don't seem to see any solution that's both beginner-friendly and takes into account the structure of the files you described.
You can iterate through the files, and for every text_[num].txt, fetch the corresponding comments_[num].txt and write to feedback_[num].txt as shown below. There's no need to add any counters or make any other assumptions about the files that might not always be true:
import os

srcpath = 'path/to/files'
for f in os.listdir(srcpath):
    if f.startswith('text'):
        index = f[5:-4]  # extract the [num] part
        # Build the paths to the text, comments, and feedback files
        txt_path = os.path.join(srcpath, f)
        cmnt_path = os.path.join(srcpath, f'comments_{index}.txt')
        fb_path = os.path.join(srcpath, f'feedback_{index}.txt')
        # Write the output, reading in byte mode following chepner's advice
        with open(fb_path, 'wb') as outfile:
            outfile.write(open(txt_path, 'rb').read())
            outfile.write(open(cmnt_path, 'rb').read())
The simplest way would probably be to just iterate from 1 onwards, stopping at the first missing file. This works assuming that your files are numbered in increasing order and with no gaps (e.g. you have 1, 2, 3 and not 1, 3).
import os
from itertools import count

for i in count(1):
    t = f'text_{i}.txt'
    c = f'comments_{i}.txt'
    if not os.path.isfile(t) or not os.path.isfile(c):
        break
    with open(f'feedback_{i}.txt', 'wb') as outfile:
        outfile.write(open(t, 'rb').read())
        outfile.write(open(c, 'rb').read())
You can try this:
filenames = ['text_0.txt', 'comments_0.txt', 'text_1.txt', 'comments_1.txt',
             'text_2.txt', 'comments_2.txt', 'text_3.txt', 'comments_3.txt']

for i, j in enumerate(zip(filenames[::2], filenames[1::2])):
    with open(f'feedback_{i}.txt', 'a+') as file:
        for k in j:
            with open(k, 'r') as f:
                files = f.read()
                file.write(files)

I have taken a hard-coded list here. Instead, you can do:
import os
filenames = os.listdir('path/to/folder')
(you may need to sort and filter that list so the text/comments pairs line up).

getting a specific row of multiple csv's and writing to a new csv

Have had a good search but can't quite find what I'm looking for. I have a number of csv files printed by a CFD simulation. The goal of my python script was to:
get the final row of each csv and
add the rows to a new file with the filename added to the start of each row
Currently I have
if file.endswith(".csv"):
    with open(file, 'r') as f:
        tsrNum = file.translate(None, '.csv')
        print(tsrNum + ', ' + ', '.join(list(csv.reader(f))[-1]))
This prints the correct values to the terminal, but I have to manually copy and paste them into a new file.
Can somebody help with the last step? I'm not familiar enough with the syntax of Python; learning it properly is certainly on my to-do list once I finish this CFD project, as so far it's been fantastic when I've managed to implement it correctly. I tried using loops and csv.DictWriter, but with little success.
EDIT
I couldn't get the posted solution working. Here's the code a guy helped me make
import csv
import os
import glob

# get a list of the input files, all CSVs in the current folder
infiles = glob.glob("*.csv")

# open the output file
ofile = open('outputFile.csv', "w")

# column names
fieldnames = ['parameter', 'time', 'cp', 'cd']

# access it as a dictionary for easy access
writer = csv.DictWriter(ofile, fieldnames=fieldnames)

# output file header
writer.writeheader()

# iterate through the list of input files
for ifilename in infiles:
    # open the input file
    ifile = open(ifilename, "rb+")
    # access it as a dictionary for easy access
    reader = csv.DictReader(ifile)
    # get the rows in reverse order
    rows = list(reader)
    rows.reverse()
    # get the last row
    row = rows[0]
    # output the row to the output csv
    writer.writerow({'parameter': ifilename.translate(None, '.csv'),
                     'time': row['time'], 'cp': row['cp'], 'cd': row['cd']})
    # close the input file
    ifile.close()

# close the output file
ofile.close()
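Note that the EDIT code is Python 2 only: text-style "rb+" reading and str.translate(None, ...) both changed in Python 3. A Python 3 sketch of the same approach, assuming the same time/cp/cd column names:

```python
import csv
import glob
import os

def collect_last_rows(out_path):
    # Gather the final row of every CSV in the current folder into out_path.
    # out_path should not itself match *.csv, or it would be picked up too.
    with open(out_path, 'w', newline='') as ofile:
        writer = csv.DictWriter(ofile, fieldnames=['parameter', 'time', 'cp', 'cd'])
        writer.writeheader()
        for ifilename in glob.glob('*.csv'):
            with open(ifilename, newline='') as ifile:
                rows = list(csv.DictReader(ifile))
            if rows:
                row = rows[-1]  # last row directly, no need to reverse
                writer.writerow({'parameter': os.path.splitext(ifilename)[0],
                                 'time': row['time'], 'cp': row['cp'], 'cd': row['cd']})
```

os.path.splitext also avoids the translate(None, '.csv') pitfall, which strips every '.', 'c', 's', and 'v' character rather than the suffix.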
Split your problem into smaller pieces:
looping over the directory
getting the last line
writing to your new csv
I have tried to be very verbose, so you could do something like this:
import os

def get_last_line_of_this_file(filename):
    with open(filename) as f:
        for line in f:
            pass
    return line

def get_all_csv_filenames(directory):
    for filename in os.listdir(directory):
        if filename.endswith('.csv'):
            yield filename

def write_your_new_csv_file(new_filename):
    with open(new_filename, 'w') as writer:
        for filename in get_all_csv_filenames('now the path to your dir'):
            last_line = get_last_line_of_this_file(filename)
            writer.write(filename + ' ' + last_line)

if __name__ == '__main__':
    write_your_new_csv_file('your_created_filename.csv')
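A small variation on get_last_line_of_this_file: collections.deque with maxlen=1 performs the same scan but in C, keeping only one line in memory, and also handles empty files:

```python
from collections import deque

def get_last_line(filename):
    # Keep only the final line while iterating the file.
    with open(filename) as f:
        tail = deque(f, maxlen=1)
    return tail[0] if tail else ''
```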

update file in a subdirectory from a list- filename gotten from csv

I've got a python script that pulls a filename from a csv file, and updates that file by adding a value to a field within the file. My problem is the file I need to update is actually in a subdirectory, with a folder name completely unrelated to the file I need to update.
my .csv list is like this:
file1, fieldx, value
file2, fieldx, value
and the files are in folders like this:
abcd/file1
efgh/file2
How can I update my code to find the file within the folder? I'm really new to Python, and I know it involves either glob, glob2, or os.walk, but I'm not sure how to nest / loop since I'm pulling the filename value from the .csv.
Here's my code:
import csv

startfile = raw_input("Please enter the name of the csv file: ")
with open(startfile, 'r') as f:
    reader = csv.reader(f)
    changelist = list(reader)

for x in changelist:
    linnum = 0
    fname = x[0] + ".xml"
    fieldlookup = x[1]
    with open(fname) as f:
        for num, line in enumerate(f, 1):
            if fieldlookup in line:
                linnum = num
    f = open(fname, 'r')
    lines = f.readlines()
    if linnum > 0:
        lines[linnum-1] = "    <" + fieldlookup + ">" + str(x[2]) + "</" + fieldlookup + ">\n"
    f.close()
    f = open(fname, 'w')
    f.writelines(lines)
    f.close()
    print "success! " + str(x[0]) + "\n"
If there is only one level of subdirectories, you should be able to do it just with:
import glob
...
fname = x[0] + ".xml"
real_fname = glob.glob("./*/" + fname)
And use real_fname afterwards. If you need cross-platform compatibility, you could also use os.path.join instead of +.
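One detail worth noting: glob.glob returns a list, so the result needs unpacking before it can be passed to open(). A small sketch (find_in_subdirs is a hypothetical helper name):

```python
import glob
import os

def find_in_subdirs(root, fname):
    # Return the first path matching fname one directory level below root,
    # or None if there is no match.
    matches = glob.glob(os.path.join(root, '*', fname))
    return matches[0] if matches else None
```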

Saving Filenames with Condition

I'm trying to save the names of files that fulfill a certain condition.
I think the easiest way to do this would make a short Python program that imports and reads the files, checks if the condition is met, and (assuming it is met) then saves the names of the files.
I have data files with just two columns and four rows, something like this:
a: 5
b: 5
c: 6
de: 7
I want to save the names of the files (or part of the name of the files, if that's a simple fix, otherwise I can just sed the file afterwards) of the data files that have the 4th number ([3:1]) greater than 8. I tried importing the files with numpy, but it said it couldn't import the letters in the first column.
Another way I was considering was doing it from the command line, something along the lines of cat *.dat >> something.txt, but I couldn't figure out how to do that.
The code I've tried to write up to get this to work is:
import fileinput
import glob
import numpy as np
#Filter to find value > 8
#Globbing value datafiles
file_list = glob.glob("/path/to/*.dat")
#Creating output file containing
f = open('list.txt', 'w')
#Looping over files
for file in file_list:
#For each file in the directory, isolating the filename
filename = file.split('/')[-1]
#Opening the files, checking if value is greater than 8
a = np.loadtxt("file", delimiter=' ', usecols=1)
if a[3:0] > 8:
print >> f, filename
f.close()
When I do this, I get an error that says TypeError: 'int' object is not iterable, but I don't know what that's referring to.
I ended up using
import fileinput
import glob
import numpy as np
#Filter to find value > 8
#Globbing datafiles
file_list = glob.glob("/path/to/*.dat")
#Creating output file containing
f = open('list.txt', 'w')
#Looping over files
for file in file_list:
#For each file in the directory, isolating the filename
filename = file.split('/')[-1]
#Opening the files, checking if value is greater than 8
a = np.genfromtxt(file)
if a[3,1] > 8:
f.write(filename + "\n")
f.close()
It is hard to tell exactly what you want, but maybe something like this:
from glob import glob
from re import findall

fpattern = "/path/to/*.dat"

def test(fname):
    with open(fname) as f:
        try:
            return int(findall(r"\d+", f.read())[3]) > 8
        except IndexError:
            pass

matches = [fname for fname in glob(fpattern) if test(fname)]
print matches
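Both versions above pull numbers out of the whole file. Since the format shown in the question is `name: value`, a dependency-free check that keys off the colon on the fourth line is another option (a sketch; fourth_value_exceeds is a hypothetical name):

```python
def fourth_value_exceeds(filename, threshold=8):
    # Read the value after the colon on the 4th line and compare it.
    with open(filename) as f:
        lines = f.read().splitlines()
    return float(lines[3].split(':')[1]) > threshold
```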
