Saving Filenames with Condition - python

I'm trying to save the names of files that fulfill a certain condition.
I think the easiest way to do this would be a short Python program that imports and reads the files, checks if the condition is met, and, if it is, saves the names of the files.
I have data files with just two columns and four rows, something like this:
a: 5
b: 5
c: 6
de: 7
I want to save the names (or part of the names, if that's a simple fix, otherwise I can just sed the file afterwards) of the data files whose 4th number (index [3, 1]) is greater than 8. I tried importing the files with numpy, but it complained that it couldn't parse the letters in the first column.
Another way I was considering was doing it from the command line, something along the lines of cat *.dat >> something.txt, but I couldn't figure out how to do that.
The code I've tried to write up to get this to work is:
import fileinput
import glob
import numpy as np

# Filter to find value > 8
# Globbing value datafiles
file_list = glob.glob("/path/to/*.dat")

# Creating output file
f = open('list.txt', 'w')

# Looping over files
for file in file_list:
    # For each file in the directory, isolating the filename
    filename = file.split('/')[-1]
    # Opening the files, checking if value is greater than 8
    a = np.loadtxt("file", delimiter=' ', usecols=1)
    if a[3:0] > 8:
        print >> f, filename
f.close()
When I do this, I get an error that says TypeError: 'int' object is not iterable, but I don't know what that's referring to.
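The likely culprit is usecols: in numpy versions before roughly 1.11, loadtxt required a sequence there, so a bare usecols=1 raises TypeError: 'int' object is not iterable. A hedged one-line sketch of that fix, which also passes the loop variable rather than the literal string "file" (note, separately, that a[3:0] is an empty slice; with a 1-D result the 4th value is a[3]):
a = np.loadtxt(file, delimiter=' ', usecols=(1,))  # tuple, not a bare int; file, not "file"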

I ended up using
import fileinput
import glob
import numpy as np

# Filter to find value > 8
# Globbing datafiles
file_list = glob.glob("/path/to/*.dat")

# Creating output file
f = open('list.txt', 'w')

# Looping over files
for file in file_list:
    # For each file in the directory, isolating the filename
    filename = file.split('/')[-1]
    # Opening the files, checking if value is greater than 8
    a = np.genfromtxt(file)
    if a[3,1] > 8:
        f.write(filename + "\n")
f.close()
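For reference, a minimal sketch of the same check without numpy, assuming the two-column "key: value" layout shown above:

import glob
import os

with open('list.txt', 'w') as out:
    for path in glob.glob("/path/to/*.dat"):
        with open(path) as fh:
            lines = fh.read().splitlines()
        # 4th row, 2nd column: "de: 7" -> 7.0
        value = float(lines[3].split()[1])
        if value > 8:
            out.write(os.path.basename(path) + "\n")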

It is hard to tell exactly what you want, but maybe something like this:
from glob import glob
from re import findall

fpattern = "/path/to/*.dat"

def test(fname):
    with open(fname) as f:
        try:
            return int(findall(r"\d+", f.read())[3]) > 8
        except IndexError:
            pass

matches = [fname for fname in glob(fpattern) if test(fname)]
print(matches)

Related

Write new file with digit before extension

I have several files in a directory with the following names
example1.txt
example2.txt
...
example10.txt
and a bunch of other files.
I'm trying to write a script that gets all the files with a name like <name><digit>.txt, finds the one with the highest digit (in this case example10.txt), and then writes a new file with the digit incremented by 1, i.e. example11.txt.
Right now I'm stuck at the part of selecting the .txt files and getting the last one.
Here is the code
import glob
from natsort import natsorted
files = natsorted(glob.glob('*[0-9].txt'))
last_file = files[-1]
print(files)
print(last_file)
You can use a regular expression to split the file name into its text and number parts, increment the number, and join everything back together to get your new file name:
import re
import glob
from natsort import natsorted
files = natsorted(glob.glob('*[0-9].txt'))
last_file = files[-1]
base_name, digits = re.match(r'([a-zA-Z]+)([0-9]+)\.txt', last_file).groups()
next_number = int(digits) + 1
next_file_name = f'{base_name}{next_number}.txt'
print(files)
print(last_file)
print(next_file_name)
Note that the regex assumes the base name of the file contains only alphabetic characters, with no spaces, underscores, etc. The regex can be extended if needed.
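For instance, a sketch of one such extension that accepts any base name by matching non-greedily up to the trailing digits (the file name here is hypothetical):

import re

# Non-greedy (.+?) lets the base name contain spaces, underscores, etc.
base_name, digits = re.match(r'(.+?)([0-9]+)\.txt', 'my_example 10.txt').groups()
print(f'{base_name}{int(digits) + 1}.txt')  # my_example 11.txt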
You can use this script; it should work well for your purpose, I think.
import os

def get_last_file():
    files = os.listdir('./files')
    for index, file in enumerate(files):
        filename = str(file)[0:str(file).find('.')]
        digit = int(''.join([char for char in filename if char.isdigit()]))
        files[index] = digit
    files.sort()
    return files[-1]

def add_file(file_name, extension):
    last_digit = get_last_file() + 1
    with open('./files/' + file_name + str(last_digit) + '.' + extension, 'w') as f:
        f.write('0')

# call this to create a new incremental file.
add_file('example', 'txt')
Here's a simple solution.
files = ["example1.txt", "example2.txt", "example3.txt", "example10.txt"]
highestFileNumber = max(int(file[7:-4]) for file in files)
fileToBeCreated = f"example{highestFileNumber+1}.txt"
print(fileToBeCreated)
output:
example11.txt
example and .txt are constants, so there's no sense in looking for patterns; just trim off the example and .txt.
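A slightly more readable sketch of the same trimming, assuming Python 3.9+ for str.removeprefix/str.removesuffix:

files = ["example1.txt", "example2.txt", "example3.txt", "example10.txt"]
# Strip the fixed prefix and suffix instead of hard-coding slice offsets.
highest = max(int(f.removeprefix("example").removesuffix(".txt")) for f in files)
print(f"example{highest + 1}.txt")  # example11.txt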

remove the unwanted columns in data

I have 500 txt files in a folder
data example:
1-1,0-7,10.1023,
1-2,7-8,/,
1-3,8-9,A,
1-4,9-10,:,
1-5,10-23,1020940830716,
I would like to delete the last "," in each line, to get:
1-1,0-7,10.1023
1-2,7-8,/
1-3,8-9,A
1-4,9-10,:
1-5,10-23,1020940830716
How do I do that with a for loop, to delete them from all 500 files?
Try using this code (filenames here is assumed to be the list of your 500 file paths, e.g. from glob.glob('*.txt')):
for fname in filenames:
    with open(fname, 'r') as f:
        string = f.read().replace(',\n', '\n')
    with open(fname, 'w') as w:
        w.write(string)
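A line-by-line variant of the same idea, hedged for the case where the last line has no trailing newline (same filenames assumption):

for fname in filenames:
    with open(fname) as f:
        # Drop the newline, strip any trailing comma, then re-add the newline.
        lines = [line.rstrip('\n').rstrip(',') + '\n' for line in f]
    with open(fname, 'w') as w:
        w.writelines(lines)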
I usually do something like this:
Change the folder_path variable.
Change the filename_pattern variable. This is just extra, in case you have specific file patterns in your folder that you want to consider; you can simply set this variable to an empty string if it's irrelevant.
Also, the * matches anything that fits the pattern, i.e. Book1, Book2, etc. Before running the code, print(files) to make sure you have all of the correct files.
import glob
import os
import pandas as pd

# read files in
folder_path = 'Documents'
filename_pattern = 'Book'
files = glob.glob(f'{folder_path}/{filename_pattern}*.txt')
df = (pd.concat([pd.read_csv(f, header=None)
                   .assign(filename=os.path.basename(f))
                 for f in files]))

# write files out, dropping the trailing empty column (from the
# trailing comma) and the helper filename column
for file, data in df.groupby('filename'):
    data.iloc[:, :-2].to_csv(f'{folder_path}/{file}',
                             index=False,
                             header=False)

combine multiple text files into one text file?

I'm trying to combine multiple files into one, where each of them contains one column, and I need to get one file with two columns, then plot the resulting file as (x, y), as follows:
x   y   result
1   4   1 4
2   5   2 5
3   6   3 6
and run the code for n text files.
How can I do this?
A simple solution, assuming that all files have the same number of elements, in float format:
import numpy as np

filename_list = ['file0.txt', 'file1.txt']  # and so on

columns = []
for filename in filename_list:
    with open(filename) as f:
        x = np.array([float(row) for row in f.readlines()])
    columns.append(x)

columns = np.vstack(columns).T
np.savetxt('filename_out.txt', columns)
See also the savetxt documentation to customize the output.
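For example, the fmt and delimiter arguments control the number format and the column separator:

np.savetxt('filename_out.txt', columns, fmt='%.6f', delimiter=' ')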
EDIT:
If you have 100 files in a certain directory (let's call it files_dir), you can build the list with the listdir method in the os library, but be careful, since listdir returns both directories and files:
import os
filename_list = [os.path.join(files_dir, f) for f in os.listdir(files_dir)
                 if os.path.isfile(os.path.join(files_dir, f))]
Here's a quick-and-dirty solution. I assumed that all files have the exact same number of rows. The function write_files takes an input, files, which is a list of file paths (strings).
def write_files(files):
    opened_files = []
    for f in files:
        opened_files.append(open(f, "r"))

    output_file = open("output.txt", "w")
    num_lines = sum(1 for line in opened_files[0])
    opened_files[0].seek(0, 0)

    for i in range(num_lines):
        line = [of.readline().rstrip() for of in opened_files]
        line = " ".join(line)
        line += "\n"
        output_file.write(line)

    for of in opened_files:
        of.close()
    output_file.close()
write_files(["1.txt", "2.txt"])

Multiple editing of CSV files

I have a small problem operating on CSV files in Python (3.5). Previously I was working with single files and there was no problem, but now I have >100 files in one folder.
So, my goal is:
Parse all *.csv files in the directory.
From each file, delete the first 6 rows; the files consist of the following data:
"nu(Ep), 2.6.8"
"Date: 2/10/16, 11:18:21 AM"
19
Ep,nu
0.0952645,0.123776,
0.119036,0.157720,
...
0.992060,0.374300,
Save each file separately (for example, adding "_edited" to the name), so that only the numbers are saved.
As an option: the data is subdivided into two parts for one material, for example Ag(0-1)_s.csv and Ag(1-4)_s.csv (after steps 1-3 they should be like Ag(*)_edited.csv). How can I merge these two files, appending the data from (1-4) to the end of (0-1) and saving the result in a third file?
My code so far is the following:
import os, sys
import csv
import re
import glob
import fileinput

def get_all_files(directory, extension='.csv'):
    dir_list = os.listdir(directory)
    csv_files = []
    for i in dir_list:
        if i.endswith(extension):
            csv_files.append(os.path.realpath(i))
    return csv_files

csv_files = get_all_files('/Directory/Path/Here')

# Here is the problem with csv's: I don't know how to scan the files
# which are in the list "csv_files".
for n in csv_files:
    #print(n)
    lines = []  # empty, because I don't know how to write it properly per each file
    input = open(n, 'r')
    reader = csv.reader(n)
    temp = []
    for i in range(5):
        next(reader)
    # a for loop for here regarding rows?
    # for row in n: ???
    #     ???
    input.close()
    #newfilename = "".join(n.split(".csv")) + "edited.csv"
    #newfilename can be used within open() below:
    with open(n + '_edited.csv', 'w') as nf:
        writer = csv.writer(nf)
        writer.writerows(lines)
This is the fastest way I can think of. If you have a solid-state drive, you could throw multiprocessing at this for more of a performance boost
import glob
import os

for fpath in glob.glob('path/to/directory/*.csv'):
    fname = os.path.basename(fpath).rsplit(os.path.extsep, 1)[0]
    with open(fpath) as infile, \
         open(os.path.join('path/to/directory', fname + "_edited" + os.path.extsep + 'csv'), 'w') as outfile:
        for _ in range(6):
            infile.readline()
        for line in infile:
            outfile.write(line)
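And a minimal sketch of the multiprocessing variant alluded to above, assuming the same hypothetical directory layout:

import glob
import os
from multiprocessing import Pool

def strip_header(fpath):
    # Drop the first 6 lines and write a *_edited.csv next to the original.
    fname = os.path.basename(fpath).rsplit(os.path.extsep, 1)[0]
    out = os.path.join(os.path.dirname(fpath), fname + "_edited" + os.path.extsep + "csv")
    with open(fpath) as infile, open(out, "w") as outfile:
        for _ in range(6):
            infile.readline()
        for line in infile:
            outfile.write(line)

if __name__ == "__main__":
    with Pool() as pool:
        pool.map(strip_header, glob.glob('path/to/directory/*.csv'))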

How to read in multiple files separately from multiple directories in python

I have x directories named Star_{v}, with v = 0 to x.
I have 2 csv files in each directory, one with the word "epoch" in it, one without.
If one of the csv files has the word "epoch" in it, it needs to be sent through one set of code, else through another.
I think dictionaries are probably the way to go, but this section of the code is a bit of a mess:
directory_dict = {}
for var in range(0, len(subdirectory)):
    # var refers to the number by which the subdirectories are labelled: Star_0, Star_1 etc.
    directory_dict['Star_{v}'.format(v=var)] = directory\\Star_{var}
    # directory_dict['Star_0'], directory_dict['Star_1'] etc.
    read_csv(f) for f in os.listdir('directory_dict[Star_{var}') if f.endswith(".csv")
    # reads in all the files in the directories (Star_{v}) ending in csv.
    if 'epoch' in open(read_csv[0]).read():
        # if the word epoch is in the csv file then it is
        directory_dict[Star_{var}][read] = csv.reader(read_csv[0])
        directory_dict[Star_{var}][read1] = csv.reader(read_csv[1])
    else:
        directory_dict[Star_{var}][read] = csv.reader(read_csv[1])
        directory_dict[Star_{var}][read1] = csv.reader(read_csv[0])
When dealing with CSVs, you should use the csv module, and for your particular case you can use a DictReader and parse the headers to check for the column you're looking for:
import csv
import os

directory = os.path.abspath(os.path.dirname(__file__))  # change this to your directory
csv_list = [os.path.join(directory, c) for c in os.listdir(directory)
            if os.path.splitext(c)[1] == '.csv']

def parse_csv_file():
    " open CSV and check the headers "
    for c in csv_list:
        with open(c, mode='r') as open_csv:
            reader = csv.DictReader(open_csv)
            if 'epoch' in reader.fieldnames:
                pass  # do whatever you want here
            else:
                pass  # do whatever else
Then you can extract the column you need from the DictReader's rows and do whatever you want with it.
Also, your Python looks invalid.
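If "epoch" can appear anywhere in the file rather than as a header column, a plain substring test is a simpler sketch (csv_list as defined above):

for path in csv_list:
    with open(path) as fh:
        has_epoch = 'epoch' in fh.read()
    if has_epoch:
        pass  # route this file through the "epoch" code path
    else:
        pass  # route it through the other code path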

Categories

Resources