Find filename with highest integer in name [closed] - python

Summary
If FCR_Network_Coordinates_0 and FCR_Network_Coordinates_2 exist, the script should write to FCR_Network_Coordinates_3, not to FCR_Network_Coordinates_1.
Details
I have the following problem:
I want to write a new csv file if it does not exist, and increase the number suffix if files are already in the directory. But if, for example, a file with suffix "1" exists and one with "3", but none with "2", the next file should be written with "4". In other words, it should add 1 to the highest number suffix.
My code so far is:
import csv
import os

index = 0
while os.path.exists('../FCR_Network_Coordinates_' + str(index) + '.csv'):
    index += 1
with open('../FCR_Network_Coordinates_' + str(index) + '.csv', 'wb') as csv_file:
    writer = csv.writer(csv_file, delimiter=";")
    for key, value in sparse1.items():
        writer.writerow(['{:.1f}'.format(t) for t in key] + value)
EDIT
It should also work for paths where parameters are part of the filename:
"../FCR_Network_Coordinates_" + "r_" + radius + "x_" + x + "y_" + y + "z_" + z + "fcr_" + fcr_size + "_" + new_number + ".csv"
could look like:
FCR_Network_Coordinates_radius_3_x_0.3_y_0.3_z_2_fcr_2_1.csv
EDIT2
Furthermore, if there are other parameters in the filename, it should not look at the highest number across all files, but at the highest number among the files that share those parameters.

Your code will stop searching at file "2" (if "2" does not exist) even if files "3" and "4" exist.
You need to use glob to get all files that match your pattern:
import glob
import re
files = glob.glob("../FCR_Network_Coordinates_*.csv")
Next, strip the non-digit parts of the file names and keep the trailing integer:
file_nums = []
for s in files:
    num_str = re.search(r"(\d+)\.csv$", s)   # capture only the integer before ".csv" at the end of the name
    file_nums.append(int(num_str.group(1)))  # convert to a number (int, not JavaScript's parseInt)
new_number = max(file_nums) + 1              # find the largest and increment
There is no need to sort the list of files: max() already picks out the highest number.
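One caveat: max() raises ValueError on an empty sequence, so if no numbered file exists yet, the last line needs a fallback. A minimal sketch of that guard:

new_number = max(file_nums) + 1 if file_nums else 0  # start at 0 when no files match yet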

Something like the following should work for you:
import csv
import glob
import os

# .....
existing_matches = glob.glob('../FCR_Network_Coordinates_*.csv')
if existing_matches:
    used_numbers = []
    for f in existing_matches:
        try:
            file_number = int(os.path.splitext(os.path.basename(f))[0].split('_')[-1])
            used_numbers.append(file_number)
        except ValueError:
            pass
    save_number = max(used_numbers) + 1
else:
    save_number = 1

with open('../FCR_Network_Coordinates_{}.csv'.format(save_number), 'wb') as csv_file:
    writer = csv.writer(csv_file, delimiter=";")
    for key, value in sparse1.items():
        writer.writerow(['{:.1f}'.format(t) for t in key] + value)
glob finds all files with names similar to your pattern, where * is used as a wildcard.
We then use os.path to manipulate each filename and work out what the number in the name is:
os.path.basename() gets us just the filename - e.g. 'FCR_Network_Coordinates_1.csv'
os.path.splitext() splits the file name ('FCR_Network_Coordinates_1') from the extension ('.csv'). Taking the element at index 0 gets us the filename rather than the extension
splitting this on '_' breaks it at every underscore, resulting in a list of ['FCR', 'Network', 'Coordinates', '1']. Taking index -1 gets us the last entry in this list, i.e. the 1.
we have to wrap this as an int() to be able to apply numeric operations to it.
We also catch an error in case there is some filename using letters rather than numbers after the underscore. Then, we take the max of the numbers found and add one. If no numbers have been found, we use 1 for the filename.
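To make those steps concrete, here is what each call returns for a sample filename (expected values shown in the comments):

import os

f = '../FCR_Network_Coordinates_1.csv'
name = os.path.basename(f)        # 'FCR_Network_Coordinates_1.csv'
stem = os.path.splitext(name)[0]  # 'FCR_Network_Coordinates_1'
parts = stem.split('_')           # ['FCR', 'Network', 'Coordinates', '1']
number = int(parts[-1])           # 1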
EDIT:
In response to the question update, we just need to alter our glob and the final name we write to - the glob changes to:
existing_matches = glob.glob('../FCR_Network_Coordinates_r_{}_x_{}_y_{}_z_{}_fcr_{}_*.csv'.format(
    radius, x, y, z, fcr_size))
and the file opening line changes to:
with open('../FCR_Network_Coordinates_r_{}_x_{}_y_{}_z_{}_fcr_{}_{}.csv'.format(
        radius, x, y, z, fcr_size, save_number), 'wb') as csv_file:
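Both variants can share one code path if the number extraction is wrapped in a small helper. A sketch, assuming the counter is always the last underscore-separated token before '.csv':

import glob
import os

def next_save_number(pattern):
    # return 1 + the highest trailing number among files matching the glob pattern
    used = []
    for f in glob.glob(pattern):
        try:
            used.append(int(os.path.splitext(os.path.basename(f))[0].split('_')[-1]))
        except ValueError:
            pass  # skip names whose last token is not a number
    return max(used) + 1 if used else 1

save_number = next_save_number('../FCR_Network_Coordinates_*.csv')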

Related

Python: Generate a new unique filename from a given filename [duplicate]

Suppose that I have a text file text.txt. In Python, I read it into lines (which is a str object), make some changes to lines, and then want to save lines to a new file in the same directory. The new filename should be:
text.new.txt if it doesn't exist already,
text.new.2.txt if text.new.txt already exists,
text.new.3.txt if text.new.txt and text.new.2.txt already exist,
etc.
I came up with the following solution:
import itertools
import os

file_1 = r'C:\Documents\text.txt'

with open(file_1, mode='r', encoding='utf8') as f:
    lines = f.read()

# Here I modify `lines`
# ...

postfix = '.new'
root, ext = os.path.splitext(file_1)
file_2 = f'{root}{postfix}{ext}'
if os.path.isfile(file_2):
    for j in itertools.count(start=2):
        file_2 = f'{root}{postfix}.{j}{ext}'
        if not os.path.isfile(file_2):
            break

with open(file_2, mode='w', encoding='utf8') as f:
    f.write(lines)
I realize that I can use while True ... break instead of itertools.count() but it is essentially the same thing. I am wondering if there are any substantially better solutions to this problem.
If you want to keep your approach and your format is filename.new.index.txt, you can find the highest existing index with glob instead of probing each name in turn:
from glob import glob

# path contains the directory of the files
filelist = glob(path + "\\filename.new.*")
last_index = max(int(filename.split(".")[-2]) for filename in filelist)
Then assign last_index + 1 as the index for the new file.
For a more robust approach, consider applying str.zfill() to indices in order to sort them easily (i.e. 001, 002, 003...).
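For instance, zero-padding at creation time might look like this sketch (the width of 3 is an arbitrary choice):

index = 7
name = f'text.new.{str(index).zfill(3)}.txt'  # 'text.new.007.txt'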

Python - Write a list to several files named with contents of list

I have a list that represents objects which in turn have values YEAR, NAME, NUMBER.
Is it possible to then loop through that list of objects, get all values and write that row to a file named with that objects YEAR value?
inData is the list of objects, outPath is just the folder where I want them to go.
When I execute the code, only one line seems to end up in each year's file. It is as if each write overwrites the previous one.
Example of code:
def writeFileFromList(inData, outPath):
    for row in inData:
        outfile = open(outPath + "/" + row.getYear(), "w+")
        outfile.write(str(row) + "\n")
        outfile.close()
Example of what I want in contents of the output file:
2002;AAAAAA;1
2002;BBBBBB;2
2002;CCCCCC;3
To append a line to a file, use
open("filename","a")
Here, a stands for append.
Try this:
def writeFileFromList(inData, outPath):
    for row in inData:
        with open(outPath + "/" + row.getYear(), "a") as outfile:
            outfile.write(str(row) + "\n")
When you open a file, you need to specify the mode you want to use; see the documentation for more detail:
https://docs.python.org/2/library/functions.html#open
In your case, since you want to append content to the per-year file, you should use the 'a' mode so Python does not overwrite the file. Below is the corrected code based on your example:
def writeFileFromList(inData, outPath):
    for row in inData:
        outfile = open(outPath + "/" + row.getYear(), "a")
        outfile.write(str(row) + "\n")
        outfile.close()
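If inData is large, reopening the file for every row gets expensive. One alternative, a sketch assuming row.getYear() returns a string, keeps one handle per year in a dict:

def writeFileFromList(inData, outPath):
    handles = {}
    try:
        for row in inData:
            year = row.getYear()
            if year not in handles:
                handles[year] = open(outPath + "/" + year, "a")
            handles[year].write(str(row) + "\n")
    finally:
        for f in handles.values():
            f.close()  # close every per-year file exactly once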

Python: Looping through multiple csv files and making multiple new csv files

I am starting out in Python, and I am looking at csv files.
Basically my situation is this:
I have coordinates X, Y, Z in a csv.
X Y Z
1 1 1
2 2 2
3 3 3
and I want to go through and add a user-defined offset value to all Z values, then make a new file with the edited Z values.
Here is my code so far, which I think is right:
import csv

# list of lists we store all data in
allCoords = []

# get offset from user
offset = int(input("Enter an offset value: "))

# read all values into memory
with open('in.csv', 'r') as inFile:  # input csv file
    reader = csv.reader(inFile, delimiter=',')
    for row in reader:
        # do not add the header row to the list
        if row[0] != "X":
            # create a new coord list
            coord = []
            # get a row and put it into new list
            coord.append(int(row[0]))
            coord.append(int(row[1]))
            coord.append(int(row[2]) + offset)
            # add list to list of lists
            allCoords.append(coord)

# write all values into new csv file
with open('in.out.csv', "w", newline="") as f:
    writer = csv.writer(f)
    firstRow = ['X', 'Y', 'Z']
    allCoords.insert(0, firstRow)
    writer.writerows(allCoords)
But now comes the hard part. How would I go about going through a bunch of csv files (in the same location), and producing a new file for each of them?
I am hoping to have something like "filename.csv" turn into "filename_offset.csv", using the original file name as a starter for the new filename and appending "_offset" before the extension.
I think I need to use "os." functions, but I am not sure how to, so any explanation would be much appreciated along with the code! :)
Sorry if I didn't make much sense, let me know if I need to explain more clearly. :)
Thanks a bunch! :)
shutil.copy2(src, dst)
Similar to shutil.copy(), but metadata is copied as well.
shutil
The glob module finds all the pathnames matching a specified pattern
according to the rules used by the Unix shell. No tilde expansion is
done, but *, ?, and character ranges expressed with [] will be correctly matched
glob
import glob
import os
import shutil
from shutil import copy2

# cvs_DIR holds the full path of the directory containing the csv files
files = glob.glob(os.path.join(cvs_DIR, '*.csv'))
for f in files:
    try:
        # glob already returns the full path, so only the suffix needs swapping
        newName = f[:-4] + '_offset.csv'  # strip '.csv', append '_offset.csv'
        copy2(f, newName)
    except shutil.Error as e:
        print('Error: {}'.format(e))
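Copying only duplicates the files; to actually apply the offset to each one, the read/offset/write code from above can be wrapped in a function and driven by the same loop. A sketch, where process(inName, outName) is a hypothetical helper holding that logic:

import glob
import os

def offset_all(csv_dir):
    for inName in glob.glob(os.path.join(csv_dir, '*.csv')):
        root, ext = os.path.splitext(inName)
        outName = root + '_offset' + ext  # 'data.csv' -> 'data_offset.csv'
        process(inName, outName)          # hypothetical: read, add offset, write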
BTW, you can write ...
for row in reader:
    if row[0] == "X":
        break
for row in reader:
    coord = []
    ...
... instead of ...
for row in reader:
    if row[0] != "X":
        coord = []
        ...
This stops checking for 'X'es after the first line.
It works because you don't work with a real list here but with a self-consuming iterator, which you can stop and resume.
See also: Detecting if an iterator will be consumed.
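A quick demonstration of that behaviour with a plain iterator; the second loop resumes exactly where the first one stopped:

it = iter(['X', '1', '2', '3'])
for row in it:
    if row == 'X':
        break       # consumes 'X' and stops
for row in it:
    print(row)      # prints 1, 2, 3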

How to read in multiple files separately from multiple directories in python

I have x directories which are Star_{v} with v=0 to x.
I have 2 csv files in each directory, one with the word "epoch", one without.
If a csv file has the word "epoch" in it, it needs to be sent through one set of code; otherwise through another.
I think dictionaries are probably the way to go, but this section of the code is a bit of a mess:
directory_dict = {}
for var in range(0, len(subdirectory)):
    # var refers to the number by which the subdirectories are labelled: Star_0, Star_1 etc.
    directory_dict['Star_{v}'.format(v=var)] = directory\\Star_{var}
    # directory_dict['Star_0'], directory_dict['Star_1'] etc.
    read_csv(f) for f in os.listdir('directory_dict[Star_{var}') if f.endswith(".csv")
    # reads in all the files in the directories (Star_{v}) ending in csv
    if 'epoch' in open(read_csv[0]).read():
        # if the word epoch is in the csv file then it is
        directory_dict[Star_{var}][read] = csv.reader(read_csv[0])
        directory_dict[Star_{var}][read1] = csv.reader(read_csv[1])
    else:
        directory_dict[Star_{var}][read] = csv.reader(read_csv[1])
        directory_dict[Star_{var}][read1] = csv.reader(read_csv[0])
When dealing with csvs, you should use the csv module, and for your particular case you can use a DictReader and parse the headers to check for the column you're looking for:
import csv
import os

directory = os.path.abspath(os.path.dirname(__file__))  # change this to your directory
csv_list = [os.path.join(directory, c) for c in os.listdir(directory)
            if os.path.splitext(c)[1] == '.csv']  # splitext returns (root, ext), so compare the extension

def parse_csv_file():
    " open each CSV and check the headers "
    for c in csv_list:
        with open(c, mode='r') as open_csv:
            reader = csv.DictReader(open_csv)
            if 'epoch' in reader.fieldnames:
                pass  # do whatever you want here
            else:
                pass  # do whatever else
Also note that the Python in your question is invalid as posted. With DictReader you can branch on the parsed header fields and send each file down the right code path.
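Applied to the Star_{v} layout from the question, the header check might look like this sketch (directory is the parent path; handle_epoch and handle_other are hypothetical stand-ins for the two sets of code):

import csv
import glob
import os

for star_dir in glob.glob(os.path.join(directory, 'Star_*')):
    for path in glob.glob(os.path.join(star_dir, '*.csv')):
        with open(path, mode='r') as open_csv:
            reader = csv.DictReader(open_csv)
            if 'epoch' in reader.fieldnames:
                handle_epoch(reader)  # hypothetical: the 'epoch' code path
            else:
                handle_other(reader)  # hypothetical: the other code path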

Python 2.7 - how to count lines in files and use the results further

I have to write a script that lists all the text files in a directory, counts the number of lines in each file, and then reports the maximum, the minimum and the average.
so far I have this:
import glob
import os

def file_len(fname):
    with open(fname) as f:
        for i, l in enumerate(f, start=1):
            pass
    return i

files = glob.glob("/home/seb/Learning/*.txt")
print files

length = []
for file in files:
    file_len(file)
    length.append(i)

print length
As you (and I) could expect, it works up until
length.append(i)
because i is not defined there - I thought it was worth a shot though.
My question would be, how can I use the return of the function to append it to a list?
You need to assign the return value of file_len(file) to a variable:
flength = file_len(file)
length.append(flength)
The name i is a local name in file_len and not visible outside of the function, but the function does return the value.
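Putting it together, a sketch of the full script with the max, min and average the task asks for (keeping the question's Python 2.7 print syntax):

import glob

def file_len(fname):
    with open(fname) as f:
        i = 0  # handles empty files, where the loop body never runs
        for i, l in enumerate(f, start=1):
            pass
    return i

lengths = [file_len(name) for name in glob.glob("/home/seb/Learning/*.txt")]
if lengths:
    print "max:", max(lengths)
    print "min:", min(lengths)
    print "avg:", sum(lengths) / float(len(lengths))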
