This question already has answers here:
How do I create an incrementing filename in Python?
(13 answers)
Closed 2 years ago.
Suppose that I have a text file text.txt. In Python, I read it into lines (which is a str object), make some changes to lines, and then want to save lines to a new file in the same directory. The new filename should be
text.new.txt if it doesn't exist already,
text.new.2.txt if text.new.txt already exists,
text.new.3.txt if text.new.txt and text.new.2.txt already exist,
etc.
I came up with the following solution:
import itertools
import os

file_1 = r'C:\Documents\text.txt'
with open(file_1, mode='r', encoding='utf8') as f:
    lines = f.read()

# Here I modify `lines`
# ...

postfix = '.new'
root, ext = os.path.splitext(file_1)
file_2 = f'{root}{postfix}{ext}'
if os.path.isfile(file_2):
    for j in itertools.count(start=2):
        file_2 = f'{root}{postfix}.{j}{ext}'
        if not os.path.isfile(file_2):
            break
with open(file_2, mode='w', encoding='utf8') as f:
    f.write(lines)
I realize that I can use while True ... break instead of itertools.count() but it is essentially the same thing. I am wondering if there are any substantially better solutions to this problem.
If you want to keep your code, and your format is filename.new.index.txt, then once you find that filename.new.txt already exists you can look up the highest index used so far with glob:

from glob import glob

# `path` holds the directory containing the files
filelist = glob(path + "\\filename.new.*.txt")
# take the number just before ".txt"; int() so the indices compare numerically,
# and default=1 covers the case where only filename.new.txt exists so far
last_index = max((int(filename.split(".")[-2]) for filename in filelist), default=1)

Then assign last_index + 1 as the index for the new file.
For a more robust approach, consider applying str.zfill() to the indices so the filenames also sort correctly as strings (e.g. 001, 002, 003...).
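For illustration, a minimal sketch combining both suggestions; it assumes `path` holds the directory and uses a zero-padded filename.new.<index>.txt scheme:

from glob import glob
import os

# collect existing indexed files, e.g. filename.new.002.txt
filelist = glob(os.path.join(path, "filename.new.*.txt"))

# parse the numeric part; int() ignores any zero-padding
indices = [int(f.split(".")[-2]) for f in filelist]

# next free index (1 if no indexed file exists yet)
next_index = max(indices, default=0) + 1
new_name = f"filename.new.{str(next_index).zfill(3)}.txt"  # e.g. filename.new.003.txt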
I have many, many files where the starting 3 letters are different. I am trying to concatenate the monthly files into semi-annual files using the code below. But rather than replacing FXE 7 times with the different letters, I want to replace it in just 1 place. I tried a few methods, including using ETF = FXE and then substituting FXE with {ETF}, but my inexperience with the syntax is stumping me. Any quick advice is appreciated. Thx in advance.
# Creating a list of filenames
filenames = ['FXE_2022_01.txt', 'FXE_2022_02.txt', 'FXE_2022_03.txt',
             'FXE_2022_04.txt', 'FXE_2022_05.txt', 'FXE_2022_06.txt']

# Open the combined output file in write mode
with open('FXE_2022_01_to_06.txt', 'w') as outfile:
    # Iterate through the list
    for name in filenames:
        # Open each file in read mode
        with open(name) as infile:
            # Read each input file's data and write it to the output file
            outfile.write(infile.read())
        # Add '\n' so the next file's data starts on a new line
        outfile.write("\n")
Here's one possible way to do it using a list comprehension and f-strings. I also generalized the year and index numbers:
f_headers = ['FXE', 'ETF', 'GBJ']
f_year = '2022'
f_range = list(range(1, 7))

fnames = [[f'{fh}_{f_year}_{fr:02d}.txt' for fr in f_range] for fh in f_headers]

for fn in fnames:
    print(fn)
Output:
['FXE_2022_01.txt', 'FXE_2022_02.txt', 'FXE_2022_03.txt', 'FXE_2022_04.txt', 'FXE_2022_05.txt', 'FXE_2022_06.txt']
['ETF_2022_01.txt', 'ETF_2022_02.txt', 'ETF_2022_03.txt', 'ETF_2022_04.txt', 'ETF_2022_05.txt', 'ETF_2022_06.txt']
['GBJ_2022_01.txt', 'GBJ_2022_02.txt', 'GBJ_2022_03.txt', 'GBJ_2022_04.txt', 'GBJ_2022_05.txt', 'GBJ_2022_06.txt']
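If it helps, here is one way the generalized lists could feed the question's concatenation loop; the per-header output names (FXE_2022_01_to_06.txt and so on) are an assumption:

for fh, fn_list in zip(f_headers, fnames):
    # one combined file per header, e.g. FXE_2022_01_to_06.txt
    with open(f'{fh}_{f_year}_01_to_06.txt', 'w') as outfile:
        for name in fn_list:
            with open(name) as infile:
                outfile.write(infile.read())
            outfile.write('\n')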
I have data (mixed text and numbers in txt files) and I'd like to write a for loop that creates a list of lists, such that I can process the data from all the files using fewer lines.
So far I have written this:
import csv

path = (some path...)
files = [path + 'file1.txt', path + 'file2.txt',
         path + 'file3.txt', ...]

for i in files:
    with open(i, 'r') as j:
        Reader = csv.reader(j)
        List = [List for List in Reader]
I think I overwrite List instead of creating a nested list, since I end up with a Reader of size 1 and a list whose dimensions match just one of the files.
My questions:
Given that the files may contain different numbers of lines, is this the right approach for saving some lines of code? (What could be done better?)
I think the problem is in [List for List in Reader]; is there a way to change it so I don't overwrite List? Something like adding to List?
You can use the list append() method to add to an existing list. Since csv.reader instances are iterable objects, you can just pass one of them to the method as shown below:
import csv
from pathlib import Path

path = Path('./')
filenames = ['in_file1.txt', 'in_file2.txt']  # etc ...

List = []
for filename in filenames:
    with open(path / filename, 'r', newline='') as file:
        List.append(list(csv.reader(file)))

print(List)
Update
An even more succinct way to do it would be to use something called a "list comprehension":
import csv
from pathlib import Path

path = Path('./')
filenames = ['in_file1.txt', 'in_file2.txt']  # etc ...

List = [list(csv.reader(open(path / filename, 'r', newline='')))
        for filename in filenames]
print(List)
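Note that the comprehension version never explicitly closes the files it opens; CPython's garbage collector will usually close them promptly, but the loop with `with` shown above is the safer pattern for long-running code.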
Yes, use .append():
import csv

path = (some path...)
files = [path + x for x in ['FILESLIST']]

List = []  # the list must exist before you can append to it
for i in files:
    with open(i, 'r') as j:
        Reader = csv.reader(j)
        List.append([L for L in Reader])
This question already has answers here:
How to delete a specific line in a file?
(17 answers)
Closed 3 years ago.
I would like to read many data files in a folder, delete every line that contains "DT=(SINGLE SINGLE SINGLE)", and then write the result out as new data.
In that Data folder, there are 300 data files!
My code is
import os, sys

path = "/Users/xxx/Data/"
allFiles = os.listdir(path)

for fname in allFiles:
    print(fname)
    with open(fname, "r") as f:
        with open(fname, "w") as w:
            for line in f:
                if "DT=(SINGLE SINGLE SINGLE)" not in line:
                    w.write(line)
FileNotFoundError: [Errno 2] No such file or directory: '1147.dat'
I want to do this for the whole bunch of datasets.
How can I automatically read and write each file to delete the lines?
And is there a way to make a new dataset with a different name? e.g. 1147.dat -> 1147_new.dat
The below should do; demos of what each annotated line does follow the code:
import os

path = "/Users/xxx/Data/"
allFiles = [os.path.join(path, filename) for filename in os.listdir(path)]  # [1]
del_keystring = "DT=(SINGLE SINGLE SINGLE)"  # general case

for filepath in allFiles:  # longer variable names for clarity
    print(filepath)
    with open(filepath, 'r') as f_read:  # [2]
        loaded_txt = f_read.readlines()  # [3]

    new_txt = []
    for line in loaded_txt:
        if del_keystring not in line:
            new_txt.append(line)

    with open(filepath, 'w') as f_write:  # [2]
        f_write.write(''.join(new_txt))  # [4]

    with open(filepath, 'r') as f_read:  # [5]
        assert len(f_read.readlines()) <= len(loaded_txt)
[1] os.listdir returns only the filenames, not the filepaths; os.path.join joins its inputs into a full path, with separators (e.g. \\): folderpath + '\\' + filename
[2] NOT the same as nesting with open(X,'r') as .., with open(X,'w') as ..: together; opening with 'w' empties the file, leaving nothing for 'r' to read
[3] If f_read.read() == "Abc\nDe\n12", then f_read.read().split('\n') == ["Abc", "De", "12"]
[4] Undoes [3]: if _ls == ["a", "bc", "12"], then "\n".join(_ls) == "a\nbc\n12"
[5] Optional code to verify that the saved file's number of lines is <= the original file's
NOTE: you may see the saved filesize come out slightly bigger than the original's, which may be due to the original's better packing, compression, etc. - which you can figure out from its docs; [5] ensures it isn't due to extra lines
# bonus code to explicitly verify intended lines were deleted
with open(original_file_path, 'r') as txt:
    print(''.join(txt.readlines()[:80]))  # print a small excerpt

with open(processed_file_path, 'r') as txt:
    print(''.join(txt.readlines()[:80]))  # print a small excerpt

# ''.join() since .readlines() returns a list of lines
NOTE: for more advanced caveats, see comments below answer; for a more compact alternative, see Torxed's version
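As for the second part of the question (writing to a new name such as 1147_new.dat instead of overwriting), a minimal sketch, reusing filepath and new_txt from the loop above:

# derive e.g. /Users/xxx/Data/1147_new.dat from /Users/xxx/Data/1147.dat
root, ext = os.path.splitext(filepath)
with open(root + '_new' + ext, 'w') as f_write:
    f_write.write(''.join(new_txt))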
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
Summary
If FCR_Network_Coordinates_0 and FCR_Network_Coordinates_2 exist, it should write to the file FCR_Network_Coordinates_3 and not to FCR_Network_Coordinates_1.
Details
I have the following problem:
I want to write a new csv file if none exists, and increase the extension number if files were already found in the directory. But if, for example, a file with number extension "1" exists and one with "3", but none with "2", it should write the next file with "4". So it should add 1 to the highest number extension.
My code so far is:
index = 0
while os.path.exists('../FCR_Network_Coordinates_' + str(index) + '.csv'):
    index += 1

with open('../FCR_Network_Coordinates_' + str(index) + '.csv', 'wb') as csv_file:
    writer = csv.writer(csv_file, delimiter=";")
    for key, value in sparse1.items():
        writer.writerow(['{:.1f}'.format(t) for t in key] + value)
EDIT
It should also work for paths where parameters are added to the file name, so that
"../FCR_Network_Coordinates_"+"r_"+radius+"x_"+x+"y_"+y+"z_"+z+"fcr_"+fcr_size+"_"+new_number+".csv"
could look like:
FCR_Network_Coordinates_radius_3_x_0.3_y_0.3_z_2_fcr_2_1.csv
EDIT2
Furthermore, if there are other parameters in the filename, it should not look at the highest number across all files, but at the highest number among only the files that have these parameters too.
Your code will stop searching at file "2" (if "2" does not exist) even if there are files "3" and "4".
You need to use glob to get all the files that match your pattern:
import glob
import re
files = glob.glob("../FCR_Network_Coordinates_*.csv")
Next, extract the trailing number from each of your file names:
file_nums = []
for s in files:
    num_str = re.search(r"(\d+)\.csv$", s)   # capture only the integer right before ".csv" at the end of the name
    file_nums.append(int(num_str.group(1)))  # convert to a number

new_number = max(file_nums) + 1  # find the largest and increment
Then take the largest of those numbers and add one to name the new file.
Something like the following should work for you:
import csv
import glob
import os

# .....

existing_matches = glob.glob('../FCR_Network_Coordinates_*.csv')

if existing_matches:
    used_numbers = []
    for f in existing_matches:
        try:
            file_number = int(os.path.splitext(os.path.basename(f))[0].split('_')[-1])
            used_numbers.append(file_number)
        except ValueError:
            pass
    save_number = max(used_numbers) + 1
else:
    save_number = 1

with open('../FCR_Network_Coordinates_{}.csv'.format(save_number), 'wb') as csv_file:
    writer = csv.writer(csv_file, delimiter=";")
    for key, value in sparse1.items():
        writer.writerow(['{:.1f}'.format(t) for t in key] + value)
glob finds all files with names similar to your pattern, where * is used as a wildcard.
We then use os.path to manipulate each filename and work out what the number in the name is:
os.path.basename() gets us just the filename - e.g. 'FCR_Network_Coordinates_1.csv'
os.path.splitext() splits the file name ('FCR_Network_Coordinates_1') from the extension ('.csv'). Taking the element at index 0 gets us the filename rather than the extension
splitting this based on '_' splits this every time there is an '_' - resulting in a list of ['FCR', 'Network', 'Coordinates', '1']. Taking the index -1 gets us the last entry in this list, i.e. the 1.
we have to wrap this in int() to be able to apply numeric operations to it.
We also catch an error in case there is some filename using letters rather than numbers after the underscore. Then, we take the max of the numbers found and add one. If no numbers have been found, we use 1 for the filename.
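For instance, stepping through that chain on a hypothetical filename in the REPL:

>>> import os.path
>>> f = '../FCR_Network_Coordinates_12.csv'
>>> os.path.basename(f)
'FCR_Network_Coordinates_12.csv'
>>> os.path.splitext(os.path.basename(f))[0]
'FCR_Network_Coordinates_12'
>>> int(os.path.splitext(os.path.basename(f))[0].split('_')[-1])
12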
EDIT:
In response to the question update, we just need to alter our glob and the final name we write to - the glob changes to:
existing_matches = glob.glob('../FCR_Network_Coordinates_r_{}_x_{}_y_{}_z_{}_fcr_{}_*.csv'.format(
    radius, x, y, z, fcr_size))
and the file opening line changes to:
with open('../FCR_Network_Coordinates_r_{}_x_{}_y_{}_z_{}_fcr_{}_{}.csv'.format(
        radius, x, y, z, fcr_size, save_number), 'wb') as csv_file:
I'm trying to save the names of files that fulfill a certain condition.
I think the easiest way to do this would make a short Python program that imports and reads the files, checks if the condition is met, and (assuming it is met) then saves the names of the files.
I have data files with just two columns and four rows, something like this:
a: 5
b: 5
c: 6
de: 7
I want to save the names (or part of the names, if that's a simple fix; otherwise I can just sed the file afterwards) of the data files whose 4th number ([3:1]) is greater than 8. I tried importing the files with numpy, but it said it couldn't import the letters in the first column.
Another way I was considering trying to do it was from the command line, something along the lines of cat *.dat >> something.txt, but I couldn't figure out how to do that.
The code I've tried to write up to get this to work is:
import fileinput
import glob
import numpy as np

# Filter to find value > 8
# Globbing value datafiles
file_list = glob.glob("/path/to/*.dat")

# Creating output file
f = open('list.txt', 'w')

# Looping over files
for file in file_list:
    # For each file in the directory, isolating the filename
    filename = file.split('/')[-1]
    # Opening the files, checking if value is greater than 8
    a = np.loadtxt("file", delimiter=' ', usecols=1)
    if a[3:0] > 8:
        print >> f, filename

f.close()
When I do this, I get an error that says TypeError: 'int' object is not iterable, but I don't know what that's referring to.
I ended up using
import glob
import numpy as np

# Filter to find value > 8
# Globbing datafiles
file_list = glob.glob("/path/to/*.dat")

# Creating output file
f = open('list.txt', 'w')

# Looping over files
for file in file_list:
    # For each file in the directory, isolating the filename
    filename = file.split('/')[-1]
    # Opening the files, checking if value is greater than 8
    a = np.genfromtxt(file)
    if a[3, 1] > 8:
        f.write(filename + "\n")

f.close()
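For reference, np.genfromtxt turns the unparseable labels in the first column ("a:", "b:", ...) into nan but reads the numbers fine, so a becomes a 4x2 array and a[3, 1] picks the fourth row's second column: 7.0 for the sample data above.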
It is hard to tell exactly what you want, but maybe something like this:
from glob import glob
from re import findall

fpattern = "/path/to/*.dat"

def test(fname):
    with open(fname) as f:
        try:
            # the 4th number anywhere in the file must be greater than 8
            return int(findall(r"\d+", f.read())[3]) > 8
        except IndexError:
            pass

matches = [fname for fname in glob(fpattern) if test(fname)]
print(matches)
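Walking through test() on the sample data from the question, as a quick sanity check:

from re import findall

sample = "a: 5\nb: 5\nc: 6\nde: 7"
print(findall(r"\d+", sample))              # ['5', '5', '6', '7']
print(int(findall(r"\d+", sample)[3]) > 8)  # False: 7 is not greater than 8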