Concatenating multiple .txt files in subdirectories - python

I have multiple .txt files at different levels of a subdirectory tree. All of the .txt files sit at the deepest level, that is, no directory contains both .txt files and further subdirectories. I want to concatenate them all into one new text file, but I can't find a simple way to go through all the subdirectories.
A command I've found, which is apparently to be entered into the Python command-line terminal, is:
$ cat source/file/*.txt > source/output/output.txt
But I am not sure how I could make this iterate over multiple subdirectories.
(I am a real beginner with Python, and there seems to be some confusion. Is this not a Python command? The source I found it in claimed it was...)

You could build it with a Python script, using something like:
file_list = [r'c:\data.txt', r'c:\folder1\data.txt']
for name in file_list:
    with open(name) as f:
        for line in f:
            print(line, end='')
Call the script from cmd, e.g. python.exe buildtxtfiles.py, and redirect the printed output into a file with > if you want the concatenated result saved.
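Since the .txt files are scattered across nested subdirectories, the file list can also be built automatically. Here is a minimal sketch using os.walk, assuming the tree starts at source/ and the combined result should end up at source/output/output.txt as in the question:
import os

root = 'source'                                         # top of the directory tree from the question
out_path = os.path.join(root, 'output', 'output.txt')
os.makedirs(os.path.dirname(out_path), exist_ok=True)   # make sure source/output exists

with open(out_path, 'w') as out:
    for dir_path, dir_names, file_names in os.walk(root):
        for file_name in sorted(file_names):
            in_path = os.path.join(dir_path, file_name)
            if file_name.endswith('.txt') and in_path != out_path:
                with open(in_path) as f:
                    out.write(f.read())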

Related

Python: Iterate through directory to find specific text file, open it, write some of its contents to a new file, close it, and move on to the next dir

I have a script that takes an input text file, finds data in it, and stores that data in variables; later I write those variables to a new file. This snippet of code just reads the txt file and stores the data from it as variables.
import re

searchfile = open('C://Users//Me//DynamicFolder//report//summary.txt', 'r', encoding='utf-8')
slab_count = 0
slab_number = []
slab_total = 0
for line in searchfile:
    if "Slab" in line:
        slab_num = [float(s) for s in re.findall(r'[-+]?(?:\d*\.\d+|\d+)', line)]
        slab_percent = slab_num[-1]
        slab_number.append(slab_percent)
        slab_count = slab_count + 1
slab_total = 0
for slab_percent in slab_number:
    slab_total += slab_percent
searchfile.close()
I am using xlsxwriter to write the variables to an excel doc.
My question is, how do I iterate this to search through a given directories sub-directories for summary.txt when there is a dynamic folder.
So C://Users//Me//DynamicFolder//report//summary.txt is a path to one of the files. There are several folders I named DynamicFolder that are there because another process puts them there, they change their names all the time. I need have this script go into each of those dynamic folders to a subdir called report, this is a static name and is always the same. So each of those dynamicfolders has another subdir called report, and in the report folder is a file called summary.txt. I am trying to go through each of those dynamicfolders into the subdir report > summary.txt and then opening and writing data from those txt files.
How do I iterate or loop this? Right now I have 18 folders with those DynamicFolder names that will change when they are over written. How can I put this snip of code to iterate through?
from pathlib import Path

for path in Path('C://Users//Me').rglob('summary.txt'):
The report folder is not the only folder with a summary.txt file, but it's the only folder with the file I want. So the code above pulls ALL summary.txt files from all sub-directories under the DynamicFolders (not just the report folders). I am wondering if I can make this look ONLY in the 'report' sub-folders under the DynamicFolders, and somehow use this to iterate the rest of my code?
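One way to look only in the report folders is to glob with an explicit pattern instead of rglob. A minimal sketch, assuming all the DynamicFolders sit directly under C://Users//Me (only the pattern and the loop structure matter; the slab parsing is the same as above):
import re
from pathlib import Path

base = Path('C://Users//Me')                     # assumed parent of all the DynamicFolders
for summary in base.glob('*/report/summary.txt'):
    slab_number = []
    with summary.open('r', encoding='utf-8') as searchfile:
        for line in searchfile:
            if "Slab" in line:
                nums = [float(s) for s in re.findall(r'[-+]?(?:\d*\.\d+|\d+)', line)]
                slab_number.append(nums[-1])
    slab_total = sum(slab_number)
    # ... write slab_total etc. for this DynamicFolder with xlsxwriter ...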

Check if there are .pkl files in a directory

I have been trying to figure out for a while how to check whether there are .pkl files in a given directory. I searched this site and could find ways to list the files in a directory, but I just want to check whether any are there.
In my directory there are a total of 7 .pkl files; as soon as I create one, the others are created too, so to check that all seven exist it is enough to check that one exists. Therefore, I would like to check whether there is any .pkl file.
This works if I do:
os.path.exists('folder1/folder2/filename.pkl')
But I had to write one of my file names. I would like to do it without searching for a specific file. I also tried
os.path.exists('folder1/folder2/*.pkl'),
but that does not work either, since I don't have any file literally named *.pkl.
You can use the python module glob (https://docs.python.org/3/library/glob.html)
Specifically, glob.glob('folder1/folder2/*.pkl') will return a list of all .pkl files in folder2.
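Since the returned list is empty when no matching file exists, a simple truthiness check is enough. A minimal sketch, reusing the folder names from the question:
import glob

if glob.glob('folder1/folder2/*.pkl'):
    print('at least one .pkl file exists')
else:
    print('no .pkl files found')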
You can also use os.walk:
import os

for dir_path, dir_names, file_names in os.walk(search_dir):
    # go over all files in this folder
    for file_name in file_names:
        if file_name.endswith(".pkl"):
            # do something, e.g. break after the first one you find
            break
Note: this is useful if you want to search the entire directory tree, sub-directories included.
In case you want to search only one directory, you can run the for loop over os.listdir(path) instead.
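For the single-directory case that check fits in one line. A small sketch, assuming the folder from the question:
import os

path = 'folder1/folder2'                                           # directory to check
has_pkl = any(name.endswith('.pkl') for name in os.listdir(path))
print(has_pkl)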

Saving a .txt file in a different directory than the original (Python 3)

I am trying to save a file that my program creates to a directory that is inside the directory the program is run from.
Basically it looks like this:
DIRECTORY_ONE:
program.py
DIRECTORY_TWO:
So I want program.py to save to DIRECTORY_TWO
I've tried
outFile = open("/output_DB/" + "out.txt",'w')
and turning it into a with ... as block, among a few other methods.
What is the best way to do this task?
Check if the directory exists; if not, create it.
import os

if not os.path.exists("output_DB"):
    os.makedirs("output_DB")
outFile = open("output_DB/out.txt", 'w')

How to write a Python script on Linux to run a specific command requiring two files which have similar file names

I am struggling to create a Python script in the Linux terminal. I am trying to build a command to analyze hundreds of files.
There is a Python program (ngCGH) which analyzes .bam files. ngCGH's command looks like this:
ngCGH -o /mnt/data/A/B/C.txt [normal.bam] [tumor.bam]
The .bam files follow this naming rule:
1N-------.bam
1T-------.bam
2N-------.bam
2T-------.bam
In short, files with matching numbers should be analyzed together.
In addition, I want to name the output files differently, in the following way:
1N------.bam 1T------.bam
Result: 1NT analysis.txt
2N------.bam 2T------.bam
Result: 2NT analysis.txt
The output files should be .txt files with different names.
import os

# assumes that an N/T pair with the same number also shares the same middle part of the name
suffixes = sorted(set(x[2:-4] for x in os.listdir() if x.endswith('.bam')))
j = 1
for suffix in suffixes:
    command = ('ngCGH -o "<path>/' + str(j) + 'NT analysis.txt" '
               + str(j) + 'N' + suffix + '.bam ' + str(j) + 'T' + suffix + '.bam')
    os.system(command)
    j += 1
Replace <path> with the actual output path; the quotes around the output name are needed because it contains a space.
Hope this is what you were expecting.
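If the leading number is the only reliable link between a pair, another option is to read it from the file name itself and pass the arguments as a list, which avoids shell-quoting problems. This is only a sketch; OUTPUT_DIR is a hypothetical destination folder, and the .bam files are assumed to sit in the current directory:
import glob
import subprocess

OUTPUT_DIR = '/mnt/data/A/B'                      # hypothetical output directory
for normal in sorted(glob.glob('[0-9]*N*.bam')):
    number = normal.split('N')[0]                 # e.g. "1" from "1N-------.bam"
    tumors = glob.glob(number + 'T*.bam')
    if not tumors:
        continue                                  # no matching tumor file for this number
    out_file = OUTPUT_DIR + '/' + number + 'NT analysis.txt'
    subprocess.run(['ngCGH', '-o', out_file, normal, tumors[0]])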

Apple Automator process csv files and create new files

Is it possible to loop through a set of selected files, process each, and save the output as new files using Apple Automator?
I have a collection of .xls files, and I've gotten Automator to
- Ask for Finder Items
- Open Finder Items
- Convert Format of Excel Files #save each .xls file to a .csv
I've written a python script that accepts a filename as an argument, processes it, and saves it as p_filename in the directory the script's being run from. I'm trying to use Run Shell Script with the /usr/bin/python shell and my python script pasted in.
Some things don't translate too well, though, especially since I'm not sure how it deals with python's open('filename','w') command. It probably doesn't have permissions to create new files, or I'm entering the command incorrectly. I had the idea to instead output the processed file as text, capture it with Automator, and then save it to a new file.
To do so, I tried to use New Text File, but I can't get it to create a new text file for each file selected back in the beginning. Is it possible to loop through all the selected Finder Items?
Why do you want this done in the folder of the script? Or do you mean the folder of the files you are getting from the Finder items? In that case just get the path for each file passed into Python.
When you run open('filename','w') you should thus pass in a full pathname. Probably what's happening is you are actually writing to the root directory rather than where you think you are.
Assuming you are passing your files to the shell command in Automator as arguments, you might have the following:
import sys, os

args = sys.argv[1:]
for a in args:
    p = os.path.dirname(a)
    name = "p_" + os.path.basename(a)        # the "p_filename" naming from the question
    mypath = os.path.join(p, name)
    f = open(mypath, "w")
    # ... process the input file a and write the results to f ...
    f.close()
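Put together with the .csv conversion step, the whole action could look like the sketch below. The csv round-trip here is only a placeholder for whatever the existing script actually does to each file:
import csv
import os
import sys

for a in sys.argv[1:]:
    out_path = os.path.join(os.path.dirname(a), "p_" + os.path.basename(a))
    with open(a, newline='') as src, open(out_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow(row)              # placeholder: real processing goes here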
