Python: read hundreds of text files

I have some data in simplejson format in txt files, which I read using:
with open("my_file.txt") as f: any_variable = simplejson.load(f)
It works fine, no problems. However, I now have hundreds of such text files (some of which I don't know the names of!) to read, and I was wondering if there is a pythonic way to read all these files and assign them to, say, any_variable1 to any_variableN. I don't really care in what order they are read.
Obviously, a simple way would be to loop and store the results, yet I was wondering if there was a more pythonic way here.

If the files are inside a directory, you can use:
import os
import simplejson

variables = []
path = "/your/path"
for filename in os.listdir(path):
    with open(os.path.join(path, filename)) as f:
        variables.append(simplejson.load(f))
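If you also need to know which file each object came from, a dict keyed by filename is a small variation (a sketch, assuming every file in the directory is valid JSON):
import os
import simplejson

path = "/your/path"
data_by_file = {}
for filename in os.listdir(path):
    with open(os.path.join(path, filename)) as f:
        # key each parsed object by its filename instead of a list position
        data_by_file[filename] = simplejson.load(f)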

Related

Python search files in multiple subdirectories for specific string and return file path(s) if present

I would be very grateful indeed for some help for a frustrated and confused Python beginner.
I am trying to create a script that searches a Windows directory (containing multiple subdirectories and different file types) for a specific single string (a name) in the file contents and, if found, prints the filenames as a list. There are approximately 2000 files in 100 subdirectories; the files I want to search don't necessarily have the same extension, but they are all, in essence, ASCII files.
I've been trying to do this for many many days but I just cannot figure it out.
So far I have tried using glob recursive coupled with reading the file but I'm so very bewildered. I can successfully print a list of all the files in all subdirectories, but don't know where to go from here.
import glob
files = []
files = glob.glob('C:\TEMP' + '/**', recursive=True)
print(files)
Can anyone please help me? I am a 72-year-old scientist trying to improve my skills and "automate the boring stuff", but at the moment I'm just losing the will.
Thank you very much in advance to this community.
Great to have you here!
What you have done so far is find all the file paths; now the simplest way is to go through each of the files, read them into memory one by one, and see if the name you are looking for is there.
import glob

files = glob.glob(r'C:\TEMP' + '/**', recursive=True)
target_string = 'John Smit'

# iterate over files
for file in files:
    try:
        # open file for reading
        with open(file, 'r') as f:
            # read the contents
            contents = f.read()
            # check if contents have your target string
            if target_string in contents:
                print(file)
    except (OSError, UnicodeDecodeError):
        # skip directories and files that cannot be read as text
        pass
This will print the file path each time it finds the name.
Please also note I have removed the second line from your code (files = []), because it is redundant: you assign the list on line 3 anyway.
Hope it helps!
You could do it like this, though I think there must be a better approach.
Once you have found all the files in your directory, you iterate over them and check if they contain that specific string.
import os

for file in files:
    if os.path.isfile(file):
        with open(file, 'r') as f:
            if 'search_string' in f.read():
                print(file)
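A variation on the same idea using pathlib, which skips directories via is_file() (a sketch; the try/except for unreadable or non-text files is an assumption):
from pathlib import Path

target_string = 'John Smit'
for path in Path(r'C:\TEMP').rglob('*'):
    if path.is_file():
        try:
            if target_string in path.read_text():
                print(path)
        except (OSError, UnicodeDecodeError):
            # skip files that cannot be read as text
            pass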

Python appending multiple returned files into one file

I would like to be able to take my multiple returned files, which all have the same structure in .csv format, and combine them into one file. Below I have posted the part of the code that I'm referring to.
I am not sure if I should use .write or .append to achieve this efficiently.
for thisfile in files:
    if re.match(pattern, thisfile):
        # Filename is required by boto3's download_file; save under the key name
        s3.download_file(Bucket=os.environ.get('SP_BUCKET'), Key=thisfile,
                         Filename=thisfile)
return files
# append files here
Let's call the file names part-1, part-2 and part-3. Thank you.
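A minimal sketch of the appending step (the part names are from the question; combined.csv is an assumption). Opening the output once in write ('w') mode and writing repeatedly is simpler than reopening it in append ('a') mode for each part:
part_files = ['part-1', 'part-2', 'part-3']

with open('combined.csv', 'w') as out:
    for i, name in enumerate(part_files):
        with open(name) as part:
            if i > 0:
                next(part, None)  # skip the header on every part after the first
            for line in part:
                out.write(line)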

Using a list to find and move specific files - python 2.7

I've seen a lot of people asking questions about searching through folders and creating a list of files, but I haven't found anything that has helped me do the opposite.
I have a csv file with a list of files and their extensions (xxx0.laz, xxx1.laz, xxx2.laz, etc). I need to read through this list and then search through a folder for those files. Then I need to move those files to another folder.
So far, I've taken the csv and created a list. At first I was having trouble with the list: each line had a "\n" at the end, so I removed those. Following the only other example I've found (How do I find and move certain files based on a list in excel?), I created a set from the list. However, I'm not really sure why, or if, I need it.
So here's what I have:
id = open('file.csv','r')
list = list(id)
list_final = ''.join([item.rstrip('\n') for item in list])
unique_identifiers = set(list_final)
os.chdir(r'working_dir')  # I set this as the folder to look through
destination_folder = 'folder_loc'  # Folder to move files to
for identifier in unique_identifiers:
    for filename in glob.glob('%s_*' % identifier):
        shutil.move(filename, destination_folder)
I've been wondering about this ('%s_*' % identifier) with the glob function. I haven't found any examples of this; perhaps it needs to be changed?
When I do all that, I don't get anything: no errors, and no files actually moved...
Perhaps I'm going about this the wrong way, but that is the only thing I've found so far anywhere.
It's really not hard:
import shutil

for fname in open("my_file.csv").read().split(","):
    shutil.move(fname.strip(), dest_dir)
You don't need a whole lot of things ...
Also, if you just want all the *.laz files in a source directory, you don't need a csv at all ...
import glob
import os
import shutil

for fname in glob.glob(os.path.join(src_dir, "*.laz")):
    shutil.move(fname, dest_dir)
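Note that the question's file appears to hold one name per line rather than comma-separated values; here is a sketch closer to the original intent (the '%s_*' glob pattern and folder name are taken from the question):
import glob
import shutil

destination_folder = 'folder_loc'

with open('file.csv') as id_file:
    for line in id_file:
        identifier = line.rstrip('\n')
        if not identifier:
            continue  # skip blank lines
        for filename in glob.glob('%s_*' % identifier):
            shutil.move(filename, destination_folder)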

Skip first row in python

I am using
for file in fileList:
    f.write(open(file).read())
I am combining the files in a folder into one csv. However, I don't need X amount of headers in the one file.
Is there a way to use this and have it write everything but the first row (the header) from each of the files?
Use the Python csv module.
Or something like that:
for file_name in file_list:
    file_obj = open(file_name)
    file_obj.readline()       # consume and discard the header line
    f.write(file_obj.read())  # write the rest of the file
Note that when you use file_obj.readlines(), the whole file content is loaded into memory as a list of lines; a streaming variant that avoids holding whole files in memory is sketched below.
Note that it isn't good practice to name variables after builtins (like file).
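A streaming variant that never holds a whole file in memory (a sketch; fileList and the already-open output handle f are taken from the question):
for file_name in fileList:
    with open(file_name) as file_obj:
        next(file_obj, None)   # skip the header line
        for line in file_obj:  # stream the rest line by line
            f.write(line)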
for file in fileList:
    mylines = open(file).readlines()
    f.write("".join(mylines[1:]))
This should point you in the right direction. Please don't do your homework on stackoverflow.
If it's a csv file, look into the Python csv lib.
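Since both answers point at the csv module, here is a minimal sketch of that route (the combined.csv name is an assumption; fileList is from the question):
import csv

with open('combined.csv', 'w', newline='') as out_file:
    writer = csv.writer(out_file)
    for i, name in enumerate(fileList):
        with open(name, newline='') as in_file:
            reader = csv.reader(in_file)
            header = next(reader, None)
            if i == 0 and header is not None:
                writer.writerow(header)  # keep the header from the first file only
            writer.writerows(reader)     # copy the remaining rows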

Create for loop for naming output file Python

So I'm importing a list of names
e.g.
Textfile would include:
Eleen
Josh
Robert
Nastaran
Miles
my_list = ['Eleen','Josh','Robert','Nastaran','Miles']
Then I'm assigning each name to a list, and I want to write a new Excel file for each name in that list.
#1. Is there any way I can create a for loop where on the line:
temp = os.path.join(dir,'...'.xls')
_________________________
def high_throughput(names):
    import os
    import re
    # Reading file
    in_file=open(names,'r')
    dir,file=os.path.split(names)
    temp = os.path.join(dir,'***this is where i want to put a for loop
    for each name in the input list of names***.xls')
    out_file=open(temp,'w')
    data = []
    for line in in_file:
        data.append(line)
    in_file.close()
I'm still not sure what you're trying to do (and by "not sure", I mean "completely baffled"), but I think I can explain some of what you're doing wrong, and how to do it right:
in_file=open(names,'r')
dir,file=os.path.split(names)
temp = os.path.join(dir,'***this is where i want to put a for loop
for each name in the input list of names***.xls')
At this point, you don't have the input list of names. That's what you're reading from in_file, and you haven't read it yet. Later on, you read those names into data, after which you can use them. So:
in_file=open(names,'r')
dir,file=os.path.split(names)

data = []
for line in in_file:
    data.append(line)
in_file.close()

for name in data:
    temp = os.path.join(dir, '{}.xls'.format(name))
    out_file=open(temp,'w')
Note that I put the for loop outside the function call, because you have to do that. And that's a good thing, because you presumably want to open each path (and do stuff to each file) inside that loop, not open a single path made out of a loop of files.
But if you don't insist on using a for loop, there is something that may be closer to what you were looking for: a list comprehension. You have a list of names. You can use that to build a list of paths. And then you can use that to build a list of open files. Like this:
paths = [os.path.join(dir, '{}.xls'.format(name)) for name in data]
out_files = [open(path, 'w') for path in paths]
Then, later, after you've built up the string you want to write to all the files, you can do this:
for out_file in out_files:
    out_file.write(stuff)
However, this is kind of an odd design. Mainly because you have to close each file. They may get closed automatically by the garbage collector, and even if they don't, they may get flushed… but unless you get lucky, all that data you wrote is just sitting around in buffers in memory and never gets written to disk. Normally you don't want to write programs that depend on getting lucky. So, you want to close your files. With this design, you'd have to do something like:
for out_file in out_files:
    out_file.close()
It's probably a lot simpler to go back to the one big loop I suggested in the first place, so you can do this:
for name in data:
    temp = os.path.join(dir, '{}.xls'.format(name))
    out_file=open(temp,'w')
    out_file.write(stuff)
    out_file.close()
Or, even better:
for name in data:
    temp = os.path.join(dir, '{}.xls'.format(name))
    with open(temp,'w') as out_file:
        out_file.write(stuff)
A few more comments, while we're here…
First, you really shouldn't be trying to generate .xls files manually out of strings. You can use a library like openpyxl. Or you can create .csv files instead—they're easy to create with the csv library that comes built in with Python, and Excel can handle them just as easily as .xls files. Or you can use win32com or pywinauto to take control of Excel and make it create your files. Really, anything is better than trying to generate them by hand.
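For instance, a minimal sketch of the csv route (the header and row values are illustrative; the names are from the question):
import csv

names = ['Eleen', 'Josh', 'Robert', 'Nastaran', 'Miles']
for name in names:
    with open('{}.csv'.format(name), 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['column_a', 'column_b'])  # illustrative header
        writer.writerow(['value_1', 'value_2'])    # illustrative row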
Second, the fact that you can write for line in in_file: means that an in_file is some kind of sequence of lines. So, if all you want to do is convert it to a list of lines, you can do that in one step:
data = list(in_file)
But really, the only reason you want this list in the first place is so you can loop around it later, creating the output files, right? So why not just hold off, and loop over the lines in the file in the first place?
Whatever you do to generate the output stuff, do that first. Then loop over the file with the list of filenames and write stuff. Like this:
stuff = ...  # whatever you were doing later, in the code you haven't shown
dir = os.path.dirname(names)
with open(names, 'r') as in_file:
    for line in in_file:
        # strip the trailing newline so it doesn't end up in the filename
        temp = os.path.join(dir, '{}.xls'.format(line.rstrip('\n')))
        with open(temp, 'w') as out_file:
            out_file.write(stuff)
That replaces all of the code in your sample (except for that function named high_throughput that imports some modules locally and then does nothing).
Take a look at openpyxl, especially if you need to create .xlsx files. The example below assumes the Excel workbooks are created blank.
from openpyxl import Workbook

names = ['Eleen','Josh','Robert','Nastaran','Miles']
for name in names:
    wb = Workbook()
    wb.save('{0}.xlsx'.format(name))
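If the workbooks also need content, openpyxl's Worksheet.append adds one row at a time (a sketch; the row values are illustrative):
from openpyxl import Workbook

wb = Workbook()
ws = wb.active                # the default worksheet
ws.append(['name', 'score'])  # illustrative header row
ws.append(['Eleen', 42])      # illustrative data row
wb.save('Eleen.xlsx')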
Try this:
in_file=open(names,'r')
dir,file=os.path.split(names)
for name in in_file:
    # strip the trailing newline so it doesn't end up in the filename
    temp = os.path.join(dir, name.rstrip('\n') + '.xls')
    with open(temp,'w') as out_file:
        pass  # write data to out_file
