I'm trying to iterate over files in a folder, but the code is always executed on only one file (there are two files at the moment). The program should open each txt file, convert it to a Python list, and write the list to a newly created file (one output file per input file). I can't find the bug on my own.
import io
import os
'''printing program directory'''
ANVCONDA_directory = os.path.dirname(os.path.realpath(__file__)) + os.sep
print ANVCONDA_directory
inputdir = ANVCONDA_directory + "TEMP"
print os.listdir(inputdir)
'''counting files in folder'''
path, dirs, files = os.walk(inputdir).next()
file_count = len(files)
print file_count
for filename in os.listdir(inputdir):
    #opening txt file to extract data
    with io.open(inputdir + "\\" + filename, "r", encoding="cp1250") as file:
        ocr_results = [line.strip() for line in file]
    #splitting into list
    for line in ocr_results:
        print("[" + line + "]")
#writing list to file
import pickle
with open('outfile' + filename, 'wb') as fp:
    pickle.dump(ocr_results, fp)
The block that writes the output is indented incorrectly. It should be inside the for loop. Because it currently sits outside the loop, it runs only once, after the loop has finished, so only the last file is written.
Another point worth mentioning: os.listdir returns everything in the folder, including subdirectories. Use os.path.isdir (or os.path.isfile) if you want to skip directories.
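Both points can be sketched together as follows. This is a minimal sketch in Python 3 (the question uses Python 2 print statements), not a drop-in replacement. The function name pickle_text_files and the out_dir parameter are assumptions made for illustration; the cp1250 encoding and the 'outfile' prefix are taken from the question.

```python
import io
import os
import pickle


def pickle_text_files(input_dir, out_dir, encoding="cp1250"):
    """Pickle the stripped lines of every regular file in input_dir.

    Hypothetical helper: one output file is written per input file.
    """
    written = []
    for filename in sorted(os.listdir(input_dir)):
        in_path = os.path.join(input_dir, filename)
        # os.listdir also returns subdirectories, so skip anything
        # that is not a regular file.
        if not os.path.isfile(in_path):
            continue
        with io.open(in_path, "r", encoding=encoding) as src:
            ocr_results = [line.strip() for line in src]
        # The write happens inside the loop, once per input file.
        out_path = os.path.join(out_dir, "outfile" + filename)
        with open(out_path, "wb") as fp:
            pickle.dump(ocr_results, fp)
        written.append(out_path)
    return written
```

Building paths with os.path.join instead of a hard-coded "\\" keeps the sketch independent of the operating system.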
Related
I'm trying to write a script that will unzip each zipped folder (each contains multiple .txt and .csv files) and search only the .csv files for a string. If a .csv file contains that string, the script should copy the entire zipped folder to a new folder; if not, it should move on to the next zipped folder. I have several scripts that each do part of this, but I can't piece them together. I am a beginner in Python, so this script looks complicated to me.
This script prints the names of the files in the zipped folder. My next step is to search the .csv files it contains for the string PROGRAM, but I don't know how to code that. I think it goes at the end of this code, since that is where the loop runs.
import os
import pandas as pd
import zipfile
curDir = os.getcwd()
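# '\n' inside a normal string literal is a newline, so build the path with os.path.join
zf = zipfile.ZipFile(os.path.join(curDir, 'namedfile.zip'))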
text_files = zf.infolist()
list_ = []
print ("Uncompressing and reading data... ")
for text_file in text_files:
    print(text_file.filename)
I wrote this script separately; it searches for the string PROGRAM in a folder that contains .csv files:
import os
from pathlib import Path
#Searches the .csv files within the "AllCSVFiles"
#folder for the string "PROGRAM"
search_path = "./AllCSVFiles"
file_type = ".csv"
search_str = "PROGRAM"
if not (search_path.endswith("/") or search_path.endswith("\\") ):
    search_path = search_path + "/"
if not os.path.exists(search_path):
    search_path = "./"
for fname in os.listdir(path=search_path):
    if fname.endswith(file_type):
        fo = open(search_path + fname)
        line = fo.readline()
        line_no = 1
        while line != '' :
            index = line.find(search_str)
            if ( index != -1) :
                print(fname, "[", line_no, ",", index, "] ", sep="")
            line = fo.readline()
            line_no += 1
        fo.close()
Is there an easier way to work this code?
I think the first thing is to make sure you know the structure of the solution.
Reading your description, I'd say it's this:
# Create empty list, for marked zip files
# Iterate over zip files
#     Unzip
#     Iterate over files
#         If file ends in .csv
#             If file contains SEARCH_STR
#                 Mark this zip file to be copied
#                 Stop searching this zip file
# Iterate marked zip files
#     Copy zip file to DEST_DIR
If that is the structure, is this enough to help you see where to put your code?
After that, you can clean up your search for search_str in file quite a bit:
with open(search_path + fname) as csv_file:
    line_no = 0
    for line in csv_file:
        line_no += 1
        if search_str in line:
            search_index = line.index(search_str)
            print(f'{fname}[{line_no},{search_index}]')
            # Mark the zip file this csv_file is in
            # figure out how to stop searching this zip file
for line in csv_file: file objects opened in text mode can be iterated over directly, one line at a time.
if search_str in line: if you don't need the exact position of search_str within the line, simply test for membership: is search_str in the string line?
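Putting the outline and the membership test together, one possible sketch looks like this. The function names zip_contains and copy_matching_zips, the src_dir and dest_dir parameters, and the UTF-8 default encoding are all assumptions for illustration. The sketch also reads each .csv member directly from the archive rather than unzipping it to disk, which is a simplification of the "Unzip" step.

```python
import os
import shutil
import zipfile


def zip_contains(zip_path, search_str, file_type=".csv", encoding="utf-8"):
    """Return True if any file_type member of the archive contains search_str."""
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if not info.filename.endswith(file_type):
                continue
            # Read the member straight from the archive, no extraction to disk.
            with zf.open(info) as member:
                for raw_line in member:
                    line = raw_line.decode(encoding, errors="replace")
                    if search_str in line:
                        return True  # stop searching this zip file
    return False


def copy_matching_zips(src_dir, dest_dir, search_str):
    """Copy every .zip in src_dir whose CSV members mention search_str."""
    marked = []
    for name in sorted(os.listdir(src_dir)):
        zip_path = os.path.join(src_dir, name)
        if name.endswith(".zip") and zip_contains(zip_path, search_str):
            marked.append(zip_path)
    os.makedirs(dest_dir, exist_ok=True)
    for zip_path in marked:
        shutil.copy(zip_path, dest_dir)
    return marked
```

Returning from zip_contains on the first hit covers the "stop searching this zip file" step without any extra bookkeeping.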
I'm trying to write a function that traverses a given path and opens/reads all the .txt files therein and returns these as a string or returns a value that I can use to apply text normalization.
Currently my code only returns the first .txt file it finds, except when I use a print(f.read()) statement, then it prints all the files it read.
I would like it to return the contents of all the files.
def readtxt(path):
    import os
    for subdir, dirs, files in os.walk(path):
        for file in files:
            filepath = subdir + os.sep + file
            if file.endswith(".txt"):
                filelist = filepath.split()
                for file in filelist:
                    with open(os.path.join(path, filepath), 'r') as f:
                        lines = (f.read())
                        return lines
readtxt('/Users/path/')
When a return statement runs, the function ends. That's why you only get one file: the function stops as soon as it has read the first one. Instead, you could add each file's contents to a list and return that list after the loops. Start with an empty list before the loops:
found = []
And then, inside of your loops, you simply do:
with open(os.path.join(path, filepath), 'r') as f:
    lines = (f.read())
    found.append(lines) # Append to list instead of returning
so that you can return everything you found using:
return found
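Assembled into one function, the fragments above might look like the following minimal sketch. The name read_txt_files is an assumption; the sketch also drops the filepath.split() step from the question and joins subdir with the file name directly, since os.walk already yields directory paths that start with path.

```python
import os


def read_txt_files(path):
    """Return the contents of every .txt file under path as a list of strings."""
    found = []
    for subdir, dirs, files in os.walk(path):
        for name in files:
            if name.endswith(".txt"):
                # subdir already includes path, so join it with the file name only.
                with open(os.path.join(subdir, name), "r") as f:
                    found.append(f.read())
    return found  # returned once, after both loops have finished
```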
To open all the files in a specific directory (path), I use the following code:
for filename in os.listdir(path): # For each file inside path
    with open(path + filename, 'r') as xml_file:
        #Do some stuff
        pass
However, I want to read the files in the directory starting from a specific position. For instance, if the directory contains the files f1.xml, f2.xml, f3.xml, ..., f10.xml in this order, how can I read all the files starting from f3.xml (and ignore f1.xml and f2.xml)?
Straightforward way
import os
keep = False
first = 'f3.xml'
for filename in os.listdir(path): # For each file inside path
    keep = keep or filename == first
    if keep:
        with open(path + filename, 'r') as xml_file:
            #Do some stuff
            pass
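One caveat: os.listdir returns entries in arbitrary order, so the loop above only skips f1.xml and f2.xml if they happen to be listed first. A hedged variant sorts the names before skipping. The helper names natural_key and files_from are assumptions, as is the idea that the files should be ordered by the number embedded in their names.

```python
import os
import re
from itertools import dropwhile


def natural_key(name):
    """Split 'f10.xml' into ['f', 10, '.xml'] so that f2 sorts before f10."""
    parts = re.split(r"(\d+)", name)
    # re.split with a capturing group puts the digit runs at odd indices.
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


def files_from(path, first):
    """Return the file names in path, naturally sorted, starting at first."""
    names = sorted(os.listdir(path), key=natural_key)
    return list(dropwhile(lambda name: name != first, names))
```

If first is not present in the directory, files_from returns an empty list, which mirrors the behaviour of the keep flag above.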
I have a directory (c:\temp) with some files:
a.txt
b.py
c.html
I need to read all of the files in a directory and output it to a text file. I've got that part handled (I think):
WD = "c:\\temp"
import glob
files = glob.glob('*.*')
with open('dirList.txt', 'w') as in_files:
    for eachfile in files:
        in_files.write(eachfile + '\n')
I need the output to look like:
a|a.txt
b|b.py
c|c.html
I'm not quite sure where to look next.
I'd split the file name by . and take the first part:
for eachfile in files:
    in_files.write('%s|%s\n' % (eachfile.split('.')[0], eachfile))
You have almost solved your problem, and I am not quite sure where you are getting stuck. If all you need to write is the file name (without extension) followed by | and the full file name, then you just need to update your code like this:
import glob
files = glob.glob('*.*')
with open('dirList.txt', 'w') as in_files:
    for eachfile in files:
        file_name_without_extension = eachfile.split(".")[0]
        in_files.write(file_name_without_extension + "|" + eachfile + '\n')
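Two details are worth a hedged sketch. First, split(".")[0] cuts at the first dot, so archive.tar.gz becomes archive, whereas os.path.splitext removes only the last extension. Second, glob.glob('*.*') lists the current working directory, not the WD variable from the question. The function name write_dir_list and its parameters are assumptions for illustration.

```python
import os


def write_dir_list(directory, out_path):
    """Write one 'stem|filename' line per regular file in directory."""
    lines = []
    for name in sorted(os.listdir(directory)):
        # Skip subdirectories; only regular files are listed.
        if not os.path.isfile(os.path.join(directory, name)):
            continue
        stem, _ext = os.path.splitext(name)  # strips only the last extension
        lines.append(stem + "|" + name)
    with open(out_path, "w") as out_file:
        for line in lines:
            out_file.write(line + "\n")
    return lines
```

Which behaviour is wanted for names with several dots is a design choice; the split-based answers above and this sketch differ only in that case.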
I have written a function that finds all of the version.php files in a path. I am trying to take the output of that function and find a line from that file. The function that finds the files is:
def find_file():
    for root, folders, files in os.walk(acctPath):
        for file in files:
            if file == 'version.php':
                print os.path.join(root,file)
find_file()
There are several version.php files in the path and I would like to return a string from each of those files.
Edit:
Thank you for the suggestions. My implementation of the code didn't fit my need, but I was able to figure it out by creating a list and passing each item to the second part. This may not be the best way to do it; I've only been doing Python for a few days.
def cmsoutput():
    fileList = []
    for root, folders, files in os.walk(acctPath):
        for file in files:
            if file == 'version.php':
                fileList.append(os.path.join(root,file))
    for path in fileList:
        with open(path) as f:
            for line in f:
                if line.startswith("$wp_version ="):
                    version_number = line[15:20]
                    inst_path = re.sub('wp-includes/version.php', '', path)
                    version_number = re.sub('\';', '', version_number)
                    print inst_path + " = " + version_number
cmsoutput()
Since you want to use the output of your function, it has to return something; printing the result is not enough. Assuming everything else works, the function has to be modified slightly as follows (note that it now returns only the first version.php it finds):
import os
def find_file():
    for root, folders, files in os.walk(acctPath):
        for file in files:
            if file == 'version.php':
                return os.path.join(root,file)
foundfile = find_file()
Now the variable foundfile contains the path of the file we want to look at. Looking for a string in the file can then be done like so:
with open(foundfile, 'r') as f:
    content = f.readlines()
for lines in content:
    if '$wp_version =' in lines:
        print(lines)
Or in function version:
def find_in_file(string_to_find, file_to_search):
    with open(file_to_search, 'r') as f:
        content = f.readlines()
    for lines in content:
        if string_to_find in lines:
            return lines
# which you can call like this:
find_in_file("$wp_version =", find_file())
Note that the function version of the code above terminates as soon as it finds one instance of the string you are looking for. If you want to get them all, it has to be modified.
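One way to make that modification is sketched below, under a few assumptions: find_files takes the top directory as a parameter instead of reading the global acctPath, the helper find_in_all is hypothetical, and matching lines are returned with their trailing newline removed.

```python
import os


def find_files(top, target="version.php"):
    """Yield the path of every file named target below top."""
    for root, _folders, files in os.walk(top):
        for name in files:
            if name == target:
                yield os.path.join(root, name)


def find_in_file(string_to_find, file_to_search):
    """Return every line of file_to_search that contains string_to_find."""
    with open(file_to_search, "r") as f:
        return [line.rstrip("\n") for line in f if string_to_find in line]


def find_in_all(top, string_to_find, target="version.php"):
    """Map each matching file path to the lines that contain string_to_find."""
    return {path: find_in_file(string_to_find, path)
            for path in find_files(top, target)}
```

Using yield instead of return lets find_files keep walking after the first hit, so every version.php below top is visited.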