Creating a .txt file of file directories - python

I am trying to create a .txt file of the file directories at a location, remove the prefixes from those lines, and save the text file.
I use os.walk to build a list of directories at a location and write it into a .txt file. That part works: I always get the text file of the directories.
The part in the next chunk of code that removes the prefixes from those directory lines doesn't work. It creates its own .txt file (as it is supposed to), but it is always empty.
If there is a solution that does all of this in one .txt file and one block of code, that would be even better!
Here is what I have so far; I'm using dummy directories for privacy's sake.
import os
from datetime import datetime

# this is to create a filename with the timestamp_directory_list for a .txt file
now = datetime.now()
filename = datetime.now().strftime("%Y_%m_%d_%H_%M_%S_directory_list.txt")

# uses os module to walk the directories and files
# within a given location, then writes it line by line to a .txt file
with open(filename, "w") as directory_list:
    for path, subdirs, files in os.walk(r"C:/Users"):
        for filenameX in files:
            f = os.path.join(path)
            directory_list.write(str(f) + os.linesep)

# Open up .txt file, read a line, trim the prefix, then save it
# this is to create a filename with the timestamp_directory_list for a .txt file
trim = datetime.now().strftime("%Y_%m_%d_%H_%M_%S_trimmed_directories.txt")

def remove_prefix(text, prefix):
    # Remove prefix from supplied text string
    if prefix in text:
        return text[len(prefix):]
    return text

with open(filename, "r") as trim_input, \
     open(trim, "a") as trim_output:
    for line in trim_input:
        print line
        if "C" in line:
            print line
            trim_output = remove_prefix(trim_input, 'C')
            trim_output.write(line + os.linesep)

You've mixed up the variable names; actually, I'd expect that if run, it would raise some exceptions:
you use trim_output for both the output file and the trimmed line
you call remove_prefix on the input file object, not on the line
you get the trimmed line (overriding, I think, the output file reference), but you write the untrimmed line to the output file
Your code should be something like:
with open(filename, "r") as trim_input, \
     open(trim, "a") as trim_output:
    for line in trim_input:
        print line
        if "C" in line:
            # this if is a bit useless, since there is another if inside remove_prefix;
            # it also skips all the lines without the prefix
            print line
            trimmed_line = remove_prefix(line, 'C')
            trim_output.write(trimmed_line + os.linesep)
Later edit:
In order to have the code behave as stated in the initial description, the "if" should not be present and the code under it should be unindented by one level.
The remove_prefix function is also flawed:
def remove_prefix(text, prefix):
    # Remove prefix from supplied text string
    if prefix in text:
        # if prefix is "C", this is true for "Ctest" and also for "testC"
        return text[len(prefix):]  # but it removes the first chars
    return text
It should be:
def remove_prefix(text, prefix):
    # Remove prefix from supplied text string
    if text and text.startswith(prefix):
        return text[len(prefix):]
    return text
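For the "one block of code and one .txt file" version asked about in the question, here is a minimal sketch of the idea; it reuses the corrected remove_prefix above, and the "C:" prefix plus writing one trimmed directory path per line are assumptions:

import os
from datetime import datetime

prefix = "C:"  # assumed prefix to strip; adjust to whatever should be removed
outname = datetime.now().strftime("%Y_%m_%d_%H_%M_%S_trimmed_directories.txt")

with open(outname, "w") as out:
    # walk the tree and write each directory path, already trimmed, in one pass
    for path, subdirs, files in os.walk(r"C:/Users"):
        out.write(remove_prefix(path, prefix) + "\n")

Note that "\n" is written rather than os.linesep: the file is opened in text mode, so Python already translates "\n" to the platform line ending, and writing os.linesep on Windows would produce "\r\r\n".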

Related

Search for a string in multiple .csv files from multiple zipped folders

I'm trying to write a script that will unzip all files in a zipped folder containing multiple .txt and .csv files, search only the .csv files for a string, and if a file contains that string, copy the entire zipped folder to a new folder; if it doesn't, move on to the next zipped folder. I have several scripts that do parts of this, but I can't piece them together. I am a beginner in Python, so this script looks complicated to me.
This script prints the files in the zipped folder. My next step is to search the .csv files it contains for the string PROGRAM, but I don't know how to code that; I'm thinking it goes at the end of this code, since it looks like it's running through a loop.
import os
import pandas as pd
import zipfile

curDir = os.getcwd()
zf = zipfile.ZipFile(curDir + '\\namedfile.zip')
text_files = zf.infolist()
list_ = []

print("Uncompressing and reading data... ")
for text_file in text_files:
    print(text_file.filename)
I wrote this script separately; it searches for the string PROGRAM in a folder that contains .csv files:
import os
from pathlib import Path

# Searches the .csv files within the "AllCSVFiles"
# folder for the string "GBSD"
search_path = "./AllCSVFiles"
file_type = ".csv"
search_str = "PROGRAM"

if not (search_path.endswith("/") or search_path.endswith("\\")):
    search_path = search_path + "/"
if not os.path.exists(search_path):
    search_path = "."

for fname in os.listdir(path=search_path):
    if fname.endswith(file_type):
        fo = open(search_path + fname)
        line = fo.readline()
        line_no = 1
        while line != '':
            index = line.find(search_str)
            if index != -1:
                print(fname, "[", line_no, ",", index, "] ", sep="")
            line = fo.readline()
            line_no += 1
        fo.close()
Is there an easier way to write this code?
I think the first thing is to make sure you know the structure of the solution.
Reading your description, I'd say it's this:
# Create empty list, for marked zip files
# Iterate over zip files
#     Unzip
#     Iterate over files
#         If file ends in .csv
#             If file contains SEARCH_STR
#                 Mark this zip file to be copied
#                 Stop searching this zip file
# Iterate marked zip files
#     Copy zip file to DEST_DIR
If that is the structure, is this enough to help you see where to put your code?
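To make the structure concrete, here is a minimal sketch of it, assuming the zip archives sit in the current directory and that the destination folder already exists; SEARCH_STR and DEST_DIR are placeholder names taken from the outline:

import glob
import io
import shutil
import zipfile

SEARCH_STR = "PROGRAM"
DEST_DIR = "matched_zips"  # assumed destination folder; it must already exist

marked = []                                        # zip files to be copied
for zip_path in glob.glob("*.zip"):                # iterate over zip files
    with zipfile.ZipFile(zip_path) as zf:
        found = False
        for name in zf.namelist():                 # iterate over files in the archive
            if not name.endswith(".csv"):          # only look at .csv members
                continue
            # read the member as text without extracting it to disk
            with io.TextIOWrapper(zf.open(name), encoding="utf-8", errors="replace") as member:
                if any(SEARCH_STR in line for line in member):
                    found = True
                    break                          # stop searching this zip file
        if found:
            marked.append(zip_path)                # mark this zip file to be copied

for zip_path in marked:                            # copy the marked zip files
    shutil.copy(zip_path, DEST_DIR)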
After that, you can clean up your search for search_str in file quite a bit:
with open(search_path + fname) as csv_file:
    line_no = 0
    for line in csv_file:
        line_no += 1
        if search_str in line:
            search_index = line.index(search_str)
            print(f'{fname}[{line_no},{search_index}]')
            # Mark the zip file this csv_file is in
            # figure out how to stop searching this zip file
for line in csv_file: text files opened in Python have a built-in mechanism for iterating over their lines.
if search_str in line: if you don't need to know exactly where search_str occurs in the line, simply test for membership: is search_str in the string line?
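As a side note, enumerate can keep track of the line number for you and avoid the manual counter:

with open(search_path + fname) as csv_file:
    for line_no, line in enumerate(csv_file, start=1):
        if search_str in line:
            print(f'{fname}[{line_no},{line.index(search_str)}]')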

Find files whose names are stored in a single text file

Is there any way for me to read, using Python, the files whose names I saved inside a text file?
For example, I have a file called filenames.txt. The content of the file is the names of other files, such as:
/home/ikhwan/acespc.c
/home/ikhwan/trloc.cpp
/home/ikhwan/Makefile.sh
/home/ikhwan/Readme.txt
What I want to do is have a Python script that changes the header of each of these files. filenames.txt acts as a manifest, so whenever I run the script it changes only the selected files. The reason is that I have many files spread across a directory and its subdirectories, and I want Python to read only the files listed in filenames.txt and change only those. In the future, if I want to run the script on other files, I can just add or replace filenames in filenames.txt.
So the flow of the script will be as follows:
Run script --> script searches for the filenames inside filenames.txt --> script adds or changes the header of each file.
Currently I use os.walk, but it searches all directories and subdirectories. Here is my current function:
def read_file(file):
    skip = 0
    headStart = None
    headEnd = None
    yearsLine = None
    haveLicense = False
    extension = os.path.splitext(file)[1]
    logging.debug("File extension is %s", extension)
    type = ext2type.get(extension)
    logging.debug("Type for this file is %s", type)
    if not type:
        return None
    settings = typeSettings.get(type)
    with open(file, 'r') as f:
        lines = f.readlines()
You don't need to walk the file system if you already have your file paths listed in filenames.txt; just open it, read it line by line, and then process each file path from it, e.g.:
# this is your method that will be called with each file path from the filenames.txt
def process_file(path):
    # do whatever you want with `path` in terms of processing
    # let's just print it to STDOUT as an example
    with open(path, "r") as f:
        print(f.read())

with open("filenames.txt", "r") as f:  # open filenames.txt for reading
    for line in f:  # read filenames.txt line by line
        process_file(line.rstrip())  # send the path stored on the line to process_file()
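If the processing you need is the header change described in the question, process_file is the place to plug it in. A minimal sketch, using a hypothetical add_header helper and header text (both are illustrative assumptions, not part of your read_file):

HEADER = "/* my project header */\n"  # hypothetical header text

def add_header(path):
    # read the file, prepend the header if it is not already there, write it back
    with open(path, "r") as f:
        content = f.read()
    if not content.startswith(HEADER):
        with open(path, "w") as f:
            f.write(HEADER + content)

with open("filenames.txt", "r") as f:
    for line in f:
        path = line.strip()
        if path:  # skip blank lines
            add_header(path)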

Find files and then find a string in those files

I have written a function that finds all of the version.php files in a path. I am trying to take the output of that function and find a particular line in each of those files. The function that finds the files is:
def find_file():
    for root, folders, files in os.walk(acctPath):
        for file in files:
            if file == 'version.php':
                print os.path.join(root, file)

find_file()
There are several version.php files in the path and I would like to return a string from each of those files.
Edit:
Thank you for the suggestions; my implementation of the code didn't fit my need. I was able to figure it out by creating a list and passing each item to the second part. This may not be the best way to do it; I've only been doing Python for a few days.
def cmsoutput():
    fileList = []
    for root, folders, files in os.walk(acctPath):
        for file in files:
            if file == 'version.php':
                fileList.append(os.path.join(root, file))
    for path in fileList:
        with open(path) as f:
            for line in f:
                if line.startswith("$wp_version ="):
                    version_number = line[15:20]
                    inst_path = re.sub('wp-includes/version.php', '', path)
                    version_number = re.sub('\';', '', version_number)
                    print inst_path + " = " + version_number

cmsoutput()
Since you want to use the output of your function, you have to return something; printing it does not cut it. Assuming everything works, it has to be slightly modified as follows:
import os

def find_file():
    for root, folders, files in os.walk(acctPath):
        for file in files:
            if file == 'version.php':
                return os.path.join(root, file)

foundfile = find_file()
foundfile = find_file()
Now the variable foundfile contains the path of the file we want to look at. Looking for a string in the file can then be done like so:
with open(foundfile, 'r') as f:
    content = f.readlines()
for lines in content:
    if '$wp_version =' in lines:
        print(lines)
Or as a function:
def find_in_file(string_to_find, file_to_search):
    with open(file_to_search, 'r') as f:
        content = f.readlines()
    for lines in content:
        if string_to_find in lines:
            return lines

# which you can call like this:
find_in_file("$wp_version =", find_file())
Note that the function version of the code above will terminate as soon as it finds one instance of the string you are looking for. If you want to get them all, it has to be modified.
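Since there are several version.php files in the path, one possible modification is to turn both pieces into generators that yield every match instead of returning the first one. A minimal sketch, assuming acctPath is defined as in the question:

import os

def find_files(name, top):
    # yield the path of every file called `name` under `top`
    for root, folders, files in os.walk(top):
        if name in files:
            yield os.path.join(root, name)

def find_all_in_file(string_to_find, file_to_search):
    # yield every line that contains `string_to_find`
    with open(file_to_search, 'r') as f:
        for line in f:
            if string_to_find in line:
                yield line

for path in find_files('version.php', acctPath):
    for line in find_all_in_file("$wp_version =", path):
        print(path + ": " + line.rstrip())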

Deleting a line in multiple files in python

I am a beginner in Python and I am practicing at the moment.
What I want to do is a script that takes a line I enter with raw_input, searches for that line in multiple files, and deletes it.
Something like this but for more files:
word = raw_input("word: ")

f = open("file.txt", "r")
lines = f.readlines()
f.close()

f = open("file.txt", "w")
for line in lines:
    if line != word + "\n":
        f.write(line)
f.close()
It's an easy task but it's actually hard for me since I can't find an example anywhere.
Instead of reading the entire file into memory, you should iterate through the file and write the lines that are OK to a temporary file. Once you've gone through the entire file, delete it and rename the temporary file to the name of the original file. This is a classic pattern that you'll most likely frequently encounter in the future.
I'd also recommend breaking this down into functions. You should first write the code for removing all occurrences of a line from only a single file. Then you can write another function that simply iterates through a list of filenames and calls the first function (that operates on individual files).
To get the filenames of all the files in the directory, use os.walk. If you do not want to apply this function to all of the files in the directory, you can set the files variable yourself to store whatever configuration of filenames you want.
import os

def remove_line_from_file(filename, line_to_remove, dirpath=''):
    """Remove all occurrences of `line_to_remove` from the file
    with name `filename`, contained at path `dirpath`.
    If `dirpath` is omitted, relative paths are used."""
    filename = os.path.join(dirpath, filename)
    temp_path = os.path.join(dirpath, 'temp.txt')
    with open(filename, 'r') as f_read, open(temp_path, 'w') as temp:
        for line in f_read:
            if line.strip() == line_to_remove:
                continue
            temp.write(line)
    os.remove(filename)
    os.rename(temp_path, filename)

def main():
    """Driver function"""
    directory = raw_input('directory: ')
    word = raw_input('word: ')
    dirpath, _, files = next(os.walk(directory))
    for f in files:
        remove_line_from_file(f, word, dirpath)

if __name__ == '__main__':
    main()
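Note that next(os.walk(directory)) only yields the top-level directory, which matches the tests below, where all of the files sit in one folder. If files in subdirectories should be processed as well, the driver could walk the whole tree instead; a minimal sketch of that variant:

def main_recursive():
    """Driver that also processes files in subdirectories."""
    directory = raw_input('directory: ')
    word = raw_input('word: ')
    for dirpath, _, files in os.walk(directory):
        for f in files:
            remove_line_from_file(f, word, dirpath)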
TESTS
All of these files are in the same directory. For each file, below is what it looked like before running the command and what it looks like afterwards. The "word" I input was Remove this line.

a.txt before:
Foo
Remove this line
Bar
Hello
Remove this line
Remove this line
World

a.txt after:
Foo
Bar
Hello
World

b.txt before and after (unchanged):
Nothing
In
This File
Should
Be Changed

c.txt before:
Remove this line

c.txt after: (empty)

d.txt before:
The last line will be removed
Remove this line

d.txt after:
The last line will be removed
something like this should work:
import os

source = '/some/dir/path/'
for root, dirs, filenames in os.walk(source):
    for f in filenames:
        this_file = open(os.path.join(root, f), "r")
        this_files_data = this_file.readlines()
        this_file.close()
        # rewrite the file with all lines except the one you don't want
        # (note: lines from readlines() keep their trailing newline)
        this_file = open(os.path.join(root, f), "w")
        for line in this_files_data:
            if line.rstrip("\n") != "YOUR UNDESIRED LINE HERE":
                this_file.write(line)
        this_file.close()

Opening all CSV files in a directory and reading each line in Python

Hi, I am looking for some help with a problem I am having. I want to search the directory that my data is in (shown below) for only certain file types. Below is my code, but it is not quite functioning.
The current output is one of two results: either it prints just the first line of the file, or it just prints a blank result.
OK, so here is what I want to do. I want to search the listed directory for only .csv files. Then I want the loop to read each file line by line, print each line in the file, and repeat this for the rest of the .csv files.
Can anyone please show me how to edit the code below to search only for CSV files, print each line that is in the file, and then repeat for the next CSV file until all the CSV files are found and opened? Is this possible?
import os

rootdir = 'C:\Documents and Settings\Guest\My Documents\Code'

def doWhatYouWant(line):
    print line

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        f = open(file, 'r')
        lines = f.readlines()
        f.close()
        f = open(file, 'wU')
        for lines in lines:
            newline = doWhatYouWant(line)
            f.write(newline)
        f.close
Thanks for your help.
The code below works. I have commented the modifications inline.
import os

rootdir = 'C:\\Documents and Settings\\Guest\\My Documents\\Code'
# use '\\' in a normal string (or a raw string) when you mean a literal '\';
# spaces in the path do not need to be escaped

def doWhatYouWant(line):
    print line
    return line
    # let the function return, not only print, so the value can be used below

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        path = os.path.join(subdir, file)  # build the full path to the file
        f = open(path, 'r')
        lines = f.readlines()
        f.close()
        f = open(path, 'w')  # universal-newline mode ('U') can only be used with 'r'
        for line in lines:
            newline = doWhatYouWant(line)
            f.write(newline)
        f.close()
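The question also asks to restrict the search to .csv files, which the loop above does not do. A minimal sketch of that filter, printing each line of every .csv file under rootdir without rewriting anything:

import os

rootdir = 'C:\\Documents and Settings\\Guest\\My Documents\\Code'

for subdir, dirs, files in os.walk(rootdir):
    for name in files:
        if not name.lower().endswith('.csv'):  # keep only CSV files
            continue
        path = os.path.join(subdir, name)
        with open(path, 'r') as f:
            for line in f:
                print(line.rstrip('\n'))  # print each line without its trailing newline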
