Multi-line string concatenation and writing to a text file - Python

I am using Python's os module to do the following:
Ask the user for a path.
Print all the directories and files included in it.
Save the information in a text file.
This is my code:
import os
text = 'List:'
def print_tree(dir_path,text1):
    for name in os.listdir(dir_path):
        full_path = os.path.join(dir_path, name)
        x = name.find('.')
        if x!= -1:
            print name #replace with full path if needed
            text1 = text1 + name
        else:
            print '------------------------------------'
            text1 = text1 + '------------------------------------'
            print name
            text1 = text1 + name
        if os.path.isdir(full_path):
            os.path.split(name)
            print '------------------------------------'
            text1 = text1 + '------------------------------------'
            print_tree(full_path,text1)
path = raw_input('give me a dir path')
print_tree(path,text)
myfile = open('text.txt','w')
myfile.write(text)
I have two problems. First, although no error is raised, the only thing in the text file after running this is 'List:'. Second, I don't know how to use string concatenation to put each file name on its own line. What am I missing? How can I accomplish this?

Strings are immutable in Python, so + and += never modify a string in place. text1 = text1 + 'blah' creates a new string and rebinds the local name text1 to it; the string that the variable text refers to outside the function is unchanged. You can concatenate all you want inside the function, but unless you return the result, the caller never sees it. The solution is to build up the string and return it, and to assign the result of each recursive call back to text1 as well:
import os
text = 'List:' + os.linesep
def print_tree(dir_path,text1):
    for name in os.listdir(dir_path):
        full_path = os.path.join(dir_path, name)
        x = name.find('.')
        if x!= -1:
            print name #replace with full path if needed
            text1 = text1 + name + os.linesep
        else:
            print '------------------------------------'
            text1 = text1 + '------------------------------------' + os.linesep
            print name
            text1 = text1 + name + os.linesep
        if os.path.isdir(full_path):
            os.path.split(name)
            print '------------------------------------'
            text1 = text1 + '------------------------------------' + os.linesep
            text1 = print_tree(full_path,text1)
    return text1
path = raw_input('give me a dir path')
text = print_tree(path,text)
myfile = open('text.txt','w')
myfile.write(text)
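I have also taken the liberty of appending os.linesep to your concatenated strings. print adds a newline by default, so this keeps the file looking like the console output. One caveat: a file opened in text mode already translates '\n' to the platform's line ending when writing, so on Windows os.linesep ends up as '\r\r\n' in the file; appending a plain '\n' avoids that.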
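For readers on Python 3, the same "build it up and return it" idea can be sketched with a list of lines rather than repeated string concatenation. This is only an illustration under a few assumptions: the helper names list_tree and save_listing are made up here, entries are sorted for a stable order, and directories are detected with os.path.isdir instead of looking for a dot in the name.

```python
import os


def list_tree(dir_path):
    """Return the names found under dir_path as a list of lines (hypothetical helper)."""
    lines = []
    for name in sorted(os.listdir(dir_path)):
        full_path = os.path.join(dir_path, name)
        if os.path.isdir(full_path):
            # Frame directory names with separators, then recurse into them.
            lines.append('-' * 36)
            lines.append(name)
            lines.append('-' * 36)
            # The recursive call returns its own lines; we must keep them.
            lines.extend(list_tree(full_path))
        else:
            lines.append(name)
    return lines


def save_listing(dir_path, out_path='text.txt'):
    """Write 'List:' followed by one name per line to out_path (hypothetical helper)."""
    lines = ['List:'] + list_tree(dir_path)
    # In text mode, '\n' is translated to the platform line ending on write.
    with open(out_path, 'w') as out_file:
        out_file.write('\n'.join(lines) + '\n')
    return lines
```

Calling save_listing(path) after reading path from the user would replace the last four lines of the script above.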

Related

How to have all possibilities of path with specific beginning?

I have to check a file at a location built from a variable:
variable = "40014ee0aee34570"
os.path.realpath('/dev/disk/by-id/wwn-0x{0}'.format(variable))
I need to check whether there are also 40014ee0aee34570-part1, 40014ee0aee34570-part2, and so on.
For now I can do it like this:
os.path.realpath('/dev/disk/by-id/wwn-0x{0}{1}'.format(variable, '-part1'))
But how can I check for every possible number after "part" programmatically?
Thank you
I would suggest using the glob module.
import glob
import os
variable = "40014ee0aee34570"
# file wildcard, use * at the end to get all suffixes
file_wildcard = "/dev/disk/by-id/wwn-0x{0}*".format(variable)
possible_file_paths = glob.glob(file_wildcard)
for file_path in possible_file_paths:
    print(os.path.realpath(file_path))
You can simply concatenate strings with +.
Example:
string1 = "I am "
string2 = "foo"
string3 = string1 + string2
print(string3)
OUT[1]:
>> I am foo
Thus, you can use it to build the path to your file programmatically from all of its components (root path, variable name, suffixes, numbers, etc.):
import os
path = r"/dev/disk/by-id/"
file_prefix = "wwn-0x"
variable = "40014ee0aee34570"
file_suffix = "-part"
for number in range(0,10):
    file = os.path.join(path , file_prefix + variable + file_suffix + str(number))
    if os.path.exists(file):
        print(f"{file} exists")
    else :
        print(f"no file with that part number {number}")
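The two approaches can be combined: glob finds every candidate without guessing an upper bound for the number, and a regular expression keeps only names that really end in -part followed by digits. The following is a rough sketch, not a definitive implementation; the function name find_partitions and its base_dir parameter are assumptions introduced for illustration.

```python
import glob
import os
import re


def find_partitions(base_dir, wwn):
    """Return paths named wwn-0x<wwn>-part<N> in base_dir, ordered by N (hypothetical helper)."""
    # glob.escape protects against wildcard characters inside the identifier.
    pattern = os.path.join(base_dir, "wwn-0x{0}-part*".format(glob.escape(wwn)))
    part_re = re.compile(r"-part(\d+)$")
    matches = []
    for path in glob.glob(pattern):
        found = part_re.search(path)
        if found:  # skip names where "-part" is not followed only by digits
            matches.append((int(found.group(1)), path))
    # Sort numerically so that part10 comes after part2.
    return [path for _, path in sorted(matches)]
```

For the case in the question, each path returned by find_partitions("/dev/disk/by-id", "40014ee0aee34570") could then be passed to os.path.realpath.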

Name definition error when defining variables in function

I have a bit of code that accepts a .csv with a list of filenames as an input, then breaks the filename down into its component parts and re-orders them along with some additional characters.
Input Example:
3006419_3006420_ENG_FRONT.jpg
Output Example:
;E3006419_3006420_FRONT_Image_Container;
However, I'd like to make the portion of the for loop that splits up the filename into a function that I can call elsewhere, so that I can re-use it in a second for loop that outputs in a different format. When I try to define a function, though, it seems I have a scoping error with my variables and can't use them in my output.write statement.
Working Code
from csv import reader
import sys
if len(sys.argv) != 2:
    print('USAGE ERROR:\nRun like "python <script.py> <input file.csv>"') #error message if code is not run with correct number of arguments
    exit()
file = open(sys.argv[1]) #open input file
output = open('output.impex','w+') #define output impex file
for line in file:
    nameAndExtension = line.split('.') #split file into filename and file extension
    name = nameAndExtension[0]
    extension = nameAndExtension[1].replace('\n','') #save file extension as variable extension and remove \n
    elements = name.split('_') #split filename into constituent elements. Filenames are formatted as PARENTSKU_CHILDSKU_LANG_ANGLE.extension, eg '3006419_3006420_ENG_FRONT.jpg'
    parentSKU = elements[0]
    childSKU = elements[1]
    lang = elements[2]
    angle = elements[3]
    output.write(";E" + parentSKU + "_" + childSKU + "_" + angle + '_Image_Container;\n')
Non-Working Code:
from csv import reader
import sys
if len(sys.argv) != 2:
    print('USAGE ERROR:\nRun like "python <script.py> <input file.csv>"') #error message if code is not run with correct number of arguments
    exit()
file = open(sys.argv[1]) #open input file
output = open('output.impex','w+') #define output impex file
def lineSplitting(x):
    nameAndExtension = x.split('.') #split file into filename and file extension
    name = nameAndExtension[0]
    extension = nameAndExtension[1].replace('\n','') #save file extension as variable extension and remove \n
    elements = name.split('_') #split filename into constituent elements. Filenames are formatted as PARENTSKU_CHILDSKU_LANG_ANGLE.extension, eg '3006419_3006420_ENG_FRONT.jpg'
    parentSKU = elements[0]
    childSKU = elements[1]
    lang = elements[2]
    angle = elements[3]
for line in file:
    lineSplitting(line)
    output.write(";E" + parentSKU + "_" + childSKU + "_" + angle + '_Image_Container;\n')
I get "NameError: name 'parentSKU' is not defined", I think because of the variable scope, but I don't know what I need to do to make the variables usable in the for loop. What do I need to do to turn all that splitting and variable definition into a function?
Names assigned inside a function are local to it, so you should return the values from the function:
def lineSplitting(x):
    nameAndExtension = x.split('.') #split file into filename and file extension
    name = nameAndExtension[0]
    extension = nameAndExtension[1].replace('\n','') #save file extension as variable extension and remove \n
    elements = name.split('_') #split filename into constituent elements. Filenames are formatted as PARENTSKU_CHILDSKU_LANG_ANGLE.extension, eg '3006419_3006420_ENG_FRONT.jpg'
    parentSKU = elements[0]
    childSKU = elements[1]
    lang = elements[2]
    angle = elements[3]
    return parentSKU,childSKU,angle
Then unpack the returned tuple where you call the function:
for line in file:
    parentSKU,childSKU,angle =lineSplitting(line)
    output.write(";E" + parentSKU + "_" + childSKU + "_" + angle + '_Image_Container;\n')
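Since the goal is to reuse the parsing in a second loop with a different output format, it may help to return all of the parts under names rather than a fixed three-element tuple. The sketch below assumes the PARENTSKU_CHILDSKU_LANG_ANGLE.extension layout described in the question; ImageName, split_image_name and container_line are hypothetical names chosen for this example.

```python
import os
from collections import namedtuple

# Hypothetical record type holding every component of one filename.
ImageName = namedtuple("ImageName", "parent_sku child_sku lang angle extension")


def split_image_name(line):
    """Parse 'PARENTSKU_CHILDSKU_LANG_ANGLE.ext' (optionally newline-terminated)."""
    name, extension = os.path.splitext(line.strip())
    # Raises ValueError if the name does not have exactly four parts.
    parent_sku, child_sku, lang, angle = name.split("_")
    return ImageName(parent_sku, child_sku, lang, angle, extension.lstrip("."))


def container_line(parts):
    """Format one parsed name in the ';E..._Image_Container;' style."""
    return ";E{0}_{1}_{2}_Image_Container;\n".format(
        parts.parent_sku, parts.child_sku, parts.angle
    )
```

Each loop can then call split_image_name(line) once and pass the result to whichever formatting function it needs.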

Find files in a directory containing desired string in Python

I'm trying to find a string in files contained within a directory. I have a string like banana that I know exists in a few of the files.
import os
import sys
user_input = input("What is the name of you directory?")
directory = os.listdir(user_input)
searchString = input("What word are you trying to find?")
for fname in directory: # change directory as needed
    if searchString in fname:
        f = open(fname,'r')
        print('found string in file %s') %fname
    else:
        print('string not found')
When the program runs, it just outputs string not found for every file. There are three files that contain the word banana, so the program isn't working as it should. Why isn't it finding the string in the files?
You are searching for the string in the file name, not in the file's contents. Open each file and search the text returned by open(filename, 'r').read():
import os
user_input = input('What is the name of your directory')
directory = os.listdir(user_input)
searchstring = input('What word are you trying to find?')
for fname in directory:
    if os.path.isfile(user_input + os.sep + fname):
        # Full path
        f = open(user_input + os.sep + fname, 'r')
        if searchstring in f.read():
            print('found string in file %s' % fname)
        else:
            print('string not found')
        f.close()
We use user_input + os.sep + fname to get full path.
os.listdir gives files and directories names, so we use os.path.isfile to check for files.
Here is another version using the Path module from pathlib instead of os.
def search_in_file(path,searchstring):
    with open(path, 'r') as file:
        if searchstring in file.read():
            print(f' found string in file {path.name}')
        else:
            print('string not found')
from pathlib import Path
user_input = input('What is the name of your directory')
searchstring = input('What word are you trying to find?')
dir_content = sorted(Path(user_input).iterdir())
for path in dir_content:
    if not path.is_dir():
        search_in_file(path, searchstring)
This is my solution for the problem. It comes with the feature of also checking in sub-directories, as well as being able to handle multiple file types. It is also quite easy to add support for other ones. The downside is of course that it's quite chunky code. But let me know what you think.
import os
import docx2txt
from pptx import Presentation
import pdfplumber
def findFiles(strings, dir, subDirs, fileContent, fileExtensions):
    # Finds all the files in 'dir' that contain one string from 'strings'.
    # Additional parameters:
    # 'subDirs': True/False : Look in sub-directories of your folder
    # 'fileContent': True/False :Also look for the strings in the file content of every file
    # 'fileExtensions': True/False : Look for a specific file extension -> 'fileContent' is ignored
    filesInDir = []
    foundFiles = []
    filesFound = 0
    if not subDirs:
        for filename in os.listdir(dir):
            if os.path.isfile(os.path.join(dir, filename).replace("\\", "/")):
                filesInDir.append(os.path.join(dir, filename).replace("\\", "/"))
    else:
        for root, subdirs, files in os.walk(dir):
            for f in files:
                if not os.path.isdir(os.path.join(root, f).replace("\\", "/")):
                    filesInDir.append(os.path.join(root, f).replace("\\", "/"))
    print(filesInDir)
    # Find files that contain the keyword
    if filesInDir:
        for file in filesInDir:
            print("Current file: "+file)
            # Define what is to be searched in
            filename, extension = os.path.splitext(file)
            if fileExtensions:
                fileText = extension
            else:
                fileText = os.path.basename(filename).lower()
                if fileContent:
                    fileText += getFileContent(file).lower()
            # Check for translations
            for string in strings:
                print(string)
                if string in fileText:
                    foundFiles.append(file)
                    filesFound += 1
                    break
    return foundFiles
def getFileContent(filename):
    '''Returns the content of a file of a supported type (list: supportedTypes)'''
    if filename.partition(".")[2] in supportedTypes:
        if filename.endswith(".pdf"):
            content = ""
            with pdfplumber.open(filename) as pdf:
                for x in range(0, len(pdf.pages)):
                    page = pdf.pages[x]
                    content = content + page.extract_text()
            return content
        elif filename.endswith(".txt"):
            with open(filename, 'r') as f:
                content = ""
                lines = f.readlines()
                for x in lines:
                    content = content + x
                f.close()
            return content
        elif filename.endswith(".docx"):
            content = docx2txt.process(filename)
            return content
        elif filename.endswith(".pptx"):
            content = ""
            prs = Presentation(filename)
            for slide in prs.slides:
                for shape in slide.shapes:
                    if hasattr(shape, "text"):
                        content = content+shape.text
            return content
    else:
        return ""
supportedTypes = ["txt", "docx", "pdf", "pptx"]
print(findFiles(strings=["buch"], dir="C:/Users/User/Desktop/", subDirs=True, fileContent=True, fileExtensions=False))
Here is the most simple answer I can give you. You don't need the colors, they are just cool and you may find that you can learn more than one thing in my code :)
import os
from time import sleep
#The colours of the things
class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
# Ask the user to enter string to search
search_path = input("Enter directory path to search : ")
file_type = input("File Type : ")
search_str = input("Enter the search string : ")
# Append a directory separator if not already present
if not (search_path.endswith("/") or search_path.endswith("\\") ):
    search_path = search_path + "/"
# If path does not exist, set search path to current directory
if not os.path.exists(search_path):
    search_path ="."
# Repeat for each file in the directory
for fname in os.listdir(path=search_path):
    # Apply file type filter
    if fname.endswith(file_type):
        # Open file for reading
        fo = open(search_path + fname, 'r')
        # Read the whole file at once
        line = fo.read()
        # Only search files that are not empty
        if line != '' :
            # Search for string in the file contents
            index = line.find(search_str)
            if ( index != -1) :
                print(bcolors.OKGREEN + '[+]' + bcolors.ENDC + ' ', fname, sep="")
                print(' ')
                sleep(0.01)
            else:
                print(bcolors.FAIL + '[-]' + bcolors.ENDC + ' ', fname, ' ', 'does not contain', ' ', search_str, sep="")
                print(" ")
                sleep(0.01)
        # Close the file
        fo.close()
That is it!
I tried the following code for this kind of problem; please have a look.
import os,sys
search_path=input("Put the directory here:")
file_type = input("Enter the file type (for example .txt):")
search_str = input("Enter your string")
# Append a directory separator if not already present
if not (search_path.endswith("/") or search_path.endswith("\\") ):
    search_path = search_path + "/"
# If path does not exist, set search path to current directory
if not os.path.exists(search_path):
    search_path ="."
# Repeat for each file in the directory
for fname in os.listdir(path=search_path):
    # Apply file type filter
    if fname.endswith(file_type):
        # Open file for reading
        fo = open(search_path + fname)
        # Read the first line from the file
        line = fo.readline()
        # Initialize counter for line number
        line_no = 1
        # Loop until EOF
        while line != '' :
            # Search for string in line
            index = line.find(search_str)
            if ( index != -1) :
                print(fname, "[", line_no, ",", index, "] ", line, sep="")
            # Read next line
            line = fo.readline()
            # Increment line counter
            line_no += 1
        # Close the files
        fo.close()
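The answers above share one core step: open each regular file and test whether the search string occurs in its contents. A compact sketch of that step with pathlib follows. It assumes the files are text that can be decoded with the default encoding (undecodable bytes are ignored), and the function name files_containing and its recursive flag are made up for this illustration.

```python
from pathlib import Path


def files_containing(directory, search_string, recursive=False):
    """Return a sorted list of files under directory whose text contains search_string."""
    # "**/*" also descends into sub-directories; "*" stays at the top level.
    pattern = "**/*" if recursive else "*"
    found = []
    for path in sorted(Path(directory).glob(pattern)):
        if not path.is_file():
            continue  # skip directories and other non-regular entries
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file (for example, permission denied)
        if search_string in text:
            found.append(path)
    return found
```

Returning the matches instead of printing inside the loop leaves it to the caller to decide how to report files that were found or not found.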

EPUB 3: how to add the mimetype first in the archive

I'm working on a script to create an EPUB from HTML files, but when I check my EPUB I get the following error: Mimetype entry missing or not the first in archive
The mimetype file is present, but it's not the first file in the EPUB. Any idea how to always put it in first place using Python?
Sorry, I don't have the time right now to give a detailed explanation, but here's a (relatively) simple epub processing program I wrote a while ago that shows how to do that.
epubpad.py
#! /usr/bin/env python
''' Pad the ends of paragraph lines in an epub file with a single space char
    Written by PM 2Ring 2013.05.12
'''
import sys, re, zipfile
def bold(s): return "\x1b[1m%s\x1b[0m" % s
def report(attr, val):
    print "%s '%s'" % (bold(attr + ':'), val)
def fixepub(oldname, newname):
    oldz = zipfile.ZipFile(oldname, 'r')
    nlist = oldz.namelist()
    #print '\n'.join(nlist) + '\n'
    if nlist[0] != 'mimetype':
        print bold('Warning!!!'), "First file is '%s', not 'mimetype'" % nlist[0]
    #get the name of the contents file from the container
    container = 'META-INF/container.xml'
    # container should be in nlist
    s = oldz.read(container)
    p = re.compile(r'full-path="(.*?)"')
    a = p.search(s)
    contents = a.group(1)
    #report("Contents file", contents)
    i = contents.find('/')
    if i>=0:
        dirname = contents[:i+1]
    else:
        #No directory separator in contents name!
        dirname = ''
    report("dirname", dirname)
    s = oldz.read(contents)
    #print s
    p = re.compile(r'<dc:creator.*>(.*)</dc:creator>')
    a = p.search(s)
    creator = a.group(1)
    report("Creator", creator)
    p = re.compile(r'<dc:title>(.*)</dc:title>')
    a = p.search(s)
    title = a.group(1)
    report("Title", title)
    #Find the names of all xhtml & html text files
    p = re.compile(r'\.[x]?htm[l]?')
    htmnames = [i for i in nlist if p.search(i) and i.find('wrap')==-1]
    #Pattern for end of lines that don't need padding
    eolp = re.compile(r'[>}]$')
    newz = zipfile.ZipFile(newname, 'w', zipfile.ZIP_DEFLATED)
    for fname in nlist:
        print fname,
        s = oldz.read(fname)
        if fname == 'mimetype':
            #Write mimetype to disk, then add it to the archive uncompressed
            f = open(fname, 'w')
            f.write(s)
            f.close()
            newz.write(fname, fname, zipfile.ZIP_STORED)
            print ' * stored'
            continue
        if fname in htmnames:
            print ' * text',
            #Pad lines that are (hopefully) inside paragraphs...
            newlines = []
            for line in s.splitlines():
                if len(line)==0 or eolp.search(line):
                    newlines.append(line)
                else:
                    newlines.append(line + ' ')
            s = '\n'.join(newlines)
        newz.writestr(fname, s)
        print
    newz.close()
    oldz.close()
def main():
    oldname = len(sys.argv) > 1 and sys.argv[1]
    if not oldname:
        print 'No filename given!'
        raise SystemExit
    newname = len(sys.argv) > 2 and sys.argv[2]
    if not newname:
        if oldname.rfind('.') == -1:
            newname = oldname + '_P'
        else:
            newname = oldname.replace('.epub', '_P.epub')
        newname = newname.replace(' ', '_')
    print "Processing '%s' to '%s' ..." % (oldname, newname)
    fixepub(oldname, newname)
if __name__ == '__main__':
    main()
FWIW, I wrote this program to process files for my simple e-reader that annoyingly joins paragraphs together if they don't end with white space.
The solution I've found:
Delete the previous mimetype file.
When creating the new archive, write a new mimetype entry before adding anything else: zipFile.writestr("mimetype", "application/epub+zip")
Why it works: the mimetype is the same for every EPUB, "application/epub+zip", so there is no need to reuse the original file.
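Putting the two answers together, a minimal Python 3 sketch might look like the following. It writes the mimetype entry first and stores it uncompressed, which is what EPUB checkers expect, then adds everything else with compression. The function name write_epub and the files mapping (archive name to content) are assumptions made for this example, not part of either answer.

```python
import zipfile


def write_epub(epub_path, files):
    """Create an EPUB archive with 'mimetype' as its first, uncompressed entry.

    files is a hypothetical mapping of archive name -> str or bytes content.
    """
    with zipfile.ZipFile(epub_path, "w", zipfile.ZIP_DEFLATED) as epub:
        # Must be written before any other entry, and stored without compression.
        epub.writestr(
            "mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED
        )
        for arcname, data in files.items():
            if arcname == "mimetype":
                continue  # already written above; avoid a duplicate entry
            epub.writestr(arcname, data)
```

The remaining entries inherit ZIP_DEFLATED from the archive, so only the mimetype entry is stored as-is.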

How do I modify a filepath using the os.path module?

My code
import os.path #gets the module
beginning = input("Enter the file name/path you would like to upperify: ")
inFile = open(beginning, "r")
contents = inFile.read()
moddedContents = contents.upper() #makes the contents of the file all caps
head,tail = os.path.split(beginning) #supposed to split the path
new_new_name = "UPPER" + tail #adds UPPER to the file name
final_name = os.path.join(head + new_new_name) #rejoins the path and new file name
outFile = open(final_name, "w") #creates new file with new capitalized text
outFile.write(moddedContents)
outFile.close()
I'm just trying to change the file name to add UPPER to the beginning to the file name via os.path.split(). Am I doing something wrong?
Change
final_name = os.path.join(head + new_new_name)
to
final_name = head + os.sep + new_new_name
The head returned by os.path.split has no trailing slash (unless it is the root directory). When you join head and new_new_name by concatenating them
head + new_new_name
the missing separator is never added, and os.path.join, given only that single string, has nothing left to join, so the resulting path is wrong:
>>> head, tail = os.path.split('/etc/shadow')
>>> head
'/etc'
>>> tail
'shadow'
>>> head + tail
'/etcshadow'
The solution is to use os.path.join properly:
final_name = os.path.join(head, new_new_name)
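As a small self-contained illustration of the fix, the split-and-rejoin step can be wrapped in a function. The name prefixed_path and its prefix parameter are hypothetical; the sketch also shows that os.path.join copes with an empty head, which is what os.path.split returns for a bare file name.

```python
import os.path


def prefixed_path(path, prefix="UPPER"):
    """Return path with prefix added to the file name, keeping the directory."""
    head, tail = os.path.split(path)
    # join inserts the separator only when head is non-empty.
    return os.path.join(head, prefix + tail)
```

With a bare file name such as notes.txt, head is an empty string, so the result stays relative to the current directory instead of gaining a leading separator.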
