Problem extracting zip files in Python: files with the same name get overwritten

I have been using the following code to extract the files:
import os
import zipfile
dir_name = "path/to/zips"  # placeholder: directory that contains the zip files
extension = ".zip"
for item in os.listdir(dir_name):  # loop through items in dir
    if item.endswith(extension):  # check for ".zip" extension
        file_name = os.path.join(dir_name, item)  # full path (os.path.abspath(item) would resolve against the cwd)
        zip_ref = zipfile.ZipFile(file_name)  # create zipfile object
        zip_ref.extractall(dir_name)  # extract files to dir
        zip_ref.close()  # close file
        os.remove(file_name)  # delete the zip
The problem is that the files inside the different zips have the same names. For example:
Zip 1 contains "File 1" and "File 2".
Zip 2 also contains "File 1" and "File 2".
After extracting, the files from each zip overwrite the ones extracted from the previous zip.
Is there any solution to this?
I expected every file to be extracted, but the files kept getting overwritten.

Create a directory with os.mkdir() named after each zip file, then extract into that new directory with zip_ref.extractall().
For example, for the zip file my_zip.zip:
filename = "my_zip.zip"
folder = os.path.splitext(filename)[0]  # "my_zip"; filename.strip(".zip") would strip the characters . z i p from both ends, not the suffix
os.mkdir(folder)
# rest of the code
zip_ref.extractall(folder)
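Putting the answer together with the loop from the question, a minimal sketch could look like the following. It assumes every archive should get its own subfolder next to it, named after the zip file; `extract_each_zip` and its `delete_archives` flag are hypothetical names, not part of the original code.

```python
import os
import zipfile


def extract_each_zip(dir_name, delete_archives=False):
    """Extract every .zip in dir_name into its own subfolder named after the archive."""
    created = []
    for item in sorted(os.listdir(dir_name)):
        if not item.lower().endswith(".zip"):
            continue
        zip_path = os.path.join(dir_name, item)  # path relative to dir_name, not the cwd
        target = os.path.join(dir_name, os.path.splitext(item)[0])  # "my_zip.zip" -> "my_zip"
        os.makedirs(target, exist_ok=True)
        with zipfile.ZipFile(zip_path) as zf:  # closed automatically, even on errors
            zf.extractall(target)
        if delete_archives:
            os.remove(zip_path)  # only after a successful extraction
        created.append(target)
    return created
```

Because each archive lands in a separate folder, "File 1" from one zip and "File 1" from another no longer collide.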

Related

Opening a PDF inside a zip file with fitz.open()

I have a function that opens a zip file, finds a PDF with a given filename, then reads the first page of the PDF to get some specific text. My issue is that after I locate the correct file, I can't open it to read it. I have tried a relative path within the zip folder and an absolute path in my downloads folder, and I keep getting these errors:
no such file: 'Deliverables_Rev B\Plans_Rev B.pdf'
no such file: 'C:\Users\MyProfile\Downloads\Deliverables_Rev B\Plans_Rev B.pdf'
I have been commenting out the os.path.join line to switch between the relative and absolute paths, since self.prefs['download_path'] returns my downloads folder.
I'm not sure what the issue with the relative path is, and any insight would be helpful. I think it has to do with trying to read from inside a zipped folder.
import os
import fitz
from zipfile import ZipFile  # "import zipfile as ZipFile" would bind the module, not the class
def getjobcode(self, filename):
    if '.zip' in filename:
        with ZipFile(filename, 'r') as zipObj:
            for document in zipObj.namelist():
                if 'plans' in document.lower():
                    document = os.path.join(self.prefs['download_path'], document)
                    doc = fitz.open(document)
                    page1 = doc.load_page(0)
                    page1text = page1.get_text('text')
                    jobcode = page1text[page1text.index(
                        'PROJECT NUMBER'):page1text.index('PROJECT NUMBER') + 30][-12:]
                    return jobcode
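I ended up extracting the zip folder into the downloads folder and then parsing the PDF to get the data I needed (fitz.open() with a path needs a file that actually exists on disk, and the PDF only existed inside the archive). Afterwards, I created a job folder where I wanted it and moved the extracted folder into it from the downloads folder.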
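An alternative that avoids extracting to disk first, assuming PyMuPDF is installed: read the member's bytes with `ZipFile.read()` and pass them to `fitz.open(stream=..., filetype="pdf")`. The helper names below are hypothetical, the `.pdf` check is an added assumption, and the slicing simply mirrors the question's `PROJECT NUMBER` logic. Zip member names always use forward slashes, so they are looked up in the archive instead of being joined onto a disk path.

```python
import zipfile


def find_member_bytes(zip_path, keyword="plans"):
    """Return (member_name, data) for the first PDF whose name contains keyword."""
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            lowered = name.lower()
            if keyword in lowered and lowered.endswith(".pdf"):
                return name, zf.read(name)  # bytes straight from the archive
    return None, None


def jobcode_from_text(text, marker="PROJECT NUMBER", width=30, tail=12):
    """Same slicing as the question: `width` chars from the marker, keep the last `tail`."""
    start = text.index(marker)
    return text[start:start + width][-tail:]


def get_jobcode(zip_path):
    import fitz  # PyMuPDF, imported lazily so the helpers above work without it
    _, data = find_member_bytes(zip_path)
    if data is None:
        return None
    doc = fitz.open(stream=data, filetype="pdf")  # open from memory, no file on disk needed
    try:
        text = doc.load_page(0).get_text("text")
    finally:
        doc.close()
    return jobcode_from_text(text)
```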

Copy and rename pictures based on xml nodes

I'm trying to copy all pictures from one directory (including its subdirectories) to another target directory. Whenever the exact picture name is found in one of the XML files, the tool should grab all the information (attributes in the parent and child nodes), create subdirectories based on that information, and rename the picture file.
The part when it extracts all the information from the nodes is already done.
from bs4 import BeautifulSoup as bs
path_xml = r"path\file.xml"
with open(path_xml, "r") as file:
    content = file.read()  # same result as "".join(file.readlines())
def get_filename(_content):
    bs_content = bs(_content, "html.parser")
    # some code
    picture_path = rf'{pm_1}{pm_2}\{pm_3}\{pm_4}\{pm_5}_{pm_6}_{pm_7}\{pm_8}\{pm_9}.jpg'
    return picture_path
get_filename(content)
So in the end I get a string value with the directory path and the file name I want.
Now I'm struggling to open all the XML files in a directory instead of just one file. I tried this:
import os
dir_xml = r"path"
res = []
for path in os.listdir(dir_xml):
    if os.path.isfile(os.path.join(dir_xml, path)):
        res.append(path)
with open(res, "r") as file:
    content = file.readlines()
but it gives me this error: TypeError: expected str, bytes or os.PathLike object, not list
How can I read through all the XML files instead of just one? I have hundreds of XML files, so that will take a while :D
And another question: how can I create directories based on a string?
Let's say the value of picture_path is AB\C\D\E_F_G\H\I.jpg.
I would need another directory path as the destination for the created folders, and a function that creates folders based on that string. How can I do that?
To read all XML files in a directory, you can modify your code as follows:
import os
dir_xml = r"path"
for path in os.listdir(dir_xml):
    if path.endswith(".xml"):
        with open(os.path.join(dir_xml, path), "r") as file:
            content = file.readlines()
            content = "".join(content)
            get_filename(content)
This code uses the os.listdir() function to get a list of all files in the directory specified by dir_xml. It then uses a for loop to iterate over the list of files, checking if each file ends with the .xml extension. If it does, it opens the file, reads its content, and passes it to the get_filename function.
To create directories based on a string, you can use the os.makedirs function. For example:
import os
picture_path = r'AB\C\D\E_F_G\H\I.jpg'
dest_path = r'path_to_destination'
os.makedirs(os.path.join(dest_path, os.path.dirname(picture_path)), exist_ok=True)
In this code, os.path.join combines dest_path with the directory portion of picture_path, which os.path.dirname extracts. os.makedirs then creates every directory in that path, and exist_ok=True lets the call succeed even if the directories already exist. Note that os.path.dirname only treats backslashes as separators on Windows; on macOS or Linux it returns an empty string for this path.
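Finally, you can use the shutil library to copy the picture file to the destination and rename it at the same time, like this:
import os
import shutil
src_file = os.path.join(src_path, picture_name)  # original picture; src_path and picture_name are placeholders for your own variables
dst_file = os.path.join(dest_path, picture_path)  # new folders and new file name built from the XML
shutil.copy(src_file, dst_file)
Here, src_file is the full path to the original picture file and dst_file is the full path to the destination, including the new file name. shutil.copy copies the file from the source to the destination, and because dst_file ends in the new name, the copy is renamed in the same step. The destination folders must already exist, so create them with os.makedirs first.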
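If the script also has to run on macOS or Linux, the backslashes in picture_path are not separators there. A sketch that splits the string with `pathlib.PureWindowsPath`, creates the folders, and copies in one call; `copy_renamed` is a hypothetical helper, and it assumes picture_path is relative (no drive letter).

```python
import shutil
from pathlib import Path, PureWindowsPath


def copy_renamed(src_file, dest_root, picture_path):
    """Copy src_file to dest_root/picture_path, creating folders; return the new path."""
    # PureWindowsPath splits on backslashes on every OS: AB\C\I.jpg -> ('AB', 'C', 'I.jpg')
    parts = PureWindowsPath(picture_path).parts
    dst_file = Path(dest_root).joinpath(*parts)
    dst_file.parent.mkdir(parents=True, exist_ok=True)  # like os.makedirs(..., exist_ok=True)
    shutil.copy2(src_file, dst_file)  # copy2 also keeps the modification time
    return dst_file
```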
You can use os.walk() for a recursive search of files:
import os
dir_xml = r"path"
for root, dirs, files in os.walk(dir_xml):  # topdown=False
    for names in files:
        if names.endswith(".xml"):
            print(f"file path: {root}\n XML-Files: {names}")
            # names is only the file name, so join it with root before opening
            with open(os.path.join(root, names), 'r') as file:
                content = file.readlines()
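With pathlib, the same recursive search can be written with `Path.rglob()`. This sketch assumes the files are UTF-8 and also matches upper-case `.XML` extensions; `iter_xml_files` and `read_all_xml` are hypothetical names, and each returned text could be passed to get_filename() from the question.

```python
from pathlib import Path


def iter_xml_files(root):
    """Yield every .xml file under root, including subdirectories, in sorted order."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() == ".xml":
            yield path


def read_all_xml(root):
    """Return {path: text} for every XML file found under root."""
    return {path: path.read_text(encoding="utf-8") for path in iter_xml_files(root)}
```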

Extract all files from a zipped folder inside a directory to other directory without folder using python

# importing required modules
from zipfile import ZipFile
# specifying the zip file name
file_name = "C:\\OPR\\109521P.zip"
# opening the zip file in READ mode
with ZipFile(file_name, 'r') as zip:
    # printing all the contents of the zip file
    result = zip.printdir()
    # extracting all the files
    print('Extracting all the files now...')
    zip.extractall('images')
    print('Done!')
I have around 10 images zipped inside a subfolder of a zip file. I want to extract all the images directly to another directory, without the subfolder. I have tried using os.path.basename(name), but I'm getting several errors.
With the code above, all the images end up inside a folder:
C:\images\109521P
That is where all 10 images are currently extracted. Instead, I want them extracted directly to
C:\images
So I want to omit the subfolder 109521P and have the images extracted directly at the location above.
import os
import shutil
import zipfile
my_dir = r"C:\OPR"
my_zip = r"C:\OPR\109521P.zip"
with zipfile.ZipFile(my_zip) as zip_file:
    for member in zip_file.namelist():
        filename = os.path.basename(member)
        # skip directories
        if not filename:
            continue
        # copy file (taken from zipfile's extract)
        source = zip_file.open(member)
        target = open(os.path.join(my_dir, filename), "wb")
        with source, target:
            shutil.copyfileobj(source, target)
I found the answer myself, so I'm posting it here.
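A variant of the same idea, sketched under the assumption that two members with the same base name should be treated as an error rather than silently overwritten (which is what the code above would do). It uses `ZipInfo.is_dir()` to skip folder entries; `extract_flat` is a hypothetical name.

```python
import os
import shutil
import zipfile


def extract_flat(zip_path, target_dir):
    """Extract every file in zip_path into target_dir, dropping internal folder paths."""
    os.makedirs(target_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            filename = os.path.basename(info.filename)
            if info.is_dir() or not filename:
                continue  # folder entries carry no data
            dest = os.path.join(target_dir, filename)
            if os.path.exists(dest):
                # Refuse to overwrite (e.g. two subfolders both containing a.jpg).
                raise FileExistsError(dest)
            with zf.open(info) as source, open(dest, "wb") as target:
                shutil.copyfileobj(source, target)
            written.append(filename)
    return written
```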

The directory name is invalid for my temporary folder error?

I am trying to write a nested try-except statement that opens .zip, .gz, and .tar archives, and also regular .txt files. I am downloading these files from the internet and extracting them to a temporary folder. My code works for the .zip, .gz, and .tar archives, but for the regular .txt files I'm trying to move to the temporary folder, it says the directory name is invalid. Here is my code:
import os
import shutil
import tarfile
import tempfile
import urllib.request
from zipfile import ZipFile
def function(file_download_url):
    urllib.request.urlopen(file_download_url)  # opens the URL but discards the response (redundant)
    dirpath = tempfile.mkdtemp()  # generate a temporary directory
    # Download the URL as a temporary file that has to be deleted later on
    with urllib.request.urlopen(file_download_url) as response:
        with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
            shutil.copyfileobj(response, tmp_file)
    print(tmp_file.name)  # display the name of the temporary file created
    # Try-except statement that extracts compressed files to the generated temporary directory
    try:
        # If it's a .zip file
        with ZipFile(tmp_file.name) as my_zip_file:  # open the downloaded zipped file
            my_zip_file.extractall(dirpath)  # extract the support bundle to the temporary directory
            path = dirpath  # make the temporary directory the path for searching
    except:
        try:
            # If it's a .gz or .tar file
            with tarfile.open(tmp_file.name) as tar:
                tar.extractall(dirpath)
                path = dirpath
                print(path)
        except:
            # If it's just a .txt or .log file
            source = tmp_file.name
            dest = dirpath
            files = os.listdir(source)  # here is where the error "The directory name is invalid" occurs
            for f in files:
                shutil.move(source, dest)
The problem line is files = os.listdir(source): you cannot perform a listdir operation on a single file. You should instead skip right to the move operation:
    ...
    except:
        # If it's just a .txt or .log file
        source = tmp_file.name
        dest = dirpath
        shutil.move(source, dest)
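Instead of nested bare `except:` blocks, the archive type can be detected up front with `zipfile.is_zipfile()` and `tarfile.is_tarfile()`, which inspect the file contents, so the extension-less temporary file is not a problem. A sketch, with `unpack_download` as a hypothetical name; it assumes a plain `.gz` that is not a tarball can be treated like any other single file and simply moved.

```python
import os
import shutil
import tarfile
import tempfile
import zipfile


def unpack_download(file_path, dest_dir=None):
    """Extract zip/tar archives into dest_dir, or move a plain file into it."""
    if dest_dir is None:
        dest_dir = tempfile.mkdtemp()
    # The directory must exist, otherwise shutil.move would rename the file to dest_dir.
    os.makedirs(dest_dir, exist_ok=True)
    if zipfile.is_zipfile(file_path):
        with zipfile.ZipFile(file_path) as zf:
            zf.extractall(dest_dir)
    elif tarfile.is_tarfile(file_path):  # also recognizes gzip/bz2/xz-compressed tarballs
        with tarfile.open(file_path) as tar:
            if hasattr(tarfile, "data_filter"):
                tar.extractall(dest_dir, filter="data")  # safer extraction where supported
            else:
                tar.extractall(dest_dir)
    else:
        # A plain .txt/.log file: move it instead of calling os.listdir on it.
        shutil.move(file_path, dest_dir)
    return dest_dir
```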

Python: batch rename files on a Mac

I'm trying to write my first script.
I have been reading about Python, but I am stuck.
I'm trying to write a script that will rename all the files in a specific folder.
This is what I have so far:
import os
files = os.listdir('files_to_Change')
print (files)
That gets all the file names from the folder.
for i in files:
    if i == ".DS_Store":
        p = files.index(".DS_Store")
        del files[p]
If the Mac invisible file exists, delete it from the list (maybe there is a mistake here).
for i in files:
    oldName = i
    fileName, fileExtension = os.path.splitext(i)
    print(oldName)
    print(fileName)
    os.rename(oldName, fileName)
This is where I am stuck; I get this error:
Output:
FileNotFoundError: [Errno 2] No such file or directory: 'File.1'
In the part above I'm just removing the file extension, but that is only the beginning.
I'm also trying to replace every period with a space and capitalize the first letter of every word.
Can anyone point me in the right direction?
Thanks so much
In your example, os.listdir() on the files_to_Change directory returns bare file names, without the directory name:
>>> files = os.listdir('test_folder')
>>> print(files[0])
.com.apple.timemachine.supported
So to get the full path to that file from wherever you are in your directory tree, you should join the directory name (files_to_Change) with the file name:
import os
join = os.path.join
src = 'files_to_Change'
files = os.listdir(src)
for i in files:
    old = i
    new, ext = os.path.splitext(old)
    os.rename(join(src, old), join(src, new))  # was fileName, which is undefined here
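For the rest of the question (periods to spaces, capitalized words), here is a sketch that also skips hidden files such as .DS_Store. Assumptions: the extension is kept by default (pass keep_ext=False to drop it, as the question's code does), only the first letter of each word is upper-cased, and an existing target name is skipped rather than overwritten. The `samefile` check still allows a case-only rename on macOS's default case-insensitive file system. `pretty_name` and `batch_rename` are hypothetical names.

```python
import os


def pretty_name(filename, keep_ext=True):
    """Turn 'my.holiday.photo.jpg' into 'My Holiday Photo.jpg'."""
    stem, ext = os.path.splitext(filename)
    words = stem.replace(".", " ").split()
    # Upper-case only the first letter; str.title() would also change letters after apostrophes.
    stem = " ".join(w[:1].upper() + w[1:] for w in words)
    return stem + ext if keep_ext else stem


def batch_rename(src, keep_ext=True):
    """Rename every visible file in src; return a list of (old, new) pairs."""
    renamed = []
    for old in sorted(os.listdir(src)):
        if old.startswith("."):
            continue  # hidden files such as .DS_Store
        old_path = os.path.join(src, old)
        if not os.path.isfile(old_path):
            continue
        new = pretty_name(old, keep_ext)
        new_path = os.path.join(src, new)
        if new == old:
            continue
        if os.path.exists(new_path) and not os.path.samefile(old_path, new_path):
            continue  # never overwrite a different existing file
        os.rename(old_path, new_path)
        renamed.append((old, new))
    return renamed
```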
