Python tarfile.extract function not extracting contents of directory

I'm trying to extract a directory from a tar file using Python, but some or all of the files inside that directory are missing after extraction. Only the path itself gets extracted (i.e., I get a folder home inside /tmp/myfolder, but it is empty).
The code is as follows:
import tarfile
for tar in tarfiles:
    mytar = tarfile.open(tar)
    for file in mytar:
        # each item is a TarInfo object, so compare its name
        if file.name == "myfile":
            mytar.extract('home', '/tmp/myfolder')

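I found a fix. When the member is a directory, extract() only creates the directory itself and not its contents. I can get the contents with extractall(), where members() is a helper generator like the one in the referenced answer: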
tar.extractall(members=members(tar))
Reference:
https://stackoverflow.com/a/43094365/20223973
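For illustration, here is a minimal sketch of one way to extract a directory together with its contents. The function name extract_directory and its parameters are hypothetical (they are not part of the tarfile module); the only library calls used are tarfile.open(), getmembers() and extractall().

```python
import os
import tarfile


def extract_directory(tar_path, dirname, target_dir):
    """Extract dirname and everything below it from tar_path into target_dir.

    Hypothetical helper: returns the names of the members that were selected.
    """
    dirname = dirname.rstrip("/")
    prefix = dirname + "/"
    with tarfile.open(tar_path) as tar:
        # Keep the directory entry itself plus every member stored below it.
        wanted = [
            member for member in tar.getmembers()
            if member.name == dirname or member.name.startswith(prefix)
        ]
        tar.extractall(path=target_dir, members=wanted)
    return [member.name for member in wanted]
```

With the paths from the question, a call might look like extract_directory(tar, "home", "/tmp/myfolder").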

Related

Modify all files in a specified directory (including subfolders) and save them in a new directory while preserving the folder structure (Python)

(I'm new to Python, so please excuse the probably trivial question. I tried my best looking for similar issues but surprisingly couldn't find someone with the same question.)
I'm trying to build a simple static site generator in Python. The script should take all .txt files in a specific directory (including subfolders), paste the content of each into a template .html file, and then save all the newly generated .html files into a new directory while recreating the folder structure of the original directory.
So far I have code that does the conversion for a single file, but I'm unsure how to do it for multiple files in a directory.
with open('template/page.html', 'r') as template:
    templatedata = template.read()
with open('content/content.txt', 'r') as content:
    contentdata = content.read()
pagedata = templatedata.replace('!PlaceholderContent!', contentdata)
with open('www/content.html', 'w') as output:
    output.write(pagedata)
To manipulate files and directories, you will need the built-in os module.
import os
The functionality in the os module includes:
Listing the content of a directory:
path_to_template_dir = 'template/'
template_files = os.listdir(path_to_template_dir)
print(template_files)
# Outputs : ['page.html']
Creating a directory (if it does not already exist):
path_to_output_dir = 'www/'
try:
    os.mkdir(path_to_output_dir)
except FileExistsError as e:
    print('Directory exists:', path_to_output_dir)
You already know the names of the directories you want to use, and with these two functions you can find the names of the files to read and generate. Concatenating each file name to its directory name gives you the path string of the file, which you can then pass to open() for reading or writing.
It's hard to give a perfect code example for your question, since the logic for how you want to combine each template and content file is missing, but here is an example of writing a file inside the newly created directory:
path_to_output_file = path_to_output_dir + 'content.html'
with open(path_to_output_file, 'w') as output:
    output.write('Content')
And an example of reading all the template files inside the template/ directory and printing them to the screen:
for template_file in template_files:
    path_to_template_file = path_to_template_dir + template_file
    with open(path_to_template_file, 'r') as template:
        print(template.read())
In the end, manipulating files is all about creating the path string you want to read from or write to, and then accessing it.
Any other functionality you might need (for example, checking whether a path is a file with os.path.isfile() or a directory with os.path.isdir()) can be found in the os module.
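Putting these pieces together for the original question, here is a rough sketch of one possible approach. It swaps os.listdir() for os.walk(), which also descends into subfolders, and uses os.makedirs() to recreate the folder structure. The function name build_site and its parameters are made up for this example; the placeholder string is the one from the question.

```python
import os


def build_site(content_dir, template_path, output_dir,
               placeholder="!PlaceholderContent!"):
    """Render every .txt file under content_dir into output_dir as .html.

    Hypothetical helper: returns the list of paths that were written.
    """
    with open(template_path, "r", encoding="utf-8") as template:
        templatedata = template.read()

    written = []
    for root, dirs, files in os.walk(content_dir):
        # Mirror the current subfolder below the output directory.
        relative = os.path.relpath(root, content_dir)
        target_dir = os.path.normpath(os.path.join(output_dir, relative))
        os.makedirs(target_dir, exist_ok=True)

        for name in files:
            if not name.endswith(".txt"):
                continue
            with open(os.path.join(root, name), "r", encoding="utf-8") as content:
                pagedata = templatedata.replace(placeholder, content.read())
            # content/blog/post.txt becomes www/blog/post.html
            out_name = os.path.splitext(name)[0] + ".html"
            out_path = os.path.join(target_dir, out_name)
            with open(out_path, "w", encoding="utf-8") as output:
                output.write(pagedata)
            written.append(out_path)
    return written
```

With the layout from the question, the call would be build_site('content', 'template/page.html', 'www'). The sketch assumes the output directory is not located inside the content directory.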

extract zip file without folder python

I am currently using the extractall function in Python to unzip. After unzipping, it also creates a folder, like: myfile.zip -> myfile/myfile.zip. How do I get rid of the myfile folder and just unzip to the current folder without it? Is that possible?
I use the standard module zipfile. It has the method extract, which provides what I think you want. This method has an optional path argument: the content is extracted either to the current working directory or to the given path.
import os, zipfile
os.chdir('path/of/my.zip')
with zipfile.ZipFile('my.zip') as Z:
    for elem in Z.namelist():
        Z.extract(elem, 'path/where/extract/to')
If you omit 'path/where/extract/to', the files from the ZIP file are extracted to the current working directory, which here is the directory of the ZIP file because of the os.chdir() call.
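Alternatively, you can copy each entry yourself and drop the folder part of its name:
import os, shutil, zipfile
with zipfile.ZipFile('my.zip') as myzip:
    # loop over everything in the zip
    for name in myzip.namelist():
        # skip directory entries, which have no file name of their own
        if name.endswith('/'):
            continue
        # open the entry so we can copy it
        with myzip.open(name) as member:
            with open(os.path.basename(name), 'wb') as outfile:
                # copy it directly to the output directory,
                # without creating the intermediate directory
                shutil.copyfileobj(member, outfile)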

Searching a folder structure and modifying XML files using Python

I am trying to create a python script that will iterate through a folder structure, find folders named 'bravo', and modify the xml files contained within them.
In the xml files, I want to modify the 'location' attribute of a tag, called 'file'. Such as:
<file location="e:\one\two"/>
I just need to change the drive letter of the file path from ‘e’ to ‘f’. So that it will read:
<file location="f:\one\two"/>
However...
The names of these xml files are unique, so I cannot search for an exact xml file name. Instead I am searching by the xml file type.
Also, there are other xml files in my folder structure, without the ‘file’ tag reference, that I wish to ignore.
The only constant is that the xml files I want to modify are all stored in folders named, ‘bravo’.
I also wish to create a log file that lists all the xml files and their filepaths which have successfully been updated (and preferably the ones that failed).
Using answers to similar questions on this site, I have cobbled together the following script.
In its current state, the script tries to modify every xml file it finds. I have not been able to successfully add code that only searches folders called 'bravo'.
When the script modifies an xml file, not in a 'bravo' folder, it errors because these files do not contain a 'file' tag.
Please could someone help me to correct my script (or create a new one).
Here is an example of the folder structure...
My folder structure
And my script so far...
from xml.dom import minidom
import os
# enter the directory where to start search for xml files...
for root, dirs, files in os.walk("c:/temp"):
    for file in files:
        # search for xml files...
        if file.endswith(".xml"):
            xml_file = file
            xmldoc = minidom.parse(os.path.join(root, xml_file))
            # in the xml file look for tag called "file"...
            file_location = xmldoc.getElementsByTagName("file")
            # i don't understand the next line of code, but it's needed
            # (it takes the first element from the list of matching tags)
            file_location = file_location[0]
            # 'location_string' is a variable for the 'location' path of the file tag in the xml document
            location_string = (file_location.attributes["location"].value)
            # the new drive letter is added to the location_string to create 'new_location'
            new_location = "f" + location_string[1:]
            # replace the 'location' value of the file tag with the new location...
            file_location.attributes["location"].value = new_location
            # write the change to the original file
            with open((os.path.join(root, xml_file)), 'w') as f:
                f.write(xmldoc.toxml())
            print("%s has been updated!" % (os.path.join(root, xml_file)))
            # add updated file name to log...
            log_file = open("filepath_update_log.txt", "a")
            log_file.write("%s\n" % (os.path.join(root, xml_file)))
            log_file.close()
Test if the directory name fits, before your second loop. You'd have to get the last directory in the path first. As in: How to get only the last part of a path in Python?
if os.path.basename(os.path.normpath(root)) == "bravo":
You could use the https://docs.python.org/3/library/logging.html module for logging.
If you only want to replace a single letter, then maybe you can directly replace it instead of parsing xml. As suggested in: https://stackoverflow.com/a/17548459/7062162
def inplace_change(filename, old_string, new_string):
    # Safely read the input filename using 'with'
    with open(filename) as f:
        s = f.read()
        if old_string not in s:
            print('"{old_string}" not found in {filename}.'.format(**locals()))
            return
    # Safely write the changed content, if found in the file
    with open(filename, 'w') as f:
        print('Changing "{old_string}" to "{new_string}" in {filename}'.format(**locals()))
        s = s.replace(old_string, new_string)
        f.write(s)
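As a rough sketch of how the folder-name test and the logging module could be combined with the original minidom approach: the function name update_bravo_xml, its parameters and its return value are invented for this example, and it assumes the drive letter is the first character of the location attribute, as in the question.

```python
import logging
import os
from xml.dom import minidom


def update_bravo_xml(start_dir, new_drive="f", folder_name="bravo"):
    """Change the drive letter in <file location="..."> for XML files that
    sit directly in folders named folder_name.

    Hypothetical helper: returns (updated_paths, failed_paths).
    """
    updated, failed = [], []
    for root, dirs, files in os.walk(start_dir):
        # Only handle files whose containing folder is called folder_name.
        if os.path.basename(os.path.normpath(root)) != folder_name:
            continue
        for name in files:
            if not name.endswith(".xml"):
                continue
            path = os.path.join(root, name)
            try:
                xmldoc = minidom.parse(path)
                # Raises IndexError when the document has no <file> tag.
                file_tag = xmldoc.getElementsByTagName("file")[0]
                location = file_tag.getAttribute("location")
                file_tag.setAttribute("location", new_drive + location[1:])
                with open(path, "w", encoding="utf-8") as f:
                    f.write(xmldoc.toxml())
            except Exception as exc:
                failed.append(path)
                logging.error("failed to update %s: %s", path, exc)
            else:
                updated.append(path)
                logging.info("updated %s", path)
    return updated, failed
```

To send the messages to a log file, logging.basicConfig(filename="filepath_update_log.txt", level=logging.INFO) can be called once before update_bravo_xml("c:/temp").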

Traverse a directory over http

Suppose I have a URL, http://example.com/result, which opens a page listing some directories (there could be one, two, three, or any number of them). I want to traverse each directory and find the new.txt file, which can be anywhere inside a directory or subdirectory.
http://example.com/result has following dir:
security
major
minor
fails
logs
..
I need to find the new.txt inside every dir and want to read the content.
All the directories (security/major/...etc) might have sub dir also.
I need to find the new.txt inside a dir or sub directory.
If you want to do this with Python, you can use urllib.
Check the headers of each page. For each directory and file there will be a link tag. Follow that link and check the headers of the response. The headers for a file and for a directory may be different.
If it is a directory, recursively call the same function and check each file in that directory.
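A minimal sketch of that recursion follows, under a strong assumption that the question does not confirm: the server returns an HTML index page for each directory, and links to subdirectories end with "/". Instead of inspecting headers, the sketch uses that trailing slash to tell directories from files. The names LinkParser, fetch_text and find_files are made up for this example; fetch is a parameter so the traversal can be tried without a network connection.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch_text(url):
    """Download url and return its body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def find_files(base_url, filename="new.txt", fetch=fetch_text, seen=None):
    """Return {url: content} for every file called filename below base_url."""
    if seen is None:
        seen = set()
    if not base_url.endswith("/"):
        base_url += "/"
    if base_url in seen:
        return {}
    seen.add(base_url)

    parser = LinkParser()
    parser.feed(fetch(base_url))
    found = {}
    for href in parser.links:
        url = urljoin(base_url, href)
        # Ignore parent-directory and external links.
        if url == base_url or not url.startswith(base_url):
            continue
        if url.endswith("/"):
            # Assumed to be a subdirectory: recurse into it.
            found.update(find_files(url, filename, fetch, seen))
        elif url.rsplit("/", 1)[-1] == filename:
            found[url] = fetch(url)
    return found
```

Calling find_files("http://example.com/result") would then return a dictionary mapping each new.txt URL to its content, provided the server behaves as assumed.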

Extract only a single directory from tar (in python)

I am working on a project in Python in which I need to extract only a subfolder of a tar archive, not all the files.
I tried to use
tar = tarfile.open(tar_path)  # tar_path: path to the archive
tar.extract("dirname", targetdir)
But this does not work: it does not extract the given subdirectory, and no exception is thrown. I am a beginner in Python.
Also, if the above function doesn't work for directories, what is the difference between this command and tar.extractfile()?
Building on the second example from the tarfile module documentation, you could extract the contained sub-folder and all of its contents with something like this:
with tarfile.open("sample.tar") as tar:
    subdir_and_files = [
        tarinfo for tarinfo in tar.getmembers()
        if tarinfo.name.startswith("subfolder/")
    ]
    tar.extractall(members=subdir_and_files)
This creates a list of the subfolder and its contents, and then uses the recommended extractall() method to extract just them. Of course, replace "subfolder/" with the actual path (relative to the root of the tar file) of the sub-folder you want to extract.
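On the difference between extract() and extractfile(): extract() writes a member to disk, while extractfile() returns a readable file object for a regular file without writing anything, and returns None for members such as directories. A small sketch, where read_member_text is a hypothetical helper name:

```python
import io
import tarfile


def read_member_text(tar_path, member_name, encoding="utf-8"):
    """Return the text of one archive member without extracting it to disk.

    Hypothetical helper: returns None when the member is not a regular file
    (for example, a directory entry).
    """
    with tarfile.open(tar_path) as tar:
        fileobj = tar.extractfile(member_name)
        if fileobj is None:
            return None
        with fileobj:
            return fileobj.read().decode(encoding)
```

Note that extractfile() raises KeyError if the archive has no member with the given name.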
The other answer will retain the subfolder path, meaning that subfolder/a/b will be extracted to ./subfolder/a/b. To extract a subfolder to the root, so subfolder/a/b would be extracted to ./a/b, you can rewrite the paths with something like this:
def members(tf):
    l = len("subfolder/")
    for member in tf.getmembers():
        if member.path.startswith("subfolder/"):
            member.path = member.path[l:]
            yield member
with tarfile.open("sample.tar") as tar:
    tar.extractall(members=members(tar))
