I have two tar.gz files, 2014_SRS.tar.gz and 2013_SRS.tar.gz. Each contains a folder called SRS, which is full of text files. I downloaded them from an FTP server, and I want to extract them automatically in Python. This is my code:
import re
import ftplib
import os
import time
import tarfile
import sys
print('1')
tar = tarfile.open('2014_SRS.tar.gz')
tar.extractall()
tar.close()
print('2')
tar = tarfile.open('2013_SRS.tar.gz')
tar.extractall()
tar.close()
print('3')
This code only opens the second file. How do I fix it to open both files?
Also, I tried using a for loop to run through the whole directory. The code is shown below.
for i in os.listdir(os.getcwd()):
    if i.endswith(".tar.gz"):
        tar = tarfile.open(i, "r:gz")
        tar.extractall()
        tar.close()
However, this gave me an EOFError. In addition, before I ran this bit of code I was able to unzip both files manually, but after the code raises the error I can no longer unzip the 2014_SRS file manually. How do I fix this?
While this may not answer your specific question as to why both files could not be unzipped with your code, the following is one way to unzip a list of tar.gz files.
import tarfile, glob

srcDir = "/your/src/directory"
dstDir = "/your/dst/directory"

for f in glob.glob(srcDir + "/*.gz"):
    t = tarfile.open(f, "r:gz")
    for member in t.getmembers():
        t.extract(member, dstDir)
    t.close()
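As for the EOFError: it usually means an archive is truncated, for example an FTP download that did not complete, and extraction stops partway through. A minimal sketch to find out which archive is bad, assuming the files sit in the current directory:

import tarfile
import glob

for f in glob.glob("*.tar.gz"):
    try:
        # The context manager closes the archive even if extraction fails.
        with tarfile.open(f, "r:gz") as t:
            t.extractall()
        print(f, "extracted OK")
    except (EOFError, tarfile.ReadError) as e:
        # A truncated or corrupt download typically ends up here;
        # re-download the file rather than trying to repair it.
        print(f, "failed:", e)

Note that extraction does not modify the source archive, so if 2014_SRS.tar.gz no longer opens manually, the download itself was most likely incomplete.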
I would like to package a folder into a single file; I do not need compression. All the alternatives I tried were slow.
I have tried:
The zipfile library with ZIP_STORED (no compression)
import zipfile

output_filename = "folder.zip"
source_dir = "folder"

with zipfile.ZipFile(output_filename, 'w', zipfile.ZIP_STORED) as zipf:
    # zipdir() is a helper (not shown) that walks source_dir
    # and writes each file into zipf.
    zipdir(source_dir, zipf)
The tarfile library, also using "w" to open the file for writing without compression:
import tarfile
import os

output_filename = "folder.tar"
source_dir = "folder"

with tarfile.open(output_filename, "w") as tar:
    tar.add(source_dir, arcname=os.path.basename(source_dir))
But both still take ~3-5 minutes to package a folder that is ~5GB and has < 10 files in it.
I am using a Linux machine.
Is there a faster way?
I am not quite sure whether it is much faster, but if you are running Linux you could try the tar command:
import time
import os

start = time.time()
# -c create an archive, -v list files as they are processed, -f write to the named file
os.system("tar -cvf name.tar /path/to/directory")
end = time.time()

print("Elapsed time: %s" % (end - start,))
If you also need compression, run gzip on the archive afterwards:
os.system("gzip name.tar")
I have more than 1000 zip files in the same folder, named following the convention output_<hash>_<date>_<time>.
Example name: output_0aa3199eca63522b520ecfe11a4336eb_20210122_181742
How can I unzip them using Python?
Try this and let me know if it worked.
import os
import zipfile

path = 'path/to/your/zip/files'
os.chdir(path)

for file in os.listdir('.'):
    # Skip anything in the folder that is not a zip archive.
    if file.endswith('.zip'):
        with zipfile.ZipFile(file, 'r') as zip_ref:
            zip_ref.extractall('.')
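If you would rather not change the working directory, a minimal variant using glob (same placeholder path):

import glob
import zipfile

path = 'path/to/your/zip/files'

# glob returns paths that include the folder, so no chdir is needed.
for file in glob.glob(path + '/*.zip'):
    with zipfile.ZipFile(file, 'r') as zip_ref:
        zip_ref.extractall(path)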
I am interested in getting this script to open an Excel file and save it again as a .csv or .txt file. I'm pretty sure the problem is the iteration: I haven't coded it to iterate properly over the contents of the folder. I am new to Python; I managed to get this code to successfully print the contents of the items in the folder via the commented-out part. Can someone please advise what needs to be fixed?
My error is: raise XLRDError('Unsupported format, or corrupt file: ' + msg)
from xlrd import open_workbook
from xlutils.copy import copy  # copy() is presumably xlutils.copy; the import was missing from the original
import csv
import glob
import os
import openpyxl

cwd = os.getcwd()
print(cwd)

FileList = glob.glob('*.xlsx')
#print(FileList)

for i in FileList:
    rb = open_workbook(i)
    wb = copy(rb)
    wb.save('new_document.csv')
I would just use:
import pandas as pd
import glob
import os

file_list = glob.glob('*.xlsx')

for file in file_list:
    filename = os.path.split(file)[1]
    pd.read_excel(file).to_csv(filename.replace('xlsx', 'csv'), index=False)
It appears that your error comes from the Excel files themselves, not from your code.
Check that your files aren't also open in Excel at the same time.
Check that your files aren't encrypted.
Check that your version of xlrd supports the files you are reading; xlrd 2.0 and later only reads the old .xls format, not .xlsx.
Check these in the above order; any of them could have caused your error.
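On the last point: since xlrd 2.0 dropped .xlsx support, one workaround is to let pandas read the file through openpyxl instead; the filename below is a placeholder:

import pandas as pd

# engine='openpyxl' bypasses xlrd entirely for .xlsx files.
df = pd.read_excel('workbook.xlsx', engine='openpyxl')
df.to_csv('workbook.csv', index=False)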
I'm trying to extract everything from a tar.gz file into the same directory. The following code extracts all the files, but they end up in the working directory instead of the path I passed as name.
import tarfile
zip_rw_data = r"P:\Lehmann\Test_Python_Project\RW_data.tar.gz"
tar = tarfile.open(name=zip_rw_data, mode='r')
tar.extractall()
tar.close()
How do I make sure the extracted files are saved in the directory where I need them? I've been at this for ages and really can't see why it doesn't work.
You should use:
import tarfile
zip_rw_data = r"P:\Lehmann\Test_Python_Project\RW_data.tar.gz"
tar = tarfile.open(name=zip_rw_data, mode='r')
tar.extractall(path=r"P:\Lehmann\Test_Python_Project")
tar.close()
You can try using shutil.unpack_archive:

import shutil

def extract_all(archives, extract_path):
    # unpack_archive infers the format (.tar.gz, .zip, ...) from the filename.
    for filename in archives:
        shutil.unpack_archive(filename, extract_path)
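For example, collecting the archives with glob first (reusing the path from your question):

import glob

archives = glob.glob(r"P:\Lehmann\Test_Python_Project\*.tar.gz")
extract_all(archives, r"P:\Lehmann\Test_Python_Project")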
I have a series of tar files whose contents I wish to extract into new directories, where each directory name is an edited version of the original tar file name.
import tarfile
import glob
import os

for file in glob.glob("*.tar"):
    # Open file
    tar = tarfile.open(file, "r:")
    # Create new directory with the name of the tar file (minus .tar)
    new_dir = file[0:-4]
    os.makedirs(new_dir)
    tar.extractall()
    os.chdir(new_dir)
This works fine up until the tar.extractall() part. Is there a way to directly extract the tar file into the target directory or am I forced to extract all and then move the files across?
new_dir = "path"
tar.extractall(path=new_dir)
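Folded into your loop, a minimal sketch that keeps your naming scheme and avoids chdir entirely:

import tarfile
import glob
import os

for file in glob.glob("*.tar"):
    new_dir = file[:-4]  # strip the ".tar" suffix
    os.makedirs(new_dir, exist_ok=True)
    with tarfile.open(file, "r:") as tar:
        tar.extractall(path=new_dir)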