I have a list of directories, each of which contains sub-directories. In each sub-directory there are some '.json.xz' compressed files. If I try to open one of them with my code, I get this error:
raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
This is my code:
import json
import lzma
import os

subject = 'AntonioGio'
path = '/home/rootdebian/Scrivania/Socialisys/projects/'+subject+'/competitor/'
for competitors in os.listdir(path):
    for f in os.listdir(path+competitors):
        if f.endswith('.xz'):
            with lzma.open(path+competitors+'/'+f) as f:
                json_bytes = f.read()
                stri = json_bytes.decode('utf-8')
                data = json.loads(stri)
                print(data)
What is the best way to fix it? Thank you in advance.
This is probably because the compressed file you have is incomplete or corrupted. The code you have provided works fine for decompressing .json.xz files.
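If you need the loop to survive the occasional truncated archive, a minimal sketch (assuming the same directory layout as in the question) is to catch the error and skip the bad file:

import json
import lzma
import os

path = '/home/rootdebian/Scrivania/Socialisys/projects/AntonioGio/competitor/'
for competitor in os.listdir(path):
    for name in os.listdir(os.path.join(path, competitor)):
        if name.endswith('.xz'):
            fullpath = os.path.join(path, competitor, name)
            try:
                with lzma.open(fullpath) as fh:
                    data = json.loads(fh.read().decode('utf-8'))
                print(data)
            except (EOFError, lzma.LZMAError):
                # EOFError means a truncated stream, LZMAError corrupt data:
                # report the file and move on instead of crashing.
                print('Skipping incomplete file:', fullpath)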
I want to print the content of the last saved text file in a folder using Python. I wrote the code below, but it prints out only the path of the file, not its content.
import glob
import os
import tempfile

folder_path = r'C:\Users\Siciid\Desktop\restaurant\bill'
file_type = r'\*txt'
files = glob.glob(folder_path + file_type)
max_file = max(files, key=os.path.getctime)
filename = tempfile.mktemp('.txt')
open(filename, 'w').write(max_file)
os.startfile(filename, "print")
Is it possible to do this in Python? Any suggestions? I would appreciate your help. Thank you.
You can do that using the following code. Just replace the line where you open and write a file with these two lines:
with open(max_file, "r") as f, open(filename, 'w') as f2:
    f2.write(f.read())
The max_file variable contains a file name, not the contents of the file, so writing it to the temp file and printing that will simply print the file name instead of the contents. To put the contents into the temporary file, you need to open the file and then read it. That is what the two lines above do.
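Putting it together, a minimal sketch of the corrected script (assuming the same folder layout and the Windows-only os.startfile call from the question) would be:

import glob
import os
import tempfile

folder_path = r'C:\Users\Siciid\Desktop\restaurant\bill'
files = glob.glob(folder_path + r'\*.txt')
max_file = max(files, key=os.path.getctime)  # most recently created file

filename = tempfile.mktemp('.txt')
with open(max_file, 'r') as f, open(filename, 'w') as f2:
    f2.write(f.read())  # copy the contents, not the file name
os.startfile(filename, "print")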
I'm trying to extract data from a zip file in Python, but it's kind of slow. Could anyone advise me and see if I'm doing something that obviously makes it slower?
from zipfile import ZipFile

def go_through_zip(zipname):
    out = {}
    with ZipFile(zipname) as z:
        for filename in z.namelist():
            with z.open(filename) as f:
                try:
                    outdict = make_dict(f)
                    out.update(outdict)
                except:
                    print("File is not in the correct format")
    return out
make_dict(f) just takes the file path and makes a dictionary, and this function is probably also slow, but that's not what I want to speed up right now.
Try using the following code for file extraction. It works fast as long as the size of the file being extracted is reasonable.
# importing required modules
from zipfile import ZipFile

# specifying the zip file name
file_name = "my_python_files.zip"

# opening the zip file in READ mode
with ZipFile(file_name, 'r') as zip:
    # printing all the contents of the zip file
    zip.printdir()

    # extracting all the files
    print('Extracting all the files now...')
    zip.extractall()
    print('Done!')
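As an aside, if extracting to disk is not an option, a hedged in-memory variant of the question's function (reusing its make_dict helper, and assuming make_dict accepts any file-like object) could look like this:

import io
from zipfile import ZipFile

def go_through_zip_in_memory(zipname):
    # Read each member fully into memory and hand make_dict a file-like
    # object. Whether this is faster depends on the member sizes; this
    # is a sketch, not a benchmark.
    out = {}
    with ZipFile(zipname) as z:
        for filename in z.namelist():
            data = z.read(filename)
            try:
                out.update(make_dict(io.BytesIO(data)))
            except Exception:
                # mirrors the question's broad except
                print("File is not in the correct format:", filename)
    return out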
As the title says, I'm downloading a bz2 file which has a folder with a lot of text files inside...
My first version decompressed in memory, but although the archive is only 90 MB, it holds 60 files of 750 MB each once uncompressed... the computer goes boom! It obviously can't handle something like 40 GB of RAM.
So, the problem is that the files are too big to keep all in memory at the same time... so I'm using this code, which works but is far too slow:
import os
import tarfile

import requests

response = requests.get('https://fooweb.com/barfile.bz2')

# Save the file to disk:
compress_filepath = '{0}/files/sources/{1}'.format(zsets.BASE_DIR, check_time)
with open(compress_filepath, 'wb') as local_file:
    local_file.write(response.content)

# We extract the files into a folder:
extract_folder = compress_filepath + '_ext'
with tarfile.open(compress_filepath, "r:bz2") as tar:
    tar.extractall(extract_folder)

# We process one file at a time:
for filename in os.listdir(extract_folder):
    filepath = '{0}/{1}'.format(extract_folder, filename)
    for line in open(filepath, 'r').readlines():
        some_processing(line)
Is there a way I could do this without dumping it to disk, decompressing and reading just one file from the .bz2 at a time?
Thank you very much for your time in advance; I hope somebody knows how to help me with this...
#!/usr/bin/python3
import sys
import requests
import tarfile
got = requests.get(sys.argv[1], stream=True)
with tarfile.open(fileobj=got.raw, mode='r|*') as tar:
    for info in tar:
        if info.isreg():
            ent = tar.extractfile(info)
            # now process ent as a file, however you like
            print(info.name, len(ent.read()))
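For line-oriented work, a hedged variant of the same streaming loop (assuming the archive members are UTF-8 text, and with some_processing standing in for your own handler from the question) could be:

#!/usr/bin/python3
import sys
import requests
import tarfile

def some_processing(line):
    pass  # stand-in for the question's own handler

got = requests.get(sys.argv[1], stream=True)
with tarfile.open(fileobj=got.raw, mode='r|*') as tar:
    for info in tar:
        if info.isreg():
            ent = tar.extractfile(info)
            for raw_line in ent:  # tarfile yields bytes
                some_processing(raw_line.decode('utf-8'))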
I did it this way:
import io
import tarfile

import requests

response = requests.get(my_url_to_file)
memfile = io.BytesIO(response.content)
filecount = 0

# We extract files in memory, one by one:
with tarfile.open(fileobj=memfile, mode="r:bz2") as tar:
    for member_name in tar.getnames():
        filecount += 1
        member = tar.extractfile(member_name)  # already a file-like object, no open() needed
        for line in member:  # note: tarfile yields bytes lines
            process_line(line)
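One caveat worth adding: tar.extractfile() returns None for members that are not regular files (directories, links), so a guard keeps the loop from crashing on them. A hedged sketch, assuming the members are UTF-8 text:

import tarfile

def each_text_line(memfile):
    # memfile is the in-memory .tar.bz2 from above.
    with tarfile.open(fileobj=memfile, mode="r:bz2") as tar:
        for member in tar.getmembers():
            extracted = tar.extractfile(member)
            if extracted is None:
                continue  # directory or special entry, nothing to read
            for raw_line in extracted:
                yield raw_line.decode('utf-8')

# usage: for line in each_text_line(memfile): process_line(line)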
I am totally new to Python.
I was trying to read a file which I had already created, but I am getting the error below:
File "C:/Python25/Test scripts/Readfile.py", line 1, in <module>
filename = open('C:\Python25\Test scripts\newfile','r')
IOError: [Errno 2] No such file or directory: 'C:\\Python25\\Test scripts\newfile'
My code:
filename = open('C:\Python25\Test scripts\newfile','r')
print filename.read()
Also I tried
filename = open('C:\\Python25\\Test scripts\\newfile','r')
print filename.read()
But I am getting the same errors.
Try:
import os

fpath = r'C:\Python25\Test scripts\newfile'
if not os.path.exists(fpath):
    print 'File does not exist'
else:
    with open(fpath, 'r') as src:
        print src.read()
First you validate that the file exists, then you open it. The with wrapper is also useful: it closes your file after you finish reading, so you will not be stuck with many open file descriptors.
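The raw string in fpath matters here. In the original path, the \n in 'Test scripts\newfile' is read as a newline escape, which is exactly why the traceback shows \\Python25\\ but a single \newfile. A quick demonstration:

plain = 'C:\Python25\Test scripts\newfile'
raw = r'C:\Python25\Test scripts\newfile'
print repr(plain)  # 'C:\\Python25\\Test scripts\newfile' -- the \n became a newline
print repr(raw)    # 'C:\\Python25\\Test scripts\\newfile' -- backslashes intact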
I think you're probably having this issue because you didn't include the full filename.
You should try:
filename = open(r'C:\Python25\Test scripts\newfile.txt', 'r')
print filename.read()
*Also, if you're running this Python file in the same location as the target file you are opening, you don't need to give the full directory; you can just call:
filename = open('newfile.txt', 'r')
I had the same problem. Here's how I got it right.
Your code:
filename = open('C:\\Python25\\Test scripts\\newfile','r')
print filename.read()
Try this:
with open('C:\\Python25\\Test scripts\\newfile') as myfile:
    print(myfile.read())
Hope it helps.
I am using VS Code. If I do not indent the print line, it will not work, so get the formatting right and you will see the magic.
with open("mytest.txt") as myfile:
print(myfile.read())
or, without the with block, like this:
hellofile=open('mytest.txt', 'r')
print(hellofile.read())
Ok, so I have a zip file that contains gz files (unix gzip).
Here's what I do --
def parseSTS(file):
    import zipfile, re, io, gzip
    with zipfile.ZipFile(file, 'r') as zfile:
        for name in zfile.namelist():
            if re.search(r'\.gz$', name) != None:
                zfiledata = zfile.open(name)
                print("start for file ", name)
                with gzip.open(zfiledata, 'r') as gzfile:
                    print("done opening")
                    filecontent = gzfile.read()
                    print("done reading")
                    print(filecontent)
This gives the following result --
>>>
start for file XXXXXX.gz
done opening
done reading
Then it stays like that forever until it crashes...
What can I do with filecontent?
Edit: this is not a duplicate, since my gzipped files are inside a zipped file and I'm trying to avoid extracting that zip file to disk. It works for zip files within a zip file, as per How to read from a zip file within zip file in Python?.
I created a zip file containing a gzip'ed PDF file I grabbed from the web.
I ran this code (with two small changes):
1) Fixed indenting of everything under the def statement (which I also corrected in your Question because I'm sure that it's right on your end or it wouldn't get to the problem you have).
2) I changed:
zfiledata = zfile.open(name)
print("start for file ", name)
with gzip.open(zfiledata, 'r') as gzfile:
    print("done opening")
    filecontent = gzfile.read()
    print("done reading")
    print(filecontent)
to:
print("start for file ", name)
with gzip.open(name,'rb') as gzfile:
print("done opening")
filecontent = gzfile.read()
print("done reading")
print(filecontent)
Because you were passing a file object to gzip.open instead of a string. I have no idea how your code is executing without that change, but it was crashing for me until I fixed it.
EDIT: Adding link to GZIP docs from James R's answer --
Also, see here for further documentation:
http://docs.python.org/2/library/gzip.html#examples-of-usage
END EDIT
Now, since my gzip'ed file is small, the behavior I observe is that it pauses for about 3 seconds after printing done reading, then outputs what is in filecontent.
I would suggest adding the following debugging line after your print "done reading" -- print len(filecontent). If this number is very, very large, consider not printing the entire file contents in one shot.
I would also suggest reading this for more insight into what I expect is your problem: Why is printing to stdout so slow? Can it be sped up?
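If len(filecontent) does turn out to be huge, a hedged workaround is to dump the bytes to a file instead of the console, replacing the print(filecontent) line (the output filename here is just a placeholder):

print len(filecontent)  # how big is the payload, really?
with open('dump.bin', 'wb') as out:  # 'dump.bin' is a placeholder name
    out.write(filecontent)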
EDIT 2 - an alternative if your system does not handle file I/O on zip files, causing "no such file" errors in the above:
def parseSTS(afile):
    import zipfile
    import zlib
    import gzip
    import io
    with zipfile.ZipFile(afile, 'r') as archive:
        for name in archive.namelist():
            if name.endswith('.gz'):
                bfn = archive.read(name)
                bfi = io.BytesIO(bfn)
                g = gzip.GzipFile(fileobj=bfi, mode='rb')
                qqq = g.read()
                print qqq

parseSTS('t.zip')
Most likely your problem lies here:
if name.endswith(".gz"):  # as goncalopp said in the comments, use endswith
    #zfiledata = zfile.open(name)  # don't do this
    #print("start for file ", name)
    with gzip.open(name, 'rb') as gzfile:  # gz compressed files should be read in binary, and gzip opens the files directly
        #print("done opening")  # trust in your program, Luke
        filecontent = gzfile.read()
        #print("done reading")
        print(filecontent)
See here for further documentation:
http://docs.python.org/2/library/gzip.html#examples-of-usage
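If the .gz members only ever exist inside the zip (never on disk), a hedged in-memory variant along the lines of the BytesIO approach in the previous answer is:

import gzip
import io
import zipfile

def read_gz_from_zip(zip_path):
    # Decompress each .gz member of the zip entirely in memory,
    # without extracting anything to disk.
    with zipfile.ZipFile(zip_path, 'r') as zfile:
        for name in zfile.namelist():
            if name.endswith('.gz'):
                raw = io.BytesIO(zfile.read(name))  # seekable in-memory copy
                with gzip.GzipFile(fileobj=raw, mode='rb') as gz:
                    yield name, gz.read()

# usage: for name, content in read_gz_from_zip('t.zip'): print(name, len(content))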