Python: Cannot gunzip file made with gzip

I'm exploring file compression options, and am confused by the behavior of the gzip module in Python. I can write a gzipped file like this:
with gzip.open('test.txt.gz', 'wb') as out:
    for i in range(100):
        out.write(bytes(i))
But if I then run gunzip test.txt.gz the output (test.txt) is still binary. What am I missing?

Ah, this works properly in Python 2.7:
import gzip

with gzip.open('test.txt.gz', 'wb') as out:
    for i in range(100):
        out.write(bytes(i))
In Python 3, we have to do:
import io, gzip

with gzip.open('test.txt.gz', 'wb') as output:
    with io.TextIOWrapper(output, encoding='utf-8') as writer:
        for i in range(100):
            writer.write(str(i))

While the code you posted works fine in 2.7, a simple way to fix this for 3.x would be:
import gzip

with gzip.open('test.txt.gz', 'wb') as out:
    for i in range(100):
        out.write(str(i).encode("utf-8"))
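The root cause, for the record: in Python 3, `bytes(i)` does not give the text for the number i, it gives i zero bytes, so the archive really does contain binary NULs. A minimal sketch contrasting the two (written to a throwaway temp path rather than the question's test.txt.gz):

```python
import gzip
import os
import tempfile

# In Python 3, bytes(3) is three NUL bytes, not the text "3" --
# that is why the gunzipped file looks binary.
assert bytes(3) == b"\x00\x00\x00"

path = os.path.join(tempfile.mkdtemp(), "test.txt.gz")

# Writing encoded text instead yields a normal text file after gunzip.
with gzip.open(path, "wb") as out:
    for i in range(100):
        out.write(str(i).encode("utf-8"))

with gzip.open(path, "rb") as f:
    data = f.read()

assert data.startswith(b"0123456789")
```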


Python results output to txt file

I tried this code posted 2 years ago:
import subprocess
with open("output.txt", "wb") as f:
    subprocess.check_call(["python", "file.py"], stdout=f)

import sys
import os.path

orig = sys.stdout
with open(os.path.join("dir", "output.txt"), "wb") as f:
    sys.stdout = f
    try:
        execfile("file.py", {})
    finally:
        sys.stdout = orig
It hangs up the terminal until I press Ctrl-Z, and then it crashes the terminal but prints the output.
I'm new to coding and am not sure how to resolve this. I'm obviously doing something wrong. Thanks for your help.
You can simply open the file and write to it with write():
with open('output.txt', 'w') as f:
    f.write('output text')  # you can use a variable from other data you collect instead
Since you are new to coding, I'll just let you know that opening a file using with will actually close it automatically after the indented code has run. Good luck with your project!
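If the goal really is to capture another script's output rather than text you already have, the first snippet in the question was close; a sketch of that route, using a small inline child program in place of the question's file.py so it runs on its own:

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for file.py: a tiny child program defined inline for the sketch.
child_code = "print('hello from the child script')"

out_path = os.path.join(tempfile.mkdtemp(), "output.txt")

with open(out_path, "w") as f:
    # check_call waits for the child to finish and raises if it exits
    # non-zero, so nothing is left hanging in the terminal.
    subprocess.check_call([sys.executable, "-c", child_code], stdout=f)

with open(out_path) as f:
    captured = f.read().strip()

print(captured)  # hello from the child script
```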

How to use json.tool from the shell to validate and pretty-print language files without removing the unicode?

Ubuntu 16.04
Bash 4.4
python 3.5
I received a bunch of language files from the translators at Upwork and noticed none of the files had the same line count. Since they were in .json format, I decided to validate and pretty-print them and then see which lines were missing from each file. Here is the simple script I made:
#!/bin/sh
for file in *.json; do
    python -m json.tool "${file}" > "${file}".tmp
    rm -f "${file}"
    mv "${file}".tmp "${file}"
done
Now my Russian language file looks like:
"manualdirections": "\u041c\u0430\u0440\u0448\u0440\u0443\u0442",
"moreinformation": "\u0414\u0435\u0442\u0430\u043b\u0438",
"no": "\u041d\u0435\u0442",
I would very much like to keep the content of the files untouched.
You can use the following equivalent Python script instead, which uses a subclass of json.JSONEncoder to override the ensure_ascii option:
import json
import os
import glob

class allow_nonascii(json.JSONEncoder):
    def __init__(self, *args, ensure_ascii=False, **kwargs):
        super().__init__(*args, ensure_ascii=False, **kwargs)

for file in glob.iglob('*.json'):
    with open(file, 'r') as fin, open(file + '.tmp', 'w') as fout:
        fout.write(json.dumps(json.load(fin), cls=allow_nonascii, indent=4))
    os.remove(file)
    os.rename(file + '.tmp', file)
#!/usr/bin/python3
import json
import os

for filename in os.listdir('/path/to/json_files'):
    if filename.endswith('.json'):
        with open(filename, encoding='utf-8') as f:
            data = json.load(f)
        print(json.dumps(data, indent=4, ensure_ascii=False))

Notice the encoding used with open(), and ensure_ascii=False so the non-ASCII text is printed as-is. This SHOULD read the files and display them as necessary. I think.
This is not possible in json.tool:
https://github.com/python/cpython/blob/3.5/Lib/json/tool.py#L45
The call to json.dumps does not allow passing the keyword argument ensure_ascii=False, which would solve your issue here.
You will have to write your own json.tool, monkeypatch it, or use third-party code.
edit: I've proposed PR 9765 to add this feature to json.tool in Python 3.8.
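All three answers come down to json.dumps's ensure_ascii flag; a minimal round-trip showing the difference on a sample entry like those in the question:

```python
import json

data = {"no": "Нет"}  # a sample Russian entry, as in the question

escaped = json.dumps(data, indent=4)
readable = json.dumps(data, indent=4, ensure_ascii=False)

# With the default ensure_ascii=True, non-ASCII becomes \uXXXX escapes.
assert "\\u041d" in escaped
# With ensure_ascii=False, the Cyrillic text is kept as-is.
assert "Нет" in readable
# Both forms decode back to the same data, so nothing is lost either way.
assert json.loads(escaped) == json.loads(readable) == data
```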

Python2.7 - reading from tempfile

I use Python 2.7 and I have a question about reading from tempfile. Here is my code:
import tempfile

for i in range(0, 10):
    f = tempfile.NamedTemporaryFile()
    f.write("Hello")
    ##f.seek(0)
    print f.read()
With this code, I get something like this:
Rワ
nize.pyR
゙`Sc
d
Rワ
Rワ
Z
Z
nize.pyR
゙`Sc
what are these?
Thanks!
You are reading immediately after writing without repositioning the file pointer. NamedTemporaryFile opens the file in 'w+b' (read/write) mode, and with Python 2's stdio-based file objects, a read that directly follows a write is undefined unless you seek in between, which is where the garbage comes from. Uncomment your seek so the read starts back at the beginning:
    f.seek(0)
See https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files
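A sketch of the fixed loop body (Python 3 syntax, since the pattern is the same):

```python
import tempfile

with tempfile.NamedTemporaryFile() as f:  # mode defaults to 'w+b'
    f.write(b"Hello")
    f.seek(0)  # rewind before reading; otherwise the read starts at EOF
    content = f.read()

assert content == b"Hello"
```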

How to use subprocess to unzip gz file in python

I don't know how to unzip a .gz file in Python using subprocess.
The gzip library is slow, and I was thinking of reimplementing the function below using GNU/Linux shell tools and the subprocess library.
def __unzipGz(filePath):
    import gzip
    import os
    inputFile = gzip.GzipFile(filePath, 'rb')
    stream = inputFile.read()
    inputFile.close()
    outputFile = file(os.path.splitext(filePath)[0], 'wb')
    outputFile.write(stream)
    outputFile.close()
You can use something like this:

import subprocess

filename = "some.gunzip.file.tar.gz"
process = subprocess.Popen(['tar', '-xzf', filename])
process.wait()  # Popen returns immediately; wait for tar to finish
Since there is not much useful output here, you could also use os.system instead of subprocess.Popen, like this:
import os

filename = "some.gunzip.file.tar.gz"
exit_code = os.system("tar -xzf {}".format(filename))
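Before shelling out, it may be worth checking whether the slowness comes from reading the whole file into memory at once: streaming the gzip module in chunks with shutil.copyfileobj is often fast enough. A sketch that builds its own sample archive (the path is made up for the example):

```python
import gzip
import os
import shutil
import tempfile

# Build a sample .gz file to decompress, as a stand-in for the real input.
gz_path = os.path.join(tempfile.mkdtemp(), "sample.txt.gz")
with gzip.open(gz_path, "wb") as f:
    f.write(b"hello world\n" * 1000)

# Stream the decompression in fixed-size chunks instead of read()-ing
# everything at once -- copyfileobj does the chunked loop for us.
out_path = os.path.splitext(gz_path)[0]
with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
    shutil.copyfileobj(src, dst)

with open(out_path, "rb") as check:
    restored = check.read()

assert restored == b"hello world\n" * 1000
```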

pickle module doesn't work for this simple code

When I run this code in Python 3.4.2 (Win7 64-bit), it doesn't work! It creates the file but puts nothing in it (0 bytes).
I don't know what the problem is. Thanks for your help.
import pickle

f = open("G:\\database.txt", "wb")
pickle.dump(12345, f)
You have to close the file object that you have opened. So just add the line
    f.close()
at the end and it will work. Until the file is closed (or flushed), the pickled bytes can sit in the write buffer and never reach disk, which is why the file shows 0 bytes.
As an alternative, you can also use the with statement to open the file, then it will automatically close the file for you when it's done:
import pickle

with open("G:\\database.txt", "wb") as f:
    pickle.dump(12345, f)
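The with version can be checked end to end by loading the value back (using a temp path here rather than the question's G: drive):

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "database.txt")

# The with statement closes (and therefore flushes) the file automatically,
# so the pickled bytes actually reach disk.
with open(path, "wb") as f:
    pickle.dump(12345, f)

with open(path, "rb") as f:
    value = pickle.load(f)

assert value == 12345
```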
