python tempfile + gzip + json dump

I want to dump a very large dictionary into a compressed JSON file using Python 3 (3.5).
import gzip
import json
import tempfile

data = {"verylargedict": True}
with tempfile.NamedTemporaryFile("w+b", dir="/tmp/", prefix=".json.gz") as fout:
    with gzip.GzipFile(mode="wb", fileobj=fout) as gzout:
        json.dump(data, gzout)
I got this error though.
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    json.dump(data, gzout)
  File "/usr/lib/python3.5/json/__init__.py", line 179, in dump
    fp.write(chunk)
  File "/usr/lib/python3.5/gzip.py", line 258, in write
    data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'
Any thoughts?

GzipFile has no text mode, so I would create a wrapper to pass as the file-handle object. The wrapper takes the text that json.dump produces and encodes it to bytes before writing it into the gzip file:
class wrapper:
    def __init__(self, gzout):
        self.__handle = gzout

    def write(self, data):
        self.__handle.write(data.encode())
Use it like this:
json.dump(data, wrapper(gzout))
Each time json.dump wants to write to the object, the wrapper's write method is called, which converts the text to bytes and writes it to the binary stream.
(Some built-in wrappers from the io module may fit too, but this implementation is simple and works.)
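For reference, a minimal sketch of the io-module route mentioned above: io.TextIOWrapper turns the binary gzip stream into a text stream that json.dump can write to directly (the output path here is illustrative):
import gzip
import io
import json

data = {"verylargedict": True}
with gzip.GzipFile("/tmp/data.json.gz", mode="wb") as gzout:  # illustrative path
    # TextIOWrapper encodes the text json.dump emits into bytes for the gzip stream.
    with io.TextIOWrapper(gzout, encoding="utf-8") as wrapper:
        json.dump(data, wrapper)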

Related

How to write a python function to compress a CSV file using LZ4

Newbie here...
I need to read and compress a CSV file using LZ4, and I have run into an unexpected error: the compress() function expects bytes, and the csv reader object is incompatible. Is there a way to use LZ4 to compress an entire file, or do I need to convert the CSV file into bytes first and then compress it? If so, how would I approach this?
import lz4.frame
import csv

file = open("raw_data_files/raw_1.csv")
type(file)
input_data = csv.reader(file)
compressed = lz4.frame.compress(input_data)
The error shows:
Traceback (most recent call last):
  File "compression.py", line 10, in <module>
    compressed = lz4.frame.compress(input_data)
TypeError: a bytes-like object is required, not '_csv.reader'
You could do it like this:
import lz4.frame

with open('raw_data_files/raw_1.csv', 'rb') as infile:
    with open('raw_data_files/raw_1.lz4', 'wb') as outfile:
        outfile.write(lz4.frame.compress(infile.read()))
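If the CSV is too large to read in one go, the file can be streamed through in chunks instead; a sketch assuming the python-lz4 frame API's lz4.frame.open() helper:
import lz4.frame

# Stream in 1 MiB chunks so the whole CSV never has to sit in memory.
with open('raw_data_files/raw_1.csv', 'rb') as infile:
    with lz4.frame.open('raw_data_files/raw_1.lz4', mode='wb') as outfile:
        for chunk in iter(lambda: infile.read(1 << 20), b''):
            outfile.write(chunk)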

serialize a text file into a protobuf message

I have a serialized protobuf message that I can simply read and save in plain text in python with something like this:
import MyMessage
import sys
FilePath = sys.argv[1]
T = MyMessage.MyType()
f = open(FilePath, 'rb')
T.ParseFromString(f.read())
f.close()
print(T)
I can save this to a plain txt file and do what I want to do.
Now I need to do the inverse operation, i.e. read the plain text file, already formatted in the right way, and save it as a serialized protobuf message:
import MyMessage
import sys

FilePath = sys.argv[1]

input = open("./input.txt", 'r')
output = open(FilePath, 'wb')
T = MyMessage.MyType()
T.ParseFromString(input.readlines())
output.write(T.SerializeToString())
input.close()
output.close()
This fails with
Traceback (most recent call last):
  File "MyFile.py", line 13, in <module>
    T.ParseFromString(input.readlines())
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\google\protobuf\message.py", line 199, in ParseFromString
    return self.MergeFromString(serialized)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\google\protobuf\internal\python_message.py", line 1142, in MergeFromString
    serialized = memoryview(serialized)
TypeError: memoryview: a bytes-like object is required, not 'list'
I am not a python nor a protobuf expert, so I guess I am missing something trivial...
Any help?
Thanks :)
print(x) calls str(x), which for protobufs uses the human-readable "text format" representation.
To read back from that format, you can use the google.protobuf.text_format module:
from google.protobuf import text_format

def parse_my_type(file_path):
    with open(file_path, 'r') as f:
        return text_format.Parse(f.read(), MyMessage.MyType())
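To complete the round trip the question asks about, the parsed message can then be written back out in binary form with SerializeToString(); a minimal sketch (the output path is illustrative):
# Parse the text-format dump, then save it as a serialized binary message.
T = parse_my_type("./input.txt")
with open("./output.pb", 'wb') as out:  # illustrative output path
    out.write(T.SerializeToString())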

How do I secure my pickle files correctly?

I'm following this guide to secure the pickle files correctly, but I'm not getting the same output. Granted, I had to make some changes to run it the first time:
import hashlib
import hmac
import pickle

class Dummy:
    pass

obj = Dummy()
data = pickle.dumps(obj)
digest = hmac.new(b'unique-key-here', data, hashlib.blake2b).hexdigest()

with open('temp.txt', 'wb') as output:
    output.write(str(digest) + ' ' + data)

with open('temp.txt', 'r') as f:
    data = f.read()

digest, data = data.split(' ')
expected_digest = hmac.new(b'unique-key-here', data, hashlib.blake2b).hexdigest()

if not secrets.compare_digest(digest, expected_digest):
    print('Invalid signature')
    exit(1)

obj = pickle.loads(data)
When I run this I get the following stacktrace:
File "test.py", line 21, in <module>
expected_digest = hmac.new(b'unique-key-here', data, hashlib.blake2b).hexdigest()
File "/usr/lib/python3.8/hmac.py", line 153, in new
return HMAC(key, msg, digestmod)
File "/usr/lib/python3.8/hmac.py", line 88, in __init__
self.update(msg)
File "/usr/lib/python3.8/hmac.py", line 96, in update
self.inner.update(msg)
TypeError: Unicode-objects must be encoded before hashing
Your problem is data = f.read(): .read() returns a string, and hmac.new() wants bytes. Change the problem line to data = f.read().encode('utf-8'), or read the file in binary mode (the 'b' flag).
References:
7.2. Reading and Writing Files
open()
hmac.new()
.encode()
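For completeness, here is a minimal sketch of the whole flow done in binary mode (the key and filename are illustrative). Writing and reading bytes avoids both the str/bytes mix on the write and the Unicode error on the hash, and splitting on the first space is safe because the hex digest contains none:
import hashlib
import hmac
import pickle
import secrets

KEY = b'unique-key-here'  # illustrative key

obj = {'example': True}
data = pickle.dumps(obj)
digest = hmac.new(KEY, data, hashlib.blake2b).hexdigest()

# Write everything as bytes: encode the hex digest, keep the pickle as-is.
with open('temp.bin', 'wb') as output:
    output.write(digest.encode('ascii') + b' ' + data)

with open('temp.bin', 'rb') as f:
    stored_digest, payload = f.read().split(b' ', 1)

expected = hmac.new(KEY, payload, hashlib.blake2b).hexdigest().encode('ascii')
if not secrets.compare_digest(stored_digest, expected):
    print('Invalid signature')
    exit(1)

obj = pickle.loads(payload)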
I ended up having to use the following methods for it to work:
pickle.loads(codecs.decode(pickle_data.encode(), 'base64'))
# and
codecs.encode(pickle.dumps(pickle_obj), "base64").decode()
Not sure why using .encode() and .decode() was still not working for me.

Python 3 JSON writing to CSV Error

I am trying to write out a CSV file from data in JSON format. I can get the field names to write to the CSV file, but not the item values I need. This is my first time coding in Python, so any help would be appreciated. The JSON file can be found below for reference:
https://data.ny.gov/api/views/nqur-w4p7/rows.json?accessType=DOWNLOAD
Here is my error:
Traceback (most recent call last):
  File "ChangeDataType.py", line 5, in <module>
    data = json.dumps(inputFile)
  File "/usr/lib64/python3.4/json/__init__.py", line 230, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib64/python3.4/json/encoder.py", line 192, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib64/python3.4/json/encoder.py", line 250, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib64/python3.4/json/encoder.py", line 173, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <_io.TextIOWrapper name='rows.json?accessType=DOWNLOAD' mode='r' encoding='UTF-8'> is not JSON serializable
Here is my code:
import json
import csv

inputFile = open("rows.json?accessType=DOWNLOAD", "r")
data = json.dumps(inputFile)

with open("Data.csv", "w") as csvfile:
    writer = csv.DictWriter(csvfile, extrasaction='ignore',
                            fieldnames=["date", "new_york_state_average_gal", "albany_average_gal",
                                        "binghamton_average_gal", "buffalo_average_gal", "nassau_average_gal",
                                        "new_york_city_average_gal", "rochester_average_gal",
                                        "syracuse_average_gal", "utica_average_gal"])
    writer.writeheader()
    for row in data:
        writer.writerow([row["date"], row["new_york_state_average_gal"], row["albany_average_gal"],
                         row["binghamton_average_gal"], row["buffalo_average_gal"], row["nassau_average_gal"],
                         row["new_york_city_average_gal"], row["rochester_average_gal"],
                         row["syracuse_average_gal"], row["utica_average_gal"]])
If you want to read a JSON file you should use json.load instead of json.dumps:
data = json.load(inputFile)
Seems you're still having problems even opening the file (see Python json to CSV).
You were told to use json.load: dumps takes an object to a string, but you want to read JSON into a dictionary.
You therefore need to load the JSON file, and you can open two files at once:
with open("Data.csv", "w") as csvfile, open("rows.json?accessType=DOWNLOAD") as inputfile:
    data = json.load(inputfile)
    writer = csv.DictWriter(csvfile, ...
Also, for example, the JSON data contains "fieldName" : "syracuse_average_gal", and since that is the only occurrence of the Syracuse average name, row["syracuse_average_gal"] is not correct.
Carefully inspect your JSON and figure out how to parse it from the very top bracket; a sketch of one way to do that follows.
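A minimal sketch, assuming the usual Socrata rows.json layout, where the column definitions (each carrying a "fieldName") sit under meta.view.columns and the row values sit as positional lists under the top-level "data" key:
import csv
import json

with open("rows.json?accessType=DOWNLOAD") as inputfile:
    raw = json.load(inputfile)

# Map each positional value in a data row to its column's fieldName.
columns = [col["fieldName"] for col in raw["meta"]["view"]["columns"]]
wanted = ["date", "new_york_state_average_gal", "albany_average_gal",
          "binghamton_average_gal", "buffalo_average_gal", "nassau_average_gal",
          "new_york_city_average_gal", "rochester_average_gal",
          "syracuse_average_gal", "utica_average_gal"]

with open("Data.csv", "w", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=wanted, extrasaction='ignore')
    writer.writeheader()
    for values in raw["data"]:
        writer.writerow(dict(zip(columns, values)))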

Python 3 In Memory Zipfile Error. string argument expected, got 'bytes'

I have the following code to create an in memory zip file that throws an error running in Python 3.
from io import StringIO
from pprint import pprint
import zipfile

in_memory_data = StringIO()
in_memory_zip = zipfile.ZipFile(
    in_memory_data, "w", zipfile.ZIP_DEFLATED, False)
in_memory_zip.debug = 3

filename_in_zip = 'test_filename.txt'
file_contents = 'asdf'
in_memory_zip.writestr(filename_in_zip, file_contents)
To be clear this is only a Python 3 problem. I can run the code fine on Python 2. To be exact I'm using Python 3.4.3. The stack trace is below:
Traceback (most recent call last):
  File "in_memory_zip_debug.py", line 14, in <module>
    in_memory_zip.writestr(filename_in_zip, file_contents)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/zipfile.py", line 1453, in writestr
    self.fp.write(zinfo.FileHeader(zip64))
TypeError: string argument expected, got 'bytes'
Exception ignored in: <bound method ZipFile.__del__ of <zipfile.ZipFile object at 0x1006e1ef0>>
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/zipfile.py", line 1466, in __del__
    self.close()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/zipfile.py", line 1573, in close
    self.fp.write(endrec)
TypeError: string argument expected, got 'bytes'
ZipFile writes its data as bytes, not strings. This means you'll have to use BytesIO instead of StringIO on Python 3.
The strict separation between bytes and strings is new in Python 3. The six compatibility library has a BytesIO class for Python 2 if you want your program to be compatible with both.
The problem is that io.StringIO() is being used as the memory buffer when it needs to be io.BytesIO(). The error occurs because the zipfile code eventually calls the StringIO object's write() with bytes, while StringIO expects a string.
Once it's changed to BytesIO(), it works:
from io import BytesIO
from pprint import pprint
import zipfile

in_memory_data = BytesIO()
in_memory_zip = zipfile.ZipFile(
    in_memory_data, "w", zipfile.ZIP_DEFLATED, False)
in_memory_zip.debug = 3

filename_in_zip = 'test_filename.txt'
file_contents = 'asdf'
in_memory_zip.writestr(filename_in_zip, file_contents)
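As a follow-on, the finished archive can be pulled out of the buffer once the ZipFile is closed; getvalue() is the standard BytesIO accessor (the output filename is illustrative):
# Close the archive to flush the central directory, then grab the raw bytes.
in_memory_zip.close()
zip_bytes = in_memory_data.getvalue()

with open('out.zip', 'wb') as f:  # illustrative output path
    f.write(zip_bytes)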
