serialize a text file into a protobuf message - python

I have a serialized protobuf message that I can simply read and save in plain text in python with something like this:
import MyMessage
import sys
FilePath = sys.argv[1]
T = MyMessage.MyType()
f = open(FilePath, 'rb')
T.ParseFromString(f.read())
f.close()
print(T)
I can save this to a plain txt file and do what I want to do.
Now I need to do the inverse operation, i.e. read the plain text file (already formatted the right way) and save it as a serialized protobuf message:
import MyMessage
import sys
FilePath = sys.argv[1]
input = open("./input.txt", 'r')
output = open(FilePath, 'wb')
T = MyMessage.MyType()
T.ParseFromString(input.readlines())
output.write(T.SerializeToString())
input.close()
output.close()
This fails with
Traceback (most recent call last):
File "MyFile.py", line 13, in <module>
T.ParseFromString(input.readlines())
File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\google\protobuf\message.py", line 199, in ParseFromString
return self.MergeFromString(serialized)
File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\google\protobuf\internal\python_message.py", line 1142, in MergeFromString
serialized = memoryview(serialized)
TypeError: memoryview: a bytes-like object is required, not 'list'
I am not a python nor a protobuf expert, so I guess I am missing something trivial...
Any help?
Thanks :)

print(x) calls str(x), which for protobufs uses the human-readable "text format" representation.
To read back from that format, you can use the google.protobuf.text_format module:
from google.protobuf import text_format

def parse_my_type(file_path):
    with open(file_path, 'r') as f:
        return text_format.Parse(f.read(), MyMessage.MyType())
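To complete the round trip, serialize the parsed message back to the protobuf binary wire format. A minimal sketch (the input and output paths here are placeholders):

T = parse_my_type('./input.txt')
with open('./output.pb', 'wb') as out:  # placeholder output path
    out.write(T.SerializeToString())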

Related

How to write a python function to compress a CSV file using LZ4

Newbie here...
I need to read and compress a CSV file using LZ4, and I have run into an unexpected error: the compress() function expects bytes, and the CSV reader object is incompatible. Is there a way to use LZ4 to compress an entire file, or do I need to convert the CSV file into bytes first and then compress it? If so, how would I approach this?
import lz4.frame
import csv
file=open("raw_data_files/raw_1.csv")
type(file)
input_data=csv.reader(file)
compressed=lz4.frame.compress(input_data)
Error shows
Traceback (most recent call last):
File "compression.py", line 10, in <module>
compressed=lz4.frame.compress(input_data)
TypeError: a bytes-like object is required, not '_csv.reader'
You could do it like this:
import lz4.frame
with open('raw_data_files/raw_1.csv', 'rb') as infile:
    with open('raw_data_files/raw_1.lz4', 'wb') as outfile:
        outfile.write(lz4.frame.compress(infile.read()))
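For files too large to read into memory at once, a sketch using the file-like lz4.frame.open() API from the python-lz4 package, streaming the input in chunks:

import lz4.frame
# stream the file through in chunks instead of reading it all at once
with open('raw_data_files/raw_1.csv', 'rb') as infile:
    with lz4.frame.open('raw_data_files/raw_1.lz4', mode='wb') as outfile:
        for chunk in iter(lambda: infile.read(64 * 1024), b''):
            outfile.write(chunk)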

How do I secure my pickle files correctly?

I'm following this guide to secure my pickle files correctly, but I'm not getting the same output. Granted, I had to make some changes to run it the first time:
import hashlib
import hmac
import pickle
import secrets

class Dummy:
    pass

obj = Dummy()
data = pickle.dumps(obj)
digest = hmac.new(b'unique-key-here', data, hashlib.blake2b).hexdigest()

with open('temp.txt', 'wb') as output:
    output.write(str(digest) + ' ' + data)

with open('temp.txt', 'r') as f:
    data = f.read()

digest, data = data.split(' ')
expected_digest = hmac.new(b'unique-key-here', data, hashlib.blake2b).hexdigest()
if not secrets.compare_digest(digest, expected_digest):
    print('Invalid signature')
    exit(1)
obj = pickle.loads(data)
When I run this I get the following stacktrace:
File "test.py", line 21, in <module>
expected_digest = hmac.new(b'unique-key-here', data, hashlib.blake2b).hexdigest()
File "/usr/lib/python3.8/hmac.py", line 153, in new
return HMAC(key, msg, digestmod)
File "/usr/lib/python3.8/hmac.py", line 88, in __init__
self.update(msg)
File "/usr/lib/python3.8/hmac.py", line 96, in update
self.inner.update(msg)
TypeError: Unicode-objects must be encoded before hashing
Your problem is data = f.read(): .read() returns a string, and hmac.new() wants bytes. Change the problem line to data = f.read().encode('utf-8'), OR read the file in binary mode (the 'b' flag).
References:
7.2. Reading and Writing Files
open()
hmac.new()
.encode()
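A minimal end-to-end sketch of the binary-mode fix, keeping the key and digest scheme from the question (the payload object here is hypothetical):

import hashlib
import hmac
import pickle
import secrets

obj = ['some', 'data']  # hypothetical payload
data = pickle.dumps(obj)
digest = hmac.new(b'unique-key-here', data, hashlib.blake2b).hexdigest()
# write in binary mode: encode the hex digest, keep the pickle bytes as-is
with open('temp.bin', 'wb') as output:
    output.write(digest.encode('ascii') + b' ' + data)
with open('temp.bin', 'rb') as f:
    raw = f.read()
# the hex digest contains no spaces, so split once from the left
stored_digest, pickled = raw.split(b' ', 1)
expected_digest = hmac.new(b'unique-key-here', pickled, hashlib.blake2b).hexdigest()
if not secrets.compare_digest(stored_digest.decode('ascii'), expected_digest):
    print('Invalid signature')
    exit(1)
obj = pickle.loads(pickled)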
I ended up having to use the following methods for it to work:
pickle.loads(codecs.decode(pickle_data.encode(), 'base64'))
# and
codecs.encode(pickle.dumps(pickle_obj), "base64").decode()
Not sure why using .encode() and .decode() was still not working for me.
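For completeness, those two lines work because base64-encoding the pickle bytes yields plain ASCII, which survives a text-mode file. A sketch of the round trip (the file name here is hypothetical):

import codecs
import pickle

pickle_obj = {'a': 1}  # hypothetical object
# bytes -> base64 bytes -> ASCII str, safe for a text-mode file
encoded = codecs.encode(pickle.dumps(pickle_obj), 'base64').decode()
with open('temp.txt', 'w') as f:
    f.write(encoded)
with open('temp.txt', 'r') as f:
    pickle_data = f.read()
# ASCII str -> bytes -> decoded pickle bytes -> object
restored = pickle.loads(codecs.decode(pickle_data.encode(), 'base64'))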

python tempfile + gzip + json dump

I want to dump a very large dictionary into a compressed JSON file using Python 3 (3.5).
import gzip
import json
import tempfile

data = {"verylargedict": True}
with tempfile.NamedTemporaryFile("w+b", dir="/tmp/", prefix=".json.gz") as fout:
    with gzip.GzipFile(mode="wb", fileobj=fout) as gzout:
        json.dump(data, gzout)
I got this error though.
Traceback (most recent call last):
File "test.py", line 13, in <module>
json.dump(data, gzout)
File "/usr/lib/python3.5/json/__init__.py", line 179, in dump
fp.write(chunk)
File "/usr/lib/python3.5/gzip.py", line 258, in write
data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'
Any thoughts?
A GzipFile object has no text mode, so I would create a wrapper to pass as the file-handle object. The wrapper takes the text that json.dump produces and encodes it as bytes before writing into the gzip file:
class wrapper:
    def __init__(self, gzout):
        self.__handle = gzout

    def write(self, data):
        self.__handle.write(data.encode())

use it like this:

json.dump(data, wrapper(gzout))
Each time json.dump wants to write to the object, the wrapper.write method is called; it converts the text to bytes and writes them to the binary stream. (Some built-in wrappers from the io module, such as io.TextIOWrapper, may fit too, but this implementation is simple and works.)
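As an alternative along those lines, gzip.open() accepts an existing file object and a text mode, wrapping the stream in io.TextIOWrapper for you. A sketch (note I've used suffix rather than prefix for the temp file name):

import gzip
import json
import tempfile

data = {"verylargedict": True}
with tempfile.NamedTemporaryFile("w+b", dir="/tmp/", suffix=".json.gz") as fout:
    # text mode ('wt') lets json.dump write str directly
    with gzip.open(fout, mode="wt", encoding="utf-8") as gzout:
        json.dump(data, gzout)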

How do I save a list to a pickle file in a temporary directory and pass that file into a function?

The issue is that I'm pulling data from one source and I want to save it to Dropbox as a pickle file. I can't save it to a local directory, because I'm running the code on a server (iron.io).
import tempfile
import pickle

def SFDCDropboxSync(Data):
    f = tempfile.NamedTemporaryFile(delete=False)
    pickle.dump(Data, open(f, 'wb'))
    client = dropbox.client.DropboxClient(access_token)
    client.put_file(filename, f)
This is the error I get:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Shippy/RecurringDataDump/SFDCDropboxUpload.py", line 38, in <module>
    if __name__ == "__main__": main()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Shippy/RecurringDataDump/SFDCDropboxUpload.py", line 31, in main
    print SFDCDropboxUploadDownload().SFDCDropboxSync(lst)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Shippy/RecurringDataDump/SFDCDropboxUpload.py", line 26, in SFDCDropboxSync
    pkl = self.SaveListtoPickle(lst)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Shippy/RecurringDataDump/SFDCDropboxUpload.py", line 20, in SaveListtoPickle
    pickle.dump(lst,open(f,'wb'))
TypeError: coercing to Unicode: need string or buffer, instance found
In your code, the NamedTemporaryFile f is not a string: it is a file object, similar to the output of open(file_path).
From the documentation: "This file-like object can be used in a with statement, just like a normal file."
If you want the path to the created file, use tmp_file.name.
For example, this works: (tested on python 3.6.2)
def SFDCDropboxSync(Data):
    with tempfile.NamedTemporaryFile() as tmp_file:
        pickle.dump(Data, tmp_file)
        tmp_file.flush()
        print(pickle.load(open(tmp_file.name, 'rb')))
This will delete the file when the with block exits (the file closes).
Warning for Windows: you might have trouble reading the file while it is open. Instead, use something similar to this:
import os

with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
    pickle.dump(Data, open(tmp_file.name, 'wb'))
    tmp_filename = tmp_file.name
pickle.load(open(tmp_filename, 'rb'))
os.remove(tmp_filename)
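Putting it together with the upload from the question, a sketch assuming the legacy dropbox SDK's DropboxClient.put_file(path, file_obj) call that the question already uses; access_token and the Dropbox path are placeholders:

import pickle
import tempfile

import dropbox  # legacy SDK, as used in the question

def SFDCDropboxSync(Data):
    client = dropbox.client.DropboxClient(access_token)  # access_token as in the question
    with tempfile.NamedTemporaryFile() as tmp_file:
        pickle.dump(Data, tmp_file)
        tmp_file.flush()
        tmp_file.seek(0)  # rewind so put_file reads the whole pickle
        client.put_file('/data.pkl', tmp_file)  # placeholder Dropbox path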

How to extract information from json?

I am trying to extract some information from JSON data. In the following code, I first extract the part of the JSON data that contains the information I want and store it in a file. Then I try to open this file, and I get the error that follows my code. Can you help me find where I am wrong?
import json
import re
input_file = 'path'
text = open(input_file).read()
experience = re.findall(r'Experience":{"positionsMpr":{"showSection":true," (.+?),"visible":true,"find_title":"Find others',text)
output_file = open('/home/evi.nastou/Documenten/LinkedIn_data/Alewijnse/temp', 'w')
output_file.write('{' + experience[0] + '}')
output_file.close()
text = open('path/temp')
input_text = text.read()
data = json.load(input_text)
positions = json.dumps([s['companyName'] for s in data['positions']])
print positions
Error:
Traceback (most recent call last):
File "test.py", line 13, in <module>
data = json.load(input_text)
File "/home/evi.nastou/.pythonbrew/pythons/Python-2.7.2/lib/python2.7/json/__init__.py", line 274, in load
return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'
You want to use json.loads() (note the s), or pass in the file object instead of the result of .read():
text = open('path/temp')
data = json.load(text)
json.load() takes an open file object, but you were passing it a string; json.loads() takes a string.
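For the string route, the fix to the original code is one letter:

text = open('path/temp')
input_text = text.read()
data = json.loads(input_text)  # loads() parses the string read from the file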
