How to unpickle a python object in Golang - python

I have a python program wherein I am using Pickle to store the object using the following:
pickle.dump(sample, open( "Pickled_files/sample.p", "wb" ))
I can extract and unpickle this object in Python using the following:
sample_extracted= pickle.load(open( "Pickled_files/sample.p", "rb" ))
However, I need to extract this object in a Golang application. Thus I need to know a way by which objects pickled using Python are extracted in Golang.
Is there a way that this can be achieved? And if yes, I would really appreciate if someone can point me to a sample reference or example.

ogórek is a Go library for decoding and encoding Python pickles.
A pickle saved to a file can be loaded in Go as follows:
f, err := os.Open("Pickled_files/sample.p")
if err != nil {
	log.Fatal(err)
}
d := ogórek.NewDecoder(f)
obj, err := d.Decode() // check err before using obj

Pickle is a Python-specific format. AFAIK there are no pickle parsers outside of Python. You can try to write one for Go, but you will most likely only waste lots of time and mental health. On the other hand, that would be an interesting project, indeed.
Anyway, instead of pickling, use a language-independent format, e.g. XML, JSON, Google's protobuf or even a custom one, whatever suits your needs. Always pick the tool for the job, never the other way around.
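As a minimal sketch of the language-independent route, an existing pickle can be re-exported as JSON on the Python side so that Go can read it with its standard JSON decoder. The helper name pickle_to_json is hypothetical, and the sketch assumes the pickled object consists only of JSON-compatible types (dicts with string keys, lists, strings, numbers, booleans, None). Note that unpickling runs arbitrary code, so this should only be done with pickles you trust.

```python
import json
import pickle


def pickle_to_json(pickle_bytes):
    """Unpickle trusted data and re-encode it as a JSON string.

    Hypothetical helper: raises TypeError if the object contains
    types that JSON cannot represent.
    """
    obj = pickle.loads(pickle_bytes)
    return json.dumps(obj, sort_keys=True)


# Stand-in for the contents of a pickle file such as sample.p.
sample = {"name": "sample", "values": [1, 2, 3], "ok": True}
json_text = pickle_to_json(pickle.dumps(sample))
print(json_text)
```

The resulting text can be written to a file and decoded in Go into a struct or a map.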

Is there a way that this can be achieved?
Depends on your understanding of this. There are a lot of better options than Pickle -- even in pure Python environments. If you understand this as exchanging data between golang and Python, you should consider the following example:
You can serialize everything in Python, like
import msgpack
import msgpack_numpy as m
m.patch()
import numpy as np
data = {'string': 'Hello World',
        'number': 42,
        'matrix': np.random.randn(2, 3)}
with open('data.msgp', 'wb') as f:
    f.write(msgpack.packb(data, use_bin_type=True))
Reading it is pretty simple
// go get -u github.com/vmihailenco/msgpack
package main
import (
	"fmt"
	"github.com/vmihailenco/msgpack"
	"io/ioutil"
)
func main() {
	buf, err := ioutil.ReadFile("data.msgp")
	if err != nil {
		panic(err)
	}
	var out map[string]interface{}
	err = msgpack.Unmarshal(buf, &out)
	if err != nil {
		panic(err)
	}
	for k, v := range out {
		fmt.Printf("key[%v] value[%v]\n", k, v)
	}
}
This gives you
key[matrix] value[map[data:[145 106 174 12 61 187 235 63 128 225 138 214 167 154 231 191 156 205 144 51 50 116 244 191 251 147 235 33 187 149 251 63 207 56 134 174 206 146 220 63 7 23 246 148 34 235 226 63] type:[60 102 56]
kind:[] nd:true shape:[2 3]]]
key[string] value[[72 101 108 108 111 32 87 111 114 108 100]]
key[number] value[42]
All that is left is converting the byte sequences into the objects you would like to have.
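To illustrate that last step, here is a small sketch of what the byte sequences in the output above encode, written in Python with only the standard library. It assumes the layout visible in the output (a data byte string, a type string and a shape) and that the matrix is stored in row-major order; the variable names are made up for the example. A Go program would perform the same interpretation with its own binary-decoding routines.

```python
import struct

# Byte values copied from the Go output shown above.
matrix_data = bytes([
    145, 106, 174, 12, 61, 187, 235, 63, 128, 225, 138, 214, 167, 154, 231, 191,
    156, 205, 144, 51, 50, 116, 244, 191, 251, 147, 235, 33, 187, 149, 251, 63,
    207, 56, 134, 174, 206, 146, 220, 63, 7, 23, 246, 148, 34, 235, 226, 63,
])
matrix_type = bytes([60, 102, 56])
matrix_shape = (2, 3)
string_bytes = bytes([72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100])

# The string is plain UTF-8 text.
text = string_bytes.decode("utf-8")

# The type field is a dtype string: '<f8' means little-endian 8-byte floats.
dtype = matrix_type.decode("ascii")

# Unpack rows * cols little-endian doubles and regroup them row by row
# (row-major order is an assumption of this sketch).
rows, cols = matrix_shape
values = struct.unpack("<%dd" % (rows * cols), matrix_data)
matrix = [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]

print(text)
print(dtype)
print(matrix)
```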

Related

How to convert these (hex) numbers to characters in Golang?

I receive some characters over a radio chip which I try to read out from a serial port. I can read it out fine in this Python code, which gives me this:
received: counter: 2703
received: counter: 2704
received: counter: 2705
So using the go-serial package I wrote some code to do the same in Go:
package main
import "fmt"
import "log"
import "github.com/jacobsa/go-serial/serial"
import "io"
import "encoding/hex"
func main() {
	// Set up options.
	options := serial.OpenOptions{
		PortName: "/dev/ttyUSB0",
		BaudRate: 9600,
		DataBits: 7,
		StopBits: 2,
		MinimumReadSize: 4,
	}
	// Open the port.
	port, err := serial.Open(options)
	if err != nil {
		log.Fatalf("serial.Open: %v", err)
	}
	defer port.Close()
	for {
		buf := make([]byte, 32)
		n, err := port.Read(buf)
		if err != nil {
			if err != io.EOF {
				fmt.Println("Error reading from serial port: ", err)
			}
		} else {
			buf = buf[:n]
			fmt.Println("received: ", buf)
			fmt.Println("received: ", hex.EncodeToString(buf))
		}
	}
}
As you can see I print out the received buffer both raw AND converted from hex to string. The result is this:
received: [99 111 117 110 116 101 114 58 32 51 48 50 52 10]
received: 636f756e7465723a20333032340a
received: [99 111 117 110 116 101 114 58 32 51 48 50 53 10]
received: 636f756e7465723a20333032350a
received: [99 111 117 110 116 101 114 58 32 51 48 50 54 10]
received: 636f756e7465723a20333032360a
I guess those numbers represent counter: 2704, but as you can see the conversion to string doesn't give me the result I expect.
What am I doing wrong here? How can I convert those numbers to a string?
The text that came in is already a valid string. It's just that you have the bytes stored in buf which is []byte. To convert the existing []byte to a value of type string:
asString := string(buf)
While hex.EncodeToString returns a string, it returns the hexadecimal representation of each byte. For instance, the UTF-8 / ASCII code for lowercase c, 99 decimal, is 0x63, so the first two characters of the string returned by hex.EncodeToString are 6 and 3.
(Meanwhile, you should figure out what to do with actual errors. Your code currently ignores them, after announcing any that are not io.EOF. If your device goes into an error state, you will loop over and over again getting the same error.)
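The difference between decoding the bytes as text and hex-encoding them can be seen with the first buffer from the question. This is only an illustration in Python of the same distinction; the variable names are made up for the example.

```python
# The first received buffer from the question, as raw byte values.
buf = bytes([99, 111, 117, 110, 116, 101, 114, 58, 32, 51, 48, 50, 52, 10])

# Interpreting the bytes as text yields the message itself.
as_text = buf.decode("ascii")

# Hex-encoding yields two hexadecimal digits per byte instead.
as_hex = buf.hex()

print(repr(as_text))  # 'counter: 3024\n'
print(as_hex)         # 636f756e7465723a20333032340a
```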
I found that the solution is very simple. Instead of either of these:
fmt.Println("received: ", buf)
fmt.Println("received: ", hex.EncodeToString(buf))
I simply had to do this:
fmt.Println("received: ", string(buf))

Gensim Summarizer throws MemoryError, Any Solution?

I am trying to generate a summary of a large text file using the Gensim summarizer, but I am getting a MemoryError.
I have been facing this issue for some time; any help would be really appreciated.
Feel free to ask for more details.
from gensim.summarization.summarizer import summarize
file_read = open("xxxxx.txt", 'r')
Content = file_read.read()
def Summary_gen(content):
    print(len(Content))
    summary_r = summarize(Content, ratio=0.02)
    print(summary_r)
Summary_gen(Content)
The length of the document is:
365042
Error message:
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-6-a91bd71076d1> in <module>()
10
11
---> 12 Summary_gen(Content)
<ipython-input-6-a91bd71076d1> in Summary_gen(content)
6 def Summary_gen(content):
7 print(len(Content))
----> 8 summary_r=summarize(Content,ratio=0.02)
9 print(summary_r)
10
c:\python3.6\lib\site-packages\gensim\summarization\summarizer.py in summarize(text, ratio, word_count, split)
428 corpus = _build_corpus(sentences)
429
--> 430 most_important_docs = summarize_corpus(corpus, ratio=ratio if word_count is None else 1)
431
432 # If couldn't get important docs, the algorithm ends.
c:\python3.6\lib\site-packages\gensim\summarization\summarizer.py in summarize_corpus(corpus, ratio)
367 return []
368
--> 369 pagerank_scores = _pagerank(graph)
370
371 hashable_corpus.sort(key=lambda doc: pagerank_scores.get(doc, 0), reverse=True)
c:\python3.6\lib\site-packages\gensim\summarization\pagerank_weighted.py in pagerank_weighted(graph, damping)
57
58 """
---> 59 adjacency_matrix = build_adjacency_matrix(graph)
60 probability_matrix = build_probability_matrix(graph)
61
c:\python3.6\lib\site-packages\gensim\summarization\pagerank_weighted.py in build_adjacency_matrix(graph)
92 neighbors_sum = sum(graph.edge_weight((current_node, neighbor)) for neighbor in graph.neighbors(current_node))
93 for j in xrange(length):
---> 94 edge_weight = float(graph.edge_weight((current_node, nodes[j])))
95 if i != j and edge_weight != 0.0:
96 row.append(i)
c:\python3.6\lib\site-packages\gensim\summarization\graph.py in edge_weight(self, edge)
255
256 """
--> 257 return self.get_edge_properties(edge).setdefault(self.WEIGHT_ATTRIBUTE_NAME, self.DEFAULT_WEIGHT)
258
259 def neighbors(self, node):
c:\python3.6\lib\site-packages\gensim\summarization\graph.py in get_edge_properties(self, edge)
404
405 """
--> 406 return self.edge_properties.setdefault(edge, {})
407
408 def add_edge_attributes(self, edge, attrs):
MemoryError:
I have tried searching for this error on the internet, but couldn't find a workable solution.
From the logs, it looks like the code builds an adjacency matrix
---> 59 adjacency_matrix = build_adjacency_matrix(graph)
This probably tries to build a huge adjacency matrix for your text of length 365042, which cannot fit in your memory (i.e., RAM).
You could try:
Reducing the input to a smaller size (maybe start with 10000) and checking whether it works
Running it on a system with more RAM
Did you try using the word_count argument instead of ratio?
If the above still doesn't solve the problem, then that's because of limitations in gensim's implementation. The only way to use gensim if you still get OOM errors is to split the document. That will also speed up your solution (and if the document is really big, it shouldn't be a problem anyway).
What's the problem with summarize:
gensim's summarizer uses TextRank by default, an algorithm that uses PageRank. In gensim it is unfortunately implemented using a Python list of PageRank graph nodes, so it may fail if your graph is too big.
BTW is the document length measured in words, or characters?
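The splitting approach suggested above can be sketched as follows. This is only an illustration: split_into_chunks and summarize_in_chunks are hypothetical helpers, the 50000-character limit is an arbitrary assumption, and the summarizer is passed in as a plain function so the sketch does not depend on any particular gensim version.

```python
def split_into_chunks(text, max_chars=50000):
    """Split text into chunks of roughly max_chars, breaking only at line ends.

    A single line longer than max_chars becomes its own oversized chunk.
    """
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks


def summarize_in_chunks(text, summarize, max_chars=50000):
    """Summarize each chunk separately and join the partial summaries."""
    parts = [summarize(chunk) for chunk in split_into_chunks(text, max_chars)]
    return "\n".join(parts)


# Demo with a stand-in summarizer that keeps the first line of each chunk.
demo_text = "a\n" * 10
demo_chunks = split_into_chunks(demo_text, max_chars=4)
demo_summary = summarize_in_chunks(
    demo_text, lambda chunk: chunk.splitlines()[0], max_chars=4
)
print(len(demo_chunks), repr(demo_summary))
```

With the code from the question, the stand-in would be replaced by something like lambda chunk: summarize(chunk, ratio=0.02). Each chunk then gets its own, much smaller graph.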

"Read_Ncol" exit with error code -1073740791

I am using python 3.5.3 and igraph 0.7.1.
Why does the following code finish with the error message "Process finished with exit code -1073740791 (0xC0000409)"?
from igraph import Graph
g = Graph.Read_Ncol('test.csv', directed=False)
test.csv
119 205
119 625
124 133
124 764
124 813
55 86
55 205
55 598
133 764
The Read_Ncol function reads files in NCOL format, as produced by the Large Graph Layout program.
Your example works fine for me, also on Python 3.5.3 with igraph 0.7.1.
>>> g = Graph.Read_Ncol('test.csv', directed=False)
>>> g
<igraph.Graph object at 0x10c4844f8>
>>> print(g)
IGRAPH UN-- 10 9 --
+ attr: name (v)
+ edges (vertex names):
119--205, 119--625, 124--133, 124--764, 124--813, 55--86, 205--55, 55--598,
133--764
It seems the error C0000409 means "Stack Buffer Overrun" on Windows, which probably means that your program is writing outside of the space allocated on the stack (it's different from a stack overflow, according to this Microsoft Technet Blog.)
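The two numbers in the error message are the same 32-bit value: the negative decimal is the signed reading, the hexadecimal code is the unsigned one. A short sketch of the conversion (the function name is made up for the example):

```python
def to_unsigned_hex(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned hex string."""
    return "0x%08X" % (exit_code & 0xFFFFFFFF)


code_hex = to_unsigned_hex(-1073740791)
print(code_hex)  # 0xC0000409
```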

Python Arduino serial save to file

Sorry for my English.
My Arduino sends three values per line over serial, at 300 Hz, like this:
-346 54 -191
-299 12 -123
-497 -214 77
-407 -55 -19
45 129 46
297 123 -197
393 71 -331
544 115 -273
515 -355 -89
510 -183 -47
With this Python code I can read the serial data and write it to a file correctly, but the while loop never terminates, the shell stays open, and stop is never printed:
...
ard = serial.Serial(portname, baudrate)
print("start")
while True:
    x = ard.readline()
    #print x
    a = open(filename, 'ab')
    a.write(x)
    a.close()
print("stop")
...
I am a beginner programmer. Can you suggest a way to write the serial data to a file and then carry on with the rest of the program?
Thanks
You're never breaking from the while loop. You should:
Add a timeout to the serial reader
When there are no bytes received, break the loop
Taking your code as a base, try something like this:
...
ard = serial.Serial(addr, baud)
ard.timeout = 1 # in seconds
print("start")
while True:
    x = ard.readline()
    if len(x) == 0:
        break
    a = open(fname, 'ab')
    a.write(x)
    a.close()
print("stop")
...
It works!
I used ard.timeout together with the if condition (the if condition alone does not work).
Another question:
My Arduino serial output starts and ends like this:
Start
-663 -175 76
361 47 157
425 -229 -174
531 -283 -288
518 -40 -28
538 -228 206
581 188 174
445 5 176
end
Is it possible to start writing to the file after the "Start" string and stop before the "end" string?
I tried something like this, but it does not work:
while True:
    x = ard.readline()
    if x=="end":
        break
    #print x
    a = open(fname, 'ab')
    a.write(x)
    a.close()
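One likely reason the comparison above never matches is that readline() returns each line with its trailing newline, so the value is never exactly "end". A minimal sketch of the marker handling follows, assuming the markers arrive on lines of their own. The function extract_between_markers is hypothetical and works on any iterable of byte lines, so it can be tried without a serial port.

```python
def extract_between_markers(lines, start=b"Start", end=b"end"):
    """Return the raw lines found after the start marker and before the end marker."""
    recording = False
    kept = []
    for raw in lines:
        text = raw.strip()  # drop the trailing newline before comparing
        if not recording:
            # Skip everything up to and including the start marker.
            recording = (text == start)
            continue
        if text == end:
            break
        kept.append(raw)
    return kept


# Made-up sample input in the shape shown in the question.
sample_lines = [
    b"noise\r\n",
    b"Start\r\n",
    b"-663 -175 76\r\n",
    b"361 47 157\r\n",
    b"end\r\n",
    b"ignored\r\n",
]
kept_lines = extract_between_markers(sample_lines)
print(kept_lines)
```

With a serial port that has a timeout set, the lines could be supplied as iter(ard.readline, b""), which stops at the first empty read, and the returned lines written to the file in one go.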

How to read part of binary file with numpy?

I'm converting a MATLAB script to numpy, but have some problems with reading data from a binary file. Is there an equivalent to fseek when using fromfile to skip the beginning of the file? This is the type of extraction I need to do:
fid = fopen(fname);
fseek(fid, 8, 'bof');
second = fread(fid, 1, 'schar');
fseek(fid, 100, 'bof');
total_cycles = fread(fid, 1, 'uint32', 0, 'l');
start_cycle = fread(fid, 1, 'uint32', 0, 'l');
Thanks!
You can use seek with a file object in the normal way, and then use this file object in fromfile. Here's a full example:
import numpy as np
import os
data = np.arange(100, dtype=np.int32)
data.tofile("temp") # save the data
f = open("temp", "rb") # reopen the file
f.seek(256, os.SEEK_SET) # seek
x = np.fromfile(f, dtype=np.int32) # read the data into numpy
print(x)
# [64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
# 89 90 91 92 93 94 95 96 97 98 99]
There probably is a better answer… But when I've been faced with this problem, I had a file that I already wanted to access different parts of separately, which gave me an easy solution to this problem.
For example, say chunkyfoo.bin is a file consisting of a 6-byte header, a 1024-byte numpy array, and another 1024-byte numpy array. You can't just open the file and seek 6 bytes (because the first thing numpy.fromfile does is lseek back to 0). But you can just mmap the file and use frombuffer (the replacement for the deprecated binary mode of fromstring) instead:
import mmap
from contextlib import closing
import numpy as np
with open('chunkyfoo.bin', 'rb') as f:
    with closing(mmap.mmap(f.fileno(), length=0, access=mmap.ACCESS_READ)) as m:
        a1 = np.frombuffer(m[6:1030])
        a2 = np.frombuffer(m[1030:])
This sounds like exactly what you want to do. Except, of course, that in real life the offset and length of a1 and a2 probably depend on the header, rather than being fixed constants.
The header is just m[:6], and you can parse that by explicitly pulling it apart, using the struct module, or whatever else you'd do once you read the data. But, if you'd prefer, you can explicitly seek and read from f before constructing m, or after, or even make the same calls on m, and it will work, without affecting a1 and a2.
An alternative, which I've done for a different non-numpy-related project, is to create a wrapper file object, like this:
class SeekedFileWrapper(object):
    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.offset = fileobj.tell()
    def seek(self, offset, whence=0):
        if whence == 0:
            offset += self.offset
        return self.fileobj.seek(offset, whence)
    # ... delegate everything else unchanged
I did the "delegate everything else unchanged" by generating a list of attributes at construction time and using that in __getattr__, but you probably want something less hacky. numpy only relies on a handful of methods of the file-like object, and I think they're properly documented, so just explicitly delegate those. But I think the mmap solution makes more sense here, unless you're trying to mechanically port over a bunch of explicit seek-based code. (You'd think mmap would also give you the option of leaving it as a numpy.memmap instead of a numpy.array, which lets numpy have more control over/feedback from the paging, etc. But it's actually pretty tricky to get a numpy.memmap and an mmap to work together.)
This is what I do when I have to read arbitrary data from a heterogeneous binary file.
Numpy lets you interpret a bit pattern in an arbitrary way by changing the dtype of the array.
The MATLAB code in the question reads a char and two uints.
Read this paper (easy reading on user level, not for scientists) on what one can achieve with changing the dtype, stride, dimensionality of an array.
import numpy as np
data = np.arange(10, dtype=np.int32)
data.tofile('f')
x = np.fromfile('f', dtype='u1')
print(x.size)
# 40
second = x[8]
print('second', second)
# second 2
total_cycles = x[8:12]
print('total_cycles', total_cycles)
total_cycles.dtype = np.dtype('u4')
print('total_cycles', total_cycles)
# total_cycles [2 0 0 0] !endianness
# total_cycles [2]
start_cycle = x[12:16]
start_cycle.dtype = np.dtype('u4')
print('start_cycle', start_cycle)
# start_cycle [3]
x.dtype = np.dtype('u4')
print('x', x)
# x [0 1 2 3 4 5 6 7 8 9]
x[3] = 423
print('start_cycle', start_cycle)
# start_cycle [423]
numpy.fromfile() has a fairly new parameter for this. From its documentation:
offset : int
The offset (in bytes) from the file's current position. Defaults to 0. Only permitted for binary files.
New in version 1.17.0.
import numpy as np
import os
data = np.arange(100, dtype=np.int32)
data.tofile("temp") # save the data
x = np.fromfile("temp", dtype=np.int32, offset=256) # use the offset
print (x)
# [64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
# 89 90 91 92 93 94 95 96 97 98 99]
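The three MATLAB reads from the question can also be sketched with the standard library alone, using seek on the file object and the struct module. This is a minimal sketch, not a numpy-specific solution: read_fields is a hypothetical helper, and the demo file contents are made up so that the example runs end to end.

```python
import os
import struct
import tempfile


def read_fields(path):
    """Read one signed byte at offset 8 and two little-endian uint32 at offset 100."""
    with open(path, "rb") as f:
        f.seek(8)                                    # fseek(fid, 8, 'bof')
        (second,) = struct.unpack("<b", f.read(1))   # fread(fid, 1, 'schar')
        f.seek(100)                                  # fseek(fid, 100, 'bof')
        # Two consecutive fread(fid, 1, 'uint32', 0, 'l') calls.
        total_cycles, start_cycle = struct.unpack("<II", f.read(8))
    return second, total_cycles, start_cycle


# Build a small made-up file with known values at the expected offsets.
payload = bytearray(108)
struct.pack_into("<b", payload, 8, -5)
struct.pack_into("<II", payload, 100, 1200, 37)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)

fields = read_fields(tmp.name)
os.remove(tmp.name)
print(fields)  # (-5, 1200, 37)
```

For a handful of scalar header fields this avoids creating arrays at all; the numpy approaches above are the better fit once larger blocks of data follow the header.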
