I need to use previously developed Perl code in a new Python script I am writing. The issue I am running into is that I cannot pull the Perl files into the Python code with import, because they are all written in Perl. Is there a way I can bypass this and still use the Perl code from my Python code, so that I don't have to rewrite 50+ Perl programs?
An example:
In a Perl file, I might have something that defines a car object with color red, make Nissan, and model Altima, and I need to be able to use that object from my Python code.
You cannot reasonably combine Perl and Python code directly. While there are some ways to achieve interoperability (such as writing glue code in C), it will be far from seamless and probably take more effort than rewriting all the code.
What I try to do in that situation is to wrap the old code with a simple command-line interface. Then, the Python code can run the Perl scripts via subprocess.run() or check_output(). If complex data needs to be transferred between the programs, it's often simplest to use JSON. Alternatively, wrapping the old code with a simple REST API might also work, especially if the Perl code has to carry some state across multiple interactions.
This works fine for reusing self-contained pieces of business logic, but not for reusing class definitions.
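For instance, here is a minimal sketch of that wrapper approach. It assumes a hypothetical Perl script, get_car.pl, with a made-up command-line interface, that prints its result as a single JSON object on stdout:
#!/usr/bin/env python3
# Minimal sketch of the wrapper approach. get_car.pl and its options are
# hypothetical; it is assumed to print one JSON object on stdout, e.g.
# {"color": "red", "make": "nissan", "model": "altima"}.
import json
import subprocess
result = subprocess.run(
    ["perl", "get_car.pl", "--model", "altima"],  # hypothetical CLI
    capture_output=True, text=True, check=True,
)
car = json.loads(result.stdout)  # now an ordinary Python dict
print(car["color"], car["make"], car["model"])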
If you could modify (instead of rewriting) the Perl code, you could have it dump JSON data so that the Python scripts could read it in. Here's an example.
Original Perl script:
use warnings;
use strict;
use JSON;
my $file = 'data.json';
my %h = (
    foo => 'bar',
    baz => ['a', 'b', 'c'],
    raz => {
        ABC => 123,
        XYZ => [4, 5, 6],
    }
);
my $json = encode_json \%h;
open my $fh, '>', $file or die $!;
print $fh $json;
close $fh or die $!;
Python script reading the data:
#!/usr/bin/python
import json
filename = 'data.json'
with open(filename) as f:
    data = json.load(f)
print(data)
Essentially, it's data output translation: dump the data in an agreed format, then read it back in. This way, you don't have to worry about running a program written in one language from the other.
Related
I have a stripped down real-time Linux box that interfaces with some hardware.
The configuration files are *.dbm files and I cannot access them. They seem to be some sort of key-value database but every library I have tried has come up empty.
I have tried the DBM reading libraries from Perl, Python, and Ruby with no luck. Any guidance on these files would be great, I have not seen them before.
This is what happens when I cat one file out.
DBMFILE Aug 31 2004
[... the rest is mostly unprintable binary data; the readable fragments match the strings output shown further down (GLOBAL, PB_COMBI, BUS_DP, SL_DP, PLC_PARAM, PROJEKT, KBL_HEADER, ALI-SETUP, OBJ-DEF, ALI_CLIENT, ALI_SERVER, ST_OV_0, the Integer/Unsigned type names, and DESCRIPT) ...]
Edit: to show what I've tried already; every attempt only comes up with empty objects (no key-value pairs).
Perl:
#!/usr/bin/perl -w
use strict;
use DB_File;
use GDBM_File;
my ($filename, %hash, $flags, $mode, $DB_HASH) = @ARGV;
tie %hash, 'DB_File', [$filename, $flags, $mode, $DB_HASH]
or die "Cannot open $filename: $!\n";
while ( my($key, $value) = each %hash ) {
print "$key = $value\n";
}
# these unties happen automatically at program exit
untie %hash;
which returns nothing
Python:
import dbm
db = dbm.open('file', 'c')
Ruby:
db = DBM.open('file', 666, DBM::CREATRW)
Every one of these returned empty. I assume they all use the same low-level library. Some history/context on DBM files would also be great, as there seem to be several different versions.
Edit: running file on it returns
$ file abb12mb_uncontrolledsynch_ppo2_1slave.dbm
abb12mb_uncontrolledsynch_ppo2_1slave.dbm: data
and running strings outputs
$ strings abb12mb_uncontrolledsynch_ppo2_1slave.dbm
DBMFILE
Aug 31 2004
GLOBAL
PB_COMBI
SMSI
BUS_DP
Lokal
SL_DP
PLC_PARAM
PROJEKT
PROFIBUS new network
1 .000
22.02.2012
KBL_HEADER
ALI-SETUP
OBJ-DEF
ALI_CLIENT
ALI_SERVER
ST_OV_0
Boolean
Integer8
Integer16
Integer32
Unsigned8
Unsigned16
Unsigned32
Floating-Point
Octet String
DESCRIPT
ABB Oy
ABB Drives RPBA-01
ABBSlave1
***reserved***
Just to make my comment clear: you should try using the default options for DB_File, like this:
use strict;
use warnings;
use DB_File;
my ($filename) = @ARGV;
tie my %dbm, 'DB_File', $filename or die qq{Cannot open DBM file "$filename": $!};
print "$_\n" for keys %dbm;
From the documentation for Perl's dbmopen function:
[This function has been largely superseded by the tie function.]
You probably want to try tying it with DB_File.
use DB_File;
tie %hash, 'DB_File', $filename, $flags, $mode, $DB_HASH;
Then your data is in %hash.
Might also be interesting to run file against the file to see what it actually is.
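Python's standard library can also help with the identification step: dbm.whichdb() guesses which of the standard dbm flavours created a file. A quick sketch, using the filename from the question; given the "DBMFILE" header, an empty-string result (meaning "none of the standard formats") would not be surprising:
#!/usr/bin/env python3
# Ask Python's dbm layer which flavour (if any) it recognises.
# whichdb returns a module name, '' for an unrecognised format,
# or None if the file cannot be read at all.
import dbm
print(repr(dbm.whichdb('abb12mb_uncontrolledsynch_ppo2_1slave.dbm')))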
We're using a Python-based application which reads a configuration file containing a couple of arrays.
Example layout of config file:
array1 = [
    'bob',
    'sue',
    'jayne'
]
Currently, changes to the configuration are made by hand, but I've written a little interface to streamline the process (mainly to avoid errors).
It currently reads in the existing configuration using a simple import. What I'm not sure how to do, however, is get my script to write its output as valid Python, so that the main application can read it again.
How can I dump the array back into the file as valid Python?
Cheers!
I'd suggest JSON or YAML (less verbose than JSON) for configuration files. That way, the configuration file becomes more readable for the less Python-minded ;) It's also easier to raise meaningful errors, e.g. if the configuration is incomplete.
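A minimal sketch of what the JSON route could look like (config.json is an assumed filename; the application would switch from import to json.load):
#!/usr/bin/env python3
# Sketch: keep the configuration as JSON instead of Python source.
# 'config.json' is an assumed filename.
import json
config = {"array1": ["bob", "sue", "jayne"]}
# the editing interface writes the file...
with open("config.json", "w") as f:
    json.dump(config, f, indent=4)
# ...and the main application reads it back
with open("config.json") as f:
    loaded = json.load(f)
print(loaded["array1"])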
To save python objects you can always use pickle.
Generally, using repr() will create a string that can be re-evaluated, but pprint produces slightly nicer output.
from pprint import pprint
# write the assignment, then pretty-print the value into the same file handle
outf.write("array1 = ")
pprint(array1, outf)
repr(array1) (and writing that into the file) would be a very simple solution, and it should work here.
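Putting those suggestions together, a small sketch of the write-back step might look like this (config.py is an assumed filename for the configuration module; the main application keeps importing it as before):
#!/usr/bin/env python3
# Sketch: rewrite the configuration as valid Python source using pprint.
# 'config.py' is an assumed filename.
from pprint import pformat
array1 = ['bob', 'sue', 'jayne', 'new entry']
with open("config.py", "w") as outf:
    outf.write("array1 = " + pformat(array1) + "\n")
# The main application can then read it back with a plain import:
#     import config
#     print(config.array1)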
I have a legacy database which contains simple data structures (no CODE refs thank goodness) that have been written using the nfreeze method of the Storable module in Perl.
Now I have a need to load this data into a Python application. Does anyone know of a Python implementation of Storable's thaw? Google hasn't helped me.
If it comes to it, I can reverse engineer the data format from the Storable source, but I'd prefer to avoid that fun if it's been done already.
To express in code: Given a Perl program like this:
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64;
use Storable qw/nfreeze/;
my $data = {
    'string'   => 'something',
    'arrayref' => [1, 2, 'three'],
    'hashref'  => {
        'a' => 'b',
    },
};
print encode_base64( nfreeze($data) );
I'm after a magic_function such that this Python:
#!/usr/bin/env python
import base64
import pprint
import sys
def magic_function(frozen):
    # A miracle happens
    return thawed
if __name__ == '__main__':
    frozen = base64.b64decode(sys.stdin.read())
    data = magic_function(frozen)
    pprint.pprint(data)
prints:
{'string': 'something', 'arrayref': [1, 2, 'three'], 'hashref': {'a': 'b'}}
when run against the output of the Perl program.
It's not immediately clear to me how far along this project is, but it appears to aim to do what you want:
https://pypi.org/project/storable/
If your first option doesn't work, another option would be to write a simple perl script to thaw the data, and then write it out in JSON or YAML or some format that you can easily work with in Python.
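One way to wire that up without a separate converter script is to let Python call a short Perl one-liner that thaws the blob and re-emits it as JSON. This is only a sketch: it assumes perl with the Storable and JSON modules is on the PATH, and it only handles plain data structures (no blessed objects), which matches the question:
#!/usr/bin/env python3
# Sketch: use Perl itself as the "magic_function" by thawing on the fly
# and converting to JSON. Assumes Storable and JSON are installed.
import json
import subprocess
PERL_THAW = (
    "use Storable qw(thaw); use JSON; "
    "local $/; my $frozen = <STDIN>; "
    "print encode_json(thaw($frozen));"
)
def magic_function(frozen):
    out = subprocess.run(
        ["perl", "-e", PERL_THAW],
        input=frozen, capture_output=True, check=True,
    )
    return json.loads(out.stdout)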
In Perl, the interpreter kind of stops when it encounters a line with
__END__
in it. This is often used to embed arbitrary data at the end of a Perl script, so the script can fetch and store data that it carries 'inside itself', which opens up some nice possibilities.
In my case I have a pickled object that I want to store somewhere. While I can use a file.pickle file just fine, I was looking for a more compact approach (to distribute the script more easily).
Is there a mechanism that allows for embedding arbitrary data inside a python script somehow?
With pickle you can also work directly on strings.
s = pickle.dumps(obj)
pickle.loads(s)
If you combine that with """ (triple-quoted strings) you can easily store any pickled data in your file.
If the data is not particularly large (many K) I would just .encode('base64') it and include that in a triple-quoted string, with .decode('base64') to get back the binary data, and a pickle.loads() call around it.
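A small sketch of that idea (the pasted text below is only a placeholder, not real data):
#!/usr/bin/env python3
# Sketch: embed a base64-encoded pickle inside the script itself.
import base64
import pickle
# Run once, elsewhere, to produce the text to paste below:
#     print(base64.b64encode(pickle.dumps(obj)).decode('ascii'))
PICKLED = """
...paste the base64 output here...
"""
def load_embedded():
    # b64decode discards characters outside the base64 alphabet,
    # so the surrounding newlines are harmless
    return pickle.loads(base64.b64decode(PICKLED))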
In Python, you can use """ (triple-quoted strings) to embed long runs of text data in your program.
In your case, however, don't waste time on this.
If you have an object you've pickled, you'd be much, much happier dumping that object as Python source and simply including the source.
The repr function, applied to most objects, will emit a Python source-code version of the object. If you implement __repr__ for all of your custom classes, you can trivially dump your structure as Python source.
If, on the other hand, your pickled structure started out as Python code, just leave it as Python code.
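A tiny sketch of that idea, using a made-up Point class (mymodule is a placeholder for wherever the class really lives):
#!/usr/bin/env python3
# Sketch: dump objects as Python source via __repr__ (Point is made up).
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __repr__(self):
        # emit source code that recreates an equal object
        return "Point(%r, %r)" % (self.x, self.y)
data = [Point(1, 2), Point(3, 4)]
with open("embedded_data.py", "w") as f:
    f.write("from mymodule import Point\n")  # placeholder import
    f.write("data = " + repr(data) + "\n")
# The generated file can then simply be imported:
#     from embedded_data import data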
I made this code. You run something like python comp.py foofile.tar.gz, and it creates decomp.py with foofile.tar.gz's contents embedded in it. I don't think this is really portable to Windows because of the Popen calls, though.
import base64
import sys
import subprocess
inf = open(sys.argv[1], "r+b").read()
outs = base64.b64encode(inf)
decomppy = '''#!/usr/bin/python
import base64
def decomp(data):
    fname = "%s"
    outf = open(fname, "w+b")
    outf.write(base64.b64decode(data))
    outf.close()
# You can put the rest of your code here.
# Like this, to unzip an archive:
#import subprocess
#subprocess.Popen("tar xzf " + fname, shell=True)
#subprocess.Popen("rm " + fname, shell=True)
''' % (sys.argv[1])
taildata = '''uudata = """%s"""
decomp(uudata)
''' % (outs)
outpy = open("decomp.py", "w+b")
outpy.write(decomppy)
outpy.write(taildata)
outpy.close()
subprocess.Popen("chmod +x decomp.py", shell=True)
I have a script that performs BLAST queries (bl2seq)
The script works like this:
Get sequence a, sequence b
write sequence a to filea
write sequence b to fileb
run command 'bl2seq -i filea -j fileb -n blastn'
get output from STDOUT, parse
repeat 20 million times
The program bl2seq does not support piping.
Is there any way to do this and avoid writing/reading to the harddrive?
I'm using Python BTW.
Depending on what OS you're running on, you may be able to use something like bash's process substitution. I'm not sure how you'd set that up in Python, but you're basically using a named pipe (or named file descriptor). That won't work if bl2seq tries to seek within the files, but it should work if it just reads them sequentially.
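On Linux, a rough sketch of that approach from Python could use os.mkfifo. The flags below are copied from the question, bl2seq_fifo is a made-up helper, and it only works if bl2seq reads its two inputs sequentially rather than seeking; feeding each FIFO from its own thread avoids a deadlock regardless of which input bl2seq opens first:
#!/usr/bin/env python3
# Rough sketch (Linux/Unix only): feed bl2seq through named pipes so no
# real files hit the disk.
import os
import subprocess
import tempfile
import threading
def write_fifo(path, data):
    # open() blocks until bl2seq opens the FIFO for reading
    with open(path, "w") as f:
        f.write(data)
def bl2seq_fifo(seq_a, seq_b):
    tmpdir = tempfile.mkdtemp()
    fifo_a = os.path.join(tmpdir, "seqa")
    fifo_b = os.path.join(tmpdir, "seqb")
    os.mkfifo(fifo_a)
    os.mkfifo(fifo_b)
    proc = subprocess.Popen(
        ["bl2seq", "-i", fifo_a, "-j", fifo_b, "-n", "blastn"],
        stdout=subprocess.PIPE, universal_newlines=True)
    writers = [threading.Thread(target=write_fifo, args=(p, d))
               for p, d in ((fifo_a, seq_a), (fifo_b, seq_b))]
    for t in writers:
        t.start()
    out, _ = proc.communicate()
    for t in writers:
        t.join()
    os.remove(fifo_a)
    os.remove(fifo_b)
    os.rmdir(tmpdir)
    return out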
How do you know bl2seq does not support piping? By the way, pipes are an OS feature, not a feature of the program. If your bl2seq program outputs something, whether to STDOUT or to a file, you should be able to parse that output. Check the help for bl2seq for options to write the output to a file as well, e.g. an -o option. Then you can parse the file.
Also, since you are using Python, an alternative you can use is BioPython module.
Is this the bl2seq program from BioPerl? If so, it doesn't look like you can do piping to it. You can, however, code your own hack using Bio::Tools::Run::AnalysisFactory::Pise, which is the recommended way of going about it. You'd have to do it in Perl, though.
If this is a different bl2seq, then disregard the message. In any case, you should probably provide some more detail.
Wow. I have it figured out.
The answer is to use Python's subprocess module and pipes!
EDIT: I forgot to mention that I'm using blast2, which does support piping.
(this is part of a class)
def _query(self):
    from subprocess import Popen, PIPE, STDOUT
    # BLAST is the path to the blast2 executable
    pipe = Popen([BLAST,
                  '-p', 'blastn',
                  '-d', self.database,
                  '-m', '8'],
                 stdin=PIPE,
                 stdout=PIPE)
    pipe.stdin.write('%s\n' % self.sequence)
    print pipe.communicate()[0]
where self.database is a string containing the database filename, ie 'nt.fa'
self.sequence is a string containing the query sequence
This prints the output to the screen but you can easily just parse it. No slow disk I/O. No slow XML parsing. I'm going to write a module for this and put it on github.
Also, I haven't gotten this far yet but I think you can do multiple queries so that the blast database does not need to be read and loaded into RAM for each query.
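For the multiple-queries idea, here is a hedged sketch: BLAST tools generally accept several FASTA records on one input stream, so you could write all the queries to stdin in one go and split the tabular (-m 8) output on the query-id column afterwards. Whether blast2 behaves this way, and the blast_many helper itself, are assumptions worth testing:
# Hedged sketch: several queries through one blast2 process.
# Assumes blast2 accepts multiple FASTA records on stdin.
from subprocess import Popen, PIPE
def blast_many(sequences, database):
    fasta = "".join(">query%d\n%s\n" % (i, seq)
                    for i, seq in enumerate(sequences))
    pipe = Popen(["blast2", "-p", "blastn", "-d", database, "-m", "8"],
                 stdin=PIPE, stdout=PIPE, universal_newlines=True)
    out, _ = pipe.communicate(fasta)
    hits = {}
    for line in out.splitlines():
        fields = line.split("\t")  # -m 8 is tab-separated, query id first
        hits.setdefault(fields[0], []).append(fields)
    return hits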
I call blast2 using R script:
....
system("mkfifo seq1")
system("mkfifo seq2")
system("echo sequence1 > seq1", wait = FALSE)
system("echo sequence2 > seq2", wait = FALSE)
system("blast2 -p blastp -i seq1 -j seq2 -m 8", intern = TRUE)
....
This is two times slower(!) than writing to and reading from the hard drive!