I need to use previously developed Perl code in my new python script I am writing. The issue I am running into is that I cannot import the Perl files into the python code using import because they are all written in Perl. Is there a way I can bypass this and still use the perl code somehow with my python code to not have to rewrite 50+ perl programs.
An Example
in a perl file, I could have something that defines an object car that has the color red, make nissan, and model altima, and I need to be able to implement that into my python code.
You cannot reasonably combine Perl and Python code directly. While there are some ways to achieve interoperability (such as writing glue code in C), it will be far from seamless and probably take more effort than rewriting all the code.
What I try to do in that situation is to wrap the old code with a simple command-line interface. Then, the Python code can run the Perl scripts via subprocess.run() or check_output(). If complex data needs to be transferred between the programs, it's often simplest to use JSON. Alternatively, wrapping the old code with a simple REST API might also work, especially if the Perl code has to carry some state across multiple interactions.
This works fine for reusing self-contained pieces of business logic, but not for reusing class definitions.
If you could modify (instead of rewriting) the Perl code, you could have it dump JSON data so that the Python scripts could read it in. Here's an example.
Original Perl script:
use warnings;
use strict;
use JSON;
my $file = 'data.json';
my %h = (
foo => 'bar',
baz => ['a', 'b', 'c'],
raz => {
ABC => 123,
XYZ => [4, 5, 6],
}
);
my $json = encode_json \%h;
open my $fh, '>', $file or die $!;
print $fh $json;
close $fh or die $!;
Python script reading the data:
#!/usr/bin/python
import json
file = 'data.json';
data = json.loads(open(file).read())
print(data)
Essentially, it's data output translation. Dump the data in an agreed format, and read it back in. This way, you need not have to worry about running one program from a language from the other.
Everyone's done this--from the shell, you need some details about a text file (more than just ls -l gives you), in particular, that file's line count, so:
# > wc -l iris.txt
149 iris.txt
i know that i can access shell utilities from python, but i am looking for a python built-in, if there is one.
The crux of my question is getting this information without opening the file (hence my reference to the unix utility *wc -*l)
(is 'sniffing' the correct term for this--i.e., peeking at a file w/o opening it?')
You can always scan through it quickly, right?
lc = sum(1 for l in open('iris.txt'))
No, I would not call this "sniffing". Sniffing typically refers to looking at data as it passes through, like Ethernet packet capture.
You cannot get the number of lines from a file without opening it. This is because the number of lines in the file is actually the number of newline characters ("\n" on linux) in the file, which you have to read after open()ing it.
I am looking for a few scripts which would allow to manipulate generic csv files...
typically something like:
add-row FILENAME INSERT_ROW
get-row FILENAME GREP_ROW
replace-row FILENAME GREP_ROW INSERT_ROW
delete-row FILENAME GREP_ROW
where
FILENAME the name of a csv file, with the first row containing headers, "" used to delimit strings which might contain ','
GREP_ROW a string of pairs field1=value1[,fieldN=valueN,...] used to identify a row based on its fields values in a csv file
INSERT_ROW a string of pairs field1=value1[,fieldN=valueN,...] used to replace(or add) the fields of a row.
peferably in python using the csv package...
ideally leveraging python to associate each field as a variable and allowing more advanced GREP rules like fieldN > XYZ...
Perl has a tradition of in-place editing derived from the unix philosophy.
We could for example write simple add-row-by-num.pl command as follows :
#!/usr/bin/perl -pi
BEGIN { $ln=shift; $line=shift; }
print "$line\n" if $ln==$.;
close ARGV if eof;
Replace the third line by $_="$line\n" if $ln==$.; to replace lines. Eliminate the $line=shift; and replace the third line by $_ = "" if $ln==$.; to delete lines.
We could write a simple add-row-by-regex.pl command as follows :
#!/usr/bin/perl -pi
BEGIN { $regex=shift; $line=shift; }
print "$line\n" if /$regex/;
Or simply the perl command perl -pi -e 'print "LINE\n" if /REGEX/'; FILES. Again, we may replace the print $line by $_="$line\n" or $_ = "" for replace or delete, respectively.
We do not need the close ARGV if eof; line anymore because we need not rest the $. counter after each file is processed.
Is there some reason the ordinary unix grep utility does not suffice? Recall the regular expression (PATERN){n} matches PATERN exactly n times, i.e. (\s*\S+\s*,){6}{\s*777\s*,) demands a 777 in the 7th column.
There is even a perl regular expression to transform your fieldN=value pairs into this regular expression, although I'd use split, map, and join myself.
Btw, File::Inplace provides inplace editing for file handles.
Perl has the DBD::CSV driver, which lets you access a CSV file as if it were an SQL database. I've played with it before, but haven't used it extensively, so I can't give a thorough review of it. If your needs are simple enough, this may work well for you.
App::CCSV does some of that.
The usual way in Python is to use the csv.reader to load the data into a list of tuples, then do your add/replace/get/delete operations on that native python object, and then use csv.writer to write the file back out.
In-place operations on CSV files wouldn't make much sense anyway. Since the records are not typically of fixed length, there is no easy way to insert, delete, or modify a record without moving all the other records at the same time.
That being said, Python's fileinput module has a mode for in-place file updates.
I have a header file in which there is a large struct. I need to read this structure using some program and make some operations on each member of the structure and write them back.
For example I have some structure like
const BYTE Some_Idx[] = {
4,7,10,15,17,19,24,29,
31,32,35,45,49,51,52,54,
55,58,60,64,65,66,67,69,
70,72,76,77,81,82,83,85,
88,93,94,95,97,99,102,103,
105,106,113,115,122,124,125,126,
129,131,137,139,140,149,151,152,
153,155,158,159,160,163,165,169,
174,175,181,182,183,189,190,193,
197,201,204,206,208,210,211,212,
213,214,215,217,218,219,220,223,
225,228,230,234,236,237,240,241,
242,247,249};
Now, I need to read this and apply some operation on each of the member variable and create a new structure with different order, something like:
const BYTE Some_Idx_Mod_mul_2[] = {
8,14,20, ...
...
484,494,498};
Is there any Perl library already available for this? If not Perl, something else like Python is also OK.
Can somebody please help!!!
Keeping your data lying around in a header makes it trickier to get at using other programs like Perl. Another approach you might consider is to keep this data in a database or another file and regenerate your header file as-needed, maybe even as part of your build system. The reason for this is that generating C is much easier than parsing C, it's trivial to write a script that parses a text file and makes a header for you, and such a script could even be invoked from your build system.
Assuming that you want to keep your data in a C header file, you will need one of two things to solve this problem:
a quick one-off script to parse exactly (or close to exactly) the input you describe.
a general, well-written script that can parse arbitrary C and work generally on to lots of different headers.
The first case seems more common than the second to me, but it's hard to tell from your question if this is better solved by a script that needs to parse arbitrary C or a script that needs to parse this specific file. For code that works on your specific case, the following works for me on your input:
#!/usr/bin/perl -w
use strict;
open FILE, "<header.h" or die $!;
my #file = <FILE>;
close FILE or die $!;
my $in_block = 0;
my $regex = 'Some_Idx\[\]';
my $byte_line = '';
my #byte_entries;
foreach my $line (#file) {
chomp $line;
if ( $line =~ /$regex.*\{(.*)/ ) {
$in_block = 1;
my #digits = #{ match_digits($1) };
push #digits, #byte_entries;
next;
}
if ( $in_block ) {
my #digits = #{ match_digits($line) };
push #byte_entries, #digits;
}
if ( $line =~ /\}/ ) {
$in_block = 0;
}
}
print "const BYTE Some_Idx_Mod_mul_2[] = {\n";
print join ",", map { $_ * 2 } #byte_entries;
print "};\n";
sub match_digits {
my $text = shift;
my #digits;
while ( $text =~ /(\d+),*/g ) {
push #digits, $1;
}
return \#digits;
}
Parsing arbitrary C is a little tricky and not worth it for many applications, but maybe you need to actually do this. One trick is to let GCC do the parsing for you and read in GCC's parse tree using a CPAN module named GCC::TranslationUnit.
Here's the GCC command to compile the code, assuming you have a single file named test.c:
gcc -fdump-translation-unit -c test.c
Here's the Perl code to read in the parse tree:
use GCC::TranslationUnit;
# echo '#include <stdio.h>' > stdio.c
# gcc -fdump-translation-unit -c stdio.c
$node = GCC::TranslationUnit::Parser->parsefile('stdio.c.tu')->root;
# list every function/variable name
while($node) {
if($node->isa('GCC::Node::function_decl') or
$node->isa('GCC::Node::var_decl')) {
printf "%s declared in %s\n",
$node->name->identifier, $node->source;
}
} continue {
$node = $node->chain;
}
Sorry if this is a stupid question, but why worry about parsing the file at all? Why not write a C program that #includes the header, processes it as required and then spits out the source for the modified header. I'm sure this would be simpler than the Perl/Python solutions, and it would be much more reliable because the header would be being parsed by the C compilers parser.
You don't really provide much information about how what is to be modified should be determined, but to address your specific example:
$ perl -pi.bak -we'if ( /const BYTE Some_Idx/ .. /;/ ) { s/Some_Idx/Some_Idx_Mod_mul_2/g; s/(\d+)/$1 * 2/ge; }' header.h
Breaking that down, -p says loop through input files, putting each line in $_, running the supplied code, then printing $_. -i.bak enables in-place editing, renaming each original file with a .bak suffix and printing to a new file named whatever the original was. -w enables warnings. -e'....' supplies the code to be run for each input line. header.h is the only input file.
In the perl code, if ( /const BYTE Some_Idx/ .. /;/ ) checks that we are in a range of lines beginning with a line matching /const BYTE Some_Idx/ and ending with a line matching /;/.
s/.../.../g does a substitution as many times as possible. /(\d+)/ matches a series of digits. The /e flag says the result ($1 * 2) is code that should be evaluated to produce a replacement string, instead of simply a replacement string. $1 is the digits that should be replaced.
If all you need to do is to modify structs, you can directly use regex to split and apply changes to each value in the struct, looking for the declaration and the ending }; to know when to stop.
If you really need a more general solution you could use a parser generator, like PyParsing
There is a Perl module called Parse::RecDescent which is a very powerful recursive descent parser generator. It comes with a bunch of examples. One of them is a grammar that can parse C.
Now, I don't think this matters in your case, but the recursive descent parsers using Parse::RecDescent are algorithmically slower (O(n^2), I think) than tools like Parse::Yapp or Parse::EYapp. I haven't checked whether Parse::EYapp comes with such a C-parser example, but if so, that's the tool I'd recommend learning.
Python solution (not full, just a hint ;)) Sorry if any mistakes - not tested
import re
text = open('your file.c').read()
patt = r'(?is)(.*?{)(.*?)(}\s*;)'
m = re.search(patt, text)
g1, g2, g3 = m.group(1), m.group(2), m.group(3)
g2 = [int(i) * 2 for i in g2.split(',')
out = open('your file 2.c', 'w')
out.write(g1, ','.join(g2), g3)
out.close()
There is a really useful Perl module called Convert::Binary::C that parses C header files and converts structs from/to Perl data structures.
You could always use pack / unpack, to read, and write the data.
#! /usr/bin/env perl
use strict;
use warnings;
use autodie;
my #data;
{
open( my $file, '<', 'Some_Idx.bin' );
local $/ = \1; # read one byte at a time
while( my $byte = <$file> ){
push #data, unpack('C',$byte);
}
close( $file );
}
print join(',', #data), "\n";
{
open( my $file, '>', 'Some_Idx_Mod_mul_2.bin' );
# You have two options
for my $byte( #data ){
print $file pack 'C', $byte * 2;
}
# or
print $file pack 'C*', map { $_ * 2 } #data;
close( $file );
}
For the GCC::TranslationUnit example see hparse.pl from http://gist.github.com/395160
which will make it into C::DynaLib, and the not yet written Ctypes also.
This parses functions for FFI's, and not bare structs contrary to Convert::Binary::C.
hparse will only add structs if used as func args.
I'm moving a website to Hostmonster and asked where the server log is located so I can automatically scan it for CGI errors. I was told, "We're sorry, but we do not have cgi errors go to any files that you have access to."
For organizational reasons I'm stuck with Hostmonster and this awful policy, so as a workaround I thought maybe I'd modify the CGI scripts to redirect STDERR to a custom log file.
I have a lot of scripts (269) so I need an easy way in both Python and Perl to redirect STDERR to a custom log file.
Something that accounts for file locking either explicitly or implicitly would be great, since a shared CGI error log file could theoretically be written to by more than one script at once if more than one script fails at the same time.
(I want to use a shared error log so I can email its contents to myself nightly and then archive or delete it.)
I know I may have to modify each file (grrr), that's why I'm looking for something elegant that will only be a few lines of code. Thanks.
For Perl, just close and re-open STDERR to point to a file of your choice.
close STDERR;
open STDERR, '>>', '/path/to/your/log.txt'
or die "Couldn't redirect STDERR: $!";
warn "this will go to log.txt";
Alternatively, you could look into a filehandle multiplexer like File::Tee.
Python: cgitb. At the top of your script, before other imports:
import cgitb
cgitb.enable(False, '/home/me/www/myapp/logs/errors')
(‘errors’ being a directory the web server user has write-access to.)
In Perl try CGI::Carp
BEGIN {
use CGI::Carp qw(carpout);
use diagnostics;
open(LOG, ">errors.txt");
carpout(LOG);
close(LOG);
}
use CGI::Carp qw(fatalsToBrowser);
The solution I finally went with was similar to the following, near the top of all my scripts:
Perl:
open(STDERR,">>","/path/to/my/cgi-error.log")
or die "Could not redirect STDERR: $OS_ERROR";
Python:
sys.stderr = open("/path/to/my/cgi-error.log", "a")
Apparently in Perl you don't need to close the STDERR handle before reopening it.
Normally I would close it anyway as a best practice, but as I said in the question, I have 269 scripts and I'm trying to minimize the changes. (Plus it seems more Perlish to just re-open the open filehandle, as awful as that sounds.)
In case anyone else has something similar in the future, here's what I'm going to do for updating all my scripts at once:
Perl:
find . -type f -name "*.pl" -exec perl -pi.bak -e 's%/usr/bin/perl%/usr/bin/perl\nopen(STDERR,">>","/path/to/my/cgi-error.log")\n or die "Could not redirect STDERR: \$OS_ERROR";%' {} \;
Python:
find . -type f -name "*.py" -exec perl -pi.bak -e 's%^(import os, sys.*)%$1\nsys.stderr = open("/path/to/my/cgi-error.log", "a")%' {} \;
The reason I'm posting these commands is that it took me quite a lot of syntactical massaging to get those commands to work (e.g., changing Couldn't to Could not, changing #!/usr/bin/perl to just /usr/bin/perl so the shell wouldn't interpret ! as a history character, using $OS_ERROR instead of $!, etc.)
Thanks to everyone who commented. Since no one answered for both Perl and Python I couldn't really "accept" any of the given answers, but I did give votes to the ones which led me in the right direction. Thanks again!
python:
import sys
sys.stderr = open('file_path_with_write_permission/filename', 'a')
Python has the sys.stderr module that you might want to look into.
>>>help(sys.__stderr__.read)
Help on built-in function read:
read(...)
read([size]) -> read at most size bytes, returned as a string.
If the size argument is negative or omitted, read until EOF is reached.
Notice that when in non-blocking mode, less data than what was requested
may be returned, even if no size parameter was given.
You can store the output of this in a string and write that string to a file.
Hope this helps
In my Perl CGI programs, I usually have
BEGIN {
open(STDERR,'>>','stderr.log');
}
right after shebang line and "use strict;use warnings;". If you want, you may append $0 to file name. But this will not solve multiple programs problem, as several copies of one programs may be run simultaneously. I usually just have several output files, for every program group.