change column numbers in a csv file - python

I have a pipe delimited file with 3 columns
aaa|xyz|pqr
another|column
with
line break | last column
The expected output is :
aaa|xyz|pqr
another|column with line break | last column
If I remove the line breaks then I get a single line like this...
aaa|xyz|pqr another|column with line break | last column
But I need 3 columns on each line.

You can try this awk,
awk -F'|' 'NF!=3{ line=line ? line " " $0 : $0; c=split( line, arr, "|"); if(c == 3){ $0=line; }else{ next } }1' yourfile
More readable awk version:
#!/bin/awk -f
BEGIN{
FS="|";
}
NF!=3{
line=line ? line " " $0 : $0;
c=split( line, arr, "|");
if(c == 3) {
$0=line;
}
else {
next;
}
}1
Test:
$ awk -F'|' 'NF!=3{ line=line ? line " " $0 : $0; c=split( line, arr, "|"); if(c == 3){ $0=line; }else{ next } }1' yourfile
aaa|xyz|pqr
another|column with line break | last column
It is working for your sample input.

Python solution:
import sys
def fix_rows(it, n):
row = ''
for line in it:
if row:
row = row.rstrip('\n') + ' ' + line
else:
row = line
if row.count('|') == n - 1:
yield row
row = ''
if row:
yield row
with open('a.csv') as f:
sys.stdout.writelines(fix_rows(f, 3))
output:
aaa|xyz|pqr
another|column with line break | last column

What you are describing is a three field record following this pattern:
(F1, May have CR) | (F2, May have CR) | (F3, No CR)CR
If F3 ever did have a CR, it would be ambiguous which record is which since you would not know whether the CR terminates the record or is embedded into F3 or the following F1 field.
You can easily parse what I have described with a regex in Perl:
$ perl -e '
$str = do { local $/; <> };
while ($str =~ /^\n?((?:[^|]+\|){2}[^\n]+)/gm){
$_=$1;
s/\n/ /g;
print "$_\n";
}
' /tmp/ac.csv
aaa|xyz|pqr
another|column with line break | last column
Which works by using a regex to separate the records from the stream.
Live regex to show how that works.

Related

parse a structured (structure of machine) text-file (config-file) into a structured table format

main goal is to get from a more or less readable config file into a table format which can be read from everyone witouth deeper understanding of the machine and their configuration standards.
i've got a config file:
******A MANO:111111 ,20190726,001,0914,06621242746
DXS*HAWA776A0A*VA*V0/6*1
ST*001*0001
ID1*HAW250755*VMI1-9900****250755*6*0
CB1*021545*DeBright*7.030.16*3.02*250755
PA1*0*100
PA1*1*60
PA2*2769*166140*210*12600*0*0*0*0
******E MANO:111111 ,20190726,001,0914,06621242746
******A MANO:222222 ,20190726,001,0914,06621242746
DXS*HAWA776A0A*VA*V0/6*1
ST*001*0001
ID1*HAW250755*VMI1-9900****250755*6*0
CB1*021545*DeBright*7.030.16*3.02*250755
PA1*0*100
PA1*1*60
PA2*2769*166140*210*12600*0*0*0*0
******E MANO:222222 ,20190726,001,0914,06621242746
There are several objects in the file always starting with 'A MANO:' and ending with 'E MANO:' followed by the object-number.
all the lines underneath are the attributes of the object (settings of the machine). Not all objects have the same amount of settings. There could be 55 Lines for one object and 199 for another.
what i tried so far:
from pyparsing import *
'''
grammar:
object_nr ::= Word(nums, exact=6)
num ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
'''
path_input = r'\\...\...'
with open(path_input) as input_file:
line = input_file.readline()
cnt = 1
object_nr_parser = Word(nums, exact=6)
for match, start, stop in object_nr_parser.scanString(input_file):
print(match, start, stop)
which gives me the printout:
['201907'] 116 122
['019211'] 172 178
the number it founds and the start and ending points in the string. But this numbers are not what I'm looking for nor correct. i can't even find the second number in the config-file.
is it the right way to solve this with pyparsing or is there a more convenient way to do it? Where did i do the mistake?
At the end it would be astounding if i would have an object for every machine with attributes which would be all the lines between the A MANO: and the E MANO:
expected result would be something like this:
{"object": "111111",
"line1":"DXS*HAWA776A0A*VA*V0/6*1",
"line2":"ST*001*0001",
"line3":"ID1*HAW250755*VMI1-9900****250755*6*0",
"line4":"CB1*021545*DeBright*7.030.16*3.02*250755",
"line5":"PA1*0*100",
"line6":"PA1*1*60",
"line7":"PA2*2769*166140*210*12600*0*0*0*0"},
{"object": "222222",
"line1":"DXS*HAWA776A0A*VA*V0/6*1",
"line2":"ST*001*0001",
"line3":"ID1*HAW250755*VMI1-9900****250755*6*0",
"line4":"CB1*021545*DeBright*7.030.16*3.02*250755",
"line5":"PA1*0*100",
"line6":"PA1*1*60",
"line7":"PA2*2769*166140*210*12600*0*0*0*0",
"line8":"PA2*2769*166140*210*12600*0*0*0*0",
"line9":"PA2*2769*166140*210*12600*0*0*0*0",
"line10":"PA2*2769*166140*210*12600*0*0*0*0"}
Not sure if that is the best solution for the purpose but it's the one that came into mind at this point.
One of the dirtiest ways to get the thing done would be using regex and replace the MANO with line break and all the line breaks with ';'. I don't think that this would be a solution one should use
You can parse it line by line:
import re
with open('file.txt', 'r') as f:
lines = f.readlines()
lines = [x.strip() for x in lines]
result = []
name = ''
i = 1
for line in lines:
if 'A MANO' in line:
name = re.findall('A MANO:(\d+)', line)[0]
result.append({'object': name})
i = 1
elif 'E MANO' not in line:
result[-1][f'line{i}'] = line
i += 1
Output:
[{
'object': '111111',
'line1': 'DXS*HAWA776A0A*VA*V0/6*1',
'line2': 'ST*001*0001',
'line3': 'ID1*HAW250755*VMI1-9900****250755*6*0',
'line4': 'CB1*021545*DeBright*7.030.16*3.02*250755',
'line5': 'PA1*0*100',
'line6': 'PA1*1*60',
'line7': 'PA2*2769*166140*210*12600*0*0*0*0'
}, {
'object': '222222',
'line1': 'DXS*HAWA776A0A*VA*V0/6*1',
'line2': 'ST*001*0001',
'line3': 'ID1*HAW250755*VMI1-9900****250755*6*0',
'line4': 'CB1*021545*DeBright*7.030.16*3.02*250755',
'line5': 'PA1*0*100',
'line6': 'PA1*1*60',
'line7': 'PA2*2769*166140*210*12600*0*0*0*0'
}
]
But I suggest using more compact output format:
import re
with open('file.txt', 'r') as f:
lines = f.readlines()
lines = [x.strip() for x in lines]
result = {}
name = ''
for line in lines:
if 'A MANO' in line:
name = re.findall('A MANO:(\d+)', line)[0]
result[name] = []
elif 'E MANO' not in line:
result[name].append(line)
Output:
{
'111111': ['DXS*HAWA776A0A*VA*V0/6*1', 'ST*001*0001', 'ID1*HAW250755*VMI1-9900****250755*6*0', 'CB1*021545*DeBright*7.030.16*3.02*250755', 'PA1*0*100', 'PA1*1*60', 'PA2*2769*166140*210*12600*0*0*0*0'],
'222222': ['DXS*HAWA776A0A*VA*V0/6*1', 'ST*001*0001', 'ID1*HAW250755*VMI1-9900****250755*6*0', 'CB1*021545*DeBright*7.030.16*3.02*250755', 'PA1*0*100', 'PA1*1*60', 'PA2*2769*166140*210*12600*0*0*0*0']
}

Replace string with custom values and iterate for all occurrences

I am trying to replace some text in a file using either SED, PERL, AWK or a python script. I've tried a couple of things but can't seem to work it out.
I have the following in a text file called data.txt
&st=ALPHA&type=rec&uniId=JIM&acceptCode=123&drainNel=supp&
&st=ALPHA&type=rec&uniId=JIM&acceptCode=167&drainNel=supp&
&st=ALPHA&type=rec&uniId=SARA&acceptCode=231&drainNel=ured&
&st=ALPHA&type=rec&uniId=SARA&acceptCode=344&drainNel=iris&
&st=ALPHA&type=rec&uniId=SARA&acceptCode=349&drainNel=iris&
&st=ALPHA&type=rec&uniId=DAVE&acceptCode=201&drainNel=teef&
1) Script will take an input argument in the form of a number, e.g: 10000
2) I want to replace all the text ALPHA with the given long number as arg and increment by 100 for e.g. if uniId is the same. If it is different it will increment by 5000 for e.g.
3) I want to replace all the acceptCode to change to the first st for all lines with the same uniId
./script 10000
.. still confused? Well, the final result could be this:
&st=10000&type=rec&uniId=JIM&acceptCode=10000&drainNel=supp&
&st=10100&type=rec&uniId=JIM&acceptCode=10000&drainNel=supp&
&st=15100&type=rec&uniId=SARA&acceptCode=15100&drainNel=ured&
&st=15200&type=rec&uniId=SARA&acceptCode=15100&drainNel=iris&
&st=15300&type=rec&uniId=SARA&acceptCode=15100&drainNel=iris&
&st=20300&type=rec&uniId=DAVE&acceptCode=20300&drainNel=teef&
This ^ should be REPLACED and applied to file data.txt - not just print on screen.
Okay, here's one way, using awk (wrapped in a shell script for convenience because it's a bit too much for a one-liner):
#!/bin/sh
# Usage:
# $./transform.sh [STARTCOUNT] < data.txt > temp.txt
# $ mv -f temp.txt data.txt
awk -F '&' -v "cnt=${1:-10000}" -v 'OFS=&' \
'NR == 1 { ac = cnt; uni = $4; }
NR > 1 && $4 == uni { cnt += 100 }
$4 != uni { cnt += 5000; ac = cnt; uni = $4 }
{ $2 = "st=" cnt; $5 = "acceptCode=" ac; print }'
Running this on a file holding your sample input:
$ ./transform.sh 10000 < data.txt
&st=10000&type=rec&uniId=JIM&acceptCode=10000&drainNel=supp&
&st=10100&type=rec&uniId=JIM&acceptCode=10000&drainNel=supp&
&st=15100&type=rec&uniId=SARA&acceptCode=15100&drainNel=ured&
&st=15200&type=rec&uniId=SARA&acceptCode=15100&drainNel=iris&
&st=15300&type=rec&uniId=SARA&acceptCode=15100&drainNel=iris&
&st=20300&type=rec&uniId=DAVE&acceptCode=20300&drainNel=teef&
And a perl version that does an in-place edit of the input file:
#!/usr/bin/perl -ani -F'&'
# Usage:
# $ ./transform.pl COUNT datafile
use warnings;
use strict;
use English;
our ($count, $a, $uni);
BEGIN {
$count = shift #ARGV;
die "Missing count argument" unless defined $count and $count =~ /^\d+$/;
$ac = $count;
$uni = "";
$OFS = '&';
}
if ($NR == 1) {
$uni = $F[3];
} elsif ($uni ne $F[3]) {
$count += 5000;
$ac = $count;
$uni = $F[3];
} else {
$count += 100;
}
$F[1] = "st=$count";
$F[4] = "acceptCode=$ac";
print #F;
Running it on your sample input:
$ ./transform.pl 10000 data.txt
$ cat data.txt
&st=10000&type=rec&uniId=JIM&acceptCode=10000&drainNel=supp&
&st=10100&type=rec&uniId=JIM&acceptCode=10000&drainNel=supp&
&st=15100&type=rec&uniId=SARA&acceptCode=15100&drainNel=ured&
&st=15200&type=rec&uniId=SARA&acceptCode=15100&drainNel=iris&
&st=15300&type=rec&uniId=SARA&acceptCode=15100&drainNel=iris&
&st=20300&type=rec&uniId=DAVE&acceptCode=20300&drainNel=teef&
A few assumptions
Your requirement 2) I want to replace all the text ALPHA with the given long number as arg and increment by 100 for e.g. if uniId is the same. If it is different it will increment by 5000 for e.g._ in conjunction with your example output requires your input data to be sorted on the uniId field. If the file is not sorted, the 100 increments and the 5000 increments will not yield the desired initial values for each uniId
The increment scheme assumes that no one uniId value will have enough records to increment into the next 5000 range set for newly identified uniId values.
#!/usr/bin/env python3
from collections import OrderedDict
import csv
import sys
class TrackingVars(object):
"""
The TrackingVars class manages the business logic for maintaining the
st field counters and the acctCode values for each uniId
"""
def __init__(self, long_number):
self.uniId_table = {}
self.running_counter = long_number
def __initial_value__(self):
"""
The first encounter for a uniId will have st = acctCode
"""
retval = (self.running_counter, self.running_counter)
return retval
def get_uniId(self, id):
"""
A convenience method for returning uniId tracking values
"""
curval, original_value = self.uniId_table.get(id, self.__initial_value__())
return (curval, original_value)
def track(self, uniId):
"""
curval = original_value when a new uniId is encountered.
If the uniId is known, simply increment curval by 100
if the uniId is new and there is at least 1 key in the
tracking table increment curval by 5000
always update tracking variables
"""
curval, original_value = self.get_uniId(uniId)
if uniId in self.uniId_table.keys():
curval = curval + 100
else:
if self.uniId_table:
curval = curval + 5000
original_value = curval
self.running_counter = curval
retval = (curval, original_value)
self.uniId_table[uniId] = retval
return retval
def data_lines(filename):
"""
Read file as input delimited by &
"""
with open(filename, "r", newline=None) as fin:
csvin = csv.reader(fin, delimiter="&")
for row in csvin:
yield row
def transform_data_line(line):
"""
Transform data into key, values pairs
leading and traling & have no valid key, value pairs
"""
head = ("head", None)
tail = ("tail", None)
items = [head]
for field in line[1:-1]:
key, value = field.split("=")
items.append([key, value])
retval = OrderedDict(items)
retval["tail"] = tail
return retval
def process_data_line(record, text_to_replace, tracking_vars):
"""
if st value is ALPHA update record with tracking variables
"""
st = record.get("st")
if st is not None:
if st == text_to_replace:
uniId = record.get("uniId")
curval, original_value = tracking_vars.track(uniId)
record["st"] = curval
record["acceptCode"] = original_value
return record
def process_file():
"""
Get the long number from the command line input.
Initialize the tracking variables.
Process each row of the file.
"""
long_number = sys.argv[1]
tracking_vars = TrackingVars(int(long_number))
for row in data_lines("data.txt"):
record = transform_data_line(row)
retval = process_data_line(record, "ALPHA", tracking_vars)
yield retval
def write(iter_in, filename_out):
"""
Write each row from the iterator to the csv.
make sure the first and last fields are empty.
"""
with open(filename_out, "w", newline=None) as fout:
csvout = csv.writer(fout, delimiter="&")
for row in iter_in:
encoded_row = ["{0}={1}".format(k, v) for k, v in row.items()]
encoded_row[0]=""
encoded_row[-1]=""
csvout.writerow(encoded_row)
if __name__ == "__main__":
write(process_file(), "data.new.txt")
Output
$cat data.net.txt
&st=10000&type=rec&uniId=JIM&acceptCode=10000&drainNel=supp&
&st=10100&type=rec&uniId=JIM&acceptCode=10000&drainNel=supp&
&st=15100&type=rec&uniId=SARA&acceptCode=15100&drainNel=ured&
&st=15200&type=rec&uniId=SARA&acceptCode=15100&drainNel=iris&
&st=15300&type=rec&uniId=SARA&acceptCode=15100&drainNel=iris&
&st=20300&type=rec&uniId=DAVE&acceptCode=20300&drainNel=teef&
Conclusion
Only you know why the business rules for the incrementing number scheme are the way they are. However having a control break on uniId and the st value dependent upon the previous uniId increment seems problematic to me. You could process unsorted files if each new uniId encountered would start at a new 5000 boundary. For example 15000, 2000, 25000, etc.
P.S
I love the AWK and Perl answers. They are simple and straight forward. They answer the question exactly as it was posed. Now all we need is a SED example :)
just bit more efficient control, in one line gnu awk:
awk -F\& -vi=10000 -vOFS=\& '{if(NR==1) { ac=i; u=$4; } else { if($4==u) i+=100; else { i+=5000; ac=i; u=$4; } }; $2="st=" i; $5 =gensub(/[0-9]+/,ac,1,$5); print } ' data.txt
accept any various string on 5th field.. Thank Shawn.

Python pexpect sendline contents being buffered?

I have the following code in Python using the pexpect module to run a perl script which asks a series of questions. I have used this for quite some time and works well until now.
pexpect python
index = child.expect(['Question 1:'])
os.system('sleep 2')
print 'Question 1:' + 'answer1'
child.sendline('answer1')
index = child.expect(['Question 2:'])
os.system('sleep 2')
print 'Question 1:' + 'answer2'
child.sendline('answer2')
index = child.expect(['Question 3:'])
os.system('sleep 2')
print 'Question 1:' + 'answer2'
child.sendline('answer2')
At this point, i have some code that checks if error will output when Question 2 & Question 3 does not match. I checked the pexpect log and the statements being sent are exactly what i want (number of bytes and string).
However when I check what gets accepted by the perl code its getting the following:
Question 1: 'answer 1' <-- CORRECT
Question 2: 'answer 1' <-- INCORRECT
Question 3: 'answer 2answer 2' <-- its being combined into one for some reason
Any ideas out there? I'm not using anything special when spawning my pexpect object: child=pexpect.spawn(cmd,timeout=140)
Edit: Added the perl code function that asks for Question 2 & 3
sub getpw
sub getpw
{ my ($name, $var, $encoding) = #_;
my $pw = $$var;
PWD: while(1) {
system("/bin/stty -echo");
getvar($name, \$pw, 1);
print "\n";
system("/bin/stty echo");
return if $pw eq $$var && length($pw) == 80;
if (length($pw) > 32) {
print STDERR "ERROR: Password cannot exceed 32 characters, please reenter.\n";
next PWD;
}
return if $pw eq $$var;
my $pw2;
system("/bin/stty -echo");
getvar("Repeat password", \$pw2, 1);
print "\n";
system("/bin/stty echo");
print "#1: ";
print $pw;
print "\n";
print "#2: ";
print $pw2;
if ($pw2 ne $pw) {
print STDERR "ERROR: Passwords do not match, please reenter.\n";
next PWD;
}
last;
}
# Escape dangerous shell characters
$pw =~ s/([ ;\*\|`&\$!#\(\)\[\]\{\}:'"])/\\$1/g;
my $correctlength=80;
my $encoded=`$AVTAR --quiet --encodepass=$pw`;
chomp($encoded);
if($? == 0 && length($encoded) == $correctlength) {
$$var = $encoded;
} else {
print "Warning: Password could not be encoded.\n";
$$var = $pw;
}
}
sub getvar
sub getvar
{ my ($name, $var, $hide) = #_;
my $default = $$var;
while(1) {
if($default) {
$default = "*****" if $hide;
print "$name [$default]: ";
} else {
print "$name: ";
}
my $val = <STDIN>;
chomp $val;
### $val =~ s/ //g; # do not mess with the password
$$var = $val if $val;
last if $$var;
print "ERROR: You must enter a value\n";
}
}
There is nothing wrong with your code on the python side. Check your perl side of the code. To demonstrate I have created below a python questions and answers scripts that call each other. When you execute answers.py it will produce the expected output as below. Also you don't need to do the sleep statements child.expect will block until the expected output has been received.
questions.py
answers = {}
for i in range(1, 4):
answers[i] = raw_input('Question %s:' % i)
print 'resuls:'
print answers
answers.py
import pexpect
child=pexpect.spawn('./questions.py', timeout=140)
for i in range(1, 4):
index = child.expect('Question %s:' % i)
child.sendline('answer%s' % i)
child.expect('resuls:')
print child.read()
output
{1: 'answer1', 2: 'answer2', 3: 'answer3'}

splitting a diff file using regex in Python

I'm trying to split a diff (unified format) into each section using the re module in python. The format of a diff is like this...
diff --git a/src/core.js b/src/core.js
index 9c8314c..4242903 100644
--- a/src/core.js
+++ b/src/core.js
## -801,7 +801,7 ## jQuery.extend({
return proxy;
},
- // Mutifunctional method to get and set values to a collection
+ // Multifunctional method to get and set values of a collection
// The value/s can optionally be executed if it's a function
access: function( elems, fn, key, value, chainable, emptyGet, pass ) {
var exec,
diff --git a/src/sizzle b/src/sizzle
index fe2f618..feebbd7 160000
--- a/src/sizzle
+++ b/src/sizzle
## -1 +1 ##
-Subproject commit fe2f618106bb76857b229113d6d11653707d0b22
+Subproject commit feebbd7e053bff426444c7b348c776c99c7490ee
diff --git a/test/unit/manipulation.js b/test/unit/manipulation.js
index 18e1b8d..ff31c4d 100644
--- a/test/unit/manipulation.js
+++ b/test/unit/manipulation.js
## -7,7 +7,7 ## var bareObj = function(value) { return value; };
var functionReturningObj = function(value) { return (function() { return value; }); };
test("text()", function() {
- expect(4);
+ expect(5);
var expected = "This link has class=\"blog\": Simon Willison's Weblog";
equal( jQuery("#sap").text(), expected, "Check for merged text of more then one element." );
## -20,6 +20,10 ## test("text()", function() {
frag.appendChild( document.createTextNode("foo") );
equal( jQuery( frag ).text(), "foo", "Document Fragment Text node was retreived from .text().");
+
+ var $newLineTest = jQuery("<div>test<br/>testy</div>").appendTo("#moretests");
+ $newLineTest.find("br").replaceWith("\n");
+ equal( $newLineTest.text(), "test\ntesty", "text() does not remove new lines (#11153)" );
});
test("text(undefined)", function() {
diff --git a/version.txt b/version.txt
index 0a182f2..0330b0e 100644
--- a/version.txt
+++ b/version.txt
## -1 +1 ##
-1.7.2
\ No newline at end of file
+1.7.3pre
\ No newline at end of file
I've tried the following combinations of patterns but can't quite get it right. This is the closest I have come so far...
re.compile(r'(diff.*?[^\rdiff])', flags=re.S|re.M)
but this yields
['diff ', 'diff ', 'diff ', 'diff ']
How would I match all sections in this diff?
This does it:
r=re.compile(r'^(diff.*?)(?=^diff|\Z)', re.M | re.S)
for m in re.findall(r, s):
print '===='
print m
You don't need to use regex, just split the file:
diff_file = open('diff.txt', 'r')
diff_str = diff_file.read()
diff_split = ['diff --git%s' % x for x in diff_str.split('diff --git') \
if x.strip()]
print diff_split
Why are you using regex? How about just iterating over the lines and starting a new section when a line starts with diff?
list_of_diffs = []
temp_diff = ''
for line in patch:
if line.startswith('diff'):
list_of_diffs.append(temp_diff)
temp_diff = ''
else: temp_diff.append(line)
Disclaimer, above code should be considered illustrative pseudocode only and is not expected to actually run.
Regex is a hammer but your problem isn't a nail.
Just split on any linefeed that's followed by the word diff:
result = re.split(r"\n(?=diff\b)", subject)
Though for safety's sake, you probably should try to match \r or \r\n as well:
result = re.split(r"(?:\r\n|[\r\n])(?=diff\b)", subject)

decompress name

what is the easiest way to decompress a data name?
For example, change compressed form:
abc[3:0]
into decompressed form:
abc[3]
abc[2]
abc[1]
abc[0]
preferable 1 liner :)
In Perl:
#!perl -w
use strict;
use 5.010;
my #abc = qw/ a b c d /;
say join( " ", reverse #abc[0..3] );
Or if you wanted them into separate variables:
my( $abc3, $abc2, $abc1, $abc0 ) = reverse #abc[0..3];
Edit: Per your clarification:
my $str = "abc[3:0]";
$str =~ /(abc)\[(\d+):(\d+)\]/;
my $base = $1;
my $from = ( $2 < $3 ? $2 : $3 );
my $to = ( $2 > $3 ? $2 : $3 );
my #strs;
foreach my $num ( $from .. $to ) {
push #strs, $base . '[' . $num . ']';
}
This is a little pyparsing exercise that I've done in the past, adapted to your example (also supports multiple ranges and unpaired indexes, all separated by commas - see the last test case):
from pyparsing import (Suppress, Word, alphas, alphanums, nums, delimitedList,
Combine, Optional, Group)
LBRACK,RBRACK,COLON = map(Suppress,"[]:")
ident = Word(alphas+"_", alphanums+"_")
integer = Combine(Optional('-') + Word(nums))
integer.setParseAction(lambda t : int(t[0]))
intrange = Group(integer + COLON + integer)
rangedIdent = ident("name") + LBRACK + delimitedList(intrange|integer)("indexes") + RBRACK
def expandIndexes(t):
ret = []
for ind in t.indexes:
if isinstance(ind,int):
ret.append("%s[%d]" % (t.name, ind))
else:
offset = (-1,1)[ind[0] < ind[1]]
ret.extend(
"%s[%d]" % (t.name, i) for i in range(ind[0],ind[1]+offset,offset)
)
return ret
rangedIdent.setParseAction(expandIndexes)
print rangedIdent.parseString("abc[0:3]")
print rangedIdent.parseString("abc[3:0]")
print rangedIdent.parseString("abc[0:3,7,14:16,24:20]")
Prints:
['abc[0]', 'abc[1]', 'abc[2]', 'abc[3]']
['abc[3]', 'abc[2]', 'abc[1]', 'abc[0]']
['abc[0]', 'abc[1]', 'abc[2]', 'abc[3]', 'abc[7]', 'abc[14]', 'abc[15]', 'abc[16]', 'abc[24]', 'abc[23]', 'abc[22]', 'abc[21]', 'abc[20]']

Categories

Resources