What is the easiest way to decompress a data name?
For example, change compressed form:
abc[3:0]
into decompressed form:
abc[3]
abc[2]
abc[1]
abc[0]
Preferably a one-liner :)
In Perl:
#!perl -w
use strict;
use 5.010;
my @abc = qw/ a b c d /;
say join( " ", reverse @abc[0..3] );
Or if you wanted them into separate variables:
my( $abc3, $abc2, $abc1, $abc0 ) = reverse @abc[0..3];
Edit: Per your clarification:
my $str = "abc[3:0]";
$str =~ /(abc)\[(\d+):(\d+)\]/;
my $base = $1;
my $from = ( $2 < $3 ? $2 : $3 );
my $to = ( $2 > $3 ? $2 : $3 );
my @strs;
foreach my $num ( $from .. $to ) {
    push @strs, $base . '[' . $num . ']';
}
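The same expansion also fits in a short Python sketch (a sketch, assuming the name[hi:lo] form from the question; the function name expand is mine):

import re

def expand(name):
    # Parse "base[a:b]" and emit one indexed name per step, honoring direction.
    base, a, b = re.match(r"(\w+)\[(\d+):(\d+)\]", name).groups()
    a, b = int(a), int(b)
    step = 1 if a <= b else -1
    return ["%s[%d]" % (base, i) for i in range(a, b + step, step)]

print(expand("abc[3:0]"))  # ['abc[3]', 'abc[2]', 'abc[1]', 'abc[0]']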
This is a little pyparsing exercise that I've done in the past, adapted to your example (also supports multiple ranges and unpaired indexes, all separated by commas - see the last test case):
from pyparsing import (Suppress, Word, alphas, alphanums, nums, delimitedList,
Combine, Optional, Group)
LBRACK,RBRACK,COLON = map(Suppress,"[]:")
ident = Word(alphas+"_", alphanums+"_")
integer = Combine(Optional('-') + Word(nums))
integer.setParseAction(lambda t : int(t[0]))
intrange = Group(integer + COLON + integer)
rangedIdent = ident("name") + LBRACK + delimitedList(intrange|integer)("indexes") + RBRACK
def expandIndexes(t):
    ret = []
    for ind in t.indexes:
        if isinstance(ind, int):
            ret.append("%s[%d]" % (t.name, ind))
        else:
            offset = (-1, 1)[ind[0] < ind[1]]
            ret.extend(
                "%s[%d]" % (t.name, i) for i in range(ind[0], ind[1] + offset, offset)
            )
    return ret

rangedIdent.setParseAction(expandIndexes)

print(rangedIdent.parseString("abc[0:3]"))
print(rangedIdent.parseString("abc[3:0]"))
print(rangedIdent.parseString("abc[0:3,7,14:16,24:20]"))
Prints:
['abc[0]', 'abc[1]', 'abc[2]', 'abc[3]']
['abc[3]', 'abc[2]', 'abc[1]', 'abc[0]']
['abc[0]', 'abc[1]', 'abc[2]', 'abc[3]', 'abc[7]', 'abc[14]', 'abc[15]', 'abc[16]', 'abc[24]', 'abc[23]', 'abc[22]', 'abc[21]', 'abc[20]']
I need to merge two sets of files together. I don't mind what language is used as long as it works.
For example:
I have a folder with both sets of files
N-4A-2A_S135_L001_R1_001.fastq.gz
N-4A-2A_S135_L001_R2_001.fastq.gz
N-4A-2A_S135_L001_R1_001-2.fastq.gz
N-4A-2A_S135_L001_R2_001-2.fastq.gz
N-4-Bmp-1_S20_L001_R1_001.fastq.gz
N-4-Bmp-1_S20_L001_R2_001.fastq.gz
N-4-Bmp-1_S20_L001_R1_001-2.fastq.gz
N-4-Bmp-1_S20_L001_R2_001-2.fastq.gz
I need them merged like this:
N-4A-2A-S135-L001_R1_001.fastq.gz + N-4A-2A-S135-L001_R1_001-2.fastq.gz > N-4A-2A-S135-L001_R1_001-cat.fastq.gz
N-4A-2A-S135-L001_R2_001.fastq.gz + N-4A-2A-S135-L001_R2_001-2.fastq.gz > N-4A-2A-S135-L001_R2_001-cat.fastq.gz
N-4-Bmp-1_S20_L001_R1_001.fastq.gz + N-4-Bmp-1_S20_L001_R1_001-2.fastq.gz > N-4-Bmp-1_S20_L001_R1_001-cat.fastq.gz
N-4-Bmp-1_S20_L001_R2_001.fastq.gz + N-4-Bmp-1_S20_L001_R2_001-2.fastq.gz > N-4-Bmp-1_S20_L001_R2_001-cat.fastq.gz
I have a total of ~180 files so any way to automate this would be amazing.
They all follow the same format of:
(sample name) + "R1_001" or "R2_001" + "" or "-2" + .fastq.gz
According to the example you provided, your file names look sortable. If that is not the case, this will not work and this answer will be invalid.
The idea of the following snippet is really simple: sort all the filenames, then join adjacent pairs.
This should work in bash:
#!/bin/bash
files=( $( find . -type f | sort -r) )
files_num=${#files[@]}
for (( i=0 ; i < ${files_num} ; i=i+2 )); do
    f1="${files[i]}"
    f2="${files[i+1]}"
    fname=$(basename -s '.gz' "$f1")".cat.gz"
    echo "$f1 and $f2 -> $fname"
    cat "$f1" "$f2" > "$fname"
done
Or a simpler version in Z shell (zsh):
#!/bin/zsh
files=( $( find . -type f | sort -r) )
for f1 f2 in $files; do
    fname=$(basename -s '.gz' "$f1")".cat.gz"
    echo "$f1 and $f2 -> $fname"
    cat "$f1" "$f2" > "$fname"
done
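The same pairing can also be sketched in Python without relying on sort order, by deriving each sibling name directly (a sketch assuming the naming scheme from the question):

import glob, shutil

# Match only the base files; the "-2" siblings end in "_001-2.fastq.gz"
# and so do not match this pattern. gzip streams concatenate byte-wise,
# which is what "cat a.gz b.gz > c.gz" relies on as well.
for base in sorted(glob.glob("*_R[12]_001.fastq.gz")):
    sibling = base.replace(".fastq.gz", "-2.fastq.gz")
    merged = base.replace(".fastq.gz", "-cat.fastq.gz")
    with open(merged, "wb") as out:
        for part in (base, sibling):
            with open(part, "rb") as f:
                shutil.copyfileobj(f, out)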
You did say any language, so... I prefer Perl:
#!/usr/bin/perl
# concatenate files
master(@ARGV);
exit(0);

# master -- master control
sub master
{
    my(@argv) = @_;
    my($tail,@tails);
    local(%tails);

    # scan argv for -opt
    while (1) {
        my($opt) = $argv[0];
        last unless (defined($opt));
        last unless ($opt =~ s/^-/opt_/);
        $$opt = 1;
        shift(@argv);
    }

    # load up the current directory
    @tails = dirload(".");
    foreach $tail (@tails) {
        $tails{$tail} = 0;
    }

    # look for the base name of a file
    foreach $tail (@tails) {
        docat($tail)
            if ($tail =~ /R[12]_001\.fastq\.gz$/);
    }

    # default mode is "dry run"
    unless ($opt_go) {
        printf("\n");
        printf("rerun with -go to actually do it\n");
    }
}

# docat -- process a pairing
sub docat
{
    my($base) = @_;
    my($tail);
    my($out);
    my($cmd);
    my($code);

    # all commands are joining just two files
    $tail = $base;
    $tail =~ s/\.fastq/-2.fastq/;

    # to an output file
    $out = $base;
    $out =~ s/\.fastq/-cat.fastq/;

    $cmd = "cat $base $tail > $out";
    if ($opt_v) {
        printf("\n");
        printf("IN1: %s\n",$base);
        printf("IN2: %s\n",$tail);
        printf("OUT: %s\n",$out);
    }
    else {
        printf("%s\n",$cmd);
    }

    die("docat: duplicate IN1\n")
        if ($tails{$base});
    $tails{$base} = 1;

    die("docat: duplicate IN2\n")
        if ($tails{$tail});
    $tails{$tail} = 1;

    die("docat: duplicate OUT\n")
        if ($tails{$out});
    $tails{$out} = 1;

    {
        last unless ($opt_go);

        # execute the command and get error code
        system($cmd);
        $code = $? >> 8;
        exit($code) if ($code);
    }
}

# dirload -- get list of files in a directory
sub dirload
{
    my($dir) = @_;
    my($xf);
    my($tail);
    my(@tails);

    # open the directory
    opendir($xf,$dir) or
        die("dirload: unable to open '$dir' -- $!\n");

    # get list of files in the directory excluding "." and ".."
    while ($tail = readdir($xf)) {
        next if ($tail eq ".");
        next if ($tail eq "..");
        push(@tails,$tail);
    }
    closedir($xf);

    @tails = sort(@tails);

    @tails;
}
Here's the program output with the -v option:
IN1: N-4-Bmp-1_S20_L001_R1_001.fastq.gz
IN2: N-4-Bmp-1_S20_L001_R1_001-2.fastq.gz
OUT: N-4-Bmp-1_S20_L001_R1_001-cat.fastq.gz
IN1: N-4-Bmp-1_S20_L001_R2_001.fastq.gz
IN2: N-4-Bmp-1_S20_L001_R2_001-2.fastq.gz
OUT: N-4-Bmp-1_S20_L001_R2_001-cat.fastq.gz
IN1: N-4A-2A_S135_L001_R1_001.fastq.gz
IN2: N-4A-2A_S135_L001_R1_001-2.fastq.gz
OUT: N-4A-2A_S135_L001_R1_001-cat.fastq.gz
IN1: N-4A-2A_S135_L001_R2_001.fastq.gz
IN2: N-4A-2A_S135_L001_R2_001-2.fastq.gz
OUT: N-4A-2A_S135_L001_R2_001-cat.fastq.gz
rerun with -go to actually do it
I am trying to replace some text in a file using sed, Perl, awk, or a Python script. I've tried a couple of things but can't seem to work it out.
I have the following in a text file called data.txt
&st=ALPHA&type=rec&uniId=JIM&acceptCode=123&drainNel=supp&
&st=ALPHA&type=rec&uniId=JIM&acceptCode=167&drainNel=supp&
&st=ALPHA&type=rec&uniId=SARA&acceptCode=231&drainNel=ured&
&st=ALPHA&type=rec&uniId=SARA&acceptCode=344&drainNel=iris&
&st=ALPHA&type=rec&uniId=SARA&acceptCode=349&drainNel=iris&
&st=ALPHA&type=rec&uniId=DAVE&acceptCode=201&drainNel=teef&
1) The script takes a number as its input argument, e.g.: 10000
2) I want to replace all the text ALPHA with the given number, incrementing it by 100 when the uniId is the same as on the previous line, and by 5000 when the uniId changes.
3) I want every acceptCode to become the first st value assigned to the lines with that same uniId.
./script 10000
Still confused? Well, the final result would be this:
&st=10000&type=rec&uniId=JIM&acceptCode=10000&drainNel=supp&
&st=10100&type=rec&uniId=JIM&acceptCode=10000&drainNel=supp&
&st=15100&type=rec&uniId=SARA&acceptCode=15100&drainNel=ured&
&st=15200&type=rec&uniId=SARA&acceptCode=15100&drainNel=iris&
&st=15300&type=rec&uniId=SARA&acceptCode=15100&drainNel=iris&
&st=20300&type=rec&uniId=DAVE&acceptCode=20300&drainNel=teef&
This ^ should replace the contents of data.txt, not just be printed on screen.
Okay, here's one way, using awk (wrapped in a shell script for convenience because it's a bit too much for a one-liner):
#!/bin/sh
# Usage:
# $./transform.sh [STARTCOUNT] < data.txt > temp.txt
# $ mv -f temp.txt data.txt
awk -F '&' -v "cnt=${1:-10000}" -v 'OFS=&' \
'NR == 1 { ac = cnt; uni = $4; }
NR > 1 && $4 == uni { cnt += 100 }
$4 != uni { cnt += 5000; ac = cnt; uni = $4 }
{ $2 = "st=" cnt; $5 = "acceptCode=" ac; print }'
Running this on a file holding your sample input:
$ ./transform.sh 10000 < data.txt
&st=10000&type=rec&uniId=JIM&acceptCode=10000&drainNel=supp&
&st=10100&type=rec&uniId=JIM&acceptCode=10000&drainNel=supp&
&st=15100&type=rec&uniId=SARA&acceptCode=15100&drainNel=ured&
&st=15200&type=rec&uniId=SARA&acceptCode=15100&drainNel=iris&
&st=15300&type=rec&uniId=SARA&acceptCode=15100&drainNel=iris&
&st=20300&type=rec&uniId=DAVE&acceptCode=20300&drainNel=teef&
And a perl version that does an in-place edit of the input file:
#!/usr/bin/perl -ani -F'&'
# Usage:
# $ ./transform.pl COUNT datafile
use warnings;
use strict;
use English;
our ($count, $ac, $uni);

BEGIN {
    $count = shift @ARGV;
    die "Missing count argument" unless defined $count and $count =~ /^\d+$/;
    $ac = $count;
    $uni = "";
    $OFS = '&';
}

if ($NR == 1) {
    $uni = $F[3];
} elsif ($uni ne $F[3]) {
    $count += 5000;
    $ac = $count;
    $uni = $F[3];
} else {
    $count += 100;
}

$F[1] = "st=$count";
$F[4] = "acceptCode=$ac";
print @F;
Running it on your sample input:
$ ./transform.pl 10000 data.txt
$ cat data.txt
&st=10000&type=rec&uniId=JIM&acceptCode=10000&drainNel=supp&
&st=10100&type=rec&uniId=JIM&acceptCode=10000&drainNel=supp&
&st=15100&type=rec&uniId=SARA&acceptCode=15100&drainNel=ured&
&st=15200&type=rec&uniId=SARA&acceptCode=15100&drainNel=iris&
&st=15300&type=rec&uniId=SARA&acceptCode=15100&drainNel=iris&
&st=20300&type=rec&uniId=DAVE&acceptCode=20300&drainNel=teef&
A few assumptions:
Your requirement 2) (replace all the text ALPHA with the given number, incrementing by 100 when the uniId is the same and by 5000 when it differs), in conjunction with your example output, requires your input data to be grouped on the uniId field, with all lines for a uniId contiguous. If the file is not grouped this way, the 100 and 5000 increments will not yield the desired initial values for each uniId.
The increment scheme assumes that no one uniId value will have enough records to increment into the next 5000 range set for newly identified uniId values.
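To verify that grouping assumption up front, here is a small sketch (mine, not part of the answer; it assumes the data.txt layout shown above):

# Flag any uniId that reappears after a different uniId intervened.
seen, last = set(), None
for line in open("data.txt"):
    if not line.strip():
        continue
    uni = line.split("&")[3]  # field 4 is "uniId=..."
    if uni != last:
        assert uni not in seen, "uniId %r is not contiguous" % uni
        seen.add(uni)
        last = uni
print("all uniId values appear in contiguous runs")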
#!/usr/bin/env python3
from collections import OrderedDict
import csv
import sys
class TrackingVars(object):
    """
    The TrackingVars class manages the business logic for maintaining the
    st field counters and the acctCode values for each uniId
    """

    def __init__(self, long_number):
        self.uniId_table = {}
        self.running_counter = long_number

    def __initial_value__(self):
        """
        The first encounter for a uniId will have st = acctCode
        """
        retval = (self.running_counter, self.running_counter)
        return retval

    def get_uniId(self, id):
        """
        A convenience method for returning uniId tracking values
        """
        curval, original_value = self.uniId_table.get(id, self.__initial_value__())
        return (curval, original_value)

    def track(self, uniId):
        """
        curval = original_value when a new uniId is encountered.
        If the uniId is known, simply increment curval by 100.
        If the uniId is new and there is at least 1 key in the
        tracking table, increment curval by 5000.
        Always update the tracking variables.
        """
        curval, original_value = self.get_uniId(uniId)
        if uniId in self.uniId_table.keys():
            curval = curval + 100
        else:
            if self.uniId_table:
                curval = curval + 5000
            original_value = curval
        self.running_counter = curval
        retval = (curval, original_value)
        self.uniId_table[uniId] = retval
        return retval

def data_lines(filename):
    """
    Read file as input delimited by &
    """
    with open(filename, "r", newline=None) as fin:
        csvin = csv.reader(fin, delimiter="&")
        for row in csvin:
            yield row

def transform_data_line(line):
    """
    Transform data into key, value pairs.
    The leading and trailing & have no valid key, value pairs.
    """
    head = ("head", None)
    tail = ("tail", None)
    items = [head]
    for field in line[1:-1]:
        key, value = field.split("=")
        items.append([key, value])
    retval = OrderedDict(items)
    retval["tail"] = tail
    return retval

def process_data_line(record, text_to_replace, tracking_vars):
    """
    If the st value is ALPHA, update the record with the tracking variables.
    """
    st = record.get("st")
    if st is not None:
        if st == text_to_replace:
            uniId = record.get("uniId")
            curval, original_value = tracking_vars.track(uniId)
            record["st"] = curval
            record["acceptCode"] = original_value
    return record

def process_file():
    """
    Get the long number from the command line input.
    Initialize the tracking variables.
    Process each row of the file.
    """
    long_number = sys.argv[1]
    tracking_vars = TrackingVars(int(long_number))
    for row in data_lines("data.txt"):
        record = transform_data_line(row)
        retval = process_data_line(record, "ALPHA", tracking_vars)
        yield retval

def write(iter_in, filename_out):
    """
    Write each row from the iterator to the csv.
    Make sure the first and last fields are empty.
    """
    with open(filename_out, "w", newline=None) as fout:
        csvout = csv.writer(fout, delimiter="&")
        for row in iter_in:
            encoded_row = ["{0}={1}".format(k, v) for k, v in row.items()]
            encoded_row[0] = ""
            encoded_row[-1] = ""
            csvout.writerow(encoded_row)

if __name__ == "__main__":
    write(process_file(), "data.new.txt")
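Assuming the script above is saved as transform.py (a hypothetical name; the input file name data.txt is hardcoded), it would be run as:
$ python3 transform.py 10000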
Output
$ cat data.new.txt
&st=10000&type=rec&uniId=JIM&acceptCode=10000&drainNel=supp&
&st=10100&type=rec&uniId=JIM&acceptCode=10000&drainNel=supp&
&st=15100&type=rec&uniId=SARA&acceptCode=15100&drainNel=ured&
&st=15200&type=rec&uniId=SARA&acceptCode=15100&drainNel=iris&
&st=15300&type=rec&uniId=SARA&acceptCode=15100&drainNel=iris&
&st=20300&type=rec&uniId=DAVE&acceptCode=20300&drainNel=teef&
Conclusion
Only you know why the business rules for the incrementing number scheme are the way they are. However, having a control break on uniId, with the st value dependent on the previous uniId's increment, seems problematic to me. You could process unsorted files if each newly encountered uniId started at a new 5000 boundary, for example 15000, 20000, 25000, etc.
P.S.
I love the awk and Perl answers. They are simple and straightforward, and they answer the question exactly as it was posed. Now all we need is a sed example :)
Just a bit tighter control flow, as a GNU awk one-liner:
awk -F\& -vi=10000 -vOFS=\& '{if(NR==1) { ac=i; u=$4; } else { if($4==u) i+=100; else { i+=5000; ac=i; u=$4; } }; $2="st=" i; $5 =gensub(/[0-9]+/,ac,1,$5); print } ' data.txt
It accepts an arbitrary string in the 5th field. Thanks, Shawn.
I have a pipe-delimited file with 3 columns:
aaa|xyz|pqr
another|column
with
line break | last column
The expected output is:
aaa|xyz|pqr
another|column with line break | last column
If I remove the line breaks then I get a single line like this...
aaa|xyz|pqr another|column with line break | last column
But I need 3 columns on each line.
You can try this awk:
awk -F'|' 'NF!=3{ line=line ? line " " $0 : $0; c=split( line, arr, "|"); if(c == 3){ $0=line; }else{ next } }1' yourfile
More readable awk version:
#!/bin/awk -f
BEGIN {
    FS="|";
}
NF!=3 {
    line = line ? line " " $0 : $0;
    c = split(line, arr, "|");
    if (c == 3) {
        $0 = line;
    }
    else {
        next;
    }
}1
Test:
$ awk -F'|' 'NF!=3{ line=line ? line " " $0 : $0; c=split( line, arr, "|"); if(c == 3){ $0=line; }else{ next } }1' yourfile
aaa|xyz|pqr
another|column with line break | last column
It works for your sample input.
Python solution:
import sys
def fix_rows(it, n):
    row = ''
    for line in it:
        if row:
            row = row.rstrip('\n') + ' ' + line
        else:
            row = line
        if row.count('|') == n - 1:
            yield row
            row = ''
    if row:
        yield row

with open('a.csv') as f:
    sys.stdout.writelines(fix_rows(f, 3))
output:
aaa|xyz|pqr
another|column with line break | last column
What you are describing is a three field record following this pattern:
(F1, May have CR) | (F2, May have CR) | (F3, No CR)CR
If F3 ever did have a CR, it would be ambiguous which record is which since you would not know whether the CR terminates the record or is embedded into F3 or the following F1 field.
You can easily parse what I have described with a regex in Perl:
$ perl -e '
    $str = do { local $/; <> };
    while ($str =~ /^\n?((?:[^|]+\|){2}[^\n]+)/gm) {
        $_ = $1;
        s/\n/ /g;
        print "$_\n";
    }
' /tmp/ac.csv
aaa|xyz|pqr
another|column with line break | last column
Which works by using a regex to separate the records from the stream.
Live regex to show how that works.
I would like to convert a number of files to static string declarations in C. I tried writing a quick script in Python (shown below), but it doesn't seem exactly simple, and a number of issues came up when trying to compile the output.
import os, sys
from glob import glob
from re import sub
test_dirs = ('basics', 'float', 'import', 'io', 'misc')
tests = sorted(test_file for test_files in (glob('{}/*.py'.format(dir)) for dir in test_dirs) for test_file in test_files)
def cfunc_name(t):
    return sub(r'/|\.|-', '_', t)

for t in tests:
    print("void {}(void* data) {{".format(cfunc_name(t)))
    with open(t) as f:
        lines = ''.join(f.readlines())
        cstr = sub('"', '\\"', lines)
        cstr = sub('\n', '\"\n\"', cstr)
        print(" const char * pystr = \"\"\n\"{}\";".format(cstr))
    print("end:\n ;\n}")

print("struct testcase_t core_tests[] = {")
for t in tests:
    print(" {{ \"{}\", test_{}_fn, TT_ENABLED_, 0, 0 }},".format(t, cfunc_name(t)))
print("END_OF_TESTCASES };")
Looking for an existing tool is not exactly obvious (maybe my search keywords are not quite right)... Is there a simple UNIX tool that does this, or has anyone come across something similar?
Does this work for you? https://code.google.com/p/txt2cs/
The main issue I can think of is new lines and escaping if you want to roll your own.
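If you do roll your own, here is a minimal sketch of the escaping step (mine, not txt2cs's; it only handles backslashes, quotes, and newlines, in that order so earlier substitutions don't mangle later ones):

def to_c_literal(text):
    # Backslashes first, then quotes; then split so each source line
    # becomes one adjacent C string fragment ending in \n.
    body = text.replace('\\', '\\\\').replace('"', '\\"')
    return '\n'.join('"%s\\n"' % line for line in body.split('\n'))

print(to_c_literal('print("hi")\nexit()'))
# "print(\"hi\")\n"
# "exit()\n"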
I used the txt2cs implementation as a reference and ended up with just a few lines of Python that do all the escaping. As I didn't want to add extra pieces to the build system, it's easier to have this done in Python; this is going to be integrated into test automation, which is already a complex beast.
The main takeaway is that RE substitutions have to be done in a certain order and aren't the ideal tool for this purpose.
import os, sys
from glob import glob
from re import sub
def escape(s):
    lookup = {
        '\0': '\\0',
        '\t': '\\t',
        '\n': '\\n\"\n\"',
        '\r': '\\r',
        '\\': '\\\\',
        '\"': '\\\"',
    }
    return "\"\"\n\"{}\"".format(''.join([lookup[x] if x in lookup else x for x in s]))

def chew_filename(t):
    return { 'func': "test_{}_fn".format(sub(r'/|\.|-', '_', t)), 'desc': t.split('/')[1] }

def script_to_map(t):
    r = { 'name': chew_filename(t)['func'] }
    with open(t) as f: r['script'] = escape(''.join(f.readlines()))
    return r

test_function = (
    "void {name}(void* data) {{\n"
    " const char * pystr = {script};\n"
    " do_str(pystr);\n"
    "}}"
)

testcase_struct = (
    "struct testcase_t {name}_tests[] = {{\n{body}\n END_OF_TESTCASES\n}};"
)

testcase_member = (
    " {{ \"{desc}\", {func}, TT_ENABLED_, 0, 0 }},"
)

testgroup_struct = (
    "struct testgroup_t groups[] = {{\n{body}\n END_OF_GROUPS\n}};"
)

testgroup_member = (
    " {{ \"{name}/\", {name}_tests }},"
)

test_dirs = ('basics', 'float', 'import', 'io', 'misc')

output = []

for group in test_dirs:
    tests = glob('{}/*.py'.format(group))
    output.extend([test_function.format(**script_to_map(test)) for test in tests])
    testcase_members = [testcase_member.format(**chew_filename(test)) for test in tests]
    output.append(testcase_struct.format(name=group, body='\n'.join(testcase_members)))

testgroup_members = [testgroup_member.format(name=group) for group in test_dirs]
output.append(testgroup_struct.format(body='\n'.join(testgroup_members)))

print('\n\n'.join(output))
Below is what the output looks like; as you can see, the initial ""\n and the '\\n\"\n\"' substitution for newlines make it quite a bit more readable:
void test_basics_break_py_fn(void* data) {
 const char * pystr = ""
"while True:\n"
"    break\n"
"\n"
"for i in range(4):\n"
"    print('one', i)\n"
"    if i > 2:\n"
"        break\n"
"    print('two', i)\n"
"\n"
"for i in [1, 2, 3, 4]:\n"
"    if i == 3:\n"
"        break\n"
"    print(i)\n"
"";
 do_str(pystr);
}
I'm trying to split a diff (unified format) into its sections using the re module in Python. The format of a diff is like this...
diff --git a/src/core.js b/src/core.js
index 9c8314c..4242903 100644
--- a/src/core.js
+++ b/src/core.js
@@ -801,7 +801,7 @@ jQuery.extend({
return proxy;
},
- // Mutifunctional method to get and set values to a collection
+ // Multifunctional method to get and set values of a collection
// The value/s can optionally be executed if it's a function
access: function( elems, fn, key, value, chainable, emptyGet, pass ) {
var exec,
diff --git a/src/sizzle b/src/sizzle
index fe2f618..feebbd7 160000
--- a/src/sizzle
+++ b/src/sizzle
@@ -1 +1 @@
-Subproject commit fe2f618106bb76857b229113d6d11653707d0b22
+Subproject commit feebbd7e053bff426444c7b348c776c99c7490ee
diff --git a/test/unit/manipulation.js b/test/unit/manipulation.js
index 18e1b8d..ff31c4d 100644
--- a/test/unit/manipulation.js
+++ b/test/unit/manipulation.js
@@ -7,7 +7,7 @@ var bareObj = function(value) { return value; };
var functionReturningObj = function(value) { return (function() { return value; }); };
test("text()", function() {
- expect(4);
+ expect(5);
var expected = "This link has class=\"blog\": Simon Willison's Weblog";
equal( jQuery("#sap").text(), expected, "Check for merged text of more then one element." );
@@ -20,6 +20,10 @@ test("text()", function() {
frag.appendChild( document.createTextNode("foo") );
equal( jQuery( frag ).text(), "foo", "Document Fragment Text node was retreived from .text().");
+
+ var $newLineTest = jQuery("<div>test<br/>testy</div>").appendTo("#moretests");
+ $newLineTest.find("br").replaceWith("\n");
+ equal( $newLineTest.text(), "test\ntesty", "text() does not remove new lines (#11153)" );
});
test("text(undefined)", function() {
diff --git a/version.txt b/version.txt
index 0a182f2..0330b0e 100644
--- a/version.txt
+++ b/version.txt
@@ -1 +1 @@
-1.7.2
\ No newline at end of file
+1.7.3pre
\ No newline at end of file
I've tried the following combinations of patterns but can't quite get it right. This is the closest I have come so far...
re.compile(r'(diff.*?[^\rdiff])', flags=re.S|re.M)
but this yields
['diff ', 'diff ', 'diff ', 'diff ']
How would I match all sections in this diff?
This does it (the lookahead stops each match just before the next diff header, or at end of input, without consuming it):
r = re.compile(r'^(diff.*?)(?=^diff|\Z)', re.M | re.S)
for m in re.findall(r, s):
    print('====')
    print(m)
You don't need to use regex, just split the file:
diff_file = open('diff.txt', 'r')
diff_str = diff_file.read()
diff_split = ['diff --git%s' % x for x in diff_str.split('diff --git')
              if x.strip()]
print(diff_split)
Why are you using regex? How about just iterating over the lines and starting a new section when a line starts with diff?
list_of_diffs = []
temp_diff = ''
for line in patch:
    if line.startswith('diff'):
        if temp_diff:
            list_of_diffs.append(temp_diff)
        temp_diff = line
    else:
        temp_diff += line
if temp_diff:
    list_of_diffs.append(temp_diff)
Disclaimer: the above is illustrative and assumes patch is an iterable of lines (for example, an open file object).
Regex is a hammer but your problem isn't a nail.
Just split on any linefeed that's followed by the word diff:
result = re.split(r"\n(?=diff\b)", subject)
Though for safety's sake, you probably should try to match \r or \r\n as well:
result = re.split(r"(?:\r\n|[\r\n])(?=diff\b)", subject)
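For example, a quick check of the split on the sample diff (a sketch; patch.diff is a hypothetical file name holding the diff text above):

import re

with open("patch.diff") as f:
    subject = f.read()

# Split before each top-level "diff" header without consuming it.
sections = re.split(r"(?:\r\n|[\r\n])(?=diff\b)", subject)
for sec in sections:
    if sec:
        print(sec.splitlines()[0])  # e.g. "diff --git a/src/core.js b/src/core.js"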