Python: New line at the same position

In Python 2.7, how can you achieve the following:
print "some text here"+?+"and then it starts there"
The output on the terminal should look like:
some text here
              and then it starts here
I have searched around and I think \r should do the job, but when I tried it, it did not work. I am confused now.
BTW, is the \r solution portable?
P.S.
In my odd situation, knowing the length of the previous line is quite difficult. So is there any way to do this other than using the length of the line above?
==================================================================================
Okay, the situation is like this: I am writing a tree structure and I want to print it out nicely using the __str__ function.
class node:
    def __init__(self,key,childern):
        self.key = key
        self.childern = childern
    def __str__(self):
        return "Node:"+self.key+"Children:"+str(self.childern)
where Children is a list.
Every time it prints the children, I want them indented one level more than the previous line. So I think I cannot predict the length of the line before the one I want to print.

\r is probably not a portable solution: how it is rendered depends on the text editor or terminal you're using. On older Mac systems, '\r' was used as the end-of-line character (on Windows it is '\r\n', and on Linux and OS X it is '\n').
You could simply do something like this:
def print_lines_at_same_position(*lines):
    prev_len = 0
    for line in lines:
        print " "*prev_len + line
        prev_len += len(line)
Usage example:
>>> print_lines_at_same_position("hello", "world", "this is a test")
hello
     world
          this is a test
>>>
This will only work if whatever you're outputting to has a font with a fixed character length though. I can't think of anything that will work otherwise
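For readers on Python 3, the same idea can be sketched as a function that builds the text instead of printing it. The name `staircase` is made up for this illustration, and a fixed-width font is still assumed:

```python
def staircase(lines):
    """Return the lines joined by newlines, each one indented to the
    column where the previous line ended (assumes a fixed-width font)."""
    out = []
    column = 0
    for line in lines:
        out.append(" " * column + line)
        column += len(line)
    return "\n".join(out)


print(staircase(["hello", "world", "this is a test"]))
```

Returning a string rather than printing makes the helper usable from __str__ as well.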
Edit to fit changed question
Okay, so that's an entirely different question. I don't think there's any way to do it with it starting at exactly the position where the last line left off unless self.key has a predictable length. But you can get something pretty close with this:
class node:
    def __init__(self,key,children):
        self.key = key
        self.children = children
        self.depth = 0
    def set_depth(self, depth):
        self.depth = depth
        for child in self.children:
            child.set_depth(depth+1)
    def __str__(self):
        indent = " "*4*self.depth
        children_str = "\n".join(map(str, self.children))
        if children_str:
            children_str = "\n" + children_str
        return indent + "Node: %s%s" % (self.key, children_str)
Then just set the depth of the root node to 0, and do that again every time you change the structure of the tree. There are more efficient ways if you know exactly how you're changing the tree; you can probably figure those out yourself :)
Usage example:
>>> a = node("leaf", [])
>>> b = node("another leaf", [])
>>> c = node("internal", [a,b])
>>> d = node("root", [c])
>>> d.set_depth(0)
>>> print d
Node: root
    Node: internal
        Node: leaf
        Node: another leaf
>>>
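One way to avoid calling set_depth after every change is to pass the depth down while rendering, so nothing is stored on the nodes. This is only a sketch in Python 3; the `Node` class and the `render` function are hypothetical stand-ins, not part of the code above:

```python
class Node:
    def __init__(self, key, children=()):
        self.key = key
        self.children = list(children)


def render(node, depth=0, width=4):
    """Return the subtree as text, indenting each level by `width` spaces."""
    lines = [" " * (width * depth) + "Node: %s" % node.key]
    for child in node.children:
        # The depth is computed on the way down, so it is always current.
        lines.append(render(child, depth + 1, width))
    return "\n".join(lines)


tree = Node("root", [Node("internal", [Node("leaf"), Node("another leaf")])])
print(render(tree))
```

The trade-off is that str(node) on an inner node always starts at column 0, because the node no longer knows how deep it sits in the tree.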

You could use os.linesep to get a more portable linebreak, instead of just \r. I would then use len() to calculate the length of the 1st string in order to calculate whitespace.
>>> import os
>>> my_str = "some text here"
>>> print my_str + os.linesep + ' ' * len(my_str) + 'and then it starts here'
some text here
              and then it starts here
The key is ' ' * len(my_str). This will repeat the space character len(my_str) times.

The \r solution is not what you are looking for: it is part of the Windows newline ('\r\n'), and on older Mac systems it is the newline itself.
You would need code like the following:
def pretty_print(text):
    total = 0
    for element in text:
        print "{}{}".format(' '*total, element)
        total += len(element)
pretty_print(["lol", "apples", "are", "fun"])
Which will print the lines of text the way you want them to.

Try using the len("text") * ' ' to get the amount of white space you want.
To get a portable line break, use os.linesep
>>> import os
>>> os.linesep
'\n'
EDIT
Another option that might be suitable in some cases is to override the stdout stream.
import sys, os
class StreamWrap(object):
    TAG = '<br>' # use a string that suits your use case
    def __init__(self, stream):
        self.stream = stream
    def write(self, text):
        tokens = text.split(StreamWrap.TAG)
        indent = 0
        for i, token in enumerate(tokens):
            self.stream.write(indent*' ' + token)
            if i < len(tokens)-1:
                self.stream.write(os.linesep)
            indent += len(token)
    def flush(self):
        self.stream.flush()
sys.stdout = StreamWrap(sys.stdout)
print "some text here"+ StreamWrap.TAG +"and then it starts there"
This will give you a result like this:
$ python test.py
some text here
              and then it starts there


Python - Possibly Regex - How to replace part of a filepath with another filepath based on a match?

I'm new to Python and relatively new to programming. I'm trying to replace part of a file path with a different file path. If possible, I'd like to avoid regex as I don't know it. If not, I understand.
I want an item in the Python list [] before the word PROGRAM to be replaced with the 'replaceWith' variable.
How would you go about doing this?
Current Python List []
item1ToReplace1 = \\server\drive\BusinessFolder\PROGRAM\New\new.vb
item1ToReplace2 = \\server\drive\BusinessFolder\PROGRAM\old\old.vb
Variable to replace part of the Python list path
replaceWith = 'C:\ProgramFiles\Microsoft\PROGRAM'
Desired results for Python List []:
item1ToReplace1 = C:\ProgramFiles\Microsoft\PROGRAM\New\new.vb
item1ToReplace2 = C:\ProgramFiles\Microsoft\PROGRAM\old\old.vb
Thank you for your help.
The following code does what you ask. Note that I changed your '\' to '\\': the backslash is the escape character in Python string literals, so you need to account for it in your code.
import os
item1ToReplace1 = '\\server\\drive\\BusinessFolder\\PROGRAM\\New\\new.vb'
item1ToReplace2 = '\\server\\drive\\BusinessFolder\\PROGRAM\\old\\old.vb'
replaceWith = 'C:\\ProgramFiles\\Microsoft\\PROGRAM'
keyword = "PROGRAM\\"
def replacer(rp, s, kw):
    # split once on the keyword; everything after it is kept
    ss = s.split(kw,1)
    if (len(ss) > 1):
        tail = ss[1]
        # os.path.join uses the separator of the current OS ('\' on Windows)
        return os.path.join(rp, tail)
    else:
        return ""
print(replacer(replaceWith, item1ToReplace1, keyword))
print(replacer(replaceWith, item1ToReplace2, keyword))
The code splits on your keyword and puts that on the back of the string you want.
If your keyword is not in the string, your result will be an empty string.
Result:
C:\ProgramFiles\Microsoft\PROGRAM\New\new.vb
C:\ProgramFiles\Microsoft\PROGRAM\old\old.vb
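Because os.path.join uses the path separator of the machine the script runs on, the result above assumes Windows. A sketch that should behave the same on any platform can use the standard-library ntpath module (the Windows flavour of os.path) together with str.partition. The function name `swap_root` is invented for this example:

```python
import ntpath  # Windows path rules, importable on any platform


def swap_root(path, keyword, new_root):
    """Replace everything up to and including `keyword` with `new_root`.

    Returns None when `keyword` (followed by a backslash) is not in `path`.
    """
    head, found, tail = path.partition(keyword + "\\")
    if not found:
        return None
    return ntpath.join(new_root, tail)


old = r"\\server\drive\BusinessFolder\PROGRAM\New\new.vb"
new_root = r"C:\ProgramFiles\Microsoft\PROGRAM"
print(swap_root(old, "PROGRAM", new_root))
```

Returning None instead of an empty string for a missing keyword is a design choice of this sketch; it makes the "no match" case harder to confuse with a real path.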
One way would be:
item_ls = item1ToReplace1.split("\\")
idx = item_ls.index("PROGRAM")
result = ["C:", "ProgramFiles", "Microsoft"] + item_ls[idx:]
result = "\\".join(result)
Resulting in:
>>> item1ToReplace1 = r"\\server\drive\BusinessFolder\PROGRAM\New\new.vb"
... # the above
>>> print(result)
C:\ProgramFiles\Microsoft\PROGRAM\New\new.vb
Note the use of a raw string, r"...", for the input so that its backslashes do not need to be escaped. In the ordinary string literals passed to split and join, the backslash has to be escaped as a double backslash.

unicode table information about a character in python

Is there a way in Python to get the technical information for a given character as it's displayed in the Unicode table? (cf. https://unicode-table.com/en/)
Example:
for the letter "Ȅ"
Name > Latin Capital Letter E with Double Grave
Unicode number > U+0204
HTML-code > &#516;
Bloc > Latin Extended-B
Lowercase > ȅ
What I actually need is to get for any Unicode number (like here U+0204) the corresponding name (Latin Capital Letter E with Double Grave) and the lowercase version (here "ȅ").
Roughly:
input = a Unicode number
output = corresponding information
The closest thing I've been able to find is the fontTools library but I can't seem to find any tutorial/documentation on how to use it to do that.
Thank you.
The standard module unicodedata defines a lot of properties, but not everything. A quick peek at its source confirms this.
Fortunately UnicodeData.txt, the data file where this comes from, is not hard to parse. Each line consists of exactly 15 fields separated by semicolons, which makes it ideal for parsing. Using the description of the elements on ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html, you can create a few classes to encapsulate the data. I've taken the names of the class elements from that list; the meaning of each of the elements is explained on that same page.
Make sure to download ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt and ftp://ftp.unicode.org/Public/UNIDATA/Blocks.txt first, and put them inside the same folder as this program.
Code (tested with Python 2.7 and 3.6):
# -*- coding: utf-8 -*-
class UnicodeCharacter:
    def __init__(self):
        self.code = 0
        self.name = 'unnamed'
        self.category = ''
        self.combining = ''
        self.bidirectional = ''
        self.decomposition = ''
        self.asDecimal = None
        self.asDigit = None
        self.asNumeric = None
        self.mirrored = False
        self.uc1Name = None
        self.comment = ''
        self.uppercase = None
        self.lowercase = None
        self.titlecase = None
        self.block = None
    def __getitem__(self, item):
        return getattr(self, item)
    def __repr__(self):
        return '{'+self.name+'}'

class UnicodeBlock:
    def __init__(self):
        self.first = 0
        self.last = 0
        self.name = 'unnamed'
    def __repr__(self):
        return '{'+self.name+'}'

class BlockList:
    def __init__(self):
        self.blocklist = []
        with open('Blocks.txt','r') as uc_f:
            for line in uc_f:
                line = line.strip(' \r\n')
                if '#' in line:
                    line = line.split('#')[0].strip()
                if line != '':
                    rawdata = line.split(';')
                    block = UnicodeBlock()
                    block.name = rawdata[1].strip()
                    rawdata = rawdata[0].split('..')
                    block.first = int(rawdata[0],16)
                    block.last = int(rawdata[1],16)
                    self.blocklist.append(block)
        # make 100% sure it's sorted, for quicker look-up later
        # (it is usually sorted in the file, but better make sure)
        self.blocklist.sort(key=lambda x: x.first)
    def lookup(self,code):
        for item in self.blocklist:
            if code >= item.first and code <= item.last:
                return item.name
        return None

class UnicodeList:
    """UnicodeList loads Unicode data from the external files
    'UnicodeData.txt' and 'Blocks.txt', both available at unicode.org
    These files must appear in the same directory as this program.
    UnicodeList is a new interpretation of the standard library
    'unicodedata'; you may first want to check if its functionality
    suffices.
    As UnicodeList loads its data from an external file, it does not depend
    on the local build from Python (in which the Unicode data gets frozen
    to the then 'current' version).
    Initialize with
        uclist = UnicodeList()
    """
    def __init__(self):
        # we need this first
        blocklist = BlockList()
        bpos = 0
        self.codelist = []
        with open('UnicodeData.txt','r') as uc_f:
            for line in uc_f:
                line = line.strip(' \r\n')
                if '#' in line:
                    line = line.split('#')[0].strip()
                if line != '':
                    rawdata = line.strip().split(';')
                    parsed = UnicodeCharacter()
                    parsed.code = int(rawdata[0],16)
                    # stored as 'name', the attribute that __repr__ reads
                    parsed.name = rawdata[1]
                    parsed.category = rawdata[2]
                    parsed.combining = rawdata[3]
                    parsed.bidirectional = rawdata[4]
                    parsed.decomposition = rawdata[5]
                    parsed.asDecimal = int(rawdata[6]) if rawdata[6] else None
                    parsed.asDigit = int(rawdata[7]) if rawdata[7] else None
                    # the following value may contain a slash:
                    # ONE QUARTER ... 1/4
                    # let's make it Python 2.7 compatible :)
                    if '/' in rawdata[8]:
                        rawdata[8] = rawdata[8].replace('/','./')
                        parsed.asNumeric = eval(rawdata[8])
                    else:
                        parsed.asNumeric = int(rawdata[8]) if rawdata[8] else None
                    parsed.mirrored = rawdata[9] == 'Y'
                    parsed.uc1Name = rawdata[10]
                    parsed.comment = rawdata[11]
                    parsed.uppercase = int(rawdata[12],16) if rawdata[12] else None
                    parsed.lowercase = int(rawdata[13],16) if rawdata[13] else None
                    parsed.titlecase = int(rawdata[14],16) if rawdata[14] else None
                    # advance to the block that can contain this code point
                    while bpos < len(blocklist.blocklist) and parsed.code > blocklist.blocklist[bpos].last:
                        bpos += 1
                    parsed.block = blocklist.blocklist[bpos].name if bpos < len(blocklist.blocklist) and parsed.code >= blocklist.blocklist[bpos].first else None
                    self.codelist.append(parsed)
    def find_code(self,codepoint):
        """Find the Unicode information for a codepoint (as int).
        Returns:
            a UnicodeCharacter class object or None.
        """
        # the list is unlikely to contain duplicates but I have seen Unicode.org
        # doing that in similar situations. Again, better make sure.
        val = [x for x in self.codelist if codepoint == x.code]
        return val[0] if val else None
    def find_char(self,str):
        """Find the Unicode information for a codepoint (as character).
        Returns:
            for a single character: a UnicodeCharacter class object or
            None.
            for a multicharacter string: a list of the above, one element
            per character.
        """
        if len(str) > 1:
            result = [self.find_code(ord(x)) for x in str]
            return result
        else:
            return self.find_code(ord(str))
When loaded, you can now look up a character code with
>>> ul = UnicodeList() # ONLY NEEDED ONCE!
>>> print (ul.find_code(0x204))
{LATIN CAPITAL LETTER E WITH DOUBLE GRAVE}
which by default is shown as the name of a character (Unicode calls this a 'code point'), but you can retrieve other properties as well:
>>> print ('%04X' % ul.find_code(0x204).lowercase)
0205
>>> print (ul.find_code(0x204).block)
Latin Extended-B
and (as long as you don't get a None) even chain them:
>>> print (ul.find_code(ul.find_code(0x204).lowercase))
{LATIN SMALL LETTER E WITH DOUBLE GRAVE}
It does not rely on your particular build of Python; you can always download an updated list from unicode.org and be assured to get the most recent information:
>>> import unicodedata
>>> print (unicodedata.name('\U0001F903'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: no such name
>>> print (ul.find_code(0x1f903))
{LEFT HALF CIRCLE WITH FOUR DOTS}
(As tested with Python 3.5.3.)
There are currently two lookup functions defined:
find_code(int) looks up character information by codepoint as an integer.
find_char(string) looks up character information for the character(s) in string. If there is only one character, it returns a UnicodeCharacter object; if there are more, it returns a list of objects.
After import unicodelist (assuming you saved this as unicodelist.py), you can use
>>> ul = UnicodeList()
>>> hex(ul.find_char(u'è').code)
'0xe8'
to look up the hex code for any character, and a list comprehension such as
>>> l = [hex(ul.find_char(x).code) for x in 'Hello']
>>> l
['0x48', '0x65', '0x6c', '0x6c', '0x6f']
for longer strings. Note that you don't actually need all of this if all you want is a hex representation of a string! This suffices:
l = [hex(ord(x)) for x in 'Hello']
The purpose of this module is to give easy access to other Unicode properties. A longer example:
str = 'Héllo...'
dest = ''
for i in str:
    dest += chr(ul.find_char(i).uppercase) if ul.find_char(i).uppercase is not None else i
print (dest)
HÉLLO...
and showing a list of properties for a character per your example:
letter = u'Ȅ'
print ('Name > '+ul.find_char(letter).name)
print ('Unicode number > U+%04x' % ul.find_char(letter).code)
print ('Bloc > '+ul.find_char(letter).block)
print ('Lowercase > %s' % chr(ul.find_char(letter).lowercase))
(I left out HTML; these names are not defined in the Unicode standard.)
The unicodedata documentation shows how to do most of this.
The Unicode block name is apparently not available but another Stack Overflow question has a solution of sorts and another has some additional approaches using regex.
The uppercase/lowercase mapping and character number information is not particularly Unicode-specific; just use the regular Python string functions.
So in summary
>>> import unicodedata
>>> unicodedata.name('Ë')
'LATIN CAPITAL LETTER E WITH DIAERESIS'
>>> 'U+%04X' % ord('Ë')
'U+00CB'
>>> '&#%i;' % ord('Ë')
'&#203;'
>>> 'Ë'.lower()
'ë'
The U+%04X formatting is sort-of correct, in that it simply avoids padding and prints the whole hex number for code points with a value higher than 65,535. Note that some other formats require the use of %08X padding in this scenario (notably \U00010000 format in Python).
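The calls in the summary can be gathered into one helper that starts from the code point number, which is what the question asks for. This is a minimal Python 3 sketch; `char_info` is a hypothetical name, and the block name is left out because unicodedata does not provide it:

```python
import unicodedata


def char_info(codepoint):
    """Collect a few properties of a code point given as an integer."""
    ch = chr(codepoint)
    return {
        # None when this Python build's Unicode database has no name for it
        "name": unicodedata.name(ch, None),
        "number": "U+%04X" % codepoint,
        "html": "&#%d;" % codepoint,
        "lowercase": ch.lower(),
    }


info = char_info(0x0204)
print(info["name"], info["number"], info["html"], info["lowercase"])
```

Passing a default to unicodedata.name avoids the ValueError that is otherwise raised for code points the local database does not know.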
You can do this in several ways:
1- create an API yourself (I can't find anything that does this)
2- create a table in a database or an Excel file
3- load and parse a website that lists this information
I think the third way is the easiest. Take a look at a page that lists Unicode characters; you can find the information there.
Get your Unicode number and then find it in the web page using parsing tools like lxml, Scrapy, Selenium, etc.

Find string and replace the next few lines with something

I am writing a Python script that will ask for a file and a name (e.g. "John").
The file contains a whole bunch of lines like this:
...
Name=John
Age=30
Pay=1000
Married=1
Name=Bob
Age=25
Pay=500
Married=0
Name=John
Age=56
Pay=3000
Married=1
...
I want to open this file, ask the user for a name, and replace the pay value for all entries that match that name. So, for example, the user inputs "John", I want to change the Pay for all "John"s to be, say, 5000. The Pay value for other names don't change.
So far, I've opened up the file and concatenated everything into one long string to make things a bit easier:
for line in file:
    file_string += line
At first, I was thinking about some sort of string replace but that didn't pan out since I would search for "John" but I don't want to replace the "John", but rather the Pay value that is two lines down.
I started using regex instead and came up with something like this.
# non-greedy matching
re.findall("Name=(.*?)\nAge=(.*?)\nPay=(.*?)\n", file_string, re.S)
Okay, so that spits out a list of 3-tuples of those groupings and it does seem to find everything fine. Now, to do the actual replacement...
I read on another question here on StackOverflow that I can set the name of a grouping and use that grouping later on...:
re.sub(r'Name=(.*?)\nAge=(.*?)\nPay=', r'5000', file_string, re.S)
I tried that to see if it would work and replace all Names with 5000, but it didn't. If it had, I would probably check the first group to see whether it matched the user-supplied name.
The other problem is that I read in the Python docs that re.sub only replaces the left-most occurrence. I want to replace all occurrences. How do I do that?
Now I am at a bit of a loss as to what to do, so if anyone can help me that would be great!
I don't think that regex is the best solution to this problem. I prefer more general solutions. The other answers depend on one or more of the following things:
There are always 4 properties for a person.
Every person has the same properties.
The properties are always in the same order.
If these are true in your case, then regex could be ok.
My solution is more verbose, but it doesn't depend on any of these. It handles mixed or missing properties and mixed order, and it can get and set any property value. You could even extend it a little to support inserting new properties or persons if you need.
My code:
# I omitted "data = your string" here
def data_value(person_name, prop_name, new_value = None):
    global data
    start_person = data.find("Name=" + person_name + "\n")
    while start_person != -1:
        end_person = data.find("Name=", start_person + 1)
        start_value = data.find(prop_name + "=", start_person, end_person)
        if start_value != -1:
            start_value += len(prop_name) + 1
            end_value = data.find("\n", start_value, end_person)
            if new_value is None:
                return data[start_value:end_value]
            else:
                data = data[:start_value] + str(new_value) + data[end_value:]
        start_person = data.find("Name=" + person_name + "\n", end_person)
    return None
print data_value("Mark", "Pay")    # Output: None (missing person)
print data_value("Bob", "Weight")  # Output: None (missing property)
print data_value("Bob", "Pay")     # Output: "500" (current value)
data_value("Bob", "Pay", 1234)     # (change it)
print data_value("Bob", "Pay")     # Output: "1234" (new value)
data_value("John", "Pay", 555)     # (change it in both Johns)
Iterate 4 lines at a time. If the first line contains 'John' edit the line that comes two after.
data = """
Name=John
Age=30
Pay=1000
Married=1
Name=Bob
Age=25
Pay=500
Married=0
Name=John
Age=56
Pay=3000
Married=1
"""
lines = data.split()
for i, value in enumerate(zip(*[iter(lines)]*4)):
    if 'John' in value[0]:
        lines[i*4 + 2] = "Pay=5000"
print '\n'.join(lines)
The following code will do what you need:
import re
text = """
Name=John
Age=30
Pay=1000
Married=1
Name=Bob
Age=25
Pay=500
Married=0
Name=John
Age=56
Pay=3000
Married=1
"""
# the name you're looking for
name = "John"
# the new payment
pay = 500
# group 1 captures the name, so \1 and \2 put Name and Age back unchanged
print re.sub(r'Name=({0})\nAge=(.+?)\nPay=(.+?)\n'.format(re.escape(name)), r'Name=\1\nAge=\2\nPay={0}\n'.format(pay), text)
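On the two problems raised in the question: re.sub replaces every match unless a count is given, and its fourth positional argument is that count, so flags such as re.S have to be passed as flags=... rather than positionally. The following Python 3 sketch shows both points. The helper name `set_pay` is made up, and it assumes every record has Name, Age and Pay on consecutive lines, as in the sample:

```python
import re


def set_pay(text, name, pay):
    """Set Pay for every record whose Name line equals `name`.

    Assumes each record is laid out as Name, Age, Pay on consecutive lines.
    """
    pattern = re.compile(
        r"^(Name=%s\nAge=.*\nPay=).*$" % re.escape(name),
        flags=re.MULTILINE,
    )
    # sub() replaces every match; a function avoids backreference
    # ambiguity such as "\1" directly followed by digits.
    return pattern.sub(lambda m: m.group(1) + str(pay), text)


sample = (
    "Name=John\nAge=30\nPay=1000\nMarried=1\n"
    "Name=Bob\nAge=25\nPay=500\nMarried=0\n"
    "Name=John\nAge=56\nPay=3000\nMarried=1\n"
)
print(set_pay(sample, "John", 5000))
```

Using a replacement function rather than a template string is a choice made here for safety: a template like r"\15000" would be read as a reference to group 15.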

String formatting without index in python2.6

I've got many thousands of lines of python code that has python2.7+ style string formatting (e.g. without indices in the {}s)
"{} {}".format('foo', 'bar')
I need to run this code under python2.6 which requires the indices.
I'm wondering if anyone knows of a painless way to allow python2.6 to run this code. It'd be great if there was a from __future__ import blah solution to the problem. I don't see one. Something along those lines would be my first choice.
A distant second would be some script that can automate the process of adding the indices, at least in the obvious cases:
"{0} {1}".format('foo', 'bar')
It doesn't quite preserve the whitespacing and could probably be made a bit smarter, but it will at least identify Python strings (apostrophes/quotes/multi line) correctly without resorting to a regex or external parser:
import tokenize
from itertools import count
import re
with open('your_file') as fin:
    output = []
    tokens = tokenize.generate_tokens(fin.readline)
    for num, val in (token[:2] for token in tokens):
        if num == tokenize.STRING:
            val = re.sub('{}', lambda L, c=count(): '{{{0}}}'.format(next(c)), val)
        output.append((num, val))
print tokenize.untokenize(output) # write to file instead...
Example input:
s = "{} {}".format('foo', 'bar')
if something:
    do_something('{} {} {}'.format(1, 2, 3))
Example output (note slightly iffy whitespacing):
s ="{0} {1}".format ('foo','bar')
if something :
    do_something ('{0} {1} {2}'.format (1 ,2 ,3 ))
You could define a function to re-format your format strings:
def reformat(s):
    return "".join("".join((x, str(i), "}"))
                   for i, x in list(enumerate(s.split("}")))[:-1])
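One thing to watch with the split-based version is that any text after the last "}" is dropped, so "{} {}!" loses its "!". A regex-based sketch that keeps the surrounding text could look like the following; `add_indices` is a hypothetical name, and the code is written for Python 3 (it should also run on 2.6, since it uses %-formatting):

```python
import re
from itertools import count


def add_indices(fmt):
    """Number each empty '{}' field: '{} {}' -> '{0} {1}'.

    Caveat: escaped braces ('{{}}') and fields with a spec or
    conversion ('{:>5}', '{!r}') are not handled by this sketch.
    """
    counter = count()
    return re.sub(r"\{\}", lambda m: "{%d}" % next(counter), fmt)


print(add_indices("{} {}!"))
```

A fresh counter is created on every call, so each format string is numbered from 0.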
Maybe a good old sed-regex like:
sed source.py -e 's/{}/%s/g; s/\.format(/ % (/'
your example would get changed to something like:
"%s %s" % ('foo', 'bar')
Granted, you lose the fancy new-style .format(), but imho it's almost never useful for trivial value insertions.
A conversion script could be pretty simple. You can find strings to replace with regex:
import re
fmt = "['\"][^'\"]*{}.*?['\"]\.format"
str1 = "x; '{} {}'.format(['foo', 'bar'])"
str2 = "This is a function; 'First is {}, second is {}'.format(['x1', 'x2']); some more code"
str3 = "This doesn't have anything but a format. format(x)"
str4 = "This has an old-style format; '{0} {1}'.format(['some', 'list'])"
str5 = "'{0}'.format(1); '{} {}'.format(['x', 'y'])"
def add_format_indices(instr):
    # number the empty {} fields of one matched format string, left to right
    text = instr.group(0)
    i = 0
    while '{}' in text:
        text = text.replace('{}', '{%d}'%i, 1)
        i = i+1
    return text
def reformat_text(text):
    return re.sub(fmt, add_format_indices, text)
reformat_text(str1)
"x; '{0} {1}'.format(['foo', 'bar'])"
reformat_text(str2)
"This is a function; 'First is {0}, second is {1}'.format(['x1', 'x2']); some more code"
reformat_text(str3)
"This doesn't have anything but a format. format(x)"
reformat_text(str4)
"This has an old-style format; '{0} {1}'.format(['some', 'list'])"
reformat_text(str5)
"'{0}'.format(1); '{0} {1}'.format(['x', 'y'])"
I think you could throw a whole file through this. You can probably find a faster implementation of add_format_indices, and obviously it hasn't been tested a whole lot.
Too bad there isn't an import __past__, but in general that's not something usually offered (see the 2to3 script for an example), so this is probably your next best option.

python, string.replace() and \n

(Edit: the script seems to work for others here trying to help. Is it because I'm running python 2.7? I'm really at a loss...)
I have a raw text file of a book I am trying to tag with pages.
Say the text file is:
some words on this line,
1
DOCUMENT TITLE some more words here too.
2
DOCUMENT TITLE and finally still more words.
I am trying to use python to modify the example text to read:
some words on this line,
</pg>
<pg n=2>some more words here too,
</pg>
<pg n=3>and finally still more words.
My strategy is to load the text file as a string. Build search-for and a replace-with strings corresponding to a list of numbers. Replace all instances in string, and write to a new file.
Here is the code I've written:
from sys import argv
script, input, output = argv
textin = open(input,'r')
bookstring = textin.read()
textin.close()
pages = []
x = 1
while x<400:
    pages.append(x)
    x = x + 1
pagedel = "DOCUMENT TITLE"
for i in pages:
    pgdel = "%d\n%s" % (i, pagedel)
    nplus = i + 1
    htmlpg = "</p>\n<p n=%d>" % nplus
    bookstring = bookstring.replace(pgdel, htmlpg)
textout = open(output, 'w')
textout.write(bookstring)
textout.close()
print "Updates to %s printed to %s" % (input, output)
The script runs without error, but it also makes no changes whatsoever to the input text. It simply reprints it character for character.
Does my mistake have to do with the hard return? \n? Any help greatly appreciated.
In python, strings are immutable, and thus replace returns the replaced output instead of replacing the string in place.
You must do:
bookstring = bookstring.replace(pgdel, htmlpg)
You've also forgotten to call close(). See how you have textin.close? You have to call it with parentheses, like open:
textin.close()
Your code works for me, but I might just add some more tips:
input is a built-in function, so perhaps try renaming that variable. Although it normally works, it might not for you.
When running the script, don't forget to put the .txt ending:
$ python myscript.py file1.txt file2.txt
Make sure when testing your script to clear the contents of file2.
I hope these help!
Here's an entirely different approach that uses re (import the re module for this to work):
doctitle = False
newstr = ''
page = 1
for line in bookstring.splitlines():
    res = re.match('^\\d+', line)
    if doctitle:
        newstr += '<pg n=' + str(page) + '>' + re.sub('^DOCUMENT TITLE ', '', line)
        doctitle = False
    elif res:
        doctitle = True
        page += 1
        newstr += '\n</pg>\n'
    else:
        newstr += line
print newstr
Since no one knows what's going on, it's worth a try.
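In the same spirit, the whole replacement can be done with a single re.sub call that tolerates small differences in the file, such as Windows line endings or stray spaces after the page number. Those tolerances are guesses at what the real file might contain, not a confirmed diagnosis, and `tag_pages` is a hypothetical name. A Python 3 sketch:

```python
import re

# A page marker: a line holding only digits, followed by the title line.
# The optional \r and the flexible spacing are guesses about real files.
MARKER = re.compile(r"^(\d+)[ \t]*\r?\nDOCUMENT TITLE[ \t]*", re.MULTILINE)


def tag_pages(text):
    """Replace each page marker with a closing tag and the next page's opening tag."""
    def repl(match):
        return "</pg>\n<pg n=%d>" % (int(match.group(1)) + 1)
    return MARKER.sub(repl, text)


book = (
    "some words on this line,\n"
    "1\n"
    "DOCUMENT TITLE some more words here too.\n"
    "2\n"
    "DOCUMENT TITLE and finally still more words."
)
print(tag_pages(book))
```

Reading the page number from the text, instead of looping over 1 to 399, means the script does a single pass and does not care how many pages the book has.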
