Python - Remove Last Line From String - python

I'm trying to capture and manipulate data within a Telnet session using telnetlib, things are going fairly well, however my newbness with Python is causing me some headache.
My issue is pretty straight forward, I am able to capture and display the data I want (So far) however I just seem to be cycling through errors whenever I try to remove the last line of data from the data I have captured. My code goes something like this:
... Snipped Boring Stuff ...
tn.write('command to gather data' + '\r')
data = tn.read_very_eager()
print data
... snipped more boring stuff ...
Pretty simple... So, how in the world would I remove the last line of data from either tn.read_very_eager() or data() ?
Any direction would be awesome... Sorry for the really simple question, but the stuff I've read and tried so far have done nothing but frustrate me, my keyboard can't take much more abuse :)

You can remove the last line of a string like this:
def remove_last_line_from_string(s):
return s[:s.rfind('\n')]
string = "String with\nsome\nnewlines to answer\non StackOverflow.\nThis line gets removed"
print string
string = remove_last_line_from_string(string)
print '\n\n'+string
The output will be:
>>>
String with
some
newlines to answer
on StackOverflow.
This line gets removed
String with
some
newlines to answer
on StackOverflow.
>>>

Related

How to delete comments that are on the same line as the string i need with regex

so im trying to separate the twitter ids into a new file to avoid clutter on the main code file but i want to be able to actually know what the heck each id is so i need to keep the comments in.
iv tried a few ways to get rid of this with regex but the best iv done is just print them :/
heres the code
for line in open("userids.txt"):
lin=re.sub(r'(?m)^ *#.*\n?', '', line)
li=re.search(r'?!#(\w+)', lin)
print(li)
the re.sub finds the line as a WHOLE but i want to preferably just ignore the comment entirely in one line rather than 2 like im trying here. i have tried so many different ways iv found on the internet and none work. this is the only code that even prints back the line.
format in the .txt file is
Twitterid #Twitterhandle
but i would like it to just be
Twitterid
i managed to get this working but the code is probably longer than it needs to be and the regex is more than likely unneeded in this case.
heres the code i used instead that returns JUST the twitter id for each line and also ignores comment lines.
lin=re.sub(r'(?m)^ *#.*\n?', '', line)
clin=lin.strip()
cleanlin=clin.split("#",1)
id=cleanlin[0]
print(id)

Formatting string in python

I want to write to a file in tabular format and following is code I have written till now.
file_out=open("testing_string","w")
file_out.write("{0:<12} {1:<20} {2:<30}\n".format("TUPLE","LOGFILE STATUS","FSDB STATUS"))
file_out.write("{0:12}".format("Check"))
file_out.write("{0:12}".format("_5"))
file_out.close()
Testing_string looks like this.
TUPLE LOGFILE STATUS FSDB STATUS
Check _5
Problem is I want _5 to be with check. Please see that I cannot concatenate check with _5 as check is printed first in file then depeding on some logic I fill LOGFILE STATUS FSDB STATUS. If I am unable to fill status then I check if I have to append _5 or not. so due to this I cannot concatenate string.
How to then print _5 right next to Check?
In a perfect world, you wouldn't do what is given in the below answer. It is hacky and error-prone and really weird. In a perfect world you would figure out how to write out what you want before you actually write to disk. I assume the only reason you are even considering this is that you are maintaining some old and crusty legacy code and cannot do things "the right way".
This is not the most elegant answer, but you can use the backspace character to overwrite something previously written.
with open('test.txt', 'w') as file_out:
file_out.write("{0:<12} {1:<20} {2:<30}\n".format("TUPLE","LOGFILE STATUS","FSDB STATUS"))
file_out.write("{0:12}".format("Check"))
backup_amount = 12 - len("Check")
file_out.write("\b" * backup_amount)
file_out.write("{0:12}".format("_5"))
Output:
TUPLE LOGFILE STATUS FSDB STATUS
Check_5
This only works in this specific case because we are completely overwriting the previously written characters with new characters - the backspace nearly backs up the cursor but does not actually overwrite the previously written data. Observe:
with open('test.txt', 'w') as f:
f.write('hello')
f.write('\b\b')
f.write('p')
Output:
helpo
Since we backspaced two characters but only wrote one, the original second character still exists. You would have to manually write ' ' characters to overwrite these.
Because of this caveat, you will probably have to start messing with the length of the format codes (i.e. '{0:12}' might need to become '{0:5}' or something else) when you add '_5'. It's going to become messy.
The problem is that you are specifying 12 characters for Check. Try this:
file_out=open("testing_string","w")
file_out.write("{0:<12} {1:<20} {2:<30}\n".format("TUPLE","LOGFILE STATUS","FSDB STATUS"))
file_out.write("{0:5}".format("Check"))
file_out.write("{0:7}".format("_5"))
file_out.close()

Splitting long line printed to screen the right way in python

This might be a silly question but I'd like to know how other people handle this or if there's a standard/recommended way of going about it.
Below are two approaches to splitting a long text line when printing it to screen in python. Which one should be used?
Option 1
if some_condition: # Senseless indenting.
if another condition: # Senseless indenting.
print 'This is a very long line of text that goes beyond the 80\n\
character limit.'
Option 2
if some_condition: # Senseless indenting.
if another condition: # Senseless indenting.
print 'This is a very long line of text that goes beyond the 80'
print 'character limit.'
I personally find Option 1 ugly but Option 2 seems like it would go against the pythonic way of keeping things simple by using a second print call.
One way to do it can be with parenthesis:
print ('This is a very long line of text that goes beyond the 80\n'
'character limit.')
Of course, there are several ways of doing it. Another way (as suggested in comments) is the triple quote:
print '''This is a very long line of text that goes beyond the 80
character limit.'''
Personally I don't like that one much because it seems like breaking the indentation, but that's just me.
If you have a long string and want to insert line breaks at appropriate points, the textwrap module provides functionality to do just that. Ex:
import textwrap
def format_long_string(long_string):
wrapper = textwrap.TextWrapper()
wrapper.width = 80
return wrapper.fill(long_string)
long_string = ('This is a really long string that is raw and unformatted '
'that may need to be broken up into little bits')
print format_long_string(long_string)
This results in the following being printed:
This is a really long string that is raw and unformatted that may need to be
broken up into little bits

Extract Text from a Binary File (using Python 2.7 on Windows 7)

I have a binary file of size about 5MB.. which has lots of interspersed text.. and control characters..
This is actually an equivalent of an outlook .pst file for SITATEX Application (from SITA).
The file contains all the TEXT MESSAGES sent and received to and from outside world...(but the text has to be extracted through the binary control characters).. all the text messages are clearly available... with line ending ^M characters... etc.
for example: assume ^# ^X are control characters... \xaa with HEX aa, etc. loads of them around my required text extraction.
^#^#^#^#^#^#^#^#^#^#^#BLLBBCC^X^X^X^X^X^X^X^X^X
^X^X^X
MVT^M
EA1123 TEXT TEXT TEXT^M
END^M
\xaa^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#
^#^#^#^#^#^#^#^#^#^#^#TTBBTT^X^X^X^X^X^X^X^X^X
^X^X^X blah blah blah... of control characters.. and then the message comes..
MVT MESSAGE 2
ED1123
etc.
and so on.. for several messages.
Using Perl.. it is easy to do:
while (<>) {
use regular expression to split messages
m/ /
}
How would one do this in python easily..
How to read the file? binary and text interspersed
Eliminate unnecessary control characters
parse the messages in between two \xaa USEFUL TEXT INFORMATION \xaa (HEX 'aa')
print out the required stuff
Loop through all the lines.. and more files.
In the text file sample... I am interested in seeing.. BLLBBCC... and MVT and EA1123 and so on.
Please assist... If it is going to be very difficult in python.. I will have to think through the logic in perl itself.. as it (perl) doesn't throw lots of errors at me at least for the looping part of binary and text stuff.. and the regex.
Thanks.
Update 02Jan after reading your answers/comments
After going through S.Lott's comments and others... This is where I am at.. and it is working 80% ok.
import fileinput
import sys
import re
strfile = r'C:\Users\' \
r'\Learn\python\mvt\sitatex_test.msgs'
f = open(strfile, 'rb')
contents = f.read() # read whole file in contents
#extract the string between two \xaaU.. multiline pattern match
#with look ahead assertion
#and this is stored in a list with all msgs
msgs = re.findall(r'\xaaU.*?(?=\xaaU)', contents, re.I|re.DOTALL|re.M)
for msg in msgs:
#loop through msgs.. to find the first msg then next and so on.
print "## NEW MESSAGE STARTS HERE ##"
#for each msg split the lines.. to read line by line
# stored as list in msglines
msglines = msg.splitlines()
line = 0
#then process each msgline with a message
for msgline in msglines:
line += 1
#msgline = re.sub(r'[\x00]+', r' ', msgline)
mystr = msgline
print mystr
textstrings = re.findall(r'[\x00\x20-\x7E]+', msgline)
So far so good.. still I am not completely done.. because I need to parse the text line by line and word by word.. to pickup (as an example) the origin address and headers, subject line, message body... by parsing the message through the control characters.
Now I am stuck with... how to print line by line with the control characters converted to \x00\x02.. etc (using the \xHH format).. but leave the normal readable text alone.
For example.. say I have this: assume ^# and ^X are some control characters
line1 = '^#UG^#^#^#^#^#^#^#^#^#^#BLLBBCC^X^X^X^X^X^X^X^X^X' (on the first line).
When I print the line as it is on IDLE.. print line1.. it prints only say the first 2 or 3 characters.. and ignores the rest due to the control characters get choked.
However, when I print with this: print re.findall(r'.*', line1)
['\xaaUG\x02\x05\x00\x04\x00\x00\x00\x05\x00\x00\x00....
x00\x00\x00..BLLBBCC\x00\x00N\x00N\\x00
002 010 180000 DEC 11', '']
It prints nicely with all the control characters converted to \xHH format.. and ascii text intact.. (just as I want it)..with one catch.. the list has two items.. with '' in the end.
What is the explanation for the empty string in the end?
How to avoid it... I just want the line converted nicely to a string (not a list). i.e. one line of binary/text to be converted to a string with \xHH codes.. leave the ASCII TEXT alone.
Is using re.findall(r'.*', line1) is the only easy solution.. to do this conversion.. or are there any other straightforward method.. to convert a '\x00string' to \xHH and TEXT (where it is a printable character or whitespace).
Also.. any other useful comments to get the lines out nicely.
Thanks.
Update 2Jan2011 - Part 2
I have found out that re.findall(r'.+', line1) strips to
['\xaaUG\x02\x05\x00\x04\x00\x00\x00\x05\x00\x00\x00....
x00\x00\x00..BLLBBCC\x00\x00N\x00N\\x00
002 010 180000 DEC 11']
without the extra blank '' item in the list. This finding after numerous trial and errors.
Still I will need assistance to eliminate the list altogether but return just a string.
like this:
'\xaaUG\x02\x05\x00\x04..BLLBBCC..002 010 180000 DEC 11'
Added Info on 05Jan:
#John Machin
1) \xaaU is the delimiter between messages.. In the example.. I may have just left out in the samples. Please see below for one actual message that ends with \xaaU (but left out).
Following text is obtained from repr(msg between r'\xaaU.*?(?=\xaaU)')
I am trying to understand the binary format.. this is a typical message which is sent out
the first 'JJJOWXH' is the sender address.. anything that follows that has 7 alphanumeric is the receiver addresses.. Based on the sender address.. I can know whether this is a 'SND' or 'RCV'.. as the source is 'JJJOWXH'... This msg is a 'SND' as we are 'JJJOWXH'.
The message is addressed to: JJJKLXH.... JJJKRXH.... and so on.
As soon as all the.. \x00000000 finishes..
the sita header and subject starts
In this particular case... "\x00QN\x00HX\x00180001 \x00" this is the header.. and I am only interested all the stuff between \x00.
and the body comes next.. after the final \x00 or any other control character... In this case... it is:
COR\r\nMVT \r\nHX9136/17.BLNZ.JJJ\r\nAD2309/2314 EA0128
BBB\r\nDLRA/CI/0032/0022\r\nSI EET 02:14 HRS\r\n RA / 0032 DUE TO
LATE ARVL ACFT\r\n CI / 0022 OFFLOAD OVERHANG PALLET DUE INADEQUATE
PACKING LEADING TO \r\n SPACE PROBLEM
once the readable text ends... the first control character that appears until the end \xaaU is to be ignored... In above cases.. "SPACE PROBLEM".. is the last one.. then control characters starts... so to be ignored... sometimes the control characters are not there till the next \xaaU.
This is one complete message.
"\xaaU\x1c\x04\x02\x00\x05\x06\x1f\x00\x19\x00\x00\x00\xc4\x9d\xedN\x1a\x00?\x02\x02\x00B\x02\x02\x00E\x02\x07\x00\xff\xff\x00\x00\xff\xff\x00\x00\xff\xff\x00\x00M\x02\xec\x00\xff\xff\x00\x00\x00\x00?\x02M\x02\xec\x00\xff\xff\x00\x00\xff\xff\x00\x00\xff\xff\x00\x00\xff\xff\x00\x00\xff\xff\x00\x00:\x03\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x7f\x00JJJOWXH\x00\x05w\x01x\x01\x00\x01JJJKLXH\x00\x00\x7f\x01\x80\x01\x00\x01JJJKRXH\x00F\x87\x01\x88\x01\x00\x01JJJFFXH\x00\xff\x8f\x01\x90\x01\x00\x01JJJFCXH\x00\xff\x97\x01\x98\x01\x00\x01JJJFAXH\x00\x00\x9f\x01\xa0\x01\x00\x01JJJKPXH\x00\x00\xa7\x01\xa8\x01\x00\x01HAKUOHU\x00\x00\xaf\x01\xb0\x01\x00\x01BBBHRXH\x00\x00\xb7\x01\xb8\x01\x00\x01BBBFFHX\x00\x00\xbf\x01\xc0\x01\x00\x01BBBOMHX\x00\x00\xc7\x01\xc8\x01\x00\x01BBBFMXH\x00\x00\xcf\x01\xd0\x01\x00\x01JJJHBER\x00\x00\xd7\x01\xd8\x01\x00\x01BBBFRUO\x00\x00\xdf\x01\xe0\x01\x00\x01BBBKKHX\x00\x00\xe7\x01\xe8\x01\x00\x01JJJLOTG\x00\x01\xef\x01\xf0\x01\x00\x01JJJLCTG\x00\x00\xf7\x01\xf8\x01\x00\x01HDQOMTG\x005\xff\x01\x00\x02\x00\x01CHACSHX\x00K\x07\x02\x08\x02\x00\x01JJJKZXH\x00F\x0f\x02\x10\x02\x00\x01BBBOMUO\x00
\x17\x02\x18\x02\x00\x01BBBORXH\x00 \x1f\x02
\x02\x00\x01BBBOPXH\x00W'\x02(\x02\x00\x01CHACSHX\x00
/\x020\x02\x00\x01JJJDBXH\x0007\x028\x02\x00010000\x00\x00000000\x00\x00000000\x00\x00000000\x00\x00000000\x00\x00000000\x00\x00000000\x00\x00000000\x00\x00000000\x00\x00000000\x00\x00000000\x00\x00000000\x00\x00000000\x00\x00000000\x00\x00000000\x00\x00000000\x00\x00000000\x00\x00000000\x00\x00000000\x00\x00000000\x00\x00000000\x00\x00000000\x00\x00000000\x00\x00000000\x00\x00000000\x00QN\x00HX\x00180001
\x00COR\r\nMVT \r\nHX9136/17.BLNZ.JJJ\r\nAD2309/2314 EA0128
BBB\r\nDLRA/CI/0032/0022\r\nSI EET 02:14 HRS\r\n RA / 0032 DUE TO
LATE ARVL ACFT\r\n CI / 0022 OFFLOAD OVERHANG PALLET DUE INADEQUATE
PACKING LEADING TO \r\n SPACE
PROBLEM\x00D-\xedN\x00\x04\x1a\x00t<\x93\x01x\x00M_\x00"
2) I am not using .+ anymore after the 'repr' is known.
3) each Message is multiline.. and i need to preserve all the control characters to make some sense of this proprietary format.. that is why i needed repr to see it up close.
Hope this explains... This is just 1 message out of 1000s with in the file... and some are 'SND' and some are 'RCV'... and for 'RCV' there will not be '000000'.. and occasionally there are minor exceptions to the rule... but usually that is okay.
Any further suggestions anyone.. I am still working with the file.. to retrieve the text out intact... with sender and receiver addresses.
Thank you.
Python supports regexes too. I don't speak Perl, so I don't know exactly what your Perl code does, but this Python program might help you:
import re
with open('yourfile.pst') as f:
contents = f.read()
textstrings = re.findall(r'[\x20-\x7E]+', contents)
That will get you a list of all strings of one or more ASCII printable characters in the file. That may not be exactly what you want, but you might be able to tweak it from there.
Note that if you're using Python 3, then you have to worry about the distinction between binary and textual data and it becomes a bit more complicated. I'm assuming you're in Python 2.
Q: How to read the file? binary and text interspersed
A: Don't bother, just read it as normal text and you'll be able to keep your binary/text dichotomy (otherwise you won't be able to regex it as easily)
fh = open('/path/to/my/file.ext', 'r')
fh.read()
Just in case you want to read binary later for some reason, you just add a b to the second input of the open:
fh = open('/path/to/my/file.ext', 'rb')
Q: Eliminate unnecessary control characters
A: Use the python re module. Your next question sorta ask how
Q: parse the messages in between two \xaa USEFUL TEXT INFORMATION \xaa (HEX 'aa')
A: re module has a findall function that works as you (mostly) expect.
import re
mytext = '\xaaUseful text that I want to keep\xaa^X^X^X\xaaOther text i like\xaa'
usefultext = re.findall('\xaa([a-zA-Z^!-~0-9 ]+)\xaa', mytext)
Q: print out the required stuff
*A: There's a print function...
print usefultext
Q: Loop through all the lines.. and more files.
fh = open('/some/file.ext','r')
for lines in fh.readlines():
#do stuff
I'll let you figure out the os module to figure out what files exist/how to iterate through them.
You say:
Still I will need assistance to eliminate the list altogether but return just a string. like this
In other words, you have foo = [some_string] and you are doing print foo which as a side does repr(some_string) but encloses it in square brackets which you don't want. So just do print repr(foo[0]).
There seem to be several things unexplained:
You say the useful text is bracketed by \xaaU but in the sample file instead of 2 occurrences of that delimiter there is only \xaa (missingU) near the start, and nothing else.
You say
I have found out that re.findall(r'.+', line1) strips to ...
That in effect is stripping out \n (but not \r!!) -- I thought line breaks would be worth preserving when attempting to recover an email message.
>>> re.findall(r'.+', 'abc\r\ndef\r\n\r\n')
['abc\r', 'def\r', '\r']
What you you done with the \r characters? Have you tested a multi-line message? Have you tested a multi-message file?
One is left to guess who or what is intended to consume your output; you write
I need to parse the text line by line and word by word
but you seem overly concerned with printing the message "legibly" with e.g. \xab instead of gibberish.
It looks like the last 6 or so lines in your latest code (for msgline in msglines: etc etc) should be indented one level.
Is it possible to clarify all of the above?

Sensible python source line wrapping for printout

I am working on a latex document that will require typesetting significant amounts of python source code. I'm using pygments (the python module, not the online demo) to encapsulate this python in latex, which works well except in the case of long individual lines - which simply continue off the page. I could manually wrap these lines except that this just doesn't seem that elegant a solution to me, and I prefer spending time puzzling about crazy automated solutions than on repetitive tasks.
What I would like is some way of processing the python source code to wrap the lines to a certain maximum character length, while preserving functionality. I've had a play around with some python and the closest I've come is inserting \\\n in the last whitespace before the maximum line length - but of course, if this ends up in strings and comments, things go wrong. Quite frankly, I'm not sure how to approach this problem.
So, is anyone aware of a module or tool that can process source code so that no lines exceed a certain length - or at least a good way to start to go about coding something like that?
You might want to extend your current approach a bit, but using the tokenize module from the standard library to determine where to put your line breaks. That way you can see the actual tokens (COMMENT, STRING, etc.) of your source code rather than just the whitespace-separated words.
Here is a short example of what tokenize can do:
>>> from cStringIO import StringIO
>>> from tokenize import tokenize
>>>
>>> python_code = '''
... def foo(): # This is a comment
... print 'foo'
... '''
>>>
>>> fp = StringIO(python_code)
>>>
>>> tokenize(fp.readline)
1,0-1,1: NL '\n'
2,0-2,3: NAME 'def'
2,4-2,7: NAME 'foo'
2,7-2,8: OP '('
2,8-2,9: OP ')'
2,9-2,10: OP ':'
2,11-2,30: COMMENT '# This is a comment'
2,30-2,31: NEWLINE '\n'
3,0-3,4: INDENT ' '
3,4-3,9: NAME 'print'
3,10-3,15: STRING "'foo'"
3,15-3,16: NEWLINE '\n'
4,0-4,0: DEDENT ''
4,0-4,0: ENDMARKER ''
I use the listings package in LaTeX to insert source code; it does syntax highlight, linebreaks et al.
Put the following in your preamble:
\usepackage{listings}
%\lstloadlanguages{Python} # Load only these languages
\newcommand{\MyHookSign}{\hbox{\ensuremath\hookleftarrow}}
\lstset{
% Language
language=Python,
% Basic setup
%basicstyle=\footnotesize,
basicstyle=\scriptsize,
keywordstyle=\bfseries,
commentstyle=,
% Looks
frame=single,
% Linebreaks
breaklines,
prebreak={\space\MyHookSign},
% Line numbering
tabsize=4,
stepnumber=5,
numbers=left,
firstnumber=1,
%numberstyle=\scriptsize,
numberstyle=\tiny,
% Above and beyond ASCII!
extendedchars=true
}
The package has hook for inline code, including entire files, showing it as figures, ...
I'd check a reformat tool in an editor like NetBeans.
When you reformat java it properly fixes the lengths of lines both inside and outside of comments, if the same algorithm were applied to Python, it would work.
For Java it allows you to set any wrapping width and a bunch of other parameters. I'd be pretty surprised if that didn't exist either native or as a plugin.
Can't tell for sure just from the description, but it's worth a try:
http://www.netbeans.org/features/python/

Categories

Resources