In Python 3, this works when importing from a file:
myInt = 0
while (myInt < 10):
    print (myInt, end='')
    myInt += 1
print (' are Numerals.')
producing the expected result: "0123456789 are Numerals."
But if the code is pasted directly into an interpreter, the last line produces an exception. In fact, anything after the while block exits does:
File "<stdin>", line 4
print (' are Numerals.')
^
SyntaxError: invalid syntax
[The following is a reply to François in the form of focusing the question.]
It doesn't work in Python 2 using the trailing "," construct either...
Adding a blank line after the while block doesn't work as it clearly won't produce the desired result, namely "0123456789 are Numerals."
Taking Jean-François' lead, however, this produces a similar result:
myInt = 0
res=""
while (myInt < 10):
    res += str (myInt)
    myInt += 1
res += ' are my Numerals.'
print (res)
But is there any way of forcing the end of the while block in the interpreter which would allow printing (or string compilation, or whatever) to continue?
Well, I can produce the result, when I use else and enter the following, a line at a time:
>>> myInt = 0
>>> while (myInt < 10):
...     print (myInt, end='')
...     myInt += 1
... else:
...     print (' are Numerals.')
...
0123456789 are Numerals.
but when I copy/paste the whole code into the interpreter, the exception is raised. So what is the difference between copy-pasting into the interpreter and typing it a line at a time? I'm even more curious now!
The interpreter expects input in a very specific form for multi-line statements. You can see that by inputting the lines one at a time.
>>> myInt = 0
>>> while (myInt < 10):
...     print (myInt, end='')
...     myInt += 1
... print (' are Numerals')
After a multi-line (compound) statement, the interpreter expects a blank line to signal the end of the block. When it instead encounters a new, un-indented statement immediately, it cannot parse that line as part of the same statement and raises a SyntaxError.
Adding a blank line after the end of the while loop will allow the interpreter to understand your block. Remember that the interactive interpreter always runs one statement at a time. When you copy-paste multiple lines, you're really running them as completely separate statements, and in the interpreter, a multi-line statement has to be terminated by a blank line.
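The difference can be reproduced without a terminal. This is a minimal sketch, assuming that the built-in compile() modes are a fair stand-in for the two situations: "exec" parses a whole file, while "single" parses one interactive statement, which is roughly what the prompt does with pasted text. The helper name compiles_as is made up for this illustration.

```python
LOOP_THEN_PRINT = (
    "while myInt < 10:\n"
    "    print(myInt, end='')\n"
    "    myInt += 1\n"
    "print(' are Numerals.')\n"   # un-indented statement right after the block
)

def compiles_as(source, mode):
    """Return True if *source* compiles in the given compile() mode."""
    try:
        compile(source, "<demo>", mode)
    except SyntaxError:
        return False
    return True

print(compiles_as(LOOP_THEN_PRINT, "exec"))    # True: file rules allow a statement to follow
print(compiles_as(LOOP_THEN_PRINT, "single"))  # False: interactive rules want the block ended first
```

If the pasted text has to run as one unit, wrapping it in a function or passing it to exec() are two ways to sidestep the one-statement-at-a-time rule.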
I have a file named sample.txt which looks like this:
ServiceProfile.SharediFCList[1].DefaultHandling=1
ServiceProfile.SharediFCList[1].ServiceInformation=
ServiceProfile.SharediFCList[1].IncludeRegisterRequest=n
ServiceProfile.SharediFCList[1].IncludeRegisterResponse=n
My requirement is to remove the brackets and the integer, and then build OS commands from the result:
ServiceProfile.SharediFCList.DefaultHandling=1
ServiceProfile.SharediFCList.ServiceInformation=
ServiceProfile.SharediFCList.IncludeRegisterRequest=n
ServiceProfile.SharediFCList.IncludeRegisterResponse=n
I am quite a newbie in Python. This is my first attempt. I have used this code to remove the brackets:
#!/usr/bin/python
import re
import os
import sys
f = os.open("sample.txt", os.O_RDWR)
ret = os.read(f, 10000)
os.close(f)
print ret
var1 = re.sub("[\(\[].*?[\)\]]", "", ret)
print var1
f = open("removed.cfg", "w+")
f.write(var1)
f.close()
After this using the file as input I want to form application specific commands which looks like this:
cmcli INS "DefaultHandling=1 ServiceInformation="
and the next set as
cmcli INS "IncludeRegisterRequest=n IncludeRegisterRequest=y"
So basically I now want all the output to be bunched into sets of two, so that I can execute the commands on the operating system.
Is there any way that I could bunch them up in sets of two?
Reading 10,000 bytes of text into a string is really not necessary when your file is line-oriented text, and isn't scalable either. And you need a very good reason to be using os.open() instead of open().
So, treat your data as the lines of text that it is, and every two lines, compose a single line of output.
from __future__ import print_function
import re
command = [None,None]
cmd_id = 1
bracket_re = re.compile(r".+\[\d\]\.(.+)")
# This doesn't just remove the brackets: what you actually seem to want is
# to pick out everything after [1]. and ignore the rest.
with open("removed_cfg","w") as outfile:
    with open("sample.txt") as infile:
        for line in infile:
            m = bracket_re.match(line)
            cmd_id = 1 - cmd_id # gives 0, 1, 0, 1
            command[cmd_id] = m.group(1)
            if cmd_id == 1: # we have a pair
                output_line = """cmcli INS "{0} {1}" """.format(*command)
                print (output_line, file=outfile)
This gives the output
cmcli INS "DefaultHandling=1 ServiceInformation="
cmcli INS "IncludeRegisterRequest=n IncludeRegisterResponse=n"
The second line doesn't correspond to your sample output. I don't know how the input IncludeRegisterResponse=n is supposed to become the output IncludeRegisterRequest=y. I assume that's a mistake.
Note that this code depends on your input data being precisely as you describe it and has no error checking whatsoever. So if the format of the input is in reality more variable than that, then you will need to add some validation.
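As a rough illustration of the validation mentioned above, here is a sketch that works on any iterable of lines. The function name build_commands is made up for this example, and three choices are assumptions rather than requirements from the question: the index may have several digits (\d+), lines that do not match are skipped, and an unpaired trailing setting is dropped.

```python
import re

# Everything after "[<digits>]." is treated as the setting.
BRACKET_RE = re.compile(r".+\[\d+\]\.(.+)")

def build_commands(lines):
    """Return cmcli command strings, two settings per command."""
    settings = []
    for line in lines:
        m = BRACKET_RE.match(line.strip())
        if m is None:
            continue  # assumption: silently skip lines in an unexpected format
        settings.append(m.group(1))
    it = iter(settings)
    # zip(it, it) pairs consecutive items; an odd leftover item is dropped
    return ['cmcli INS "{0} {1}"'.format(a, b) for a, b in zip(it, it)]

sample = [
    "ServiceProfile.SharediFCList[1].DefaultHandling=1\n",
    "ServiceProfile.SharediFCList[1].ServiceInformation=\n",
    "ServiceProfile.SharediFCList[1].IncludeRegisterRequest=n\n",
    "ServiceProfile.SharediFCList[1].IncludeRegisterResponse=n\n",
]
for command in build_commands(sample):
    print(command)
```

Whether to skip, log, or fail on a malformed line depends on how much you trust the input file.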
I'd like to test the content of a variable containing a byte in a way like this:
line = []
while True:
    for c in self.ser.read(): # read() from pySerial
        line.append(c)
        if c == binascii.unhexlify('0A').decode('utf8'):
            print("Line: " + line)
            line = []
            break
But this does not work...
I'd also like to test whether a byte is empty.
In this case
print(self.ser.read())
prints: b'' (with two single quotes)
So far I have not managed to test for this:
if self.ser.read() == b''
or whatever else I try; it always shows a syntax error...
I know, very basic, but I don't get it...
Thank you for your help. The first part of the question was answered by @sisanared:
if self.ser.read():
tests for an empty byte.
The second part of the question (the end-of-line with the hex value 0A) still doesn't work, but I think it is wise to close this question since the answer to the title is given.
Thank you all
If you want to verify the contents of your variable or string which you want to read from pySerial, use the repr() function, something like:
import serial
from binascii import unhexlify
self.ser = serial.Serial(self.port_name, self.baudrate, self.bytesize, self.parity, self.stopbits, self.timeout, self.xonxoff, self.rtscts)
line = []
while True:
    c = self.ser.read()  # read() from pySerial; returns b'' on timeout
    if not c:  # empty bytes object: nothing was received
        continue
    line.append(c)
    if c == unhexlify('0A'):  # b'\x0A'; compare bytes with bytes, not with str
        print("Line: " + repr(b''.join(line)))
        line = []
        break
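The two comparisons can be tried without any hardware. In this sketch io.BytesIO stands in for the serial port, which is an assumption made only so the example runs anywhere; the helper name read_line is likewise made up. With a real pySerial port, an empty read means the timeout expired rather than that the data has ended.

```python
import io

NEWLINE = b"\x0a"  # the same single byte as b"\n"

def read_line(port):
    """Read single bytes from *port* until a newline or an empty read."""
    line = bytearray()
    while True:
        chunk = port.read(1)   # a bytes object of length 0 or 1
        if chunk == b"":       # empty read: nothing (more) to collect
            break
        line += chunk
        if chunk == NEWLINE:   # compare bytes with bytes, not with str
            break
    return bytes(line)

fake_port = io.BytesIO(b"abc\ndef")
print(repr(read_line(fake_port)))  # b'abc\n'
print(list(b"\n"))                 # [10]: iterating over bytes yields ints in Python 3
```

The second print hints at why looping over the result of read() and comparing each item with a string or bytes value never matches in Python 3.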
I have a file, and I just want to get /X/Y/Z/C, /X/Y/Z/D and /X/Y/Z/E back (whatever comes after -trees).
The script should read the file, ignore everything until it sees WFS, then take the information inside the {}, find -trees and just give me the paths back.
I am a beginner in Python. A fixed match pattern doesn't work because I think the paths change every day.
Any help will be appreciated.
The file:
DEFAULTS
{
FS
{
-A AAA
-B
} -aaaaaa
C
{
}
}
D "FW0"
{
}
WFS "C:" XXXX:"/C"
{
-trees
"/X/Y/Z/C"
"/X/Y/Z/D"
"/X/Y/Z/E"
-A AAA
}
A state machine-based lexical analyzer would do the trick reliably.
It recognizes the file's constructs that interest us: nested curly braces, named sections (an identifier and an opening brace on the following line; this one only cares about top-level sections) and clauses (started by -identifier inside a top-level section, possibly followed by data lines and terminated by another clause or the section's end).
Then it keeps reading the file and prints data lines found if they happen to be in the section and clause we're interested in. It also sets a flag upon finding them in order to quit immediately after that clause ends.
f = open("t.txt")
import re
identifier=None
brace_level=0
section=None
clause=None
req_clause_found=False
def in_req_clause(): return section=='WFS' and clause=='trees'
for l in (l.strip() for l in f):
    if req_clause_found and not in_req_clause(): break
    m=re.match(r'[A-Z]+',l) #adjust if section names can be different
    if m and section is None:
        identifier=m.group(0)
        continue
    m=re.match(r'\{(\s|$)',l)
    if m:
        brace_level+=1
        if identifier is not None and brace_level==1:
            section=identifier
        identifier=None
        continue
    else: identifier=None
    m=re.match(r'\}(\s|$)',l)
    if m:
        brace_level-=1
        if brace_level==0: section=None
        clause=None
        continue
    m=re.match(r'-([A-Za-z]+)',l) #adjust if clause names can be different
    if m and brace_level==1:
        clause=m.group(1)
        continue
    m=re.match(r'"(.*)"$',l)
    if m and in_req_clause():
        print m.group(1)
        req_clause_found=True
        continue
On the sample, this outputs
/X/Y/Z/C
/X/Y/Z/D
/X/Y/Z/E
I'm a little confused by the layout of your file but is there any reason not to parse it line-by-line?
def parse():
    with open('data.txt') as fptr:
        for line in fptr:
            if line.startswith('WFS'):
                for line in fptr:
                    if line.strip().startswith('-trees'):
                        result = []
                        for line in fptr:
                            if line.strip().startswith('"'):
                                result.append(line.strip())
                            else:
                                return result
That's not pretty but I think it'll work! Let's try it:
In [1]: !cat temp.txt
DEFAULTS
{
FS
{
-A AAA
-B
} -aaaaaa
C
{
}
}
D "FW0"
{
}
WFS "C:" XXXX:"/C"
{
-trees
"/X/Y/Z/C"
"/X/Y/Z/D"
"/X/Y/Z/E"
-A AAA
}
In [2]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:def parse():
:    with open('temp.txt') as fptr:
:        for line in fptr:
:            if line.startswith('WFS'):
:                for line in fptr:
:                    if line.strip().startswith('-trees'):
:                        result = []
:                        for line in fptr:
:                            if line.strip().startswith('"'):
:                                result.append(line.strip())
:                            else:
:                                return result
:
:--
In [3]: parse()
Out[3]: ['"/X/Y/Z/C"', '"/X/Y/Z/D"', '"/X/Y/Z/E"']
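A variant of the line-by-line idea that is easier to test, as a rough sketch: it takes any iterable of lines instead of opening a file, strips the surrounding quotes, and returns an empty list rather than None when the section or clause is missing. The function name extract_trees and its parameters are made up for this illustration.

```python
def extract_trees(lines, section="WFS", clause="-trees"):
    """Return the quoted paths listed under *clause* inside *section*."""
    it = iter(lines)
    # 1. skip ahead to the line that opens the wanted section
    for line in it:
        if line.startswith(section):
            break
    else:
        return []
    # 2. skip ahead to the wanted clause
    for line in it:
        if line.strip().startswith(clause):
            break
    else:
        return []
    # 3. collect quoted lines until the first line that is not quoted
    paths = []
    for line in it:
        text = line.strip()
        if not text.startswith('"'):
            break
        paths.append(text.strip('"'))
    return paths

sample = '''D "FW0"
{
}
WFS "C:" XXXX:"/C"
{
-trees
"/X/Y/Z/C"
"/X/Y/Z/D"
"/X/Y/Z/E"
-A AAA
}
'''.splitlines(True)
print(extract_trees(sample))  # ['/X/Y/Z/C', '/X/Y/Z/D', '/X/Y/Z/E']
```

Because all three loops share one iterator, each stage continues where the previous one stopped, just as the nested loops over fptr do.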
I'm not sure what the exact variations of your patterns are, but you could use regex groups:
import re
myjunk = open("t.txt", "r")
for line in myjunk:
    if re.match(r'\s*"(/[A-Z])+"', line):
        print line,
You may need to fiddle with the regex a bit, but the important point here is to invest a bit of time learning regex, and you won't have to deal with some of the unnecessarily complicated solutions suggested elsewhere. Regex is a mini language purpose-built for so many things related to text that it's really essential knowledge, even for the Python newbie. You'll be glad you put the time in! And the Python community is helpful, so why not join IRC and we'll see you in your favorite Python channel for real-time help.
Best of luck, let me know if you need more help.
PJ
I have a text file which has a lot of random occurrences of the string #STRING_A, and I would be interested in writing a short script which removes only some of them. In particular, one that scans the file and, once it finds a line which starts with this string, like
#STRING_A
then checks if 3 lines backwards there is another occurrence of a line starting with the same string, like
#STRING_A
#STRING_A
and if it happens, to delete the occurrence 3 lines backward. I was thinking about bash, but I do not know how to "go backwards" with it. So I am sure that this is not possible with bash. I also thought about python, but then I should store all information in memory in order to go backwards and then, for long files it would be unfeasible.
What do you think? Is it possible to do it in bash or python?
Thanks
Funny that after all these hours nobody's yet given a solution to the problem as actually phrased (as @John Machin points out in a comment) -- remove just the leading marker (if followed by another such marker 3 lines down), not the whole line containing it. It's not hard, of course -- here's a tiny mod as needed of @truppo's fun solution, for example:
from itertools import izip, chain
f = "foo.txt"
# the pad is a string of three spaces, so `third` lags three lines behind `line`
for third, line in izip(chain("   ", open(f)), open(f)):
    if third.startswith("#STRING_A") and line.startswith("#STRING_A"):
        line = line[len("#STRING_A"):]
    print line,
Of course, in real life, one would use an itertools.tee instead of reading the file twice, have this code in a function, not repeat the marker constant endlessly, &c;-).
Of course Python will work as well. Simply store the last three lines in an array and check if the first element in the array is the same as the value you are currently reading. Then delete the value and print out the current array. You would then move over your elements to make room for the new value and repeat. Of course when the array is filled you'd have to make sure to continue to move values out of the array and put in the newly read values, stopping to check each time to see if the first value in the array matches the value you are currently reading.
Here is a more fun solution, using two iterators with a three element offset :)
from itertools import izip, chain, tee
f1, f2 = tee(open("foo.txt"))
# the pad is a string of three spaces, which gives the three element offset
for third, line in izip(chain("   ", f1), f2):
    if not (third.startswith("#STRING_A") and line.startswith("#STRING_A")):
        print line,
Why shouldn't it be possible in bash? You don't need to keep the whole file in memory, just the last three lines (if I understood correctly), and write what's appropriate to standard output. Redirect that into a temporary file, check that everything worked as expected, and overwrite the source file with the temporary one.
Same goes for Python.
I'd provide a script of my own, but that wouldn't be tested. ;-)
As AlbertoPL said, store lines in a fifo for later use--don't "go backwards". For this I would definitely use python over bash+sed/awk/whatever.
I took a few moments to code this snippet up:
from collections import deque
line_fifo = deque()
for line in open("test"):
    line_fifo.append(line)
    if len(line_fifo) == 4:
        # "look 3 lines backward"
        if line_fifo[0] == line_fifo[-1] == "#STRING_A\n":
            # get rid of that match
            line_fifo.popleft()
        else:
            # print out the top of the fifo
            print line_fifo.popleft(),
# don't forget to print out the fifo when the file ends
for line in line_fifo: print line,
This code will scan through the file and remove lines starting with the marker when another marker line follows three lines later. It keeps only three lines in memory by default:
from collections import deque
def delete(fp, marker, gap=3):
    """Delete lines from *fp* if they start with *marker* and are followed
    by another line starting with *marker* *gap* lines after.
    """
    buf = deque()
    for line in fp:
        if len(buf) < gap:
            buf.append(line)
        else:
            old = buf.popleft()
            if not (line.startswith(marker) and old.startswith(marker)):
                yield old
            buf.append(line)
    for line in buf:
        yield line
I've tested it with:
>>> from StringIO import StringIO
>>> fp = StringIO('''a
... b
... xxx 1
... c
... xxx 2
... d
... e
... xxx 3
... f
... g
... h
... xxx 4
... i''')
>>> print ''.join(delete(fp, 'xxx'))
a
b
xxx 1
c
d
e
xxx 3
f
g
h
xxx 4
i
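If only the marker should go and the rest of the line should stay, which is one reading of the question, the same buffering idea still applies. The following is a sketch under that assumption, written for Python 3; the name strip_earlier_marker is made up for this illustration.

```python
from collections import deque

def strip_earlier_marker(lines, marker, gap=3):
    """Yield *lines*, removing a leading *marker* from any line that is
    followed *gap* lines later by another line starting with *marker*.
    """
    buf = deque()
    for line in lines:
        if len(buf) == gap:
            old = buf.popleft()  # the line exactly *gap* lines before *line*
            if old.startswith(marker) and line.startswith(marker):
                old = old[len(marker):]  # drop only the leading marker
            yield old
        buf.append(line)
    # the last *gap* lines have no follower to compare against
    yield from buf

text = ["#STRING_A one\n", "x\n", "y\n", "#STRING_A two\n", "z\n"]
print("".join(strip_earlier_marker(text, "#STRING_A ")), end="")
```

At most gap lines are held in memory at any time, so the length of the file does not matter.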
This "answer" is for lyrae ... I'll amend my previous comment: if the needle is in the first 3 lines of the file, your script will either cause an IndexError or access a line that it shouldn't be accessing, sometimes with interesting side-effects.
Example of your script causing IndexError:
>>> lines = "#string line 0\nblah blah\n".splitlines(True)
>>> needle = "#string "
>>> for i,line in enumerate(lines):
...     if line.startswith(needle) and lines[i-3].startswith(needle):
...         lines[i-3] = lines[i-3].replace(needle, "")
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
IndexError: list index out of range
and this example shows not only that the Earth is round but also why your "fix" to the "don't delete the whole line" problem should have used .replace(needle, "", 1) or [len(needle):] instead of .replace(needle, "")
>>> lines = "NEEDLE x NEEDLE y\nnoddle\nnuddle\n".splitlines(True)
>>> needle = "NEEDLE"
>>> # Expected result: no change to the file
... for i,line in enumerate(lines):
...     if line.startswith(needle) and lines[i-3].startswith(needle):
...         lines[i-3] = lines[i-3].replace(needle, "")
...
>>> print ''.join(lines)
x y <<<=== whoops!
noddle
nuddle
<<<=== still got unwanted newline in here
>>>
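Both problems can be avoided with two small changes, sketched here for Python 3: guard the index so that i-3 never goes negative, and slice the marker off instead of calling replace(). The function name strip_marker_in_place is made up for this illustration.

```python
def strip_marker_in_place(lines, needle, gap=3):
    """Remove a leading *needle* from lines[i - gap] whenever lines[i]
    and lines[i - gap] both start with *needle*. Modifies *lines*.
    """
    for i, line in enumerate(lines):
        if i < gap:
            continue  # no line *gap* positions back; avoids negative indexing
        earlier = lines[i - gap]
        if line.startswith(needle) and earlier.startswith(needle):
            lines[i - gap] = earlier[len(needle):]  # only the leading occurrence
    return lines

lines = "NEEDLE x NEEDLE y\nnoddle\nnuddle\n".splitlines(True)
print("".join(strip_marker_in_place(lines, "NEEDLE")), end="")
# the three lines are printed unchanged, as expected
```

A negative index is legal in Python and counts from the end of the list, which is why the unguarded version sometimes reads a line it should not touch instead of failing.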
My awk-fu has never been that good... but the following may provide you what you're looking for in a bash-shell/shell-utility form:
sed `awk 'BEGIN{ORS=";"}
    /#STRING_A/ {
        if(LAST!="" && LAST+3 >= NR) print LAST "d"
        LAST = NR
    }' test_file` test_file
Basically... awk is producing a command for sed to strip certain lines. I'm sure there's a relatively easy way to make awk do all of the processing, but this does seem to work.
The bad part? It does read the test_file twice.
The good part? It is a bash/shell-utility implementation.
Edit: Alex Martelli points out that the sample file above might have confused me. (my above code deletes the whole line, rather than the #STRING_A flag only)
This is easily remedied by adjusting the command to sed:
sed `awk 'BEGIN{ORS=";"}
    /#STRING_A/ {
        if(LAST!="" && LAST+3 >= NR) print LAST "s/#STRING_A//"
        LAST = NR
    }' test_file` test_file
This may be what you're looking for?
lines = open('sample.txt').readlines()
needle = "#string "
for i,line in enumerate(lines):
    if line.startswith(needle) and lines[i-3].startswith(needle):
        lines[i-3] = lines[i-3].replace(needle, "")
print ''.join(lines)
this outputs:
string 0 extra text
string 1 extra text
string 2 extra text
string 3 extra text
--replaced -- 4 extra text
string 5 extra text
string 6 extra text
#string 7 extra text
string 8 extra text
string 9 extra text
string 10 extra text
In bash you can use tac filename (from GNU coreutils) to read the file backwards. Note that sort -r would reorder the lines alphabetically rather than reverse them, and tail -n needs a line count.
lines=$(tac filename)
# now iterate through the lines and do your checking
I would consider using sed. GNU sed supports the definition of line ranges. If sed fails, there is another beast, awk, and I'm sure you can do it with awk.
O.K. I feel I should post my awk POC. I could not figure out how to use sed addresses. I have not tried a combination of awk+sed, but it seems like overkill to me.
My awk script works as follows:
It reads lines and stores them in a 3-line buffer.
Once the desired pattern is found (/^data.*/ in my case), the 3-line buffer is searched to check whether the pattern was seen three lines ago.
If the pattern has been seen, the 3 lines are scratched.
To be honest, I would probably go with Python too, given that awk is really awkward.
The AWK code follows:
function max(a, b)
{
if (a > b)
return a;
else
return b;
}
BEGIN {
w = 0; #write index
r = 0; #read index
buf[0, 1, 2]; #buffer
}
END {
# flush buffer
# start at read index and print out up to w index
for (k = r % 3; k r - max(r - 3, 0); k--) {
#search in 3 line history buf
if (match(buf[k % 3], /^data.*/) != 0) {
# found -> remove lines from history
# by rewriting them -> adjust write index
w -= max(r, 3);
}
}
buf[w % 3] = $0;
w++;
}
/^.*/ {
# store line into buffer, if the history
# is full, print out the oldest one.
if (w > 2) {
print buf[r % 3];
r++;
buf[w % 3] = $0;
}
else {
buf[w] = $0;
}
w++;
}