Delete last character in stdout buffer - python

In Python 3.7, I am trying to write a JSON array to stdout, and I would like to remove the final comma in the array:
import json
import sys
sys.stdout.write("[")
[sys.stdout.write(json.dumps(x, separators=(',', ': ')) + ",") for x in list]
sys.stdout.write("\b]") # I want to remove the final ',' from above.
I know sys.stdout is buffered, so what I'd like to do is remove the last character in that buffer before the flush. The problem is that I don't know how to access that buffer properly, or how to ensure the final byte is not written.
I experimented with the \b character, but that does nothing: the \b character simply becomes part of the output.
As background, the stdout is going into an Apache NiFi flow (not to a console window). I'd much rather use stdout than a secondary in-memory buffer, as that feels like a waste of memory. It'd be great if I could remove the last byte of the stdout buffer before flushing.
EDIT:
Some folks in the comments suggest that list comprehensions aren't the way to go here and that I should instead run json.dumps on the whole list. If anyone has an example of how to do this while ensuring the last element doesn't have a trailing comma, please show it!

The simplest solution is just to dump the whole list at once:
sys.stdout.write(json.dumps(list, separators=(',', ': ')))
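Because the question is concerned about memory, json.dump (without the "s") may be a closer fit. It takes a file-like object and, in CPython, writes the encoded JSON to it in pieces, so the full array string is never assembled first. The input list itself still has to be in memory. Here is a minimal sketch. The helper name write_json_array and its out parameter are hypothetical:

```python
import io
import json
import sys


def write_json_array(items, out=None):
    """Write `items` to `out` (default: sys.stdout) as one JSON array.

    In CPython, json.dump writes the encoded text to the stream piece by
    piece, so no trailing comma ever has to be undone.
    """
    out = sys.stdout if out is None else out
    json.dump(items, out, separators=(',', ': '))


# Usage: write into an in-memory stream so the result can be inspected.
buf = io.StringIO()
write_json_array([1, {"a": 2}, "x"], out=buf)
print(buf.getvalue())  # [1,{"a": 2},"x"]
```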
But if you really need to write each element separately you could make the comma conditional:
last_index = len(list) - 1
sys.stdout.write("[")
for i, x in enumerate(list):
    sys.stdout.write(json.dumps(x, separators=(',', ': ')))
    if i < last_index:  # no comma after the final element
        sys.stdout.write(',')
sys.stdout.write("]")
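The conditional comma relies on len(), so it does not work for a generator. One variant that avoids this writes the separator *before* every element except the first. The total count is then never needed. This is a sketch; the function name stream_json_array and the out parameter are illustrative and not taken from the answer:

```python
import io
import json
import sys


def stream_json_array(items, out=None):
    """Write the elements of any iterable `items` as a JSON array.

    A comma is written before each element after the first, so the
    number of elements never has to be known in advance.
    """
    out = sys.stdout if out is None else out
    out.write('[')
    for i, x in enumerate(items):
        if i > 0:
            out.write(',')
        out.write(json.dumps(x, separators=(',', ': ')))
    out.write(']')


buf = io.StringIO()
stream_json_array((n * n for n in range(4)), out=buf)  # a generator works too
print(buf.getvalue())  # [0,1,4,9]
```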

Related

Python - \n appearing in concatenated strings

I've been having an issue with my Python code. I am trying to concatenate the value of two string objects, but when I run the code it keeps printing a '\n' between the two strings.
My code:
while i < len(valList):
    curVal = valList[i]
    print(curVal)
    markupConstant = 'markup.txt'
    markupFileName = curVal + markupConstant
    markupFile = open(markupFileName)
Now when I run this, it gives me this error:
OSError: [Errno 22] Invalid argument: 'cornWhiteTrimmed\nmarkup.txt'
See that \n between the two strings? I've dissected the code a bit by printing each string individually, and neither one contains a \n on its own. Any ideas as to what I'm doing wrong?
Thanks in advance!
The concatenation itself certainly doesn't add the \n. valList is probably the result of calling readlines() on a file object, so each element in it will have a trailing \n. Call strip on each element before using it:
while i < len(valList):
    curVal = valList[i].strip()  # drop the trailing '\n' kept by readlines()
    print(curVal)
    markupConstant = 'markup.txt'
    markupFileName = curVal + markupConstant
    markupFile = open(markupFileName)
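A more idiomatic sketch of the same fix loops over the values directly instead of using an index counter, and builds all the file names in one pass. Like the answer, it assumes the values came from readlines() and may end in '\n'. The helper name markup_filenames is hypothetical:

```python
def markup_filenames(values, suffix='markup.txt'):
    """Return one file name per value, stripping surrounding whitespace
    (including the trailing '\n' kept by readlines()) first."""
    return [value.strip() + suffix for value in values]


names = markup_filenames(['cornWhiteTrimmed\n', 'wheat\n'])
print(names)  # ['cornWhiteTrimmedmarkup.txt', 'wheatmarkup.txt']
```

Alternatively, reading the file with f.read().splitlines() returns the lines without their line endings in the first place.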
The reason you don't see the \n when you print each string is that \n is the newline character. Printing it doesn't display anything visible; it just moves the output to a new line. In the middle of your file name, however, it makes the name invalid. The solution to your issue is the strip method. You can read its documentation here (https://www.tutorialspoint.com/python/string_strip.htm), but basically it removes leading and trailing whitespace, including the newline character, from a string.
To add to the other answers explaining why this happened:
When you need to actually inspect what characters a string contains, you can't simply print it. Many characters are "invisible" when printed.
Turn the string into a list first:
list(curVal)
Or my personal favorite:
[c for c in curVal]
Printing either list shows each character in its escaped form, so hard-to-see characters such as '\n' become visible.
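A related way to expose hidden characters is repr(). This is a different built-in from the list trick above: it shows the whole string with escape sequences spelled out. A small sketch, using the value from the question's error message:

```python
cur_val = 'cornWhiteTrimmed\n'

print(cur_val)             # the newline is invisible: it just ends the line
print(repr(cur_val))       # 'cornWhiteTrimmed\n'
print(list(cur_val)[-3:])  # ['e', 'd', '\n']
```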

Python 3 print to file creating errant new lines

I wrote code to read in URL and IP data, with the IP as the key for the URLs visited. I am attempting to print each IP key followed by its number of URL visits.
The problem is that when printing to my file there is a new line after some IPs.
Here is the output section of code:
for key, value in ipVisit.items():
    outputF.write(key + " " + str(len(ipVisit[key])) + '\n')
Even if I increase or decrease the number of spaces between the key and the number of visits, the third entry is always the only one printed on a single line. Here is the output:
194.33.212.111
28
12.65.4.100
28
205.23.104.49 31
205.23.104.49
29
Did I do something stupid with my loop? How can I fix this?
One thing I've found to be very helpful when writing to files is to ignore the write method entirely:
for key, value in ipVisit.items():
    print(key + " " + str(len(ipVisit[key])), file=outputF)
This has the possibly useful side effect of writing to stdout if outputF is None. I've taken advantage of this in command-line programs in the past, for example by passing in either an output file or "-" to mean stdout.
Using print, you'll get the newline semantics that you're familiar with and the commenter's suggestion of .rstrip() will take care of any leftover errant newline characters.
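To illustrate the file=None side effect mentioned above: print(..., file=None) falls back to sys.stdout, so one function can write either to a file or to the console. The sketch below uses a hypothetical report() helper. contextlib.redirect_stdout appears only so that the console output can be captured and compared:

```python
import contextlib
import io


def report(ip_visits, out=None):
    """Print one 'IP count' line per key to `out`; None means sys.stdout."""
    for ip, urls in ip_visits.items():
        print(ip, len(urls), file=out)


visits = {'12.65.4.100': ['a', 'b'], '194.33.212.111': ['c']}

# Writing to an in-memory "file":
buf = io.StringIO()
report(visits, out=buf)

# Writing to the console, captured here so it can be checked:
console = io.StringIO()
with contextlib.redirect_stdout(console):
    report(visits)

print(buf.getvalue() == console.getvalue())  # True
```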
EDIT: It might also be wise to avoid building strings with the + operator and to use the format method instead. Also, you already have the value from your for loop, so there's no need to index into ipVisit again:
for key, value in ipVisit.items():
    print('{} {}'.format(key, len(value)), file=outputF)
    # or, if there are still extra newlines, rstrip the key instead:
    # print('{} {}'.format(key.rstrip(), len(value)), file=outputF)  # this will only work if you're sure `key` is a str
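The output lists 205.23.104.49 twice. That suggests, though the question does not confirm it, that the dictionary holds two distinct keys: '205.23.104.49' and '205.23.104.49\n'. If so, stripping at print time would still print that IP twice, while stripping when the dictionary is built merges the two entries. The following is a sketch only. It assumes the raw data arrives as (ip, url) pairs, which is a hypothetical shape because the question does not show its input:

```python
from collections import defaultdict


def group_visits(pairs):
    """Group URLs by IP, normalizing each IP with strip() so that
    '1.2.3.4' and '1.2.3.4\n' end up under the same key."""
    ip_visits = defaultdict(list)
    for ip, url in pairs:
        ip_visits[ip.strip()].append(url.strip())
    return ip_visits


pairs = [('205.23.104.49', '/a'), ('205.23.104.49\n', '/b'), ('12.65.4.100\n', '/c')]
for ip, urls in group_visits(pairs).items():
    print('{} {}'.format(ip, len(urls)))
# 205.23.104.49 2
# 12.65.4.100 1
```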

write() versus writelines() and concatenated strings

So I'm learning Python. I am going through the lessons and ran into a problem where I had to condense a great many target.write() calls into a single write(), while having a "\n" between each user input variable (the object of write()).
I came up with:
nl = "\n"
lines = line1, nl, line2, nl, line3, nl
textdoc.writelines(lines)
If I try to do:
textdoc.write(lines)
I get an error. But if I type:
textdoc.write(line1 + "\n" + line2 + ....)
Then it works fine. Why am I unable to use a string for a newline in write() but I can use it in writelines()?
Python 2.7
writelines expects an iterable of strings
write expects a single string.
line1 + "\n" + line2 merges those strings together into a single string before passing it to write.
Note that if you have many lines, you may want to use "\n".join(list_of_lines).
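A quick way to see the difference is sketched below in Python 3, with io.StringIO standing in for the file. The question uses Python 2.7, where file.write() also rejects a tuple with a TypeError:

```python
import io

line1, line2, line3 = 'first', 'second', 'third'
nl = '\n'
lines = line1, nl, line2, nl, line3, nl  # a tuple of six strings

buf = io.StringIO()
buf.writelines(lines)  # accepts any iterable of strings

try:
    buf.write(lines)  # write() wants a single str, not a tuple
except TypeError as exc:
    print('write() failed:', exc)

buf.write(nl.join([line1, line2, line3]) + nl)  # one str: works
print(repr(buf.getvalue()))
```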
Why am I unable to use a string for a newline in write() but I can use it in writelines()?
The idea is the following: if you want to write a single string you can do this with write(). If you have a sequence of strings you can write them all using writelines().
write(arg) expects a string as argument and writes it to the file. If you provide a list of strings, it will raise an exception (by the way, show errors to us!).
writelines(arg) expects an iterable as argument (an iterable object can be a tuple, a list, a string, or an iterator in the most general sense). Each item contained in the iterator is expected to be a string. A tuple of strings is what you provided, so things worked.
The nature of the string(s) does not matter to either function: they simply write whatever you give them to the file. The interesting part is that writelines() does not add newline characters on its own, so the method name can be quite confusing. It actually behaves like an imaginary method called write_all_of_these_strings(sequence).
What follows is an idiomatic way in Python to write a list of strings to a file while keeping each string in its own line:
lines = ['line1', 'line2']
with open('filename.txt', 'w') as f:
    f.write('\n'.join(lines))
This takes care of closing the file for you. The construct '\n'.join(lines) concatenates (connects) the strings in the list lines and uses the character '\n' as glue. It is more efficient than using the + operator.
Starting from the same lines sequence, but using writelines(). Note that this version also ends the last line with a newline, which the join() version does not:
lines = ['line1', 'line2']
with open('filename.txt', 'w') as f:
    f.writelines("%s\n" % l for l in lines)
This makes use of a generator expression and dynamically creates newline-terminated strings. writelines() iterates over this sequence of strings and writes every item.
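The two snippets above differ in one detail. '\n'.join() puts newlines only *between* items, while the writelines() version ends every line, including the last one, with '\n'. Here is a sketch that checks this, using io.StringIO instead of a real file:

```python
import io

lines = ['line1', 'line2']

joined = io.StringIO()
joined.write('\n'.join(lines))

terminated = io.StringIO()
terminated.writelines('%s\n' % l for l in lines)

print(repr(joined.getvalue()))      # 'line1\nline2'
print(repr(terminated.getvalue()))  # 'line1\nline2\n'
```

To get the newline-terminated form with join, write '\n'.join(lines) + '\n'.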
Edit: Another point you should be aware of:
write() and readlines() existed before writelines() was introduced. writelines() was introduced later as a counterpart of readlines(), so that one could easily write the file content that was just read via readlines():
outfile.writelines(infile.readlines())
Really, this is the main reason why writelines has such a confusing name. Also, today, we do not really want to use this method anymore. readlines() reads the entire file to the memory of your machine before writelines() starts to write the data. First of all, this may waste time. Why not start writing parts of data while reading other parts? But, most importantly, this approach can be very memory consuming. In an extreme scenario, where the input file is larger than the memory of your machine, this approach won't even work. The solution to this problem is to use iterators only. A working example:
with open('inputfile') as infile:
    with open('outputfile', 'w') as outfile:  # 'w': the output must be opened for writing
        for line in infile:
            outfile.write(line)
This reads the input file line by line. As soon as a line is read, it is written to the output file. Schematically speaking, there is only ever a single line in memory (compared to the entire file content being in memory with the readlines/writelines approach).
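When the goal is only to copy data, the standard library's shutil.copyfileobj() applies the same streaming idea. It reads and writes fixed-size chunks rather than lines, so memory use stays bounded even if the file has very long lines. Note that this is a different technique from the line loop. The sketch below uses in-memory streams so it runs without real files; with files, you would pass the two objects opened in the with statements above:

```python
import io
import shutil

src = io.StringIO('alpha\nbeta\ngamma\n')
dst = io.StringIO()

# Copy src into dst chunk by chunk until src is exhausted.
shutil.copyfileobj(src, dst)

print(dst.getvalue() == 'alpha\nbeta\ngamma\n')  # True
```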
Actually, I think the problem is how you built your variable "lines". You defined lines as a tuple, but write() requires a string. All you have to change is your commas into pluses (+):
nl = "\n"
lines = line1 + nl + line2 + nl + line3 + nl
textdoc.write(lines)
should work.
If you just want to save and load a list, try pickle.
Saving:
import pickle

with open("yourFile", "wb") as file:
    pickle.dump(YourList, file)
And loading:
with open("yourFile", "rb") as file:
    YourList = pickle.load(file)
Exercise 16 from Zed Shaw's book? You can use escape characters as follows:
paragraph1 = "%s\n%s\n%s\n" % (line1, line2, line3)
target.write(paragraph1)
target.close()

getting rid of trailing output

How do I get rid of the extra character at the end of a line when I flush output?
Output:
{Fifth Level} Last Key Ran: 7 Output: -7 =
That '=' is what I want to get rid of.
code:
for number in str(fourth_level):
    x = int(number)
    x = x ^ (priv_key - pub_key)
    print "\r{Fifth Level} Last Key Ran:", str(number), "Output:", x,
    sys.stdout.flush()
    time.sleep(sleep_time)
    fifth_level.append(x)
Also, is there any way to have multiple lines of output updating at the same time, without moving down a line or changing the format? When I use flush, the second line of output disappears.
As a side note, check the ,x, part of the print statement. That 'x' is fishy.
For string manipulations, try writing everything into a temporary string first. You can then edit that string. This will give you more control over editing it.
Also, rstrip might do the trick if the characters being displayed are consistent.
Reference:
* http://docs.python.org/library/string.html
"string.rstrip(s[, chars]) Return a copy of the string with trailing characters removed."
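One possible cause, not confirmed in the question, is how '\r' behaves: it only moves the cursor back to the start of the line. If a new status line is shorter than the previous one, the tail of the old line (such as that '=') stays on screen. Padding every line to a fixed width overwrites any leftovers. Below is a Python 3 sketch; the question uses Python 2. The helper name status and the widths are arbitrary choices:

```python
import io
import sys


def status(text, width=60, out=None):
    """Redraw the current terminal line with `text`.

    '\r' returns the cursor to column 0; ljust() pads with spaces so that
    characters left over from a longer previous line are overwritten.
    """
    out = sys.stdout if out is None else out
    out.write('\r' + text.ljust(width))
    out.flush()


buf = io.StringIO()
status('Last Key Ran: 12 Output: -10', width=40, out=buf)
status('Last Key Ran: 7 Output: -7', width=40, out=buf)
print(repr(buf.getvalue()))
```

After the loop finishes, a single print() moves the cursor to a fresh line.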

Python help - Parsing Packet Logs

I'm writing a simple program that's going to parse a logfile of a packet dump from Wireshark into a more readable form. I'm doing this with Python.
Currently I'm stuck on this part:
for i in range(len(linelist)):
    if '### SERVER' in linelist[i]:
        #do server parsing stuff
        packet = linelist[i:find("\n\n", i, len(linelist))]
linelist is a list created using the readlines() method, so every line in the file is an element in the list. I'm iterating through it for all occurrences of "### SERVER", then grabbing all lines after it until the next empty line (which signifies the end of the packet). I must be doing something wrong, because not only is find() not working, but I have a feeling there's a better way to grab everything between ### SERVER and the next occurrence of a blank line.
Any ideas?
Looking at the file.readlines() doc:
file.readlines([sizehint])
Read until EOF using readline() and return a list containing the lines thus read. If the optional sizehint argument is present, instead of reading up to EOF, whole lines totalling approximately sizehint bytes (possibly after rounding up to an internal buffer size) are read. Objects implementing a file-like interface may choose to ignore sizehint if it cannot be implemented, or cannot be implemented efficiently.
and the file.readline() doc:
file.readline([size])
Read one entire line from the file. A trailing newline character is kept in the string (but may be absent when a file ends with an incomplete line). If the size argument is present and non-negative, it is a maximum byte count (including the trailing newline) and an incomplete line may be returned. An empty string is returned only when EOF is encountered immediately.
"A trailing newline character is kept in the string" means that each line in linelist contains at most one newline. That is why you cannot find a "\n\n" substring in any of the lines. Instead, look for a whole blank line (or an empty one at EOF):
if myline in ("\n", ""):
    handle_empty_line()
Note: I tried to explain find behavior, but a pythonic solution looks very different from your code snippet.
General idea is:
inpacket = False
packets = []
for line in open("logfile"):
    if inpacket:
        content += line
        if line in ("\n", ""):  # empty line
            inpacket = False
            packets.append(content)
    elif '### SERVER' in line:
        inpacket = True
        content = line
# put here packets.append on eof if needed
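Here is the same state machine wrapped in a function that accepts any iterable of lines, such as an open file or a list for testing. It also keeps a packet that is still open at the end of input, which the comment above leaves to the reader. The function name parse_packets is hypothetical:

```python
def parse_packets(lines):
    """Collect the text of every '### SERVER' packet in `lines`.

    A packet runs from the '### SERVER' line through the next empty
    line; a packet still open at the end of input is kept as well.
    """
    inpacket = False
    packets = []
    content = ''
    for line in lines:
        if inpacket:
            content += line
            if line in ('\n', ''):  # empty line closes the packet
                inpacket = False
                packets.append(content)
        elif '### SERVER' in line:
            inpacket = True
            content = line
    if inpacket:  # input ended before a closing empty line
        packets.append(content)
    return packets


log = ['noise\n', '### SERVER a\n', 'x\n', '\n', '### SERVER b\n', 'y\n']
print(parse_packets(log))
# ['### SERVER a\nx\n\n', '### SERVER b\ny\n']
```

With a real file: with open("logfile") as f: packets = parse_packets(f).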
This works well with an explicit iterator, also. That way, nested loops can update the iterator's state by consuming lines.
fileIter = iter(theFile)
for x in fileIter:
    if "### SERVER" in x:
        block = [x]
        for y in fileIter:
            if len(y.strip()) == 0:  # empty line
                break
            block.append(y)
        print(block)  # Or whatever
    # elif some other pattern:
This has the pleasant property of finding blocks that are at the tail end of the file, and don't have a blank line terminating them.
Also, this is quite easy to generalize: since there are no explicit state-change variables, you just go into another loop to soak up lines in other kinds of blocks.
The best way is to use generators.
Read the presentation "Generator Tricks for Systems Programmers".
It's the best material on parsing logs that I've seen ;)
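In the spirit of that presentation, here is a sketch of a generator that yields one packet at a time instead of building a list, so a large log can be processed lazily. The name server_packets is hypothetical. Like the explicit-iterator answer, it treats whitespace-only lines as blank:

```python
def server_packets(lines):
    """Yield each '### SERVER' block as a list of lines (newlines removed).

    A block ends at the next blank line or at the end of input.
    """
    block = None
    for line in lines:
        line = line.rstrip('\n')
        if block is not None:
            if line.strip() == '':
                yield block
                block = None
            else:
                block.append(line)
        elif '### SERVER' in line:
            block = [line]
    if block is not None:  # last block had no terminating blank line
        yield block


log = ['### SERVER 1\n', 'a\n', '\n', 'junk\n', '### SERVER 2\n', 'b\n']
for packet in server_packets(log):
    print(packet)
# ['### SERVER 1', 'a']
# ['### SERVER 2', 'b']
```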
