write() versus writelines() and concatenated strings - python

So I'm learning Python. I am going through the lessons and ran into a problem where I had to condense a great many target.write() calls into a single write(), while having a "\n" between each user input variable (the object of write()).
I came up with:
nl = "\n"
lines = line1, nl, line2, nl, line3, nl
textdoc.writelines(lines)
If I try to do:
textdoc.write(lines)
I get an error. But if I type:
textdoc.write(line1 + "\n" + line2 + ....)
Then it works fine. Why am I unable to use a string for a newline in write() but I can use it in writelines()?
Python 2.7

writelines expects an iterable of strings
write expects a single string.
line1 + "\n" + line2 merges those strings together into a single string before passing it to write.
Note that if you have many lines, you may want to use "\n".join(list_of_lines).

Why am I unable to use a string for a newline in write() but I can use it in writelines()?
The idea is the following: if you want to write a single string you can do this with write(). If you have a sequence of strings you can write them all using writelines().
write(arg) expects a string as argument and writes it to the file. If you provide a list of strings, it will raise an exception (by the way, show errors to us!).
writelines(arg) expects an iterable as argument (an iterable object can be a tuple, a list, a string, or an iterator in the most general sense). Each item contained in the iterable is expected to be a string. A tuple of strings is what you provided, so things worked.
The nature of the string(s) does not matter to either function, i.e. they just write to the file whatever you provide. The interesting part is that writelines() does not add newline characters on its own, so the method name can be quite confusing. It actually behaves like an imaginary method called write_all_of_these_strings(sequence).
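The difference described above can be checked with a small experiment. This is only a sketch, written for Python 3 (the question uses 2.7, where the error message differs); the sample strings and the temporary file are made up for the demonstration.

```python
import io
import os
import tempfile

line1, line2, line3 = "first", "second", "third"   # hypothetical user input
lines = (line1, "\n", line2, "\n", line3, "\n")    # a tuple, as in the question

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # throwaway file

write_error = None
with open(path, "w") as textdoc:
    try:
        textdoc.write(lines)      # write() accepts a single string only
    except TypeError as exc:
        write_error = exc         # kept so it can be inspected afterwards
    textdoc.writelines(lines)     # writes every item as-is

with open(path) as textdoc:
    content = textdoc.read()

# writelines() adds no newlines of its own: the items are simply concatenated.
buf = io.StringIO()
buf.writelines(["a", "b"])
joined = buf.getvalue()
```

After running this, content holds the three lines separated by the "\n" items that were in the tuple, while joined shows that two items without newlines end up glued together.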
What follows is an idiomatic way in Python to write a list of strings to a file while keeping each string in its own line:
lines = ['line1', 'line2']
with open('filename.txt', 'w') as f:
    f.write('\n'.join(lines))
This takes care of closing the file for you. The construct '\n'.join(lines) concatenates (connects) the strings in the list lines and uses the character '\n' as glue. It is more efficient than using the + operator.
Starting from the same lines sequence, ending up with the same output, but using writelines():
lines = ['line1', 'line2']
with open('filename.txt', 'w') as f:
    f.writelines("%s\n" % l for l in lines)
This makes use of a generator expression and dynamically creates newline-terminated strings. writelines() iterates over this sequence of strings and writes every item.
Edit: Another point you should be aware of:
write() and readlines() existed before writelines() was introduced. writelines() was introduced later as a counterpart of readlines(), so that one could easily write the file content that was just read via readlines():
outfile.writelines(infile.readlines())
Really, this is the main reason why writelines has such a confusing name. Also, today, we do not really want to use this method anymore. readlines() reads the entire file to the memory of your machine before writelines() starts to write the data. First of all, this may waste time. Why not start writing parts of data while reading other parts? But, most importantly, this approach can be very memory consuming. In an extreme scenario, where the input file is larger than the memory of your machine, this approach won't even work. The solution to this problem is to use iterators only. A working example:
with open('inputfile') as infile:
    with open('outputfile', 'w') as outfile:  # must be opened for writing
        for line in infile:
            outfile.write(line)
This reads the input file line by line. As soon as one line is read, it is written to the output file. Schematically speaking, there is only ever a single line in memory (compared to the entire file content being in memory with the readlines/writelines approach).

Actually, I think the problem is that your variable "lines" is bad. You defined lines as a tuple, but write() requires a string. All you have to do is change your commas into pluses (+).
nl = "\n"
lines = line1+nl+line2+nl+line3+nl
textdoc.write(lines)
should work.

If you just want to save and load a list, try pickle (you need import pickle at the top of your file).
Pickle saving:
with open("yourFile", "wb") as file:
    pickle.dump(YourList, file)
and loading:
with open("yourFile", "rb") as file:
    YourList = pickle.load(file)

Exercise 16 from Zed Shaw's book? You can use escape characters as follows:
paragraph1 = "%s \n %s \n %s \n" % (line1, line2, line3)
target.write(paragraph1)
target.close()

Related

Replace words in list that later will be used in variable

I have a file which currently stores a string eeb39d3e-dd4f-11e8-acf7-a6389e8e7978
which I am trying to pass into as a variable to my subprocess command.
My current code looks like this
with open(logfilnavn, 'r') as t:
    test = t.readlines()
    print(test)
But this prints ['eeb39d3e-dd4f-11e8-acf7-a6389e8e7978\n'] and I don't want the part with ['\n'] to be passed into my command, so I'm trying to remove them by using replace.
with open(logfilnavn, 'r') as t:
    test = t.readlines()
    removestrings = test.replace('[', '').replace('[', '').replace('\\', '').replace("'", '').replace('n', '')
    print(removestrings)
I get an exception saying this:
'list' object has no attribute 'replace'
so how can I replace these with nothing and store them as a string for my subprocess command?
readlines() returns a list. Try print(test[0].strip())
You can read the whole file and split lines using str.splitlines:
test = t.read().splitlines()
Your test variable is a list, because readlines() returns a list of all lines read.
Since you said the file only contains this one line, you probably want to work on only the first line that you read. The replace chain is not needed: the brackets and quotes are only how the list is printed, so stripping the trailing newline is enough:
removestrings = test[0].rstrip('\n')
Where you went wrong...
file.readlines() in Python returns a list of the lines in the file (a list is what other languages call an array). Here you are treating the list as a string. You must first pick out the string inside it, then apply the string-only method to that.
In this case, however, the replace chain would still not do what you want, because you are trying to change the way the Python interpreter displays the list so that people can read it.
Further information...
The brackets and quotes are not part of the string in code; they are only the printed representation of the list. The example below works for any number of lines, but it only prints the first element, so you will need to change that if you want the others.
This may be useful...
You could perhaps make the variables globally available, so that other parts of the program can read them before they go out of scope. Scope is the region of the program in which the interpreter (what runs the program) considers a variable visible; once a variable is out of scope it can be removed from memory. Scope also means that larger programs only need to worry about the locality of variables: i is used in a lot of loops, for example, and without scope every variable in the whole project would need a different name. An easy way to picture this is to think of scopes as the branches of a tree: the tips of different branches do not touch, and neither do their variables.
solution?
e.g:
with open(logfilnavn, 'r') as file:
    lines = file.readlines()  # creates a list
# a list comprehension that goes through each item and takes off the last character: \n - the newline character
# (this assumes every line, including the last one, ends with a newline)
# this will work with any number of lines
strippedLines = [line[:-1] for line in lines]
# or
strippedLines = [line.replace('\n', '') for line in lines]
# you can now print the string stored within the list
print(strippedLines[0])  # this prints the first element in the list
I hope this helped!
You get the error because readlines() returns a list object. Since you mentioned in the comment that there is just one line in the file, it's better to use readline() instead:
line = ""  # so you can use it as a variable outside the `with` block
with open(logfilnavn, 'r') as t:
    line = t.readline()
print(line)
# output,
eeb39d3e-dd4f-11e8-acf7-a6389e8e7978
readlines will return a list of lines, and you can't use replace with a list.
If you really want to use readlines, you should know that it doesn't remove the newline character from the end, you'll have to do it yourself.
lines = [line.rstrip('\n') for line in t.readlines()]
But still, after removing the newline character yourself from the end of each line, you'll have a list of lines. And from the question, it looks like, you only have one line, you can just access first line lines[0].
Or you can just leave out readlines, and just use read, it'll read all of the contents from the file. And then just do rstrip.
contents = t.read().rstrip('\n')
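To compare the suggestions from these answers side by side, here is a small sketch in Python 3. The temporary file is a stand-in for the file behind logfilnavn, and the echo command at the end is purely hypothetical since the question does not show the real subprocess call.

```python
import os
import tempfile

# Stand-in for the log file from the question, holding one line of text.
path = os.path.join(tempfile.mkdtemp(), "id.txt")
with open(path, "w") as t:
    t.write("eeb39d3e-dd4f-11e8-acf7-a6389e8e7978\n")

with open(path) as t:
    as_list = t.readlines()                 # list, newline kept
with open(path) as t:
    first_line = t.readline().rstrip("\n")  # single string, newline removed
with open(path) as t:
    stripped = t.read().strip()             # whole file as one string
with open(path) as t:
    split = t.read().splitlines()           # list, newlines removed

# A clean string can then be used as one element of an argument list,
# e.g. for subprocess.run(command); "echo" is only a placeholder here.
command = ["echo", first_line]
```

Only as_list still carries the "\n"; the other three variants give a value that can be handed to a command unchanged.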

Write list variable to file

I have a .txt file of words I want to 'clean' of swear words, so I have written a program which checks each position, one-by-one, of the word list, and if the word appears anywhere within the list of censorable words, it removes it with var.remove(arg). It's worked fine, and the list is clean, but it can't be written to any file.
wordlist is the clean list.
newlist = open("lists.txt", "w")
newlist.write(wordlist)
newlist.close()
This returns this error:
newlist.write(wordlist)
TypeError: expected a string or other character buffer object
I'm guessing this is because I'm trying to write to a file with a variable or a list, but there really is no alternate; there are 3526 items in the list.
Any ideas why it can't write to a file with a list variable?
Note: lists.txt does not exist, it is created by the write mode.
write writes a string. You can not write a list variable, because, even though to humans it is clear that it should be written with spaces or semicolons between words, computers do not have the free hand for such assumptions, and should be supplied with the exact data (byte wise) that you want to write.
So you need to convert this list to string - explicitly - and then write it into the file. For that goal,
newlist.write('\n'.join(wordlist))
would suffice (and provide a file where every line contains a single word).
For certain tasks, converting the list with str(wordlist) (which will return something like ['hi', 'there']) and writing it would work (and allow retrieving via eval methods), but this would be very expensive use of space considering long lists (adds about 4 bytes per word) and would probably take more time.
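As a rough illustration of the two conversions mentioned above, the sketch below turns a tiny list into a string both ways and reads it back. It assumes Python 3 and uses ast.literal_eval in place of eval, which is a swapped-in choice on my part: it only accepts literals, so it is the safer way to parse the str() form.

```python
import ast

wordlist = ["hi", "there"]          # tiny stand-in for the 3526-item list

as_repr = str(wordlist)             # "['hi', 'there']"
as_lines = "\n".join(wordlist)      # "hi\nthere"

# Reading each form back into a list:
restored_from_repr = ast.literal_eval(as_repr)
restored_from_lines = as_lines.splitlines()

# The repr form spends quotes, a comma and a space on every word,
# while the joined form spends a single newline.
size_repr, size_lines = len(as_repr), len(as_lines)
```

Both forms round-trip to the same list; the joined form is simply smaller and leaves one word per line in the file.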
If you want better formatting for structured data, you can use the built-in json module.
text_file.write(json.dumps(list_data, separators=(',\n', ':')))
The list will work as a python variable too. So you can even import this later.
So this could look something like this:
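import json
var_name = 'newlist'
# "w" creates the file if it does not exist yet
with open(path, "w", encoding='utf-8') as text_file:
    # json.dumps already emits the surrounding [ ] for a list
    text_file.write(f"{var_name} = ")
    text_file.write(json.dumps(list_data, separators=(',\n', ':')))
    text_file.write("\n")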
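If the file only needs to be read back by a program rather than imported, a plain JSON round trip may be simpler. The following is a minimal sketch in Python 3; list_data and the file location are made-up sample values.

```python
import json
import os
import tempfile

list_data = ["alpha", "beta", "gamma"]                  # hypothetical sample data
path = os.path.join(tempfile.mkdtemp(), "words.json")   # throwaway location

# Write the list as JSON text.
with open(path, "w", encoding="utf-8") as text_file:
    json.dump(list_data, text_file)

# Read it back into a Python list.
with open(path, encoding="utf-8") as text_file:
    restored = json.load(text_file)
```

json.dump and json.load take the file object directly, so no manual string handling is needed.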

How does python read lines from file

Consider the following simple python code:
f=open('raw1', 'r')
i=1
for line in f:
    line1=line.split()
    for word in line1:
        print word,
    print '\n'
In the first for loop i.e "for line in f:", how does python know that I want to read a line and not a word or a character?
The second loop is clearer as line1 is a list. So the second loop will iterate over the list elements.
Python has a notion of what are called "iterables". They're things that know how to let you traverse some data they hold. Some common iterables are lists, sets, dicts, and pretty much every other data structure. Files are no exception to this.
The way things become iterable is by defining a method (__iter__) that returns an object with a next method. This next method is meant to be called repeatedly and return the next piece of data each time. The for foo in bar loops are actually just calling the next method repeatedly behind the scenes.
For files, the next method returns lines, that's it. It doesn't "know" that you want lines, it's just always going to return lines. The reason for this is that roughly half of the cases involving file traversal go line by line, and if you want words,
for word in (word for line in f for word in line.split(' ')):
    ...
works just fine.
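What the for loop does behind the scenes can be spelled out by hand. This is a sketch in Python 3, where the method is called __next__ and is reached through the built-in next() (in Python 2 it is the next() method described above); io.StringIO stands in for an open text file so the example needs no file on disk.

```python
import io

f = io.StringIO("first line\nsecond line\n")  # behaves like an open text file

it = iter(f)          # a file-like object is its own iterator
collected = []
while True:
    try:
        line = next(it)       # asks for the next piece of data: one line
    except StopIteration:     # raised when the end of the file is reached
        break
    collected.append(line)

# The same thing at word level, using a generator expression.
words = [word
         for line in io.StringIO("a b\nc d\n")
         for word in line.split()]
```

The while loop is equivalent to writing for line in f: collected.append(line); the file never decides between lines, words or characters, its iterator just always hands back lines.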
In Python the for..in syntax is used over iterables (elements that can be iterated upon). For a file object, the iterator is the file itself.
Please refer here to the documentation of next() method - excerpt pasted below:
A file object is its own iterator, for example iter(f) returns f
(unless f is closed). When a file is used as an iterator, typically in
a for loop (for example, for line in f: print line), the next() method
is called repeatedly. This method returns the next input line, or
raises StopIteration when EOF is hit when the file is open for reading
(behavior is undefined when the file is open for writing). In order to
make a for loop the most efficient way of looping over the lines of a
file (a very common operation), the next() method uses a hidden
read-ahead buffer. As a consequence of using a read-ahead buffer,
combining next() with other file methods (like readline()) does not
work right. However, using seek() to reposition the file to an
absolute position will flush the read-ahead buffer. New in version
2.3.

Using text in one file to search for match in second file

I'm using python 2.6 on linux.
I have two text files
first.txt has a single string of text on each line. So it looks like
lorem
ipus
asfd
The second file doesn't quite have the same format.
it would look more like this
1231 lorem
1311 assss 31 1
etc
I want to take each line of text from first.txt and determine if there's a match in the second text. If there isn't a match then I would like to save the missing text to a third file. I would like to ignore case but not completely necessary. This is why I was looking at regex but didn't have much luck.
So I'm opening the files, using readlines() to create a list.
Iterating through the lists and printing out the matches.
Here's my code
first_file=open('first.txt', "r")
first=first_file.readlines()
first_file.close()
second_file=open('second.txt',"r")
second=second_file.readlines()
second_file.close()
while i < len(first):
j=search[i]
while k < len(second):
m=compare[k]
if not j.find(m):
print m
i=i+1
k=k+1
exit()
It's definitely not elegant. Anyone have suggestions how to fix this or a better solution?
My approach is this: Read the second file, convert it into lowercase and then create a list of the words it contains. Then convert this list into a set, for better performance with large files.
Then go through each line in the first file, and if it (also converted to lowercase, and with extra whitespace removed) is not in the set we created, write it to the third file.
with open("second.txt") as second_file:
    second_values = set(second_file.read().lower().split())
with open("first.txt") as first_file:
    with open("third.txt", "wt") as third_file:
        for line in first_file:
            if line.lower().strip() not in second_values:
                # line usually ends with a newline already, so normalise to exactly one
                third_file.write(line.rstrip("\n") + "\n")
set objects are a simple container type that is unordered and cannot contain duplicate values. A set is designed to let you quickly add or remove items, or tell if an item is already in the set.
with statements are a convenient way to ensure that a file is closed, even if an exception occurs. They are enabled by default from Python 2.6 onwards; in Python 2.5 they require that you put the line from __future__ import with_statement at the top of your file.
The in operator does what it sounds like: tell you if a value can be found in a collection. When used with a list it just iterates through, like your code does, but when used with a set object it uses hashes to perform much faster. not in does the opposite. (Possible point of confusion: in is also used when defining a for loop (for x in [1, 2, 3]), but this is unrelated.)
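To see the set lookup at work without touching the file system, the sample lines from the question can be used directly. This is only a sketch in Python 3; the two variables below stand in for the contents of first.txt and second.txt.

```python
# Stand-ins for the contents of first.txt and second.txt from the question.
first_lines = ["lorem\n", "ipus\n", "asfd\n"]
second_text = "1231 lorem\n1311 assss 31 1\n"

# Every whitespace-separated token of the second file, lower-cased.
second_values = set(second_text.lower().split())

# Lines of the first file with no case-insensitive match in the second.
missing = [line.strip()
           for line in first_lines
           if line.strip().lower() not in second_values]
```

With this data, missing ends up holding ipus and asfd, which are the entries that would be written to the third file.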
Assuming that you're looking for the entire line in the second file:
second_file=open('second.txt',"r")
second=second_file.readlines()
second_file.close()
first_file=open('first.txt', "r")
for line in first_file:
    if line not in second:
        print line
first_file.close()

Python help - Parsing Packet Logs

I'm writing a simple program that's going to parse a logfile of a packet dump from wireshark into a more readable form. I'm doing this with python.
Currently I'm stuck on this part:
for i in range(len(linelist)):
    if '### SERVER' in linelist[i]:
        #do server parsing stuff
        packet = linelist[i:find("\n\n", i, len(linelist))]
linelist is a list created using the readlines() method, so every line in the file is an element in the list. I'm iterating through it for all occurrences of "### SERVER", then grabbing all lines after it until the next empty line (which signifies the end of the packet). I must be doing something wrong, because not only is find() not working, but I have a feeling there's a better way to grab everything between ### SERVER and the next occurrence of a blank line.
Any ideas?
Looking at the file.readlines() doc:
file.readlines([sizehint])
Read until EOF using readline() and return a list containing the lines thus read. If the optional sizehint argument is present, instead of reading up to EOF, whole lines totalling approximately sizehint bytes (possibly after rounding up to an internal buffer size) are read. Objects implementing a file-like interface may choose to ignore sizehint if it cannot be implemented, or cannot be implemented efficiently.
and the file.readline() doc:
file.readline([size])
Read one entire line from the file. A trailing newline character is kept in the string (but may be absent when a file ends with an incomplete line). If the size argument is present and non-negative, it is a maximum byte count (including the trailing newline) and an incomplete line may be returned. An empty string is returned only when EOF is encountered immediately.
A trailing newline character is kept in the string - means that each line in linelist will contain at most one newline. That is why you cannot find a "\n\n" substring in any of the lines - look for a whole blank line (or an empty one at EOF):
if myline in ("\n", ""):
    handle_empty_line()
Note: I tried to explain find behavior, but a pythonic solution looks very different from your code snippet.
General idea is:
inpacket = False
packets = []
for line in open("logfile"):
    if inpacket:
        content += line
        if line in ("\n", ""): # empty line
            inpacket = False
            packets.append(content)
    elif '### SERVER' in line:
        inpacket = True
        content = line
# put here packets.append on eof if needed
This works well with an explicit iterator, also. That way, nested loops can update the iterator's state by consuming lines.
fileIter= iter(theFile)
for x in fileIter:
    if "### SERVER" in x:
        block = [x]
        for y in fileIter:
            if len(y.strip()) == 0: # empty line
                break
            block.append(y)
        print block # Or whatever
    # elif some other pattern:
This has the pleasant property of finding blocks that are at the tail end of the file, and don't have a blank line terminating them.
Also, this is quite easy to generalize, since there's no explicit state-change variables, you just go into another loop to soak up lines in other kinds of blocks.
The best way is to use generators.
Read the presentation Generator Tricks for Systems Programmers.
It is the best I have seen on parsing logs ;)
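One way the generator idea could look for this log format is sketched below in Python 3. The function name iter_packets and the sample lines are made up for illustration; the logic mirrors the answers above (start at a line containing ### SERVER, stop at the next blank line) and is not taken from the presentation.

```python
def iter_packets(lines, marker="### SERVER"):
    """Yield each packet as a list of lines, from a marker line up to a blank line."""
    block = None
    for line in lines:
        if block is None:
            if marker in line:
                block = [line]        # a new packet starts here
        elif line.strip() == "":
            yield block               # blank line closes the packet
            block = None
        else:
            block.append(line)
    if block is not None:
        yield block                   # packet at end of file without a blank line


sample = [
    "noise\n",
    "### SERVER\n",
    "aa bb\n",
    "\n",
    "### SERVER\n",
    "cc dd\n",
]
packets = list(iter_packets(sample))
```

Because the function accepts any iterable of lines, an open file can be passed in directly, for example for packet in iter_packets(open("logfile")), and only one packet is held in memory at a time.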
