Understanding multiple Python 'with open' file functions

I'm having a difficult time understanding what the second 'with open' block does here.
So, in the first 'with open' part, we've essentially said out = open(save_as_file, 'wb+'), right? (Still new to using 'with open'.) We later write to it, and then 'with open' automatically closes the out file. That part I get: we're writing this response object from Requests as binary to a specified save_as_file location, reading until we hit the 81920th character, aka our buffer number.
What's going on in the second 'with open'? Breaking it down the same way as above, it's pretty much fp = open(save_as_file, 'r'), right? What does that make fp, which was already assigned the request response object earlier? We're just opening save_as_file to use it for reading, but not reading or extracting anything from it, so I don't see the reason for it. If someone could explain in plain English just what's taking place and the purpose of the second 'with open' part, that would be much appreciated.
(Don't worry about the load_from_file function at the end; that's just another function in the class.)
def load_from_url(self, url, save_as_file=None):
    fp = requests.get(url, stream=True,
                      headers={'Accept-Encoding': None}).raw
    if save_as_file is None:
        return self.load_from_file(fp)
    else:
        with open(save_as_file, 'wb+') as out:
            while True:
                buffer = fp.read(81920)
                if not buffer:
                    break
                out.write(buffer)
        with open(save_as_file) as fp:
            return self.load_from_file(fp)

I'm the original author of the code that you're referring to; I agree it's a bit unclear.
If we hit the particular code at the else statement, this means that we want to save the data that we originally get from calling the URL to a file. Here, fp is actually the response text from the URL call.
We'll hit that else statement if, when run from the command line, we pass in --cpi-file=foobar.txt and that file doesn't actually exist yet; it acts as a target file, as mentioned here. If you don't pass in --cpi-file=foobar.txt, then the program will not write to a file; it will just go straight to reading the response data (from fp) via load_from_file.
So then, if that file does not exist but we did pass it in the command line, we will grab data from the URL (fp), and write that data to the target file (save_as_file). It now exists for our reference (it will be on your file system), if we want to use it again in this script.
Then, we will open that exact file again and call load_from_file to actually read and parse the data that we originally got from the response (fp).
Now, if we run this script twice, both times with --cpi-file=foobar.txt, and foobar.txt doesn't exist yet: the first time the script runs, it will create the file and save the CPI data. The second time the script runs, it will avoid calling the CPI URL to re-download the data, and just go straight to parsing the CPI data from the file.
load_from_file is a bit of a misleading name; it should probably be load_from_stream, as it could be reading the response data from our API call or from a file.
Hopefully that makes sense. In the next release of newcoder.io, I'll be sure to clear this language & code up a bit.
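For what it's worth, a hedged sketch of that two-run caching flow as a caller might implement it (the names CPI_DATA_URL, parser, and get_cpi_data are my own illustration, not the actual newcoder.io code):

import os

CPI_DATA_URL = 'https://example.com/cpi-data.txt'  # hypothetical URL

def get_cpi_data(parser, save_as_file=None):
    # Second run: the cache file already exists, so skip the URL entirely
    # and parse the copy on disk.
    if save_as_file is not None and os.path.exists(save_as_file):
        with open(save_as_file) as fp:
            return parser.load_from_file(fp)
    # First run: download the data, writing it to save_as_file if given.
    return parser.load_from_url(CPI_DATA_URL, save_as_file=save_as_file)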

You are correct that the second with statement opens the file for reading.
What happens is this:
Load the response from the URL.
If save_as_file is None:
    Call load_from_file on the response and return the result.
Else:
    Store the contents of the response to save_as_file.
    Call load_from_file on the contents of the file and return the result.
So essentially, if save_as_file is set, it stores the response body in a file, processes it, and then returns the processed result. Otherwise it just processes the response body and returns the result.
The way it is implemented here is likely because load_from_file expects a file-like object and the easiest way the programmer saw of obtaining that was to read the file back.
It could be done by keeping the response body in memory and using Python 3's io module or Python 2's StringIO to provide a file-like object that uses the response body from memory, thereby avoiding the need to read the file again.
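For example, a minimal sketch of that in-memory approach (Python 3 assumed; this is my illustration, not the original code), replacing the else branch:

import io

body = fp.read()                      # keep the whole response body in memory
with open(save_as_file, 'wb+') as out:
    out.write(body)                   # still save the copy to disk
return self.load_from_file(io.BytesIO(body))  # no second open() needed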
fp is reassigned in the second with statement in the same way as any other variable would be if you assigned it another value.

I tried the code below to simulate your case:
fp = open("/Users/example1.py", 'wb+')
print "first fp", fp
with open("/Users/example2.py") as fp:
    print "second fp", fp
The output is:
first fp <open file '/Users/example1.py', mode 'wb+' at 0x10b200390>
second fp <open file '/Users/example2.py', mode 'r' at 0x10b200420>
So the second with statement simply rebinds the name fp to a new file object.
Your code seems to want to first read data from the URL and write it to save_as_file, then read the data back from save_as_file and do something with it in load_from_file, like validating the content.

Here is a piece of code that describes it:
with provides a block that "cleans up" when exited
It can handle exceptions that occur within the block
It can also execute code when entered
class MyClass(object):
    def __enter__(self):
        print("entering MyClass instance %s" % id(self))
        return self

    def __exit__(self, type, value, traceback):
        print("exiting instance %s" % id(self))
        print("error type {0}".format(type))
        print("error value {0}".format(value))
        print("error traceback {0}".format(traceback))
        print("exiting the myclass")

    def sayhi(self):
        print("sayhi instance %s" % id(self))

with MyClass() as cc:
    cc.sayhi()
print("after the block ends")

Related

Refresh variable when reading from a txt file

I have a file in my python folder called data.txt, and I have another file, read.py, that tries to read text from data.txt. But when I change something in data.txt, my read doesn't show anything new.
Something else I tried wasn't working, and I found something that did read, but when I changed the file to something actually meaningful it didn't print the new text.
Can someone explain why it doesn't refresh, or what I need to do to fix it?
with open("data.txt") as f:
    file_content = f.read().rstrip("\n")
    print(file_content)
First and foremost, strings are immutable in Python: once you call file.read(), the string it returned cannot change.
That being said, you must re-read the file at any point at which the file contents may have changed.
For example
read.py
def get_contents(filepath):
    with open(filepath) as f:
        return f.read().rstrip("\n")
main.py
from read import get_contents
import time

print(get_contents("data.txt"))
time.sleep(30)
# .. change file somehow
print(get_contents("data.txt"))
Now, you could set up an infinite loop that watches the file's last-modification timestamp from the OS and always have the latest changes, but that seems like a waste of resources unless you have a specific need for it (e.g. tailing a log file), and even then there are arguably better tools for that.
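Still, a rough sketch of that timestamp-watching loop, reusing get_contents from read.py above (this is my own illustration, not something the question strictly needs):

import os
import time

from read import get_contents

def watch(filepath, interval=1.0):
    last_mtime = None
    while True:
        mtime = os.path.getmtime(filepath)   # last-modification time from the OS
        if mtime != last_mtime:              # the file changed on disk
            last_mtime = mtime
            print(get_contents(filepath))
        time.sleep(interval)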
It was unclear from your question whether you do the read once or multiple times, so here are the steps to take:
Make sure you call the read function repeatedly, with a certain interval
Check that you actually save the file after modifying it
Make sure there are no file-usage conflicts
Here is a description of each step:
When you read a file the way you shared, it gets closed, meaning it is read only once. You need to read it multiple times if you want to see changes, so do it with some kind of interval, in another thread, or async, whatever suits your application best.
This step is obvious: remember to save the file (e.g. hit Ctrl+S) after modifying it.
It may happen that a single file is being accessed by multiple processes, for example your editor and the script; to prevent errors from that, try the following code:
def read_file(file_name: str):
    while True:
        try:
            with open(file_name) as f:
                return f.read().rstrip("\n")
        except IOError:
            pass
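And for step 1, a minimal way to call that function repeatedly at an interval (the five-second interval is an arbitrary choice of mine):

import time

while True:
    print(read_file("data.txt"))
    time.sleep(5)  # re-read every 5 seconds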

Confusion about with statement python

I have a question regarding the use of with statements in python, as given below:
with open(fname) as f:
    np.save(f, MyData)
If I'm not mistaken, this opens the file fname in a safe manner, such that if an exception occurs the file is closed properly. Then it writes MyData to the file. But what I would do is simply:
np.save(fname, MyData)
This would result in the same thing: MyData gets written to fname. I'm not sure I understand why the former is better. I don't see how the one-liner could keep the file "open" after the line has run, so I also don't see how it could create issues when my code crashes afterwards.
Maybe this is a stupid/basic question, but I always thought that cleaner code is nicer code, so not having the extra with block just seems better to me.
numpy.save() handles the opening and closing in its own code; however, if you supply a file descriptor, it leaves the file open, because it assumes you want to do something else with it, and closing it would break that functionality for you.
Try this:
f = open(<file>)
f.close()
f.read() # boom
See also the hasattr(file, "write") check in its code ("file" here being a descriptor or handle from a file, buffer, or other IO object): it checks whether the object has a write() method, and NumPy simply assumes that anything with one is a writable file-like object.
However, NumPy doesn't guard against misuse of its API: e.g. if you pass a custom object that is a buffer but doesn't include write(), it will be treated as a path and thus crash in the open() call.
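To see the difference in behavior, a small sketch (the filename is made up):

import numpy as np

data = np.arange(10)

# Passing a path: numpy.save() opens and closes the file itself.
np.save("mydata.npy", data)

# Passing an open file object: numpy.save() writes to it but leaves it
# open, so closing it is your responsibility (here the with block does it).
with open("mydata.npy", "wb") as f:
    np.save(f, data)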

How can I take the html file of a website

I am trying to take the HTML of my website and see if it is the same as what I have in an offline version.
I have been researching this, and all I can find is either parsing, or things that deal only with http://.
So far I have this:
import urllib

url = "https://www.mywebsite.com/"

onlinepage = urllib.urlopen(url)
print(onlinepage.read())

offlinepage = open("offline.txt", "w+")
print(offlinepage.read())

if onlinepage.read() == offlinepage.read():
    print("same")  # for debugging
else:
    print("different")
This always says that they are the same, even when I put in a different website entirely.
When you first print your online and offline pages with these lines:
print(onlinepage.read())
print(offlinepage.read())
...you have now consumed all of the text in each file object. Subsequent reads on either object will return an empty string. Two empty strings are equal, therefore your if condition will always evaluate to True.
If you were purely working with files, you could seek to the beginning of both files and read again. Since there is no seek method on the file object from urlopen, you'll need to either re-fetch the page with a new urlopen command or, better, save the original text in a variable and use that for your subsequent comparisons:
online = onlinepage.read()
print(online)
offline = offlinepage.read()
print(offline)
...
if online == offline:
...
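For plain files (urlopen responses don't support seeking), that seek-and-re-read alternative would look like this:

with open("offline.txt") as f:
    first = f.read()
    f.seek(0)           # rewind to the start of the file
    second = f.read()   # reads the same content again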
As others have noted, you can't read the request object twice (and can't read the file twice without seeking); once read, the data you got back is no longer available, so you need to store it.
But they missed another problem: You opened the file with mode w+. w+ allows both reading and writing, but, just like mode w, it truncates the file on open. So your local file is always empty when you read it, which means you're both corrupting the local file and never getting a match (unless the online file is empty too).
You need to use mode r+ or a+ to get a read/write handle that doesn't truncate the existing file (r+ requires that the file already exist, a+ does not, but puts the write position at end of file, and on some systems, all writes are put at the end of the file).
So fixing both bugs, you get:
import urllib

url = "https://www.mywebsite.com/"

# Using with statements properly for safe resource cleanup
with urllib.urlopen(url) as onlinepage:
    onlinedata = onlinepage.read()
    print(onlinedata)

with open("offline.txt", "r+") as offlinepage:  # DOES NOT TRUNCATE EXISTING FILE!
    offlinedata = offlinepage.read()
    print(offlinedata)

    if onlinedata == offlinedata:
        print("same")  # for debugging
    else:
        print("different")

    # I assume you want to rewrite the local page, or you wouldn't open with +,
    # so this is what you'd do to ensure you replace the existing data correctly
    offlinepage.seek(0)  # Ensure you're seeked to beginning of file for write
    offlinepage.write(onlinedata)
    offlinepage.truncate()  # If online data smaller, don't keep offline extra data
You use .read() twice on each file.
>>> f.read()
'This is the entire file.\n'
>>> f.read()
''
"If the end of the file has been reached, f.read() will return an empty string ("")." (7.2.1 Docs).
Therefore, when two results are compared, they are equal because each is an empty string.

Python and Flask - Trying to have a function return a file content

I am struggling to return file content back to the user. I have Flask code that receives a txt file from a user; then the Python function transform() is called to parse the infile. Both pieces of code are doing the job.
The issue happens when I try to send (return) the new file (outfile) back to the user; the Flask code for that is also working OK.
But I don't know how to have this Python transform() function "return" that file content; I have tested several options already.
More details follow:
def transform(filename):
    with open(os.path.join(app.config['UPLOAD_FOLDER'], filename), "r") as infile:
        with open(os.path.join(app.config['UPLOAD_FOLDER'], 'file_parsed_1st.txt'), "w") as file_parsed_1st:
            p = CiscoConfParse(infile)
            '''
            parsing the file uploaded by the user and
            generating the result in a new file (file_parsed_1st.txt);
            that is working OK
            '''
    with open(os.path.join(app.config['UPLOAD_FOLDER'], 'file_parsed_1st.txt'), "r") as file_parsed_2nd:
        with open(os.path.join(app.config['UPLOAD_FOLDER'], 'file_parsed_2nd.txt'), "w") as outfile:
            '''
            file_parsed_1st.txt is a temp file; this creates a new file (file_parsed_2nd.txt).
            That part is also working OK: the new file (file_parsed_2nd.txt)
            has the results I want after all the parsing.
            Now I want this new file (file_parsed_2nd.txt) to "return" to the user
            '''
    # Editing -
    # Here is where I was having a hard time, and it is now working OK
    # using the following line:
    return send_file(os.path.join(app.config['UPLOAD_FOLDER'], 'file_parsed_2nd.txt'))
You do need to use the flask.send_file() callable to produce a proper response, but you need to pass in a filename or a file object that isn't already closed or about to be closed. So passing in the full path will do:
return send_file(os.path.join(app.config['UPLOAD_FOLDER'], 'file_parsed_2nd.txt'))
When you pass in a file object you cannot use the with statement, as it'll close the file object the moment you return from your view; it'll only be actually read when the response object is processed as a WSGI response, outside of your view function.
You may want to pass in an attachment_filename parameter if you want to suggest a filename for the browser to save the file as; it also helps determine the mimetype. You may also want to specify the mimetype explicitly, using the mimetype parameter.
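For example (a sketch against the Flask API of that era, where the parameter was still named attachment_filename; newer Flask renamed it download_name, and the suggested filename here is made up):

return send_file(
    os.path.join(app.config['UPLOAD_FOLDER'], 'file_parsed_2nd.txt'),
    mimetype='text/plain',
    as_attachment=True,
    attachment_filename='parsed_result.txt',  # hypothetical name for the browser
)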
You could also use the flask.send_from_directory() function; it does the same but takes a filename and a directory:
return send_from_directory(app.config['UPLOAD_FOLDER'], 'file_parsed_2nd.txt')
The same caveat about the mimetype applies; for .txt the default mimetype would be text/plain. The function essentially joins the directory and filename (with flask.safe_join(), which applies additional safety checks to prevent breaking out of the directory using .. constructs) and passes the result on to flask.send_file().

Is there a special trick to downloading a zip file and writing it to disk with Python?

I am FTPing a zip file from a remote FTP site using Python's ftplib. I then attempt to write it to disk. The file write works; however, most attempts to open the zip using WinZip or WinRAR fail, with both apps claiming the file is corrupted. Oddly, when right-clicking and attempting to extract the file using WinRAR, the file will extract.
So, to be clear: the file write works, but the file will not open inside the popular zip apps, although it will decompress using those same apps. Note that the Python zipfile module never fails to extract the zips.
Here is the code I'm using to get the zip file from the FTP site (please ignore the bad tabbing; that's not the issue).
filedata = None

def appender(chunk):
    global filedata
    filedata += chunk

def getfile(filename):
    try:
        ftp = None
        try:
            ftp = FTP(address)
            ftp.login('user', 'password')
        except Exception, e:
            print e
        command = 'RETR ' + filename
        idx = filename.rfind('/')
        path = filename[0:idx]
        ftp.cwd(path)
        fileonly = filename[idx+1:len(filename)]
        ftp.retrbinary('RETR ' + filename, appender)
        global filedata
        data = filedata
        ftp.close()
        filedata = ''
        return data
    except Exception, e:
        print e

data = getfile('/archives/myfile.zip')
file = open(pathtoNTFileShare, 'wb')
file.write(data)
file.close()
Pass file.write directly to the retrbinary function instead of passing appender. This will work, and it will also not use much RAM when you are downloading a big file.
If you'd like the data stored in a variable, though, you can also have a variable named:
blocks = []
Then pass blocks.append to retrbinary instead of appender.
Your current appender function is wrong: += will not work correctly with binary data, because it will try to do a string append and stop at the first NUL it sees.
As mentioned by @Lee B, you can also use urllib2 or curl, but your current code is almost correct if you make the small modifications mentioned above.
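As a concrete sketch of the first suggestion (host, credentials, and paths are placeholders):

from ftplib import FTP

ftp = FTP('ftp.example.com')           # placeholder host
ftp.login('user', 'password')
with open('myfile.zip', 'wb') as outfile:
    # Each chunk goes straight to disk; nothing accumulates in a global
    # string, so the binary data arrives intact and RAM use stays low.
    ftp.retrbinary('RETR /archives/myfile.zip', outfile.write)
ftp.quit()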
I've never used that library, but urllib2 works fine and is more straightforward; curl is even better.
Looking at your code, I can see a couple of things wrong. Your exception handling only prints the exception and then continues; for fatal errors, like not getting an FTP connection, it should print the message and then exit. Also, your filedata starts off as None, and your appender uses += to add to it, so you're trying to append a string to None, which gives a TypeError when I try it here. I'm surprised it's working at all; I would have guessed that the appender would throw an exception, and so the FTP copy would abort.
While re-reading, I just noticed another answer about the use of += on binary data. That could well be it; Python tries to be smart sometimes, and could be "helping" when you join strings with whitespace or NULs in them, or something like that. Your best bet there is to have the file open (let's call it outfile) and have your appender just call outfile.write(chunk).
