I'm unable to modify the content of a NamedTemporaryFile after having created it initially.
As per my example below, I create a NamedTemporaryFile from the content of a URL (JSON data).
Then, what I aim to do is re-access that file, modify some of the content of the JSON in the file, and save it. The code below is my attempt to do so.
import json
import requests
from tempfile import NamedTemporaryFile

def create_temp_file_from_url(url):
    response = requests.get(url)
    temp_file = NamedTemporaryFile(mode='w+t', delete=False)
    temp_file.write(response.text)
    temp_file.close()
    return temp_file.name

def add_content_to_json_file(json_filepath):
    file = open(json_filepath)
    content = json.loads(file.read())
    # Add a custom_key : custom_value pair in each dict item
    for repo in content:
        if isinstance(repo, dict):
            repo['custom_key'] = 'custom_value'
    # Close file back ... if needed?
    file.close()
    # Write my changes to content back into the file
    f = open(json_filepath, 'w') # Contents of the file disappears...?
    json.dumps(content, f, indent=4) # Issue: Nothing is written to f
    f.close()

if __name__ == '__main__':
    sample_url = 'https://api.github.com/users/mralexgray/repos'
    tempf = create_temp_file_from_url(sample_url)
    # Add extra content to Temporary file
    add_content_to_json_file(tempf)
    try:
        updated_file = json.loads(tempf)
    except Exception as e:
        raise e
Thanks for the help!
1: This line:
json.dumps(content, f, indent=4) # Issue: Nothing is written to f
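doesn't dump content to f. json.dumps builds a string and returns it; its second positional argument is not a file. On Python 2, f is taken as the skipkeys value and the resulting string is discarded. On Python 3.6 and later the optional parameters are keyword-only, so the call raises a TypeError instead.
You probably wanted json.dump, with no s.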
2: This line
updated_file = json.loads(tempf)
tries to load a JSON object from the temp filename, which isn't going to work. You'll have to either read the file in as a string and then use loads, or re-open the file and use json.load.
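Putting both fixes together, a minimal sketch might look like the following. The function names are hypothetical, and the download step is left out so the example runs without network access; in the original code the text would come from response.text.

```python
import json
import os
from tempfile import NamedTemporaryFile


def write_temp_json(text):
    """Write text to a named temporary file and return its path."""
    with NamedTemporaryFile(mode="w+t", suffix=".json", delete=False) as tmp:
        tmp.write(text)
    return tmp.name


def add_key_to_json_file(path, key, value):
    """Load a JSON list from path, add key/value to each dict item, save it back."""
    with open(path) as fh:
        content = json.load(fh)  # load (no s) reads from a file object
    for item in content:
        if isinstance(item, dict):
            item[key] = value
    with open(path, "w") as fh:
        json.dump(content, fh, indent=4)  # dump (no s) writes to the file object


def read_json_file(path):
    """Re-open the file and parse it; json.loads(path) would parse the path string."""
    with open(path) as fh:
        return json.load(fh)
```

Because delete=False is used, the caller is responsible for removing the file (for example with os.remove) once it is no longer needed.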
Related
The goal is to download GTFS data through python web scraping, starting with https://transitfeeds.com/p/agence-metropolitaine-de-transport/129/latest/download
Currently, I'm using requests like so:
def download(url):
    fpath = "prov/city/GTFS"
    r = requests.get(url)
    if r.ok:
        print("Saving file.")
        open(fpath, "wb").write(r.content)
    else:
        print("Download failed.")
The results of requests.content of the above url unfortunately renders the following:
You can see the files of interest within the output (e.g. stops.txt) but how might I access them to read/write?
I fear you're trying to read a zip file with a text editor. Perhaps you should try using the "zipfile" module.
The following worked:
def download(url):
    fpath = "path/to/output/"
    # headers is assumed to be a dict of HTTP headers defined elsewhere
    f = requests.get(url, stream=True, headers=headers)
    if f.ok:
        print("Saving to {}".format(fpath))
        # the with block closes the file even if the write fails
        with open(fpath + 'output.zip', 'wb') as g:
            g.write(f.content)
    else:
        print("Download failed with error code: ", f.status_code)
You need to write this file into a zip.
import requests
url = "https://transitfeeds.com/p/agence-metropolitaine-de-transport/129/latest/download"
fname = "gtfs.zip"
r = requests.get(url)
open(fname, "wb").write(r.content)
Now fname exists and has several text files inside. If you want to programmatically extract this zip and then read the content of a file, for example stops.txt, then you need first to extract a single file, or simply extractall.
import zipfile
# this will extract only a single file, and
# raise a KeyError if the file is missing from the archive
zipfile.ZipFile(fname).extract("stops.txt")
# this will extract all the files found from the archive,
# overwriting files in the process
zipfile.ZipFile(fname).extractall()
Now you just need to work with your file(s).
thefile = "stops.txt"
# just plain text
text = open(thefile).read()
# csv file
import csv
reader = csv.reader(open(thefile))
for row in reader:
    ...
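If extracting to disk is not needed, the archive can also be read straight from the downloaded bytes. The sketch below is one possible way to do that; the function name is hypothetical, and the utf-8-sig encoding is an assumption about the feed (it tolerates a leading byte order mark) rather than something the source states.

```python
import csv
import io
import zipfile


def read_csv_from_zip_bytes(zip_bytes, member):
    """Return the rows of a CSV member of a zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as raw:
            # ZipFile.open yields bytes, so wrap it to decode text for csv
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.reader(text))
```

With requests, this would be called as read_csv_from_zip_bytes(r.content, "stops.txt"). A missing member raises KeyError, as with extract.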
I have a zip file that I receive when the user uploads a file. The zip essentially contains a json file which I want to read and process without having to create the zip file first, then unzipping it and then reading the content of the inner file.
Currently I only have the longer process, which is something like the one below:
import json
import zipfile
from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt

@csrf_exempt
def get_zip(request):
    if request.method == "POST":
        try:
            client_file = request.FILES['file']
            file_path = "/some/path/"
            # first dump the zip file to a directory
            with open(file_path + '%s' % client_file.name, 'wb+') as dest:
                for chunk in client_file.chunks():
                    dest.write(chunk)
            # unzip the zip file to the same directory
            with zipfile.ZipFile(file_path + client_file.name, 'r') as zip_ref:
                zip_ref.extractall(file_path)
            # at this point we get a json file from the zip say `test.json`
            # read the json file content
            with open(file_path + "test.json", "r") as fo:
                json_content = json.load(fo)
            doSomething(json_content)
            return HttpResponse(0)
        except Exception as e:
            return HttpResponse(1)
As you can see, this involves 3 steps to finally get the content from the zip file into memory. What I want is get the content of the zip file and load directly into memory.
I did find some similar questions in stack overflow like this one https://stackoverflow.com/a/2463819 . But I am not sure at what point do I invoke this operation mentioned in the post
How can I achieve this?
Note: I am using django in backend.
There will always be one json file in the zip.
From what I understand, what @jason is trying to say here is to first open a ZipFile, just as you have done with the line: with zipfile.ZipFile(file_path + client_file.name, 'r') as zip_ref:
class zipfile.ZipFile(file[, mode[, compression[, allowZip64]]])
Open a ZIP file, where file can be either a path to a file (a string) or a file-like object.
And then use BytesIO to read in the bytes of a file-like object. Note that the underlying file has to be opened in rb mode, not r mode, so change it as follows.
import io
import zipfile

with open(filename, 'rb') as file_data:
    bytes_content = file_data.read()
file_like_object = io.BytesIO(bytes_content)
zipfile_ob = zipfile.ZipFile(file_like_object)
Now zipfile_ob can be accessed from memory.
The first argument to zipfile.ZipFile() can be a file object rather than a pathname. I think the Django UploadedFile object supports this use, so you can read directly from that rather than having to copy into a file.
You can also open the file directly from the zip archive rather than extracting that into a file.
import json
import zipfile
from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt

@csrf_exempt
def get_zip(request):
    if request.method == "POST":
        try:
            client_file = request.FILES['file']
            # open the uploaded zip directly, without writing it to disk
            with zipfile.ZipFile(client_file, 'r') as zip_ref:
                first = zip_ref.infolist()[0]
                with zip_ref.open(first, "r") as fo:
                    json_content = json.load(fo)
            doSomething(json_content)
            return HttpResponse(0)
        except Exception as e:
            return HttpResponse(1)
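The same idea can be tried outside Django with any file-like object. This is a rough sketch with a hypothetical function name; it picks the first member whose name ends in .json instead of relying on member order, which is a design choice of the example and not part of the answers above.

```python
import io
import json
import zipfile


def load_json_from_zip(fileobj):
    """Parse the first .json member of a zip archive given as a file-like object."""
    with zipfile.ZipFile(fileobj) as archive:
        json_names = [n for n in archive.namelist() if n.lower().endswith(".json")]
        if not json_names:
            raise ValueError("archive contains no .json member")
        with archive.open(json_names[0]) as member:
            # json.load accepts a binary file object on Python 3.6+
            return json.load(member)
```

In the view, the call would be load_json_from_zip(request.FILES['file']), assuming the uploaded file object supports seeking as the answer above suggests.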
I have a script that gets all of the .zip files from a folder, then one by one, opens the zip file, loads the content of the JSON file inside and imports this to MongoDB.
The error I am getting is the JSON object must be str, bytes or bytearray, not 'TextIOWrapper'
The code is:
import json
import logging
import logging.handlers
import os
from logging.config import fileConfig
from pymongo import MongoClient
def import_json():
    try:
        client = MongoClient('5.57.62.97', 27017)
        db = client['vuln_sets']
        coll = db['vulnerabilities']
        basepath = os.path.dirname(__file__)
        filepath = os.path.abspath(os.path.join(basepath, ".."))
        archive_filepath = filepath + '/vuln_files/'
        filedir = os.chdir(archive_filepath)
        for item in os.listdir(filedir):
            if item.endswith('.json'):
                file_name = os.path.abspath(item)
                fp = open(file_name, 'r')
                json_data = json.loads(fp)
                for vuln in json_data:
                    print(vuln)
                    coll.insert(vuln)
                os.remove(file_name)
    except Exception as e:
        logging.exception(e)
I can get this working to use a single file but not multiple, i.e. to do one file I wrote:
from zipfile import ZipFile
import json
import pymongo
archive = ZipFile("vulners_collections/cve.zip")
archived_file = archive.open(archive.namelist()[0])
archive_content = archived_file.read()
archived_file.close()
connection = pymongo.MongoClient("mongodb://localhost")
db=connection.vulnerability
vuln1 = db.vulnerability_collection
vulners_objects = json.loads(archive_content)
for item in vulners_objects:
    vuln1.insert(item)
From my comment above:
I have no experience with glob, but from skimming the docs I get the impression that your archive_files is a simple list of file paths as strings, correct? You cannot call methods like .open on a string (hence your error), so try changing your code to this:
...
# requires: from zipfile import ZipFile
archive_filepath = filepath + '/vuln_files/'
archive_files = glob.glob(archive_filepath + "/*.zip")
for file in archive_files:
    # file is only a path string, so open the archive it points to
    with ZipFile(file) as archive:
        for name in archive.namelist():
            with archive.open(name) as currentFile:
                vuln_content = json.load(currentFile)
            for item in vuln_content:
                coll.insert(item)
...
file is NOT a file object, just a simple string. So you can't call methods on it that strings do not support.
You are redefining your iterator by setting it to the result of the namelist method. You need a for loop within the for to go through the contents of the zip file and of course a new iterator variable.
Isn't file.close wrong? The correct call is file.close().
You can use json.load() to load the file directly, instead of json.loads():
fp = open(file_name, 'r')
json_data = json.load(fp)
fp.close()
I am using a script to strip exif data from uploaded JPGs in Python, before writing them to disk. I'm using Flask, and the file is brought in through requests
file = request.files['file']
strip the exif data, and then save it
f = open(file)
image = f.read()
f.close()
outputimage = stripExif(image)
f = ('output.jpg', 'w')
f.write(outputimage)
f.close()
f.save(os.path.join(app.config['IMAGE_FOLDER'], filename))
Open isn't working because it only takes a string as an argument, and if I try to just set f=file, it throws an error about tuple objects not having a write attribute. How can I pass the current file into this function before it is read?
file is a FileStorage, described in http://werkzeug.pocoo.org/docs/datastructures/#werkzeug.datastructures.FileStorage
As the docs say, stream represents the stream of data for this file, usually in the form of a pointer to a temporary file, and most functions are proxied.
You probably can do something like:
file = request.files['file']
image = file.read()
outputimage = stripExif(image)
f = open(os.path.join(app.config['IMAGE_FOLDER'], 'output.jpg'), 'wb')  # binary mode: image data is bytes
f.write(outputimage)
f.close()
Try the io package, which has a BufferedReader(), ala:
import io
f = io.BufferedReader(request.files['file'])
...
file = request.files['file']
image = stripExif(file.read())
file.close()
filename = 'whatever' # maybe you want to use request.files['file'].filename
dest_path = os.path.join(app.config['IMAGE_FOLDER'], filename)
with open(dest_path, 'wb') as f:
    f.write(image)
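The pattern in these answers (read the bytes from the upload's stream, transform them, write them out in binary mode) can be tried without Flask. This is only a sketch: save_processed_upload is a hypothetical name, and transform stands in for the asker's own stripExif function.

```python
import io
import os


def save_processed_upload(stream, dest_path, transform):
    """Read all bytes from stream, apply transform, write the result in binary mode."""
    data = stream.read()          # works for any object with a read() method
    processed = transform(data)   # e.g. the asker's stripExif
    with open(dest_path, "wb") as out:
        out.write(processed)
    return len(processed)
```

In a Flask view the stream argument would be request.files['file'], since FileStorage proxies read() to its underlying stream.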
I need to download a zip archive of text files, dispatch each text file in the archive to other handlers for processing, and finally write the unzipped text file to disk.
I have the following code. It uses multiple open/close on the same file, which does not seem elegant. How do I make it more elegant and efficient?
zipped = urllib.urlopen('www.abc.com/xyz.zip')
buf = cStringIO.StringIO(zipped.read())
zipped.close()
unzipped = zipfile.ZipFile(buf, 'r')
for f_info in unzipped.infolist():
    logfile = unzipped.open(f_info)
    handler1(logfile)
    logfile.close() ## Cannot seek(0). The file like obj does not support seek()
    logfile = unzipped.open(f_info)
    handler2(logfile)
    logfile.close()
    unzipped.extract(f_info)
Your answer is in your example code. Just use StringIO to buffer the logfile:
zipped = urllib.urlopen('www.abc.com/xyz.zip')
buf = cStringIO.StringIO(zipped.read())
zipped.close()
unzipped = zipfile.ZipFile(buf, 'r')
for f_info in unzipped.infolist():
    logfile = unzipped.open(f_info)
    # Here's where we buffer:
    logbuffer = cStringIO.StringIO(logfile.read())
    logfile.close()
    for handler in [handler1, handler2]:
        handler(logbuffer)
        # StringIO objects support seek():
        logbuffer.seek(0)
    unzipped.extract(f_info)
You could say something like:
handler_dispatch(logfile)
and
def handler_dispatch(file):
    for line in file:
        handler1(line)
        handler2(line)
or even make it more dynamic by constructing a Handler class with multiple handlerN functions, and applying each of them inside handler_dispatch. Like
class Handler:
    def __init__(self):
        self.handlers = []
    def add_handler(self, handler):
        self.handlers.append(handler)
    def handler_dispatch(self, file):
        for line in file:
            for handler in self.handlers:
                handler.handle(line)
Open the zip file once, loop through all the names, extract the file for each name and process it, then write it to disk.
Like so:
for f_info in unzipped.infolist():
    file = unzipped.open(f_info)
    data = file.read()
    file.close()
    # If you need a file like object, wrap it in a cStringIO
    fobj = cStringIO.StringIO(data)
    handler1(fobj)
    fobj.seek(0)  # rewind so the second handler reads from the start
    handler2(fobj)
    with open(f_info.filename, "wb") as fp:
        fp.write(data)
You get the idea
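The snippets in this thread use Python 2 modules (urllib.urlopen, cStringIO). On Python 3 the same approach could be sketched as below, with io.BytesIO in place of cStringIO. The function name and the out_dir parameter are hypothetical, and the download step is left out so the example runs offline.

```python
import io
import os
import zipfile


def process_zip_bytes(zip_bytes, handlers, out_dir):
    """Run every handler on each archive member, then write the member to out_dir."""
    written = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            data = archive.read(info)  # read each member exactly once
            for handler in handlers:
                # a fresh BytesIO per handler always starts at position 0
                handler(io.BytesIO(data))
            # basename() drops any directory part stored in the archive
            target = os.path.join(out_dir, os.path.basename(info.filename))
            with open(target, "wb") as out:
                out.write(data)
            written.append(target)
    return written
```

Each handler receives its own seekable buffer, so no handler depends on another one rewinding the stream. The zip_bytes argument could come from urllib.request.urlopen(url).read().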