Python: Reading fortran file from url

Python: Reading fortran file from url - python

I would like to do the following in Python 3: Read in a FortranFile, but from an URL rather than a local file. The reason is that for my concrete example there are a lot of files and I want to avoid having to download them all first.
I have managed to
a) read in a simple .txt file from an URL
import urllib
from urllib.request import urlopen
url='http://www.deus-consortium.org/deus-library/filelist/deus_file_list_501.txt'
data=urllib.request.urlopen(url)
i=0
for line in data: # files are iterable
print(i,line)
i+=1
#alternative: data.read()
b) read in a local FortranFile (binary little endian unformated Fortran file) like this:
The file is from: http://www.deus-consortium.org/deus-library/efiler1/Babel_le/boxlen648_n2048_lcdmw7/post/fof/output_00090/fof_boxlen648_n2048_lcdmw7_masst_00000
from scipy.io import FortranFile
filename='../../Downloads/fof_boxlen648_n2048_rpcdmw7_masst_00000'
ff = FortranFile(filename, 'r')
nhalos=ff.read_ints(dtype=np.int32)[0]
print('number of halos in file',nhalos)
Is there any way to avoid downloading and reading FortranFiles directly from the URL? I tried
import urllib
from urllib.request import urlopen
url='http://www.deus-consortium.org/deus-library/efiler1/Babel_le/boxlen648_n2048_lcdmw7/cube_00090/fof_boxlen648_n2048_lcdmw7_cube_00000'
pathname = urllib.request.urlopen(url)
ff = FortranFile(pathname, 'r')
ff.read_ints()
gives "OSError: obtaining file position failed". pathname.read() doesn't work either because it's a fortran file.
Any ideas? Thanks in advance!

Maybe you can use tempfile module to download and read the data?
For example:
import urllib
import tempfile
from scipy.io import FortranFile
from urllib.request import urlopen
url='http://www.deus-consortium.org/deus-library/efiler1/Babel_le/boxlen648_n2048_lcdmw7/cube_00090/fof_boxlen648_n2048_lcdmw7_cube_00000'
with tempfile.TemporaryFile() as fp:
fp.write(urllib.request.urlopen(url).read())
fp.seek(0)
ff = FortranFile(fp, 'r')
info = ff.read_ints()
print(info)
Prints:
[12808737]

Related

Getting code from a .txt on a website and pasting it in a tempfile PYTHON

I was trying to make a script that gets a .txt from a websites, pastes the code into a python executable temp file but its not working. Here is the code:
from urllib.request import urlopen as urlopen
import os
import subprocess
import os
import tempfile
filename = urlopen("https://randomsiteeeee.000webhostapp.com/script.txt")
temp = open(filename)
temp.close()
# Clean up the temporary file yourself
os.remove(filename)
temp = tempfile.TemporaryFile()
temp.close()
If you know a fix to this please let me know. The error is :
File "test.py", line 9, in <module>
temp = open(filename)
TypeError: expected str, bytes or os.PathLike object, not HTTPResponse
I tried everything such as a request to the url and pasting it but didnt work as well. I tried the code that i pasted here and didnt work as well.
And as i said, i was expecting it getting the code from the .txt from the website, and making it a temp executable python script

you are missing a read:
from urllib.request import urlopen as urlopen
import os
import subprocess
import os
import tempfile
filename = urlopen("https://randomsiteeeee.000webhostapp.com/script.txt").read() # <-- here
temp = open(filename)
temp.close()
# Clean up the temporary file yourself
os.remove(filename)
temp = tempfile.TemporaryFile()
temp.close()
But if the script.txt contains the script and not the filename, you need to create a temporary file and write the content:
from urllib.request import urlopen as urlopen
import os
import subprocess
import os
import tempfile
content = urlopen("https://randomsiteeeee.000webhostapp.com/script.txt").read() #
with tempfile.TemporaryFile() as fp:
name = fp.name
fp.write(content)
If you want to execute the code you fetch from the url, you may also use exec or eval instead of writing a new script file.
eval and exec are EVIL, they should only be used if you 100% trust the input and there is no other way!
EDIT: How do i use exec?
Using exec, you could do something like this (also, I use requests instead of urllib here. If you prefer urllib, you can do this too):
import requests
exec(requests.get("https://randomsiteeeee.000webhostapp.com/script.txt").text)

Your trying to open a file that is named "the content of a website".
filename = "path/to/my/output/file.txt"
httpresponse = urlopen("https://randomsiteeeee.000webhostapp.com/script.txt").read()
temp = open(filename)
temp.write(httpresponse)
temp.close()
Is probably more like what you are intending

Python iterate file content returned by urlopen [duplicate]

I use the following code to stream large files from the Internet into a local file:
fp = open(file, 'wb')
req = urllib2.urlopen(url)
for line in req:
fp.write(line)
fp.close()
This works but it downloads quite slowly. Is there a faster way? (The files are large so I don't want to keep them in memory.)

No reason to work line by line (small chunks AND requires Python to find the line ends for you!-), just chunk it up in bigger chunks, e.g.:
# from urllib2 import urlopen # Python 2
from urllib.request import urlopen # Python 3
response = urlopen(url)
CHUNK = 16 * 1024
with open(file, 'wb') as f:
while True:
chunk = response.read(CHUNK)
if not chunk:
break
f.write(chunk)
Experiment a bit with various CHUNK sizes to find the "sweet spot" for your requirements.

You can also use shutil:
import shutil
try:
from urllib.request import urlopen # Python 3
except ImportError:
from urllib2 import urlopen # Python 2
def get_large_file(url, file, length=16*1024):
req = urlopen(url)
with open(file, 'wb') as fp:
shutil.copyfileobj(req, fp, length)

I used to use mechanize module and its Browser.retrieve() method. In the past it took 100% CPU and downloaded things very slowly, but some recent release fixed this bug and works very quickly.
Example:
import mechanize
browser = mechanize.Browser()
browser.retrieve('http://www.kernel.org/pub/linux/kernel/v2.6/testing/linux-2.6.32-rc1.tar.bz2', 'Downloads/my-new-kernel.tar.bz2')
Mechanize is based on urllib2, so urllib2 can also have similar method... but I can't find any now.

You can use urllib.retrieve() to download files:
Example:
try:
from urllib import urlretrieve # Python 2
except ImportError:
from urllib.request import urlretrieve # Python 3
url = "http://www.examplesite.com/myfile"
urlretrieve(url,"./local_file")

Read pptx file content from a url

I found this solution to read word file content from a url
from urllib.request import urlopen
from bs4 import BeautifulSoup
from io import BytesIO
from zipfile import ZipFile
file = urlopen(url).read()
file = BytesIO(file)
document = ZipFile(file)
content = document.read('word/document.xml')
word_obj = BeautifulSoup(content.decode('utf-8'))
text_document = word_obj.findAll('w:t')
for t in text_document:
print(t.text)
Anyone know a similar way to process pptx files? I have seen several solutions but to read the file directly, not from a url.

i don't know if it can help you but with urllib you obtain the content of the pptx (variable file), use cStringIO.StringIO(file) in function that read a pptx file path to simulate a file.

Downloading and unzipping a .zip file without writing to disk

I have managed to get my first python script to work which downloads a list of .ZIP files from a URL and then proceeds to extract the ZIP files and writes them to disk.
I am now at a loss to achieve the next step.
My primary goal is to download and extract the zip file and pass the contents (CSV data) via a TCP stream. I would prefer not to actually write any of the zip or extracted files to disk if I could get away with it.
Here is my current script which works but unfortunately has to write the files to disk.
import urllib, urllister
import zipfile
import urllib2
import os
import time
import pickle
# check for extraction directories existence
if not os.path.isdir('downloaded'):
os.makedirs('downloaded')
if not os.path.isdir('extracted'):
os.makedirs('extracted')
# open logfile for downloaded data and save to local variable
if os.path.isfile('downloaded.pickle'):
downloadedLog = pickle.load(open('downloaded.pickle'))
else:
downloadedLog = {'key':'value'}
# remove entries older than 5 days (to maintain speed)
# path of zip files
zipFileURL = "http://www.thewebserver.com/that/contains/a/directory/of/zip/files"
# retrieve list of URLs from the webservers
usock = urllib.urlopen(zipFileURL)
parser = urllister.URLLister()
parser.feed(usock.read())
usock.close()
parser.close()
# only parse urls
for url in parser.urls:
if "PUBLIC_P5MIN" in url:
# download the file
downloadURL = zipFileURL + url
outputFilename = "downloaded/" + url
# check if file already exists on disk
if url in downloadedLog or os.path.isfile(outputFilename):
print "Skipping " + downloadURL
continue
print "Downloading ",downloadURL
response = urllib2.urlopen(downloadURL)
zippedData = response.read()
# save data to disk
print "Saving to ",outputFilename
output = open(outputFilename,'wb')
output.write(zippedData)
output.close()
# extract the data
zfobj = zipfile.ZipFile(outputFilename)
for name in zfobj.namelist():
uncompressed = zfobj.read(name)
# save uncompressed data to disk
outputFilename = "extracted/" + name
print "Saving extracted file to ",outputFilename
output = open(outputFilename,'wb')
output.write(uncompressed)
output.close()
# send data via tcp stream
# file successfully downloaded and extracted store into local log and filesystem log
downloadedLog[url] = time.time();
pickle.dump(downloadedLog, open('downloaded.pickle', "wb" ))

Below is a code snippet I used to fetch zipped csv file, please have a look:
Python 2:
from StringIO import StringIO
from zipfile import ZipFile
from urllib import urlopen
resp = urlopen("http://www.test.com/file.zip")
myzip = ZipFile(StringIO(resp.read()))
for line in myzip.open(file).readlines():
print line
Python 3:
from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen
# or: requests.get(url).content
resp = urlopen("http://www.test.com/file.zip")
myzip = ZipFile(BytesIO(resp.read()))
for line in myzip.open(file).readlines():
print(line.decode('utf-8'))
Here file is a string. To get the actual string that you want to pass, you can use zipfile.namelist(). For instance,
resp = urlopen('http://mlg.ucd.ie/files/datasets/bbc.zip')
myzip = ZipFile(BytesIO(resp.read()))
myzip.namelist()
# ['bbc.classes', 'bbc.docs', 'bbc.mtx', 'bbc.terms']

My suggestion would be to use a StringIO object. They emulate files, but reside in memory. So you could do something like this:
# get_zip_data() gets a zip archive containing 'foo.txt', reading 'hey, foo'
import zipfile
from StringIO import StringIO
zipdata = StringIO()
zipdata.write(get_zip_data())
myzipfile = zipfile.ZipFile(zipdata)
foofile = myzipfile.open('foo.txt')
print foofile.read()
# output: "hey, foo"
Or more simply (apologies to Vishal):
myzipfile = zipfile.ZipFile(StringIO(get_zip_data()))
for name in myzipfile.namelist():
[ ... ]
In Python 3 use BytesIO instead of StringIO:
import zipfile
from io import BytesIO
filebytes = BytesIO(get_zip_data())
myzipfile = zipfile.ZipFile(filebytes)
for name in myzipfile.namelist():
[ ... ]

I'd like to offer an updated Python 3 version of Vishal's excellent answer, which was using Python 2, along with some explanation of the adaptations / changes, which may have been already mentioned.
from io import BytesIO
from zipfile import ZipFile
import urllib.request
url = urllib.request.urlopen("http://www.unece.org/fileadmin/DAM/cefact/locode/loc162txt.zip")
with ZipFile(BytesIO(url.read())) as my_zip_file:
for contained_file in my_zip_file.namelist():
# with open(("unzipped_and_read_" + contained_file + ".file"), "wb") as output:
for line in my_zip_file.open(contained_file).readlines():
print(line)
# output.write(line)
Necessary changes:
There's no StringIO module in Python 3 (it's been moved to io.StringIO). Instead, I use io.BytesIO]2, because we will be handling a bytestream -- Docs, also this thread.
urlopen:
"The legacy urllib.urlopen function from Python 2.6 and earlier has been discontinued; urllib.request.urlopen() corresponds to the old urllib2.urlopen.", Docs and this thread.
Note:
In Python 3, the printed output lines will look like so: b'some text'. This is expected, as they aren't strings - remember, we're reading a bytestream. Have a look at Dan04's excellent answer.
A few minor changes I made:
I use with ... as instead of zipfile = ... according to the Docs.
The script now uses .namelist() to cycle through all the files in the zip and print their contents.
I moved the creation of the ZipFile object into the with statement, although I'm not sure if that's better.
I added (and commented out) an option to write the bytestream to file (per file in the zip), in response to NumenorForLife's comment; it adds "unzipped_and_read_" to the beginning of the filename and a ".file" extension (I prefer not to use ".txt" for files with bytestrings). The indenting of the code will, of course, need to be adjusted if you want to use it.
Need to be careful here -- because we have a byte string, we use binary mode, so "wb"; I have a feeling that writing binary opens a can of worms anyway...
I am using an example file, the UN/LOCODE text archive:
What I didn't do:
NumenorForLife asked about saving the zip to disk. I'm not sure what he meant by it -- downloading the zip file? That's a different task; see Oleh Prypin's excellent answer.
Here's a way:
import urllib.request
import shutil
with urllib.request.urlopen("http://www.unece.org/fileadmin/DAM/cefact/locode/2015-2_UNLOCODE_SecretariatNotes.pdf") as response, open("downloaded_file.pdf", 'w') as out_file:
shutil.copyfileobj(response, out_file)

I'd like to add my Python3 answer for completeness:
from io import BytesIO
from zipfile import ZipFile
import requests
def get_zip(file_url):
url = requests.get(file_url)
zipfile = ZipFile(BytesIO(url.content))
files = [zipfile.open(file_name) for file_name in zipfile.namelist()]
return files.pop() if len(files) == 1 else files

write to a temporary file which resides in RAM
it turns out the tempfile module ( http://docs.python.org/library/tempfile.html ) has just the thing:
tempfile.SpooledTemporaryFile([max_size=0[,
mode='w+b'[, bufsize=-1[, suffix=''[,
prefix='tmp'[, dir=None]]]]]])
This
function operates exactly as
TemporaryFile() does, except that data
is spooled in memory until the file
size exceeds max_size, or until the
file’s fileno() method is called, at
which point the contents are written
to disk and operation proceeds as with
TemporaryFile().
The resulting file has one additional
method, rollover(), which causes the
file to roll over to an on-disk file
regardless of its size.
The returned object is a file-like
object whose _file attribute is either
a StringIO object or a true file
object, depending on whether
rollover() has been called. This
file-like object can be used in a with
statement, just like a normal file.
New in version 2.6.
or if you're lazy and you have a tmpfs-mounted /tmp on Linux, you can just make a file there, but you have to delete it yourself and deal with naming

Adding on to the other answers using requests:
# download from web
import requests
url = 'http://mlg.ucd.ie/files/datasets/bbc.zip'
content = requests.get(url)
# unzip the content
from io import BytesIO
from zipfile import ZipFile
f = ZipFile(BytesIO(content.content))
print(f.namelist())
# outputs ['bbc.classes', 'bbc.docs', 'bbc.mtx', 'bbc.terms']
Use help(f) to get more functions details for e.g. extractall() which extracts the contents in zip file which later can be used with with open.

All of these answers appear too bulky and long. Use requests to shorten the code, e.g.:
import requests, zipfile, io
r = requests.get(zip_file_url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall("/path/to/directory")

Vishal's example, however great, confuses when it comes to the file name, and I do not see the merit of redefing 'zipfile'.
Here is my example that downloads a zip that contains some files, one of which is a csv file that I subsequently read into a pandas DataFrame:
from StringIO import StringIO
from zipfile import ZipFile
from urllib import urlopen
import pandas
url = urlopen("https://www.federalreserve.gov/apps/mdrm/pdf/MDRM.zip")
zf = ZipFile(StringIO(url.read()))
for item in zf.namelist():
print("File in zip: "+ item)
# find the first matching csv file in the zip:
match = [s for s in zf.namelist() if ".csv" in s][0]
# the first line of the file contains a string - that line shall de ignored, hence skiprows
df = pandas.read_csv(zf.open(match), low_memory=False, skiprows=[0])
(Note, I use Python 2.7.13)
This is the exact solution that worked for me. I just tweaked it a little bit for Python 3 version by removing StringIO and adding IO library
Python 3 Version
from io import BytesIO
from zipfile import ZipFile
import pandas
import requests
url = "https://www.nseindia.com/content/indices/mcwb_jun19.zip"
content = requests.get(url)
zf = ZipFile(BytesIO(content.content))
for item in zf.namelist():
print("File in zip: "+ item)
# find the first matching csv file in the zip:
match = [s for s in zf.namelist() if ".csv" in s][0]
# the first line of the file contains a string - that line shall de ignored, hence skiprows
df = pandas.read_csv(zf.open(match), low_memory=False, skiprows=[0])

It wasn't obvious in Vishal's answer what the file name was supposed to be in cases where there is no file on disk. I've modified his answer to work without modification for most needs.
from StringIO import StringIO
from zipfile import ZipFile
from urllib import urlopen
def unzip_string(zipped_string):
unzipped_string = ''
zipfile = ZipFile(StringIO(zipped_string))
for name in zipfile.namelist():
unzipped_string += zipfile.open(name).read()
return unzipped_string

Use the zipfile module. To extract a file from a URL, you'll need to wrap the result of a urlopen call in a BytesIO object. This is because the result of a web request returned by urlopen doesn't support seeking:
from urllib.request import urlopen
from io import BytesIO
from zipfile import ZipFile
zip_url = 'http://example.com/my_file.zip'
with urlopen(zip_url) as f:
with BytesIO(f.read()) as b, ZipFile(b) as myzipfile:
foofile = myzipfile.open('foo.txt')
print(foofile.read())
If you already have the file downloaded locally, you don't need BytesIO, just open it in binary mode and pass to ZipFile directly:
from zipfile import ZipFile
zip_filename = 'my_file.zip'
with open(zip_filename, 'rb') as f:
with ZipFile(f) as myzipfile:
foofile = myzipfile.open('foo.txt')
print(foofile.read().decode('utf-8'))
Again, note that you have to open the file in binary ('rb') mode, not as text or you'll get a zipfile.BadZipFile: File is not a zip file error.
It's good practice to use all these things as context managers with the with statement, so that they'll be closed properly.

Stream large binary files with urllib2 to file

I use the following code to stream large files from the Internet into a local file:
fp = open(file, 'wb')
req = urllib2.urlopen(url)
for line in req:
fp.write(line)
fp.close()
This works but it downloads quite slowly. Is there a faster way? (The files are large so I don't want to keep them in memory.)

No reason to work line by line (small chunks AND requires Python to find the line ends for you!-), just chunk it up in bigger chunks, e.g.:
# from urllib2 import urlopen # Python 2
from urllib.request import urlopen # Python 3
response = urlopen(url)
CHUNK = 16 * 1024
with open(file, 'wb') as f:
while True:
chunk = response.read(CHUNK)
if not chunk:
break
f.write(chunk)
Experiment a bit with various CHUNK sizes to find the "sweet spot" for your requirements.

You can also use shutil:
import shutil
try:
from urllib.request import urlopen # Python 3
except ImportError:
from urllib2 import urlopen # Python 2
def get_large_file(url, file, length=16*1024):
req = urlopen(url)
with open(file, 'wb') as fp:
shutil.copyfileobj(req, fp, length)

I used to use mechanize module and its Browser.retrieve() method. In the past it took 100% CPU and downloaded things very slowly, but some recent release fixed this bug and works very quickly.
Example:
import mechanize
browser = mechanize.Browser()
browser.retrieve('http://www.kernel.org/pub/linux/kernel/v2.6/testing/linux-2.6.32-rc1.tar.bz2', 'Downloads/my-new-kernel.tar.bz2')
Mechanize is based on urllib2, so urllib2 can also have similar method... but I can't find any now.

You can use urllib.retrieve() to download files:
Example:
try:
from urllib import urlretrieve # Python 2
except ImportError:
from urllib.request import urlretrieve # Python 3
url = "http://www.examplesite.com/myfile"
urlretrieve(url,"./local_file")

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python: Reading fortran file from url - python

Related

Getting code from a .txt on a website and pasting it in a tempfile PYTHON

Python iterate file content returned by urlopen [duplicate]

Read pptx file content from a url

Downloading and unzipping a .zip file without writing to disk

Stream large binary files with urllib2 to file

Categories

Resources