Bad Zip File error when rewriting code in Python 3.4 - python

I am trying to rewrite code previously written for Python 2.7 into Python 3.4. I get the error zipfile.BadZipFile: File is not a zip file in the line zipfile = ZipFile(StringIO(zipdata)) in the code below.
import csv
try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO
import pandas as pd
import os
from zipfile import ZipFile
from pprint import pprint, pformat
import urllib.request
import urllib.parse
try:
    import urllib.request as urllib2
except ImportError:
    import urllib2

my_url = 'http://www.bankofcanada.ca/stats/results/csv'
data = urllib.parse.urlencode({"lookupPage": "lookup_yield_curve.php",
                               "startRange": "1986-01-01",
                               "searchRange": "all"})
# request = urllib2.Request(my_url, data)
# result = urllib2.urlopen(request)
binary_data = data.encode('utf-8')
req = urllib.request.Request(my_url, binary_data)
result = urllib.request.urlopen(req)
zipdata = result.read().decode("utf-8", errors="ignore")
zipfile = ZipFile(StringIO(zipdata))
df = pd.read_csv(zipfile.open(zipfile.namelist()[0]))
df = pd.melt(df, id_vars=['Date'])
df.rename(columns={'variable': 'Maturity'}, inplace=True)
Thank You

You shouldn't be decoding the data you get back in the result. The data is the bytes of the zip file itself, not bytes that encode a unicode string. I think your confusion arises because Python 2 makes no such distinction, but in Python 3 you need a BytesIO, not a StringIO.
So that part of your code should read:
from io import BytesIO

zipdata = result.read()
zipfile = ZipFile(BytesIO(zipdata))
df = pd.read_csv(zipfile.open(zipfile.namelist()[0]))
The data you are getting back is not utf-8 encoded, so you can't decode it that way. You would have found that out more easily if you hadn't specified errors="ignore", which is seldom a good idea ...
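For completeness, here is a minimal end-to-end sketch with that fix applied. It assumes the Bank of Canada endpoint from the question still answers this form POST with a zip archive containing a single CSV:

import urllib.parse
import urllib.request
from io import BytesIO
from zipfile import ZipFile

import pandas as pd

my_url = 'http://www.bankofcanada.ca/stats/results/csv'
data = urllib.parse.urlencode({"lookupPage": "lookup_yield_curve.php",
                               "startRange": "1986-01-01",
                               "searchRange": "all"})

req = urllib.request.Request(my_url, data.encode('utf-8'))
result = urllib.request.urlopen(req)

# Keep the payload as bytes: a zip archive is binary data, not text.
zipdata = result.read()
zf = ZipFile(BytesIO(zipdata))  # 'zf' avoids shadowing the zipfile module
df = pd.read_csv(zf.open(zf.namelist()[0]))
df = pd.melt(df, id_vars=['Date'])
df.rename(columns={'variable': 'Maturity'}, inplace=True)

Decoding only happens later, when pandas reads the CSV member out of the archive.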

Related

Getting code from a .txt on a website and pasting it in a tempfile PYTHON

I was trying to make a script that gets a .txt file from a website and pastes the code into a temporary Python file that can be executed, but it's not working. Here is the code:
from urllib.request import urlopen as urlopen
import os
import subprocess
import os
import tempfile
filename = urlopen("https://randomsiteeeee.000webhostapp.com/script.txt")
temp = open(filename)
temp.close()
# Clean up the temporary file yourself
os.remove(filename)
temp = tempfile.TemporaryFile()
temp.close()
If you know a fix for this, please let me know. The error is:
File "test.py", line 9, in <module>
temp = open(filename)
TypeError: expected str, bytes or os.PathLike object, not HTTPResponse
I tried everything, such as making a request to the URL and pasting the response, but that didn't work either, and neither did the code I pasted here.
As I said, I expected it to get the code from the .txt on the website and turn it into a temporary executable Python script.
You are missing a read:
from urllib.request import urlopen as urlopen
import os
import subprocess
import os
import tempfile
filename = urlopen("https://randomsiteeeee.000webhostapp.com/script.txt").read() # <-- here
temp = open(filename)
temp.close()
# Clean up the temporary file yourself
os.remove(filename)
temp = tempfile.TemporaryFile()
temp.close()
But if the script.txt contains the script and not the filename, you need to create a temporary file and write the content:
from urllib.request import urlopen as urlopen
import os
import subprocess
import os
import tempfile
content = urlopen("https://randomsiteeeee.000webhostapp.com/script.txt").read()
with tempfile.TemporaryFile() as fp:
    name = fp.name
    fp.write(content)
If you want to execute the code you fetch from the URL, you may also use exec or eval instead of writing a new script file.
eval and exec are EVIL; they should only be used if you trust the input 100% and there is no other way!
EDIT: How do I use exec?
Using exec, you could do something like this (I use requests instead of urllib here; if you prefer urllib, see the sketch right after this example):
import requests
exec(requests.get("https://randomsiteeeee.000webhostapp.com/script.txt").text)
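If you prefer to stay with urllib, an equivalent sketch (assuming the server delivers UTF-8 text, and again only for code you fully trust) would be:

import urllib.request

code = urllib.request.urlopen("https://randomsiteeeee.000webhostapp.com/script.txt").read()
exec(code.decode("utf-8"))  # decode the downloaded bytes before handing them to exec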
You're trying to open a file that is named "the content of a website".
filename = "path/to/my/output/file.txt"
httpresponse = urlopen("https://randomsiteeeee.000webhostapp.com/script.txt").read()
temp = open(filename)
temp.write(httpresponse)
temp.close()
That is probably closer to what you intended.
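Putting the pieces together, here is a sketch of what the question seems to be aiming for: download the script, write it to a named temporary file, run it, and clean up. The URL is the hypothetical one from the question, and this should only ever be done with code you fully trust:

import os
import subprocess
import sys
import tempfile
from urllib.request import urlopen

content = urlopen("https://randomsiteeeee.000webhostapp.com/script.txt").read()

# delete=False so the child process can reopen the file (also works on Windows)
with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as fp:
    fp.write(content)
    path = fp.name

try:
    subprocess.check_call([sys.executable, path])  # run the downloaded script
finally:
    os.remove(path)  # clean up the temporary file yourself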

Python: Reading fortran file from url

I would like to do the following in Python 3: read in a FortranFile, but from a URL rather than from a local file. The reason is that for my concrete example there are a lot of files, and I want to avoid having to download them all first.
I have managed to
a) read in a simple .txt file from a URL
import urllib
from urllib.request import urlopen

url = 'http://www.deus-consortium.org/deus-library/filelist/deus_file_list_501.txt'
data = urllib.request.urlopen(url)
i = 0
for line in data:  # file objects are iterable
    print(i, line)
    i += 1
# alternative: data.read()
b) read in a local FortranFile (a binary, little-endian, unformatted Fortran file) like this:
The file is from: http://www.deus-consortium.org/deus-library/efiler1/Babel_le/boxlen648_n2048_lcdmw7/post/fof/output_00090/fof_boxlen648_n2048_lcdmw7_masst_00000
import numpy as np
from scipy.io import FortranFile

filename = '../../Downloads/fof_boxlen648_n2048_rpcdmw7_masst_00000'
ff = FortranFile(filename, 'r')
nhalos = ff.read_ints(dtype=np.int32)[0]
print('number of halos in file', nhalos)
Is there any way to avoid the download and read FortranFiles directly from the URL? I tried
import urllib
from urllib.request import urlopen
url='http://www.deus-consortium.org/deus-library/efiler1/Babel_le/boxlen648_n2048_lcdmw7/cube_00090/fof_boxlen648_n2048_lcdmw7_cube_00000'
pathname = urllib.request.urlopen(url)
ff = FortranFile(pathname, 'r')
ff.read_ints()
gives "OSError: obtaining file position failed". pathname.read() doesn't work either because it's a fortran file.
Any ideas? Thanks in advance!
Maybe you can use the tempfile module to download and read the data?
For example:
import urllib
import tempfile
from scipy.io import FortranFile
from urllib.request import urlopen

url = 'http://www.deus-consortium.org/deus-library/efiler1/Babel_le/boxlen648_n2048_lcdmw7/cube_00090/fof_boxlen648_n2048_lcdmw7_cube_00000'
with tempfile.TemporaryFile() as fp:
    fp.write(urllib.request.urlopen(url).read())
    fp.seek(0)
    ff = FortranFile(fp, 'r')
    info = ff.read_ints()
    print(info)
Prints:
[12808737]
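If you want to skip the temporary file entirely, io.BytesIO should also work, since scipy's FortranFile accepts an open file object and BytesIO is seekable. A sketch under that assumption:

import io
import urllib.request
from scipy.io import FortranFile

url = 'http://www.deus-consortium.org/deus-library/efiler1/Babel_le/boxlen648_n2048_lcdmw7/cube_00090/fof_boxlen648_n2048_lcdmw7_cube_00000'
buf = io.BytesIO(urllib.request.urlopen(url).read())  # whole file held in memory
ff = FortranFile(buf, 'r')
print(ff.read_ints())

This still transfers the whole file; it just never touches the disk.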

TypeError when trying to convert Python 2.7 code to Python 3.4 code

I am having issues converting the code below, which was written for Python 2.7, to code compatible with Python 3.4. I get the error TypeError: can't concat bytes to str in the line outfile.write(decompressedFile.read()). So I replaced the line with outfile.write(decompressedFile.read().decode("utf-8", errors="ignore")), but this resulted in the same error.
import os
import gzip
try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO
import pandas as pd
import urllib.request

baseURL = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file="
filename = "data/irt_euryld_d.tsv.gz"
outFilePath = filename.split('/')[1][:-3]
response = urllib.request.urlopen(baseURL + filename)
compressedFile = StringIO()
compressedFile.write(response.read().decode("utf-8", errors="ignore"))
compressedFile.seek(0)
decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')
with open(outFilePath, 'w') as outfile:
    outfile.write(decompressedFile.read())  # Error
The problem is that GzipFile needs to wrap a bytes-oriented file object, but you're passing a StringIO, which is text-oriented. Use io.BytesIO instead:
from io import BytesIO  # works even in 2.x

# snip

response = urllib.request.urlopen(baseURL + filename)
compressedFile = BytesIO()  # change this
compressedFile.write(response.read())  # and this
compressedFile.seek(0)
decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')
with open(outFilePath, 'w') as outfile:
    outfile.write(decompressedFile.read().decode("utf-8", errors="ignore"))  # change this too
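If you don't need the intermediate in-memory file object at all, gzip.decompress (available since Python 3.2) gives a slightly shorter route. This sketch reuses the question's URL pieces and its errors="ignore" choice:

import gzip
import urllib.request

baseURL = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file="
filename = "data/irt_euryld_d.tsv.gz"
outFilePath = filename.split('/')[1][:-3]

raw = urllib.request.urlopen(baseURL + filename).read()
text = gzip.decompress(raw).decode("utf-8", errors="ignore")  # decompress first, then decode

with open(outFilePath, 'w') as outfile:
    outfile.write(text)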

TypeError when using urllib.request in place of urllib2

I am trying to convert code that was previously written for Python 2.7 to code that will work in Python 3.4. The code is below and I had to change urllib2.urlopen() to urllib.request.urlopen(). However, this change resulted in the error TypeError: string argument expected, got 'bytes' in the line compressedFile.write(response.read()).
import os
import urllib2
import gzip
try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO
import pandas as pd

baseURL = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file="
filename = "data/irt_euryld_d.tsv.gz"
outFilePath = filename.split('/')[1][:-3]
response = urllib2.urlopen(baseURL + filename)  # Changed this to urllib.request.urlopen()
compressedFile = StringIO()
compressedFile.write(response.read())
You should decode the bytes before passing them to the write function:
compressedFile = StringIO()
compressedFile.write(response.read().decode("utf-8"))
Also see the docs. "utf-8" may be omitted because it's the default, but explicit is better than implicit ;-)
Append a call to decode() to decode the bytes into a str.
compressedFile.write(response.read().decode())

Reading this type of Json with Python 3 Urllib

My JSON URL returns this:
{years=["2014","2015","2016"]}
How can I get these strings from the URL with Python 3? I know this method, but Python 3 has no urllib2 module.
import urllib2
import json
response = urllib2.urlopen('http://127.0.0.1/years.php')
data = json.load(response)
print (data)
ImportError: No module named 'urllib2'
Try changing the import to urllib.request and use urllib.request.urlopen instead. For the reasoning, please refer to this SO answer:
import urllib.request
import json

response = urllib.request.urlopen('http://127.0.0.1/years.php')
data = json.load(response)
print(data)
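One caveat: on Python 3 versions before 3.6, json.load will complain that it got bytes rather than str, because HTTPResponse.read() returns bytes. Decoding explicitly avoids that; a sketch assuming the endpoint returns UTF-8 JSON:

import json
import urllib.request

response = urllib.request.urlopen('http://127.0.0.1/years.php')
data = json.loads(response.read().decode('utf-8'))  # decode bytes, then parse
print(data)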
