Trying to download data from URL with CSV File - python

I'm slightly new to Python and have a question as to why the following code doesn't produce any output in the csv file. The code is as follows:
import csv
import urllib2
url = 'http://www.rba.gov.au/statistics/tables/csv/f17-yields.csv'
response = urllib2.urlopen(url)
cr = csv.reader(response)
for row in cr:
with open("AusCentralbank.csv", "wb") as f:
writer = csv.writer(f)
writer.writerows(row)
Cheers.
Edit:
Brien and Albert solved the initial issue I had. However, I now have one further question. When I download the CSV File which I have listed above which is in "http://www.rba.gov.au/statistics/tables/#interest-rates" under Zero-coupon "Interest Rates - Analytical Series - 2009 to Current - F17" and is the F-17 Yields CSV I see that it has 5 workbooks and I actually just want to gather the data in the 5th Workbook. Is there a way I could do this? Cheers.

I could only test my code using Python 3. However, the only diffence should be urllib2, hence I am using urllib.respose for opening the desired url.
The variable html is type bytes and can generally be written to a file in binary mode. Additionally, your source is a csv-file already, so there should be no need to convert it somehow:
#!/usr/bin/env python3
# coding: utf-8
import urllib
url = 'http://www.rba.gov.au/statistics/tables/csv/f17-yields.csv'
response = urllib.request.urlopen(url)
html = response.read()
with open('output.csv', 'wb') as f:
f.write(html)

It is probably because of your opening mode.
According to documentation:
'w' for only writing (an existing file with the same name will be
erased)
You should use append(a) mode to append it to the end of the file.
'a' opens the file for appending; any data written to the file is
automatically added to the end.
Also, since the file you are trying to download is csv file, you don't need to convert it.

#albert had a great answer. I've gone ahead and converted it to the equivalent Python 2.x code. You were doing a bit too much work in your original program; since the file was already a csv you didn't need to do any special work to turn it into a csv.
import urllib2
url = 'http://www.rba.gov.au/statistics/tables/csv/f17-yields.csv'
response = urllib2.urlopen(url)
html = response.read()
with open('AusCentralbank.csv', 'wb') as f:
f.write(html)

Related

Download xml-file and save it to a text file

I'm very new at programming and have a problem. I need to create a Python function that uses the request external module to download an XML-file, and then saves the text of the response to a text file.
So far i've tried this:
import requests
def downloading_xml():
r = requests.get('https://www.w3schools.com/xml/simplexsl.xml')
print(r.text)
But I don't get it quite right. I think my main problem is the last part, I don't know how to save the text of the response to a text file. Any ideas? Thanks in advance!
Here you go. If you want to know more about Python file operations follow this link
Python I/O
import requests
def downloading_xml():
r = requests.get('https://www.w3schools.com/xml/simplexsl.xml')
print(r.text)
with open("filename.txt", "w+") as f:
f.write(r.text)
f.close()
Now call the function
downloading_xml()
To save to a text file you can do something like this:
textfile=open("anyname.xml",'w')
textfile.write(r.text)
textfile.close()
you may need to include the path to the file as well

unicode issue in python when writing to file

I have an csv sheet that i read it like this:
with open(csvFilePath, 'rU') as csvFile:
reader = csv.reader(csvFile, delimiter= '|')
numberOfMovies = 0
for row in reader:
title = row[1:2][0]
as you see, i am taking the value of title
Then i surf the internet for some info about that value and then i write to a file, the writing is like this:
def writeRDFToFile(rdf, fileName):
f = open("movies/" + fileName + '.ttl','a')
try:
#rdf = rdf.encode('UTF-8')
f.write(rdf) # python will convert \n to os.linesep
except:
print "exception happened for movie " + movieTitle
f.close()
In that function, i am writing the rdf variable to a file.
As you see there is a commetted line
If the value of rdf variable contains unicode char and that line was not commeted, that code doesn't write anything to the file.
However, if I just commet that line, that code writes to a file.
Okay you can say that: commit that line and everything will be fine, but that is not correct, because i have another java process (which is Fuseki server) that reads the file and if the file contains unicode chars, it throws an error.
so i need to solve the file myself, i need to encode that data to ut8,
help please
The normal csv library can have difficulty writing unicode to files. I suggest you use the unicodecsv library instead of the csv library. It supports writing unicode to CSVs.
Practically speaking, just write:
import unicodecsv as csv

File size increasing after extraction?

This is a pretty general question, and I don't even know whether this is the correct community for the question, if not just tell me.
I have recently had an html file from which I was extracting ~90 lines of HTML code (total lines were ~8000). I did this with a simple Python script. I stored my output (the shortened html code) into a text file. Now I am curious because the file size has increased? what could cause the file to get bigger after I extracted some part out of it?
File size before: 319.374 Bytes
File size after: 321.516 Bytes
Is this because of the different file formats html and txt?
Any help or suggestions appreciated!
Code:
import glob
import os
import re
def extractor():
os.chdir(r"F:\Test") # the directory containing my html
for file in glob.iglob("*.html"): # iterates over all files in the directory ending in .html
with open(file, encoding="utf8") as f, open((file.rsplit(".", 1)[0]) + ".txt", "w", encoding="utf8") as out:
contents = f.read()
extract = re.compile(r'StartTag.*?EndTag', re.S)
cut = extract.sub('', contents)
if re.search(extract, contents) is not None:
out.write(cut)
out.close()
extractor()
EDIT: I also tried using ".html" instead of ".txt" as filem format for my output file. However the difference still remains.
This code does not write to the original HTML file. Something else must be causing the increased file size .

unexpected line breaks in python after writing to csv

i have a code that updates CSVs from a server. it gets data using:
a = urllib.urlopen(url)
data = a.read().strip()
then i append the data to the csv by
f = open(filename+".csv", "ab")
f.write(ndata)
f.close()
the problem is that randomly, a line in the csv gets written like this (or gets a line break somewhere along the csv):
2,,,,,
015-04-21 13:00:00,18,998,50,31,2293
instead of its usual form:
2015-04-21 13:00:00,6,1007,29,25,2394
2015-04-21 13:00:00,7,1004,47,26,2522
i tried printing my data in shell after the program ran, and it would show that the broken csv entry actually appears to be normal.
hope you guys can help me out. thanks.
running python 2.7.9 on win8.1
What actions are performed on your "ndata" variable ?
You should use the csv module to manage CSV files : https://docs.python.org/2/library/csv.html
Edit after comment :
If you do not want to use the "csv" module I linked to you, instead of
a = urllib.urlopen(url)
data = a.read().strip()
ndata = data.split('\n')
f.write('\n'.join(ndata[1:]))
you should do this :
a = urllib.urlopen(url)
f.writelines(a.readlines()[1:])
I don't see any reason explaining your randomly unwanted "\n" if you are sure that your incoming data is correct. Do you manage very long lines ?
I recommand you to use the csv module to read your input : you'll be sure to have a valid CSV content if your input is correct.

Python urllib2 Images Distorted

I'm making a program using the website http://placekitten.com, but I've run into a bit of a problem. Using this:
im = urllib2.urlopen(url).read()
f = open('kitten.jpeg', 'w')
f.write(im)
f.close()
The image turns out distorted with mismatched colors, like this:
http://imgur.com/zVg64Kn.jpeg
I was wondering if there was an alternative to extracting images with urllib2. If anyone could help, that would be great!
You need to open the file in binary mode:
f = open('kitten.jpeg', 'wb')
Python will otherwise translate line endings to the native platform form, a transformation that breaks binary data, as documented for the open() function:
The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading. Thus, when opening a binary file, you should append 'b' to the mode value to open the file in binary mode, which will improve portability.
When copying data from a URL to a file, you could use shutil.copyfileob() to handle streaming efficiently:
from shutil import copyfileobj
im = urllib2.urlopen(url)
with open('kitten.jpeg', 'wb') as out:
copyfileobj(im, out)
This will read data in chunks, avoiding filling memory with large blobs of binary data. The with statement handles closing the file object for you.
Change
f = open('kitten.jpeg', 'w')
to read
f = open('kitten.jpeg', 'wb')
See http://docs.python.org/2/library/functions.html#open for more information. What's happening is that the newlines in the jpeg are getting modified in the process of saving, and opening as a binary file will prevent this.
If you're using Windows, you have to open the file in binary mode:
f = open('kitten.jpeg', 'wb')
Or more Pythonically:
import urllib2
url = 'http://placekitten.com.s3.amazonaws.com/homepage-samples/200/140.jpg'
image = urllib2.urlopen(url).read()
with open('kitten.jpg', 'wb') as handle:
handle.write(image)

Categories

Resources