CSV file taken through API won't print properly into individual lines - python

I'm quite new to Python and am trying to learn as much as I can by watching videos/reading tutorials.
I was following this video on how to take data from Quandl. I know there is a specific module for python already, but I wanted to learn how to take it from the website if necessary. My issue is that when I try to emulate the code around 9:50 and print the result, python doesn't split the lines in the CSV file. I understand he's using python 2.x, while I'm using 3.4.
Here's the code I use:
import urllib
from urllib.request import urlopen
def grabQuandl(ticker):
    endLink = 'sort_order=desc'#without authtoken
    try:
        salesRev = urllib.request.urlopen('https://www.quandl.com/api/v1/datasets/SEC/'+ticker+'_SALESREVENUENET_Q.csv?&'+endLink).read()
        print (salesRev)
    except Exception as e:
        print ('failed the main quandl loop for reason of', str(e))
grabQuandl('AAPL')
And this is what gets printed:
b'Date,Value\n2009-06-27,8337000000.0\n2009-12-26,15683000000.0\n2010-03-27,13499000000.0\n2010-06-26,15700000000.0\n2010-09-25,20343000000.0\n2010-12-25,26741000000.0\n2011-03-26,24667000000.0\n2011-06-25,28571000000.0\n2011-09-24,28270000000.0\n2011-12-31,46333000000.0\n2012-03-31,39186000000.0\n2012-06-30,35023000000.0\n2012-09-29,35966000000.0\n2012-12-29,54512000000.0\n2013-03-30,43603000000.0\n2013-06-29,35323000000.0\n2013-09-28,37472000000.0\n2013-12-28,57594000000.0\n2014-03-29,45646000000.0\n2014-06-28,37432000000.0\n2014-09-27,42123000000.0\n2014-12-27,74599000000.0\n2015-03-28,58010000000.0\n'
I get that the \n is some sort of line splitter, but it's not working like in the video. I've googled for possible solutions, such as doing a for loop, using read().split(), but at best they simply remove the \n. I can't get the output into a table like in the video. What am I doing wrong?

.read() gives you back a byte string, and when you print that directly you get the result you saw. Notice the b before the opening quote: it indicates a byte string.
You should decode the bytes before printing (or decode directly when calling .read()). An example:
import urllib
from urllib.request import urlopen
def grabQuandl(ticker):
    endLink = 'sort_order=desc'#without authtoken
    try:
        salesRev = urllib.request.urlopen('https://www.quandl.com/api/v1/datasets/SEC/'+ticker+'_SALESREVENUENET_Q.csv?&'+endLink).read().decode('utf-8')
        print (salesRev)
    except Exception as e:
        print ('failed the main quandl loop for reason of', str(e))
grabQuandl('AAPL')
The above decodes the returned data as UTF-8. Use whichever encoding the data is actually in.
Example to show the print behavior:
>>> s = b'asd\nbcd\n'
>>> print(s)
b'asd\nbcd\n'
>>> print(s.decode('utf-8'))
asd
bcd
>>> type(s)
<class 'bytes'>
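If the goal is a table of rows rather than just printed text, the decoded string can be handed to the standard csv module. This is only a minimal sketch, assuming the payload looks like the output shown in the question. The sample bytes are a shortened copy of that output, and the name parse_csv_bytes is made up for illustration:

```python
import csv
import io

def parse_csv_bytes(raw, encoding='utf-8'):
    """Decode a bytes payload and split it into a list of CSV rows."""
    text = raw.decode(encoding)
    # newline='' lets the csv module handle line endings itself.
    return list(csv.reader(io.StringIO(text, newline='')))

# A shortened sample of the bytes printed in the question.
sample = b'Date,Value\n2009-06-27,8337000000.0\n2009-12-26,15683000000.0\n'
rows = parse_csv_bytes(sample)
for date, value in rows:
    print(date, value)
```

Each row comes back as a list of strings, so the values would still need float() if you want to do arithmetic on them.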

Handling `&#xD;&#xA;` in Python

Problem Background:
I have an XML file that I'm importing into BeautifulSoup and parsing through. One node has the following:
<DIAttribute name="ObjectDesc" value="Line1&#xD;&#xA;Line2&#xD;&#xA;Line3"/>
Notice that the value has &#xD; and &#xA; within the text. I understand those are the XML representation of carriage return and line feed.
When I import into BeautifulSoup, the value gets converted into the following:
<DIAttribute name="ObjectDesc" value="Line1
Line2
Line3"/>
You'll notice that the &#xD;&#xA; gets converted to a newline.
My use case requires that the value remains as the original. Any idea how to get that to stay? Or convert it back?
Source Code:
python: (2.7.11)
from bs4 import BeautifulSoup #version 4.4.0
s = BeautifulSoup(open('test.xml'),'lxml-xml',from_encoding="ansi")
print s.DIAttribute
#XML file looks like
'''
<?xml version="1.0" encoding="UTF-8" ?>
<DIAttribute name="ObjectDesc" value="Line1&#xD;&#xA;Line2&#xD;&#xA;Line3"/>
'''
Notepad++ says the encoding of the source XML file is ANSI.
Things I've Tried:
I've scoured the documentation without any success.
Variations for line 3:
print s.DIAttribute.prettify('ascii')
print s.DIAttribute.prettify('windows-1252')
print s.DIAttribute.prettify('ansi')
print s.DIAttribute.prettify('utf-8')
print s.DIAttribute['value'].replace('\r','&#xD;').replace('\n','&#xA;') #This works, but it feels like a band-aid, and other problems will likely remain.
Any ideas anyone? I appreciate any comments/suggestions.
Just for the record, first the libraries that do NOT properly handle the entity: BeautifulSoup(data, convertEntities=BeautifulSoup.HTML_ENTITIES), lxml.html.soupparser.unescape, xml.sax.saxutils.unescape
And this is what works (in Python 2.x):
import sys
import HTMLParser
## accept file name as argument, or read stdin if nothing passed
data = len(sys.argv) > 1 and open(sys.argv[1]).read() or sys.stdin.read()
parser = HTMLParser.HTMLParser()
print parser.unescape(data)
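For the opposite direction, turning literal carriage returns and line feeds back into character references as the workaround in the question does, the standard library's xml.sax.saxutils.escape accepts a dictionary of extra replacements. This is a hedged sketch in Python 3, not a BeautifulSoup feature. The names CRLF_REFS and escape_newlines are made up for illustration, and the hexadecimal form of the references is an assumption about what the target file expects:

```python
from xml.sax.saxutils import escape

# Assumed mapping: literal CR and LF to hexadecimal character references.
CRLF_REFS = {'\r': '&#xD;', '\n': '&#xA;'}

def escape_newlines(value):
    """Escape &, < and >, then re-encode CR/LF as XML character references."""
    return escape(value, CRLF_REFS)

parsed_value = 'Line1\r\nLine2\r\nLine3'  # what a parser hands back for the attribute
restored = escape_newlines(parsed_value)
print(restored)
```

Note that escape() does not touch quote characters, so a value containing " would need extra handling before being written back into an attribute.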

Syntax Issue in Python urllib2?

Am trying to test out urllib2. Here's my code:
import urllib2
response = urllib2.urlopen('http://pythonforbeginners.com/')
print response.info()
html = response.read()
response.close()
When I run it, I get:
SyntaxError: invalid syntax. The caret points to line 3 (the print line). Any idea what's going on here? I'm just trying to follow a tutorial and this is the first thing they do...
Thanks,
Mariogs
In Python3 print is a function. Therefore it needs parentheses around its argument:
print(response.info())
In Python2, print is a statement, and hence does not require parentheses.
After correcting the SyntaxError, as alecxe points out, you'll probably encounter an ImportError next. That is because the Python2 module called urllib2 was renamed to urllib.request in Python3. So you'll need to change it to
import urllib.request as request
response = request.urlopen('http://pythonforbeginners.com/')
As you can see, the tutorial you are reading is meant for Python2. You might want to find a Python3 tutorial or Python3 urllib HOWTO to avoid running into more of these problems.
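Putting both fixes together, a Python 3 version of the tutorial snippet might look roughly like the sketch below. The helper name fetch is made up for illustration, and falling back to UTF-8 when the server does not declare a charset is an assumption rather than something the tutorial states:

```python
import urllib.request

def fetch(url):
    """Return (headers, decoded body text) for a URL, Python 3 style."""
    # The with-block closes the response, replacing the explicit response.close().
    with urllib.request.urlopen(url) as response:
        headers = response.info()
        # Assumption: fall back to UTF-8 if no charset is declared.
        charset = headers.get_content_charset() or 'utf-8'
        body = response.read().decode(charset)
    return headers, body
```

Calling headers, html = fetch('http://pythonforbeginners.com/') and then print(headers) would correspond to the original script.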

Facebook Graph API encoding - Python

I am going around in circles and tried so many different ways so I guess my core understanding is wrong. I would be grateful for help in understanding my encoding/decoding issues.
import urllib2
result = urllib2.urlopen("https://graph.facebook.com/163146530455639")
rawdata = result.read().decode('utf-8')
print "HEADER: " + str(result.info())
print "I want this to work ", rawdata.find('http://www.facebook.com')
print "I dont want this to work ", rawdata.find('http:\/\/www.facebook.com')
I guess what I'm getting isn't utf-8 even though the header seems to say it is. Or, as a newbie to Python, I'm doing something dumb. :(
Thanks for any help,
Phil
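You're getting JSON back from Facebook, so the easiest thing to do is use the built-in json module to decode it (provided you're using Python 2.6+; otherwise you'll have to install a JSON library separately). The \/ sequences in the raw text are JSON's optional escape for /, not an encoding problem, and the parser turns them back into plain slashes.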
import json
import urllib2
result = urllib2.urlopen("https://graph.facebook.com/163146530455639")
rawdata = result.read()
jsondata = json.loads(rawdata)
print jsondata['link']
gives you:
u'http://www.facebook.com/GrosvenorCafe'
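The effect can be checked without hitting the network. The sketch below is in Python 3 and uses a hand-written sample payload containing only the link field, which is an assumption about the response shape and not real Facebook output:

```python
import json

# Hypothetical raw response: JSON with forward slashes escaped as \/.
raw = b'{"link": "http:\\/\\/www.facebook.com\\/GrosvenorCafe"}'

text = raw.decode('utf-8')
# Searching the undecoded JSON text fails, because the slashes are escaped.
found_in_text = text.find('http://www.facebook.com')

# Parsing the JSON removes the escapes.
data = json.loads(text)
link = data['link']
print(found_in_text, link)
```

So the find() on the raw text fails for a JSON reason, not a UTF-8 one; once parsed, the link is an ordinary string.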

Taking String arguments for a function without quotes

I've got a function meant to download a file from a URL and write it to a disk, along with imposing a particular file extension. At present, it looks something like this:
import requests
import os
def getpml(url,filename):
    psc = requests.get(url)
    outfile = os.path.join(os.getcwd(),filename+'.pml')
    f = open(outfile,'w')
    f.write(psc.content)
    f.close()
    try:
        with open(outfile) as f:
            print "File Successfully Written"
    except IOError as e:
        print "I/O Error, File Not Written"
    return
When I try something like
getpml('http://www.mysite.com/data.txt','download') I get the appropriate file sitting in the current working directory, download.pml. But when I feed the function the same arguments without the ' symbol, Python says something to the effect of "NameError: name 'download' is not defined" (the URL produces a syntax error). This even occurs if, within the function itself, I use str(filename) or things like that.
I'd prefer not to have to input the arguments of the function in with quote characters - it just makes entering URLs and the like slightly more difficult. Any ideas? I presume there is a simple way to do this, but my Python skills are spotty.
No, that cannot be done. When you are typing Python source code you have to type quotes around strings. Otherwise Python can't tell where the string begins and ends.
It seems like you have a more general misunderstanding too. Calling getpml(http://www.mysite.com) without quotes isn't calling it with "the same argument without quotes". There simply isn't any argument there at all. It's not like there are "arguments with quotes" and "arguments without quotes". Python isn't like speaking a natural human language where you can make any sound and it's up to the listener to figure out what you mean. Python code can only be made up of certain building blocks (object names, strings, operators, etc.), and URLs aren't one of those.
You can call your function differently:
data = """\
http://www.mysite.com/data1.txt download1
http://www.mysite.com/data2.txt download2
http://www.mysite.com/data3.txt download3
"""
for line in data.splitlines():
    url, filename = line.strip().split()
    getpml(url, filename)
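Another option, if the aim is just to avoid typing quotes, is to take the values from the command line, where they arrive as strings already. This is a rough sketch using the standard argparse module. The name build_parser and the idea of wrapping getpml in a script are assumptions for illustration, not part of the original code:

```python
import argparse

def build_parser():
    """Build a parser for two positional string arguments."""
    parser = argparse.ArgumentParser(description='Download a URL to <filename>.pml')
    parser.add_argument('url', help='address of the file to download')
    parser.add_argument('filename', help='output name without the extension')
    return parser

# parse_args() reads sys.argv by default; a list is passed here for illustration.
args = build_parser().parse_args(['http://www.mysite.com/data.txt', 'download'])
print(args.url, args.filename)
```

In a script you would call parse_args() with no argument and then getpml(args.url, args.filename). Keep in mind that the shell has its own quoting rules, so a URL containing characters such as & or ? may still need quotes there.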

Python - writing lines from file into IRC buffer

Ok, so I am trying to write a Python script for XCHAT that will allow me to type "/hookcommand filename" and then will print that file line by line into my irc buffer.
EDIT: Here is what I have now
__module_name__ = "scroll.py"
__module_version__ = "1.0"
__module_description__ = "script to scroll contents of txt file on irc"
import xchat, random, os, glob, string
def gg(ascii):
    ascii = glob.glob("F:\irc\as\*.txt")
    for textfile in ascii:
        f = open(textfile, 'r')
def gg_cb(word, word_eol, userdata):
    ascii = gg(word[0])
    xchat.command("msg %s %s"%(xchat.get_info('channel'), ascii))
    return xchat.EAT_ALL
xchat.hook_command("gg", gg_cb, help="/gg filename to use")
Well, your first problem is that you're referring to a variable ascii before you define it:
ascii = gg(ascii)
Try making that:
ascii = gg(word[0])
Next, you're opening each file returned by glob... only to do absolutely nothing with them. I'm not going to give you the code for this: please try to work out what it's doing or not doing for yourself. One tip: the xchat interface is an extra complication. Try to get it working in plain Python first, then connect it to xchat.
There may well be other problems - I don't know the xchat api.
When you say "not working", try to specify exactly how it's not working. Is there an error message? Does it do the wrong thing? What have you tried?
