JSON output using Python includes unwanted symbols

I am using Python for the first time to create a simple JSON parser. However, when I print the JSON data to the console, the output includes many unwanted extra brackets and other symbols. I am running Python 2.7.10.
import json
from urllib2 import urlopen
response = urlopen("https://finance.yahoo.com/webservice/v1/symbols/allcurrencies/quote?format=json")
source = response.read()
# print(source)
data = json.loads(source)
# print(json.dumps(data, indent=2))
usd_rates = dict()
for item in data['list']['resources']:
    name = item['resource']['fields']['name']
    price = item['resource']['fields']['price']
    usd_rates[name] = price
    print(name, price)
And the output is as follows (screenshot omitted): each printed line is wrapped in parentheses, with a u prefix on the strings.
When I try to change the Python version to 3.7.10, the result is another screenshot (omitted here).

I think you are actually printing a tuple with Python 2 print syntax, and the u character is a unicode flag (see: What exactly do "u" and "r" string flags do, and what are raw string literals?).
Also, in Python 3 you couldn't use urllib2; you would have to use urllib.request instead.
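To see the difference, here is a quick demo (the values are made up):
>>> name, price = u'EUR/USD', u'1.1033'
>>> print(name, price)   # Python 2: print is a statement, so this builds and prints a tuple
(u'EUR/USD', u'1.1033')
>>> print(name, price)   # Python 3: print is a function, so the values print cleanly
EUR/USD 1.1033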
This code works for me (Python 3.6.5):
import json
from urllib.request import urlopen
response = urlopen("https://finance.yahoo.com/webservice/v1/symbols/allcurrencies/quote?format=json")
source = response.read()
data = json.loads(source)
usd_rates = dict()
for item in data['list']['resources']:
    name = item['resource']['fields']['name']
    price = item['resource']['fields']['price']
    usd_rates[name] = price
    print(name, price)
EDIT ---------
From the image you posted it looks like you have Python 3 installed, but your /usr/bin/python is a symbolic link to /usr/bin/python2.
If you want to run Python 3 by default, you could create an alias.
Check this link for more info: https://askubuntu.com/questions/320996/how-to-make-python-program-command-execute-python-3
(should be valid info for Macs too)

Convert name and price to string
print(str(name), str(price))
or use
name = str(item['resource']['fields']['name'])

Related

How do I automatically change a part of a url to query a website a set number of times in python?

I have very basic knowledge of python, so sorry if my question sounds dumb.
I need to query a website for a personal project I am doing, but I need to query it 500 times, and each time I need to change 1 specific part of the url, then take the data and upload it to gsheets.
(The () signifies what part of the url I need to change)
'https://www.alphavantage.co/query?function=BALANCE_SHEET&symbol=(symbol)&apikey=apikey'
I thought about using while and format {} to do it, but I was unsure how to change the string each time, bar writing out the names for variables by hand (defeating the whole purpose of this).
I already have a list of the symbols I need to use, but I don't know how to input them.
Example of how I get 1 piece of data:
import requests
url = 'https://www.alphavantage.co/query?function=BALANCE_SHEET&symbol=MMM&apikey=demo'
r = requests.get(url)
data = r.json()
Example of what I'd like to change it to:
import requests
url = 'https://www.alphavantage.co/query?function=BALANCE_SHEET&symbol=AOS&apikey=demo'
r = requests.get(url)
data = r.json()
#then change it to
import requests
url = 'https://www.alphavantage.co/query?function=BALANCE_SHEET&symbol=ABT&apikey=demo'
r = requests.get(url)
data = r.json()
so on and so forth, 500 times.
You might combine .format with a for loop; consider the following simple example
symbols = ["abc","xyz","123"]
for s in symbols:
    url = 'https://www.example.com?symbol={}'.format(s)
    print(url)
output
https://www.example.com?symbol=abc
https://www.example.com?symbol=xyz
https://www.example.com?symbol=123
You might also elect to use any other way of formatting, e.g. an f-string (requires Python 3.6 or newer), in which case the code would be
symbols = ["abc","xyz","123"]
for s in symbols:
    url = f'https://www.example.com?symbol={s}'
    print(url)
Alternatively, you might use the params optional argument of the requests.get function, as follows
import requests
symbols = ["abc","xyz","123"]
for s in symbols:
    r = requests.get('https://www.example.com', params={'symbol': s})
    print(r.url)
output
https://www.example.com/?symbol=abc
https://www.example.com/?symbol=xyz
https://www.example.com/?symbol=123
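Putting it together with the Alpha Vantage URL from the question (the symbol list here is a placeholder for your full list, and the pause between calls is an assumption; adjust it for your API key's rate limit):
import time
import requests

symbols = ["MMM", "AOS", "ABT"]  # your full list of ~500 symbols
data_by_symbol = {}
for s in symbols:
    r = requests.get('https://www.alphavantage.co/query',
                     params={'function': 'BALANCE_SHEET', 'symbol': s, 'apikey': 'demo'})
    data_by_symbol[s] = r.json()  # collect each response for the later upload to gsheets
    time.sleep(12)  # hypothetical pause; free API keys are rate-limited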

How to access specific element of API Get with Python's Request Package?

Hi I am currently using Python 3.X to run the following code:
import requests
jurisdiction = 'us'
name = 'netflix'
limit = 1
r = requests.get('https://opencorporates.com/reconcile?query={%22query%22:%22' + name + '%22,%20%22limit%22:' + str(limit) + ',%20%22jurisdiction_code%22:%22' + jurisdiction + '%22}')
print(type(r))
print(r.text)
The output of this is
<class 'requests.models.Response'>
{"result":[{"id":"/companies/gb/12022722","name":"AMAZON-UK LIMITED","type":[{"id":"/organization/organization","name":"Organization"}],"score":69.0,"match":false,"uri":"https://opencorporates.com/companies/gb/12022722"}],"duration":157.957621}
I want to be able to access the company name from the response and then add it to a list. So I can iterate over a bunch of names/jurisdictions and have a list at the end that I can export to csv (or whatever).
I think that using to.json or json.dump or something like that might work, but I am not sure exactly how to do this. I am open to importing more packages like pandas etc., should the need arise.
Import json and you can read the company names like this:
import json
data = json.loads(r.text)
# initialize your list
namesList = []
for s in data['result']:
    name = s['name']
    namesList.append(name)
You can try this.
data = {"result":[{"id":"/companies/gb/12022722","name":"AMAZON-UK LIMITED","type":[{"id":"/organization/organization","name":"Organization"}],"score":69.0,"match":False,"uri":"https://opencorporates.com/companies/gb/12022722"}],"duration":157.957621}
for i in data['result']:
    print(i['name'])

scraping data from json after using requests

I am trying to extract specific data from a requested JSON file.
After passing Authorization and using requests.get, I got my response; I think it is called a dictionary by Python coders and JSON by JavaScript coders.
It contains too much information that I don't need, and I would like to extract only one or two fields,
for example {"bio" : " hello world " }
and that JSON file contains more than one "bio";
for example, I am scraping 100 accounts and I would like to extract all the "bio" values in one pass.
So I tried this:
from bs4 import BeautifulSoup
import requests
headers = {"Authorization" : "xxxx"}
req = requests.get('website', headers = headers)
data = req.text
soup = BeautifulSoup(data,'html.parser')
titles = soup.find_all('span',{'class':'bio'})
for title in titles:
    print(title.text)
It didn't work, and I tried multiple other ideas with no success.
If possible, please write code that I can understand, since I am trying to learn from my mistakes.
Thanks.
The Aphid library I created is perfect for this.
From the command prompt:
py -m pip install Aphid
Then it's just as easy as loading your JSON data and searching it with Aphid.
import json
import requests  # needed for the requests.get call below
import Aphid

resp = requests.get(yoururl)
data = json.loads(resp.text)
results = Aphid.findall(data, 'bio')
results is now equal to a list of (key, value) tuples, one for every occurrence of the 'bio' key.
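Going by that return shape, printing every bio is then just:
for key, value in results:
    print(value)  # each extracted 'bio' string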
After you get your response, either:
you get a simple JSON file (in which case you import it into Python using json), or
you get an HTML file from which you can extract the JSON (using BeautifulSoup), which in turn you will parse using the json library.
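A minimal sketch of both branches, reusing the placeholder URL and Authorization header from the question (where exactly the JSON sits in the HTML is an assumption):
import json
import requests
from bs4 import BeautifulSoup

headers = {"Authorization": "xxxx"}
resp = requests.get('website', headers=headers)  # placeholder URL from the question

try:
    # Branch 1: the body is already JSON
    data = resp.json()  # shorthand for json.loads(resp.text)
except ValueError:
    # Branch 2: the body is HTML with the JSON embedded somewhere inside it
    soup = BeautifulSoup(resp.text, 'html.parser')
    tag = soup.find('script', {'type': 'application/json'})  # hypothetical location
    data = json.loads(tag.string)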

Is there a module to convert Chinese character to Japanese (kanji) or Korean (hanja) in Python 3?

I'd like to convert between CJK characters in Python 3.3. That is, I need to get 價 (Korean) from 价 (Chinese), and 価 (Japanese) from 價. Is there an external module for that?
Unihan information
The Unihan page about 價 provides a simplified variant (vs. traditional), but doesn't seem to give the Japanese/Korean ones. So...
CJKlib
I would recommend having a look at CJKlib, which has a feature section called Variants, stating:
Z-variant forms, which only differ in typeface
[update] Z-variant
Your sample character 價 (U+50F9) doesn't have a z-variant. However 価 (U+4FA1) has a kZVariant to U+50F9 價. This seems weird.
Further reading
Package documentation is available on Python.org/pypi/cjklib;
Z-variant form definition.
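If you'd rather query the Unihan data directly, the kZVariant field mentioned above lives in Unihan_Variants.txt, part of the Unihan database (download Unihan.zip from unicode.org). A rough sketch of extracting a variant map (the local file path is an assumption):
# Parse kZVariant entries from a local copy of Unihan_Variants.txt
variants = {}
with open('Unihan_Variants.txt', encoding='utf-8') as f:
    for line in f:
        if line.startswith('#') or not line.strip():
            continue
        code, field, value = line.split('\t')[:3]
        if field == 'kZVariant':
            src = chr(int(code[2:], 16))               # 'U+4FA1' -> '価'
            dst = chr(int(value.split()[0][2:], 16))   # first target code point
            variants[src] = dst

print(variants.get('\u4FA1'))  # 價, per the kZVariant noted above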
Here is a relatively complete conversion table. You can dump it to json for later use:
import requests
from bs4 import BeautifulSoup as BS
import json
def gen(soup):
    for tr in soup.select('tr'):
        tds = tr.select('td.tdR4')
        if len(tds) == 6:
            yield tds[2].string, tds[3].string
uri = 'http://www.kishugiken.co.jp/cn/code10d.html'
soup = BS(requests.get(uri).content, 'html5lib')
d = {}
for hanzi, kanji in gen(soup):
    a = d.get(hanzi, [])
    a.append(kanji)
    d[hanzi] = a
print(json.dumps(d, indent=4))
The code and its output are in this gist.

Download prices with python

I have tried this before. I'm completely at a loss for ideas.
On this page there is a dialog box to get quotes:
http://www.schwab.com/public/schwab/non_navigable/marketing/email/get_quote.html?
I used SPY, XLV, IBM, MSFT.
The output is the above page with a table of quotes.
If you have an account the quotes are real time, via cookie.
How do I get the table into Python using 2.6, with the data as a list or dictionary?
Use something like Beautiful Soup to parse the HTML response from the web site and load it into a dictionary: use the symbol as the key and a tuple of whatever data you're interested in as the value. Iterate over all the symbols returned and add one entry per symbol.
You can see examples of how to do this in Toby Segaran's "Programming Collective Intelligence". The samples are all in Python.
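A minimal sketch of that approach (using the bs4 API and a made-up table structure; the real page's tags and cell order will differ):
from bs4 import BeautifulSoup

html = """<table>
<tr><td>SPY</td><td>146.27</td><td>+0.85</td></tr>
<tr><td>IBM</td><td>165.30</td><td>-0.20</td></tr>
</table>"""

quotes = {}
soup = BeautifulSoup(html, 'html.parser')
for row in soup.find_all('tr'):
    cells = [td.get_text(strip=True) for td in row.find_all('td')]
    symbol, price, change = cells
    quotes[symbol] = (float(price), change)  # symbol -> tuple of the fields you want

print(quotes)  # {'SPY': (146.27, '+0.85'), 'IBM': (165.3, '-0.20')}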
First problem: the data is actually in an iframe in a frame; you need to be looking at https://www.schwab.wallst.com/public/research/stocks/summary.asp?user_id=schwabpublic&symbol=APC (where you substitute the appropriate symbol on the end of the URL).
Second problem: extracting the data from the page. I personally like lxml and xpath, but there are many packages which will do the job. I would probably expect some code like
import urllib2
import lxml.html
import re

re_dollars = r'\$?\s*(\d+\.\d{2})'

def urlExtractData(url, defs):
    """
    Get html from url, parse according to defs, return as dictionary

    defs is a list of tuples ("name", "xpath", "regex", fn)
      name  becomes the key in the returned dictionary
      xpath is used to extract a string from the page
      regex further processes the string (skipped if None)
      fn    casts the string to the desired type (skipped if None)
    """
    page = urllib2.urlopen(url)  # can modify this to include your cookies
    tree = lxml.html.parse(page)
    res = {}
    for name, path, reg, fn in defs:
        txt = tree.xpath(path)[0]
        if reg != None:
            match = re.search(reg, txt)
            txt = match.group(1)
        if fn != None:
            txt = fn(txt)
        res[name] = txt
    return res

def getStockData(code):
    url = 'https://www.schwab.wallst.com/public/research/stocks/summary.asp?user_id=schwabpublic&symbol=' + code
    defs = [
        ("stock_name",   '//span[@class="header1"]/text()', None, str),
        ("stock_symbol", '//span[@class="header2"]/text()', None, str),
        ("last_price",   '//span[@class="neu"]/text()',     re_dollars, float)
        # etc
    ]
    return urlExtractData(url, defs)
When called as
print repr(getStockData('MSFT'))
it returns
{'stock_name': 'Microsoft Corp', 'last_price': 25.690000000000001, 'stock_symbol': 'MSFT:NASDAQ'}
Third problem: the markup on this page is presentational, not structural, which says to me that code based on it will likely be fragile, i.e. any change to the structure of the page (or variation between pages) will require reworking your xpaths.
Hope that helps!
Have you thought of using Yahoo's quotes API?
see: http://developer.yahoo.com/yql/console/?q=show%20tables&env=store://datatables.org/alltableswithkeys#h=select%20*%20from%20yahoo.finance.quotes%20where%20symbol%20%3D%20%22YHOO%22
You will be able to dynamically generate a request to the website such as:
http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.quotes%20where%20symbol%20%3D%20%22YHOO%22&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys
And just poll it with a standard HTTP GET request. The response is in XML format.
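For example, building that request in Python 2 to match the question's version (this assumes the YQL endpoint above is still accepting requests):
import urllib
import urllib2

query = 'select * from yahoo.finance.quotes where symbol = "YHOO"'
params = urllib.urlencode({'q': query,
                           'env': 'store://datatables.org/alltableswithkeys'})
xml = urllib2.urlopen('http://query.yahooapis.com/v1/public/yql?' + params).read()
print xml  # XML response; parse with e.g. xml.etree.ElementTree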
matplotlib has a module that gets historical quotes from Yahoo:
>>> from matplotlib.finance import quotes_historical_yahoo
>>> from datetime import date
>>> from pprint import pprint
>>> pprint(quotes_historical_yahoo('IBM', date(2010, 11, 12), date(2010, 11, 18)))
[(734088.0,
  144.59,
  143.74000000000001,
  145.77000000000001,
  143.55000000000001,
  4731500.0),
 (734091.0,
  143.88999999999999,
  143.63999999999999,
  144.75,
  143.27000000000001,
  3827700.0),
 (734092.0,
  142.93000000000001,
  142.24000000000001,
  143.38,
  141.18000000000001,
  6342100.0),
 (734093.0,
  142.49000000000001,
  141.94999999999999,
  142.49000000000001,
  141.38999999999999,
  4785900.0)]
