Python Extracting Data from JSON without a label? - python

The API here: https://api.bitfinex.com/v2/tickers?symbols=ALL
does not have any labels, and I want to extract all of the tBTCUSD, tLTCUSD, etc., basically everything that is not a number. Normally, if the data were labeled, I would extract it with something like:
data['name']
but this API does not have labels. How can I get this info with Python?

You can do it like this:
import requests
j = requests.get('https://api.bitfinex.com/v2/tickers?symbols=ALL').json()
mydict = {}
for i in j:
    mydict[i[0]] = i[1:]
Or using dictionary comprehension:
mydict = {i[0]: i[1:] for i in j}
Then access it as:
mydict['tZRXETH']
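To try the dictionary approach without calling the API, here is a minimal sketch on hard-coded data. The name sample and the numbers in it are made up for illustration; only the shape (a list of lists with the symbol first) is taken from the answer above.

```python
# Hypothetical sample shaped like the API response: a list of lists
# with the symbol in position 0. The numbers are invented.
sample = [
    ["tBTCUSD", 6500.1, 12.5],
    ["tLTCUSD", 55.2, 130.0],
    ["tZRXETH", 0.0031, 4200.0],
]

# Map each symbol to the remaining fields of its row.
mydict = {row[0]: row[1:] for row in sample}

print(mydict["tZRXETH"])
```

Looking up a symbol that is not in the response raises KeyError, so mydict.get(symbol) may be the safer choice when the symbol list is not known in advance.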

I don't have access to Python right now, but it looks like they're organized in a superarray of several subarrays.
You should be able to extract everything (the superarray) as data, and then do a:
for array in data:
    print(array[0])
Not sure if this answers your question. Let me know!

Even if it doesn't have labels (or, more specifically, if it's not a JSON object) it's still a perfectly legal piece of JSON, since it's just some arrays contained within a parent array.
Assuming you can already get the text from the api, you can load it as a Python object using json.loads:
import json
data = json.loads(your_data_as_string)
Then, since the labels you want to extract are always in the first position of the arrays, you can store them in a list using a list comprehension:
labels = [x[0] for x in data]
labels will be:
['tBTCUSD', 'tLTCUSD', 'tLTCBTC', 'tETHUSD', 'tETHBTC', 'tETCBTC', ...]
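The two steps above can be sketched end to end as follows. The string below is a made-up stand-in for the response text, and the extra isinstance checks are an assumption added here to skip any row that does not start with a string; they are not part of the original answer.

```python
import json

# Invented text standing in for the API response body.
your_data_as_string = (
    '[["tBTCUSD", 6500.1, 12.5], '
    '["tLTCUSD", 55.2, 130.0], '
    '["tLTCBTC", 0.0085, 900.0]]'
)

data = json.loads(your_data_as_string)

# Keep the first element of every non-empty row that starts with a string.
labels = [
    x[0]
    for x in data
    if isinstance(x, list) and x and isinstance(x[0], str)
]

print(labels)
```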

Related

Assign a label to each element of an array in Python

Basically, I have two arrays. For the sake of simplicity, the following:
array_notepad = []
array_images = []
Some magic happens and they are populated, i.e. data is loaded, for array_notepad data is read from a notepad file whilst array_images is populated with the RGB values from a folder containing images.
How do I use array_notepad as a label of array_images?
That is, the label of array_images[0] is array_notepad[0], the label of array_images[1] is array_notepad[1], and so on until array_images[999] is array_notepad[999].
If it makes any difference, I am using glob and cv2 to read the image data, and a normal Python file reader to read the content of the notepad file.
Thanks a lot for your help!
Your question isn't entirely clear on what your expected output should be. You mention 'label' - to me it sounds like you're describing key-value pairs i.e. a dictionary.
In which case you should be able to use the zip function as described in this question: Convert two lists into a dictionary
I assume you want to create a dictionary from the two lists. If so, you could do it as follows:
array_notepad = ['label1', 'label2', 'label3']
array_images = ['rgb1', 'rgb2', 'rgb3']
d = { label: value for label, value in zip(array_notepad, array_images) }
d
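An equivalent and slightly shorter form passes the zipped pairs straight to dict. This is a sketch on the same placeholder lists as above; the pairs variable is an addition for the case where labels may repeat, since a dict keeps only the last value for a repeated key.

```python
array_notepad = ['label1', 'label2', 'label3']
array_images = ['rgb1', 'rgb2', 'rgb3']

# zip pairs items by position; dict turns the pairs into label -> value.
d = dict(zip(array_notepad, array_images))

# A list of (label, image) tuples keeps every pair, even with repeated labels.
pairs = list(zip(array_notepad, array_images))

print(d['label2'])
print(pairs[0])
```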

PairRDD(K, L<V>) to multiple files by key serializing each value in L<V>

I have a PairRDD with a set of keys and a list of values per key. Each value in the list is a JSON object that I loaded at the beginning of my Spark app. How can I iterate over each value of the list in my pair RDD to transform it to a string, then save the whole content of the key to a file?
My input files look like:
{cat:'red',value:'asd'}
{cat:'green',value:'zxc'}
{cat:'red',value:'jkl'}
The PairRDD looks like
('red', [{cat:'red',value:'asd'},{cat:'red',value:'jkl'}])
('green', [{cat:'green',value:'zxc'}])
So, as you can see, I'd like to serialize each JSON object in the value list back to a string so I can easily saveAsTextFile(). Of course, I'm trying to save a separate file for each key.
The way I got here:
rawcatRdd = sc.textFile("hdfs://x.x.x.../unstructured/cat-0-500.txt")
import json
categoriesJson = rawcatRdd.map(lambda x: json.loads(x))
categories = categoriesJson
catByDate = categories.map(lambda x: (x['cat'], x))
catGroup = catByDate.groupByKey()
catGroupArr = catGroup.mapValues(lambda x : list(x))
Ideally I want to create a cat-red.txt that looks like:
{cat:'red',value:'asd'}
{cat:'red',value:'jkl'}
and the same for the rest of the keys.
I already looked at this answer, but I'm slightly lost as to how to process each value in the list before I save the contents to a file.
Thanks in advance!

Urlencode dictionary using Python - naming key and value in the url

I am attempting to generate a URL link in the following format using urllib and urlencode.
<img src=page.psp?KEY=%28SpecA%2CSpecB%29&VALUE=1&KEY=%28SpecA%2C%28SpecB%2CSpecC%29%29&VALUE=2>
I'm trying to feed data from my dictionary into the urllib.urlencode() function; however, I need to get it into a format where the keys and values each have a parameter name, like below. So the keys from my dictionary will be KEY and the values will be VALUE.
wanted = urllib.urlencode( [("KEY",v1),("VALUE",v2)] )
req.write("<a href=page.psp?%s>" % wanted)
The problem is that I want the URL as above, i.e. KEY=(SpecA,SpecB) VALUE=1, KEY=(SpecA,SpecB,SpecC) VALUE=2, but instead I am getting what is below:
KEY=%28SpecA%2CSpecB%29%2C%28%28SpecA%2CSpecB%29%2CSpecC%29&VALUE=1%2C2
So far I have extracted keys and values from the dictionary, extracted into tuples, lists, strings and also tried dict.items() but it hasn't helped much as I still can't get it to go into the format I want. Also I am doing this using Python server pages which is why I keep having to print things as a string due to constant string errors. This is part of what I have so far:
k = (str(dict))
ver1 = dict.keys()
ver2 = dict.values()
new = urllib.urlencode(function)
f = urllib.urlopen("page.psp?%s" % new)
I am wondering what I need to change in terms of extracting values from the dictionary/converting them to different formats in order to get the output I want? Any help would be appreciated and I can add more of my code (as messy as it has become) if need be. Thanks.
This should give you the format you want:
import urllib
data = {
    '(SpecA,SpecB)': 1,
    '(SpecA,SpecB,SpecC)': 2,
}
params = []
for k, v in data.iteritems():
    params.append(('KEY', k))
    params.append(('VALUE', v))
new = urllib.urlencode(params)
Note that the KEY/VALUE pairings may not be the order you want, given that dicts are unordered.
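The answer above targets Python 2. As a sketch of the same idea on Python 3, where urlencode lives in urllib.parse and dicts iterate with items(), the pairs can be built like this. The dictionary contents are the ones from the answer; on Python 3.7+ a dict keeps insertion order, so the pairs come out in the order they were written.

```python
from urllib.parse import urlencode

data = {
    '(SpecA,SpecB)': 1,
    '(SpecA,SpecB,SpecC)': 2,
}

# Build a flat list of (name, value) pairs so KEY and VALUE can repeat.
params = []
for k, v in data.items():
    params.append(('KEY', k))
    params.append(('VALUE', v))

# urlencode accepts a sequence of pairs and percent-encodes each value.
new = urlencode(params)

print(new)
# KEY=%28SpecA%2CSpecB%29&VALUE=1&KEY=%28SpecA%2CSpecB%2CSpecC%29&VALUE=2
```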

Parse json data

My data.json is
{"a":[{"b":{"c":{ "foo1":1, "foo2":2, "foo3":3, "foo4":4}}}],"d":[{"e":{"bar1":1, "bar2":2, "bar3":3, "bar4":4}}]}
I am able to list the key/value pairs. My code is:
#! /usr/bin/python
import json
from pprint import pprint
with open('data.json') as data_file:
    data = json.load(data_file)
pprint(data["d"][0]["e"])
Which gives me:
{u'bar1': 1, u'bar2': 2, u'bar3': 3, u'bar4': 4}
But I want to display only the keys, without the quotes and the u prefix, like this:
bar1, bar2, bar3, bar4
Can anybody suggest anything? It need not be only in python, can be in shell script also.
The keys of this object are instances of the unicode string class, so printing the dict shows them the way they appear in your post.
This is because a dict's string representation (__repr__ and/or __str__) shows the repr of each object it contains, not the result of printing that object. This is an important distinction, for example:
In [86]: print u'hi'
hi
In [87]: x = u'hi'
In [88]: x
Out[88]: u'hi'
In [89]: print x
hi
This should work for you, assuming that printing the keys together as one comma-separated unicode string is fine:
print ", ".join(data["d"][0]["e"])
You can achieve this using the keys method of dict too, but it's not strictly necessary.
print ', '.join(data["d"][0]["e"].keys())
data["d"][0]["e"] returns a dict. In Python 2, you could get the keys of that dict with something like this:
k = data["d"][0]["e"].keys()
print(", ".join(k))
In Python 3, wrap k in a list like this:
k = list(data["d"][0]["e"].keys())
print(", ".join(k))
Even simpler, join will iterate over the keys of the dict.
print(", ".join(data["d"][0]["e"]))
Thanks to @thefourtheye for pointing this out.
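For a version that runs as-is on Python 3, here is a small sketch that inlines the JSON from the question instead of reading a file. The variable names raw and keys_line are introduced here for illustration only.

```python
import json

# The JSON text from the question, inlined so no file is needed.
raw = (
    '{"a":[{"b":{"c":{"foo1":1,"foo2":2,"foo3":3,"foo4":4}}}],'
    '"d":[{"e":{"bar1":1,"bar2":2,"bar3":3,"bar4":4}}]}'
)

data = json.loads(raw)

# Iterating over a dict yields its keys, so join can take the dict directly.
keys_line = ", ".join(data["d"][0]["e"])

print(keys_line)
# bar1, bar2, bar3, bar4
```

In Python 3 all strings are unicode, so the u prefix never shows up, even when the dict itself is printed.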

Splitting json data in python

I'm trying to manipulate a list of items in Python, but I'm getting the error "AttributeError: 'list' object has no attribute 'split'".
I understand that a list has no .split method, but I don't know what else to do. Below is a copy-paste of the relevant part of my code.
tourl = 'http://data.bitcoinity.org/chart_data'
tovalues = {'timespan':'24h','resolution':'hour','currency':'USD','exchange':'all','mining_pool':'all','compare':'no','data_type':'price_volume','chart_type':'line_bar','smoothing':'linear','chart_types':'ccacdfcdaa'}
todata = urllib.urlencode(tovalues)
toreq = urllib2.Request(tourl, todata)
tores = urllib2.urlopen(toreq)
tores2 = tores.read()
tos = json.loads(tores2)
tola = tos["data"]
for item in tola:
    ting = item.get("values")
    ting.split(',')[2]  # <----- ERROR
    print(ting)
To understand what I'm trying to do, you will also need to see the JSON data. ting outputs this:
[
[1379955600000L, 123.107310846774], [1379959200000L, 124.092526428571],
[1379962800000L, 125.539504822835], [1379966400000L, 126.27024617931],
[1379970000000L, 126.723474983766], [1379973600000L, 126.242406356837],
[1379977200000L, 124.788410570987], [1379980800000L, 126.810084904632],
[1379984400000L, 128.270580796748], [1379988000000L, 127.892411269036],
[1379991600000L, 126.140579640523], [1379995200000L, 126.513705084746],
[1379998800000L, 128.695124951923], [1380002400000L, 128.709738051044],
[1380006000000L, 125.987767097378], [1380009600000L, 124.323433535528],
[1380013200000L, 123.359378559603], [1380016800000L, 125.963250678733],
[1380020400000L, 125.074618194444], [1380024000000L, 124.656345088853],
[1380027600000L, 122.411303435449], [1380031200000L, 124.145747100372],
[1380034800000L, 124.359452274881], [1380038400000L, 122.815357211394],
[1380042000000L, 123.057706915888]
]
[
[1379955600000L, 536.4739135], [1379959200000L, 1235.42506637],
[1379962800000L, 763.16329656], [1379966400000L, 804.04579319],
[1379970000000L, 634.84689741], [1379973600000L, 753.52716718],
[1379977200000L, 506.90632968], [1379980800000L, 494.473732950001],
[1379984400000L, 437.02095093], [1379988000000L, 176.25405034],
[1379991600000L, 319.80432715], [1379995200000L, 206.87212398],
[1379998800000L, 638.47226435], [1380002400000L, 438.18036666],
[1380006000000L, 512.68490443], [1380009600000L, 904.603705539997],
[1380013200000L, 491.408088450001], [1380016800000L, 670.275397960001],
[1380020400000L, 767.166941339999], [1380024000000L, 899.976089609997],
[1380027600000L, 1243.64963909], [1380031200000L, 1508.82429811],
[1380034800000L, 1190.18854705], [1380038400000L, 546.504592349999],
[1380042000000L, 206.84883264]
]
And ting[0] outputs this:
[1379955600000L, 123.187067936508]
[1379955600000L, 536.794013499999]
What I'm really trying to do is add up the values from ting[0] through ting[24] that come AFTER the comma in each pair. This made me try a split, but that does not work.
You already have a list; the commas are put there by Python to delimit the values only when printing the list.
Just access element 2 directly:
print ting[2]
This prints:
[1379962800000, 125.539504822835]
Each of the entries in item['values'] (so ting) is a list of two numbers, a timestamp and a value, so you can address each of those with index 0 and 1:
>>> print ting[2][0]
1379962800000
>>> print ting[2][1]
125.539504822835
To get a list of all the second values, you could use a list comprehension:
second_vals = [t[1] for t in ting]
When you load the data with json.loads, it is already parsed into a real list that you can slice and index as normal. If you want the data starting with the third element, just use ting[2:]. (If you just want the third element by itself, just use ting[2].)
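Since the goal in the question is to add up the second value of each pair, a minimal sketch of that last step could look like this. The ting list below is a shortened, made-up sample in the same [timestamp, value] shape, and the name total is an assumption, not something from the original code.

```python
# Invented sample in the same shape as one item's "values":
# a list of [timestamp, value] pairs.
ting = [
    [1379955600000, 123.5],
    [1379959200000, 124.25],
    [1379962800000, 125.0],
]

# Second element of every pair.
second_vals = [t[1] for t in ting]

# Sum over the first 25 entries (indexes 0 to 24); slicing a shorter
# list simply returns everything that is there.
total = sum(t[1] for t in ting[:25])

print(total)
```

Slicing with ting[:25] avoids an IndexError when fewer than 25 entries come back.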
