Python List comprehension and JSON parsing - python

I'm new to Python and trying to figure out the best way to parse the values of a JSON object into an array, using a list comprehension.
Here is my code. I'm querying the publicly available iNaturalist API and would like to take specific parts of the JSON object it returns and put them into a numpy array:
import json
import urllib2
#Set Observations URL request for Resplendent Quetzal of Costa Rica
query = urllib2.urlopen("http://api.inaturalist.org/v1/observations?place_id=6924&taxon_id=20856&per_page=200&order=desc&order_by=created_at")
obSet = json.load(query)
#Print out Lat Long of observation
n = obSet['total_results']
for i in range(n):
    print obSet['results'][i]['location']
This all works fine and gives the following output:
9.5142456535,-83.8011438905
10.2335478381,-84.8517773638
10.3358965682,-84.9964271008
10.3744851815,-84.9871494128
10.2468720343,-84.9298072822
...
What I'd like to do next is replace the for loop with a list comprehension, and store the location value in a tuple. I'm struggling with the syntax in that I'm guessing it's something like this:
[(long,lat) for i in range(n) for (long,lat) in obSet['results'][i]['location']]
But this doesn't work...thanks for any help.

obSet['results'] is a list, no need to use range to iterate over it:
for item in obSet['results']:
    print(item['location'])
To make this into list comprehension you can write:
[item['location'] for item in obSet['results']]
But, each location is coded as a string, instead of list or tuple of floats. To get it to the proper format, use
[tuple(float(coord) for coord in item['location'].split(','))
 for item in obSet['results']]
That is, split the item['location'] string into parts using , as the delimiter, then convert each part into a float, and make a tuple of these float coordinates.
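As a minimal sketch of those three steps, the snippet below uses a small hand-made dictionary in place of the live API response. The name sample and its two records are made up for illustration (the values are copied from the output in the question), and the code is written for Python 3:

```python
# Hypothetical stand-in for the parsed API response; only the keys used here.
sample = {
    "total_results": 2,
    "results": [
        {"location": "9.5142456535,-83.8011438905"},
        {"location": "10.2335478381,-84.8517773638"},
    ],
}

# Split each "a,b" string on the comma, convert both parts to float,
# and pack them into a tuple.
positions = [
    tuple(float(coord) for coord in item["location"].split(","))
    for item in sample["results"]
]

print(positions)
# [(9.5142456535, -83.8011438905), (10.2335478381, -84.8517773638)]
```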

The direct translation of your code into a list comprehension is:
positions = [obSet['results'][i]['location'] for i in range(obSet['total_results'])]
The obSet['total_results'] is informative but not needed, you could just loop over obSet['results'] directly and use each resulting dictionary:
positions = [res['location'] for res in obSet['results']]
Now you have a list of strings, however, as each 'location' is still the comma-separated lat,long string you printed before.
Split that string and convert the result into a sequence of floats:
positions = [map(float, res['location'].split(',')) for res in obSet['results']]
Now you have a list of lists with floating point values:
>>> [map(float, res['location'].split(',')) for res in obSet['results']]
[[9.5142456535, -83.8011438905], [10.2335478381, -84.8517773638], [10.3358965682, -84.9964271008], [10.3744851815, -84.9871494128], [10.2468720343, -84.9298072822], [10.3456659939, -84.9451804822], [10.3611732346, -84.9450302597], [10.3174360636, -84.8798676791], [10.325110706, -84.939710318], [9.4098152454, -83.9255607577], [9.4907141714, -83.9240819199], [9.562637289, -83.8170178428], [9.4373885911, -83.8312881263], [9.4766746409, -83.8120952573], [10.2651190176, -84.6360466565], [9.6572995298, -83.8322965118], [9.6997991784, -83.9076919066], [9.6811177044, -83.8487647156], [9.7416717045, -83.929327673], [9.4885099275, -83.9583968683], [10.1233252667, -84.5751029683], [9.4411815757, -83.824401543], [9.4202687169, -83.9550344212], [9.4620656621, -83.665183105], [9.5861809119, -83.8358881552], [9.4508914243, -83.9054016165], [9.4798058284, -83.9362558497], [9.5970449879, -83.8969131893], [9.5855562829, -83.8354434596], [10.2366179555, -84.854847472], [9.718459702, -83.8910277016], [9.4424384874, -83.8880459793], [9.5535916157, -83.9578166199], [10.4124554163, -84.9796942349], [10.0476688795, -84.298227929], [10.2129436252, -84.8384097435], [10.2052632717, -84.6053701877], [10.3835784147, -84.8677930134], [9.6079669672, -83.9084281155], [10.3583643315, -84.8069762134], [10.3975986735, -84.9196996767], [10.2060835381, -84.9698814407], [10.3322929317, -84.8805587129], [9.4756504472, -83.963818143], [10.3997876964, -84.9127311339], [10.1777433853, -84.0673088686], [10.3346128571, -84.9306278215], [9.5193346195, -83.9404786293], [9.421538224, -83.7689452093], [9.430427837, -83.9532672942], [10.3243212895, -84.9653175843], [10.021698503, -83.885674888]]
If you must have tuples rather than lists, add a tuple() call:
positions = [tuple(map(float, res['location'].split(',')))
             for res in obSet['results']]
The latter also makes sure the expression works in Python 3 (where map() returns an iterator, not a list); you'd otherwise have to use a nested list comprehension:
# produce a list of lists in Python 3
positions = [[float(p) for p in res['location'].split(',')] for res in obSet['results']]
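To illustrate the Python 3 point, here is a small sketch (the variable names are only for illustration) showing that map() hands back a lazy iterator, that tuple() materializes it, and that the iterator is empty afterwards:

```python
location = "9.5142456535,-83.8011438905"  # one value copied from the output above

lazy = map(float, location.split(","))
# In Python 3 this is a map object, not a list.
is_list = isinstance(lazy, list)

# tuple() (or list()) consumes the iterator and materializes the values.
as_tuple = tuple(lazy)

# The iterator is now exhausted, so a second pass yields nothing.
leftover = list(lazy)

print(is_list)    # False
print(as_tuple)   # (9.5142456535, -83.8011438905)
print(leftover)   # []
```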

Another way to get a list of [long, lat] pairs without a list comprehension:
In [14]: map(lambda x: obSet['results'][x]['location'].split(','), range(obSet['total_results']))
Out[14]:
[[u'9.5142456535', u'-83.8011438905'],
[u'10.2335478381', u'-84.8517773638'],
[u'10.3358965682', u'-84.9964271008'],
[u'10.3744851815', u'-84.9871494128'],
...
If you would like a list of tuples instead:
In [14]: map(lambda x: tuple(obSet['results'][x]['location'].split(',')), range(obSet['total_results']))
Out[14]:
[(u'9.5142456535', u'-83.8011438905'),
(u'10.2335478381', u'-84.8517773638'),
(u'10.3358965682', u'-84.9964271008'),
(u'10.3744851815', u'-84.9871494128'),
...
If you want to convert to floats too:
In [17]: map(lambda x: tuple(map(float, obSet['results'][x]['location'].split(','))), range(obSet['total_results']))
Out[17]:
[(9.5142456535, -83.8011438905),
(10.2335478381, -84.8517773638),
(10.3358965682, -84.9964271008),
(10.3744851815, -84.9871494128),
(10.2468720343, -84.9298072822),
(10.3456659939, -84.9451804822),
...

You can iterate over the list of results directly:
print([tuple(result['location'].split(',')) for result in obSet['results']])
>> [('9.5142456535', '-83.8011438905'), ('10.2335478381', '-84.8517773638'), ... ]

[tuple(obSet['results'][i]['location'].split(',')) for i in range(n)]
This returns a list of tuples whose elements are unicode strings.
If you want the elements of the tuples to be floats, do the following:
[tuple(map(float,obSet['results'][i]['location'].split(','))) for i in range(n)]

The correct way to get a list of tuples using a list comprehension would be:
def to_tuple(coords_str):
    return tuple(coords_str.split(','))
output_list = [to_tuple(obSet['results'][i]['location']) for i in range(obSet['total_results'])]
You can of course replace to_tuple() with a lambda function; I just wanted to make the example clear. Moreover, you could use map() to get a tuple of floats instead of strings: return tuple(map(float, coords_str.split(','))).

Let's try to give this a shot, starting with just 1 location:
>>> (long, lat) = obSet['results'][0]['location']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack
Alright, so that didn't work, but why? It's because the longitude and latitude coordinates are just 1 string, so you can't unpack it immediately as a tuple. We must first separate it into two different strings.
>>> (long, lat) = obSet['results'][0]['location'].split(",")
From here we will want to iterate through the whole set of results, which we know are indexed from 0 to n. tuple(obSet['results'][i]['location'].split(",")) will give us the tuple of longitude, latitude for the result at index i, so:
>>> [tuple(obSet['results'][i]['location'].split(",")) for i in range(n)]
ought to give us the set of tuples we want.
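One caveat with indexing from 0 to n: it assumes that obSet['total_results'] equals the number of records actually returned. If the API pages its results (the request in the question sets per_page=200), the total could be larger than the list, and range(n) would then run past the end. The sketch below uses made-up data to show that failure mode next to plain iteration over the list; sample, first and second are illustrative names only.

```python
# Hypothetical response in which the reported total exceeds the returned page.
sample = {
    "total_results": 5,
    "results": [
        {"location": "9.5142456535,-83.8011438905"},
        {"location": "10.2335478381,-84.8517773638"},
    ],
}

n = sample["total_results"]
try:
    by_index = [tuple(sample["results"][i]["location"].split(",")) for i in range(n)]
except IndexError:
    by_index = None  # range(n) ran past the end of the list

# Iterating over the list itself cannot overrun it.
pairs = []
for res in sample["results"]:
    first, second = res["location"].split(",")  # unpacking works after split
    pairs.append((float(first), float(second)))

print(by_index)   # None
print(pairs)
# [(9.5142456535, -83.8011438905), (10.2335478381, -84.8517773638)]
```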

Related

How to convert to 1-dimensional list a list of a string that is list that contains int elements

Can't figure out how to make a proper 1-dimensional list out of
a = ['[2,5911,3391,10687,9796,15870,11533]']
a[1:-1]
I want to get
[2,5911,3391,10687,9796,15870,11533]
Slices don't seem to work here.
Is there an elegant way to do it without writing two for loops?
I appreciate any help.
You first have to get the string a[0], then remove the brackets with a[0][1:-1]; after that you can use .split() like so:
a = ['[2,5911,3391,10687,9796,15870,11533]']
a = a[0][1:-1].split(',')
print(a)
Output:
['2', '5911', '3391', '10687', '9796', '15870', '11533']
You can also get the integers (if that's what you're after) using a list comprehension like so:
a = ['[2,5911,3391,10687,9796,15870,11533]']
a = [int(item) for item in a[0][1:-1].split(',')]
print(a)
Output:
[2, 5911, 3391, 10687, 9796, 15870, 11533]
import ast
a = ['[2,5911,3391,10687,9796,15870,11533]']
ast.literal_eval(a[0])
Output:
[2, 5911, 3391, 10687, 9796, 15870, 11533]
You can use the standard package json for that, in particular the loads method that can deserialize a string:
import json
a = ['[2,5911,3391,10687,9796,15870,11533]']
json.loads(a[0])
# [2, 5911, 3391, 10687, 9796, 15870, 11533]
Or if you expect a to contain more than just one element:
a_converted = [json.loads(_el) for _el in a]
# [[2, 5911, 3391, 10687, 9796, 15870, 11533]]
The advantage here is that this will work even if the string contains other data types like floats or strings.
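As a rough illustration of that advantage, the sketch below feeds a made-up string containing an int, a float, a string and a nested list to json.loads, and then to the manual slice-and-split approach for comparison. The input mixed is invented for this example.

```python
import json

mixed = ['[2, 3.5, "abc", [4, 5]]']  # made-up input with several JSON types

parsed = json.loads(mixed[0])
print(parsed)
# [2, 3.5, 'abc', [4, 5]]

# The manual approach only copes with a flat list of integers.
try:
    manual = [int(item) for item in mixed[0][1:-1].split(",")]
except ValueError:
    manual = None  # int(' 3.5') is rejected
print(manual)
# None
```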

Sort list for date in Python

import re
arr1 = ['2018.07.17 11:30:00,-0.19', '2018.07.17 17:55:00,0.86']
arr2 = ['2018.07.17 11:34:00,-0.39', '2018.07.17 17:59:01,0.85']
def combine_strats_lambda(*strats):
    """
    Takes *strats in date,return format
    combines infinite amount of strats with date, return and packs them into
    one single sorted array
    >> RETURN: combined list
    """
    temp = []
    # create combined list
    for v in enumerate(strats):
        i = 0
        while i < len(v[1]):
            temp.append(v[1][i])
            #k = re.findall(r"[\w']+", temp)[:6]
            i += 1
    temp2 = sorted(timestamps, key=lambda d: tuple(map(int, re.findall(r"[\w']+", d[0]))))
    return temp2
Hi,
I've been trying to finish this function, which should combine multiple lists of dates,percentage returns and sort them.
I've come across a solution with lambda but all I get is this message:
return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object
Do you know an easier solution to the problem or what the error is caused by? I can't seem to figure it out.
Anything appreciated :)
A basic problem in your code is in the line:
for v in enumerate(strats):
enumerate(...) yields pairs: the index and the current value from the iterable.
Because you bind each pair to the single name v, v is the whole (index, value) tuple, so you have to write v[1] to reach the value, and the index is never used. Iterating over strats directly is simpler.
Another important point is that if the datetime strings are written as
yyyy.MM.dd hh:mm:ss, you can sort them using just string sort.
So, to gather the strings, you need a list comprehension, with 2 nested
loops.
And to sort them, you should use sorted function, specifying as the sort
key the "initial" (date / time) part, before the comma.
To sum up, to get the sorted list of strings, taken from a couple of
arguments of your function, sorted on the date / time part,
you can use the following program, written using version 3.6 of Python:
arr1 = ['2018.07.17 11:30:00,-0.19', '2018.07.17 17:55:00,0.86']
arr2 = ['2018.07.17 11:34:00,-0.39', '2018.07.17 17:59:01,0.85']
def combine_strats_lambda(*strats):
    temp = [v2 for v1 in strats for v2 in v1]
    return sorted(temp, key=lambda v: v.split(',')[0])
res = combine_strats_lambda(arr1, arr2)
for x in res:
    parts = x.split(',')
    print("{:20s} {:>6s}".format(parts[0], parts[1]))
It does not even need the re module.
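Plain string sorting works here because every field is zero-padded and the fields run from most to least significant. If that cannot be guaranteed, one option is to parse the date/time part and sort on the parsed value. The following is a sketch under that assumption; timestamp_of and combine_strats_parsed are hypothetical names, not part of the original code.

```python
from datetime import datetime

arr1 = ['2018.07.17 11:30:00,-0.19', '2018.07.17 17:55:00,0.86']
arr2 = ['2018.07.17 11:34:00,-0.39', '2018.07.17 17:59:01,0.85']

def timestamp_of(entry):
    # Everything before the first comma is the date/time part.
    return datetime.strptime(entry.split(',')[0], '%Y.%m.%d %H:%M:%S')

def combine_strats_parsed(*strats):
    # Flatten all argument lists into one, then sort by parsed timestamp.
    combined = [entry for strat in strats for entry in strat]
    return sorted(combined, key=timestamp_of)

res = combine_strats_parsed(arr1, arr2)
for entry in res:
    print(entry)
```

With the two sample lists, the entries come out in the order 11:30:00, 11:34:00, 17:55:00, 17:59:01, the same order the string-based key produces.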

Python List Index Out of Range nested list

I have a nested list main_category, each nested list is a unicode string of business names. The first five lines of the nested lists are below:
[[u'Medical Centers', u'Health and Medical'],
[u'Massage', u'Beauty and Spas'],
[u'Tattoo', u'Beauty and Spas'],
[u'Music & DVDs', u'Books, Mags, Music and Video', u'Shopping'],
[u'Food', u'Coffee & Tea']]
So I want to get the first element of every list, and I have tried list comprehension, zip, but nothing works.
new_cate = [d[0] for d in main_category]
lst = zip(*main_category)[0]
But all of them give me
IndexError                                Traceback (most recent call last)
<ipython-input-49-4a397c8e62fd> in <module>()
----> 1 lst = zip(*main_category)[0]
IndexError: list index out of range
I really don't know what is wrong with this. So could anyone help? Thanks so much!
The error indicates that one or more of the sublists in the full list are empty. You need to handle that case explicitly. You can put a conditional expression in the list comprehension to substitute a default value when the list is empty and take the first item when it isn't:
default = ''
new_cate = [d[0] if d else default for d in main_category]
#                ^^^^ test if list is truthy
You can also replicate this fix for zip by using its itertools variant izip_longest, which allows you to set a fillvalue:
from itertools import izip_longest
default = ''
lst = list(izip_longest(*main_category, fillvalue=default))[0]
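The code above is Python 2. In Python 3 the same function is named itertools.zip_longest, and zip() returns a lazy iterator that must be wrapped in list() before indexing. Here is a small sketch of both fixes on made-up data (two rows from the question plus one empty sublist):

```python
from itertools import zip_longest  # Python 3 name for izip_longest

# Made-up data: two rows from the question with an empty sublist in between.
main_category = [
    ['Medical Centers', 'Health and Medical'],
    [],
    ['Massage', 'Beauty and Spas'],
]

default = ''

# Conditional expression: fall back to the default for empty sublists.
new_cate = [d[0] if d else default for d in main_category]

# zip(*...) stops at the shortest sublist, so an empty one leaves nothing to index.
plain = list(zip(*main_category))

# zip_longest pads the short sublists with fillvalue instead.
first_column = list(zip_longest(*main_category, fillvalue=default))[0]

print(new_cate)       # ['Medical Centers', '', 'Massage']
print(plain)          # []
print(first_column)   # ('Medical Centers', '', 'Massage')
```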
So you have a list of lists.
for content in matrix:
At each iteration content will be one full sublist, [u'Medical Centers', u'Health and Medical'] for example.
If you print(content[0]), you will get the first value of the current list, which would be u'Medical Centers'.
If there's a list without content in the matrix, print(content[0]) would raise IndexError, so you need to check that the current list is not empty with if content:.
matrix = [[u'Medical Centers', u'Health and Medical'],
[u'Massage', u'Beauty and Spas'],
[u'Tattoo', u'Beauty and Spas'],
[u'Music & DVDs', u'Books, Mags, Music and Video', u'Shopping'],
[u'Food', u'Coffee & Tea']]
for content in matrix:
    if content:
        print(content[0])
>>> Medical Centers
>>> Massage
>>> Tattoo
>>> Music & DVDs
>>> Food

How do I pull certain values from a list of lists in python?

def get_monthly_averages(original_list):
    #print(original_list)
    daily_averages_list = [ ]
    for i in range (0, len(original_list)):
        month_list = i[0][0:7]
        volume_str = i[5]
        #print(volume_str)
        adj_close_str = i[6]
        #print(adj_close_str)
        daily_averages_tuple = (month_list,volume_str,adj_close_str)
        daily_averages_list.append(daily_averages_tuple.split(','))
    return daily_averages_list
I have a list like
[
['2004-08-30', '105.28', '105.49', '102.01', '102.01', '2601000', '102.01'],
['2004-08-27', '108.10', '108.62', '105.69', '106.15', '3109000', '106.15'],
['2004-08-26', '104.95', '107.95', '104.66', '107.91', '3551000', '107.91'],
['2004-08-25', '104.96', '108.00', '103.88', '106.00', '4598900', '106.00'],
['2004-08-24', '111.24', '111.60', '103.57', '104.87', '7631300', '104.87'],
['2004-08-23', '110.75', '113.48', '109.05', '109.40', '9137200', '109.40'],
['2004-08-20', '101.01', '109.08', '100.50', '108.31', '11428600', '108.31'],
['2004-08-19', '100.00', '104.06', '95.96', '100.34', '22351900', '100.34']
]
I am attempting to pull several specific values from each list within the 'long' list. I need to use beginner Python techniques; for instance, we haven't learned lambda in the class yet, so the solution MUST use beginner techniques.
As of right now, the lines using i[][] are giving me a type error saying that 'int' is not subscriptable.
Your variable i is an integer. You should be indexing into original_list and not i.
I think you want
month_list = original_list[i][0][0:7]
volume_str = original_list[i][5]
#print(volume_str)
adj_close_str = original_list[i][6]
Don't use range to iterate over lists. Do this:
for datestr, n1, n2, n3, n4, volume, adj_close in original_list:
    #do your stuff here
This will iterate over every list in original_list, and assign the 7 elements of each such list to the variables given.
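Putting the two suggestions together, a beginner-level sketch might look like the following. The function name collect_daily_values is hypothetical, and converting the volume to int and the adjusted close to float is an assumption about what the later averaging step needs. The column positions (5 and 6) come from the question's own code.

```python
rows = [
    ['2004-08-30', '105.28', '105.49', '102.01', '102.01', '2601000', '102.01'],
    ['2004-08-27', '108.10', '108.62', '105.69', '106.15', '3109000', '106.15'],
]

def collect_daily_values(original_list):
    daily_values = []
    for row in original_list:
        month = row[0][0:7]          # 'YYYY-MM' prefix of the date string
        volume = int(row[5])         # column 5: volume
        adj_close = float(row[6])    # column 6: adjusted close
        # A tuple is built directly; it has no .split() method.
        daily_values.append((month, volume, adj_close))
    return daily_values

print(collect_daily_values(rows))
# [('2004-08', 2601000, 102.01), ('2004-08', 3109000, 106.15)]
```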

PYTHON problem with negative decimals

I have a list of negative floats. I want to make a histogram with them. As far as I know, Python can't do operations with negative numbers. Is this correct? The list is like [-0.2923998, -1.2394875, -0.23086493, etc.]. I'm trying to find the maximum and minimum number so I can find out what the range is. My code is giving an error:
setrange = float(maxv) - float(minv)
TypeError: float() argument must be a string or a number
And this is the code:
f = open('clusters_scores.out','r')
#first, extract all of the sim values
val = []
for line in f:
    lineval = line.split()
    print lineval
    val.append(lineval)
print val
#val = map(float,val)
maxv = max(val)
minv = min(val)
setrange = float(maxv) - float(minv)
All the values that are being put into the 'val' list are negative decimals. What is the error referring to, and how do I fix it?
The input file looks like:
-0.0783532095182 -0.99415440702 -0.692972552716 -0.639273674023 -0.733029194040.765257900121 -0.755438339963
-0.144140594077 -1.06533353638 -0.366278118372 -0.746931508538 -1.02549039392 -0.296715961215
-0.0915937502791 -1.68680560936 -0.955147543358
-0.0488457137771 -0.0943080192383 -0.747534412969 -1.00491121699
-1.43973471463
-0.0642611118901 -0.0910684525497
-1.19327387414 -0.0794696449245
-1.00791366035 -0.0509749096549
-1.08046507281 -0.957339914505 -0.861495748259
The results of split() are a list of split values, which is probably why you are getting that error.
For example, if you do '-0.2'.split(), you get back a list with a single value ['-0.2'].
EDIT: Aha! With your input file provided, it looks like this is the problem: -0.733029194040.765257900121. I think you mean to make that two separate floats?
Assuming a corrected file like this:
-0.0783532095182 -0.99415440702 -0.692972552716 -0.639273674023 -0.733029194040 -0.765257900121 -0.755438339963
-0.144140594077 -1.06533353638 -0.366278118372 -0.746931508538 -1.02549039392 -0.296715961215
-0.0915937502791 -1.68680560936 -0.955147543358
-0.0488457137771 -0.0943080192383 -0.747534412969 -1.00491121699
-1.43973471463
-0.0642611118901 -0.0910684525497
-1.19327387414 -0.0794696449245
-1.00791366035 -0.0509749096549
-1.08046507281 -0.957339914505 -0.861495748259
The following code will no longer throw that exception:
f = open('clusters_scores.out','r')
#first, extract all of the sim values
val = []
for line in f:
    linevals = line.split()
    print linevals
    val += linevals
print val
val = map(float, val)
maxv = max(val)
minv = min(val)
setrange = float(maxv) - float(minv)
I have changed it to take the list result from split() and concatenate it to the list, rather than append it, which will work provided there are valid inputs in your file.
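The difference between appending and concatenating can be seen in isolation with one line of the sample input. This is only an illustrative sketch (Python 3 syntax; the variable names are made up):

```python
line_values = "-1.19327387414 -0.0794696449245".split()

appended = []
appended.append(line_values)   # the whole list becomes ONE element

extended = []
extended += line_values        # each string becomes its own element

print(appended)   # [['-1.19327387414', '-0.0794696449245']]
print(extended)   # ['-1.19327387414', '-0.0794696449245']

# max() of a list of lists is itself a list, which float() rejects.
try:
    float(max(appended))
    float_failed = False
except TypeError:
    float_failed = True
print(float_failed)   # True
```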
All the values that are being put into the 'val' list are negative decimals.
No, they aren't; they're lists of strings that represent negative decimals, since the .split() call produces a list. maxv and minv are lists of strings, which can't be fed to float().
What is the error referring to, and how do I fix it?
It's referring to the fact that the contents of val aren't what you think they are. The first step in debugging is to verify your assumptions. If you try this code out at the REPL, then you could inspect the contents of maxv and minv and notice that you have lists of strings rather than the expected strings.
I assume you want to put all the lists of strings (from each line of the file) together into a single list of strings. Use val.extend(lineval) rather than val.append(lineval).
That said, you'll still want to map the strings into floats before calling max or min because otherwise you will be comparing the strings as strings rather than floats. (It might well work, but explicit is better than implicit.)
Simpler yet, just read the entire file at once and split it; .split() without arguments splits on whitespace, and a newline is whitespace. You can also do the mapping at the same point as the reading, with careful application of a list comprehension. I would write:
with open('clusters_scores.out') as f:
    val = [float(x) for x in f.read().split()]
result = max(val) - min(val)
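To see why converting before calling max or min matters, here is a small sketch that uses io.StringIO as a stand-in for the real file (an assumption made so the example is self-contained) with two lines taken from the sample input:

```python
import io

# Stand-in for open('clusters_scores.out'); two lines from the sample input.
f = io.StringIO("-1.43973471463\n-0.0642611118901 -0.0910684525497\n")

tokens = f.read().split()          # splits on spaces and newlines alike
val = [float(x) for x in tokens]

print(max(tokens))   # compared as text: '-1.43973471463'
print(max(val))      # compared numerically: -0.0642611118901
print(max(val) - min(val))
```

Compared as text, '-1.43973471463' wins because the character '1' sorts after '0', even though it is the smallest number in the list.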
