Python List Index Out of Range nested list - python

I have a nested list, main_category, in which each sublist contains unicode strings of business names. The first five sublists are below:
[[u'Medical Centers', u'Health and Medical'],
[u'Massage', u'Beauty and Spas'],
[u'Tattoo', u'Beauty and Spas'],
[u'Music & DVDs', u'Books, Mags, Music and Video', u'Shopping'],
[u'Food', u'Coffee & Tea']]
I want to get the first element of every sublist. I have tried a list comprehension and zip, but neither works.
new_cate = [d[0] for d in main_category]
lst = zip(*main_category)[0]
But both of them give me
IndexError Traceback (most recent call last)
<ipython-input-49-4a397c8e62fd> in <module>()
----> 1 lst = zip(*main_category)[0]
IndexError: list index out of range
I really don't know what is wrong with this. So could anyone help? Thanks so much!

The error indicates that one or more of the sublists in the full list are empty. You need to handle that case. You can put a conditional expression in the list comprehension to substitute a default value when the sublist is empty and index the first item when it isn't:
default = ''
new_cate = [d[0] if d else default for d in main_category]
#                ^^^^-> test if list is truthy
You can also replicate this fix for zip by using its itertools variant, izip_longest, which lets you set a fillvalue:
from itertools import izip_longest
default = ''
lst = list(izip_longest(*main_category, fillvalue=default))[0]
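On Python 3, izip_longest is named zip_longest. Here is a minimal sketch of both fixes under that assumption. The sample list, including its empty sublist, is made up for illustration:

```python
from itertools import zip_longest  # Python 3 name for izip_longest

# Hypothetical sample data; the empty sublist reproduces the IndexError.
main_category = [['Medical Centers', 'Health and Medical'], [], ['Food', 'Coffee & Tea']]
default = ''

# Conditional expression: fall back to the default for empty sublists.
new_cate = [d[0] if d else default for d in main_category]

# zip_longest pads short sublists, so the first tuple holds every first element.
lst = next(zip_longest(*main_category, fillvalue=default))

print(new_cate)
print(lst)
```

Note that next() raises StopIteration if main_category itself is empty, so that case would need its own check.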

So you have a list of lists.
for content in matrix:
At each iteration, content will be a full sublist, [u'Medical Centers', u'Health and Medical'] for example.
If you print(content[0]), you will get the first value of the current sublist, which would be u'Medical Centers'.
If there's an empty list in the matrix, print(content[0]) will raise IndexError, so you need to check that the current list is not empty with if content:.
matrix = [[u'Medical Centers', u'Health and Medical'],
[u'Massage', u'Beauty and Spas'],
[u'Tattoo', u'Beauty and Spas'],
[u'Music & DVDs', u'Books, Mags, Music and Video', u'Shopping'],
[u'Food', u'Coffee & Tea']]
for content in matrix:
    if content:
        print(content[0])
>>> Medical Centers
>>> Massage
>>> Tattoo
>>> Music & DVDs
>>> Food
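If it is unclear which sublists are empty, one option is to collect their positions before indexing. This is a small sketch with a made-up matrix that contains two empty sublists:

```python
# Hypothetical data with empty sublists at positions 1 and 3.
matrix = [['Massage', 'Beauty and Spas'], [], ['Tattoo', 'Beauty and Spas'], []]

# An empty list is falsy, so "not content" flags the problem rows.
empty_positions = [i for i, content in enumerate(matrix) if not content]

# Same truthiness check, used to skip the empty rows.
first_items = [content[0] for content in matrix if content]

print(empty_positions)
print(first_items)
```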

Related

Combining lists within a nested list, if lists contain the same element?

I have a nested list that has a structure similar to this, except it's obviously much longer:
mylist = [ ["Bob", "12-01 2:30"], ["Sal", "12-01 5:23"], ["Jill", "12-02 1:28"] ]
My goal is to create another nested list that combines all elements that have the same date. So, the following output is desired:
newlist = [ [["Bob", "12-01 2:30"], ["Sal", "12-01 5:23"]], [["Jill", "12-02 1:28"]] ]
Above, all items with the date 12-01, regardless of time, are combined, and all elements of 12-02 are combined.
I've sincerely been researching how to do this for the past 1 hour and can't find anything. Furthermore, I'm a beginner at programming, so I'm not skilled enough to try to create my own solution. So, please don't think that I haven't attempted to do research or put any effort into trying this problem myself. I'll add a few links as examples of my research below:
Collect every pair of elements from a list into tuples in Python
Create a list of tuples with adjacent list elements if a condition is true
How do I concatenate two lists in Python?
Concatenating two lists of Strings element wise in Python without Nested for loops
Zip two lists together based on matching date in string
How to merge lists into a list of tuples?
Use a dict (or an OrderedDict if the order matters) to group the data by date.
from collections import defaultdict # defaultdict works like {}.setdefault() and is very convenient
mylist = [["Bob", "12-01 2:30"], ["Sal", "12-01 5:23"], ["Jill", "12-02 1:28"]]
record_dict = defaultdict(list)
# then iterate over the list and group the items by date
for data in mylist:
    _, time = data
    date_time, _ = time.split(" ")
    record_dict[date_time].append(data)
res_list = list(record_dict.values())
print(res_list)
output:
[[['Bob', '12-01 2:30'], ['Sal', '12-01 5:23']], [['Jill', '12-02 1:28']]]
Here is a pure list-based solution as an alternative to the accepted dictionary-based one. It has the additional benefit of easily sorting the whole list, first by date, then by hour, then by name:
from itertools import groupby
mylist = [["Bob", "12-01 2:30"], ["Sal", "12-01 5:23"], ["Jill", "12-02 1:28"]]
newlist = [dt.split() + [name] for (name, dt) in mylist]
newlist.sort() # can be removed if initial data is already sorted by date
newlist = [list(group) for (date, group) in groupby(newlist, lambda item:item[0])]
# result:
# [[['12-01','2:30','Bob'], ['12-01','5:23','Sal']], [['12-02','1:28','Jill']]]
If you really want the same item format as the initial list, it requires a double iteration:
newlist = [[[name, date + ' ' + time] for (date, time, name) in group]
           for (date, group) in groupby(newlist, lambda item:item[0])]
# result:
# [[['Bob', '12-01 2:30'], ['Sal', '12-01 5:23']], [['Jill', '12-02 1:28']]]
If you don't mind using some extra memory, you can try using a dictionary. You can use the date as the key and make a list of values.
all_items = {}
for line in mylist:
    x, y = line
    date, time = y.split()
    try:
        all_items[date].append(line)
    except KeyError:
        # first item seen for this date
        all_items[date] = [line]
Then, you can create a new list using the sorted date for keys.
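A sketch of that last step, using setdefault in place of the try/except. The input order is shuffled on purpose, and sorting the keys as strings is only reliable here because the dates are assumed to be zero-padded MM-DD:

```python
mylist = [["Jill", "12-02 1:28"], ["Bob", "12-01 2:30"], ["Sal", "12-01 5:23"]]

all_items = {}
for line in mylist:
    date = line[1].split()[0]  # "12-01 2:30" -> "12-01"
    all_items.setdefault(date, []).append(line)

# Build the nested list in date order (string sort, assumes zero-padded dates).
newlist = [all_items[date] for date in sorted(all_items)]
print(newlist)
```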
If all of the elements with the same date are consecutive, you can use itertools.groupby:
[list(group) for _, group in groupby(data, lambda value: ...)]
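groupby yields (key, group) pairs, and each group is an iterator that has to be consumed before moving on to the next pair. Here is a sketch, assuming the key is the date part of the second element. The helper name date_of is made up for this example:

```python
from itertools import groupby

mylist = [["Bob", "12-01 2:30"], ["Sal", "12-01 5:23"], ["Jill", "12-02 1:28"]]

def date_of(item):
    # Hypothetical key function: "12-01 2:30" -> "12-01"
    return item[1].split()[0]

# Only consecutive items with the same date end up in the same group.
newlist = [list(group) for _, group in groupby(mylist, key=date_of)]
print(newlist)
```

If the input is not already ordered by date, sorting it with the same key first (mylist.sort(key=date_of)) keeps equal dates adjacent.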

Iterate through a dict, store the key and value, then iterate again to look for a similar key and delete it from the dict, e.g. (Light1on, Light1off), in Python

[I had a problem iterating through a dict to find a pair of similar words, output it, and then delete it from the dict]
My intention is to generate random output labels and store them in a dictionary. I then want to iterate through the dictionary, take the first key, and search the dictionary for a similar key (e.g. Light1on and Light1off both contain Light1), then get the values for both keys and store them in a table in their respective columns.
For example:
Dict = {Light1on,Light2on,Light1off...}
Take the value for Light1on, then iterate through the dictionary to find e.g. Light1off, and store Light1on:value1 and Light1off:value2 in a table or DataFrame with the column names On:value1 Off:value2.
As I don't know how to insert the code as code, I can only provide an image. Sorry for the trouble; it's my first time asking a question here. Thanks.
from collections import defaultdict
import difflib, random
olist = []
input = 10
olist1 = ['Light1on','Light2on','Fan1on','Kettle1on','Heater1on']
olist2 = ['Light2off','Kettle1off','Light1off','Fan1off','Heater1off']
events = list(range(input + 1))
for i in range(len(olist1)):
    output1 = random.choice(olist1)
    print(output1,'1')
    olist1.remove(output1)
    output2 = random.choice(olist2)
    print(output2,'2')
    olist2.remove(output2)
    olist.append(output1)
    olist.append(output2)
print(olist,'3')
outputList = {olist[i]:events[i] for i in range(10)}
print (str(outputList),'4')
# Iterating through the keys finding a pair match
for s in range(5):
    for i in outputList:
        if i == list(outputList)[0]:
            skeys = difflib.get_close_matches(i, outputList, n=2, cutoff=0.75)
            print(skeys,'5')
            del outputList[skeys]
# Modified Dictionary
difflib.get_close_matches('anlmal', ['car', 'animal', 'house', 'animaltion'])
['animal']
Update: I was unable to delete the pair of similar keys from the dictionary after finding the pair.
You're probably getting an error about a dictionary changing size during iteration. That's because you're deleting keys from a dictionary you're iterating over, and Python doesn't like that:
d = {1:2, 3:4}
for i in d:
    del d[i]
That will throw:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
To work around that, one solution is to store a list of the keys you want to delete, then delete all those keys after you've finished iterating:
keys_to_delete = []
d = {1:2, 3:4}
for i in d:
    if i%2 == 1:
        keys_to_delete.append(i)
for i in keys_to_delete:
    del d[i]
Ta-da! Same effect, but this way avoids the error.
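A shorter variant of the same idea is to iterate over a snapshot of the keys, or to build a new dictionary that keeps only the wanted entries. This is a sketch with made-up data:

```python
d = {1: 2, 3: 4, 6: 7}

# list(d) copies the keys up front, so deleting from d inside the loop is safe.
for key in list(d):
    if key % 2 == 1:
        del d[key]

# Alternative: leave the original alone and build a filtered copy instead.
original = {1: 2, 3: 4, 6: 7}
kept = {key: value for key, value in original.items() if key % 2 == 0}

print(d)
print(kept)
```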
Also, your code above doesn't call the difflib.get_close_matches function properly. You can use help(difflib.get_close_matches) to see how you are meant to call that function. You need to provide a second argument that indicates the items to which you wish to compare your first argument for possible matches.
All of that said, I have a feeling that you can accomplish your fundamental goals much more simply. If you spend a few minutes describing what you're really trying to do (this shouldn't involve any references to data types, it should just involve a description of your data and your goals), then I bet someone on this site can help you solve that problem much more simply!
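As one guess at the underlying goal, if every label is a device name followed by "on" or "off", the pairs can be found by stripping that suffix rather than by fuzzy matching. This is only a sketch under that assumption. The sample values and the names events and pairs are made up:

```python
# Hypothetical label -> value mapping, in arbitrary order.
events = {'Light1on': 0, 'Kettle1off': 1, 'Fan1on': 2,
          'Light1off': 3, 'Fan1off': 4, 'Kettle1on': 5}

pairs = {}
for label, value in events.items():
    # Check "off" first; a label ending in "off" never ends in "on".
    if label.endswith('off'):
        device, state = label[:-3], 'off'
    elif label.endswith('on'):
        device, state = label[:-2], 'on'
    else:
        continue  # label follows neither pattern, skip it
    pairs.setdefault(device, {})[state] = value

for device, states in pairs.items():
    print(device, states.get('on'), states.get('off'))
```

Nothing is deleted while iterating, so the RuntimeError does not come up, and pairs maps each device to its on and off values, ready to be laid out as table columns.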

Finding the same second elements in nested lists - recursive function

I have a nested list that looks like this:
[['CELTIC AMBASSASDOR', 'Warrenpoint'],['HAV SNAPPER', 'Silloth'],['BONAY', 'Antwerp'],['NINA', 'Antwerp'],['FRI SKIEN', 'Warrenpoint']]
and so on. How can I find the sublists that have the same second element? For example:
['CELTIC AMBASSASDOR', 'Warrenpoint']
['FRI SKIEN', 'Warrenpoint']
['BONAY', 'Antwerp']
['NINA', 'Antwerp']
The list is too long (I'm reading it from a .csv file), and I can't decide in advance what to search for (e.g. I can't search for 'Antwerp' to find all the Antwerp entries, because I don't know all of the values in the csv file). So I thought I needed a recursive function that keeps searching until it has found all the nested lists, separated by their second items. I couldn't figure out how to write the recursive function. If anyone has a better solution, it would be much appreciated.
There's no need to use recursion here. Create a dictionary with a key of the second element and values of the whole sublist, then create a result that only includes the matches you're interested in:
import collections
l = [['CELTIC AMBASSASDOR', 'Warrenpoint'],['HAV SNAPPER', 'Silloth'],['BONAY', 'Antwerp'],['NINA', 'Antwerp'],['FRI SKIEN', 'Warrenpoint']]
d = collections.defaultdict(list)
for item in l:
    d[item[1]].append(item)
result = dict(item for item in d.items() if len(d[item[0]]) > 1)
Result:
>>> import pprint
>>> pprint.pprint(result)
{'Antwerp': [['BONAY', 'Antwerp'], ['NINA', 'Antwerp']],
 'Warrenpoint': [['CELTIC AMBASSASDOR', 'Warrenpoint'],
                 ['FRI SKIEN', 'Warrenpoint']]}
filter(lambda x:x[1] in set(filter(lambda x:zip(*l)[1].count(x)==2,zip(*l)[1])),l)
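The one-liner above relies on Python 2 behaviour (zip and filter returning lists) and only matches ports that occur exactly twice. A Python 3 sketch of the same idea uses collections.Counter and keeps every sublist whose second element occurs more than once. The names ships, port_counts and shared are made up here:

```python
from collections import Counter

ships = [['CELTIC AMBASSASDOR', 'Warrenpoint'], ['HAV SNAPPER', 'Silloth'],
         ['BONAY', 'Antwerp'], ['NINA', 'Antwerp'], ['FRI SKIEN', 'Warrenpoint']]

# Count how often each second element (the port) appears.
port_counts = Counter(port for _, port in ships)

# Keep the sublists whose port shows up more than once, in original order.
shared = [ship for ship in ships if port_counts[ship[1]] > 1]
print(shared)
```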

Python List comprehension and JSON parsing

I'm new to Python and trying to figure out the best way to parse the values of a JSON object into an array, using a list comprehension.
Here is my code. I'm querying the publicly available iNaturalist API and would like to extract specific parts of the JSON object it returns into a NumPy array:
import json
import urllib2
#Set Observations URL request for Resplendent Quetzal of Costa Rica
query = urllib2.urlopen("http://api.inaturalist.org/v1/observations?place_id=6924&taxon_id=20856&per_page=200&order=desc&order_by=created_at")
obSet = json.load(query)
#Print out Lat Long of observation
n = obSet['total_results']
for i in range(n):
    print obSet['results'][i]['location']
This all works fine and gives the following output:
9.5142456535,-83.8011438905
10.2335478381,-84.8517773638
10.3358965682,-84.9964271008
10.3744851815,-84.9871494128
10.2468720343,-84.9298072822
...
What I'd like to do next is replace the for loop with a list comprehension, and store the location value in a tuple. I'm struggling with the syntax in that I'm guessing it's something like this:
[(long,lat) for i in range(n) for (long,lat) in obSet['results'][i]['location']]
But this doesn't work...thanks for any help.
obSet['results'] is a list, so there is no need to use range to iterate over it:
for item in obSet['results']:
    print(item['location'])
To make this into a list comprehension you can write:
[item['location'] for item in obSet['results']]
But each location is encoded as a string, instead of a list or tuple of floats. To get it into the proper format, use
[tuple(float(coord) for coord in item['location'].split(','))
 for item in obSet['results']]
That is, split the item['location'] string into parts using , as the delimiter, then convert each part into a float, and make a tuple of these float coordinates.
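The same expression can be tried without calling the API by using a small stand-in for the parsed response. The dictionary below is a made-up sample that only mimics the 'results' and 'location' keys used in the question. Skipping entries whose location is missing is an extra precaution assumed here, not something the question's output shows:

```python
# Hypothetical stand-in for json.load(query); only the keys used above are included.
obSet = {
    'total_results': 3,
    'results': [
        {'location': '9.5142456535,-83.8011438905'},
        {'location': '10.2335478381,-84.8517773638'},
        {'location': None},  # assumed case: an observation without coordinates
    ],
}

positions = [
    tuple(float(coord) for coord in item['location'].split(','))
    for item in obSet['results']
    if item.get('location')  # skip missing or empty locations
]
print(positions)
```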
The direct translation of your code into a list comprehension is:
positions = [obSet['results'][i]['location'] for i in range(obSet['total_results'])]
The obSet['total_results'] value is informative but not needed; you could just loop over obSet['results'] directly and use each resulting dictionary:
positions = [res['location'] for res in obSet['results']]
However, you now have a list of strings, as each 'location' is still the long,lat formatted string you printed before.
Split that string and convert the result into a sequence of floats:
positions = [map(float, res['location'].split(',')) for res in obSet['results']]
Now you have a list of lists with floating point values:
>>> [map(float, res['location'].split(',')) for res in obSet['results']]
[[9.5142456535, -83.8011438905], [10.2335478381, -84.8517773638], [10.3358965682, -84.9964271008], [10.3744851815, -84.9871494128], [10.2468720343, -84.9298072822], [10.3456659939, -84.9451804822], [10.3611732346, -84.9450302597], [10.3174360636, -84.8798676791], [10.325110706, -84.939710318], [9.4098152454, -83.9255607577], [9.4907141714, -83.9240819199], [9.562637289, -83.8170178428], [9.4373885911, -83.8312881263], [9.4766746409, -83.8120952573], [10.2651190176, -84.6360466565], [9.6572995298, -83.8322965118], [9.6997991784, -83.9076919066], [9.6811177044, -83.8487647156], [9.7416717045, -83.929327673], [9.4885099275, -83.9583968683], [10.1233252667, -84.5751029683], [9.4411815757, -83.824401543], [9.4202687169, -83.9550344212], [9.4620656621, -83.665183105], [9.5861809119, -83.8358881552], [9.4508914243, -83.9054016165], [9.4798058284, -83.9362558497], [9.5970449879, -83.8969131893], [9.5855562829, -83.8354434596], [10.2366179555, -84.854847472], [9.718459702, -83.8910277016], [9.4424384874, -83.8880459793], [9.5535916157, -83.9578166199], [10.4124554163, -84.9796942349], [10.0476688795, -84.298227929], [10.2129436252, -84.8384097435], [10.2052632717, -84.6053701877], [10.3835784147, -84.8677930134], [9.6079669672, -83.9084281155], [10.3583643315, -84.8069762134], [10.3975986735, -84.9196996767], [10.2060835381, -84.9698814407], [10.3322929317, -84.8805587129], [9.4756504472, -83.963818143], [10.3997876964, -84.9127311339], [10.1777433853, -84.0673088686], [10.3346128571, -84.9306278215], [9.5193346195, -83.9404786293], [9.421538224, -83.7689452093], [9.430427837, -83.9532672942], [10.3243212895, -84.9653175843], [10.021698503, -83.885674888]]
If you must have tuples rather than lists, add a tuple() call:
positions = [tuple(map(float, res['location'].split(',')))
             for res in obSet['results']]
The latter also makes sure the expression works in Python 3 (where map() returns an iterator, not a list); you'd otherwise have to use a nested list comprehension:
# produce a list of lists in Python 3
positions = [[float(p) for p in res['location'].split(',')] for res in obSet['results']]
Another way to get a list of [long, lat] without a list comprehension:
In [14]: map(lambda x: obSet['results'][x]['location'].split(','), range(obSet['total_results']))
Out[14]:
[[u'9.5142456535', u'-83.8011438905'],
[u'10.2335478381', u'-84.8517773638'],
[u'10.3358965682', u'-84.9964271008'],
[u'10.3744851815', u'-84.9871494128'],
...
If you would like a list of tuples instead:
In [14]: map(lambda x: tuple(obSet['results'][x]['location'].split(',')), range(obSet['total_results']))
Out[14]:
[(u'9.5142456535', u'-83.8011438905'),
(u'10.2335478381', u'-84.8517773638'),
(u'10.3358965682', u'-84.9964271008'),
(u'10.3744851815', u'-84.9871494128'),
...
If you want to convert to floats too:
In [17]: map(lambda x: tuple(map(float, obSet['results'][x]['location'].split(','))), range(obSet['total_results']))
Out[17]:
[(9.5142456535, -83.8011438905),
(10.2335478381, -84.8517773638),
(10.3358965682, -84.9964271008),
(10.3744851815, -84.9871494128),
(10.2468720343, -84.9298072822),
(10.3456659939, -84.9451804822),
...
You can iterate over the list of results directly:
print([tuple(result['location'].split(',')) for result in obSet['results']])
>> [('9.5142456535', '-83.8011438905'), ('10.2335478381', '-84.8517773638'), ... ]
[tuple(obSet['results'][i]['location'].split(',')) for i in range(n)]
This will return a list of tuples; the elements of the tuples are unicode strings.
If you want the elements of the tuples to be floats, do the following:
[tuple(map(float,obSet['results'][i]['location'].split(','))) for i in range(n)]
The correct way to get a list of tuples using a list comprehension would be:
def to_tuple(coords_str):
    return tuple(coords_str.split(','))
output_list = [to_tuple(obSet['results'][i]['location']) for i in range(obSet['total_results'])]
You can of course replace to_tuple() with a lambda function, I just wanted to make the example clear. Moreover, you could use map() to have a tuple with floats instead of string: return tuple(map(float,coords_str.split(','))).
Let's try to give this a shot, starting with just 1 location:
>>> (long, lat) = obSet['results'][0]['location']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: too many values to unpack
Alright, so that didn't work, but why? It's because the longitude and latitude coordinates are just 1 string, so you can't unpack it immediately as a tuple. We must first separate it into two different strings.
>>> (long, lat) = obSet['results'][0]['location'].split(",")
From here we will want to iterate through the whole set of results, which we know are indexed from 0 to n. tuple(obSet['results'][i]['location'].split(",")) will give us the tuple of longitude, latitude for the result at index i, so:
>>> [tuple(obSet['results'][i]['location'].split(",")) for i in range(n)]
ought to give us the list of tuples we want.

Access an element in a list of lists in python

I am new to python and am trying to access a single specific element in a list of lists.
I have tried:
line_list[2][0]
The following aren't right either, since the index is a tuple and a list only accepts integer indices:
line_list[(2, 0)]
line_list[2, 0]
This is probably really obvious but I just can't see it.
def rpd_truncate(map_ref):
    # Manipulate string in order to get the reference value
    with open(map_ref, "r") as reference:
        line_list = []
        for line in reference:
            word_list = []
            word_list.append(line[:-1].split("\t\t"))
            line_list.append(word_list)
    print line_list[2][0]
I get the exact same as if I used line_list[2]:
['Page_0', '0x00000000', '0x002DF8CD']
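split already returns a list, so appending its result to word_list wraps it in an extra list. That is why line_list[2][0] gives you the whole row.
Moreover, you don't need the word_list variable:
for line in reference:
    line_list.append(line[:-1].split("\t\t"))
print line_list[2][0]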
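To check the indexing without the original file, the same parsing can be run on an in-memory stand-in, written here for Python 3. The first two rows of the sample are made up; only the third row comes from the output shown in the question:

```python
import io

# Hypothetical file contents, with fields separated by two tabs as in the question.
reference = io.StringIO(
    "Name\t\tStart\t\tEnd\n"
    "Header\t\t0x0\t\t0x1\n"
    "Page_0\t\t0x00000000\t\t0x002DF8CD\n"
)

# One list of fields per line; split() already returns a list.
line_list = [line.rstrip("\n").split("\t\t") for line in reference]

print(line_list[2])     # the whole third row
print(line_list[2][0])  # the first field of the third row
```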
