Positional index of list elements using Python

I'm looking to create a create_positional_index function that takes a list of two or more lists as its argument and returns, for each word, the doc_id and the positions at which the word occurs in the various lists.
For example:
index = create_positional_index([['a', 'b', 'a'], ['a', 'c']])
index['a']
[[0, 0, 2], [1, 0]]
index['b']
[[0, 1]]
index['c']
[[1, 1]]
Here the first element of each inner list is the doc_id, followed by the positions at which 'a' appears in that document; for example, [0, 0, 2] means 'a' appears in document 0 at positions 0 and 2. Since 'a' appears in both documents, we get two lists. 'b' appears only in document 0, and 'c' appears only in document 1.
Can anyone help me write this function?

I would suggest using dicts for doc_id.
EDIT: Changed the function to match the OP's output format.
class create_positional_index():
    def __init__(self, lst):
        self.lst = lst
    def __getitem__(self, elm):
        # One [doc_id, pos, pos, ...] entry per document that contains elm
        return [[doc_id] + [pos for pos, key in enumerate(sub_lst) if key == elm]
                for doc_id, sub_lst in enumerate(self.lst) if elm in sub_lst]
index = create_positional_index([['a', 'b', 'a'], ['a', 'c']])
print(index['a'])
print(index['b'])
print(index['c'])
def create_positional_index2(lst, elm):
    return [[doc_id] + [pos for pos, key in enumerate(sub_lst) if key == elm]
            for doc_id, sub_lst in enumerate(lst) if elm in sub_lst]
print(create_positional_index2([['a', 'b', 'a'], ['a', 'c']], 'a'))
print(create_positional_index2([['a', 'b', 'a'], ['a', 'c']], 'b'))
print(create_positional_index2([['a', 'b', 'a'], ['a', 'c']], 'c'))
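Building on the suggestion to use dicts, here is a minimal sketch (not the answerer's code) that builds the whole index in a single pass, so each lookup is a plain dict access instead of a rescan of every document. It assumes the documents are lists of hashable tokens and relies on dicts preserving insertion order (Python 3.7+) to keep the doc_ids ascending.

```python
from collections import defaultdict

def create_positional_index(docs):
    """Build {word: [[doc_id, pos, pos, ...], ...]} in one pass over all documents."""
    # word -> {doc_id -> [positions]}; doc_ids are inserted in ascending order
    postings = defaultdict(dict)
    for doc_id, doc in enumerate(docs):
        for pos, word in enumerate(doc):
            postings[word].setdefault(doc_id, []).append(pos)
    # Flatten each posting list into the [doc_id, positions...] format from the question
    return {word: [[doc_id] + positions for doc_id, positions in per_doc.items()]
            for word, per_doc in postings.items()}

index = create_positional_index([['a', 'b', 'a'], ['a', 'c']])
print(index['a'])  # [[0, 0, 2], [1, 0]]
print(index['b'])  # [[0, 1]]
print(index['c'])  # [[1, 1]]
```

Looking up a word that never occurs raises KeyError here; index.get(word, []) returns an empty list instead.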

You can use the following function:
>>> from itertools import chain
>>> sample_list = [['a', 'b', 'a'], ['a', 'c']]
>>> def find_index(s_list):
...     for elem in set(chain(*s_list)):
...         yield {elem: [[i] + [t for t, k in enumerate(j) if k == elem] for i, j in enumerate(s_list)]}
...
>>> list(find_index(sample_list))
[{'a': [[0, 0, 2], [1, 0]]}, {'c': [[0], [1, 1]]}, {'b': [[0, 1], [1]]}]
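All you need here is enumerate inside two list comprehensions. Note that set(chain(*s_list)) creates a set of the unique elements across all your sublists, so the order of the dicts in the result follows set iteration order. Unlike the format in the question, this version also emits a bare [doc_id] for documents that do not contain the element, e.g. [0] for 'c'.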
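If you want a single dict in exactly the question's format, the same enumerate-based idea can be written as one dict comprehension that also skips documents lacking the element. A sketch; the name find_index_dict is hypothetical:

```python
from itertools import chain

def find_index_dict(s_list):
    """Return one dict mapping each unique element to [[doc_id, pos, ...], ...]."""
    return {
        elem: [[i] + [t for t, k in enumerate(j) if k == elem]
               for i, j in enumerate(s_list) if elem in j]  # skip documents without elem
        for elem in set(chain.from_iterable(s_list))
    }

index = find_index_dict([['a', 'b', 'a'], ['a', 'c']])
print(index['c'])  # [[1, 1]]
```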

I found your question fairly difficult to understand. However, in spite of that, and after looking at the other answers along with your comments, I think the following would be a good way to accomplish your goal. It defines a subclass of dict to hold the index, so you can also use regular dictionary methods such as keys(), items(), or update() on any instances you create. Note that it includes a bare [doc_id] for documents that don't contain the word, as the output below shows.
from itertools import chain
class PositionalIndex(dict):
    def __init__(self, *lists):
        # Map every unique word to [[doc_id, pos, ...], ...], one entry per document
        self.update(
            (word, [[i] + [j for j, k in enumerate(sublist) if k == word]
                    for i, sublist in enumerate(lists)])
            for word in set(chain.from_iterable(lists)))
index = PositionalIndex(['a', 'b', 'a'], ['a', 'c'])
for word in sorted(index):
    print('index[{!r}] --> {}'.format(word, index[word]))
Output:
index['a'] --> [[0, 0, 2], [1, 0]]
index['b'] --> [[0, 1], [1]]
index['c'] --> [[0], [1, 1]]

Related

Manipulating a list of lists multiple times in order

I'm trying to alter a list of lists in multiple ways by using a function (as I will have more than one list of lists).
I know how to change something once, but how do I do more than that? I get the error:
AttributeError: 'int' object has no attribute 'insert'
I understand that the error essentially means that whatever I'm calling .insert() on is not a list, but I don't quite understand why it isn't a list.
See my code below.
This works and gives me the desired output:
list_of_list3 = [['a', 1], ['b', 2], ['c', 3]]
list_to_add = ['Z', 'X', 'Y']
for list_position in range(len(list_of_list3)):
    original_list = list_of_list3[list_position]
    element_to_add = list_to_add[list_position]
    original_list.insert(0, element_to_add)
print(list_of_list3)
This will give me what I want:
[['Z', 'a', 1], ['X', 'b', 2], ['Y', 'c', 3]]
However, what I need is a function which does more than one thing at once. I am trying the code below:
def output_function(add_list, list_of_list):
    for list_position in range(len(list_of_list)):
        list_within_list = list_of_list[list_position]
        add_element1 = add_list[list_position]  # The two lists will always have the same length
        list_within_list = list_within_list.pop()  # I want to remove the last element
        list_with_element1 = list_within_list.insert(0, add_element1)  # I then want to add a new element
        # Then I want to add a new list to the beginning of the list of lists
        list_with_new_list = list_with_element1.insert(0, ['Column1', 'Column2', 'Column3'])
new_elements = ['A', 'B', 'C']
original_list_list = [['D', 1, 2], ['E', 3, 4], ['F', 5, 6]]
output_function(new_elements, original_list_list)
My desired output is the following (I will ultimately turn this into a pandas DataFrame):
[['Column1', 'Column2', 'Column3'], ['A', 'D', 1], ['B','E', 3], ['C', 'F', 5]]
Any help is appreciated. Thanks!
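I believe you are misunderstanding what the methods you are calling return.
Your comments indicate you are going to throw away the popped element, but you are actually throwing away the list and using the element instead: list.pop() returns the removed element (here an int), which you then assign to list_within_list. Likewise, list.insert() modifies the list in place and returns None, so list_with_element1 would be None even if the insert succeeded.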
These 2 lines:
list_within_list = list_within_list.pop() # I want to remove the last element
list_with_element1 = list_within_list.insert(0, add_element1) # I then want to add a new element
One way to accomplish what you intended:
list_with_element1 = list_within_list[:-1]
list_with_element1.insert(0, add_element1)
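Putting the pieces together, a minimal sketch of the whole function might build new rows instead of mutating them, and return the result (the original function returns nothing). It assumes, as the question states, that add_list and list_of_list have the same length; the header row is the one from the question.

```python
def output_function(add_list, list_of_list):
    """Prepend add_list[i] to row i minus its last element, then add a header row.
    Builds new lists, so the inputs are left unchanged."""
    header = ['Column1', 'Column2', 'Column3']
    # row[:-1] drops the last element without mutating row
    rows = [[new] + row[:-1] for new, row in zip(add_list, list_of_list)]
    return [header] + rows

new_elements = ['A', 'B', 'C']
original_list_list = [['D', 1, 2], ['E', 3, 4], ['F', 5, 6]]
result = output_function(new_elements, original_list_list)
print(result)
# [['Column1', 'Column2', 'Column3'], ['A', 'D', 1], ['B', 'E', 3], ['C', 'F', 5]]
```

Since the first row is the header, pandas.DataFrame(result[1:], columns=result[0]) should then give the DataFrame the question mentions.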

Replacing an element in a list with multiple elements

I am trying to modify a list of two lists. For each of the two inner lists, I perform some operation and possibly 'split' it into new lists.
Here is a simple example of what I'm trying to do:
[['a', 'b'], ['c', 'd']] --> [['a'], ['b'], ['c', 'd']]
Currently my algorithm passes ['a', 'b'] to a function that determines whether or not it should be split into [['a'], ['b']] (e.g. based on their correlations). The function returns [['a'], ['b']], which tells me that ['a', 'b'] should be split, or returns ['a', 'b'] (the original list), which indicates that it should not be split.
Currently I have something like this:
blist = [['a', 'b'], ['c', 'd']] #big list
slist = [['a'], ['b']] #small list returned by function
nlist = [items for i in xrange(len(blist)) for items in (slist if i==0 else blist[i])]
This produces [['a'], ['b'], 'c', 'd'] instead of the desired output [['a'], ['b'], ['c', 'd']], which leaves the second list of the original blist unchanged. I understand why this is happening: my second loop is also applied to blist[1], splitting it into its elements, but I am not sure how to fix it because I do not fully understand list comprehensions.
A 'pythonic' solution is preferred.
Any feedback would be appreciated, thank you!
EDIT: As the title suggests, I am trying to 'replace' ['a', 'b'] with ['a'], ['b'], so I would like the position to stay the same, with ['a'], ['b'] appearing in the original list before ['c', 'd'].
RESULTS
Thank you Christian, Paul and schwobaseggl for your solutions! They all work :)
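Try changing the end of your comprehension to
... else [blist[i]])]
so that blist[i] is wrapped in a list. The inner loop then yields blist[i] as a single item instead of iterating over its elements, and you get a list of lists.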
You can use slice assignment:
>>> l1 = [[1, 2], [3, 4]]
>>> l2 = [[1], [2]]
>>> l1[0:1] = l2
>>> l1
[[1], [2], [3, 4]]
This modifies l1 in place, so make a copy first if you need to keep the original.
Another way, which doesn't change l1, is list concatenation:
>>> l1 = [[1, 2], [3, 4]]
>>> l3 = l2 + l1[1:]
>>> l3
[[1], [2], [3, 4]]
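Both tricks generalize to any position i, not just the first element. A small sketch with hypothetical helper names, assuming a non-negative index i < len(lst) (with i = -1 the slice i:i + 1 would be empty and nothing would be removed):

```python
def replace_at(lst, i, replacements):
    """Return a new list in which lst[i] is replaced by the items of replacements."""
    # Concatenation leaves lst untouched; assumes 0 <= i < len(lst)
    return lst[:i] + list(replacements) + lst[i + 1:]

def replace_at_inplace(lst, i, replacements):
    """Replace lst[i] by the items of replacements, mutating lst."""
    # Slice assignment: the one-element slice i:i + 1 is swapped for any number of items
    lst[i:i + 1] = replacements

l1 = [[1, 2], [3, 4]]
print(replace_at(l1, 1, [[3], [4]]))  # [[1, 2], [3], [4]]
replace_at_inplace(l1, 0, [[1], [2]])
print(l1)  # [[1], [2], [3, 4]]
```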
You could change your split function so that it always returns the same structure, a list of lists. Then you can use a comprehension:
def split_or_not(l):
    if condition:  # placeholder for your own split test
        return [l[:1], l[1:]]  # split into two sublists
    return [l]  # don't split, but wrap in an extra list
# using map
nlist = [x for sub_l in map(split_or_not, blist) for x in sub_l]
# or a nested generator expression
nlist = [x for sub_l in (split_or_not(l) for l in blist) for x in sub_l]
Assuming you have the function you mentioned that decides whether to split an item:
def munch(item):
    if item[0] == 'a':  # split
        return [[item[0]], [item[1]]]
    return [item]  # don't split
You can use it in a simple for loop:
nlist = []
for item in blist:
    nlist.extend(munch(item))
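The loop above flattens the results of munch by one level; itertools.chain.from_iterable does the same without an explicit accumulator. A sketch reusing the answer's example munch, whose test on item[0] == 'a' is only a stand-in for a real split criterion:

```python
from itertools import chain

def munch(item):
    # Stand-in split test from the answer: split pairs whose first item is 'a'
    if item[0] == 'a':
        return [[item[0]], [item[1]]]
    return [item]  # don't split, but keep the list-of-lists shape

blist = [['a', 'b'], ['c', 'd']]
nlist = list(chain.from_iterable(map(munch, blist)))
print(nlist)  # [['a'], ['b'], ['c', 'd']]
```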
"Pythonic" is whatever is easy to read and understand. Don't use list comprehensions just because you can.

Finding common elements in list in python

Imagine I have a list like the following:
[[a,b],[a,c],[b,c],[c,d],[e,f],[f,g]]
My output should be:
[a,b,c,d]
[e,f,g]
How do I do it?
What I tried is this:
for i in range(0,len(fin3)):
    for j in range(i+1,len(fin3)):
        grop = []
        grop = list(set(fin3[i]) & set(fin3[j]))
        if len(grop)>0:
            grop2 = []
            grop2.append(link[i])
            grop2.append(link[j])
            grop3.append(grop2)
Thanks in advance...
I think you want something like:
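data = [[1, 2], [2, 3], [4, 5]]
output = []
for item1, item2 in data:
    for item_set in output:
        if item1 in item_set or item2 in item_set:
            # Note: only the first matching set is updated, so a pair that
            # links two existing sets does not merge them
            item_set.update((item1, item2))
            break
    else:  # no existing set shares an item: start a new one
        output.append(set((item1, item2)))
output = [list(item_set) for item_set in output]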
This gives:
output == [[1, 2, 3], [4, 5]]
If you want to find common elements even if the lists are not adjacent, and the order in the result doesn't matter:
def condense_sets(sets):
    result = []
    for candidate in sets:
        for current in result:
            if candidate & current:  # found overlap
                current |= candidate  # combine (merge sets)
                # new items from candidate may create an overlap
                # between current set and the remaining result sets
                result = condense_sets(result)  # merge such sets
                break
        else:  # no common elements found (or result is empty)
            result.append(candidate)
    return result
Example:
>>> data = [['a','b'], ['a','c'], ['b','c'], ['c','d'], ['e','f'], ['f','g']]
>>> list(map(list, condense_sets(map(set, data))))
[['a', 'c', 'b', 'd'], ['e', 'g', 'f']]
See Replace list of list with “condensed” list of list while maintaining order.
As was noted in a comment above, it looks like you want to do set consolidation.
Here's a solution I adapted from code at the link in that comment above.
def consolidate(seq):
    if len(seq) < 2:
        return seq
    result, tail = [seq[0]], consolidate(seq[1:])
    for item in tail:
        if result[0].intersection(item):
            result[0].update(item)
        else:
            result.append(item)
    return result
def main():
    sets = [set(pair) for pair in [['a','b'],['a','c'],['b','c'],['c','d'],['e','f'],['f','g']]]
    print("Input: {0}".format(sets))
    result = consolidate(sets)
    print("Result: {0}".format(result))
if __name__ == '__main__':
    main()
Sample output:
Input: [set(['a', 'b']), set(['a', 'c']), set(['c', 'b']), set(['c', 'd']), set(['e', 'f']), set(['g', 'f'])]
Result: [set(['a', 'c', 'b', 'd']), set(['e', 'g', 'f'])]
Here is another approach, which looks about as (in)efficient: O(n^2), where n is the number of items. It's not especially elegant, but it's correct. The following function returns a set of (hashable) frozensets if you pass True for the keyword argument return_sets; otherwise it returns a list of lists (the default, since your question indicates that's what you really want):
def create_equivalence_classes(relation, return_sets=False):
    eq_class = {}
    for x, y in relation:
        # Use tuples of x, y in case either is a string of length > 1 (iterable),
        # and so that elements x, y can be noniterables such as ints.
        eq_class_x = eq_class.get(x, frozenset((x,)))
        eq_class_y = eq_class.get(y, frozenset((y,)))
        join = eq_class_x.union(eq_class_y)
        for u in eq_class_x:
            eq_class[u] = join
        for v in eq_class_y:
            eq_class[v] = join
    set_of_eq_classes = set(eq_class.values())
    if return_sets:
        return set_of_eq_classes
    else:
        return list(map(list, set_of_eq_classes))
Usage:
>>> data = [['a','b'], ['a','c'], ['b','c'], ['c','d'], ['e','f'], ['f','g']]
>>> print(create_equivalence_classes(data))
[['d', 'c', 'b', 'a'], ['g', 'f', 'e']]
>>> print(create_equivalence_classes(data, return_sets=True))
{frozenset({'d', 'c', 'b', 'a'}), frozenset({'g', 'f', 'e'})}
>>> data1 = [['aa','bb'], ['bb','cc'], ['bb','dd'], ['fff','ggg'], ['ggg','hhh']]
>>> print(create_equivalence_classes(data1))
[['bb', 'aa', 'dd', 'cc'], ['hhh', 'fff', 'ggg']]
>>> data2 = [[0,1], [2,3], [0,2], [16, 17], [21, 21], [18, 16]]
>>> print(create_equivalence_classes(data2))
[[21], [0, 1, 2, 3], [16, 17, 18]]
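All three answers rescan the groups built so far. A different technique, not used in any answer here, is a union-find (disjoint-set) structure: it tracks a representative ("root") for every item and merges two groups by re-pointing one root at the other. The sketch below (function name hypothetical) uses path halving only, without union by rank, and returns the groups as lists in order of first appearance:

```python
def group_connected(pairs):
    """Group items linked directly or transitively by the given pairs (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)  # unseen items start as their own root
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving: skip one level
            x = parent[x]
        return x

    for x, y in pairs:
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x  # merge the two groups

    groups = {}
    for item in parent:  # insertion order = order of first appearance
        groups.setdefault(find(item), []).append(item)
    return list(groups.values())

data = [['a', 'b'], ['a', 'c'], ['b', 'c'], ['c', 'd'], ['e', 'f'], ['f', 'g']]
print(group_connected(data))  # [['a', 'b', 'c', 'd'], ['e', 'f', 'g']]
```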

How to reorganize sublists and exclude specific indexes in those sublists?

How can I reorganize sublists and exclude certain items from sublists to create a new list of sublists?
By reorganize I mean changing the order of the items within every sublist in the same way: for example, moving the element at index 0 to index 1 and the element at index 2 to index 0 in every sublist. At the same time, I want to leave out the element at index 1 of the original sublists.
Original_List = [['a','b','c'],['a','b','c'],['a','b','c']]
Desired_List = [['c','a'],['c','a'],['c','a']]
I currently have this function, which rearranges and pulls out different indexes from a sublist.
def Function(li):
    return map(lambda x: (x[2] + "|" + x[0]).split("|"),li)
However, there are situations in which the sublists are much longer and there are more indexes that I want to pull out.
Rather than writing a separate version of this function for 3 or 4 indexes, like this:
def Function(li):
    return map(lambda x: (x[2] + "|" + x[1] + "|" + x[0]).split("|"),li)
I'd like to use *args so that I can specify a different number of sublist indexes to pull out. This is what I have so far, but I get a TypeError.
def Function(self,li,*args):
    return map(lambda x: ([int(arg) + "|" for arg in args]).split("|"))
I get a TypeError, which I can understand but can't get around:
TypeError: string indices must be integers, not str
Perhaps there is a better and faster method entirely to rearrange sublists and exclude certain items within those sublists?
Also, it would be amazing if the function could deal with sub-sub-lists like this:
Original_List = [['a','b','c',['1','2','3']],['a','b','c',['1','2','3']],['a','b','c',['1','2','3']]]
These are the inputs I'd like to use to achieve this:
[2] for c
[0] for a
[3][1] for '2'
Desired_List = [['c','a','2'],['c','a','2'],['c','a','2']]
I think what you are describing is this:
def sublist_indices(lst, *args):
    return [[l[i] for i in args] for l in lst]
>>> sublist_indices([[1, 2, 3], [4, 5, 6]], 2, 0)
[[3, 1], [6, 4]]
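The standard library's operator.itemgetter offers a similar approach. A sketch (the function name is hypothetical) that also handles itemgetter's quirk of returning a bare item, not a 1-tuple, when given a single index:

```python
from operator import itemgetter

def pick_with_itemgetter(lst, *indices):
    """Return, for every sublist, the items at `indices` in the given order."""
    if not indices:
        return [[] for _ in lst]  # itemgetter() needs at least one index
    getter = itemgetter(*indices)
    if len(indices) == 1:
        # With one index, itemgetter returns the item itself, not a tuple
        return [[getter(sub)] for sub in lst]
    return [list(getter(sub)) for sub in lst]

original = [['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']]
print(pick_with_itemgetter(original, 2, 0))  # [['c', 'a'], ['c', 'a'], ['c', 'a']]
```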
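If every item in your sublists is iterable (e.g. strings or lists), you can use itertools.chain.from_iterable to flatten each sublist one level, including its sub-sublists, and then index into the result. Note that this only works as expected if the strings are single characters, because a multi-character string is split into its characters as well: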
from itertools import chain
def sublists(lst, *args):
    return [[list(chain.from_iterable(l))[i] for i in args] for l in lst]
e.g.
>>> lst = [['a', 'b', 'c', ['1', '2', '3']],
...        ['a', 'b', 'c', ['1', '2', '3']],
...        ['a', 'b', 'c', ['1', '2', '3']]]
>>> sublists(lst, 2, 0, 4)
[['c', 'a', '2'], ['c', 'a', '2'], ['c', 'a', '2']]
original = [['a','b','c'],['a','b','c'],['a','b','c']]
desired = [['c','a'],['c','a'],['c','a']]
def filter_indices(xs, indices):
    return [[x[i] for i in indices if i < len(x)] for x in xs]
filter_indices(original, [2, 0])
# [['c', 'a'], ['c', 'a'], ['c', 'a']]
filter_indices(original, [2, 1, 0])
# [['c', 'b', 'a'], ['c', 'b', 'a'], ['c', 'b', 'a']]
I'm not sure what exactly you mean by "reorganize", but this nested list comprehension will take in a list of lists li and return a new list which contains the lists in li, but with the indices in args excluded.
def exclude_indices(li, *args):
    return [[subli[i] for i in range(len(subli)) if i not in args] for subli in li]
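For the sub-sub-list case in the question ([3][1] for '2'), another option is to pass each position as an index path instead of flattening, which also works with multi-character strings. A sketch; pick_paths and the tuple-as-path convention are hypothetical:

```python
from functools import reduce
from operator import getitem

def pick_paths(lst, *paths):
    """For every sublist, follow each path and collect the values in order.
    A path is either an int index or a tuple of indexes into nested lists."""
    def follow(item, path):
        if isinstance(path, int):
            return item[path]
        # reduce applies getitem step by step: item[path[0]][path[1]]...
        return reduce(getitem, path, item)
    return [[follow(sub, path) for path in paths] for sub in lst]

original = [['a', 'b', 'c', ['1', '2', '3']],
            ['a', 'b', 'c', ['1', '2', '3']],
            ['a', 'b', 'c', ['1', '2', '3']]]
print(pick_paths(original, 2, 0, (3, 1)))
# [['c', 'a', '2'], ['c', 'a', '2'], ['c', 'a', '2']]
```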

Confused by chain enumeration

I wanted to gather all of the header files in a list of subdirectories. However, if I do:
from glob import glob
from itertools import chain
subDirs = ['FreeRTOS', 'Twig']
for each in chain(glob(eachDir+'/*.h') for eachDir in subDirs):
    print(each)
What I get is
['FreeRTOS/croutine.h', 'FreeRTOS/FreeRTOS.h', 'FreeRTOS/FreeRTOSConfig.h', 'FreeRTOS/list.h', 'FreeRTOS/mpu_wrappers.h', 'FreeRTOS/portable.h', 'FreeRTOS/portmacro.h', 'FreeRTOS/projdefs.h', 'FreeRTOS/queue.h', 'FreeRTOS/semphr.h', 'FreeRTOS/StackMacros.h', 'FreeRTOS/task.h', 'FreeRTOS/timers.h']
['Twig/twig.h']
But what I wanted to see was
'FreeRTOS/croutine.h'
'FreeRTOS/FreeRTOS.h'
'FreeRTOS/FreeRTOSConfig.h'
'FreeRTOS/list.h'
'FreeRTOS/mpu_wrappers.h'
'FreeRTOS/portable.h'
'FreeRTOS/portmacro.h'
'FreeRTOS/projdefs.h'
'FreeRTOS/queue.h'
'FreeRTOS/semphr.h'
'FreeRTOS/StackMacros.h'
'FreeRTOS/task.h'
'FreeRTOS/timers.h'
'Twig/twig.h'
I thought that was what the chain() would do for me. What am I missing?
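I think you are looking for itertools.chain.from_iterable. chain(*iterables) expects each iterable as a separate argument; you passed it a single generator, so chain just yields that generator's items, which are the lists returned by glob. chain.from_iterable takes one iterable of iterables and yields the items of each in turn: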
import os
import glob
import itertools
for each in itertools.chain.from_iterable(
        # '*.h', not '/*.h': a component starting with '/' makes join discard eachDir
        glob.glob(os.path.join(eachDir, '*.h'))
        for eachDir in subDirs):
    print(each)
It flattens an iterable of iterables:
In [6]: import itertools as IT
In [7]: list(IT.chain.from_iterable([['a', 'b', 'c'], [1, 2, 3]]))
Out[7]: ['a', 'b', 'c', 1, 2, 3]
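To see the difference without touching the filesystem, here is a sketch in which hard-coded lists stand in for glob's results (the file names are taken from the question's output):

```python
from itertools import chain

# Stand-ins for the lists glob() returns for each subdirectory
groups = [['FreeRTOS/list.h', 'FreeRTOS/task.h'], ['Twig/twig.h']]

# chain() treats each *argument* as one iterable. A single generator argument
# is iterated once, so its items (the inner lists) come out unchanged.
nested = list(chain(g for g in groups))
# Unpacking the generator passes each inner list as a separate argument...
flat_unpacked = list(chain(*(g for g in groups)))
# ...while chain.from_iterable consumes the outer iterable lazily.
flat_lazy = list(chain.from_iterable(g for g in groups))
# A nested comprehension flattens one level as well.
flat_listcomp = [name for g in groups for name in g]

print(nested)     # [['FreeRTOS/list.h', 'FreeRTOS/task.h'], ['Twig/twig.h']]
print(flat_lazy)  # ['FreeRTOS/list.h', 'FreeRTOS/task.h', 'Twig/twig.h']
```

Unpacking with * runs the generator to completion before chain starts, so every glob call happens up front; from_iterable only advances to the next directory once the previous list is exhausted.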
Probably slower than unutbu's answer, but this would also work:
import collections.abc
def flattenIter(l):
    """
    Generator that recursively flattens nested iterables; strings are
    treated as single items. To get a fully flattened list of L do:
    list(flattenIter(L))
    """
    for el in l:
        if isinstance(el, collections.abc.Iterable) and not isinstance(el, (str, bytes)):
            for sub in flattenIter(el):
                yield sub
        else:
            yield el
Note that, unlike unutbu's answer, which removes only one level of nesting, this flattens lists nested to any depth and keeps strings intact. Example:
print(list(IT.chain.from_iterable([['a', 'b', 'c'], [1, 2, 3],"aap",["beer",[1]]])))
# ['a', 'b', 'c', 1, 2, 3, 'a', 'a', 'p', 'beer', [1]]
print(list(flattenIter([['a', 'b', 'c'], [1, 2, 3],1,"aap",["beer",[1]]])))
# ['a', 'b', 'c', 1, 2, 3, 1, 'aap', 'beer', 1]
For a list nested only one level deep, the output is the same as with unutbu's answer:
In : list(flattenIter([['a', 'b', 'c'], [1, 2, 3]]))
Out: ['a', 'b', 'c', 1, 2, 3]
Here is a speed comparison with unutbu's version:
import collections.abc
import time
import itertools as IT
class Timer:
    def __enter__(self):
        self.start = time.perf_counter()
        return self
    def __exit__(self, *args):
        self.end = time.perf_counter()
        self.interval = self.end - self.start
def flattenIter(l):
    """
    Generator that recursively flattens nested iterables; strings are
    treated as single items. To get a fully flattened list of L do:
    list(flattenIter(L))
    """
    for el in l:
        if isinstance(el, collections.abc.Iterable) and not isinstance(el, (str, bytes)):
            for sub in flattenIter(el):
                yield sub
        else:
            yield el
with Timer() as t:
    for x in range(10000):
        list(IT.chain.from_iterable([['a', 'b', 'c'], [1, 2, 3]]))
print(t.interval)
# result: 0.0220727116414
with Timer() as t:
    for x in range(10000):
        list(flattenIter([['a', 'b', 'c'], [1, 2, 3]]))
print(t.interval)
# result: 0.147218201587
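The standard library's timeit module can run the same comparison without a hand-written Timer class. A sketch; absolute numbers depend on the machine and Python version, so no results are shown:

```python
import collections.abc
import timeit
from itertools import chain

def flattenIter(l):
    """Recursively flatten nested iterables, treating strings as single items."""
    for el in l:
        if isinstance(el, collections.abc.Iterable) and not isinstance(el, (str, bytes)):
            yield from flattenIter(el)
        else:
            yield el

data = [['a', 'b', 'c'], [1, 2, 3]]
# timeit.timeit accepts a zero-argument callable and returns the total seconds
chain_time = timeit.timeit(lambda: list(chain.from_iterable(data)), number=10000)
recursive_time = timeit.timeit(lambda: list(flattenIter(data)), number=10000)
print('chain.from_iterable: {:.4f}s'.format(chain_time))
print('flattenIter:         {:.4f}s'.format(recursive_time))
```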
Also check out:
Making a flat list out of list of lists in Python
It's marked as a duplicate, though it has some other solid links in there as well. ;)
