Merging sublists with same initial element by substituting null values

I want to merge sublists with the same initial first element, but instead of adding the other values one after the other, I want to substitute values where they are None.
I have a matrix of sublists. Each sublist contains 7 values: number of the element, score A, score B, score C, score D, score E, score F. So far each sublist holds only one score (even if it is the same score for different sublists), but I want to merge the sublists that contain different scores for the same element.
I have
sub_lists = [(1,None,None,12,None,None,None),
(2,67,None,None,None,None,None),
(2,None,None,83,None,None,None),
...]
So for each sublist there is only 1 score indicated while the others are null. The result I am looking for is
sub_lists = [(1,None,None,12,None,None,None),
(2,67,None,83,None,None,None),
...]
What I have tried is
res = []
for sub in sub_lists:
    if res and res[-1][0] == sub[0]:
        res[-1].extend(sub[1:])
    else:
        res.append([ele for ele in sub])
res = list(map(tuple, res))
But this only adds the values one after the other, resulting in
sub_lists = [(1,None,None,12,None,None,None),
(2,67,None,None,None,None,None,None,None,83,None,None,None),
...]
Does someone know how to help me with this?

I think adam-smooch was right; I modified his answer a bit:
sub_lists=[
    (1,None,None,12,None,None,None),
    (2,67,None,None,None,None,None),
    (2,None,None,83,None,None,None)
]
def my_combine(l1, l2):
    # fill every None in l1 with the value at the same position in l2
    l1 = list(l1)
    l2 = list(l2)
    for i in range(len(l1)):
        if l1[i] is None:
            l1[i] = l2[i]
    return l1
results = dict()
for sl in sub_lists:
    if sl[0] not in results:
        results[sl[0]] = sl[1:]
    else:
        results[sl[0]] = my_combine(results[sl[0]], sl[1:])
# rebuild the list of tuples from the dictionary
sub_lists = [(key, *values) for key, values in results.items()]
print(sub_lists)
# desired outcome.
# sub_lists=[
# (1,None,None,12,None,None,None),
# (2,67,None,83,None,None,None)
# ]

Since the sub-lists' first numbers will be unique at the end, you could use a dictionary keyed on that first number.
So do something like:
def my_combine(l1, l2):
    l1 = list(l1)
    for i in range(len(l1)):
        if l1[i] is None:
            l1[i] = l2[i]
    return tuple(l1)
results = dict()
for sl in sub_lists:
    if sl[0] not in results:
        results[sl[0]] = sl[1:]
    else:
        results[sl[0]] = my_combine(results[sl[0]], sl[1:])

sub_lists=[(1,None,None,12,None,None,None),(2,67,None,None,None,None,None),(2,None,None,83,None,None,None)]
res = [] # resultant matrix
for sub in sub_lists:
    if res and res[-1][0] == sub[0]:
        # same first element as the last sublist of the resultant matrix:
        # keep every value already present and take the value from "sub" only where it is None
        res[-1] = [old if old is not None else new for old, new in zip(res[-1], sub)]
    else:
        res.append([ele for ele in sub])
res = list(map(tuple, res))
print(res)
# desired outcome.
# sub_lists=[(1,None,None,12,None,None,None),(2,67,None,83,None,None,None)]
I hope this is the code you were looking for and that my comments make sense. ☺️
Feel free to ask anything further!
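The dictionary idea from the answers above can be condensed into one helper. This is only a minimal sketch, not taken from any of the answers: the name merge_by_first is made up for illustration, and it assumes that every sublist has the same length and that the first non-None score seen for a position should be kept.

```python
def merge_by_first(rows):
    """Merge rows that share the same first element, filling None slots.

    Hypothetical helper: assumes all rows have equal length and that the
    first non-None value seen for a position wins.
    """
    merged = {}  # first element -> list of scores (dicts keep insertion order)
    for key, *scores in rows:
        if key not in merged:
            merged[key] = list(scores)
        else:
            merged[key] = [
                old if old is not None else new
                for old, new in zip(merged[key], scores)
            ]
    return [(key, *scores) for key, scores in merged.items()]


sub_lists = [
    (1, None, None, 12, None, None, None),
    (2, 67, None, None, None, None, None),
    (2, None, None, 83, None, None, None),
]
print(merge_by_first(sub_lists))
```

Because the rows are grouped in a dictionary, this also merges sublists with the same first element that are not next to each other in the input.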

Related

How to create a nest list or a tree view from a flat list based on value condition?

I am working on a problem where I am given a flat list of strings. Based on the name of each string, I would like to create either a nested list, a dictionary or a Node class: anything that can have a tree structure.
The list looks something like:
['big_1', 'small_1', 'item_1', 'item_2', 'big_2', 'small_2', 'item_3']
This should be turned into something like
['big_1', ['small_1', ['item_1', 'item_2']], 'big_2', ['small_2', ['item_3']]]
or a nested dictionary:
{'big_1': { 'small_1': ['item_1', 'item_2']}, 'big_2': {'small_2': ['item_3']}}
The example has 3 levels but there can be any number of levels. I tried something like this, but it is not correct:
x = ['a_1', 'b_1', 'b_2', 'a_2', 'b_3', 'a_3', 'b_4', 'b_5', 'b_6']
def category(input):
    return input.split('_')[0]
cats = list(dict.fromkeys([category(i) for i in x])) # order preserved unique set
results = {}
prev_cat = ''
result = []
index = 0
for item in x:
    if index == 0:
        result.append(item)
        index += 1
        prev_cat = category(item)
    else:
        curr_cat = category(item)
        if curr_cat == prev_cat:
            # Same category, append
            result.append(item)
        else:
            result.append([item])
print(result)
It returns ['a_1', ['b_1'], ['b_2'], 'a_2', ['b_3'], 'a_3', ['b_4'], ['b_5'], ['b_6']]
Any suggestion please?

I think I found a way to achieve this using recursion.
x = ["big_1", "small_1", "item_1", "item_2", "big_2", "small_2", "item_3"]
def category(input):
    return input.split("_")[0]
def parse(l):
    out = []
    cat = category(l[0])
    sublist = []
    for el in l:
        if category(el) != cat:
            sublist.append(el)
        else:
            if len(sublist) > 0:
                out.append(parse(sublist))
                sublist = []
            out.append(el)
    if len(sublist) > 0:
        out.append(parse(sublist))
        sublist = []
    return out
print(parse(x))
Basically what I do is recursively call the parse function each time I find a different level. The code is just a test and can definitely be improved.
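For the nested-dictionary form mentioned in the question, the same recursive idea can be adapted. The following is a minimal sketch, assuming (as in the example) that the category is the text before the first underscore and that each group starts with an item of the top-level category; parse_dict is a hypothetical name, not part of the answer above.

```python
def category(name):
    # Category is the text before the first underscore, e.g. "big" for "big_1".
    return name.split("_")[0]


def parse_dict(items):
    """Turn a flat list into nested dicts; the deepest level stays a list."""
    top = category(items[0])
    if all(category(el) == top for el in items):
        return list(items)  # only one level left: these are the leaves
    groups = {}
    current = None
    for el in items:
        if category(el) == top:
            current = el          # start a new group under this key
            groups[current] = []
        else:
            groups[current].append(el)
    return {key: parse_dict(children) if children else []
            for key, children in groups.items()}


x = ["big_1", "small_1", "item_1", "item_2", "big_2", "small_2", "item_3"]
print(parse_dict(x))
```

A key that has no children is mapped to an empty list, which is one possible convention among several.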

How to find the highest value element in a list with reference to a dictionary on python

How do I code a function in python which can:
iterate through a list of word strings which may contain duplicate words and referencing to a dictionary,
find the word with the highest absolute sum, and
output it along with the corresponding absolute value.
The function also has to ignore words which are not in the dictionary.
For example,
Assume the function is called H_abs_W().
Given the following list and dict:
list_1 = ['apples','oranges','pears','apples']
Dict_1 = {'apples':5.23,'pears':-7.62}
Then calling the function as:
H_abs_W(list_1,Dict_1)
Should give the output:
'apples',10.46
EDIT:
I managed to do it in the end with the code below. Looking over the answers, turns out I could have done it in a shorter fashion, lol.
def H_abs_W(list_1,Dict_1):
    freqW = {}
    for char in list_1:
        if char in freqW:
            freqW[char] += 1
        else:
            freqW[char] = 1
    ASum_W = 0
    i_word = ''
    for a,b in freqW.items():
        x = 0
        d = Dict_1.get(a,0)
        x = abs(float(b)*float(d))
        if x > ASum_W:
            ASum_W = x
            i_word = a
    return(i_word,ASum_W)

list_1 = ['apples','oranges','pears','apples']
Dict_1 = {'apples':5.23,'pears':-7.62}
d = {k:0 for k in list_1}
for x in list_1:
    if x in Dict_1:
        d[x] += Dict_1[x]
# take the maximum over the accumulated sums in d (by absolute value), not over Dict_1
m = max(d, key=lambda k: abs(d[k]))
print(m, abs(d[m]))

Try this:
# pick the word whose count times value has the largest absolute value
key = max(Dict_1, key=lambda k: abs(list_1.count(k) * Dict_1[k]))
print(f"{key}, {abs(list_1.count(key) * Dict_1[key])}")
# apples, 10.46

You can use Counter to calculate the frequency (number of occurrences) of each item in the list.
Counting only the words that are present in the dictionary ignores the unknown words.
max(counter, key=...) then gives the word whose count multiplied by its dictionary value has the highest absolute value.
from collections import Counter
def H_abs_W(list_1, Dict_1):
    counter = Counter(word for word in list_1 if word in Dict_1)
    item = max(counter, key=lambda word: abs(counter[word] * Dict_1[word]))
    return item, abs(counter[item] * Dict_1[item])
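The requirement can also be met in a single pass by summing the dictionary values per word. This is a minimal sketch under the assumptions stated in the question (words missing from the dictionary are ignored); the name highest_abs_word and the choice to return None when no word is known are illustrative, not part of any answer above.

```python
from collections import defaultdict


def highest_abs_word(words, values):
    """Return (word, abs(sum)) for the word with the largest absolute sum."""
    totals = defaultdict(float)
    for word in words:
        if word in values:          # skip words that are not in the dictionary
            totals[word] += values[word]
    if not totals:                  # assumption: None when nothing matches
        return None
    best = max(totals, key=lambda w: abs(totals[w]))
    return best, abs(totals[best])


list_1 = ['apples', 'oranges', 'pears', 'apples']
Dict_1 = {'apples': 5.23, 'pears': -7.62}
print(highest_abs_word(list_1, Dict_1))
```

Taking abs() only after summing matters when a word's values could cancel out; with one fixed value per word, as here, it is equivalent to count times value.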

How to find first value in a list having no duplicates?

l1 = ['A','B','C','D','A','B']
l2 = []
'C' is the first value in list l1 that has no duplicate. I want to create a function that puts C in l2.

In 3.6 and higher, this is very easy. Now that dicts preserve insertion order, collections.Counter can be used to efficiently count all elements in a single pass, then you can just scan the resulting Counter in order to find the first element with a count of 1:
from collections import Counter
l1 = ['A','B','C','D','A','B']
l2 = [next(k for k, v in Counter(l1).items() if v == 1)]
Work is strictly O(n), with only one pass of the input required (plus a partial pass of the unique values in the Counter itself), and the code is incredibly simple. In modern Python, Counter even has a C accelerator for counting inputs that pushes all the Counter construction work to the C layer, making it impossible to beat. If you want to account for the possibility that no such element exists, just wrap the l2 initialization to make it:
try:
    l2 = [next(k for k, v in Counter(l1).items() if v == 1)]
except StopIteration:
    l2 = []
    # ... whatever else makes sense for your scenario ...
or avoid exception handling with itertools.islice (so l2 is 0-1 items, and it still short-circuits once a hit is found):
from itertools import islice
l2 = list(islice((k for k, v in Counter(l1).items() if v == 1), 1))

You can join the list into a string and then compare the index of each character from the left and from the right using the string methods find and rfind. This assumes every element is a single character. The loop stops as soon as the first match is found:
l1 = ['A','B','C','D','A','B']
def i_list(input):
    l1 = ''.join(input)
    for i in l1:
        if l1.find(i) == l1.rfind(i):
            return(i)
print(i_list(l1))
# output
C
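Joining into a string only works when every element is a single character, because characters from different elements get mixed in the joined string. A minimal sketch that compares whole elements instead, using list.count (the name first_unique is made up here; the approach is O(n²), so it is meant for short lists only):

```python
def first_unique(items):
    """Return the first element that occurs exactly once, or None."""
    return next((x for x in items if items.count(x) == 1), None)


print(first_unique(['A', 'B', 'C', 'D', 'A', 'B']))  # C
print(first_unique(['AB', 'A', 'B']))                # AB
```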

An implementation using a defaultdict:
# Initialize
from collections import defaultdict
counts = defaultdict(int)
# Count all occurrences
for item in l1:
    counts[item] += 1
# Find the first non-duplicated item
for item in l1:
    if counts[item] == 1:
        l2 = [item]
        break
else:
    l2 = []

As a follow-up to ShadowRanger's answer: if you're using a lower version of Python, it's not much more complicated to filter the original list, so that you don't have to rely on the ordering of the counter items:
from collections import Counter
l1 = ['A','B','C','D','A','B']
c = Counter(l1)
l2 = [x for x in l1 if c[x] == 1][:1]
print(l2) # ['C']
This is also O(n).

We can also do it with NumPy.
import numpy as np
l1 = ['A','B','C','D','A','B']
def find_first_non_duplicate(l1):
    # column 0: index of first occurrence, column 1: count, one row per unique value
    indexes_counts = np.asarray(np.unique(l1, return_counts=True, return_index=True)[1:]).T
    (not_duplicates,) = np.where(indexes_counts[:, 1] == 1)
    if not_duplicates.size > 0:
        return [l1[np.min(indexes_counts[not_duplicates, 0])]]
    else:
        return []
print(find_first_non_duplicate(l1))
# output
['C']

Another way to get the first unique element in a list:
Check each element one by one and store its count in a dict.
Loop through the dict and take the first element whose count is 1.
def countElement(a):
    """
    returns a dict of element and its count
    """
    g = {}
    for i in a:
        if i in g:
            g[i] +=1
        else:
            g[i] =1
    return g
#List to be processed - input_list
input_list = [1,1,1,2,2,2,2,3,3,4,5,5,234,23,3,12,3,123,12,31,23,13,2,4,23,42,42,34,234,23,42,34,23,423,42,34,23,423,4,234,23,42,34,23,4,23,423,4,23,4] #Input from user
try:
    if input_list: #if list is not empty
        print ([i for i,v in countElement(input_list).items() if v == 1][0]) #get the first element whose count is 1
    else: #if list is empty
        print ("empty list in Input")
except IndexError: #no element has a count of 1
    print (f"Only duplicate values in list - {input_list}")

Delete all previous lists if a following list contains same element

For example, I have some lists:
[date1, time1, nickname1, point1 = 56.341708,43.948463]
[date2, time2, nickname2, point2 = 56.321795,43.9996]
[date3, time3, nickname3, point1 = 56.341708,43.948463]
[date4, time4, nickname4, point1 = 56.341708,43.948463]
[date5, time5, nickname5, point3 = 56.236278,43.960233]
[date6, time6, nickname7, point3 = 56.236278,43.960233]
I need to delete all previous lists if a following list has the same point.
Correct output should be:
[date2, time2, nickname2, point2 = 56.321795,43.9996]
[date4, time4, nickname4, point1 = 56.341708,43.948463]
[date6, time6, nickname7, point3 = 56.236278,43.960233]
My code removes some lists but does not fully do the task:
checked3 - list of lists
checked4 - list of points
Code:
r = -1
v = -2
k = -len(checked3)
try:
    while v > k:
        if str(checked4[r]) in checked3[v]:
            checked3.pop(v)
            print ('now', checked3)
            v = v - 1
        else:
            print ('else', checked3)
            r = r - 1
except:
    pass
Could you please help me get the correct output?

This should work
distinctList = []
distinctDict = {}
for l in checked3:
    point = l[-1] #last element of inside list
    distinctDict[point] = l
for l in distinctDict:
    distinctList.append(distinctDict[l])

Here's one approach:
Create a dictionary of points.
Walk the list from behind (since you want the last value retained).
Flag a point when you find its first occurrence.
Delete the item when you find a duplicate point.
To keep the explanation simple, I simulated the data with a list of lists that has strings for the inconsequential fields and tuples for points.
listOfLists = [
['date1', 'time1', 'nickname1', (56.341708,43.948463)],
['date2', 'time2', 'nickname2', (56.321795,43.9996)],
['date3', 'time3', 'nickname3', (56.341708,43.948463)],
['date4', 'time4', 'nickname4', (56.341708,43.948463)],
['date5', 'time5', 'nickname5', (56.236278,43.960233)],
['date6', 'time6', 'nickname7', (56.236278,43.960233)]
]
Using dictionary comprehension, create a record for each point
pointsDict = {item[3]:'Duplicates in list' for item in listOfLists}
Walk the list reversed, from behind. We flag for deleting by altering the value corresponding to the point to set it to 'Can delete'. When we encounter 'Can delete' as a value corresponding to the item in iteration, we delete it from the original list.
for item in listOfLists[::-1]:
    point = item[3]
    if pointsDict[point] == 'Duplicates in list':
        pointsDict[point] = 'Can delete'
    elif pointsDict[point] == 'Can delete':
        listOfLists.pop(listOfLists.index(item))
At this point listOfLists contains what you want.
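The same result can be obtained without mutating the list while scanning it. A minimal sketch, assuming the point is the last field of each record and is hashable (a tuple, as in the simulated data above); keep_last_per_point is a hypothetical name:

```python
def keep_last_per_point(rows):
    """Keep only the last row for each point, preserving the input order."""
    # Later rows overwrite earlier ones, so each point maps to its last index.
    last_index = {row[-1]: i for i, row in enumerate(rows)}
    return [row for i, row in enumerate(rows) if last_index[row[-1]] == i]


records = [
    ['date1', 'time1', 'nickname1', (56.341708, 43.948463)],
    ['date2', 'time2', 'nickname2', (56.321795, 43.9996)],
    ['date3', 'time3', 'nickname3', (56.341708, 43.948463)],
    ['date4', 'time4', 'nickname4', (56.341708, 43.948463)],
    ['date5', 'time5', 'nickname5', (56.236278, 43.960233)],
    ['date6', 'time6', 'nickname7', (56.236278, 43.960233)],
]
for row in keep_last_per_point(records):
    print(row)
```

The surviving rows come out in their original order (date2, date4, date6), which matches the expected output in the question.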

Is there a way to check if a list is a sublist of another list? [duplicate]

I want to write a function that determines if a sublist exists in a larger list.
list1 = [1,0,1,1,1,0,0]
list2 = [1,0,1,0,1,0,1]
#Should return true
sublistExists(list1, [1,1,1])
#Should return false
sublistExists(list2, [1,1,1])
Is there a Python function that can do this?

Let's get a bit functional, shall we? :)
def contains_sublist(lst, sublst):
    n = len(sublst)
    return any((sublst == lst[i:i+n]) for i in range(len(lst)-n+1))
Note that any() will stop on the first match of sublst within lst, or return False if there is no match, after O(m*n) operations.

If you are sure that your inputs will only contain the single digits 0 and 1 then you can convert to strings:
def sublistExists(list1, list2):
    return ''.join(map(str, list2)) in ''.join(map(str, list1))
This creates two strings so it is not the most efficient solution but since it takes advantage of the optimized string searching algorithm in Python it's probably good enough for most purposes.
If efficiency is very important you can look at the Boyer-Moore string searching algorithm, adapted to work on lists.
A naive search has O(n*m) worst case but can be suitable if you cannot use the converting to string trick and you don't need to worry about performance.

No function that I know of, but you can write one:
def sublistExists(lst, sublst):
    for i in range(len(lst)-len(sublst)+1):
        if sublst == lst[i:i+len(sublst)]:
            return True #return position (i) if you wish
    return False #or -1
As Mark noted, this is not the most efficient search (it's O(n*m)). This problem can be approached in much the same way as string searching.

My favourite simple solution is the following (however, it is brute force, so I don't recommend it on huge data):
>>> l1 = ['z','a','b','c']
>>> l2 = ['a','b']
>>> any(l1[i:i+len(l2)] == l2 for i in range(len(l1)))
True
The code above creates all possible slices of l1 with the length of l2 and compares them with l2 one by one.
Detailed explanation
Read this explanation only if you don't understand how it works (and want to know); otherwise there is no need to read it.
First, this is how you can iterate over the indexes of the items in l1:
>>> [i for i in range(len(l1))]
[0, 1, 2, 3]
So, because i represents the index of an item in l1, you can use it to show the actual item instead of the index number:
>>> [l1[i] for i in range(len(l1))]
['z', 'a', 'b', 'c']
Then create slices (a sub-selection of items from the list) from l1 with the length of l2:
>>> [l1[i:i+len(l2)] for i in range(len(l1))]
[['z', 'a'], ['a', 'b'], ['b', 'c'], ['c']] #last one is shorter, because there is no next item.
Now you can compare each slice with l2 and you see that second one matched:
>>> [l1[i:i+len(l2)] == l2 for i in range(len(l1))]
[False, True, False, False] #notice that the second one is that matching one
Finally, with function named any, you can check if at least one of booleans is True:
>>> any(l1[i:i+len(l2)] == l2 for i in range(len(l1)))
True
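The slicing idea also gives the position of the match, and stopping the range at len(l1) - len(l2) + 1 avoids the shorter slices at the end. A minimal sketch (find_sublist is a made-up name; returning -1 when nothing matches is a convention chosen here):

```python
def find_sublist(lst, sub):
    """Return the index of the first occurrence of sub in lst, or -1."""
    n = len(sub)
    # Only start positions that leave room for a full-length slice are tried.
    return next((i for i in range(len(lst) - n + 1) if lst[i:i + n] == sub), -1)


list1 = [1, 0, 1, 1, 1, 0, 0]
list2 = [1, 0, 1, 0, 1, 0, 1]
print(find_sublist(list1, [1, 1, 1]))  # 2
print(find_sublist(list2, [1, 1, 1]))  # -1
```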

The efficient way to do this is to use the Boyer-Moore algorithm, as Mark Byers suggests. I have done it already here: Boyer-Moore search of a list for a sub-list in Python, but will paste the code here. It's based on the Wikipedia article.
The search() function returns the index of the sub-list being searched for, or -1 on failure.
def search(haystack, needle):
    """
    Search list `haystack` for sublist `needle`.
    """
    if len(needle) == 0:
        return 0
    char_table = make_char_table(needle)
    offset_table = make_offset_table(needle)
    i = len(needle) - 1
    while i < len(haystack):
        j = len(needle) - 1
        while needle[j] == haystack[i]:
            if j == 0:
                return i
            i -= 1
            j -= 1
        # items missing from char_table default to a full-length shift
        # (Python 3 cannot compare an int with None)
        i += max(offset_table[len(needle) - 1 - j], char_table.get(haystack[i], len(needle)))
    return -1
def make_char_table(needle):
    """
    Makes the jump table based on the mismatched character information.
    """
    table = {}
    for i in range(len(needle) - 1):
        table[needle[i]] = len(needle) - 1 - i
    return table
def make_offset_table(needle):
    """
    Makes the jump table based on the scan offset in which mismatch occurs.
    """
    table = []
    last_prefix_position = len(needle)
    for i in reversed(range(len(needle))):
        if is_prefix(needle, i + 1):
            last_prefix_position = i + 1
        table.append(last_prefix_position - i + len(needle) - 1)
    for i in range(len(needle) - 1):
        slen = suffix_length(needle, i)
        table[slen] = len(needle) - 1 - i + slen
    return table
def is_prefix(needle, p):
    """
    Is needle[p:end] a prefix of needle?
    """
    j = 0
    for i in range(p, len(needle)):
        if needle[i] != needle[j]:
            return 0
        j += 1
    return 1
def suffix_length(needle, p):
    """
    Returns the maximum length of the substring ending at p that is a suffix.
    """
    length = 0
    j = len(needle) - 1
    for i in reversed(range(p + 1)):
        if needle[i] == needle[j]:
            length += 1
        else:
            break
        j -= 1
    return length
Here is the example from the question:
def main():
    list1 = [1,0,1,1,1,0,0]
    list2 = [1,0,1,0,1,0,1]
    index = search(list1, [1, 1, 1])
    print(index)
    index = search(list2, [1, 1, 1])
    print(index)
if __name__ == '__main__':
    main()
Output:
2
-1

Here is a way that will work for simple lists and is slightly less fragile than Mark's:
def sublistExists(haystack, needle):
    def munge(s):
        return ", "+format(str(s)[1:-1])+","
    return munge(needle) in munge(haystack)

def sublistExists(x, y):
    # positions in x where the first element of y occurs
    occ = [i for i, a in enumerate(x) if a == y[0]]
    for b in occ:
        if x[b:b+len(y)] == y:
            print('YES-- SUBLIST at : ', b)
            return True
        if len(occ)-1 == occ.index(b):
            print('NO SUBLIST')
            return False
    return False # y[0] never occurs in x
list1 = [1,0,1,1,1,0,0]
list2 = [1,0,1,0,1,0,1]
#should return True
sublistExists(list1, [1,1,1])
#Should return False
sublistExists(list2, [1,1,1])

Might as well throw in a recursive version of @NasBanov's solution:
def foo(sub, lst):
    '''Checks if sub is in lst.
    Expects both arguments to be lists
    '''
    if len(lst) < len(sub):
        return False
    return sub == lst[:len(sub)] or foo(sub, lst[1:])

def sublist(l1,l2):
    # l1 is the candidate sublist, l2 is the larger list
    if len(l1) > len(l2):
        return False
    for i in range(0, len(l2) - len(l1) + 1):
        for j in range(0, len(l1)):
            if l2[i+j] != l1[j]:
                break
        else:
            return True
    return False

I know this might not be quite relevant to the original question but it might be very elegant 1 line solution to someone else if the sequence of items in both lists doesn't matter. The result below will show True if List1 elements are in List2 (regardless of order). If the order matters then don't use this solution.
List1 = [10, 20, 30]
List2 = [10, 20, 30, 40]
result = set(List1).intersection(set(List2)) == set(List1)
print(result)
Output
True
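The same order-insensitive check can be written with set.issubset, and collections.Counter covers the case where the number of repeated items should matter too. A minimal sketch; contains_all is a hypothetical helper name, and neither check says anything about contiguity or order.

```python
from collections import Counter

List1 = [10, 20, 30]
List2 = [10, 20, 30, 40]

# Equivalent to the intersection test: every element of List1 is in List2.
print(set(List1).issubset(List2))  # True


def contains_all(small, big):
    """True if big has at least as many copies of each element as small."""
    # Counter subtraction keeps only positive counts, so an empty
    # result means nothing from small is missing in big.
    return not (Counter(small) - Counter(big))


print(contains_all(List1, List2))     # True
print(contains_all([10, 10], List2))  # False
```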

If I am understanding this correctly, you have a larger list, like:
list_A= ['john', 'jeff', 'dave', 'shane', 'tim']
Then there are other lists:
list_B= ['sean', 'bill', 'james']
list_C= ['cole', 'wayne', 'jake', 'moose']
I append the lists B and C to list A:
list_A.append(list_B)
list_A.append(list_C)
So when I print list_A
print (list_A)
I get the following output:
['john', 'jeff', 'dave', 'shane', 'tim', ['sean', 'bill', 'james'], ['cole', 'wayne', 'jake', 'moose']]
Now I want to check whether a sublist exists:
for value in list_A:
    if isinstance(value, list):
        print("True")
    else:
        print("False")
This prints True for every element of the larger list that is itself a list, and False for every other element.
