Group elements of a list based on repetition of values - python

I am really new to Python and I am having a issue figuring out the problem below.
I have a list like:
my_list = ['testOne:100', 'testTwo:88', 'testThree:76', 'testOne:78', 'testTwo:88', 'testOne:73', 'testTwo:66', 'testThree:90']
And I want to group the elements based on the occurrence of elements that start with 'testOne'.
Expected Result:
new_list=[['testOne:100', 'testTwo:88', 'testThree:76'], ['testOne:78', 'testTwo:88'], ['testOne:73', 'testTwo:66', 'testThree:90']]

Just start a new list at every testOne.
>>> new_list = []
>>> for item in my_list:
if item.startswith('testOne:'):
new_list.append([])
new_list[-1].append(item)
>>> new_list
[['testOne:100', 'testTwo:88', 'testThree:76'], ['testOne:78', 'testTwo:88'], ['testOne:73', 'testTwo:66', 'testThree:90']]

Not a cool one-liner, but this works also with more general labels:
result = [[]]
seen = set()
for entry in my_list:
test, val = entry.split(":")
if test in seen:
result.append([entry])
seen = {test}
else:
result[-1].append(entry)
seen.add(test)
Here, we are keeping track of the test labels we've already seen in a set and starting a new list whenever we encounter a label we've already seen in the same list.
Alternatively, assuming the lists always start with testOne, you could just start a new list whenever the label is testOne:
result = []
for entry in my_list:
test, val = entry.split(":")
if test == "testOne":
result.append([entry])
else:
result[-1].append(entry)

It'd be nice to have an easy one liner, but I think it'd end up looking a bit too complicated if I tried that. Here's what I came up with:
# Create a list of the starting indices:
ind = [i for i, e in enumerate(my_list) if e.split(':')[0] == 'testOne']
# Create a list of slices using pairs of indices:
new_list = [my_list[i:j] for (i, j) in zip(ind, ind[1:] + [None])]

Not very sophisticated but it works:
my_list = ['testOne:100', 'testTwo:88', 'testThree:76', 'testOne:78', 'testTwo:88', 'testOne:73', 'testTwo:66', 'testThree:90']
splitting_word = 'testOne'
new_list = list()
partial_list = list()
for item in my_list:
if item.startswith(splitting_word) and partial_list:
new_list.append(partial_list)
partial_list = list()
partial_list.append(item)
new_list.append(partial_list)

joining the list into a string with delimiter |
step1="|".join(my_list)
splitting the listing based on 'testOne'
step2=step1.split("testOne")
appending "testOne" to the list elements to get the result
new_list=[[i for i in str('testOne'+i).split("|") if len(i)>0] for i in step2[1:]]

Related

Remove file name duplicates in a list

I have a list l:
l = ['Abc.xlsx', 'Wqe.csv', 'Abc.csv', 'Xyz.xlsx']
In this list, I need to remove duplicates without considering the extension. The expected output is below.
l = ['Wqe.csv', 'Abc.csv', 'Xyz.xlsx']
I tried:
l = list(set(x.split('.')[0] for x in l))
But getting only unique filenames without extension
How could I achieve it?
You can use a dictionary comprehension that uses the name part as key and the full file name as the value, exploiting the fact that dict keys must be unique:
>>> list({x.split(".")[0]: x for x in l}.values())
['Abc.csv', 'Wqe.csv', 'Xyz.xlsx']
If the file names can be in more sophisticated formats (such as with directory names, or in the foo.bar.xls format) you should use os.path.splitext:
>>> import os
>>> list({os.path.splitext(x)[0]: x for x in l}.values())
['Abc.csv', 'Wqe.csv', 'Xyz.xlsx']
If the order of the end result doesn't matter, we could split each item on the period. We'll regard the first item in the list as the key and then keep the item if the key is unique.
oldList = l
setKeys = set()
l = []
for item in oldList:
itemKey = item.split(".")[0]
if itemKey in setKeys:
pass
else:
setKeys.add(itemKey)
l.append(item)
Try this
l = ['Abc.xlsx', 'Wqe.csv', 'Abc.csv', 'Xyz.xlsx']
for x in l:
name = x.split('.')[0]
find = 0
for index,d in enumerate(l, start=0):
txt = d.split('.')[0]
if name == txt:
find += 1
if find > 1:
l.pop(index)
print(l)
#Selcuk Definitely the best solution, unfortunately I don't have enough reputation to vote you answer.
But I would rather use el[:el.rfind('.')] as my dictionary key than os.path.splitext(x)[0] in order to handle the case where we have sophisticated formats in the name. that will give something like this:
list({x[:x.rfind('.')]: x for x in l}.values())

Separate List data with Python

I have a list of multiple strings ) and I want to separate them by :
MainList :
[
GENERAL NOTES & MISCELLANEOUS DETAILS_None_None_None,
STR_XX_XX_0001,
STR_XX_XX_0002,
STR_XX_XX_0003,
GENERAL ARRANGEMENT_None_None_None,
STR_XX_XX_10001.0,
STR_XX_XX_10002.0,
STR_XX_XX_10003.0,
STR_XX_XX_10004.0,
STR_XX_XX_10005.0,
STR_XX_XX_10006.0
]
if string "_None_None_None" found in main list, it can add this data in new empty list and and remaining STR_XX_XX_0001 value to another list and it goes until it found another string with "_None_None_None" and do the same.
I have tried myself, But I think I won't be able to break my loop when it will find next string with "_None_None_None". Just figuring out the way, not sure logic is right.
empty1 = []
empty2 = []
for i in MainList:
if "_None_None_None" in i:
empty1.append(i)
# Need help on hear onwards
else:
while "_None" not in i:
empty2.append(i)
break
I am expecting the Output result in two list. Something like this:
List1:
[
GENERAL NOTES & MISCELLANEOUS DETAILS_None_None_None,
GENERAL ARRANGEMENT_None_None_None
]
List2:
[
[STR_XX_XX_0001,STR_XX_XX_0002,STR_XX_XX_0003],[STR_XX_XX_10001.0,STR_XX_XX_10002.0,STR_XX_XX_10003.0,STR_XX_XX_10004.0,STR_XX_XX_10005.0,STR_XX_XX_10006.0]
]
List2 is the list with sublists
You are making it a little too complicated, you can let the list run the whole way through without the internal while loop. Just make the decision for each element as it shows up in the loop:
empty1 = []
empty2 = []
for i in MainList:
if "_None_None_None" in i:
empty1.append(i)
else:
empty2.append(i)
This will give you two lists:
> empty1
> ['GENERAL NOTES & MISCELLANEOUS DETAILS_None_None_None',
'GENERAL ARRANGEMENT_None_None_None']
> empty2
> ['STR_XX_XX_0001',
'STR_XX_XX_0002',
'STR_XX_XX_0003',
'STR_XX_XX_10001.0',
'STR_XX_XX_10002.0',
'STR_XX_XX_10003.0',
'STR_XX_XX_10004.0',
'STR_XX_XX_10005.0',
'STR_XX_XX_10006.0']
EDIT Based on comment
If the commenter is correct and you want to group the non-NONE values into separate lists, this is a good use case for itertools.groupby. It will make the groups for you in a convenient, efficient way and your loop will look almost the same:
from itertools import groupby
empty1 = []
empty2 = []
for k, i in groupby(MainList, key = lambda x: "_None_None_None" in x):
if k:
empty1.extend(i)
else:
empty2.append(list(i))
This will give you the same empty1 but empty2 will not be a list of lists:
[['STR_XX_XX_0001', 'STR_XX_XX_0002', 'STR_XX_XX_0003'],
['STR_XX_XX_10001.0',
'STR_XX_XX_10002.0',
'STR_XX_XX_10003.0',
'STR_XX_XX_10004.0',
'STR_XX_XX_10005.0',
'STR_XX_XX_10006.0']]
You can try the following code snippet:
dlist = ["GENERAL NOTES & MISCELLANEOUS DETAILS_None_None_None","STR_XX_XX_0001","STR_XX_XX_0002","STR_XX_XX_0003", "GENERAL ARRANGEMENT_None_None_None","STR_XX_XX_10001.0","STR_XX_XX_10002.0", "STR_XX_XX_10003.0", "STR_XX_XX_10004.0", "STR_XX_XX_10005.0", "STR_XX_XX_10006.0"]
with_None = [elem for elem in dlist if elem.endswith("_None")]
without_None = [elem for elem in dlist if not elem.endswith("_None")]
You can also write a generic function for the process:
def cust_sept(src_list, value_to_find,f):
with_value, without_value = [elem for elem in dlist if f(elem,value_to_find)],[elem for elem in dlist if not f(elem,value_to_find)]
return with_value,without_value
list_one,list_two = cust_sept(dlist,"_None",str.endswith)

While loop within for loop for list of lists

I'm trying to create a big list that will contain lists of strings. I iterate over the input list of strings and create a temporary list.
Input:
['Mike','Angela','Bill','\n','Robert','Pam','\n',...]
My desired output:
[['Mike','Angela','Bill'],['Robert','Pam']...]
What i get:
[['Mike','Angela','Bill'],['Angela','Bill'],['Bill']...]
Code:
for i in range(0,len(temp)):
temporary = []
while(temp[i] != '\n' and i<len(temp)-1):
temporary.append(temp[i])
i+=1
bigList.append(temporary)
Use itertools.groupby
from itertools import groupby
names = ['Mike','Angela','Bill','\n','Robert','Pam']
[list(g) for k,g in groupby(names, lambda x:x=='\n') if not k]
#[['Mike', 'Angela', 'Bill'], ['Robert', 'Pam']]
Fixing your code, I'd recommend iterating over each element directly, appending to a nested list -
r = [[]]
for i in temp:
if i.strip():
r[-1].append(i)
else:
r.append([])
Note that if temp ends with a newline, r will have a trailing empty [] list. You can get rid of that though:
if not r[-1]:
del r[-1]
Another option would be using itertools.groupby, which the other answerer has already mentioned. Although, your method is more performant.
Your for loop was scanning over the temp array just fine, but the while loop on the inside was advancing that index. And then your while loop would reduce the index. This caused the repitition.
temp = ['mike','angela','bill','\n','robert','pam','\n','liz','anya','\n']
# !make sure to include this '\n' at the end of temp!
bigList = []
temporary = []
for i in range(0,len(temp)):
if(temp[i] != '\n'):
temporary.append(temp[i])
print(temporary)
else:
print(temporary)
bigList.append(temporary)
temporary = []
You could try:
a_list = ['Mike','Angela','Bill','\n','Robert','Pam','\n']
result = []
start = 0
end = 0
for indx, name in enumerate(a_list):
if name == '\n':
end = indx
sublist = a_list[start:end]
if sublist:
result.append(sublist)
start = indx + 1
>>> result
[['Mike', 'Angela', 'Bill'], ['Robert', 'Pam']]

Removing specific columns and rows from nested list

If for example I have glider = [[0,0,0,0],[1,2,3,4],[0,1,3,4],[0,0,0,0]], how would I go about deleting the first and last list as well as the first and last character per list if the nested lists varies. It would look like this after.
glider = [[2,3],[1,3]]
For example, I can't just simply use the del function because the dimensions will vary. ex:[[0,0,0,0,0],[1,2,3,4,5],[0,1,2,3,4],[0,0,0,0,0]]
, [[0,0,0],[1,2,3],[0,0,0]]
This is a small part of a bigger program but it has me stumped. Maybe the better method would be to to create a whole new list? Thank you.
You can do this using slicing and a temporary list:
glider = [[0,0,0,0],[1,2,3,4],[0,1,3,4],[0,0,0,0]]
glider = glider[1:-1]
templist = []
for i in glider:
templist.append(i[1:-1])
glider = templist
del templist
Output:
>>> glider
[[2, 3], [1, 3]]
You can create a new list with your given conditions
glider = [[y for y in x[1:-1]] for x in glider[1:-1]]
Try this:
glider = [[0,0,0,0],[1,2,3,4],[0,1,3,4],[0,0,0,0]]
def my_strip(lst):
if isinstance(lst, list):
lst = lst[1:-1]
for idx, val in enumerate(lst):
lst[idx] = my_strip(val)
return lst
print my_strip(glider)
Do you want something like:
a = [[0,0,0,0,0],[1,2,3,4,5],[0,1,2,3,4],[0,0,0,0,0]]
b = [k[1:-1] for k in a[1:-1]]
# remove consecutive equal items
if b:
res = [b[0]] + [k for i, k in enumerate(b[1:], 1) if k != b[i-1]]
else:
res = []

Nested lists python

Can anyone tell me how can I call for indexes in a nested list?
Generally I just write:
for i in range (list)
but what if I have a list with nested lists as below:
Nlist = [[2,2,2],[3,3,3],[4,4,4]...]
and I want to go through the indexes of each one separately?
If you really need the indices you can just do what you said again for the inner list:
l = [[2,2,2],[3,3,3],[4,4,4]]
for index1 in xrange(len(l)):
for index2 in xrange(len(l[index1])):
print index1, index2, l[index1][index2]
But it is more pythonic to iterate through the list itself:
for inner_l in l:
for item in inner_l:
print item
If you really need the indices you can also use enumerate:
for index1, inner_l in enumerate(l):
for index2, item in enumerate(inner_l):
print index1, index2, item, l[index1][index2]
Try this setup:
a = [["a","b","c",],["d","e"],["f","g","h"]]
To print the 2nd element in the 1st list ("b"), use print a[0][1] - For the 2nd element in 3rd list ("g"): print a[2][1]
The first brackets reference which nested list you're accessing, the second pair references the item in that list.
You can do this. Adapt it to your situation:
for l in Nlist:
for item in l:
print item
The question title is too wide and the author's need is more specific. In my case, I needed to extract all elements from nested list like in the example below:
Example:
input -> [1,2,[3,4]]
output -> [1,2,3,4]
The code below gives me the result, but I would like to know if anyone can create a simpler answer:
def get_elements_from_nested_list(l, new_l):
if l is not None:
e = l[0]
if isinstance(e, list):
get_elements_from_nested_list(e, new_l)
else:
new_l.append(e)
if len(l) > 1:
return get_elements_from_nested_list(l[1:], new_l)
else:
return new_l
Call of the method
l = [1,2,[3,4]]
new_l = []
get_elements_from_nested_list(l, new_l)
n = [[1, 2, 3], [4, 5, 6, 7, 8, 9]]
def flatten(lists):
results = []
for numbers in lists:
for numbers2 in numbers:
results.append(numbers2)
return results
print flatten(n)
Output: n = [1,2,3,4,5,6,7,8,9]
I think you want to access list values and their indices simultaneously and separately:
l = [[2,2,2],[3,3,3],[4,4,4],[5,5,5]]
l_len = len(l)
l_item_len = len(l[0])
for i in range(l_len):
for j in range(l_item_len):
print(f'List[{i}][{j}] : {l[i][j]}' )

Categories

Resources