Separate List data with Python - python

I have a list of multiple strings and I want to separate them:
MainList :
[
GENERAL NOTES & MISCELLANEOUS DETAILS_None_None_None,
STR_XX_XX_0001,
STR_XX_XX_0002,
STR_XX_XX_0003,
GENERAL ARRANGEMENT_None_None_None,
STR_XX_XX_10001.0,
STR_XX_XX_10002.0,
STR_XX_XX_10003.0,
STR_XX_XX_10004.0,
STR_XX_XX_10005.0,
STR_XX_XX_10006.0
]
If a string containing "_None_None_None" is found in the main list, it should be added to a new empty list, and the values that follow it (such as STR_XX_XX_0001) should go into another list. This continues until the next string containing "_None_None_None" is found, at which point the same thing happens again.
I have tried it myself, but I don't think I can break out of my loop when it finds the next string with "_None_None_None". I'm still figuring out an approach and am not sure the logic is right.
empty1 = []
empty2 = []
for i in MainList:
    if "_None_None_None" in i:
        empty1.append(i)
        # Need help from here onwards
    else:
        while "_None" not in i:
            empty2.append(i)
            break
I am expecting the output as two lists, something like this:
List1:
[
GENERAL NOTES & MISCELLANEOUS DETAILS_None_None_None,
GENERAL ARRANGEMENT_None_None_None
]
List2:
[
[STR_XX_XX_0001,STR_XX_XX_0002,STR_XX_XX_0003],[STR_XX_XX_10001.0,STR_XX_XX_10002.0,STR_XX_XX_10003.0,STR_XX_XX_10004.0,STR_XX_XX_10005.0,STR_XX_XX_10006.0]
]
List2 is a list of sublists.

You are making it a little too complicated. You can let the loop run over the whole list without the inner while loop; just make the decision for each element as it shows up:
empty1 = []
empty2 = []
for i in MainList:
    if "_None_None_None" in i:
        empty1.append(i)
    else:
        empty2.append(i)
This will give you two lists:
> empty1
> ['GENERAL NOTES & MISCELLANEOUS DETAILS_None_None_None',
'GENERAL ARRANGEMENT_None_None_None']
> empty2
> ['STR_XX_XX_0001',
'STR_XX_XX_0002',
'STR_XX_XX_0003',
'STR_XX_XX_10001.0',
'STR_XX_XX_10002.0',
'STR_XX_XX_10003.0',
'STR_XX_XX_10004.0',
'STR_XX_XX_10005.0',
'STR_XX_XX_10006.0']
EDIT: Based on a comment
If the commenter is correct and you want to group the non-NONE values into separate lists, this is a good use case for itertools.groupby. It will make the groups for you in a convenient, efficient way and your loop will look almost the same:
from itertools import groupby
empty1 = []
empty2 = []
for k, i in groupby(MainList, key=lambda x: "_None_None_None" in x):
    if k:
        empty1.extend(i)
    else:
        empty2.append(list(i))
This will give you the same empty1, but empty2 will now be a list of lists:
[['STR_XX_XX_0001', 'STR_XX_XX_0002', 'STR_XX_XX_0003'],
['STR_XX_XX_10001.0',
'STR_XX_XX_10002.0',
'STR_XX_XX_10003.0',
'STR_XX_XX_10004.0',
'STR_XX_XX_10005.0',
'STR_XX_XX_10006.0']]
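If you also need to know which header each sublist belongs to, a plain loop can keep the two lists aligned. This is only a sketch: the name group_by_header and its marker parameter are made up for illustration, and it assumes a group is everything between one header and the next.

```python
def group_by_header(items, marker="_None_None_None"):
    """Return (headers, groups) where groups[i] holds the items after headers[i].

    Hypothetical helper: the name and the `marker` parameter are illustrative.
    """
    headers = []
    groups = []
    for item in items:
        if marker in item:
            headers.append(item)
            groups.append([])           # start a new, empty group for this header
        else:
            if not groups:              # items before any header: None is a placeholder
                headers.append(None)
                groups.append([])
            groups[-1].append(item)
    return headers, groups


main_list = [
    "GENERAL NOTES & MISCELLANEOUS DETAILS_None_None_None",
    "STR_XX_XX_0001",
    "STR_XX_XX_0002",
    "GENERAL ARRANGEMENT_None_None_None",
    "STR_XX_XX_10001.0",
]
headers, groups = group_by_header(main_list)
print(headers)
print(groups)
```

Unlike the groupby version, two headers in a row produce an empty sublist for the first one, so headers and groups always have the same length.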

You can try the following code snippet:
dlist = ["GENERAL NOTES & MISCELLANEOUS DETAILS_None_None_None","STR_XX_XX_0001","STR_XX_XX_0002","STR_XX_XX_0003", "GENERAL ARRANGEMENT_None_None_None","STR_XX_XX_10001.0","STR_XX_XX_10002.0", "STR_XX_XX_10003.0", "STR_XX_XX_10004.0", "STR_XX_XX_10005.0", "STR_XX_XX_10006.0"]
with_None = [elem for elem in dlist if elem.endswith("_None")]
without_None = [elem for elem in dlist if not elem.endswith("_None")]
You can also write a generic function for the process:
def cust_sept(src_list, value_to_find, f):
    # Use the src_list parameter rather than the global dlist
    with_value = [elem for elem in src_list if f(elem, value_to_find)]
    without_value = [elem for elem in src_list if not f(elem, value_to_find)]
    return with_value, without_value
list_one, list_two = cust_sept(dlist, "_None", str.endswith)
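The two comprehensions walk the list twice. If that matters, the same split can be done in one pass. The sketch below uses a hypothetical helper named partition that takes a one-argument predicate; neither name comes from the answer above.

```python
def partition(items, predicate):
    """Split items into (matching, non_matching) in a single pass."""
    matching, non_matching = [], []
    for item in items:
        # Pick the target list based on the predicate, then append once
        (matching if predicate(item) else non_matching).append(item)
    return matching, non_matching


dlist = [
    "GENERAL NOTES & MISCELLANEOUS DETAILS_None_None_None",
    "STR_XX_XX_0001",
    "GENERAL ARRANGEMENT_None_None_None",
    "STR_XX_XX_10001.0",
]
with_none, without_none = partition(dlist, lambda s: s.endswith("_None"))
print(with_none)
print(without_none)
```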

Related

python list of lists contain substring

I have the list_of_lists and I need to get the string that contains 'height' in the sublists and if there is no height at all I need to get 'nvt' for the whole sublist.
I have tried the following:
list_of_lists = [['width=9','length=3'],['width=6','length=4','height=4']]
_lists = []
for list in list_of_lists:
    list1 = []
    for st in list:
        if ("height" ) in st:
            list1.append(st)
        else:
            list1.append('nvt')
    _lists.append(list1)
OUT = _lists
The result I need is:
_lists = ['nvt', 'height=4']
What I'm getting is:
_lists = [['nvt','nvt'],['nvt','nvt','height=4']]
This is a good case for implementing a for/else construct as follows:
list_of_lists = [['width=9','length=3'],['width=6','length=4','height=4']]
result = []
for e in list_of_lists:
    for ss in e:
        if ss.startswith('height'):
            result.append(ss)
            break
    else:
        # Runs only when the inner loop finished without hitting break
        result.append('nvt')
print(result)
Output:
['nvt', 'height=4']
Note:
This could probably be done with a list comprehension, but I think this is more obvious, and there is probably no significant difference in performance.
This should work: you can assign the height variable to the first value in the sublist for which s.startswith("height") is True, and if nothing matches this filter, assign 'nvt' instead.
_lists = []
for sublist in list_of_lists:
    height = next(filter(lambda s: s.startswith("height"), sublist), 'nvt')
    _lists.append(height)
And if you wish to be crazy, you can use a list comprehension to reduce the code to a single line:
_lists = [next(filter(lambda s: s.startswith("height"), sublist), 'nvt') for sublist in list_of_lists]
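The same "first match or default" idea can also be written with a generator expression in place of filter and lambda, which some readers find easier to scan. A minimal sketch, reusing the data from the question (the name heights is just for illustration):

```python
list_of_lists = [['width=9', 'length=3'], ['width=6', 'length=4', 'height=4']]

# next() returns the first string that starts with "height",
# or the default 'nvt' when the generator yields nothing.
heights = [
    next((s for s in sublist if s.startswith("height")), 'nvt')
    for sublist in list_of_lists
]
print(heights)  # ['nvt', 'height=4']
```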
Try this (Python 3.x):
import re
list_of_lists = [['width=9','length=3'],['width=6','length=4','height=4']]
_lists = []
r = re.compile("height=")
for li in list_of_lists:
    match = list(filter(r.match, li))
    if len(match) > 0:
        _lists.extend(match)
    else:
        _lists.append('nvt')
OUT = _lists
print(OUT)

Remove file name duplicates in a list

I have a list l:
l = ['Abc.xlsx', 'Wqe.csv', 'Abc.csv', 'Xyz.xlsx']
In this list, I need to remove duplicates without considering the extension. The expected output is below.
l = ['Wqe.csv', 'Abc.csv', 'Xyz.xlsx']
I tried:
l = list(set(x.split('.')[0] for x in l))
But I am getting only the unique file names, without their extensions.
How could I achieve it?
You can use a dictionary comprehension that uses the name part as key and the full file name as the value, exploiting the fact that dict keys must be unique:
>>> list({x.split(".")[0]: x for x in l}.values())
['Abc.csv', 'Wqe.csv', 'Xyz.xlsx']
If the file names can be in more sophisticated formats (such as with directory names, or in the foo.bar.xls format) you should use os.path.splitext:
>>> import os
>>> list({os.path.splitext(x)[0]: x for x in l}.values())
['Abc.csv', 'Wqe.csv', 'Xyz.xlsx']
If the order of the end result doesn't matter, we could split each item on the period. We'll treat the part before the first period as the key and keep the item only if that key has not been seen yet.
oldList = l
setKeys = set()
l = []
for item in oldList:
    itemKey = item.split(".")[0]
    if itemKey not in setKeys:
        setKeys.add(itemKey)
        l.append(item)
Try this:
l = ['Abc.xlsx', 'Wqe.csv', 'Abc.csv', 'Xyz.xlsx']
for x in l:
    name = x.split('.')[0]
    find = 0
    for index, d in enumerate(l, start=0):
        txt = d.split('.')[0]
        if name == txt:
            find += 1
            if find > 1:
                l.pop(index)
print(l)
@Selcuk Definitely the best solution; unfortunately I don't have enough reputation to upvote your answer.
But I would rather use x[:x.rfind('.')] as my dictionary key than os.path.splitext(x)[0], in order to handle the case where the name has a more sophisticated format. That will give something like this:
list({x[:x.rfind('.')]: x for x in l}.values())
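The three ways of deriving the key used in the answers above differ on less regular names, so it is worth checking them side by side. The helper names below (key_split, key_splitext, key_rfind) and the sample names are made up for this comparison.

```python
import os


def key_split(name):
    return name.split(".")[0]          # everything before the FIRST period


def key_splitext(name):
    return os.path.splitext(name)[0]   # strips only the final extension


def key_rfind(name):
    return name[:name.rfind(".")]      # everything before the LAST period


for name in ["Abc.xlsx", "foo.bar.xls", "README"]:
    print(name, key_split(name), key_splitext(name), key_rfind(name))
```

Note the last row: when there is no period at all, rfind returns -1, so the slice silently drops the final character ("README" becomes "READM"), while os.path.splitext leaves the name intact.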

How to find repeating strings one after the other?

I recently started studying Python on my own, and I have not been able to answer this question:
"Write a function that receives a list of names (as strings). The function returns a list of all the names that appear twice in a row in the list, without duplicates. For example, for the list:
["avi", "avi", "beni", "shlomo", "shlomo", "David", "haim", "moshe", "shlomo", "shlomo"] the function will return a list of Avi and Shlomo. The function must work in O(n)."
This is what I have written so far, but I have not succeeded:
def double_names(lst):
    new_lst = []
    for i in lst:
        if lst[i] == lst[i+1]:
            new_lst.append(i)
    return new_lst
print(double_names(["avi", "avi", "beni", "shlomo", "shlomo", "David", "haim", "moshe", "shlomo", "shlomo"]))
Your attempt fails because:
There's not always a list element i+1 (you could use zip(lst, lst[1:]) instead).
You are not accounting for the fact that in your result no names should appear twice. You could use a set for this.
You're iterating over a list while expecting to get indices (you'll actually get list items). (Thanks, Paul)
Something like this should work:
def double_names(lst):
    new_set = set()
    for first, second in zip(lst, lst[1:]):
        if first == second:
            new_set.add(first)
    return list(new_set)
Keep a variable previous_name that stores the name from the previous iteration:
def double_names(names_list):
    double_names = []
    previous_name = None
    for n in names_list:
        if previous_name is not None:
            if n == previous_name:
                if n not in double_names:
                    double_names.append(n)
        previous_name = n
    return double_names
I think the easiest way is by saving the first element of the list. Then you start iterating over the list from first position on and check if the current element is the same as the last element you checked. If it is, then you have a match, if not then you just update the last element and keep going.
def doublenames(l):
    if not l:
        return []  # nothing to compare in an empty list
    last = l[0]
    new_list = []
    for element in l[1:]:
        if element == last:
            if element not in new_list:
                new_list.append(element)
        else:
            last = element
    return new_list
Hope it helped :)
(You can also do it in one line:
[x[0] for x in set(zip(l, l[1:])) if (x[0] == x[1])]
)
Another solution (note that it returns every name that occurs more than once anywhere in the list, not only names that repeat consecutively):
lst = ["avi", "avi", "beni", "shlomo", "shlomo", "David", "haim", "moshe", "shlomo", "shlomo"]
def doublenames(L):
    out = {}
    for v in L:
        out.setdefault(v, []).append(v)
    return [v[0] for v in out.values() if len(v) > 1]
print(doublenames(lst))
Prints:
['avi', 'shlomo']
If the order of the names doesn't matter, you can use a set to remove duplicates:
from itertools import groupby
def double_names(names):
    return list({key for key, group in groupby(names) if len(tuple(group)) >= 2})
print(double_names(["avi", "avi", "beni", "shlomo", "shlomo", "David", "haim", "moshe", "shlomo", "shlomo"]))
Output:
['shlomo', 'avi']
Just tried this with Python 3.8.2, and it seems the order is not preserved.
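If the order of first appearance should be kept, one option is to swap the set for dict.fromkeys, since dicts preserve insertion order from Python 3.7 on. This is a sketch of that variation; the name double_names_ordered is made up here.

```python
from itertools import groupby


def double_names_ordered(names):
    """Names that appear at least twice in a row, in order of first appearance."""
    # groupby yields one (key, group) pair per run of equal consecutive names
    repeated = (key for key, group in groupby(names) if sum(1 for _ in group) >= 2)
    # dict.fromkeys drops duplicates while keeping insertion order
    return list(dict.fromkeys(repeated))


names = ["avi", "avi", "beni", "shlomo", "shlomo", "David", "haim", "moshe", "shlomo", "shlomo"]
print(double_names_ordered(names))  # ['avi', 'shlomo']
```

It is still a single pass over the list plus constant-time dict operations on average, so it stays within the O(n) requirement.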

Group elements of a list based on repetition of values

I am really new to Python and I am having an issue figuring out the problem below.
I have a list like:
my_list = ['testOne:100', 'testTwo:88', 'testThree:76', 'testOne:78', 'testTwo:88', 'testOne:73', 'testTwo:66', 'testThree:90']
And I want to group the elements based on the occurrence of elements that start with 'testOne'.
Expected Result:
new_list=[['testOne:100', 'testTwo:88', 'testThree:76'], ['testOne:78', 'testTwo:88'], ['testOne:73', 'testTwo:66', 'testThree:90']]
Just start a new list at every testOne.
>>> new_list = []
>>> for item in my_list:
...     if item.startswith('testOne:'):
...         new_list.append([])
...     new_list[-1].append(item)
...
>>> new_list
[['testOne:100', 'testTwo:88', 'testThree:76'], ['testOne:78', 'testTwo:88'], ['testOne:73', 'testTwo:66', 'testThree:90']]
Not a cool one-liner, but this works also with more general labels:
result = [[]]
seen = set()
for entry in my_list:
    test, val = entry.split(":")
    if test in seen:
        result.append([entry])
        seen = {test}
    else:
        result[-1].append(entry)
        seen.add(test)
Here, we are keeping track of the test labels we've already seen in a set and starting a new list whenever we encounter a label we've already seen in the same list.
Alternatively, assuming the lists always start with testOne, you could just start a new list whenever the label is testOne:
result = []
for entry in my_list:
    test, val = entry.split(":")
    if test == "testOne":
        result.append([entry])
    else:
        result[-1].append(entry)
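The "start a new list at every testOne" idea can also be wrapped in a small generator so that it works with any start condition. This is a sketch under the assumption that a group runs from one start item up to the next; the names split_before and starts_group are hypothetical.

```python
def split_before(items, starts_group):
    """Yield lists of items, starting a new list whenever starts_group(item) is true."""
    chunk = []
    for item in items:
        if starts_group(item) and chunk:
            yield chunk      # hand over the finished group
            chunk = []
        chunk.append(item)
    if chunk:                # do not lose the last group
        yield chunk


my_list = ['testOne:100', 'testTwo:88', 'testThree:76', 'testOne:78',
           'testTwo:88', 'testOne:73', 'testTwo:66', 'testThree:90']
new_list = list(split_before(my_list, lambda s: s.startswith('testOne:')))
print(new_list)
```

Unlike the loop above, this does not fail when the input does not begin with a testOne entry: any leading items simply form the first group.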
It'd be nice to have an easy one liner, but I think it'd end up looking a bit too complicated if I tried that. Here's what I came up with:
# Create a list of the starting indices:
ind = [i for i, e in enumerate(my_list) if e.split(':')[0] == 'testOne']
# Create a list of slices using pairs of indices:
new_list = [my_list[i:j] for (i, j) in zip(ind, ind[1:] + [None])]
Not very sophisticated but it works:
my_list = ['testOne:100', 'testTwo:88', 'testThree:76', 'testOne:78', 'testTwo:88', 'testOne:73', 'testTwo:66', 'testThree:90']
splitting_word = 'testOne'
new_list = list()
partial_list = list()
for item in my_list:
    if item.startswith(splitting_word) and partial_list:
        new_list.append(partial_list)
        partial_list = list()
    partial_list.append(item)
new_list.append(partial_list)
Join the list into a string with the delimiter |:
step1 = "|".join(my_list)
Split that string on 'testOne':
step2 = step1.split("testOne")
Prepend "testOne" to each piece again and split it on | to get the result:
new_list = [[part for part in ('testOne' + chunk).split("|") if len(part) > 0] for chunk in step2[1:]]

Remove duplicates in a list while keeping its order (Python)

This is actually an extension of this question: "How to remove these duplicates in a list (python)". The answers to that question did not keep the "order" of the list after removing duplicates.
biglist = [
    {'title':'U2 Band','link':'u2.com'},
    {'title':'Live Concert by U2','link':'u2.com'},
    {'title':'ABC Station','link':'abc.com'}
]
In this case, the 2nd element should be removed because a previous "u2.com" element already exists. However, the order should be kept.
Use set(), then re-sort using the index of the original list.
>>> mylist = ['c','a','a','b','a','b','c']
>>> sorted(set(mylist), key=lambda x: mylist.index(x))
['c', 'a', 'b']
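Each mylist.index(x) call scans the list from the start, so the sort key above costs O(n) per item. On Python 3.7 and later, where dicts keep insertion order, a dict gives the same result in a single pass. The sketch below shows this for plain strings and, as an assumption about how it would carry over, for the dicts from the question keyed on 'link'.

```python
mylist = ['c', 'a', 'a', 'b', 'a', 'b', 'c']
deduped = list(dict.fromkeys(mylist))   # keys keep first-seen order
print(deduped)  # ['c', 'a', 'b']

biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
]
by_link = {}
for d in biglist:
    by_link.setdefault(d['link'], d)    # keep the first dict seen for each link
unique_dicts = list(by_link.values())
print(unique_dicts)
```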
My answer to your other question, which you completely ignored!, shows you're wrong in claiming that
The answers of that question did not
keep the "order"
my answer did keep order, and it clearly said it did. Here it is again, with added emphasis to see if you can just keep ignoring it...:
Probably the fastest approach, for a really big list, if you want to preserve the exact order of the items that remain, is the following...:
biglist = [
    {'title':'U2 Band','link':'u2.com'},
    {'title':'ABC Station','link':'abc.com'},
    {'title':'Live Concert by U2','link':'u2.com'}
]
known_links = set()
newlist = []
for d in biglist:
    link = d['link']
    if link in known_links: continue
    newlist.append(d)
    known_links.add(link)
biglist[:] = newlist
Generators are great.
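def unique(seq, key=lambda item: item):
    seen = set()
    for item in seq:
        marker = key(item)  # dicts are unhashable, so compare on a hashable key
        if marker not in seen:
            seen.add(marker)
            yield item
biglist[:] = unique(biglist, key=lambda d: d['link'])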
This page discusses different methods and their speeds:
http://www.peterbe.com/plog/uniqifiers-benchmark
The recommended* method:
def f5(seq, idfun=None):
    # order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result
f5(biglist,lambda x: x['link'])
*by that page
This is an elegant and compact way, with list comprehension (but not as efficient as with dictionary):
mylist = ['aaa','aba','aaa','aea','baa','aaa','aac','aaa',]
[ v for (i,v) in enumerate(mylist) if v not in mylist[0:i] ]
And in the context of the answer:
[ v for (i,v) in enumerate(biglist) if v['link'] not in map(lambda d: d['link'], biglist[0:i]) ]
dups = {}
newlist = []
for x in biglist:
    if x['link'] not in dups:
        newlist.append(x)
        dups[x['link']] = None
print(newlist)
produces
[{'link': 'u2.com', 'title': 'U2 Band'}, {'link': 'abc.com', 'title': 'ABC Station'}]
Note that here I used a dictionary. This makes the test not in dups much more efficient than using a list.
Try this:
items = ['aaa','aba','aaa','aea','baa','aaa','aac','aaa',]
uniq = []
for i in items:
    if i not in uniq:
        uniq.append(i)
print(items)
print(uniq)
The output will be:
['aaa', 'aba', 'aaa', 'aea', 'baa', 'aaa', 'aac', 'aaa']
['aaa', 'aba', 'aea', 'baa', 'aac']
A super easy way to do this is:
def uniq(a):
    if len(a) == 0:
        return []
    else:
        return [a[0]] + uniq([x for x in a if x != a[0]])
This is not the most efficient way, because:
it searches through the whole list for every element in the list, so it's O(n^2)
it's recursive so uses a stack depth equal to the length of the list
However, for simple uses (no more than a few hundred items, not performance critical) it is sufficient.
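I think using a set should be pretty efficient.
seen_links = set()
index = 0
while index < len(biglist):
    link = biglist[index]['link']
    if link in seen_links:
        del biglist[index]  # do not advance: the next item has moved into this slot
    else:
        seen_links.add(link)
        index += 1
The set lookups are O(1) on average, but each del has to shift the items after it, so I think the worst case comes in at O(n^2).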
