"group" items in list - python

I have a list of the form [string, int, int], [string, int, int], ... and I want to group entries from it with entries from a different list, like this:
[string1, int1, int1] + [string2, int2, int2] = ["string1+string2", int1+int1+int2+int2]
Some background: I have already written an import function that gives me compounds, e.g. ["CH3", 15.3107, 15.284].
I have one function that gives me a dictionary:
{0: "CH3"}
and another that gives me a list:
["CH3", 30.594700000000003]
def group_selected_compounds(numCompound, values):
values can be a list of lists that contains everything.
I also have a dict that looks something like this: {0: ["CH4"], ...}
numCompound should be several variables (I think) or a tuple of keys, so I can do the math for the user.
In the end I want something like: ["CH3+CH4", 61.573]
It can also be: ["CH3+CH4+H2SO4", 138.773]

I would solve this using '+'.join, sum and comprehensions:
>>> data = [['string1', 2, 3], ['string2', 4, 5], ['string3', 6, 7]]
>>> ['+'.join(s for s, _, _ in data), sum(x + y for _, x, y in data)]
['string1+string2+string3', 27]
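The same join-and-sum idea can be wrapped in a function along the lines the question sketches, selecting compounds by a tuple of keys. This is a minimal sketch under assumptions not stated in the question: `values` is taken to be a dict mapping an integer key to a `[name, number, number]` entry, the name `group_selected_compounds` is illustrative, and the CH4 numbers are hypothetical sample data.

```python
def group_selected_compounds(keys, values):
    """Join the names and sum the numbers of the selected compounds.

    Assumes (hypothetically) that `values` maps an integer key to a
    [name, number, number] entry, e.g. {0: ["CH3", 15.3107, 15.284]}.
    """
    selected = [values[k] for k in keys]
    name = "+".join(entry[0] for entry in selected)
    # add up every number after the name, across all selected entries
    total = sum(sum(entry[1:]) for entry in selected)
    return [name, total]


compounds = {
    0: ["CH3", 15.3107, 15.284],
    1: ["CH4", 16.0, 15.0],  # hypothetical numbers for illustration
}
print(group_selected_compounds((0, 1), compounds))
```

Passing a tuple of keys keeps the call site simple for any number of compounds; `group_selected_compounds((0,), compounds)` works for a single one.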

First, create a dictionary that stores the position of the first element of each type:
helper = {}
for i, elem in enumerate(lst1):
    elemType = type(elem)
    if elemType not in helper:
        # remember where the first element of this type lives
        helper[elemType] = i
Now you have a dictionary that indexes your original list by type; you just have to run through the second list and combine accordingly:
for elem in lst2:
    elemType = type(elem)
    if elemType not in helper:
        # in case list 2 contains a type that list 1 doesn't have
        lst1.append(elem)
        helper[elemType] = len(lst1) - 1
    else:
        lst1[helper[elemType]] += elem
Hope this makes sense! I have not fully vetted this for correctness, but the idea is there.
Edit: This also does not handle list 1 containing more than one string or more than one int, etc. (for example, both ints of [string, int, int] map to the same slot), but solving that should be straightforward depending on how you want to resolve it.
2nd Edit: This answer is generic, so the order of the strings and numbers in the list doesn't matter. In fact, lst1 can be [string, int, float] and lst2 can be [int, float, string] and this still works, so it is robust if the order of your list changes.
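A self-contained version of the type-indexed merge above may make its behavior easier to check. This sketch returns a new list instead of mutating `lst1`; the function name `merge_by_type` is hypothetical. Note that `+=` on strings concatenates without a `"+"` separator, and the second call shows the limitation from the first edit: only the first int of `lst1` receives the additions.

```python
def merge_by_type(lst1, lst2):
    """Merge lst2 into a copy of lst1, combining elements of the same type.

    Only the first element of each type in lst1 receives additions, so
    repeated types (e.g. two ints) are not handled separately.
    """
    merged = list(lst1)
    helper = {}
    for i, elem in enumerate(merged):
        helper.setdefault(type(elem), i)  # first position of each type
    for elem in lst2:
        t = type(elem)
        if t in helper:
            merged[helper[t]] += elem
        else:
            # a type lst1 doesn't have: append it and record its position
            merged.append(elem)
            helper[t] = len(merged) - 1
    return merged


print(merge_by_type(["CH3", 15.3107], [15.284, "CH4"]))
print(merge_by_type(["a", 1, 2], ["b", 3, 4]))  # ['ab', 8, 2]
```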

Related

Removing duplicates from a list in Python [duplicate]

How would I use Python to check a list and delete all duplicates? I don't want to have to specify what the duplicate item is; I want the code to figure out if there are any and remove them if so, keeping only one instance of each. It also must work if there are multiple duplicates in a list.
For example, in my code below, the list lseparatedOrbList has 12 items - one is repeated six times, one is repeated five times, and there is only one instance of one. I want it to change the list so there are only three items - one of each, and in the same order they appeared before. I tried this:
for i in lseparatedOrbList:
    for j in lseparatedOrblist:
        if lseparatedOrbList[i] == lseparatedOrbList[j]:
            lseparatedOrbList.remove(lseparatedOrbList[j])
But I get the error:
Traceback (most recent call last):
  File "qchemOutputSearch.py", line 123, in <module>
    for j in lseparatedOrblist:
NameError: name 'lseparatedOrblist' is not defined
I'm guessing it's because I'm trying to loop through lseparatedOrbList while I loop through it, but I can't think of another way to do it.
Use set():
woduplicates = set(lseparatedOrbList)
This returns a set without duplicates. If, for some reason, you need a list back:
woduplicates = list(set(lseparatedOrbList))
This will, however, not necessarily keep the order of your original list.
Just make a new list to populate: if an item from your list is not yet in the new list, append it; otherwise, move on to the next item in your original list.
newlist = []
for i in mylist:
    if i not in newlist:
        newlist.append(i)
This should be faster and will preserve the original order:
seen = {}
new_list = [seen.setdefault(x, x) for x in my_list if x not in seen]
If you don't care about order, you can just:
new_list = list(set(my_list))
You can do it like this:
x = list(set(x))
For example:
>>> x = [1,2,3,4,5,6,7,8,9,10,2,1,6,31,20]
>>> x = list(set(x))
>>> x
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 31]
The only thing to keep in mind is that the resulting list is not guaranteed to keep the order of the original one (here it happens to come out sorted, but you cannot rely on that).
The modern way to do it that maintains the order is:
>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(lseparatedOrbList))
as discussed by Raymond Hettinger in this answer. In Python 3.5 and above this is also the fastest way; see the linked answer for details. However, the items must be hashable, since they become dictionary keys (as is the case in your list, I think).
As of Python 3.7, plain dicts are guaranteed to preserve insertion order, so the above call becomes
>>> list(dict.fromkeys(lseparatedOrbList))
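A small runnable sketch of the `fromkeys` approach, wrapped in a helper so the behavior is easy to check. The helper name `dedupe` and the orbital labels are hypothetical sample data, not taken from the question.

```python
from collections import OrderedDict


def dedupe(items):
    """Drop repeated (hashable) items, keeping the first occurrence of each."""
    # dict keys are unique and, from Python 3.7, kept in insertion order
    return list(dict.fromkeys(items))


orbs = ["2p", "1s", "2p", "2s", "1s", "2p"]  # hypothetical sample data
print(dedupe(orbs))  # ['2p', '1s', '2s']
# OrderedDict gives the same result and also works before Python 3.7
print(list(OrderedDict.fromkeys(orbs)))  # ['2p', '1s', '2s']
```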
Performance:
"""Dedup list."""
import sys
import timeit
repeat = 3
numbers = 1000
setup = """"""
def timer(statement, msg='', _setup=None):
print(msg, min(
timeit.Timer(statement, setup=_setup or setup).repeat(
repeat, numbers)))
print(sys.version)
s = """import random; n=%d; li = [random.randint(0, 100) for _ in range(n)]"""
for siz, m in ((150, "\nFew duplicates"), (15000, "\nMany duplicates")):
print(m)
setup = s % siz
timer('s = set(); [i for i in li if i not in s if not s.add(i)]', "s.add(i):")
timer('list(dict.fromkeys(li))', "dict:")
timer('list(set(li))', 'Not order preserving: list(set(li)):')
gives:
3.7.6 (tags/v3.7.6:43364a7ae0, Dec 19 2019, 00:42:30) [MSC v.1916 64 bit (AMD64)]
Few duplicates
s.add(i): 0.008242200000040611
dict: 0.0037373999998635554
Not order preserving: list(set(li)): 0.0029409000001123786
Many duplicates
s.add(i): 0.2839437000000089
dict: 0.21970469999996567
Not order preserving: list(set(li)): 0.102068700000018
So dict seems consistently faster, although the list comprehension with set.add gets closer to it when there are many duplicates; I'm not sure whether varying the numbers further would change the results. list(set) is of course faster still, but it does not preserve the original list order, which is a requirement here.
This should do it for you:
new_list = list(set(old_list))
set removes duplicates automatically, and list converts the result back to a list. Note that the original order is not preserved.
No, it's simply a typo: the "list" at the end of lseparatedOrblist must be capitalized (lseparatedOrbList). You can nest loops over the same variable just fine (although there's rarely a good reason to).
However, there are other problems with the code. For starters, you're iterating through the list itself, so i and j will be items, not indices. Furthermore, you shouldn't change a collection while iterating over it (well, you "can" in that it runs, but madness lies that way: for instance, you'll probably skip over items). And then there's the complexity problem: your code is O(n^2). Either convert the list into a set and back into a list (simple, but it shuffles the remaining list items) or do something like this:
seen = set()
new_xs = []
for x in xs:
    if x in seen:
        continue
    seen.add(x)
    new_xs.append(x)
Both solutions require the items to be hashable. If that's not possible, you'll probably have to stick with your current approach sans the mentioned problems.
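If the items are unhashable (for example, rows that are themselves lists), a fixed version of the original idea still works: build a new list instead of removing from the one being iterated over. A minimal sketch; the helper name `dedupe_unhashable` and the sample rows are hypothetical. It remains O(n^2) because `in` scans the kept items with `==`.

```python
def dedupe_unhashable(items):
    """Keep the first occurrence of each item using == only (no hashing)."""
    unique = []
    for item in items:
        if item not in unique:  # linear scan with ==, works for lists too
            unique.append(item)
    return unique


rows = [[1, 2], [3], [1, 2], [4], [3]]
print(dedupe_unhashable(rows))  # [[1, 2], [3], [4]]
```

By contrast, `set(rows)` would raise `TypeError` here because lists are unhashable.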
It's because you are missing a capital letter, actually.
Purposely dedented:
for i in lseparatedOrbList: # capital 'L'
for j in lseparatedOrblist: # lowercase 'l'
Though the more efficient way to do it would be to insert the contents into a set.
If maintaining the list order matters (i.e., it must be "stable"), check out the answers on this question.
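For unhashable items (for example, a list of lists), this avoids re-checking entries that have already been passed, because each row is only compared against the rows after it. Note that it keeps the last occurrence of each duplicate rather than the first:
def purge_duplicates(X):
    unique_X = []
    for i, row in enumerate(X):
        # keep row only if it does not appear again later in X
        if row not in X[i + 1:]:
            unique_X.append(row)
    return unique_X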
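Another way to fix this, which rebuilds the original list in place:
data = [1, 1.0, 1.41, 1.73, 2, 2, 2.0, 2.24, 3, 3, 4, 4, 4, 5, 6, 6, 8, 8, 9, 10]
list2 = []
for value in data:
    try:
        list2.index(value)
    except ValueError:
        # value has not been seen yet
        list2.append(value)
data.clear()
for value in list2:
    data.append(value)
list2.clear()
print(data)
print(list2)
Note that 1 == 1.0 and 2 == 2.0 in Python, so 1.0 and 2.0 are treated as duplicates and dropped.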
This way you can delete a particular item that appears multiple times in a list. Try deleting all the 5s:
list1 = [1,2,3,4,5,6,5,3,5,7,11,5,9,8,121,98,67,34,5,21]
print(list1)
n = int(input("item to be deleted: "))
while n in list1:  # remove every occurrence, not just the first
    list1.remove(n)
print(list1)

Matching elements between lists in Python - keeping location

I have two lists, both fairly long. List A contains a list of integers, some of which are repeated in list B. I can find which elements appear in both by using:
idx = set(list_A).intersection(list_B)
This returns a set of all the elements appearing in both list A and list B.
However, I would like to find a way to find the matches between the two lists and also retain information about the elements' positions in both lists. Such a function might look like:
def match_lists(list_A, list_B):
    ...
    return match_A, match_B
where match_A would contain the positions of elements in list_A that had a match somewhere in list_B and vice-versa for match_B.
I can see how to construct such lists using a for-loop, however this feels like it would be prohibitively slow for long lists.
Regarding duplicates: list_B has no duplicates in it, if there is a duplicate in list_A then return all the matched positions as a list, so match_A would be a list of lists.
That should do the job :)
def match_list(list_A, list_B):
    intersect = set(list_A).intersection(list_B)
    interPosA = [[i for i, x in enumerate(list_A) if x == dup] for dup in intersect]
    interPosB = [i for i, x in enumerate(list_B) if x in intersect]
    return interPosA, interPosB
(Thanks to machine yearning for duplicate edit)
Use dicts or defaultdicts to store the unique values as keys that map to the indices they appear at, then combine the dicts:
from collections import defaultdict
def make_offset_dict(it):
    ret = defaultdict(list)  # Or set; the values are unique indices either way
    for i, x in enumerate(it):
        ret[x].append(i)
    return ret
dictA = make_offset_dict(A)
dictB = make_offset_dict(B)
for k in dictA.keys() & dictB.keys():  # use .viewkeys() on Python 2.7
    print(k, dictA[k], dictB[k])
This iterates A and B exactly once each so it works even if they're one-time use iterators, e.g. from a file-like object, and it works efficiently, storing no more data than needed and sticking to cheap hashing based operations instead of repeated iteration.
This isn't the solution to your specific problem, but it preserves all the information needed to solve your problem and then some (e.g. it's cheap to figure out where the matches are located for any given value in either A or B); you can trivially adapt it to your use case or more complicated ones.
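The offset-dict idea can be adapted to the exact return shape the question asks for. A minimal sketch, assuming (as the question states) that list_B has no duplicates; `match_lists` is the question's proposed name, the sample lists are hypothetical, and ordering the results by position in list_B is an arbitrary choice here. Each list is iterated once.

```python
from collections import defaultdict


def match_lists(list_A, list_B):
    """Return (match_A, match_B) for values present in both lists.

    match_A[k] is the list of positions of the k-th common value in list_A;
    match_B[k] is that value's position in list_B (list_B has no duplicates).
    """
    positions_A = defaultdict(list)
    for i, x in enumerate(list_A):
        positions_A[x].append(i)

    match_A, match_B = [], []
    for j, x in enumerate(list_B):
        if x in positions_A:  # membership test does not create a new key
            match_A.append(positions_A[x])
            match_B.append(j)
    return match_A, match_B


match_A, match_B = match_lists([5, 1, 7, 5, 3], [3, 9, 5])
print(match_A, match_B)  # [[4], [0, 3]] [0, 2]
```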
How about this:
def match_lists(list_A, list_B):
    idx = set(list_A).intersection(list_B)
    A_indexes = []
    for i, element in enumerate(list_A):
        if element in idx:
            A_indexes.append(i)
    B_indexes = []
    for i, element in enumerate(list_B):
        if element in idx:
            B_indexes.append(i)
    return A_indexes, B_indexes
This only runs through each list once (requiring only one dict) and also works with duplicates in list_B
def match_lists(list_A, list_B):
    da = dict((e, i) for i, e in enumerate(list_A))
    for bi, e in enumerate(list_B):
        try:
            ai = da[e]
            yield (e, ai, bi)  # element e is in position ai in list_A and bi in list_B
        except KeyError:
            pass
Try this:
def match_lists(list_A, list_B):
    match_A = {}
    match_B = {}
    for elem in list_A:
        if elem in list_B:
            match_A[elem] = list_A.index(elem)
            match_B[elem] = list_B.index(elem)
    return match_A, match_B

How to separate one list in two via list comprehension or otherwise

I have a list of dictionary items like so:
L = [{"a":1, "b":0}, {"a":3, "b":1}...]
I would like to split these entries based upon the value of "b", either 0 or 1.
A(b=0) = [{"a":1, "b":1}, ....]
B(b=1) = [{"a":3, "b":2}, .....]
I am comfortable using simple list comprehensions, and I am currently looping through the list L two times:
A = [d for d in L if d["b"] == 0]
B = [d for d in L if d["b"] != 0]
Clearly this is not the most efficient way.
An else clause does not seem to be available within the list comprehension functionality.
Can I do what I want via list comprehension?
Is there a better way to do this?
I am looking for a good balance between readability and efficiency, leaning towards readability.
Thanks!
update:
Thanks, everyone, for the comments and ideas! The easiest one for me to read is Thomas's, but I will look at Alex's suggestion as well. I had not come across the collections module before.
Don't use a list comprehension. List comprehensions are for when you want a single list result. You obviously don't :) Use a regular for loop:
A = []
B = []
for item in L:
    if item['b'] == 0:
        target = A
    else:
        target = B
    target.append(item)
You can shorten the snippet by doing, say, (A, B)[item['b'] != 0].append(item), but why bother?
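A self-contained version of the single-pass split, run on a small hypothetical `L`, shows that one loop fills both lists. The function name `split_on_b` is illustrative only; the conditional expression picks the target list, as in the loop above.

```python
def split_on_b(items):
    """Single-pass split into (b == 0, b != 0), preserving input order."""
    A, B = [], []
    for item in items:
        # choose the target list, then append once
        (A if item["b"] == 0 else B).append(item)
    return A, B


L = [{"a": 1, "b": 0}, {"a": 3, "b": 1}, {"a": 5, "b": 0}]
A, B = split_on_b(L)
print(A)  # [{'a': 1, 'b': 0}, {'a': 5, 'b': 0}]
print(B)  # [{'a': 3, 'b': 1}]
```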
If the b value can only be 0 or 1, Thomas's simple solution is probably best. For a more general case (in which you want to discriminate among several possible values of b; your sample "expected results" appear to be completely divorced from and contradictory to your question's text, so it's far from obvious whether you actually need some generality;-):
from collections import defaultdict
separated = defaultdict(list)
for x in L:
    separated[x['b']].append(x)
When this code executes, separated ends up with a dict (actually an instance of collections.defaultdict, a dict subclass) whose keys are all values for b that actually occur in dicts in list L, the corresponding values being the separated sublists. So, for example, if b takes only the values 0 and 1, separated[0] would be what (in your question's text as opposed to the example) you want as list A, and separated[1] what you want as list B.

How do you create a list like PHP's in Python?

This is an incredibly simple question (I'm new to Python).
I basically want a data structure like a PHP array -- i.e., I want to initialise it and then just add values into it.
As far as I can tell, this is not possible with Python, so I've got the maximum value I might want to use as an index, but I can't figure out how to create an empty list of a specified length.
Also, is a list the right data structure to use to model what feels like it should just be an array? I tried to use an array, but it seemed unhappy with storing strings.
Edit: Sorry, I didn't explain very clearly what I was looking for. When I add items into the list, I do not want to put them in in sequence, but rather I want to insert them into specified slots in the list.
I.e., I want to be able to do this:
list = []
for row in rows:
    c = list_of_categories.index(row["id"])
    print c
    list[c] = row["name"]
Depending on how you are going to use the list, it may be that you actually want a dictionary. This will work:
d = {}
for row in rows:
    c = list_of_categories.index(row["id"])
    print(c)
    d[c] = row["name"]
... or more compactly:
d = dict((list_of_categories.index(row['id']), row['name']) for row in rows)
print(d)
PHP arrays are much more like Python dicts than they are like Python lists. For example, they can have strings for keys.
And confusingly, Python has an array module, which is described as "efficient arrays of numeric values", which is definitely not what you want.
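To see the dict approach end to end, here is a sketch with hypothetical `rows` and `list_of_categories` standing in for the question's data. It also shows one way to turn the dict into a real list afterwards, sized from the largest key with `None` in the unused slots.

```python
# Hypothetical sample data standing in for the question's rows and categories.
list_of_categories = ["fruit", "veg", "dairy", "grain"]
rows = [{"id": "dairy", "name": "milk"}, {"id": "fruit", "name": "apple"}]

# Category index -> name, as in the dict answer above.
d = {list_of_categories.index(row["id"]): row["name"] for row in rows}

# If a real list is needed later, size it from the largest key and
# leave the missing slots as None.
as_list = [None] * (max(d) + 1)
for index, name in d.items():
    as_list[index] = name

print(d)        # {2: 'milk', 0: 'apple'}
print(as_list)  # ['apple', None, 'milk']
```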
If the number of items you want is known in advance, and you want to access them using integer, 0-based, consecutive indices, you might try this:
n = 3
array = n * [None]
print(array)
array[2] = 11
array[1] = 47
array[0] = 42
print(array)
This prints:
[None, None, None]
[42, 47, 11]
Use the list constructor, and append your items, like this:
l = list()
l.append("foo")
l.append(3)
print(l)
gives me ['foo', 3], which should be what you want. See the documentation on list and the sequence type documentation.
EDIT Updated
For inserting, use insert, like this:
l = list()
l.append("foo")
l.append(3)
l.insert(1, "new")
print(l)
which prints ['foo', 'new', 3]
http://diveintopython3.ep.io/native-datatypes.html#lists
You don't need to create empty lists of a specified length. You just add to them and query their current length if needed.
What you can't do, without being prepared to catch an exception, is use a nonexistent index, which is probably what you are used to doing in PHP.
You can use this syntax to create a list with n elements:
lst = [0] * n
But be careful! The list will contain n references to the same object. If that object is mutable and you modify it in place through one element, the change shows up in every element. In that case you should use:
lst = [some_object() for i in range(n)]
Then you can access these elements:
for i in range(n):
    lst[i] += 1
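A short demonstration of the aliasing pitfall described above, using an empty list as the mutable object. The variable names are illustrative.

```python
n = 3

# [[]] * n repeats one reference: all three slots are the same list object.
shared = [[]] * n
shared[0].append("x")
print(shared)  # [['x'], ['x'], ['x']]

# A comprehension evaluates [] once per slot, giving independent lists.
separate = [[] for _ in range(n)]
separate[0].append("x")
print(separate)  # [['x'], [], []]
```

With an immutable value such as `0`, `[0] * n` is safe, because `lst[i] += 1` rebinds that slot to a new int rather than mutating a shared object.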
A Python list is comparable to a vector in other languages. It is a resizable array, not a linked list.
Sounds like what you need might be a dictionary rather than an array if you want to insert into specified indices.
>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> d['a']
1
I agree with Ned that you probably need a dictionary for what you're trying to do. But if you just want a list of those category indices, you can do this:
lst = [list_of_categories.index(row["id"]) for row in rows]
Use a dictionary, because what you're really asking for is a structure you can access by arbitrary keys:
names = {}
for row in rows:
    c = list_of_categories.index(row["id"])
    print(c)
    names[c] = row["name"]
Then you can iterate through the known contents with:
for x in names.values():
    print(x)
Or check whether something exists in the "list":
if 3 in names:
    print("it's there")
I'm not sure I understood what you want to do, but it seems you want a list that is dictionary-like, where the index is the key. Even though I think a dictionary would be the better choice, here's my answer. Got a problem? Make an object:
from collections import UserList  # Python 2: from UserList import UserList
class MyList(UserList):
    NO_ITEM = 'noitem'
    def insertAt(self, item, index):
        length = len(self)
        if index < length:
            self[index] = item
        elif index == length:
            self.append(item)
        else:
            # pad the gap with placeholder items, then append
            for i in range(0, index - length):
                self.append(self.NO_ITEM)
            self.append(item)
There may be some errors in the Python syntax (I didn't check), but in principle it should work.
Of course, the else branch also handles the elif case, but I thought merging them might make it a little harder to read.
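If a subclass feels like overkill, the same padding idea works on a plain list. A minimal sketch; `insert_at` is a hypothetical helper name, it mutates the list in place, and it assumes a non-negative index. Padding uses `None` here instead of the `'noitem'` placeholder.

```python
def insert_at(lst, index, item, fill=None):
    """Put item at lst[index], padding with `fill` if the list is too short."""
    if index >= len(lst):
        # grow the list just enough for lst[index] to exist
        # (fill is repeated by reference, so keep it immutable, e.g. None)
        lst.extend([fill] * (index - len(lst) + 1))
    lst[index] = item


slots = []
insert_at(slots, 2, "c")
insert_at(slots, 0, "a")
print(slots)  # ['a', None, 'c']
```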
