How to group list by numbers

Hello, I have a list like this:
lst = [0.0000,0.0542,0.0899,0.7999,0.9999,1.8754]
The list keeps going.
Is there any way I can group the values into intervals of width one?
For example, every value between 0 and 1, between 1 and 2, and so on.
I'd like to plot the count of numbers per group with matplotlib.
I have tried many things, but with no success.

The standard-library module itertools has a function called groupby that can help you here. It collects adjacent values of an iterable into groups based on a key function.
from itertools import groupby
lst = [0.0000,0.0542,0.0899,0.7999,0.9999,1.8754]
grouped_lst = [list(g) for k,g in groupby(lst, lambda x:int(x))]
Output:
[[0.0, 0.0542, 0.0899, 0.7999, 0.9999], [1.8754]]
lambda x:int(x) is the key function here (passing int directly works too). int truncates each value toward zero, i.e. it drops the fractional part, so every value from 0 up to but not including 1 gets the key 0. Each group g is an iterator, which list(g) turns into a list.
Note that this method only works if your list is sorted, because groupby only groups adjacent values. Sort your list beforehand if it may not be sorted.
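Since the question asks for the number of values per group, the counting step might be sketched as follows. This assumes the groups are the unit-wide intervals [0, 1), [1, 2) and so on. The name counts is only illustrative, and math.floor is swapped in for int so that negative values would fall into their own interval instead of being truncated toward zero.

```python
from itertools import groupby
from math import floor

lst = [0.0000, 0.0542, 0.0899, 0.7999, 0.9999, 1.8754]

# groupby only merges adjacent items, so sort first.
# floor(x) maps every x in [n, n + 1) to the integer n.
counts = {key: len(list(group)) for key, group in groupby(sorted(lst), key=floor)}

print(counts)  # {0: 5, 1: 1}
```

The keys and values can then be handed to a bar plot, for example plt.bar(list(counts), list(counts.values())) with matplotlib.pyplot imported as plt.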

You can use numpy and numpy_indexed:
import numpy as np
import numpy_indexed as npi
lst = [0.0000,0.0542,0.0899,0.7999,0.9999,1.8754]
npi.group_by(np.trunc(lst), lst)
Output (a tuple of the unique keys and the groups):
(array([0., 1.]),
[array([0. , 0.0542, 0.0899, 0.7999, 0.9999]), array([1.8754])])
You can easily install the library with:
> pip install numpy-indexed

Related

How to index two elements from a Python list?

How come I can't index a Python list for multiple out-of-sequence index positions?
mylist = ['apple','guitar','shirt']
It's easy enough to get one element, but not more than one.
mylist[0] returns 'apple', but mylist[0,2] raises TypeError: list indices must be integers or slices, not tuple
So far only this seems to work, and it looks clumsy:
np.asarray(mylist)[[0,2]].tolist()

Use an extended slice (this only works when the wanted indices are evenly spaced, here every second element):
mylist = ['apple','guitar','shirt']
print(mylist[::2])
#Output: ['apple', 'shirt']

Use list comprehension:
print([mylist[i] for i in [0, 2]])
# ['apple', 'shirt']
Or use numpy.array:
import numpy as np
print(np.array(mylist)[[0, 2]])
# ['apple', 'shirt']

A Python list supports only integers and slices as indices. The standard slicing syntax is
i:j:k inside the square brackets, which accesses more than one element,
where i is the start index, j is the end index (exclusive) and k is the step.
>>> mylist = ['apple','guitar','shirt']
>>> mylist[0:2]
['apple', 'guitar']
If you want the elements at arbitrary indices, use a list comprehension or a plain for loop.
Another way to access the items at given indices is the map() function.
>>> a_list = [1, 2, 3]
>>> indices_to_access = [0, 2]
>>> accessed_mapping = map(a_list.__getitem__, indices_to_access)
>>> accessed_list = list(accessed_mapping)
>>> accessed_list
[1, 3]
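The standard library's operator.itemgetter does the same job as the map() call above in one step. A small sketch follows (the names pick and picked are only illustrative). Note that with several indices it returns a tuple, not a list, hence the list() call.

```python
from operator import itemgetter

mylist = ['apple', 'guitar', 'shirt']

# itemgetter(0, 2) builds a callable that fetches items 0 and 2
pick = itemgetter(0, 2)
picked = list(pick(mylist))

print(picked)  # ['apple', 'shirt']
```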

My recommendation is to use the NumPy library (import numpy as np). A NumPy array lets you select as many items as you like in one step through what is called fancy indexing.
mylist[0] returns 'apple'
The statement above, taken from the question, is ordinary indexing: you pass the index position of a single item to retrieve that item. A plain list has no equivalent syntax for retrieving several items at arbitrary positions.
import numpy as np #import the numpy package
mylist = np.array(['apple','guitar','shirt']) #create the numpy array
mylist[[0,2]] #return the first and third items ONLY. (zero-indexed)
Out[11]: array(['apple', 'shirt'], dtype='<U6')
Whereas mylist[0] returns a single item, mylist[[0,2]] asks NumPy for exactly two elements, those at index positions 0 and 2 (zero-indexed). Notice that the index positions are passed in as a list, so instead of one element you get back two (or as many as you list).

Split array of similar numbers

Is there a built-in Python function that can split a list into groups of similar names without using a conventional loop?
inputArray=["slide_0000_00.jpg", "slide_0000_01.jpg","slide_0000_02.jpg","slide_0001_01.jpg","slide_0001_02.jpg","slide_0002_01.jpg"]
resultArray=[["slide_0000_00.jpg", "slide_0000_01.jpg", "slide_0000_02.jpg"],["slide_0001_01.jpg", "slide_0001_02.jpg"], ["slide_0002_01.jpg"]]

Use itertools.groupby to group consecutive items by their middle part:
inputArray=["slide_0000_00.jpg",
"slide_0000_01.jpg",
"slide_0000_02.jpg",
"slide_0001_01.jpg",
"slide_0001_02.jpg",
"slide_0002_01.jpg"]
import itertools
result = [list(g) for _,g in itertools.groupby(inputArray,key = lambda x:x.split("_")[1])]
which gives:
>>> result
[['slide_0000_00.jpg', 'slide_0000_01.jpg', 'slide_0000_02.jpg'],
['slide_0001_01.jpg', 'slide_0001_02.jpg'],
['slide_0002_01.jpg']]
Note that if the items of a group are not consecutive, the grouping won't work (unless you sort the list first; a plain sort works here, but it costs O(n log n) where a single pass would do). A classic alternative in that case is collections.defaultdict(list):
import collections
d = collections.defaultdict(list)
for x in inputArray:
    d[x.split("_")[1]].append(x)
result = list(d.values())
The result is identical. (The order of the groups depends on whether dictionaries preserve insertion order: that is guaranteed from Python 3.7 on, and is an implementation detail of CPython 3.6.)
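To make the difference concrete, here is a small sketch with the items deliberately out of order. The shuffled list and the helper name middle are made up for illustration.

```python
import collections
import itertools

# A few of the file names from above, ordered so that groups are not consecutive
shuffled = ["slide_0001_01.jpg", "slide_0000_00.jpg",
            "slide_0001_02.jpg", "slide_0000_01.jpg"]

def middle(name):
    # "slide_0000_01.jpg" -> "0000"
    return name.split("_")[1]

# groupby starts a new group every time the key changes
fragmented = [list(g) for _, g in itertools.groupby(shuffled, key=middle)]

# defaultdict collects by key regardless of position
d = collections.defaultdict(list)
for name in shuffled:
    d[middle(name)].append(name)
grouped = list(d.values())

print(len(fragmented))  # 4
print(len(grouped))     # 2
```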

Numerically sort comma separated strings of numbers

I'm looking to sort a set of comma-separated numeric values column-wise. The number of columns may vary. For example, using Python:
print(sorted(['9,11', '70,10', '10,8,1','10,70']))
produces
['10,70', '10,8,1', '70,10', '9,11']
while the desired result is
['9,11', '10,8,1', '10,70', '70,10']
That is, sort by the first column, then by the second, and so on.
Obviously this can be done, but can this be done elegantly?

It can be done more elegantly by using the key argument of sorted:
data = [
'9,11',
'70,10',
'10,8,1',
'10,70'
]
print(sorted(data, key=lambda s: list(map(int, s.split(',')))))
Result:
['9,11', '10,8,1', '10,70', '70,10']
With the above code we convert each string of the list to a list of integers and use that list as the sort key. In Python 3, map returns an iterator that cannot be compared, hence the list(...) around it.
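The reason this works is that Python compares lists element by element, from left to right. A short sketch of that behaviour, with the key function pulled out under the illustrative name numeric_key:

```python
def numeric_key(s):
    # '10,8,1' -> [10, 8, 1]
    return [int(part) for part in s.split(',')]

# Lists compare element by element, left to right
assert numeric_key('9,11') < numeric_key('10,8,1')    # 9 < 10
assert numeric_key('10,8,1') < numeric_key('10,70')   # tie on 10, then 8 < 70

data = ['9,11', '70,10', '10,8,1', '10,70']
result = sorted(data, key=numeric_key)
print(result)  # ['9,11', '10,8,1', '10,70', '70,10']
```

A varying number of columns is not a problem: when one list is a prefix of another, the shorter one sorts first.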

If you don't mind third-party modules, you can use natsort, which provides the function natsorted, designed to be a drop-in replacement for sorted.
>>> import natsort
>>> natsort.natsorted(['9,11', '70,10', '10,8,1','10,70'])
['9,11', '10,8,1', '10,70', '70,10']
Full disclosure, I am the package's author.

Clustering strings of a list and return a list of lists

I have a list of strings as the following one:
a = ['aaa-t1', 'aaa-t2', 'aab-t1', 'aab-t2', 'aab-t3', 'abc-t2']
I would like to cluster those strings by similarity. As you may note, a[0], and a[1] share the same root: aaa. I would like to produce a new list of lists that looks like this:
b = [['aaa-t1', 'aaa-t2'], ['aab-t1', 'aab-t2', 'aab-t3'], ['abc-t2']]
What would be a way to do so? So far I have not succeeded and I don't have any decent code to show. I tried comparing strings with fuzzywuzzy, but that requires creating all possible combinations of strings, which scales badly with the length of the list.

You can use groupby to group the strings by key generated with str.split:
>>> from itertools import groupby
>>> a = ['aaa-t1', 'aaa-t2', 'aab-t1', 'aab-t2', 'aab-t3', 'abc-t2']
>>> [list(g) for k, g in groupby(sorted(a), lambda x: x.split('-', 1)[0])]
[['aaa-t1', 'aaa-t2'], ['aab-t1', 'aab-t2', 'aab-t3'], ['abc-t2']]
groupby returns an iterable of (key, group) tuples, where key is the key used for grouping and group is an iterable of the items in that group. The first parameter given to groupby is the iterable to produce groups from, and the optional second parameter is a key function that is called on each item to produce its key. Since groupby only groups consecutive elements, a needs to be sorted first.
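If the keys themselves are also wanted, the (key, group) pairs can be collected into a dict. A minimal sketch, where the names root and clusters are only illustrative and the sort uses the same key function as the grouping:

```python
from itertools import groupby

a = ['aaa-t1', 'aaa-t2', 'aab-t1', 'aab-t2', 'aab-t3', 'abc-t2']

def root(s):
    # 'aaa-t1' -> 'aaa'
    return s.split('-', 1)[0]

# Each group is a lazy iterator tied to the underlying groupby object,
# so it is turned into a list before moving on to the next key.
clusters = {key: list(group) for key, group in groupby(sorted(a, key=root), key=root)}

for key, members in clusters.items():
    print(key, members)
```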

Calculating permutations without repetitions in Python

I have two lists of items:
A = 'mno'
B = 'xyz'
I want to generate all permutations, without replacement, simulating replacing all combinations of items in A with items in B, without repetition. e.g.
>>> do_my_permutation(A, B)
['mno', 'xno', 'mxo', 'mnx', 'xyo', 'mxy', 'xyz', 'zno', 'mzo', 'mnz', ...]
This is straightforward enough for me to write from scratch, but I'm aware of Python's standard itertools module, which I believe may already implement this. However, I'm having trouble identifying the function that implements this exact behavior. Is there a function in this module I can use to accomplish this?

Is this what you need?
import itertools
["".join(elem) for elem in itertools.permutations(A+B, 3)]
Replace permutations with combinations if you want all orderings of the same three letters to be collapsed into a single item (e.g. so that 'mxo' and 'mox' do not each appear in the output).
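A quick sketch of how the two functions differ on the inputs from the question (the names perms and combs are only illustrative):

```python
import itertools

A = 'mno'
B = 'xyz'

perms = ["".join(p) for p in itertools.permutations(A + B, 3)]
combs = ["".join(c) for c in itertools.combinations(A + B, 3)]

print(len(perms))  # 120 ordered selections: 6 * 5 * 4
print(len(combs))  # 20 unordered selections: 120 / 3!

# 'mxo' and 'mox' are distinct permutations but the same combination;
# combinations emits it once, in input order
print('mxo' in perms, 'mox' in perms)  # True True
print('mox' in combs, 'mxo' in combs)  # True False
```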

You're looking for itertools.permutations.
From the docs:
Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeat values.
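The flip side of that rule is that repeated input values do produce repeated output. A small sketch with a made-up input containing the same letter twice:

```python
from itertools import permutations

# 'aab' contains the letter 'a' at two different positions
pairs = ["".join(p) for p in permutations('aab', 2)]
print(pairs)  # ['aa', 'ab', 'aa', 'ab', 'ba', 'ba']

# A set drops the repeats that come from equal values
unique_pairs = sorted(set(pairs))
print(unique_pairs)  # ['aa', 'ab', 'ba']
```

With 'mno' and 'xyz' all six letters are distinct, so no such repeats occur.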

To keep only one entry per set of letters, with the letters of each entry in sorted order, you can use this code (note that the resulting set itself is unordered):
import itertools
A = 'mno'
B = 'xyz'
s = {"".join(sorted(elem)) for elem in itertools.permutations(A+B, 3)}
