Split array of similar numbers - python

Is there a built-in Python function to separate groups of numbers without using a conventional loop?
inputArray=["slide_0000_00.jpg", "slide_0000_01.jpg","slide_0000_02.jpg","slide_0001_01.jpg","slide_0001_02.jpg","slide_0002_01.jpg"]
resultArray=[["slide_0000_00.jpg", "slide_0000_01.jpg", "slide_0000_02.jpg"],["slide_0001_01.jpg", "slide_0001_02.jpg"], ["slide_0002_01.jpg"]]

Use itertools.groupby to group consecutive items by the middle part of the filename:
inputArray=["slide_0000_00.jpg",
"slide_0000_01.jpg",
"slide_0000_02.jpg",
"slide_0001_01.jpg",
"slide_0001_02.jpg",
"slide_0002_01.jpg"]
import itertools
result = [list(g) for _,g in itertools.groupby(inputArray,key = lambda x:x.split("_")[1])]
which gives:
>>> result
[['slide_0000_00.jpg', 'slide_0000_01.jpg', 'slide_0000_02.jpg'],
['slide_0001_01.jpg', 'slide_0001_02.jpg'],
['slide_0002_01.jpg']]
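Note that if items with the same key are not adjacent, groupby will not merge them: each run of equal keys becomes its own group. Sorting the list first fixes that (a plain sort works here because every string starts with "slide_" and the middle part is fixed-width), but sorting costs O(n log n) instead of a single O(n) pass. A classic alternative in that case is to use collections.defaultdict(list):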
import collections
d = collections.defaultdict(list)
for x in inputArray:
    d[x.split("_")[1]].append(x)
result = list(d.values())
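The result is identical, although the order of the groups can vary depending on the Python version. Dictionaries are guaranteed to preserve insertion order from Python 3.7 (CPython 3.6 already did so as an implementation detail); on older versions the groups may come out in a different order.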
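To see why adjacency matters, here is a small sketch on a reordered input (the three filenames are made up for illustration; middle is a hypothetical helper name). groupby starts a new group every time the key changes, while the defaultdict loop collects every item under its key regardless of position.

```python
import collections
import itertools

# Hypothetical input where the "0000" items are NOT adjacent
shuffled = ["slide_0000_00.jpg", "slide_0001_01.jpg", "slide_0000_01.jpg"]

def middle(name):
    # Middle part of the filename, e.g. "0000"
    return name.split("_")[1]

# groupby only merges consecutive items with equal keys -> 3 groups here
by_groupby = [list(g) for _, g in itertools.groupby(shuffled, key=middle)]

# defaultdict collects items by key wherever they appear -> 2 groups
d = collections.defaultdict(list)
for x in shuffled:
    d[middle(x)].append(x)
by_dict = list(d.values())

# Sorting first makes equal keys adjacent, so groupby agrees again
by_sorted_groupby = [list(g) for _, g in itertools.groupby(sorted(shuffled), key=middle)]

print(by_groupby)
print(by_dict)
print(by_sorted_groupby)
```

The group order of by_dict assumes Python 3.7+ insertion-ordered dictionaries.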

Related

How to group list by numbers

Hello, I have a list of this sort:
lst = [0.0000,0.0542,0.0899,0.7999,0.9999,1.8754]
The list keeps going.
Is there any way I can group them by one?
For example, every value between 0 and 1, between 1 and 2, etc.
I'd like to represent the count of numbers per group in matplotlib.
I tried everything but had no success.
There is a handy standard-library module named itertools which has a function called groupby that can help you here. It collects adjacent values in a list together based on a key function.
from itertools import groupby
lst = [0.0000,0.0542,0.0899,0.7999,0.9999,1.8754]
grouped_lst = [list(g) for k,g in groupby(lst, lambda x:int(x))]
Output:
[[0.0, 0.0542, 0.0899, 0.7999, 0.9999], [1.8754]]
lambda x:int(x) is the key function here (passing int directly would do the same). int converts each value to an integer, i.e. drops the fractional part, so all values from 0 up to (but not including) 1 get key 0, and so on. groupby yields (key, group) pairs where each group is an iterator; list(g) turns each one into a list.
Note that this method only works if your list is sorted. Please sort your list beforehand if it may not be sorted.
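Since the end goal is a plot of how many numbers fall into each unit-wide range, a minimal sketch that avoids the sorting requirement is to count bin labels directly with collections.Counter. It uses math.floor rather than int, which is an assumption on my part: for non-negative values the two agree, but for a negative value such as -0.5 floor gives -1 (the -1 to 0 bin) while int would merge it into the 0 to 1 bin. The plotting call itself is left out; bins and heights are the kind of x/height sequences a bar chart (e.g. matplotlib's plt.bar) expects.

```python
import math
from collections import Counter

lst = [0.0000, 0.0542, 0.0899, 0.7999, 0.9999, 1.8754]

# Label each value with the lower edge of its unit-wide bin: 0.0542 -> 0, 1.8754 -> 1
counts = Counter(math.floor(x) for x in lst)

# Sorted bin edges and their counts, ready for a bar chart
bins = sorted(counts)
heights = [counts[b] for b in bins]
print(bins, heights)  # [0, 1] [5, 1]
```

Because Counter does not care about order, the input does not need to be sorted.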
You can use numpy and numpy_indexed:
import numpy as np
import numpy_indexed as npi
lst = [0.0000,0.0542,0.0899,0.7999,0.9999,1.8754]
npi.group_by(np.trunc(lst), lst)
Output
(array([0., 1.]),
[array([0. , 0.0542, 0.0899, 0.7999, 0.9999]), array([1.8754])])
#keys and groups
You can easily install the library with:
> pip install numpy-indexed

Numerically sort comma separated strings of numbers

Looking to sort a set of .csv numeric values column-wise. The number of columns may vary. For example, using Python:
print(sorted(['9,11', '70,10', '10,8,1','10,70']))
produces
['10,70', '10,8,1', '70,10', '9,11']
while the desired result is
['9,11', '10,8,1', '10,70', '70,10']
First, sort by the first column, then by the second, etc.
Obviously this can be done, but can this be done elegantly?
It can be done more elegantly by using the key argument of sorted:
data = [
'9,11',
'70,10',
'10,8,1',
'10,70'
]
# In Python 3, map() returns an iterator, which cannot be compared, so wrap it in list()
print(sorted(data, key=lambda s: list(map(int, s.split(',')))))
Result:
['9,11', '10,8,1', '10,70', '70,10']
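With the above code, each string in the list is converted to a list of integers, and that list is used as the sort key. Lists compare element by element, so the strings are ordered by their first column, then by their second, and so on; a shorter list that matches the start of a longer one sorts first.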
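A variant of the same key-function idea builds a tuple of ints instead of a list (column_key is a hypothetical helper name). Tuples also compare column by column, and a tuple that is a prefix of a longer one sorts first, which covers rows with a varying number of columns.

```python
data = ['9,11', '70,10', '10,8,1', '10,70']

def column_key(s):
    # '10,8,1' -> (10, 8, 1); tuples compare column by column
    return tuple(int(part) for part in s.split(','))

result = sorted(data, key=column_key)
print(result)  # ['9,11', '10,8,1', '10,70', '70,10']
```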
If you don't mind third-party modules, you can use natsort, which provides the function natsorted, designed to be a drop-in replacement for sorted.
>>> import natsort
>>> natsort.natsorted(['9,11', '70,10', '10,8,1','10,70'])
['9,11', '10,8,1', '10,70', '70,10']
Full disclosure, I am the package's author.

Clustering strings of a list and return a list of lists

I have a list of strings as the following one:
a = ['aaa-t1', 'aaa-t2', 'aab-t1', 'aab-t2', 'aab-t3', 'abc-t2']
I would like to cluster those strings by similarity. As you may note, a[0], and a[1] share the same root: aaa. I would like to produce a new list of lists that looks like this:
b = [['aaa-t1', 'aaa-t2'], ['aab-t1', 'aab-t2', 'aab-t3'], ['abc-t2']]
What would be a way to do so? So far I have not succeeded and I don't have any decent code to show. I tried comparing strings with fuzzywuzzy, but that requires comparing every possible pair of strings, which scales badly with the list's length.
You can use groupby to group the strings by a key generated with str.split:
>>> from itertools import groupby
>>> a = ['aaa-t1', 'aaa-t2', 'aab-t1', 'aab-t2', 'aab-t3', 'abc-t2']
>>> [list(g) for k, g in groupby(sorted(a), lambda x: x.split('-', 1)[0])]
[['aaa-t1', 'aaa-t2'], ['aab-t1', 'aab-t2', 'aab-t3'], ['abc-t2']]
groupby returns an iterable of (key, group) tuples, where key is the value used for grouping and group is an iterator over the items in that group. The first argument to groupby is the iterable to produce groups from; the optional second argument is a key function that is called on each item to produce its key. Since groupby only groups consecutive elements, a needs to be sorted first.
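If the root is needed later (for example to look up a cluster by name), the same groupby call can build a dictionary keyed by the root instead of a plain list of lists. This is a sketch; root is a hypothetical helper name, and splitting on the first '-' is the same assumption the answer makes about the string format.

```python
from itertools import groupby

a = ['aaa-t1', 'aaa-t2', 'aab-t1', 'aab-t2', 'aab-t3', 'abc-t2']

def root(s):
    # 'aab-t3' -> 'aab': everything before the first '-'
    return s.split('-', 1)[0]

# Each group is a one-shot iterator, so materialize it with list()
# before groupby advances to the next key
clusters = {k: list(g) for k, g in groupby(sorted(a), key=root)}
print(clusters)
```

list(clusters.values()) then gives the list of lists from the question.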

Calculating permutations without repetitions in Python

I have two lists of items:
A = 'mno'
B = 'xyz'
I want to generate all permutations, without replacement, simulating replacing all combinations of items in A with items in B, without repetition. e.g.
>>> do_my_permutation(A, B)
['mno', 'xno', 'mxo', 'mnx', 'xyo', 'mxy', 'xyz', 'zno', 'mzo', 'mnz', ...]
This is straightforward enough for me to write from scratch, but I'm aware of Python's standard itertools module, which I believe may already implement this. However, I'm having trouble identifying the function that implements this exact behavior. Is there a function in this module I can use to accomplish this?
Is this what you need?
import itertools
["".join(elem) for elem in itertools.permutations(A+B, 3)]
Replace permutations with combinations if you want all orderings of the same three letters to be collapsed into a single item (e.g. so that 'mxo' and 'mox' do not each appear individually in the output).
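To make the difference between the two functions concrete, here is a quick count over the six letters of A+B. permutations treats 'mxo' and 'mox' as different results; combinations emits each set of letters once, in the order the letters appear in the input.

```python
import itertools

A = 'mno'
B = 'xyz'

perms = ["".join(p) for p in itertools.permutations(A + B, 3)]
combs = ["".join(c) for c in itertools.combinations(A + B, 3)]

# 6 * 5 * 4 ordered picks vs. 6! / (3! * 3!) unordered picks
print(len(perms), len(combs))          # 120 20
print('mxo' in perms, 'mox' in perms)  # True True
print('mox' in combs, 'mxo' in combs)  # True False
```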
You're looking for itertools.permutations.
From the docs:
Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeat values.
To get each unique set of three letters only once, with the letters of each string sorted, you can use this code:
import itertools
A = 'mno'
B = 'xyz'
s = {"".join(sorted(elem)) for elem in itertools.permutations(A+B, 3)}
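One way to sanity-check the set comprehension: itertools.combinations emits tuples in the order of its input, so combinations over the sorted letters already yields each letter set in sorted order. The two sets should therefore be equal, and the combinations route avoids generating and discarding the 100 redundant orderings.

```python
import itertools

A = 'mno'
B = 'xyz'

s = {"".join(sorted(elem)) for elem in itertools.permutations(A + B, 3)}

# Sorting the input first means every emitted tuple is already sorted
c = {"".join(elem) for elem in itertools.combinations(sorted(A + B), 3)}

print(len(s), s == c)  # 20 True
```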

How to get value on a certain index, in a python list?

I have a list which looks something like this
List = [q1,a1,q2,a2,q3,a3]
I need the final code to be something like this
dictionary = {q1:a1,q2:a2,q3:a3}
If only I could get the value at a certain index, e.g. List[0], I could accomplish this. Is there any way to do that?
Python dictionaries can be constructed with the dict class from an iterable of 2-tuples. We can combine this with the range builtin, stepping by 2, to produce tuples of (item at an even index, item at the following odd index), i.e. (List[0], List[1]), (List[2], List[3]), and so on, and pass them to dict so that the values organize themselves into key/value pairs in the final result:
dictionary = dict([(List[i], List[i+1]) for i in range(0, len(List), 2)])
Using extended slice notation:
dictionary = dict(zip(List[0::2], List[1::2]))
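One caveat with the slicing and zip approach: if List has an odd number of elements, zip stops at the shorter input and the last item is silently dropped. A minimal sketch that checks for this first (pairs_to_dict is a hypothetical helper name, and the string values stand in for the q/a items):

```python
def pairs_to_dict(items):
    """Turn [k1, v1, k2, v2, ...] into {k1: v1, k2: v2, ...}."""
    if len(items) % 2:
        # zip would silently drop the unpaired last element
        raise ValueError("expected an even number of items, got %d" % len(items))
    # items[0::2] are the keys (indices 0, 2, 4, ...), items[1::2] the values
    return dict(zip(items[0::2], items[1::2]))

print(pairs_to_dict(['q1', 'a1', 'q2', 'a2', 'q3', 'a3']))
# {'q1': 'a1', 'q2': 'a2', 'q3': 'a3'}
```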
The range-based answer is simpler, but there's another approach possible using an iterator and zip (originally written with itertools.izip, which exists only in Python 2; in Python 3 the built-in zip is already lazy and takes its place):
dictionary = dict(zip(*[iter(List)] * 2))
Breaking this down (edit: tested this time):
# Create instance of iterator wrapped around List
# which will consume items one at a time when called.
iter(List)
# Put reference to iterator into list and duplicate it so
# there are two references to the *same* iterator.
[iter(List)] * 2
# Pass each item in the list as a separate argument to the
# zip() function. This uses the special * syntax that takes
# a sequence and spreads it across a number of positional arguments.
# Because both arguments are the same iterator, each pair zip builds
# takes two consecutive items: (List[0], List[1]), (List[2], List[3]), ...
zip(*[iter(List)] * 2)
# Use regular dict() constructor, same as in the answer by zzzeeek
dict(zip(*[iter(List)] * 2))
Edit: much thanks to Chris Lutz' sharp eyes for the double correction.
d = {}
for i in range(0, len(List), 2):
    d[List[i]] = List[i+1]
You've mentioned in the comments that you have duplicate entries. We can work with this. Take your favorite method of generating the list of tuples, and expand it into a for loop:
dictionary = {}
for k, v in zip(List[::2], List[1::2]):
    if k not in dictionary:
        dictionary[k] = set()
    dictionary[k].add(v)
Or we could use collections.defaultdict so we don't have to check if a key is already initialized:
from collections import defaultdict
dictionary = defaultdict(set)
for k, v in zip(List[::2], List[1::2]):
    dictionary[k].add(v)
We'll end with a dictionary where every value is a set containing the values seen for that key. This still may not be appropriate, because a set, like a dictionary's keys, cannot hold duplicates, so if you need a single key to hold the same value twice, you'll need to change it to a tuple or a list. But this should get you started.
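Following the list suggestion, here is a sketch that keeps repeated values by appending to a list instead of adding to a set. The sample data is made up to include both a repeated key and a repeated value.

```python
from collections import defaultdict

# Hypothetical flat key/value list with a repeated key ('q1') and a repeated value ('a1')
List = ['q1', 'a1', 'q2', 'a2', 'q1', 'a1', 'q1', 'a3']

dictionary = defaultdict(list)
for k, v in zip(List[::2], List[1::2]):
    # A list keeps duplicates and insertion order; a set would collapse 'a1'
    dictionary[k].append(v)

print(dict(dictionary))  # {'q1': ['a1', 'a1', 'a3'], 'q2': ['a2']}
```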
