I have a list like the following:
lst = ['a', 'a', 'a', 'start', 'b', 'end', 'a', 'a','a','start','b','b','b','end','a','a','a','a','start','b','b','end']
and my desired result is to split the list into sublists like this:
[['a', 'a', 'a'], ['start', 'b', 'end'], ['a', 'a','a'],['start','b','b','b','end'],['a','a','a','a'],['start','b','b','end']]
Here 'start' and 'end' are keywords. Is there any way to do something like .split() on particular keywords?
So far I have written a function that finds the indices of 'start' and 'end', i.e. starting_ind = [3, 9, 18] and ending_ind = [5, 13, 21]. However, if I do
temp=[]
for i in range(len(starting_ind)):
    x = lst[starting_ind[i]: ending_ind[i]]
    temp += x
print(temp)
the result is incorrect.
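The attempt above goes wrong for three reasons: a slice excludes its end index, so each 'end' is dropped; `temp += x` extends the flat list instead of appending a sublist; and the elements between the keyword blocks are never collected. A minimal sketch of an index-based fix, assuming starting_ind and ending_ind are already computed as in the question (the helper name split_by_indices is made up for illustration):

```python
def split_by_indices(lst, starting_ind, ending_ind):
    """Split lst into keyword blocks and the runs between them."""
    result = []
    pos = 0  # first element not yet placed in a sublist
    for start, end in zip(starting_ind, ending_ind):
        if start > pos:
            result.append(lst[pos:start])   # run before this block
        result.append(lst[start:end + 1])   # +1 so that 'end' is included
        pos = end + 1
    if pos < len(lst):
        result.append(lst[pos:])            # trailing run, if any
    return result


lst = ['a', 'a', 'a', 'start', 'b', 'end', 'a', 'a', 'a', 'start', 'b', 'b',
       'b', 'end', 'a', 'a', 'a', 'a', 'start', 'b', 'b', 'end']
starting_ind = [3, 9, 18]
ending_ind = [5, 13, 21]
split_result = split_by_indices(lst, starting_ind, ending_ind)
print(split_result)
```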
This solution doesn't require you to calculate indices beforehand:
lst = ['a', 'a', 'a', 'start', 'b', 'end', 'a', 'a', 'a', 'start', 'b', 'b',
'b', 'end', 'a', 'a', 'a', 'a', 'start', 'b', 'b', 'end', 'a', 'a', 'a']
result = []
sublist = []
for el in range(len(lst)):
    if lst[el] == 'start':
        result.append(sublist.copy())
        sublist.clear()
        sublist.append(lst[el])
    else:
        sublist.append(lst[el])
        if lst[el] == 'end':
            result.append(sublist.copy())
            sublist.clear()
        if el == len(lst) - 1:
            result.append(sublist)
print(result)
The result is:
[['a', 'a', 'a'], ['start', 'b', 'end'], ['a', 'a', 'a'], ['start', 'b', 'b', 'b', 'end'], ['a', 'a', 'a', 'a'], ['start', 'b', 'b', 'end'], ['a', 'a', 'a']]
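One caveat: the loop above appends an empty sublist when the list begins with 'start', ends with 'end', or has an 'end' immediately followed by a 'start'. A possible variant that iterates over the elements directly and skips empty sublists is sketched below; the function name split_on_keywords and its parameters are assumptions for illustration, not part of the original answer.

```python
def split_on_keywords(items, start='start', end='end'):
    """Split items into keyword-delimited blocks and the runs between them."""
    result = []
    current = []
    for item in items:
        if item == start and current:
            # A block begins: close the run collected so far.
            result.append(current)
            current = []
        current.append(item)
        if item == end:
            # The block is complete, keyword included.
            result.append(current)
            current = []
    if current:
        result.append(current)  # trailing run after the last block
    return result


lst = ['a', 'a', 'a', 'start', 'b', 'end', 'a', 'a', 'a', 'start', 'b', 'b',
       'b', 'end', 'a', 'a', 'a', 'a', 'start', 'b', 'b', 'end']
print(split_on_keywords(lst))
```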
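Here's a possible way to use a regular expression to extract the patterns. Note that it relies on the elements being exactly 'a', 'b', 'start' and 'end', and on no element containing '_', so check whether it fits your data: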
import re
lst = ['a','a','a', 'start','b','end', 'a','a','a', 'start','b','b','b','end', 'a','a','a','a', 'start','b','b','end']
result = []
for e in re.findall('a_[a_]+|start[_b]+_end', '_'.join(lst)):
    result.append(e.strip('_').split('_'))
print(result)
Output is as desired:
[['a', 'a', 'a'],
['start', 'b', 'end'],
['a', 'a', 'a'],
['start', 'b', 'b', 'b', 'end'],
['a', 'a', 'a', 'a'],
['start', 'b', 'b', 'end']]
A better way is this:
result = []
for e in re.split(r'(start[_b]+_end)', '_'.join(lst)):
    result.append(e.strip('_').split('_'))
print([x for x in result if x != ['']])
Same output
You can write it like this, with the slice boundaries hardcoded:
lst = ['a', 'a', 'a', 'start', 'b', 'end',
'a', 'a','a','start','b','b','b','end','a','a','a','a','start','b','b','end']
temp=[]
ind = [0, 3, 6, 9, 14, 18, 22]
for i in range(len(ind)-1):
    x = lst[ind[i]: ind[i+1]]
    temp.append(x)
print(temp)
and you will get:
[['a', 'a', 'a'], ['start', 'b', 'end'], ['a', 'a', 'a'], ['start', 'b', 'b', 'b', 'end'], ['a', 'a', 'a', 'a'], ['start', 'b', 'b', 'end']]
If you can be certain that your keywords will always appear in pairs, and in the right order (i.e. there will never be a 'start' without an 'end' that follows it, at some point in the list), this should work:
l = ['a', 'a', 'a', 'start', 'b', 'end', 'a', 'a','a','start','b','b','b','end','a','a','a','a','start','b','b','end']
def get_sublist(l):
    try:
        return l[:l.index('end') + 1] if l.index('start') == 0 else l[:l.index('start')]
    except ValueError:
        return l
result = []
while l:
    sublist = get_sublist(l)
    result.append(sublist)
    l = l[len(sublist):]
print(result)
Gives the following result:
[['a', 'a', 'a'],
['start', 'b', 'end'],
['a', 'a', 'a'],
['start', 'b', 'b', 'b', 'end'],
['a', 'a', 'a', 'a'],
['start', 'b', 'b', 'end']]
My dataframe had a column of strings (col A). I tokenized it and now I have:
Input:
Col A
'A', 'B', 'C', 'dog', 'C', 'C', 'C', 'C'
'A', 'B', 'B', 'dog', 'D', 'A', 'C', 'C', 'D'
I want to get the 2 items right before and after the word 'dog' in a column B. Therefore, I want something like this:
Output:
Col B
'B', 'C', 'dog', 'C', 'C'
'B', 'B', 'dog', 'D', 'A'
How do I get that?
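This works as long as every row contains 'dog'; list.index only finds the first occurrence and raises ValueError if there is none.
import pandas as pd
df = pd.DataFrame({'Col A': ["'A', 'B', 'C', 'dog', 'C', 'C', 'C', 'C'", "'A', 'B', 'B', 'dog', 'D', 'A', 'C', 'C', 'D'"]})
def extract(l):
    # strip the whitespace left over from splitting on ','
    l = [e.strip() for e in l]
    idx = l.index("'dog'")
    # clamp the lower bound so a 'dog' near the start does not wrap around
    return l[max(idx - 2, 0):idx + 3]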
df['Col B'] = df['Col A'].str.split(',').apply(extract)
print(df)
Col A Col B
0 'A', 'B', 'C', 'dog', 'C', 'C', 'C', 'C' ['B', 'C', 'dog', 'C', 'C']
1 'A', 'B', 'B', 'dog', 'D', 'A', 'C', 'C', 'D' ['B', 'B', 'dog', 'D', 'A']
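If some rows might not contain the word at all, a slightly more defensive helper can be sketched as follows. This assumes the column already holds lists of tokens rather than comma-joined strings; the name window_around and the choice to return an empty list for rows without the word are assumptions made for illustration.

```python
def window_around(tokens, word, n=2):
    """Return up to n tokens on each side of the first occurrence of word."""
    try:
        idx = tokens.index(word)
    except ValueError:
        return []  # assumption: rows without the word yield an empty list
    # max() keeps the lower bound from going negative near the start
    return tokens[max(idx - n, 0):idx + n + 1]


rows = [
    ['A', 'B', 'C', 'dog', 'C', 'C', 'C', 'C'],
    ['A', 'B', 'B', 'dog', 'D', 'A', 'C', 'C', 'D'],
    ['dog', 'A'],
    ['A', 'B'],
]
windows = [window_around(row, 'dog') for row in rows]
print(windows)
```

With a DataFrame column of token lists, the same helper could presumably be applied with something like df['Col A'].apply(lambda toks: window_around(toks, 'dog')).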
This is my list:
nab = ['b', 'b', 'a', 'b', 'b', 'b', 'a', 'a', 'a', 'a']
I want to combine the same elements which are adjacent into another list, and if they are not the same, just return the element itself.
The output that I am looking for is:
['b', 'a', 'b', 'a']
I mean:
two 'b' ---> 'b', one 'a' ---> 'a', three 'b' ---> 'b', four 'a' ---> 'a'
I want to know the length of the new list.
Thank you so much @tdelaney, I did it as below:
import itertools
nab = ['B', 'B', 'A', 'B', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'A', 'B', 'B', 'B', 'B', 'A']
U = []
key_func = lambda x: x[0]
for key, group in itertools.groupby(nab, key_func):
    U.append(list(group))
print(U)
print(len(U))
Output:
[['B', 'B'], ['A'], ['B', 'B'], ['A', 'A', 'A', 'A'], ['B', 'B', 'B'], ['A', 'A'], ['B', 'B'], ['A', 'A'], ['B'], ['A'], ['B', 'B', 'B', 'B'], ['A']]
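If only the collapsed list and its length are needed, as in the question, the groups themselves can be discarded and just the keys kept. A minimal sketch using the list from the question (the variable name collapsed is illustrative); without a key function, itertools.groupby groups consecutive equal elements:

```python
from itertools import groupby

nab = ['b', 'b', 'a', 'b', 'b', 'b', 'a', 'a', 'a', 'a']

# Keep one key per run of adjacent equal elements and ignore the group.
collapsed = [key for key, _group in groupby(nab)]

print(collapsed)       # ['b', 'a', 'b', 'a']
print(len(collapsed))  # 4
```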
I am trying to create a dictionary with a repeating pattern like
{0:"A",
1:"B",
2:"C",
3:"D",
4:"A",
5:"B",
6:"C",
7:"D",}
and so on. How would I do that? I have tried using for loops, but couldn't figure it out.
I'm not even sure this is the right approach to my problem. I am solving a simulation numerous times with the same output, only changing 1 input for every loop of the simulation.
Basically I end up with a DataFrame that collects the output (4 different series) for every simulation with columns
[0, 1, 2, 3, 4, 5, 6, 7, 8, ...]
which I would like to rename
["A", "B", "C", "D", "A", "B", "C", "D",...]
Alternatively, is there some sort of datatype in Python, which can provide 2 levels of categorizing like
[Simulation 1: ["A", "B", "C", "D"],
Simulation 2: ["A", "B", "C", "D"],
Simulation 3: ["A", "B", "C", "D"],
Simulation 4: ["A", "B", "C", "D"],
Simulation 5: ["A", "B", "C", "D"],
and so on...]
where "A", "B", "C" and "D" each contains a column of data output, that is different for every simulation?
You can achieve this neatly with itertools.cycle:
In [1]: import itertools
In [2]: cols = [0, 1, 2, 3, 4, 5, 6, 7]
In [3]: dict(zip(cols, itertools.cycle('ABCD')))
Out[3]: {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'A', 5: 'B', 6: 'C', 7: 'D'}
If you'd rather not import any modules, you could use a dictionary comprehension with the modulo operator (%):
print({i:'ABCD'[i%4] for i in range(12)})
{0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'A', 5: 'B', 6: 'C', 7: 'D', 8: 'A', 9: 'B', 10: 'C', 11: 'D'}
If you want to use a for-loop, you could use the modulo operator along with string.ascii_uppercase:
>>> from string import ascii_uppercase
>>> n = 8
>>> repeat_every = 4
>>> d = {i: ascii_uppercase[i % repeat_every] for i in range(n)}
>>> d
{0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'A', 5: 'B', 6: 'C', 7: 'D'}
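For the column-renaming part of the question, a dictionary may not be needed at all: a list of repeating labels of the right length is enough. A small sketch, where repeating_labels is a hypothetical helper name:

```python
from itertools import cycle, islice


def repeating_labels(n, pattern="ABCD"):
    """Return n labels that cycle through pattern."""
    return list(islice(cycle(pattern), n))


labels = repeating_labels(10)
print(labels)
```

Assuming df is the DataFrame from the question, the result could then be assigned with df.columns = repeating_labels(len(df.columns)).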
Alternatively, is there some sort of datatype in Python, which can
provide 2 levels of categorizing like...
You could use itertools.permutations inside a dict comprehension:
>>> from itertools import permutations
>>> from string import ascii_uppercase
>>>
>>> def pretty_print_simple_dict(d):
...     print("{")
...     for k, v in d.items():
...         print(f"\t{k}: {v}")
...     print("}")
...
>>> repeat_every = 4
>>> d = {
...     f"Simulation {i + 1}": list(p)
...     for i, p in enumerate(permutations(ascii_uppercase[:repeat_every]))
... }
>>>
>>> pretty_print_simple_dict(d)
{
Simulation 1: ['A', 'B', 'C', 'D']
Simulation 2: ['A', 'B', 'D', 'C']
Simulation 3: ['A', 'C', 'B', 'D']
Simulation 4: ['A', 'C', 'D', 'B']
Simulation 5: ['A', 'D', 'B', 'C']
Simulation 6: ['A', 'D', 'C', 'B']
Simulation 7: ['B', 'A', 'C', 'D']
Simulation 8: ['B', 'A', 'D', 'C']
Simulation 9: ['B', 'C', 'A', 'D']
Simulation 10: ['B', 'C', 'D', 'A']
Simulation 11: ['B', 'D', 'A', 'C']
Simulation 12: ['B', 'D', 'C', 'A']
Simulation 13: ['C', 'A', 'B', 'D']
Simulation 14: ['C', 'A', 'D', 'B']
Simulation 15: ['C', 'B', 'A', 'D']
Simulation 16: ['C', 'B', 'D', 'A']
Simulation 17: ['C', 'D', 'A', 'B']
Simulation 18: ['C', 'D', 'B', 'A']
Simulation 19: ['D', 'A', 'B', 'C']
Simulation 20: ['D', 'A', 'C', 'B']
Simulation 21: ['D', 'B', 'A', 'C']
Simulation 22: ['D', 'B', 'C', 'A']
Simulation 23: ['D', 'C', 'A', 'B']
Simulation 24: ['D', 'C', 'B', 'A']
}
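If the goal is instead two levels of lookup, where each simulation holds its own "A" to "D" series as the question describes, a plain nested dictionary may be enough. The sketch below uses made-up placeholder data (flat_columns) standing in for the flat simulation output, and assumes the columns arrive in groups of four per simulation:

```python
series_names = ["A", "B", "C", "D"]

# Placeholder data: 8 flat columns (2 simulations x 4 series), 3 values each.
flat_columns = [[col * 10 + row for row in range(3)] for col in range(8)]

nested = {}
for col, values in enumerate(flat_columns):
    # divmod gives the simulation number and the position within it
    sim, pos = divmod(col, len(series_names))
    nested.setdefault(f"Simulation {sim + 1}", {})[series_names[pos]] = values

print(nested["Simulation 2"]["B"])  # values of flat column 5
```

In pandas, a MultiIndex on the columns is another option for the same two-level structure.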
I have the following list:
initial_list = [['B', 'D', 'A', 'C', 'E']]
On each element of the list I apply a function and put the results in a dictionary:
for state in initial_list:
    next_dict[state] = move([state], alphabet)
This gives the following result:
next_dict = {'D': ['E'], 'B': ['D'], 'A': ['C'], 'C': ['C'], 'E': ['D']}
What I would like to do is separate the keys from initial_list based on their
values in the next_dict dictionary, basically group the elements of the first list to elements with the same value in the next_dict:
new_list = [['A', 'C'], ['B', 'E'], ['D']]
'A' and 'C' will stay in the same group because they have the same value 'C', 'B' and 'E' will also share the same group because their value is 'D', and then 'D' will be in its own group.
How can I achieve this result?
You need groupby, after having sorted your list by the next_dict values. As the itertools documentation says:
It generates a break or new group every time the value of the key
function changes (which is why it is usually necessary to have sorted
the data using the same key function).
from itertools import groupby
initial_list = ['B', 'D', 'A', 'C', 'E']
def move(letter):
    return {'A': 'C', 'C': 'C', 'D': 'E', 'E': 'D', 'B': 'D'}.get(letter)
sorted_list = sorted(initial_list, key=move)
print([list(v) for k, v in groupby(sorted_list, key=move)])
#=> [['A', 'C'], ['B', 'E'], ['D']]
The simplest way to achieve this is to use itertools.groupby with dict.get as the key:
>>> from itertools import groupby
>>> next_dict = {'D': ['E'], 'B': ['D'], 'A': ['C'], 'C': ['C'], 'E': ['D']}
>>> initial_list = ['B', 'D', 'A', 'C', 'E']
>>> [list(i) for _, i in groupby(sorted(initial_list, key=next_dict.get), next_dict.get)]
[['A', 'C'], ['B', 'E'], ['D']]
I'm not exactly sure that's what you want but you can group the values based on their values in the next_dict:
>>> next_dict = {'D': 'E', 'B': 'D', 'A': 'C', 'C': 'C', 'E': 'D'}
>>> # external library but one can also use a defaultdict.
>>> from iteration_utilities import groupedby
>>> groupings = groupedby(['B', 'D', 'A', 'C', 'E'], key=next_dict.__getitem__)
>>> groupings
{'C': ['A', 'C'], 'D': ['B', 'E'], 'E': ['D']}
and then convert that to a list of their values:
>>> list(groupings.values())
[['A', 'C'], ['D'], ['B', 'E']]
Combine everything into a one-liner (not really recommended but a lot of people prefer that):
>>> list(groupedby(['B', 'D', 'A', 'C', 'E'], key=next_dict.__getitem__).values())
[['A', 'C'], ['D'], ['B', 'E']]
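The defaultdict alternative mentioned in the comment above could look like the following sketch, which needs only the standard library. Sorting the groups at the end is an added assumption, made only so the output order matches the new_list from the question:

```python
from collections import defaultdict

next_dict = {'D': 'E', 'B': 'D', 'A': 'C', 'C': 'C', 'E': 'D'}
initial_list = ['B', 'D', 'A', 'C', 'E']

# Map each value in next_dict to the list of states that lead to it.
groups = defaultdict(list)
for state in initial_list:
    groups[next_dict[state]].append(state)

# Sort the groups so the result does not depend on insertion order.
new_list = sorted(groups.values())
print(new_list)
```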
Try this:
next_next_dict = {}
for key in next_dict:
    value = next_dict[key][0]
    if value in next_next_dict:
        next_next_dict[value].append(key)
    else:
        next_next_dict[value] = [key]
new_list = list(next_next_dict.values())
Or this:
new_list = []
for value in next_dict.values():
    new_value = [key for key in next_dict.keys() if next_dict[key] == value]
    if new_value not in new_list:
        new_list.append(new_value)
We can sort your list with your dictionary mapping, and then use itertools.groupby to form the groups. The only amendment I made here is making your initial list an actual flat list.
>>> from itertools import groupby
>>> initial_list = ['B', 'D', 'A', 'C', 'E']
>>> next_dict = {'D': ['E'], 'B': ['D'], 'A': ['C'], 'C': ['C'], 'E': ['D']}
>>> s_key = lambda x: next_dict[x]
>>> [list(v) for k, v in groupby(sorted(initial_list, key=s_key), key=s_key)]
[['A', 'C'], ['B', 'E'], ['D']]