I have some extremely large lists of character strings I need to parse. I need to break them into smaller lists based on a pre-defined character string, and I figured out a way to do it, but I worry that this will not be performant on my real data. Is there a better way to do this?
My goal is to turn this list:
['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g', 'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']
Into this list:
[['a', 'b'], ['c', 'd', 'e', 'f', 'g'], ['h', 'i', 'j', 'k']]
What I tried:
# List that replicates my data. `string_to_split_on` is a fixed character string I want to break my list up on
my_list = ['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g', 'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']
# Inspect list
print(my_list)
# Create empty lists to store data in
new_list = []
good_letters = []
# Iterate over each string in the list
for i in my_list:
    # If the string is the separator, append data to new_list, reset `good_letters` and move to the next string
    if i == 'string_to_split_on':
        new_list.append(good_letters)
        good_letters = []
        continue
    # Append letter to the list of good letters
    else:
        good_letters.append(i)
# I just like printing things this way because it's easy to read
for item in new_list:
    print(item)
    print('-'*100)
### Output
['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g', 'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']
['a', 'b']
----------------------------------------------------------------------------------------------------
['c', 'd', 'e', 'f', 'g']
----------------------------------------------------------------------------------------------------
['h', 'i', 'j', 'k']
----------------------------------------------------------------------------------------------------
You can also use one line of code:
original_list = ['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g', 'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']
split_string = 'string_to_split_on'
new_list = [sublist.split() for sublist in ' '.join(original_list).split(split_string) if sublist]
print(new_list)
This approach should scale better to large data sets, since it avoids building one big joined string, and it also works when the elements themselves contain spaces:
import itertools
new_list = [list(j) for k, j in itertools.groupby(original_list, lambda x: x != split_string) if k]
print(new_list)
[['a', 'b'], ['c', 'd', 'e', 'f', 'g'], ['h', 'i', 'j', 'k']]
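For very large inputs it may also help to avoid materializing every sublist at once. The sketch below is a generator that yields one chunk at a time. `split_on` is a hypothetical helper name, not something from the answers above, and it assumes that empty chunks (for example the one after a trailing separator) should be dropped, as in the desired output.

```python
def split_on(items, separator):
    """Yield lists of consecutive items found between occurrences of separator."""
    chunk = []
    for item in items:
        if item == separator:
            if chunk:  # assumption: empty chunks are skipped
                yield chunk
            chunk = []
        else:
            chunk.append(item)
    if chunk:  # emit trailing items when the input does not end with a separator
        yield chunk


original_list = ['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g',
                 'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']
result = list(split_on(original_list, 'string_to_split_on'))
print(result)
# [['a', 'b'], ['c', 'd', 'e', 'f', 'g'], ['h', 'i', 'j', 'k']]
```

Because it accepts any iterable, the same function could be fed a file object or another generator instead of a fully loaded list.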
I have a list of 4 lists, shown below.
list1 = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i'], ['j', 'k', 'l']]
How do I create a list of lists grouped by element position, so that the new list of lists is as follows?
list2 = [['a', 'd', 'g', 'j'], ['b', 'e', 'h', 'k'], ['c', 'f', 'i', 'l']]
I tried using a for loop such as
res = []
for listing in list1:
    for i in listing:
        res.append(i)
However, it just created a single flat list.
Use zip with the * operator to zip all of the sublists together. The resulting tuples will have the list contents you want, so just use list() to convert them into lists.
>>> list1 = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i'], ['j', 'k', 'l']]
>>> [list(z) for z in zip(*list1)]
[['a', 'd', 'g', 'j'], ['b', 'e', 'h', 'k'], ['c', 'f', 'i', 'l']]
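One thing to keep in mind with this approach: `zip` stops at the shortest sublist. The sketch below uses a made-up ragged list (`ragged` is not from the question) to show the truncation, and `itertools.zip_longest` as one way to pad the missing positions instead.

```python
from itertools import zip_longest

# Hypothetical input where the middle sublist is shorter than the others
ragged = [['a', 'b', 'c'], ['d', 'e'], ['g', 'h', 'i']]

# zip stops at the shortest sublist, so the third column is dropped
truncated = [list(col) for col in zip(*ragged)]
print(truncated)
# [['a', 'd', 'g'], ['b', 'e', 'h']]

# zip_longest pads missing positions with fillvalue instead
padded = [list(col) for col in zip_longest(*ragged, fillvalue=None)]
print(padded)
# [['a', 'd', 'g'], ['b', 'e', 'h'], ['c', None, 'i']]
```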
I have the following list:
lst = ['A', 'B', 'C','D', 'E', 'F', 'G', 'H','I','J', 'K', 'L','M','N','O']
I would like to reorder the list so that the sixth element comes after the first, the eleventh after the sixth, the second after the eleventh, and so on. The output should be:
['A', 'F', 'K', 'B', 'G', 'L', 'C', 'H', 'M', 'D', 'I', 'N', 'E', 'J', 'O']
What I have tried so far:
lst = ['A', 'B', 'C','D', 'E', 'F', 'G', 'H','I','J', 'K', 'L','M','N','O']
new_lst = [lst[0], lst[5], lst[10], lst[1], lst[6], lst[11], lst[2], lst[7], lst[12], lst[3], lst[8], lst[13] , lst[4], lst[9], lst[14]]
new_lst
This provides the desired output, but I am looking for an optimal script. How do I do that?
From the pattern: reshape to 2D, then transpose and flatten.
sum is a convenient function that accepts a start value; here the start value is the identity, () or [], depending on the type being concatenated.
# Sol 1
import numpy as np
print('Using numpy')
x = ['A', 'B', 'C','D', 'E', 'F', 'G', 'H','I','J', 'K', 'L','M','N','O']
np.array(x).reshape((-1, 5)).transpose().reshape(-1)
# array(['A', 'F', 'K', 'B', 'G', 'L', 'C', 'H', 'M', 'D', 'I', 'N', 'E', 'J', 'O'], dtype='<U1')
# Sol 2: concatenate the zipped tuples, starting from the empty tuple
print('One more way without numpy')
list(
    sum(
        zip(x[:5], x[5:10], x[10:]),
        ()
    )
)
# Sol 3: convert each tuple to a list and concatenate, starting from the empty list
print('One more way without numpy')
sum(
    [list(y) for y in zip(x[:5], x[5:10], x[10:])],
    []
)
# Sol 4: same as Sol 2, with the tuples collected in a list first
print('One more way without numpy')
list(
    sum(
        [y for y in zip(x[:5], x[5:10], x[10:])],
        ()
    )
)
You can also use a list comprehension if you want to avoid libraries:
lst = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']
[x for t in zip(lst[:5], lst[5:10], lst[10:]) for x in t]
# ['A', 'F', 'K', 'B', 'G', 'L', 'C', 'H', 'M', 'D', 'I', 'N', 'E', 'J', 'O']
If you want each element followed by the fifth and tenth elements after it, then it would be:
# Must consist of at least 14 values
input_list = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']
output_list = []
for i in range(len(input_list) // 3):
    output_list.append(input_list[i])
    output_list.append(input_list[i + 5])
    output_list.append(input_list[i + 10])
print(output_list)
No libraries used. It will give the desired result:
['A', 'F', 'K', 'B', 'G', 'L', 'C', 'H', 'M', 'D', 'I', 'N', 'E', 'J', 'O']
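The same reordering can be expressed with extended slicing, which avoids hard-coding the slice boundaries. This is only a sketch: `interleave_by_stride` is a hypothetical helper name, and the stride of 5 is inferred from the example in the question.

```python
def interleave_by_stride(items, stride):
    """Return items reordered as items[0::stride], items[1::stride], and so on."""
    return [x for start in range(stride) for x in items[start::stride]]


lst = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']
result = interleave_by_stride(lst, 5)
print(result)
# ['A', 'F', 'K', 'B', 'G', 'L', 'C', 'H', 'M', 'D', 'I', 'N', 'E', 'J', 'O']
```

Since each slice simply runs to the end of the list, this version does not raise an error when the length is not a multiple of the stride.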
If I have a list
lst = ['a', 'k', 'b', 'c', 'k', 'd', 'e', 'g']
and I want to split it into new lists at each 'k', dropping the 'k', and turn the result into a tuple, so that I get
(['a'],['b', 'c'], ['d', 'e', 'g'])
I am thinking about first splitting them into different lists by using a for loop.
new_lst = []
for element in lst:
    if element != 'k':
        new_lst.append(element)
This does remove all the 'k' elements, but the remaining elements all end up together. I do not know how to split them into different lists. To turn the result into a tuple, I would need to make a list of lists:
a = [['a'],['b', 'c'], ['d', 'e', 'g']]
tuple(a) == (['a'], ['b', 'c'], ['d', 'e', 'g'])
True
So the question would be how to split the list into a list of sublists.
You are close. You can append to another list called sublist, and whenever you find a 'k', append sublist to new_lst and start a fresh one:
lst = ['a', 'k', 'b', 'c', 'k', 'd', 'e', 'g']
new_lst = []
sublist = []
for element in lst:
    if element != 'k':
        sublist.append(element)
    else:
        new_lst.append(sublist)
        sublist = []
if sublist: # add the last sublist
    new_lst.append(sublist)
result = tuple(new_lst)
print(result)
# (['a'], ['b', 'c'], ['d', 'e', 'g'])
If you're feeling adventurous, you can also use groupby. The idea is to group elements as "k" or "non-k" and use groupby on that property:
from itertools import groupby
lst = ['a', 'k', 'b', 'c', 'k', 'd', 'e', 'g']
result = tuple(list(gp) for is_k, gp in groupby(lst, "k".__eq__) if not is_k)
print(result)
# (['a'], ['b', 'c'], ['d', 'e', 'g'])
Thanks @YakymPirozhenko for the simpler generator expression:
tuple(list(i) for i in ''.join(lst).split('k'))
Output:
(['a'], ['b', 'c'], ['d', 'e', 'g'])
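A caveat for the join-and-split approach: it relies on every element being a single character and on the separator not appearing inside any element. The sketch below uses a made-up list (`words` is not from the question) to show how the element boundaries are lost, next to a groupby version that compares whole elements.

```python
from itertools import groupby

# Hypothetical input with multi-character elements, one of which contains 'k'
words = ['ab', 'k', 'kiwi', 'c']

# Joining loses the element boundaries: 'abkkiwic' is split on every 'k'
via_join = tuple(list(s) for s in ''.join(words).split('k'))
print(via_join)
# (['a', 'b'], [], ['i', 'w', 'i', 'c'])

# groupby compares whole elements, so only the standalone 'k' acts as a separator
via_groupby = tuple(list(g) for is_k, g in groupby(words, 'k'.__eq__) if not is_k)
print(via_groupby)
# (['ab'], ['kiwi', 'c'])
```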
Here's a different approach, using re.split from the re module, and map:
import re
lst = ['a', 'k', 'b', 'c', 'k', 'd', 'e', 'g']
tuple(map(list, re.split('k',''.join(lst))))
(['a'], ['b', 'c'], ['d', 'e', 'g'])
smallerlist = [l.split(',') for l in ','.join(lst).split('k')]
print(smallerlist)
Outputs
[['a', ''], ['', 'b', 'c', ''], ['', 'd', 'e', 'g']]
Then you can strip the empty strings from each sublist by joining and splitting again:
smallerlist = [' '.join(l).split() for l in smallerlist]
print(smallerlist)
Outputs
[['a'], ['b', 'c'], ['d', 'e', 'g']]
How about slicing, without appending or joining?
def isplit_list(lst, v):
    while True:
        try:
            end = lst.index(v)   # position of the next separator
        except ValueError:
            break                # no separator left
        yield lst[:end]
        lst = lst[end+1:]        # continue with the remainder
    if len(lst):
        yield lst                # trailing items after the last separator
lst = ['a', 'k', 'b', 'c', 'k', 'd', 'e', 'g', 'k']
results = tuple(isplit_list(lst, 'k'))
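Note that `lst = lst[end+1:]` copies the remainder of the list on every iteration, which could add up on long inputs. A variant, offered only as a sketch, tracks a start offset using the optional `start` argument of `list.index` instead. `isplit_list_indexed` is a hypothetical name.

```python
def isplit_list_indexed(lst, v):
    """Yield slices of lst between occurrences of v without re-slicing the remainder."""
    start = 0
    while True:
        try:
            end = lst.index(v, start)  # search only from the current offset
        except ValueError:
            break
        yield lst[start:end]
        start = end + 1
    if start < len(lst):  # trailing items after the last separator
        yield lst[start:]


lst = ['a', 'k', 'b', 'c', 'k', 'd', 'e', 'g', 'k']
results = tuple(isplit_list_indexed(lst, 'k'))
print(results)
# (['a'], ['b', 'c'], ['d', 'e', 'g'])
```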
Try this, works and doesn't need any imports!
>>> l = ['a', 'k', 'b', 'c', 'k', 'd', 'e', 'g']
>>> t = []
>>> for s in ''.join(l).split('k'):
...     t.append(list(s))
...
>>> t
[['a'], ['b', 'c'], ['d', 'e', 'g']]
>>> t = tuple(t)
>>> t
(['a'], ['b', 'c'], ['d', 'e', 'g'])
Why don't you write a function that takes a list as an argument and returns a tuple, like so?
>>> def list_to_tuple(l):
...     t = []
...     for s in l:
...         t.append(list(s))
...     return tuple(t)
...
>>> l = ['a', 'k', 'b', 'c', 'k', 'd', 'e', 'g']
>>> l = ''.join(l).split('k')
>>> l = list_to_tuple(l)
>>> l
(['a'], ['b', 'c'], ['d', 'e', 'g'])
Another approach, using the third-party more_itertools package:
import more_itertools
lst = ['a', 'k', 'b', 'c', 'k', 'd', 'e', 'g']
print(tuple(more_itertools.split_at(lst, lambda x: x == 'k')))
gives
(['a'], ['b', 'c'], ['d', 'e', 'g'])
I am currently writing a program but came across an error in the code. I narrowed the problem down to this:
original_list = ["a","b","c","d","e","f","g","h","i","j"]
value = "a"
new_list = original_list
print(original_list)
new_list.pop(new_list.index(value))
print(original_list)
print(new_list)
I would expect this to output:
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
['b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
But instead it gives this, where the value "a" has been removed from the original list:
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
['b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
['b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
I cannot seem to figure out why. Does anyone know?
When you write new_list = original_list, you are just making another name for the same old list.
If you want to really create a new list, you need to clone the old one. One way is to use new_list = list(original_list). The best way depends on the contents of the list.
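To illustrate why the contents matter, here is a small sketch with a nested list (the `original`, `alias`, `shallow`, and `deep` names are made up for this example). A shallow copy such as `list(...)` creates a new outer list but still shares the inner lists, while `copy.deepcopy` copies those as well.

```python
import copy

original = [["a", "b"], ["c", "d"]]

alias = original                 # same list object, just another name
shallow = list(original)         # new outer list, shared inner lists
deep = copy.deepcopy(original)   # new outer list and new inner lists

original.append(["e"])           # visible through alias only
original[0].append("x")          # visible through alias and shallow

print(alias)    # [['a', 'b', 'x'], ['c', 'd'], ['e']]
print(shallow)  # [['a', 'b', 'x'], ['c', 'd']]
print(deep)     # [['a', 'b'], ['c', 'd']]
```

For a flat list of strings, as in the question, a shallow copy is enough because strings are immutable.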
From a list of 8 possible letters I want to generate a random sequence where each element is separated from an identical element by at least six different elements.
import random
sequence_list = []
target_list = ["a","b","c","d","e","f","g","h"]
for i in range(1,41):
    sequence_list.append(random.choice(target_list))
print(sequence_list)
For example, if the first letter in sequence_list is an 'a', it should not be repeated for at least the next 6 items in the list. The same goes for every other item.
Appreciate your help.
This is probably not the most efficient way of doing it, but you can do it like this:
>>> import random, string
>>> target_list = list(string.ascii_letters[:8])
>>> target_list
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
>>> sequence_list = []
>>> for i in range(1,41):
...     el_list = [x for x in target_list if x not in sequence_list[-6:]]
...     sequence_list.append(random.choice(el_list))
...
>>>
>>> sequence_list
['e', 'h', 'g', 'a', 'c', 'f', 'd', 'b', 'e', 'h', 'g', 'c', 'f', 'd', 'a', 'e', 'b', 'g', 'c', 'f', 'h', 'a', 'e', 'b', 'g', 'd', 'f', 'c', 'h', 'a', 'b', 'g', 'd', 'f', 'e', 'c', 'a', 'b', 'h', 'd']
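The same idea can be wrapped in a function together with a check that the spacing constraint holds. This is a sketch under a few assumptions: `spaced_sequence` and `is_well_spaced` are hypothetical names, the items are assumed to be distinct, and a seeded `random.Random` is used only to make the run repeatable.

```python
import random


def spaced_sequence(items, length, gap, rng=random):
    """Build a random sequence in which equal items have at least `gap` others between them."""
    if len(items) <= gap:
        raise ValueError("need more distinct items than the gap")
    sequence = []
    for _ in range(length):
        recent = sequence[-gap:] if gap else []  # the last `gap` picks are off limits
        candidates = [x for x in items if x not in recent]
        sequence.append(rng.choice(candidates))
    return sequence


def is_well_spaced(sequence, gap):
    """Check that every window of gap + 1 consecutive items contains no repeats."""
    for i in range(len(sequence)):
        window = sequence[i:i + gap + 1]
        if len(set(window)) != len(window):
            return False
    return True


seq = spaced_sequence(list("abcdefgh"), 40, 6, random.Random(0))
print(is_well_spaced(seq, 6))  # True
```

With 8 letters and a gap of 6, at least two candidates remain at every step, so the loop cannot run out of choices.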