Split list into list of lists by regex - python

I want to split a list of strings into a list of lists, where each split point is defined by a successful regex match.
For instance, say I have an input list:
["file1","A","B","C","file2","D","E","F","G","H","I"]
I want to produce:
[["file1","A","B","C"],["file2","D","E","F","G","H","I"]]
where the split points, file1 and file2, were identified by a successful match against
re.search("file[0-9]+",<TEST STRING>)
Neither the number of items between split points nor the number of 'fileXXX' terms in the original list is known in advance.
In reality, my regex matches are a lot more complicated than this, but that is not the concern. What I need help with, if someone would be so kind, is the Pythonic way to execute the split logic.

This assumes the first element is a proper header. If it is not, you will need to add some defensive clauses.
import re
result = []
pattern = re.compile(r'^file.*')
for el in input_list:
    if pattern.match(el):
        # a header starts a new row
        row = []
        result.append(row)
    row.append(el)
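One way to add the defensive clause mentioned above is sketched below. The function name split_on_headers and the decision to raise ValueError when an item appears before any header are assumptions made for illustration; they are not part of the original answer.

```python
import re

def split_on_headers(items, pattern):
    """Group items into sublists, starting a new sublist at each header match."""
    regex = re.compile(pattern)
    result = []
    for el in items:
        if regex.match(el):
            # A header starts a new group.
            result.append([el])
        elif result:
            # A non-header item belongs to the most recent group.
            result[-1].append(el)
        else:
            # Assumption: data before the first header is treated as an error.
            raise ValueError("item %r appears before any header" % (el,))
    return result

groups = split_on_headers(["file1", "A", "B", "file2", "D"], r"file[0-9]+")
print(groups)
```

If leading items should be kept or silently skipped instead, only the final else branch needs to change.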

The following should work quite nicely:
import re
input_list = ["file1","A","B","C","file2","D","E","F","G","H","I"]
output_list = []
for item in input_list:
    if re.match("file[0-9]+", item):
        output_list.append([item])
    else:
        output_list[-1].append(item)
print(output_list)
Gives the following result:
[['file1', 'A', 'B', 'C'], ['file2', 'D', 'E', 'F', 'G', 'H', 'I']]
Note, this assumes the first item is a match.
Update
A second approach could be:
input_list = ["1", "2", "file1","A","B","C","file2","D","E","F","G","H","I"]
output_list = []
for item in input_list:
    if re.match("file[0-9]+", item) or len(output_list) == 0:
        output_list.append([item])
    else:
        output_list[-1].append(item)
print(output_list)
This also copes with the case where the first item is not a match:
[['1', '2'], ['file1', 'A', 'B', 'C'], ['file2', 'D', 'E', 'F', 'G', 'H', 'I']]

You can find the indexes of the items matching file\d:
indeces = [i for i, val in enumerate(my_list) if match(r'file\d', val)]
And then simply slice at these indexes (this assumes exactly two matches):
output = [my_list[indeces[0]:indeces[1]], my_list[indeces[1]:]]
>>> from re import match
>>> my_list = ["file1","A","B","C","file2","D","E","F","G","H","I"]
>>> indeces = [i for i, val in enumerate(my_list) if match(r'file\d', val)]
>>> [my_list[indeces[0]:indeces[1]], my_list[indeces[1]:]]
[['file1', 'A', 'B', 'C'], ['file2', 'D', 'E', 'F', 'G', 'H', 'I']]
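The slicing above is written for exactly two headers, while the question says the number of headers is unknown. A minimal sketch of one way to generalise it pairs each header index with the next one. The names starts and ends are hypothetical, and in this sketch any items before the first header are dropped.

```python
import re

my_list = ["file1", "A", "B", "C", "file2", "D", "E", "F", "G", "H", "I"]

# Positions of every header in the list.
starts = [i for i, val in enumerate(my_list) if re.match(r"file\d", val)]
# Each group ends where the next one starts; the last one runs to the end (None).
ends = starts[1:] + [None]

output = [my_list[start:end] for start, end in zip(starts, ends)]
print(output)
```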

Related

Joining values of a list inside another list into a string

I'm trying to join letters that sit inside a list which is itself inside another list. For example, given [['a', 'b', 'c'], ['d', 'e', 'f']], I want the result to be 'ad be cf', which is basically joining the elements that lie at the same position in each inner list. I know how to join the elements into a single string like 'abcdef', but I don't know what to add in order to return a string like the one above.
Any advice would be appreciated!
string = ''
new_grid = []
for a in grid:
    for b in a:
        string += b
return string
When you want to transpose lists into columns, you typically reach for zip(). For example:
l = [['a', 'b', 'c'], ['d', 'e', 'f']]
# make a list of columns
substrings = ["".join(sub) for sub in zip(*l)]
#['ad', 'be', 'cf']
print(" ".join(substrings))
# alternatively print(*substrings, sep=" ")
# ad be cf
This works:
my_list = [['a', 'b', 'c'], ['d', 'e', 'f']]
sorted_list = [list(pair) for pair in zip(my_list[0], my_list[1])]
for i in range(3):
    string = ''.join(sorted_list[i])
    print(string, end=" ")
First, we pair the elements at the same position in each list using zip, then we join each pair into a string and print it.
This solution may not be the most efficient, but it's simple to understand.
Another quick solution without zip could look like this:
my_list = [['a', 'b', 'c'], ['d', 'e', 'f']]
sorted_list = list(map(lambda a, b: a + b, my_list[0], my_list[1]))
print(" ".join(sorted_list))

How do I create a new list with a nested list comprehension?

Say I have a list of words
word_list = ['cat','dog','rabbit']
and I want to end up with a list of letters (not including any repeated letters), like this:
['c', 'a', 't', 'd', 'o', 'g', 'r', 'b', 'i']
Without a list comprehension the code would look like this:
letter_list=[]
for a_word in word_list:
    for a_letter in a_word:
        if a_letter not in letter_list:
            letter_list.append(a_letter)
print(letter_list)
Is there a way to do this with a list comprehension?
I have tried
letter_list = [a_letter for a_letter in a_word for a_word in word_list]
but I get a
NameError: name 'a_word' is not defined
error. I have seen answers for similar problems, but they usually iterate over a nested collection (list or tuple). Is there a way to do this from a non-nested list like a_word?
Trying
letter_list = [a_letter for a_letter in [a_word for a_word in word_list]]
Results in the initial list: ['cat','dog','rabbit']
And trying
letter_list = [[a_letter for a_letter in a_word] for a_word in word_list]
Results in:[['c', 'a', 't'], ['d', 'o', 'g'], ['r', 'a', 'b', 'b', 'i', 't']], which is closer to what I want except it's nested lists. Is there a way to do this and have just the letters be in letter_list?
Update. How about this:
word_list = ['cat','dog','rabbit']
new_list = [letter for letter in ''.join(word_list)]
new_list = sorted(set(new_list), key=new_list.index)
print(new_list)
Output:
['c', 'a', 't', 'd', 'o', 'g', 'r', 'b', 'i']
word_list = ['cat','dog','rabbit']
letter_list = list(set([letter for word in word_list for letter in word]))
This works and removes the duplicate letters, but the order is not preserved. If you want to keep the order you can do this.
from collections import OrderedDict
word_list = ['cat','dog','rabbit']
letter_list = list(OrderedDict.fromkeys("".join(word_list)))
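On Python 3.7 and later a plain dict also preserves insertion order, so the same idea can be sketched without the import. This is a variation on the OrderedDict approach above and assumes a recent interpreter.

```python
word_list = ['cat', 'dog', 'rabbit']

# dict keys are unique and keep first-seen order on Python 3.7+.
letter_list = list(dict.fromkeys(letter for word in word_list for letter in word))
print(letter_list)
```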
You can flatten the words with a nested list comprehension (note that this keeps the repeated letters):
l=[j for i in word_list for j in i ]
print(l)
output:
['c', 'a', 't', 'd', 'o', 'g', 'r', 'a', 'b', 'b', 'i', 't']
You can use a list comprehension. It is faster than looping in cases like yours when you call .append on each iteration, as explained by this answer.
But if you want to keep only unique letters (i.e. without repeating any letter), you can use a set comprehension by changing the square brackets [] to curly braces {}, as in
letter_set = {letter for word in word_list for letter in word}
This way you avoid checking the partial list on every iteration to see if the letter is already present. Instead you make use of Python's built-in hashing, which makes your code a lot faster. Note that a set does not preserve the order of the letters.
Another solution:
>>> s = set()
>>> word_list = ['cat', 'dog', 'rabbit']
>>> [c for word in word_list for c in word if (c not in s, s.add(c))[0]]
['c', 'a', 't', 'd', 'o', 'g', 'r', 'b', 'i']
This will test whether the letter is already in the set or not, and it will unconditionally add it to the set (having no effect if it is already present). The None returned from s.add is stored in the temporary tuple but otherwise ignored. The first element of the temporary tuple (that is, the result of the c not in s) is used to filter the items.
This relies on the fact that the elements of the temporary tuple are evaluated from left to right.
Could be considered a bit hacky :-)

Take a formatted list in file to list python

I have a file with a list of letters corresponding to another letter:
A['B', 'D']
B['A', 'E']
C[]
D['A', 'G']
E['B', 'H']
F[]
G['D']
H['E']
I need to import these lists to their corresponding letter, to hopefully have variables that look like this:
vertexA = ['B', 'D']
vertexB = ['A', 'E']
vertexC = []
vertexD = ['A', 'G']
vertexE = ['B', 'H']
vertexF = []
vertexG = ['D']
vertexH = ['E']
What would be the best way to do this? I tried searching for an answer but was unlucky in doing so. Thanks for any help.
You can try using dictionaries rather than variables, and I think it makes it easier as well to populate your data from your textfile.
vertex = {}
vertex['A'] = ['B', 'D']
vertex['A']
>>> ['B', 'D']
When you read your input file, the inputs should look like this:
string='A["B","C"]'
So, we know that the first letter is the name of the list.
import ast
your_list=ast.literal_eval(string[1:])
your_list:
['B', 'C']
You can take care of the looping, reading file, and string manipulation for proper naming...
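The looping and string handling left open above could be sketched roughly as follows. The function name parse_vertices and the in-memory sample lines are assumptions for illustration; with a real file you would pass the open file object instead of the list.

```python
import ast

def parse_vertices(lines):
    """Map each leading letter to the list literal that follows it."""
    vertex = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, literal = line[0], line[1:]
        # literal_eval safely parses the "['B', 'D']" part into a real list.
        vertex[name] = ast.literal_eval(literal)
    return vertex

sample = ["A['B', 'D']", "B['A', 'E']", "C[]"]
vertex = parse_vertices(sample)
print(vertex['A'])
```

Because literal_eval parses the list itself, an empty entry such as C[] comes back as a genuinely empty list.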
Building a dictionary would probably be best. Each letter of the alphabet would be a key, and then the value would be a list of associated letters. Here's a proof of concept (not tested):
from string import ascii_uppercase
vertices = {}
# instantiate dict with uppercase letters of alphabet
for c in ascii_uppercase:
    vertices[c] = []
# iterate over file and populate dict
with open("out.txt") as f:
    for i, line in enumerate(f):
        if line[0].upper() not in ascii_uppercase:
            # you probably want to do some additional error checking
            print("Error on line {}: {}".format(i, line))
        else: # valid uppercase letter at beginning of line
            list_open = line.index('[')
            list_close = line.rindex(']') + 1 # one past end
            # probably would want to validate record is in correct format before getting here
            # translate hack to remove unwanted chars (Python 3 form of str.translate)
            row_values = line[list_open:list_close].translate(str.maketrans('', '', "[] '")).split(',')
            # do some validation for cases where row_values is empty
            vertices[line[0].upper()].extend([e for e in row_values if e.strip() != ''])
Using it would then be easy:
for v in vertices['B']:
    print(v)  # do something with v
File A.txt:
A['B', 'D']
B['A', 'E']
C[]
D['A', 'G']
E['B', 'H']
F[]
G['D']
H['E']
The code:
with open('A.txt','r') as file:
    file=file.read().splitlines()
listy=[[elem[0],elem[1:].strip('[').strip(']').replace("'",'').replace(' ','').split(',')] for elem in file]
This makes a nested list, but as Christian Dean said, a dictionary is a better way to go.
Result:
[['A', ['B', 'D']], ['B', ['A', 'E']], ['C', ['']], ['D', ['A', 'G']], ['E', ['B', 'H']], ['F', ['']], ['G', ['D']], ['H', ['E']]]

Removing every third item from list (Deleting entries at regular interval in list)

I want to remove every 3rd item from list.
For Example:
list1 = list(['a','b','c','d','e','f','g','h','i','j'])
After removing the items whose position is a multiple of three, the list will be:
['a','b','d','e','g','h','j']
How can I achieve this?
You may use enumerate():
>>> x = ['a','b','c','d','e','f','g','h','i','j']
>>> [i for j, i in enumerate(x) if (j+1)%3]
['a', 'b', 'd', 'e', 'g', 'h', 'j']
Alternatively, you may create a copy of the list and delete the values at a regular interval. For example:
>>> y = list(x) # where x is the list mentioned in above example
>>> del y[2::3] # y[2::3] = ['c', 'f', 'i']
>>> y
['a', 'b', 'd', 'e', 'g', 'h', 'j']
[v for i, v in enumerate(list1) if (i + 1) % 3 != 0]
It seems like you want the third item in the list, which is actually at index 2, gone. This is what the +1 is for.
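The same comprehension can be wrapped for an arbitrary interval. This is only a sketch: the function name drop_every_nth and its argument check are hypothetical additions, not part of the answers above.

```python
def drop_every_nth(items, n):
    """Return a new list without the items at positions n, 2n, 3n, ... (1-based)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # enumerate(..., start=1) gives 1-based positions, so no +1 is needed.
    return [v for i, v in enumerate(items, start=1) if i % n != 0]

list1 = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
print(drop_every_nth(list1, 3))
```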

Python Remove SOME duplicates from a list while maintaining order?

I want to remove certain duplicates in my python list.
I know there are ways to remove all duplicates, but I wanted to remove only consecutive duplicates, while maintaining the list order.
For example, I have a list such as the following:
list1 = [a,a,b,b,c,c,f,f,d,d,e,e,f,f,g,g,c,c]
However, I want to remove the duplicates, and maintain order, but still keep the 2 c's and 2 f's, such as this:
wantedList = [a,b,c,f,d,e,f,g,c]
So far, I have this:
z = 0
j=0
list2=[]
for i in list1:
    if i == "c":
        z = z+1
        if (z==1):
            list2.append(i)
        if (z==2):
            list2.append(i)
        else:
            pass
    elif i == "f":
        j = j+1
        if (j==1):
            list2.append(i)
        if (j==2):
            list2.append(i)
        else:
            pass
    else:
        if i not in list2:
            list2.append(i)
However, this method gives me something like:
wantedList = [a,b,c,c,d,e,f,f,g]
Thus, not maintaining the order.
Any ideas would be appreciated! Thanks!
Not completely sure if c and f are special cases, or if you want to compress consecutive duplicates only. If it is the latter, you can use itertools.groupby():
>>> import itertools
>>> list1
['a', 'a', 'b', 'b', 'c', 'c', 'f', 'f', 'd', 'd', 'e', 'e', 'f', 'f', 'g', 'g', 'c', 'c']
>>> [k for k, g in itertools.groupby(list1)]
['a', 'b', 'c', 'f', 'd', 'e', 'f', 'g', 'c']
To remove consecutive duplicates from a list, you can use the following generator function:
def remove_consecutive_duplicates(a):
    last = None
    for x in a:
        if x != last:
            yield x
        last = x
With your data, this gives:
>>> list1 = ['a','a','b','b','c','c','f','f','d','d','e','e','f','f','g','g','c','c']
>>> list(remove_consecutive_duplicates(list1))
['a', 'b', 'c', 'f', 'd', 'e', 'f', 'g', 'c']
If you want to ignore certain items when removing duplicates...
list2 = []
for item in list1:
    if item not in list2 or item in ('c','f'):
        list2.append(item)
EDIT: Note that this doesn't remove consecutive items
EDIT
Never mind, I read your question wrong. I thought you were wanting to keep only certain sets of doubles.
I would recommend something like this. It allows a general form to keep certain doubles once.
list1 = ['a','a','b','b','c','c','f','f','d','d','e','e','f','f','g','g','c','c']
doubleslist = ['c', 'f']
def remove_duplicate(firstlist, doubles):
    newlist = []
    for x in firstlist:
        if x not in newlist:
            newlist.append(x)
        elif x in doubles:
            newlist.append(x)
            doubles.remove(x)
    return newlist
print(remove_duplicate(list1, doubleslist))
The simple solution is to compare this element to the next or previous element
a=1
b=2
c=3
d=4
e=5
f=6
g=7
list1 = [a,a,b,b,c,c,f,f,d,d,e,e,f,f,g,g,c,c]
output_list=[list1[0]]
for ctr in range(1, len(list1)):
    if list1[ctr] != list1[ctr-1]:
        output_list.append(list1[ctr])
print(output_list)
list1 = ['a', 'a', 'b', 'b', 'c', 'c', 'f', 'f', 'd', 'd', 'e', 'e', 'f', 'f', 'g', 'g', 'c', 'c']
wantedList = []
for item in list1:
    if len(wantedList) == 0:
        wantedList.append(item)
    elif len(wantedList) > 0:
        if wantedList[-1] != item:
            wantedList.append(item)
print(wantedList)
Fetch each item from the main list (list1).
If wantedList is empty, add that item.
If not, check whether the last item in wantedList differs from the item we fetched from list1.
If the items are different, append the item to wantedList.
