Nested lists in Python from bash output

How do I create a nested list from terminal output?
For example, I am querying the terminal to get some output (in this case, related to Yarn):
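import re
import subprocess
# user holds the user name to filter on; text=True (Python 3.7+) returns str instead of bytes
outputstring = subprocess.check_output("yarn application -list | grep " + user, shell=True, text=True)
mylist = re.split("[\t\n]+", outputstring)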
This produces one line of output for each job running on Yarn.
For example:
line1 = a,b,c,d,e
line2 = f,g,h,i,j
line3 = k,l,m,n,o
Using the regex above, I am able to create a list from this output, but only as a single flat list with all the values, like
mylist = [a,b,c,d,e,f,g,h,i,j,k,l,m,n,o]
However, I need to create a list like this:
mylist = [[a,b,c,d,e], [f,g,h,i,j], [k,l,m,n,o]]
i.e:
mylist = [[line1],[line2],[line3]]
Can anyone suggest how to achieve this?
The regex I am currently using is:
mylist = (re.split("[\t\n]+",outputstring))

Try this list comprehension:
a="""a,b,c,d,e
f,g,h,i,j
k,l,m,n,o"""
mylist=[e.split(",") for e in a.split("\n")]

Can't you just do this?
my_list = [line1.split(','), line2.split(','), line3.split(',')]
or this
initial_list = []
initial_list.append(line1)
initial_list.append(line2)
initial_list.append(line3)
final_list = [x.split(',') for x in initial_list]
I know you surely have more than three lines, but if you can get ["a,b,c,d,e,f,g,h,j,k,l"] from your output, you can probably do this as well.

You can also do it using the map function:
output = """a,b,c,d,e
f,g,h,i,j
k,l,m,n,o"""
output = list(map(lambda i: i.split(','), output.split('\n')))
print(output)
Output:
[['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h', 'i', 'j'], ['k', 'l', 'm', 'n', 'o']]
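Applied to the command output itself, the same idea is to split the text into lines first and only then split each line into fields. The sketch below assumes, as the regex in the question implies, that fields within a line are separated by one or more tabs; `parse_rows` is a hypothetical helper name, and the sample string stands in for the real `yarn application -list` output.

```python
import re


def parse_rows(text):
    """Split command output into one list of fields per non-empty line.

    Assumes fields are separated by one or more tab characters.
    """
    rows = []
    for line in text.splitlines():
        if line.strip():  # skip blank lines, e.g. from a trailing newline
            rows.append(re.split(r"\t+", line.strip()))
    return rows


# Stand-in for the decoded output of subprocess.check_output(...)
sample = "a\tb\tc\td\te\nf\tg\th\ti\tj\nk\tl\tm\tn\to\n"
print(parse_rows(sample))
```

When reading the real command, pass `text=True` (Python 3.7+) to `subprocess.check_output` or call `.decode()` on its result, so that `parse_rows` receives a `str` rather than `bytes`.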

Related

Joining values of a list inside another list into a string

I'm trying to join letters that sit in lists nested inside another list into strings. For example, given [['a', 'b', 'c'], ['d', 'e', 'f']], I want the result 'ad be cf', which combines the elements at the same position in each inner list. I know how to join all the elements into a string like 'abcdef', but I don't know what to change to get a string like the one above.
Any advice would be appreciated!
def join_all(grid):  # function name is hypothetical; the snippet had a bare return
    string = ''
    for a in grid:
        for b in a:
            string += b
    return string
When you want to transpose lists into columns, you typically reach for zip(). For example:
l = [['a', 'b', 'c'], ['d', 'e', 'f']]
# make a list of columns
substrings = ["".join(sub) for sub in zip(*l)]
#['ad', 'be', 'cf']
print(" ".join(substrings))
# alternatively print(*substrings, sep=" ")
# ad be cf
This works:
my_list = [['a', 'b', 'c'], ['d', 'e', 'f']]
sorted_list = [list(pair) for pair in zip(my_list[0], my_list[1])]
for i in range(3):
    string = ''.join(sorted_list[i])
    print(string, end=" ")
First, we pair each element with the element at the same position in the other list using zip, then join each pair into a string and print it.
This solution may not be the most efficient, but it's simple to understand.
Another quick solution without zip could look like this:
my_list = [['a', 'b', 'c'], ['d', 'e', 'f']]
sorted_list = list(map(lambda a, b: a + b, my_list[0], my_list[1]))
print(" ".join(sorted_list))
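The last two answers index `my_list[0]` and `my_list[1]` directly, so they only handle exactly two inner lists. A minimal sketch that works for any number of inner lists combines `zip(*grid)` with `map`; the function name `join_columns` is hypothetical.

```python
def join_columns(grid):
    """Join the items at each position across all inner lists.

    zip(*grid) yields one tuple per column; zip stops at the shortest
    inner list, so extra items in longer lists are dropped.
    """
    return " ".join(map("".join, zip(*grid)))


print(join_columns([['a', 'b', 'c'], ['d', 'e', 'f']]))
print(join_columns([['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]))
```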

How do I create a new list with a nested list comprehension?

Say I have a list of words
word_list = ['cat','dog','rabbit']
and I want to end up with a list of letters (not including any repeated letters), like this:
['c', 'a', 't', 'd', 'o', 'g', 'r', 'b', 'i']
Without a list comprehension, the code would look like this:
letter_list = []
for a_word in word_list:
    for a_letter in a_word:
        if a_letter not in letter_list:
            letter_list.append(a_letter)
print(letter_list)
is there a way to do this with a list comprehension?
I have tried
letter_list = [a_letter for a_letter in a_word for a_word in word_list]
but I get a
NameError: name 'a_word' is not defined
error. I have seen answers to similar problems, but they usually iterate over a nested collection (a list or tuple). Is there a way to do this when the inner iterable is a string like a_word?
Trying
letter_list = [a_letter for a_letter in [a_word for a_word in word_list]]
Results in the initial list: ['cat','dog','rabbit']
And trying
letter_list = [[a_letter for a_letter in a_word] for a_word in word_list]
Results in:[['c', 'a', 't'], ['d', 'o', 'g'], ['r', 'a', 'b', 'b', 'i', 't']], which is closer to what I want except it's nested lists. Is there a way to do this and have just the letters be in letter_list?
Update. How about this:
word_list = ['cat','dog','rabbit']
new_list = [letter for letter in ''.join(word_list)]
new_list = sorted(set(new_list), key=new_list.index)
print(new_list)
Output:
['c', 'a', 't', 'd', 'o', 'g', 'r', 'b', 'i']
word_list = ['cat','dog','rabbit']
letter_list = list(set([letter for word in word_list for letter in word]))
This works and removes duplicate letters, but it does not preserve the order. If you want to keep the order, you can do this:
from collections import OrderedDict
word_list = ['cat','dog','rabbit']
letter_list = list(OrderedDict.fromkeys("".join(word_list)))
You can do it with a list comprehension (note that this keeps repeated letters):
l=[j for i in word_list for j in i ]
print(l)
output:
['c', 'a', 't', 'd', 'o', 'g', 'r', 'a', 'b', 'b', 'i', 't']
You can use a list comprehension. It is typically faster than a loop that calls .append on each iteration, as in your case.
But if you want to keep only unique letters (i.e. without repeating any letter), you can use a set comprehension by changing the square brackets [] to curly braces {}, as in
letter_set = {letter for word in word_list for letter in word}
This way you avoid scanning the partial list on every iteration to check whether the letter is already present. Instead, you make use of Python's built-in hashing, which makes the code much faster for larger inputs. Note that a set does not preserve the order of the letters.
Another solution:
>>> s = set()
>>> word_list = ['cat', 'dog', 'rabbit']
>>> [c for word in word_list for c in word if (c not in s, s.add(c))[0]]
['c', 'a', 't', 'd', 'o', 'g', 'r', 'b', 'i']
This will test whether the letter is already in the set or not, and it will unconditionally add it to the set (having no effect if it is already present). The None returned from s.add is stored in the temporary tuple but otherwise ignored. The first element of the temporary tuple (that is, the result of the c not in s) is used to filter the items.
This relies on the fact that the elements of the temporary tuple are evaluated from left to right.
Could be considered a bit hacky :-)
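For comparison, a minimal order-preserving sketch without the side-effect trick: the nested comprehension lists its `for` clauses in the same order as the nested loops in the question (outer loop first), and `dict.fromkeys` keeps only the first occurrence of each letter. This relies on dicts preserving insertion order, which is guaranteed from Python 3.7.

```python
word_list = ['cat', 'dog', 'rabbit']

# The for clauses appear in the same order as the nested for loops:
# outer loop (words) first, inner loop (letters) second.
all_letters = [letter for word in word_list for letter in word]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops repeats while keeping first occurrences in order.
letter_list = list(dict.fromkeys(all_letters))
print(letter_list)
```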

Read formatted lists from a file into Python lists

I have a file with a list of letters corresponding to another letter:
A['B', 'D']
B['A', 'E']
C[]
D['A', 'G']
E['B', 'H']
F[]
G['D']
H['E']
I need to load each of these lists under its corresponding letter, ideally ending up with variables like these:
vertexA = ['B', 'D']
vertexB = ['A', 'E']
vertexC = []
vertexD = ['A', 'G']
vertexE = ['B', 'H']
vertexF = []
vertexG = ['D']
vertexH = ['E']
What would be the best way to do this? I tried searching for an answer but was unlucky in doing so. Thanks for any help.
You can try using dictionaries rather than variables; I think that also makes it easier to populate your data from your text file.
>>> vertex = {}
>>> vertex['A'] = ['B', 'D']
>>> vertex['A']
['B', 'D']
When you read your input file, each input line should look like this:
string = 'A["B","C"]'
So we know that the first letter is the name of the list, and the rest is a list literal:
import ast
your_list = ast.literal_eval(string[1:])
# your_list is now ['B', 'C']
You can take care of the looping, file reading, and string manipulation for proper naming yourself.
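A minimal sketch of that looping, assuming every non-empty line has the form `A['B', 'D']`: a single-character name followed by a Python list literal. It stores the results in a dictionary keyed by letter rather than creating separate `vertexA`, `vertexB`, ... variables; `parse_vertices` is a hypothetical name, and the sample list of strings stands in for the file.

```python
import ast


def parse_vertices(lines):
    """Map each leading letter to the list literal that follows it.

    Assumes lines like "A['B', 'D']"; blank lines are skipped.
    """
    vertices = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, literal = line[0], line[1:]
        # literal_eval safely parses Python literals such as "['B', 'D']" or "[]"
        vertices[name] = ast.literal_eval(literal)
    return vertices


sample = ["A['B', 'D']", "B['A', 'E']", "C[]", "G['D']"]
vertices = parse_vertices(sample)
print(vertices['A'], vertices['C'])
```

To read the file itself, something like `with open('A.txt') as f: vertices = parse_vertices(f)` should work, since iterating over a file yields its lines. Note that `literal_eval` raises an exception on any line whose remainder is not a valid literal.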
Building a dictionary would probably be best. Each letter of the alphabet would be a key, and then the value would be a list of associated letters. Here's a proof of concept (not tested):
from string import ascii_uppercase

vertices = {}
# instantiate dict with uppercase letters of alphabet
for c in ascii_uppercase:
    vertices[c] = []
# iterate over file and populate dict (text mode, so lines are str)
with open("out.txt", "r") as f:
    for i, line in enumerate(f):
        line = line.strip()
        if not line or line[0].upper() not in ascii_uppercase:
            # you probably want to do some additional error checking
            print("Error on line {}: {}".format(i, line))
        else:  # valid uppercase letter at beginning of line
            list_open = line.index('[')
            list_close = line.rindex(']') + 1  # one past end
            # probably would want to validate record is in correct format before getting here
            # translate hack to remove unwanted chars (Python 3 str.translate takes a table)
            row_values = line[list_open:list_close].translate(str.maketrans('', '', "[] '")).split(',')
            # skip empty entries, e.g. the [''] produced by an empty list "[]"
            vertices[line[0].upper()].extend([e for e in row_values if e.strip() != ''])
Using it would then be easy:
for v in vertices['B']:
    print(v)  # do something with v
File A.txt:
A['B', 'D']
B['A', 'E']
C[]
D['A', 'G']
E['B', 'H']
F[]
G['D']
H['E']
The code:
with open('A.txt', 'r') as file:
    lines = file.read().splitlines()
listy = [[elem[0], elem[1:].strip('[').strip(']').replace("'", '').replace(' ', '').split(',')] for elem in lines]
This makes a nested list, but as Christian Dean said, a dictionary is a better way to go.
Result:
[['A', ['B', 'D']], ['B', ['A', 'E']], ['C', ['']], ['D', ['A', 'G']], ['E', ['B', 'H']], ['F', ['']], ['G', ['D']], ['H', ['E']]]

Split list into list of lists by regex

I want to split a list of strings into a list of lists, where each split point is defined by a successful regex match.
For instance, say I have an input list:
["file1","A","B","C","file2","D","E","F","G","H","I"]
I want to produce:
[["file1","A","B","C"],["file2","D","E","F","G","H","I"]]
where the split points, file1 and file2, are identified by a successful match to
re.search("file[0-9]+",<TEST STRING>)
The number of items between split points is NOT known in advance, nor is the number of 'fileXXX' entries in the original list.
In reality, my regex matches are a lot more complicated than this, but that is not the concern. What I need help with, if someone would be so kind, is the Pythonic way to implement the split logic.
This assumes the first element is a proper header. If not, you will need to add some defensive checks.
import re
result = []
pattern = re.compile(r'^file.*')
for el in input_list:
    if pattern.match(el):
        row = []
        result.append(row)
    row.append(el)
The following should work quite nicely:
import re
input_list = ["file1","A","B","C","file2","D","E","F","G","H","I"]
output_list = []
for item in input_list:
    if re.match("file[0-9]+", item):
        output_list.append([item])
    else:
        output_list[-1].append(item)
print(output_list)
Gives the following result:
[['file1', 'A', 'B', 'C'], ['file2', 'D', 'E', 'F', 'G', 'H', 'I']]
Note that this assumes the first item is a match.
Update
A second approach could be:
input_list = ["1", "2", "file1","A","B","C","file2","D","E","F","G","H","I"]
output_list = []
for item in input_list:
    if re.match("file[0-9]+", item) or len(output_list) == 0:
        output_list.append([item])
    else:
        output_list[-1].append(item)
print(output_list)
This also copes with the case where the first item is not a match:
[['1', '2'], ['file1', 'A', 'B', 'C'], ['file2', 'D', 'E', 'F', 'G', 'H', 'I']]
You can find the indices of the file\d items:
indices = [i for i, val in enumerate(my_list) if match(r'file\d', val)]
And then simply group by these indices (as written, this assumes exactly two matches):
output = [my_list[indices[0]:indices[1]], my_list[indices[1]:]]
>>> from re import match
>>> my_list = ["file1","A","B","C","file2","D","E","F","G","H","I"]
>>> indices = [i for i, val in enumerate(my_list) if match(r'file\d', val)]
>>> [my_list[indices[0]:indices[1]], my_list[indices[1]:]]
[['file1', 'A', 'B', 'C'], ['file2', 'D', 'E', 'F', 'G', 'H', 'I']]
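The index-based answer hard-codes two slices, so it only works when there are exactly two matches. A sketch of one way to generalize it: collect the match indices, then slice from each index to the next, using `None` as the end of the last slice. Like the loop-based answers, it assumes the first element matches; `split_on_pattern` is a hypothetical helper.

```python
import re


def split_on_pattern(items, pattern):
    """Start a new sublist at every item that matches pattern."""
    regex = re.compile(pattern)
    starts = [i for i, item in enumerate(items) if regex.match(item)]
    # Each group runs from one match up to (not including) the next;
    # the final group runs to the end of the list (slice end None).
    ends = starts[1:] + [None]
    return [items[start:end] for start, end in zip(starts, ends)]


my_list = ["file1", "A", "B", "C", "file2", "D", "E", "F", "G", "H", "I"]
print(split_on_pattern(my_list, r"file\d+"))
```

Items before the first match are dropped by this version; the second loop-based approach, which starts a group whenever output_list is empty, handles that case instead.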

Unable to perform join in the inner lists of a list of lists

I have a list of lists in which the inner lists contain single-character strings. I am trying to define a function that iterates over the outer list and joins each inner list.
I have a list such as [['E', 'F', 'J', 'A',], ['S', 'D', 'G', 'K'], ['A', 'S', 'R', 'J',], ['H', 'E', 'A', 'N']]
My target list is ==> [['EFJA'], ['SDGK'], ['ASRJ'], ['HEAN']]
I used the following
def newlist(old_list):
    for i in old_list:
        sep = ('')
        newlist = sep.join(i)
        return newlist
Running the function gives me a single string, the result of joining only the first inner list, i.e. 'EFJA'.
But running the code directly in the IDE, I get this:
d = [['E', 'F', 'J', 'A',], ['S', 'D', 'G', 'K'], ['A', 'S', 'R', 'J',], ['H', 'E', 'A', 'N']]
sep = ('')
for i in d:
    new = sep.join(i)
    print(new)
OUTPUT
EFJA
SDGK
ASRJ
HEAN
This gives the desired output. I would like to get the same output from the function I defined.
Your return statement is inside the loop, so the function exits after joining the first inner list. You should put all your individual results into another list and then, only when you’re done, return the final list:
def newlist(old_list):
    new_list = []
    for i in old_list:
        sep = ''
        new_list.append(sep.join(i))
    return new_list
Or you can shorten this to:
def newlist(old_list):
    return [''.join(x) for x in old_list]
As you seem to want every string within its own single-element list (to be honest, that does not make much sense to me), you can put the result of the join within extra brackets to create that extra list. So in the verbose function, you would do this:
new_list.append([sep.join(i)])
Or using the solution with the list comprehension, you would do
return [[''.join(x)] for x in old_list]
