This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Longest strings from list
lst = [str1, str2, str3, ...]
max(lst, key=len)
This returns only one of the strings with max length. Is there any way to do that without defining another procedure?
How about:
maxlen = len(max(l, key=len))
maxlist = [s for s in l if len(s) == maxlen]
If you want to get all the values with the max length, you probably want to sort the list by length; then you just need to take all the values until the length changes. itertools provides multiple ways to do that (takewhile, groupby, etc.). For example:
>>> import itertools
>>> l = ['abc', 'd', 'ef', 'ghi', 'j']
>>> l2 = sorted(l, key=len, reverse=True)
>>> groups = itertools.groupby(l2, key=len)
>>> maxlen, maxvalues = next(groups)
>>> print(maxlen, list(maxvalues))
3 ['abc', 'ghi']
If you want a one-liner:
>>> maxlen, maxvalues = next(itertools.groupby(sorted(l, key=len, reverse=True), key=len))
>>> print(maxlen, list(maxvalues))
3 ['abc', 'ghi']
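The paragraph above also mentions takewhile. A minimal sketch of that variant, wrapped in a helper whose name `longest_strings` is made up for this example:

```python
from itertools import takewhile


def longest_strings(strings):
    """Return every string of maximal length, keeping their original order."""
    if not strings:
        return []
    # sorted() stays stable even with reverse=True, so equal-length
    # strings keep their original relative order
    by_len = sorted(strings, key=len, reverse=True)
    maxlen = len(by_len[0])
    # take strings from the front until the length drops below the maximum
    return list(takewhile(lambda s: len(s) == maxlen, by_len))


l = ['abc', 'd', 'ef', 'ghi', 'j']
print(longest_strings(l))  # ['abc', 'ghi']
```

Unlike groupby, takewhile does not need to know the key in advance; the first sorted element supplies the maximum length.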
Of course, you can always just make two passes over the list if you prefer: first to find the max length, then to find all matching values:
>>> maxlen = len(max(l, key=len))
>>> maxvalues = (value for value in l if len(value) == maxlen)
>>> print(maxlen, list(maxvalues))
3 ['abc', 'ghi']
Just for the sake of completeness, filter is also an option (compute the max length once, rather than once per element):
maxlen = len(max(myList, key=len))
maxlens = list(filter(lambda s: len(s) == maxlen, myList))
Here is a one-pass solution, collecting longest-seen-so-far words as they are found.
def findLongest(words):
    if not words:
        return []
    worditer = iter(words)
    ret = [next(worditer)]
    cur_len = len(ret[0])
    for wd in worditer:
        len_wd = len(wd)
        if len_wd > cur_len:
            ret = [wd]
            cur_len = len_wd
        elif len_wd == cur_len:
            ret.append(wd)
    return ret
Here are the results from some test lists:
tests = [
    [],
    "Four score and seven years ago".split(),
    "To be or not to be".split(),
    "Now is the winter of our discontent made glorious summer by this sun of York".split(),
]
for test in tests:
    print(test)
    print(findLongest(test))
    print()
[]
[]
['Four', 'score', 'and', 'seven', 'years', 'ago']
['score', 'seven', 'years']
['To', 'be', 'or', 'not', 'to', 'be']
['not']
['Now', 'is', 'the', 'winter', 'of', 'our', 'discontent', 'made', 'glorious', 'summer', 'by', 'this', 'sun', 'of', 'York']
['discontent']
I have a list of lists that I would like to iterate over using a for loop, creating a new list with only the unique words. This is similar to a question asked previously, but I could not get that solution to work for a list within a list.
For example, the nested list is as follows:
ListofList = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]
The desired output would be a single list:
List_Unique = [['is','and','so','he','his','run']]
I have tried the following two variations of code, but the output of both is a list of repeats:
unique_redundant = []
for i in redundant_search:
    redundant_i = [j for j in i if not i in unique_redundant]
    unique_redundant.append(redundant_i)
unique_redundant

unique_redundant = []
for list in redundant_search:
    for j in list:
        redundant_j = [i for i in j if not i in unique_redundant]
        unique_redundant.append(length_j)
unique_redundant
Example output from the two (incorrect) variations above
(I ran the code on my real data set, and it gave a list of repeated lists of the same pair of words; these aren't the actual two words, just an example):
List_Unique = [['is','and'],['is','and'],['is','and']]
I'd suggest using the union() method of the set class in this way:
ListofList = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]
set().union(*ListofList)
# => {'run', 'and', 'so', 'is', 'his', 'he'}
Explanation
It works like the following:
test_set = set().union([1])
print(test_set)
# => {1}
The asterisk operator before the list (*ListofList) unpacks the list:
lst = [[1], [2], [3]]
print(lst) #=> [[1], [2], [3]]
print(*lst) #=> [1] [2] [3]
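Putting the two pieces together: unpacking simply passes each inner list to union() as a separate argument. A small sketch checking that equivalence, and sorting the result because set order is arbitrary:

```python
ListofList = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]

# *ListofList expands to three separate arguments, so these calls are equivalent
unpacked = set().union(*ListofList)
explicit = set().union(['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run'])
print(unpacked == explicit)  # True

# sets have no reliable order; sort if you need a predictable list
print(sorted(unpacked))  # ['and', 'he', 'his', 'is', 'run', 'so']
```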
First flatten the list with itertools.chain, then use set to keep only the unique elements and pass that into a list:
from itertools import chain
ListofList = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]
if __name__ == '__main__':
    print([list(set(chain(*ListofList)))])
Use itertools.chain to flatten the list and dict.fromkeys to keep the unique values in order:
ListofList = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]
from itertools import chain
List_Unique = [list(dict.fromkeys(chain.from_iterable(ListofList)))]
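Why dict.fromkeys keeps the order: dict keys are unique and, since Python 3.7, preserve insertion order, so only the first occurrence of each word survives. A short sketch showing the intermediate flattened list:

```python
from itertools import chain

ListofList = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]

flat = list(chain.from_iterable(ListofList))
print(flat)  # ['is', 'and', 'is', 'so', 'he', 'his', 'his', 'run']

# duplicate keys collapse onto the first insertion, so order is preserved
unique = list(dict.fromkeys(flat))
print(unique)  # ['is', 'and', 'so', 'he', 'his', 'run']
```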
Just walk through the nested list by index with a while loop (while cnt < len(ListofList)), appending each value that is not already in the new list:
ListofList = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]
list_new = []
cnt = 0
while cnt < len(ListofList):
    for i in ListofList[cnt]:
        if i in list_new:
            continue
        else:
            list_new.append(i)
    cnt += 1
print(list_new)
OUTPUT
['is', 'and', 'so', 'he', 'his', 'run']
flat_list = [item for sublist in ListofList for item in sublist]
# use this if order should not change
List_Unique = []
for item in flat_list:
    if item not in List_Unique:
        List_Unique.append(item)
# use this if order is not an issue
# List_Unique = list(set(flat_list))
You could try this:
ListofList = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]
uniqueItems = []
for firstList in ListofList:
    for item in firstList:
        if item not in uniqueItems:
            uniqueItems.append(item)
print(uniqueItems)
It uses a nested for loop to access each item and check whether it is in uniqueItems.
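One caveat with this (and the similar loops above): `item not in uniqueItems` scans the whole list each time, so the loop becomes quadratic on large inputs. A sketch of the same loop with an extra set used only for membership checks; the name `seen` is introduced here for illustration:

```python
ListofList = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]

uniqueItems = []
seen = set()  # average O(1) membership tests instead of scanning uniqueItems
for firstList in ListofList:
    for item in firstList:
        if item not in seen:
            seen.add(item)
            uniqueItems.append(item)  # the list still records first-seen order
print(uniqueItems)  # ['is', 'and', 'so', 'he', 'his', 'run']
```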
Using the basic set concept (a set contains only unique elements):
lst = [['is', 'and', 'is'], ['so', 'he', 'his'], ['his', 'run']]
new_list = []
for x in lst:
    for y in set(x):
        new_list.append(y)
print(list(set(new_list)))
Output (set order is arbitrary and may differ between runs):
['run', 'and', 'is', 'so', 'he', 'his']
I have a Python program that creates lists of 3 elements, such as
['a', 'was', 'mother']
and adds them to an empty list:
output_text = []
while True:
    candidates = [t for t in lines if t[0:2] == last_two]
    if not candidates:
        break
    triplet = random.choice(candidates)
    last_two = triplet[1:3]
    output_text.append(triplet)
    print('\n Selected matching triplet: \n', triplet)
    print('\n Last two words of the matching triplet: \n', last_two)
print(output_text)
I want to create an if statement that keeps adding the 3-element lists to output_text until 200 words (total elements) are stored.
Any ideas?
For example, you could combine itertools.cycle and itertools.islice, or just use the modulo operator %:
>>> from itertools import islice, cycle
>>> lst = ['a', 'was', 'mother']
>>> list(islice(cycle(lst), 10))
['a', 'was', 'mother', 'a', 'was', 'mother', 'a', 'was', 'mother', 'a']
>>> [lst[i % len(lst)] for i in range(10)]
['a', 'was', 'mother', 'a', 'was', 'mother', 'a', 'was', 'mother', 'a']
(Technically, this does not append to an empty list but creates the list in one go, but I assume that's okay.)
The idiomatic solution would involve itertools.cycle, which is an iterator that yields items from a given iterable indefinitely, and itertools.islice to grab the first 200 items from the cycle-iterator:
from itertools import cycle, islice
words = list(islice(cycle(("a", "was", "mother")), 200))
I believe it would be easier done with a while loop, like this:
lst = ['a', 'was', 'mother']
output_text = []
while len(output_text) < 200:
    remaining = 200 - len(output_text)
    if remaining >= len(lst):
        output_text += lst
    else:
        # add only as many words as still fit
        output_text += lst[:remaining]
print(output_text)
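None of these answers put the limit into the question's triplet loop itself. A minimal sketch that does, assuming `lines` is a list of 3-word lists and `last_two` is the starting pair, as in the question; the function name `build_output` and the sample data are hypothetical:

```python
import random


def build_output(lines, last_two, max_words=200):
    """Chain matching triplets until max_words words have been collected."""
    output_text = []
    word_count = 0
    while word_count < max_words:
        candidates = [t for t in lines if t[0:2] == last_two]
        if not candidates:
            break  # no triplet continues the chain
        triplet = random.choice(candidates)
        last_two = triplet[1:3]
        # trim the last triplet if it would push the total past max_words
        piece = triplet[:max_words - word_count]
        output_text.append(piece)
        word_count += len(piece)
    return output_text


# hypothetical sample data that always has a continuation
sample_lines = [['a', 'was', 'mother'], ['was', 'mother', 'a'], ['mother', 'a', 'was']]
result = build_output(sample_lines, ['a', 'was'])
print(sum(len(t) for t in result))  # 200
```

Counting words in a separate variable avoids re-summing the nested lists on every iteration.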
This question already has answers here:
Some built-in to pad a list in python
(14 answers)
Finding length of the longest list in an irregular list of lists
(10 answers)
Closed 6 months ago.
I have a list of lists of sentences and I want to pad all sentences so that they are of the same length.
I was able to do this, but I am trying to find the most efficient ways to do things and to challenge myself.
max_length = max(len(sent) for sent in sents)
list_length = len(sents)
sents_padded = [[pad_token for i in range(max_length)] for j in range(list_length)]
for i, sent in enumerate(sents):
    sents_padded[i][0:len(sent)] = sent
and I used the inputs:
sents = [["Hello","World"],["Where","are","you"],["I","am","doing","fine"]]
pad_token = "Hi"
Is my method efficient, or are there better ways to do it?
itertools (in Python 3) provides zip_longest for this. zip_longest(*sents, fillvalue=pad_token) iterates over the sentences column by column, padding the shorter ones; you can transpose the result back into rows with zip(*...), and convert each row to a list if you prefer that over a tuple.
import itertools
from pprint import pprint
sents = [["Hello","World"],["Where","are","you"],["I","am","doing","fine"]]
pad_token = "Hi"
padded = [list(row) for row in zip(*itertools.zip_longest(*sents, fillvalue=pad_token))]
pprint(padded)
[['Hello', 'World', 'Hi', 'Hi'],
 ['Where', 'are', 'you', 'Hi'],
 ['I', 'am', 'doing', 'fine']]
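To see what the zip_longest call is doing, it helps to look at the intermediate columns before they are transposed back into rows. A sketch:

```python
from itertools import zip_longest

sents = [["Hello", "World"], ["Where", "are", "you"], ["I", "am", "doing", "fine"]]
pad_token = "Hi"

# zip_longest(*sents) walks the sentences in parallel, one word position
# (column) at a time, filling in pad_token where a sentence has run out
columns = list(zip_longest(*sents, fillvalue=pad_token))
print(columns[0])  # ('Hello', 'Where', 'I')
print(columns[3])  # ('Hi', 'Hi', 'fine')

# zip(*columns) transposes the columns back into equal-length rows
rows = [list(row) for row in zip(*columns)]
print(rows[0])  # ['Hello', 'World', 'Hi', 'Hi']
```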
If the sentences are plain strings rather than lists of words, you can use str.ljust() to pad each string, and use max() with a key of len to find the width to pad to:
lst = ['Hello World', 'Good day!', 'How are you?']
l = len(max(lst, key=len))  # The length of the longest sentence
lst = [s.ljust(l) for s in lst]  # Pad each sentence to width l
print(lst)
Output:
['Hello World ', 'Good day!   ', 'How are you?']
Assumption:
The output should be the same as OP output (i.e. same number of words in each sublist).
Inputs:
sents = [["Hello","World"],["Where","are","you"],["I","am","doing","fine"]]
pad_token = "Hi"
The following one-liner produces the same output as the OP's code:
max_length = max(len(sent) for sent in sents)
sents_padded = [sent + [pad_token]*(max_length - len(sent)) for sent in sents]
print(sents_padded)
# [['Hello', 'World', 'Hi', 'Hi'], ['Where', 'are', 'you', 'Hi'], ['I', 'am', 'doing', 'fine']]
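One edge case: max() raises ValueError on an empty sequence, so an empty sents fails. A sketch wrapping the one-liner in a hypothetical helper `pad_sentences`, using max()'s `default` argument (Python 3.4+):

```python
def pad_sentences(sents, pad_token):
    """Return new, padded copies of the sentences; the input lists are not modified."""
    # default=0 stops max() from raising ValueError when sents is empty
    max_length = max((len(sent) for sent in sents), default=0)
    return [sent + [pad_token] * (max_length - len(sent)) for sent in sents]


sents = [["Hello", "World"], ["Where", "are", "you"], ["I", "am", "doing", "fine"]]
print(pad_sentences(sents, "Hi"))
# [['Hello', 'World', 'Hi', 'Hi'], ['Where', 'are', 'you', 'Hi'], ['I', 'am', 'doing', 'fine']]
print(pad_sentences([], "Hi"))  # []
```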
This seemed to be faster when I timed it (note that it pads the lists in sents in place):
maxi = 0
for sent in sents:
    if len(sent) > maxi:
        maxi = len(sent)
for sent in sents:
    while len(sent) < maxi:
        sent.append(pad_token)
print(sents)
I have a string s and a list of strings, arr.
The length of s is equal to the total length of strings in arr.
I need to split s into a list, such that each element in the list has the same length as the corresponding element in arr.
For example:
s = 'Pythonisanprogramminglanguage'
arr = ['lkjhgf', 'zx', 'qw', 'ertyuiopakk', 'foacdhlc']
expected = ['Python', 'is', 'an', 'programming', 'language']
It is much cleaner to use iter with next:
s = 'Pythonisanprogramminglanguage'
arr = ['lkjhgf', 'zx', 'qw', 'ertyuiopakk', 'foacdhlc']
new_s = iter(s)
result = [''.join(next(new_s) for _ in i) for i in arr]
Output:
['Python', 'is', 'an', 'programming', 'language']
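A caveat with next() inside a generator expression: if s is shorter than expected, the StopIteration is turned into a RuntimeError (Python 3.7+, PEP 479). A sketch of the same shared-iterator idea using itertools.islice, which simply stops early instead; `split_like` is a made-up name:

```python
from itertools import islice


def split_like(s, arr):
    """Cut s into consecutive pieces with the same lengths as the words in arr."""
    chars = iter(s)  # one shared iterator, so each piece starts where the last ended
    # islice takes the next len(word) characters and quietly stops if s runs out
    return [''.join(islice(chars, len(word))) for word in arr]


s = 'Pythonisanprogramminglanguage'
arr = ['lkjhgf', 'zx', 'qw', 'ertyuiopakk', 'foacdhlc']
print(split_like(s, arr))  # ['Python', 'is', 'an', 'programming', 'language']
```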
One way would be to do this:
s = 'Pythonisanprogramminglanguage'
arr = ['lkjhgf', 'zx', 'qw', 'ertyuiopakk', 'foacdhlc']
expected = []
i = 0
for word in arr:
    expected.append(s[i:i+len(word)])
    i += len(word)
print(expected)
Using a simple for loop this can be done as follows:
s = 'Pythonisanprogramminglanguage'
arr = ['lkjhgf', 'zx', 'qw', 'ertyuiopakk', 'foacdhlc']
start_index = 0
expected = list()
for a in arr:
    expected.append(s[start_index:start_index+len(a)])
    start_index += len(a)
print(expected)
Since Python 3.8, an alternative approach is to use an assignment expression (the walrus operator):
s = 'Pythonisanprogramminglanguage'
arr = ['lkjhgf', 'zx', 'qw', 'ertyuiopakk', 'foacdhlc']
i = 0
expected = [s[i:(i := i+len(word))] for word in arr]
You can use itertools.accumulate to get the positions where you want to split the string:
>>> s = 'Pythonisanprogramminglanguage'
>>> arr = ['lkjhgf', 'zx', 'qw', 'ertyuiopakk', 'foacdhlc']
>>> import itertools
>>> L = list(itertools.accumulate(map(len, arr)))
>>> L
[6, 8, 10, 21, 29]
Now if you zip the list (with 0 prepended) with itself, you get the intervals:
>>> list(zip([0]+L, L))
[(0, 6), (6, 8), (8, 10), (10, 21), (21, 29)]
And you just have to use the intervals to split the string:
>>> [s[i:j] for i,j in zip([0]+L, L)]
['Python', 'is', 'an', 'programming', 'language']
The itertools module has a function named accumulate() (added in Py 3.2) which helps make this relatively easy:
from itertools import accumulate # added in Py 3.2
s = 'Pythonisanprogramminglanguage'
arr = ['lkjhgf', 'zx', 'qw', 'ertyuiopakk', 'foacdhlc']
cuts = tuple(accumulate(len(item) for item in arr))
words = [s[i:j] for i, j in zip((0,)+cuts, cuts)]
print(words) # -> ['Python', 'is', 'an', 'programming', 'language']
Create a simple loop and use the length of the words as your index:
s = 'Pythonisanprogramminglanguage'
arr = ['lkjhgf', 'zx', 'qw', 'ertyuiopakk', 'foacdhlc']
ctr = 0
words = []
for x in arr:
    words.append(s[ctr:len(x) + ctr])
    ctr += len(x)
print(words)
# ['Python', 'is', 'an', 'programming', 'language']
Here is another approach, using numpy's cumulative sum:
import numpy as np
ar = [0] + list(map(len, arr))
ar = list(np.cumsum(ar))
output_ = [s[i:j] for i, j in zip(ar[:-1], ar[1:])]
Output:
['Python', 'is', 'an', 'programming', 'language']
One more way:
a, l = 0, []
for i in map(len, arr):
    l.append(s[a:a+i])
    a += i
print(l)
#['Python', 'is', 'an', 'programming', 'language']
Props to the answer using iter. The accumulate answers are my favorite. Here is another accumulate answer, using map instead of a list comprehension:
import itertools
s = 'Pythonisanprogramminglanguage'
arr = ['lkjhgf', 'zx', 'qw', 'ertyuiopakk', 'foacdhlc']
ticks = itertools.accumulate(map(len, arr[0:]))
words = list(map(lambda i, x: s[i:len(x) + i], (0,) + tuple(ticks), arr))
Output:
['Python', 'is', 'an', 'programming', 'language']
You could collect slices off the front of s.
output = []
for word in arr:
    i = len(word)
    chunk, s = s[:i], s[i:]
    output.append(chunk)
print(output) # -> ['Python', 'is', 'an', 'programming', 'language']
Yet another approach would be to create a regex pattern describing the desired length of each word. You can replace every character with . (any character) and wrap each word in parentheses (note that this example uses a slightly different arr and s):
arr = ['lkjhgf', 'zx', 'q', 'ertyuiopakk', 'foacdhlc']
import re
pattern = '(' + ')('.join(re.sub('.', '.', word) for word in arr) + ')'
#=> '(......)(..)(.)(...........)(........)'
If the pattern matches, you get the desired words in groups directly:
s = 'Pythonisaprogramminglanguage'
re.match(pattern, s).groups()
#=> ('Python', 'is', 'a', 'programming', 'language')
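A variant of the same idea, sketched with the {n} quantifier instead of repeated dots and with re.fullmatch (Python 3.4+), so that a length mismatch is reported instead of leftover characters being silently ignored; `split_by_lengths` is a hypothetical name:

```python
import re


def split_by_lengths(s, arr):
    """Split s into pieces whose lengths match the words in arr."""
    # one capturing group per word, e.g. '(.{6})(.{2})(.{2})(.{11})(.{8})'
    pattern = ''.join('(.{%d})' % len(word) for word in arr)
    # fullmatch must consume the whole string; DOTALL lets . match newlines too
    m = re.fullmatch(pattern, s, flags=re.DOTALL)
    if m is None:
        raise ValueError('the lengths in arr do not add up to len(s)')
    return list(m.groups())


s = 'Pythonisanprogramminglanguage'
arr = ['lkjhgf', 'zx', 'qw', 'ertyuiopakk', 'foacdhlc']
print(split_by_lengths(s, arr))  # ['Python', 'is', 'an', 'programming', 'language']
```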
I have a large list of words:
my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]', 'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']
I would like to be able to count the number of elements in between (and including) the [tag] elements across the whole list. The goal is to be able to see the frequency distribution.
Can I use range() to start and stop on a string match?
First, find all indices of [tag]; the difference between adjacent indices is the number of words in each group (this covers every group except the last one, which has no following [tag]).
my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]', 'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']
indices = [i for i, x in enumerate(my_list) if x == "[tag]"]
nums = []
for i in range(1, len(indices)):
    nums.append(indices[i] - indices[i-1])
A faster way to find all indices is using numpy, like shown below:
import numpy as np
values = np.array(my_list)
searchval = '[tag]'
ii = np.where(values == searchval)[0]
print(ii)
Another way to get the differences between adjacent indices is to zip the list with itself shifted by one (itertools.izip in Python 2; the built-in zip in Python 3):
diffs = [y - x for x, y in zip(indices, indices[1:])]
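The adjacent-difference approach never measures the final group, since no [tag] follows it. A sketch that appends len(my_list) as a sentinel, assuming the list ends with the last group's [/tag] and nothing sits between groups:

```python
my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]',
           '[tag]', 'some', 'more', 'here', '[/tag]',
           '[tag]', 'and', 'more', '[/tag]']

indices = [i for i, x in enumerate(my_list) if x == '[tag]']
# the sentinel acts as the "next [tag]" after the last group
bounds = indices + [len(my_list)]
nums = [b - a for a, b in zip(bounds, bounds[1:])]
print(nums)  # [7, 5, 4]
```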
You can use .index(value, [start, [stop]]) to search through the list.
my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]', 'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']
my_list.index('[tag]')  # will return 0, as it occurs at the zeroth element
my_list.index('[/tag]')  # will return 6
That gets you your first group length (6 - 0 + 1 = 7). On the next iteration, you just need to remember the last closing tag's index and use that index plus 1 as the start point:
my_list.index('[tag]', 7)  # will return 7
my_list.index('[/tag]', 7)  # will return 11
Do that in a loop until you've reached the last closing tag in your list.
Also remember that .index raises a ValueError if the value is not present, so you'll need to handle that exception when it occurs.
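The loop described above could be sketched like this, with the ValueError handling included; `tag_group_sizes` is an illustrative name, not a library function:

```python
def tag_group_sizes(items, open_tag='[tag]', close_tag='[/tag]'):
    """Sizes of each open_tag ... close_tag group, counting both tags."""
    sizes = []
    start = 0
    while True:
        try:
            open_idx = items.index(open_tag, start)
            close_idx = items.index(close_tag, open_idx)
        except ValueError:
            break  # no further complete group
        sizes.append(close_idx - open_idx + 1)
        start = close_idx + 1  # resume searching after this closing tag
    return sizes


my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]',
           '[tag]', 'some', 'more', 'here', '[/tag]',
           '[tag]', 'and', 'more', '[/tag]']
print(tag_group_sizes(my_list))  # [7, 5, 4]
```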
Solution using list comprehension and string manipulation.
my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]', 'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']
# string together your list (assumes no word contains a comma)
my_str = ','.join(my_list)
# split the giant string by tag; drop the empty piece before the first [tag]
# and strip the leftover commas, giving one comma-separated string per tag
my_tags = [t.strip(',') for t in my_str.split('[tag]') if t]
# split each tag string into its words
my_words = [t.split(',') for t in my_tags]
# count up each list to get a list of counts for each tag, adding one since the split removed [tag]
my_cnt = [1 + len(w) for w in my_words]
# [7, 5, 4]
Do it in one line:
# all as one list comprehension starting with just the string
[1 + len(t.strip(',').split(',')) for t in my_str.split('[tag]') if t]
This should allow you to find the number of words between and including your tags:
MY_LIST = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]',
           'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']

def main():
    ranges = find_ranges(MY_LIST, '[tag]', '[/tag]')
    for index, pair in enumerate(ranges, 1):
        print('Range {}: Start = {}, Stop = {}'.format(index, *pair))
        start, stop = pair
        print(' Size of Range =', stop - start + 1)

def find_ranges(iterable, start, stop):
    range_start = None
    for index, value in enumerate(iterable):
        if value == start:
            if range_start is None:
                range_start = index
            else:
                raise ValueError('a start was duplicated before a stop')
        elif value == stop:
            if range_start is None:
                raise ValueError('a stop was seen before a start')
            else:
                yield range_start, index
                range_start = None

if __name__ == '__main__':
    main()
This example will print out the following text so you can see how it works:
Range 1: Start = 0, Stop = 6
Size of Range = 7
Range 2: Start = 7, Stop = 11
Size of Range = 5
Range 3: Start = 12, Stop = 15
Size of Range = 4
I would go with the following, since the OP wants to count the actual values, including both tags. (No doubt he has figured out how to do that by now.)
starts = [k for k, v in enumerate(my_list) if v == '[tag]']
stops = [k for k, v in enumerate(my_list) if v == '[/tag]']
for start, stop in zip(starts, stops):
    print(stop - start + 1)
Borrowing and slightly modifying the generator code from the selected answer to this question:
my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]', 'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']
def group(seq, sep):
    g = []
    for el in seq:
        g.append(el)
        if el == sep:
            yield g
            g = []

counts = [len(x) for x in group(my_list, '[/tag]')]
I changed the generator they gave in that answer so that it does not return the empty list at the end and includes the separator in the current list instead of putting it in the next one. Note that this assumes there will always be a matching '[tag]'/'[/tag]' pair in that order, and that all the elements in the list are between a pair.
After running this, counts will be [7,5,4]
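Since the stated goal is a frequency distribution, the counts can be fed into collections.Counter. A minimal sketch, repeating the group generator from the answer above so that the block runs on its own:

```python
from collections import Counter

my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]',
           '[tag]', 'some', 'more', 'here', '[/tag]',
           '[tag]', 'and', 'more', '[/tag]']


def group(seq, sep):
    # yield each run of elements up to and including sep
    g = []
    for el in seq:
        g.append(el)
        if el == sep:
            yield g
            g = []


counts = [len(x) for x in group(my_list, '[/tag]')]
# map each group size to the number of groups with that size
distribution = Counter(counts)
print(sorted(distribution.items()))  # [(4, 1), (5, 1), (7, 1)]
```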