'Clumping' a list in Python

I've been trying to 'clump' a list.
By that I mean joining items together depending on the item in between, so ['d','-','g','p','q','-','a','v','i'] becomes ['d-g','p','q-a','v','i'] when 'clumped' around any '-'.
Here's my attempt:
def clump(List):
    box = []
    for item in List:
        try:
            if List[List.index(item) + 1] == "-":
                box.append("".join(List[List.index(item):List.index(item)+3]))
            else:
                box.append(item)
        except:
            pass
    return box
However, for the example above it outputs
['d-g', '-', 'g', 'p', 'q-a', '-', 'a', 'v']
because I have no idea how to skip the next two items.
Also, the code is a complete mess, mainly because of the try/except statement (without it I get an IndexError when the loop reaches the last item).
How can it be fixed (or completely rewritten)?
Thanks

Here's an O(n) solution that maintains a flag determining whether or not you are currently clumping. It then manipulates the last item in the list based on this condition:
def clump(arr):
    started = False
    out = []
    for item in arr:
        if item == '-':
            started = True
            out[-1] += item
        elif started:
            out[-1] += item
            started = False
        else:
            out.append(item)
    return out
In action:
In [53]: clump(x)
Out[53]: ['d-g', 'p', 'q-a', 'v', 'i']
This solution will fail if the first item in the list is a dash, but that seems like it should be an invalid input.
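The leading-dash case mentioned above can be handled with one extra check. The following is a minimal sketch, not part of the original answer: the name clump_safe and the sep parameter are made up for illustration, and treating a leading dash as the start of a new clump is an assumption about the desired behaviour.

```python
def clump_safe(items, sep="-"):
    """Join each separator with its neighbours; tolerate a leading separator."""
    out = []
    joining = False  # True right after a separator has been attached
    for item in items:
        if item == sep:
            if out:
                out[-1] += item      # attach the separator to the previous item
            else:
                out.append(item)     # leading separator: start a new clump (assumption)
            joining = True
        elif joining:
            out[-1] += item          # attach the item that follows the separator
            joining = False
        else:
            out.append(item)
    return out


print(clump_safe(['d', '-', 'g', 'p', 'q', '-', 'a', 'v', 'i']))
# ['d-g', 'p', 'q-a', 'v', 'i']
print(clump_safe(['-', 'a', 'b']))
# ['-a', 'b']
```

Like the original, this makes a single pass over the input, so it stays O(n).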

Here is a solution using re.sub
>>> import re
>>> l = ['d','-','g','p','q','-','a','v','i']
>>> re.sub(':-:', '-', ':'.join(l)).split(':')
['d-g', 'p', 'q-a', 'v', 'i']

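And here is another attempt using itertools.zip_longest. As the output shows, it is not quite right: the item after each dash is emitted a second time, and 'p', which sits two places before a dash, is dropped.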
>>> from itertools import zip_longest
>>> l = ['d','-','g','p','q','-','a','v','i']
>>> [x+y+z if y=='-' else x for x,y,z in zip_longest(l, l[1:], l[2:], fillvalue='') if '-' not in [x,z]]
['d-g', 'g', 'q-a', 'a', 'v', 'i']

Related

How can a method that evaluates a list to determine whether it contains specific consecutive items be improved?

I have a nested list of tens of millions of lists (I can use tuples also). Each list is 2-7 items long. Each item in a list is a string of 1-5 characters and occurs no more than once per list. (I use single char items in my example below for simplicity)
# Example nestedList:
nestedList = [
    ['a', 'e', 'O', 'I', 'g', 's'],
    ['w', 'I', 'u', 'O', 's', 'g'],
    ['e', 'z', 's', 'I', 'O', 'g']
]
I need to find which lists in my nested list contain a pair of items so I can do stuff to these lists while ignoring the rest. This needs to be as efficient as possible.
I am using the following function but it seems pretty slow and I just know there has to be a smarter way to do this.
def isBadInList(bad, checkThisList):
    numChecks = len(checkThisList) - 1
    for x in range(numChecks):
        if checkThisList[x] == bad[0] and checkThisList[x + 1] == bad[1]:
            return True
        elif checkThisList[x] == bad[1] and checkThisList[x + 1] == bad[0]:
            return True
    return False
I use it like this:
bad = ['O', 'I']
for checkThisList in nestedList:
    result = isBadInList(bad, checkThisList)
    if result:
        doStuffToList(checkThisList)
# The function isBadInList() only returns True for the first and third lists in nestedList and False for all others.
I need a way to do this faster if possible. I can use tuples instead of lists, or whatever it takes.
nestedList = [
    ['a', 'e', 'O', 'I', 'g', 's'],
    ['w', 'I', 'u', 'O', 's', 'g'],
    ['e', 'z', 's', 'I', 'O', 'g']
]
# first create a map
pairdict = dict()
for i in range(len(nestedList)):
    for j in range(len(nestedList[i])-1):
        pair1 = (nestedList[i][j],nestedList[i][j+1])
        if pair1 in pairdict:
            pairdict[pair1].append(i+1)
        else:
            pairdict[pair1] = [i+1]
        pair2 = (nestedList[i][j+1],nestedList[i][j])
        if pair2 in pairdict:
            pairdict[pair2].append(i+1)
        else:
            pairdict[pair2] = [i+1]
del nestedList
print(pairdict.get(('e','z'),None))
Create each pair of adjacent values and store it in a map: the key is the pair, and the value is the list of (1-based) indexes of the sublists that contain it. You can then delete the original list (it may take too much memory).
After that you can take advantage of the dict for lookups and print the indexes where a pair appears.
I think you could use a regex here to speed this up, although it is still a sequential operation: you have to visit each of the n sublists and search through its items, so the total work is proportional to n times the sublist length (at most 7 here).
import re
p = re.compile('OI|IO')  # match only OI or IO
def is_bad(pattern, to_check):
    for item in to_check:
        maybe_found = pattern.search(''.join(item))
        if maybe_found:
            yield True
        else:
            yield False
l = list(is_bad(p, nestedList))
print(l)
# [True, False, True]
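Because the question states that each item occurs at most once per list, adjacency can also be checked by comparing positions. This is a minimal sketch under that assumption; the name has_adjacent_pair is made up for illustration, and whether it is actually faster than the loop or the regex on real data would need to be timed.

```python
def has_adjacent_pair(bad, items):
    """Return True if the two entries of `bad` are neighbours in `items`, in either order.

    Assumes every entry occurs at most once in `items` (as stated in the question).
    """
    first, second = bad
    try:
        # list.index runs in C and raises ValueError when the entry is missing
        return abs(items.index(first) - items.index(second)) == 1
    except ValueError:
        return False


nested_list = [
    ['a', 'e', 'O', 'I', 'g', 's'],
    ['w', 'I', 'u', 'O', 's', 'g'],
    ['e', 'z', 's', 'I', 'O', 'g'],
]
print([has_adjacent_pair(('O', 'I'), row) for row in nested_list])
# [True, False, True]
```

Unlike joining the items into one string, this compares whole items, so multi-character strings cannot produce a false match across item boundaries.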

Counting values "in a row" in list of lists vertically or horizontally

I'm writing a function that counts the number of occurrences of a specific value of a list of lists in a row whether horizontal or vertical. Then it just needs to return the value of how many times it occurred. Here's an example
lst=[['.','.','.','e'],
     ['A','A','.','e'],
     ['.','.','.','e'],
     ['.','X','X','X'],
     ['.','.','.','.'],
     ['.','.','.','e']]
For this list of lists, the function should return 3 for e as it appears 3 times in a row, 2 for A, and 3 for X. Thank you for your time
My code so far:
def length_of_row(symbol,lot):
    count = 0
    for sublist in lot:
        for x in sublist:
            if x == symbol:
                count += 1
                continue
            else:
                continue
    return count
You can try the following if you don't mind changing things a little bit:
from functools import reduce
from itertools import takewhile
def length_of_row(symbol, lot):
    if symbol not in reduce(lambda x,y: x+y, lot):
        return 0
    elif symbol in lot[0]:
        good_lot = map(lambda y: y.count(symbol),takewhile(lambda x: symbol in x, lot))
        return sum(good_lot)
    else:
        return length_of_row(symbol, lot[1:])
This uses a combination of recursion and one of python's powerful itertools methods (takewhile). The idea is to count the number of symbols until you hit a list that does not contain that symbol. Also, it tries to make sure that it only counts the occurrences of the symbol if said symbol is in the list of lists.
Using it:
lst = [['.', '.', '.', 'e'],
       ['A', 'A', '.', 'e'],
       ['.', '.', '.', 'e'],
       ['.', 'X', 'X', 'X'],
       ['.', '.', '.', '.'],
       ['.', '.', '.', 'e']]
print(length_of_row('e', lst))
print(length_of_row('X', lst))
print(length_of_row('A', lst))
print(length_of_row('f', lst))
#3
#3
#2
#0
As you can see, if the symbol does not exist it returns 0.
Edit:
If you don't wish to import the takewhile function from itertools, you can use the approximate definition provided in the documentation. But just keep in mind that it is not as optimized as the itertools method:
def takewhile(predicate, iterable):
    for x in iterable:
        if predicate(x):
            yield x
        else:
            break
Also, reduce is available directly as a built-in if you are using Python 2; in Python 3 it has to be imported from functools. Alternatively, you can define a function to reduce a list of lists into one list as follows:
def reduce_l_of_l(lst_of_lst):
    out_lst = []
    for lst in lst_of_lst:
        out_lst += lst
    return out_lst
Instead of using reduce, just replace it with reduce_l_of_l after it's been defined.
I hope this helps.
This is actually quite a messy problem to solve with basic principles, and will be especially hard if you've just started learning programming. Here's a concise but more advanced solution:
import itertools
result = {}
for grid in [lst, zip(*lst)]:  # rows first, then columns
    for row in grid:
        for key, group in itertools.groupby(row):
            result[key] = max(len(list(group)), result.get(key, 0))
Then result is:
{'A': 2, 'X': 3, 'e': 3, '.': 4}
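If only one symbol is of interest, the groupby idea can be wrapped in a function. This is a minimal sketch for Python 3; the name longest_run is made up for illustration, and it assumes every row has the same length, since zip(*grid) stops at the shortest row.

```python
from itertools import groupby


def longest_run(symbol, grid):
    """Length of the longest horizontal or vertical run of `symbol` in `grid`."""
    best = 0
    # First pass over the rows, second pass over the columns (the transposed grid).
    for lines in (grid, zip(*grid)):
        for line in lines:
            for key, group in groupby(line):
                if key == symbol:
                    best = max(best, sum(1 for _ in group))
    return best


lst = [['.', '.', '.', 'e'],
       ['A', 'A', '.', 'e'],
       ['.', '.', '.', 'e'],
       ['.', 'X', 'X', 'X'],
       ['.', '.', '.', '.'],
       ['.', '.', '.', 'e']]

for symbol in 'eAXf':
    print(symbol, longest_run(symbol, lst))
# e 3
# A 2
# X 3
# f 0
```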

Iterating through multidimensional lists?

Sorry if obvious question, I'm a beginner and my google-fu has failed me.
I am writing a tool that searches through text for alliteration. I have a multi-dimensional list: [[e,a,c,h], [w,o,r,d], [l,o,o,k,s], [l,i,k,e], [t,h,i,s]]
What I want is to iterate through the items in the main list, checking the [0] index of each item to see if it is equal to the [0] index of the FOLLOWING item.
def alit_finder(multi_level_list):
    for i in multi_level_list:
        if i[0] == multi_level_list[i + 1][0] and i != multi_level_list[-1]:
            print i, multi_level_list[i + 1]
I'm getting a TypeError: can only concatenate list (not "int") to list.
So [i + 1] is not the right way to indicate 'the item which has an index equal to the index of i plus one'. However, [ + 1] is not working, either: that seems to return ANY two words in the list that have the same letter at word[0].
How do I refer to 'the following item' in this for statement?
ETA: Thank you all! I appreciate your time and explanations as to what exactly I was doing wrong here!
In a normal for-each loop like you have, you only get access to one element at a time:
for x in lst:
    print("I can only see", x)
So you need to iterate over the indexes instead, for example:
for i in range(len(lst) - 1):
    print("current =", lst[i], "next =", lst[i+1])
By the way, as a convention, it's a good idea to use variables named i to always refer to loop indexes. In your original code, part of the confusion is that you tried to use i as the list element at first, and later as an index, and it can't be both!
I think you want something like this:
def alit_finder(multi_level_list):
    l=len(multi_level_list)
    for i in xrange(l-1):
        if multi_level_list[i][0] == multi_level_list[i + 1][0]:
            print multi_level_list[i], multi_level_list[i + 1]
li=[['e','a','c','h'], ['w','o','r','d'], ['l','o','o','k','s'], ['l','i','k','e'], ['t','h','i','s']]
alit_finder(li)
Result:
['l', 'o', 'o', 'k', 's'] ['l', 'i', 'k', 'e']
You could use i as the index and x as the element of an enumerated list:
def alit_finder(multi_level_list):
    for i, x in enumerate(multi_level_list):
        if i == len(multi_level_list) - 1:
            break # prevent index out of range error
        if x[0] == multi_level_list[i + 1][0] and x != multi_level_list[-1]:
            return x, multi_level_list[i + 1]
word_list = [['e','a','c','h'], ['w','o','r','d'], ['l','o','o','k','s'],
             ['l','i','k','e'], ['t','h','i','s']]
print alit_finder(word_list)
# (['l', 'o', 'o', 'k', 's'], ['l', 'i', 'k', 'e'])
Something like this will work:
matching_indices = [i for i, (w1, w2) in enumerate(zip(multi_level_list, multi_level_list[1:])) if w1[0] == w2[0]]
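The zip idiom above pairs each item with the one that follows it, which avoids index arithmetic altogether. Here is a minimal Python 3 sketch that returns the matching pairs rather than their indexes; the name alliterative_pairs is made up for illustration, and it assumes no word is empty.

```python
def alliterative_pairs(words):
    """Return (word, next_word) for every neighbouring pair sharing a first letter."""
    # zip stops at the shorter input, so the last word is never paired with nothing
    return [(w1, w2) for w1, w2 in zip(words, words[1:]) if w1[0] == w2[0]]


word_list = [list("each"), list("word"), list("looks"), list("like"), list("this")]
print(alliterative_pairs(word_list))
# [(['l', 'o', 'o', 'k', 's'], ['l', 'i', 'k', 'e'])]

# Plain strings work the same way, since strings can be indexed too.
print(alliterative_pairs(["each", "word", "looks", "like", "this"]))
# [('looks', 'like')]
```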

Removing an element from python string recursively?

I'm trying to figure out how to write a program that would remove a given element from a python string recursively. Here's what I have so far:
def remove(x,s):
    if x == s[0]:
        return ''
    else:
        return s[0] + remove(x,s[1:])
When testing this code on the input remove('t', 'wait a minute'), it seems to work up until it reaches the first 't', but the code then terminates instead of continuing to go through the string. Does anyone have any ideas of how to fix this?
In your code, you return '' when you run into the character you're removing.
This will drop the rest of the string.
You want to keep going through the string instead (also pass x in recursive calls and add a base case):
def remove(x, s):
    if not s:
        return ''
    if x == s[0]:
        return remove(x, s[1:])
    else:
        return s[0] + remove(x, s[1:])
Also, in case you didn't know, you can use str.replace() to achieve this:
>>> 'wait a minute'.replace('t', '')
'wai a minue'
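A quick way to check the recursive version is to compare it with str.replace. The sketch below repeats the corrected function so that it runs on its own under Python 3; the helper too_deep is made up for illustration and only demonstrates that one recursive call per character runs into the interpreter's recursion limit on long inputs.

```python
import sys


def remove(x, s):
    if not s:
        return ''
    if x == s[0]:
        return remove(x, s[1:])
    return s[0] + remove(x, s[1:])


def too_deep(x, s):
    """Return True if remove(x, s) exceeds the recursion limit."""
    try:
        remove(x, s)
    except RecursionError:
        return True
    return False


print(remove('t', 'wait a minute') == 'wait a minute'.replace('t', ''))
# True

long_text = 'ab' * sys.getrecursionlimit()  # more characters than allowed frames
print(too_deep('a', long_text))
# True
print(len(long_text.replace('a', '')) == sys.getrecursionlimit())
# True
```

For anything beyond an exercise in recursion, str.replace is therefore the safer choice.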
def Remove(s,e):
    # In Python 3 filter returns an iterator: wrap the result in list(...) or ''.join(...)
    return filter(lambda x: x!= e, s)
Here is an example for your test
sequence = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
RemoveElement = ['d','c']
print(list(filter(lambda x: x not in RemoveElement, sequence)))
#['a', 'b', 'e', 'f', 'g', 'h']
If you are just replacing/removing a character like 't', you could use a generator expression with str.join:
s = 'wait a minute'
xs = ''.join(x for x in s if x != 't')

What is the python equivalent to perl "a".."azc"

In Perl, to get a list of all strings from "a" to "azc", the only thing to do is use the range operator:
perl -le 'print "a".."azc"'
What I want is a list of strings:
["a", "b", ..., "z", "aa", ..., "az" ,"ba", ..., "azc"]
I suppose I can use ord and chr, looping over and over, this is simple to get for "a" to "z", eg:
>>> [chr(c) for c in range(ord("a"), ord("z") + 1)]
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
But a bit more complex for my case, here.
Thanks for any help !
Generator version:
from string import ascii_lowercase
from itertools import product
def letterrange(last):
    for k in range(len(last)):
        for x in product(ascii_lowercase, repeat=k+1):
            result = ''.join(x)
            yield result
            if result == last:
                return
EDIT: @ihightower asks in the comments:
I have no idea what I should do if I want to print from 'b' to 'azc'.
So you want to start with something other than 'a'. Just discard anything before the start value:
def letterrange(first, last):
    for k in range(len(last)):
        for x in product(ascii_lowercase, repeat=k+1):
            result = ''.join(x)
            if first:
                if first != result:
                    continue
                else:
                    first = None
            yield result
            if result == last:
                return
A suggestion purely based on iterators:
import string
import itertools
def string_range(letters=string.ascii_lowercase, start="a", end="z"):
    return itertools.takewhile(end.__ne__, itertools.dropwhile(start.__ne__, (x for i in itertools.count(1) for x in itertools.imap("".join, itertools.product(letters, repeat=i)))))
print list(string_range(end="azc"))
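Two details of the one-liner above are worth noting: itertools.imap exists only in Python 2, and takewhile stops before yielding the end value, so "azc" itself is left out. The following is a minimal Python 3 sketch of the same iterator idea that includes the end value; the argument order is changed for illustration, and it assumes both start and end are built from letters, with start not after end, otherwise the generator never terminates.

```python
import string
from itertools import count, dropwhile, product


def string_range(start, end, letters=string.ascii_lowercase):
    """Yield strings from `start` to `end` inclusive, shortest strings first."""
    # Endless stream: a, b, ..., z, aa, ab, ..., zz, aaa, ...
    candidates = (
        "".join(chars)
        for length in count(1)
        for chars in product(letters, repeat=length)
    )
    # Skip everything before `start`, then yield until `end` has been produced.
    for s in dropwhile(lambda s: s != start, candidates):
        yield s
        if s == end:
            return


print(list(string_range("y", "ab")))
# ['y', 'z', 'aa', 'ab']
print(len(list(string_range("a", "azc"))))
# 1355
```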
Use the product call in itertools, and ascii_letters from string.
from string import ascii_letters
from itertools import product
if __name__ == '__main__':
    values = []
    for i in xrange(1, 4):
        values += [''.join(x) for x in product(ascii_letters[:26], repeat=i)]
    print values
Here's a better way to do it, though you need a conversion function:
for i in xrange(int('a', 36), int('azd', 36)):
    if base36encode(i).isalpha():
        print base36encode(i, lower=True)
And here's your function (thank you Wikipedia):
def base36encode(number, alphabet='0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ', lower=False):
    '''
    Convert positive integer to a base36 string.
    '''
    if lower:
        alphabet = alphabet.lower()
    if not isinstance(number, (int, long)):
        raise TypeError('number must be an integer')
    if number < 0:
        raise ValueError('number must be positive')
    # Special case for small numbers
    if number < 36:
        return alphabet[number]
    base36 = ''
    while number != 0:
        number, i = divmod(number, 36)
        base36 = alphabet[i] + base36
    return base36
I tacked on the lowercase conversion option, just in case you wanted that.
I generalized the accepted answer so that it can start in the middle and use alphabets other than lowercase:
from string import ascii_lowercase, ascii_uppercase
from itertools import product
def letter_range(first, last, letters=ascii_lowercase):
    # string lengths run from len(first) through len(last), inclusive
    for k in range(len(first), len(last) + 1):
        for x in product(letters, repeat=k):
            result = ''.join(x)
            if len(x) != len(first) or result >= first:
                yield result
                if result == last:
                    return
print list(letter_range('a', 'zzz'))
print list(letter_range('BA', 'DZA', ascii_uppercase))
from itertools import product
from string import ascii_lowercase
def strrange(end):
    values = []
    for i in range(1, len(end) + 1):
        values += [''.join(x) for x in product(ascii_lowercase, repeat=i)]
    return values[:values.index(end) + 1]
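A different technique from the answers above is to treat the strings as numbers in bijective base 26 (the scheme spreadsheet column names use), so that a range becomes ordinary integer arithmetic and can start anywhere without generating and discarding earlier strings. This is a minimal Python 3 sketch assuming lowercase ASCII input; the names to_number, to_letters and letter_span are made up for illustration.

```python
from string import ascii_lowercase


def to_number(s):
    """Map 'a' -> 1, 'z' -> 26, 'aa' -> 27, ... (bijective base 26)."""
    n = 0
    for ch in s:
        n = n * 26 + (ord(ch) - ord('a') + 1)
    return n


def to_letters(n):
    """Inverse of to_number for n >= 1."""
    chars = []
    while n > 0:
        n, rem = divmod(n - 1, 26)  # shift by one because there is no zero digit
        chars.append(ascii_lowercase[rem])
    return ''.join(reversed(chars))


def letter_span(first, last):
    """List of strings from `first` to `last` inclusive."""
    return [to_letters(n) for n in range(to_number(first), to_number(last) + 1)]


print(letter_span('y', 'ab'))
# ['y', 'z', 'aa', 'ab']
print(len(letter_span('a', 'azc')))
# 1355
```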
