Return at least X results from split

split has a maxsplit parameter, which is useful when you want at most X results. Is there something similar that returns at least X results and populates the rest with None? I'd like to be able to write
a, b, c = 'foo,bar'.magic_split(',', 3)
and have a=foo, b=bar and c=None.
Any ideas how to write such a function?
Update: I ended up with a solution that combines two of the answers:
>>> def just(n, iterable, fill=None):
...     return (list(iterable) + [fill] * n)[:n]
...
>>> just(3, 'foo,bar'.split(','))
['foo', 'bar', None]

One way would be:
from itertools import chain
from itertools import repeat
from itertools import islice
def magic_split(seq, sep, n, def_value=None):
    return list(islice(chain(seq.split(sep), repeat(def_value)), n))
You could just return the return value of islice if you don't need the list.
If you don't want the values to be cut off when n is less than the number of split elements in seq, the modification is trivial:
def magic_split(seq, sep, n, def_value=None):
    elems = seq.split(sep)
    if len(elems) >= n:
        return elems
    return list(islice(chain(elems, repeat(def_value)), n))
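The lazy variant mentioned in this answer (returning the islice object instead of a list) might look like the sketch below. The name magic_split_lazy is made up for illustration; tuple unpacking consumes the iterator directly, so no intermediate list is needed.

```python
from itertools import chain, islice, repeat

def magic_split_lazy(text, sep, n, fill=None):
    """Return an iterator over exactly n fields, padded with fill (hypothetical helper)."""
    # chain() appends an endless supply of fill values; islice() cuts it off at n.
    return islice(chain(text.split(sep), repeat(fill)), n)

a, b, c = magic_split_lazy('foo,bar', ',', 3)
print(a, b, c)  # foo bar None
```

Note that this version still truncates when the string has more than n fields.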

There is no such parameter to str.split(). A hack to achieve this would be
a, b, c = ('foo,bar'.split(',', 2) + [None] * 3)[:3]
Not sure if I recommend this code, though.

I would use a more general function for that:
def fill(iterable, n):
    tmp = tuple(iterable)
    return tmp + (None,)*(n - len(tmp))
Then:
a, b, c = fill('foo,bar'.split(','), 3)

Since you ask for a string method, you can start by deriving from str:
>>> class magicstr(str):
...     def magic_split(self, sep=None, mlen=0):
...         parts = self.split(sep)
...         return parts + [None] * (mlen - len(parts))
...
>>> test = magicstr("hello there, ok?")
>>> test.magic_split(",", 3)
['hello there', ' ok?', None]

Related

Evaluating a function over a list in Python - without using loops

There is a problem in Python which involves the evaluation of a function over a list of numbers which are provided as inputs to the following function:
f(y) = sin(3y + pi/3) + cos(4y - pi/7)
I don't think MathJax tools are available on StackOverflow so the above is the best I can do.
There are four outputs to the function: An array or list containing the values obtained by the function for each element of the input list, the minimum and maximum values in the output array / list, and an array or list of the differences between successive values obtained by the function.
Here is the code so far. We assume that only sensible inputs are passed to the function.
import sympy
def minMaxDiffValues(lst):
    y = sympy.symbols('y')
    f = sympy.sin(3*y + sympy.pi/3) + sympy.cos(4*y - sympy.pi/7)
    values = []
    for n in lst:
        values.append(f.subs(y,n))
    differences = []
    for i in range(len(values) - 1):
        differences.append(values[i + 1] - values[i])
    print(values)
    print(min(values))
    print(max(values))
    print(differences)
As far as I know, the above code gets the job done; I've opted to work with lists, even though I am familiar with numpy. I'll replace the print statements with a single return statement; for now I'm printing the outputs to make sure that they are correct.
The only issue is that the problem prevents the use of loops; thus I am uncertain as to how to approach such a problem for the first and last function outputs.
Is it possible to write the above function without using any loops?

You could use list comprehensions:
import sympy
def minMaxDiffValues(lst):
    y = sympy.symbols('y')
    f = sympy.sin(3*y + sympy.pi/3) + sympy.cos(4*y - sympy.pi/7)
    values = [f.subs(y,n) for n in lst]
    differences = [values[i+1] - values[i] for i in range(len(values)-1)]
    print(values)
    print(min(values))
    print(max(values))
    print(differences)
If you wanted to, you could also use the pairwise recipe from the itertools module docs:
import itertools
import sympy
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)
def minMaxDiffValues(lst):
    y = sympy.symbols('y')
    f = sympy.sin(3*y + sympy.pi/3) + sympy.cos(4*y - sympy.pi/7)
    values = [f.subs(y,n) for n in lst]
    differences = [y - x for (x, y) in pairwise(values)]
    print(values)
    print(min(values))
    print(max(values))
    print(differences)

Using map is a way to apply a function to a list of values in a compact fashion:
>>> from sympy import sin, cos, pi
>>> f = lambda y: sin(3*y + pi/3) + cos(4*y - pi/7)
>>> vals = list(map(f, lst))
>>> d = lambda i: vals[i] - vals[i-1]
>>> difs = list(map(d, range(1, len(vals))))
And there is no visible 'for'. But as @hpaulj notes, there's one under the hood somewhere.
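For comparison, here is a loop-free sketch that returns all four outputs at once. It assumes plain floats and the standard math module are acceptable in place of sympy, and the name min_max_diff_values is only illustrative. Passing two sequences to map with operator.sub pairs each value with its predecessor, so no index arithmetic is needed.

```python
import math
from operator import sub

def f(y):
    # f(y) = sin(3y + pi/3) + cos(4y - pi/7), evaluated numerically (assumption: floats are fine)
    return math.sin(3 * y + math.pi / 3) + math.cos(4 * y - math.pi / 7)

def min_max_diff_values(lst):
    values = list(map(f, lst))
    # map stops at the shorter sequence, giving values[i + 1] - values[i] for each i
    differences = list(map(sub, values[1:], values))
    return values, min(values), max(values), differences

values, lowest, highest, differences = min_max_diff_values([0.0, 0.5, 1.0])
```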

How to make all combinations of many functions in python?

So I have
x = 3
def first(x):
    return x+2
def second(x):
    return x/2
def third(x):
    return x*4
I would like to make a pipe of functions like:
first -> second -> third
but for all combinations of functions:
like first -> second, first -> third
and get the value of x each time for each combination.
I don't just need to multiply them; I need to be able to make multiple combinations of various lengths.
Here it's just a fixed number of combinations:
How to multiply functions in python?
Regards and thanks

First the combinations part:
>>> functions = [first, second, third]
>>> from itertools import combinations, permutations
>>> for n in range(len(functions)):
...     for comb in combinations(functions, n + 1):
...         for perm in permutations(comb, len(comb)):
...             print('_'.join(f.__name__ for f in perm))
...
first
second
third
first_second
second_first
first_third
third_first
second_third
third_second
first_second_third
first_third_second
second_first_third
second_third_first
third_first_second
third_second_first
Next, the composing part: steal the @Composable decorator from the question How to multiply functions in python? and use it to compose functions from each permutation.
from operator import mul
from functools import reduce
d = {}  # maps each pipeline name to its composed function
for n in range(len(functions)):
    for comb in combinations(functions, n + 1):
        for perm in permutations(comb, len(comb)):
            func_name = '_'.join(f.__name__ for f in perm)
            func = reduce(mul, [Composable(f) for f in perm])
            d[func_name] = func
Now you have a namespace of functions (actually callable classes), demo:
>>> f = d['third_first_second']
>>> f(123)
254.0
>>> third(first(second(123)))
254.0
>>> ((123 / 2) + 2) * 4
254.0
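If you would rather not depend on the Composable class from the linked question, the same namespace can be sketched with functools.reduce alone. The names compose and pipelines are made up for this example. As in the demo above, a name such as third_first_second means third(first(second(x))), so the rightmost function is applied first.

```python
from functools import reduce
from itertools import permutations

def first(x):
    return x + 2

def second(x):
    return x / 2

def third(x):
    return x * 4

def compose(*funcs):
    """compose(f, g, h)(x) == f(g(h(x))) (hypothetical helper)."""
    return reduce(lambda f, g: lambda x: f(g(x)), funcs)

functions = [first, second, third]
pipelines = {}
for length in range(1, len(functions) + 1):
    # permutations(functions, length) yields every ordered selection of that length
    for perm in permutations(functions, length):
        pipelines['_'.join(f.__name__ for f in perm)] = compose(*perm)

print(pipelines['third_first_second'](123))  # 254.0
```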

compare 2 strings for common substring

I wish to find the longest common substring of 2 given strings recursively. I have written this code, but it is too inefficient. Is there a way I can do it in O(m*n), where m and n are the respective lengths of the strings? Here's my code:
def lcs(x,y):
    if len(x)==0 or len(y)==0:
        return " "
    if x[0]==y[0]:
        return x[0] + lcs(x[1:],y[1:])
    t1 = lcs(x[1:],y)
    t2 = lcs(x,y[1:])
    if len(t1)>len(t2):
        return t1
    else:
        return t2
x = str(input('enter string1:'))
y = str(input('enter string2:'))
print(lcs(x,y))

You need to memoize your recursion. Without that, you will end up with an exponential number of calls since you will be repeatedly solving the same problem over and over again. To make the memoized lookups more efficient, you can define your recursion in terms of the suffix lengths, instead of the actual suffixes.
You can also find the pseudocode for the DP on Wikipedia.
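A minimal sketch of that idea, assuming Python 3 and functools.lru_cache for the memoization: the recursion is keyed on start indices (equivalent to suffix lengths) rather than on sliced strings, so there are at most (m+1)*(n+1) distinct subproblems. Like the code in the question, it returns a common subsequence, meaning the matched characters need not be adjacent. Very long inputs may still hit the recursion limit.

```python
from functools import lru_cache

def lcs(x, y):
    @lru_cache(maxsize=None)
    def solve(i, j):
        # Longest common subsequence of the suffixes x[i:] and y[j:]
        if i == len(x) or j == len(y):
            return ""
        if x[i] == y[j]:
            return x[i] + solve(i + 1, j + 1)
        skip_x = solve(i + 1, j)
        skip_y = solve(i, j + 1)
        return skip_x if len(skip_x) > len(skip_y) else skip_y
    return solve(0, 0)

print(len(lcs("AGGTAB", "GXTXAYB")))  # 4
```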

Here is a naive non-recursive solution which uses the powerset() recipe from itertools:
from itertools import chain, combinations, product
def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))
def naive_lcs(a, b):
    return ''.join(max(set(powerset(a)) & set(powerset(b)), key=len))
It has problems:
>>> naive_lcs('ab', 'ba')
'b'
>>> naive_lcs('ba', 'ab')
'b'
There can be more than one solution for some pairs of strings, but my program picks one arbitrarily.
Also, since any of the combinations can be the longest common one, and since calculating these combinations takes O(2 ^ n) time, this solution doesn't compute in O(n * m) time. With Dynamic Programming and memoizing OTOH we can find a solution that, in theory, should perform better:
from functools import lru_cache
@lru_cache()
def _dynamic_lcs(xs, ys):
    if not (xs and ys):
        return set(['']), 0
    elif xs[-1] == ys[-1]:
        result, rlen = _dynamic_lcs(xs[:-1], ys[:-1])
        return set(each + xs[-1] for each in result), rlen + 1
    else:
        xlcs, xlen = _dynamic_lcs(xs, ys[:-1])
        ylcs, ylen = _dynamic_lcs(xs[:-1], ys)
        if xlen > ylen:
            return xlcs, xlen
        elif xlen < ylen:
            return ylcs, ylen
        else:
            return xlcs | ylcs, xlen
def dynamic_lcs(xs, ys):
    result, _ = _dynamic_lcs(xs, ys)
    return result
if __name__ == '__main__':
    seqs = list(powerset('abcde'))
    for a, b in product(seqs, repeat=2):
        assert naive_lcs(a, b) in dynamic_lcs(a, b)
dynamic_lcs() also solves the problem that some pairs of strings can have multiple longest common subsequences. The result is the set of these, instead of one string. Finding the set of all common subsequences, though, is still of exponential complexity.
Thanks to Pradhan for reminding me of Dynamic Programming and memoization.

How to find a missing number from a list?

How do I find the missing number from a sorted list the pythonic way?
a=[1,2,3,4,5,7,8,9,10]
I have come across this post, but is there a more efficient way to do this?

>>> a=[1,2,3,4,5,7,8,9,10]
>>> sum(range(a[0], a[-1] + 1)) - sum(a)
6
alternatively (using the sum of AP series formula, with a[-1] - a[0] + 1 terms)
>>> (a[-1] - a[0] + 1) * (a[-1] + a[0]) // 2 - sum(a)
6
For generic cases when multiple numbers may be missing, you can formulate an O(n) approach.
>>> a=[1,2,3,4,7,8,10]
>>> from itertools import imap, chain
>>> from operator import sub
>>> print list(chain.from_iterable((a[i] + d for d in xrange(1, diff))
...                                for i, diff in enumerate(imap(sub, a[1:], a))
...                                if diff > 1))
[5, 6, 9]
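The same O(n) idea can be sketched for Python 3 without imap or xrange. The function name find_missing is illustrative, and the input is assumed to be sorted with no duplicates. Each adjacent pair contributes the integers strictly between its two members.

```python
def find_missing(sorted_numbers):
    """Return every integer absent between consecutive elements of a sorted list."""
    missing = []
    for current, following in zip(sorted_numbers, sorted_numbers[1:]):
        # range() is empty when the two values are adjacent
        missing.extend(range(current + 1, following))
    return missing

print(find_missing([1, 2, 3, 4, 7, 8, 10]))  # [5, 6, 9]
```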

This should work:
a = [1, 3, 4, 5, 7, 8, 9, 10]
b = [x for x in range(a[0], a[-1] + 1)]
a = set(a)
print(list(a ^ set(b)))
>>> [2, 6]

1 + 2 + 3 + ... + (n - 1) + n = (n) * (n + 1)/2
so the missing number is:
(a[-1] * (a[-1] + 1))/2 - sum(a)
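The formula above assumes the list starts at 1. A small sketch that drops this assumption uses the general arithmetic-series sum, (first + last) * count / 2. The name missing_by_sum is made up here, and the input is assumed to be sorted, with exactly one interior number missing.

```python
def missing_by_sum(a):
    """Return the single missing integer between a[0] and a[-1] (hypothetical helper)."""
    first, last = a[0], a[-1]
    count = last - first + 1             # how many numbers the full run would contain
    expected = (first + last) * count // 2
    return expected - sum(a)

print(missing_by_sum([1, 2, 3, 4, 5, 7, 8, 9, 10]))  # 6
print(missing_by_sum([3, 4, 6, 7]))                   # 5
```

Integer division keeps the result an int in Python 3; the product is always even, so nothing is lost.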

set(range(a[len(a)-1])[1:]) - set(a)
Take the set of all numbers minus the set of given.

And another itertools way:
from itertools import count
a=[1,2,3,4,5,7,8,9,10]
nums = (expected for actual, expected in zip(a, count(a[0])) if actual != expected)
next(nums, None)
# 6

This will handle the cases when the first or last number is missing.
>>> a=[1,2,3,4,5,7,8,9,10]
>>> n = len(a) + 1
>>> (n*(n+1)//2) - sum(a)
6

If several numbers are missing from the list (this reports the first missing number of each gap):
>>> a=[1,2,3,4,5,7,8,10]
>>> [(e1+1) for e1,e2 in zip(a, a[1:]) if e2-e1 != 1]
[6, 9]

def find(arr):
    for x in range(0,len(arr) -1):
        if arr[x+1] - arr[x] != 1:
            print(arr[x] + 1)

A simple solution for the above problem; it also finds multiple missing elements.
a = [1,2,3,4,5,8,9,10]
missing_element = []
for i in range(a[0], a[-1]+1):
    if i not in a:
        missing_element.append(i)
print(missing_element)
o/p:
[6,7]

Here is some simple logic for finding missing numbers in a list.
l=[-10,-5,2,4,5,9,20]
s=l[0]
e=l[-1]
x=sorted(range(s,e+1))
l_1=[]
for i in x:
    if i not in l:
        l_1.append(i)
print(l_1)

def findAllMissingNumbers(a):
    b = sorted(a)
    return list(set(range(b[0], b[-1])) - set(b))

L=[-5,1,2,3,4,5,7,8,9,10,13,55]
missing=[]
for i in range(L[0],L[-1]):
    if i not in L:
        missing.append(i)
print(missing)

A simple list comprehension approach that will work with multiple (non-consecutive) missing numbers.
def find_missing(lst):
    """Create list of integers missing from lst."""
    return [lst[x] + 1 for x in range(len(lst) - 1)
            if lst[x] + 1 != lst[x + 1]]

There is a perfectly working solution by @Abhiji. I would like to extend his answer with the option to define a granularity value. This might be necessary if the list should be checked for missing values with a step size greater than 1:
from itertools import imap, chain
from operator import sub
granularity = 3600
data = [3600, 10800, 14400]
print list(
    chain.from_iterable(
        (data[i] + d for d in xrange(1, diff) if d % granularity == 0)
        for i, diff in enumerate(imap(sub, data[1:], data))
        if diff > granularity
    )
)
The code above would produce the following output: [7200].
As this code snippet uses a lot of nested functions, I'd also like to provide a quick list of references that helped me to understand the code:
Python enumerate()
Python imap()
Python chain.from_iterable()

Less efficient for very large lists, but here's my version for the Sum formula:
def missing_number_sum(arr):
    return int((arr[-1]+1) * arr[-1]/2) - sum(arr)

If the range is known and the list is given, the below approach will work.
a=[1,2,3,4,5,7,8,9,10]
missingValues = [i for i in range(1, 10+1) if i not in a]
print(missingValues)
# o/p: [6]

set(range(1,a[-1])) - set(a)
Compute the difference of the two sets: the full range minus the given numbers.

I used the index position.
This way I compare the index and the value.
a=[0,1,2,3,4,5,7,8,9,10]
for i in a:
    print(i==a.index(i))

Best way to determine if a sequence is in another sequence?

This is a generalization of the "string contains substring" problem to (more) arbitrary types.
Given a sequence (such as a list or tuple), what's the best way of determining whether another sequence is inside it? As a bonus, it should return the index of the element where the subsequence starts:
Example usage (Sequence in Sequence):
>>> seq_in_seq([5,6], [4,'a',3,5,6])
3
>>> seq_in_seq([5,7], [4,'a',3,5,6])
-1 # or None, or whatever
So far, I just rely on brute force and it seems slow, ugly, and clumsy.

I second the Knuth-Morris-Pratt algorithm. By the way, your problem (and the KMP solution) is exactly recipe 5.13 in Python Cookbook 2nd edition. You can find the related code at http://code.activestate.com/recipes/117214/
It finds all the correct subsequences in a given sequence, and should be used as an iterator:
>>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,6]): print(s)
3
>>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,7]): print(s)
(nothing)

Here's a brute-force approach O(n*m) (similar to @mcella's answer). It might be faster than the Knuth-Morris-Pratt algorithm implementation in pure Python O(n+m) (see @Gregg Lind answer) for small input sequences.
#!/usr/bin/env python
def index(subseq, seq):
    """Return an index of `subseq`uence in the `seq`uence.
    Or `-1` if `subseq` is not a subsequence of the `seq`.
    The time complexity of the algorithm is O(n*m), where
        n, m = len(seq), len(subseq)
    >>> index([1,2], range(5))
    1
    >>> index(range(1, 6), range(5))
    -1
    >>> index(range(5), range(5))
    0
    >>> index([1,2], [0, 1, 0, 1, 2])
    3
    """
    i, n, m = -1, len(seq), len(subseq)
    try:
        while True:
            i = seq.index(subseq[0], i + 1, n - m + 1)
            if subseq == seq[i:i + m]:
                return i
    except ValueError:
        return -1
if __name__ == '__main__':
    import doctest; doctest.testmod()
I wonder how large is the small in this case?

A simple approach: Convert to strings and rely on string matching.
Example using lists of strings:
>>> f = ["foo", "bar", "baz"]
>>> g = ["foo", "bar"]
>>> ff = str(f).strip("[]")
>>> gg = str(g).strip("[]")
>>> gg in ff
True
Example using tuples of strings:
>>> x = ("foo", "bar", "baz")
>>> y = ("bar", "baz")
>>> xx = str(x).strip("()")
>>> yy = str(y).strip("()")
>>> yy in xx
True
Example using lists of numbers:
>>> f = [1 , 2, 3, 4, 5, 6, 7]
>>> g = [4, 5, 6]
>>> ff = str(f).strip("[]")
>>> gg = str(g).strip("[]")
>>> gg in ff
True

Same thing as string matching sir...Knuth-Morris-Pratt string matching

>>> def seq_in_seq(subseq, seq):
...     offset = 0  # number of leading elements already discarded from seq
...     while subseq[0] in seq:
...         index = seq.index(subseq[0])
...         if subseq == seq[index:index + len(subseq)]:
...             return offset + index
...         else:
...             offset += index + 1
...             seq = seq[index + 1:]
...     else:
...         return -1
...
>>> seq_in_seq([5,6], [4,'a',3,5,6])
3
>>> seq_in_seq([5,7], [4,'a',3,5,6])
-1
Sorry I'm not an algorithm expert, it's just the fastest thing my mind can think about at the moment, at least I think it looks nice (to me) and I had fun coding it. ;-)
Most probably it's the same thing your brute force approach is doing.

Brute force may be fine for small patterns.
For larger ones, look at the Aho-Corasick algorithm.

Here is another KMP implementation:
from itertools import tee
def seq_in_seq(seq1,seq2):
    '''
    Return the index where seq1 appears in seq2, or -1 if
    seq1 is not in seq2, using the Knuth-Morris-Pratt algorithm
    based heavily on code by Neale Pickett <neale@woozle.org>
    found at: woozle.org/~neale/src/python/kmp.py
    >>> seq_in_seq(range(3),range(5))
    0
    >>> seq_in_seq(range(3)[-1:],range(5))
    2
    >>> seq_in_seq(range(6),range(5))
    -1
    '''
    def compute_prefix_function(p):
        m = len(p)
        pi = [0] * m
        k = 0
        for q in range(1, m):
            while k > 0 and p[k] != p[q]:
                k = pi[k - 1]
            if p[k] == p[q]:
                k = k + 1
            pi[q] = k
        return pi
    t,p = list(tee(seq2)[0]), list(tee(seq1)[0])
    m,n = len(p),len(t)
    pi = compute_prefix_function(p)
    q = 0
    for i in range(n):
        while q > 0 and p[q] != t[i]:
            q = pi[q - 1]
        if p[q] == t[i]:
            q = q + 1
        if q == m:
            return i - m + 1
    return -1

I'm a bit late to the party, but here's something simple using strings:
>>> def seq_in_seq(sub, full):
...     f = ''.join([repr(d) for d in full]).replace("'", "")
...     s = ''.join([repr(d) for d in sub]).replace("'", "")
...     #return f.find(s) #<-- not reliable for finding indices in all cases
...     return s in f
...
>>> seq_in_seq([5,6], [4,'a',3,5,6])
True
>>> seq_in_seq([5,7], [4,'a',3,5,6])
False
>>> seq_in_seq([4,'abc',33], [4,'abc',33,5,6])
True
As noted by Ilya V. Schurov, the find method in this case will not return the correct indices with multi-character strings or multi-digit numbers.

For what it's worth, I tried using a deque like so:
from collections import deque
from itertools import islice
def seq_in_seq(needle, haystack):
    """Generator of indices where needle is found in haystack."""
    needle = deque(needle)
    haystack = iter(haystack)  # Works with iterators/streams!
    length = len(needle)
    # Deque will automatically call deque.popleft() after deque.append()
    # with the `maxlen` set equal to the needle length.
    window = deque(islice(haystack, length), maxlen=length)
    if needle == window:
        yield 0  # Match at the start of the haystack.
    for index, value in enumerate(haystack, start=1):
        window.append(value)
        if needle == window:
            yield index
One advantage of the deque implementation is that it makes only a single linear pass over the haystack. So if the haystack is streaming then it will still work (unlike the solutions that rely on slicing).
The solution is still brute-force, O(n*m). Some simple local benchmarking showed it was ~100x slower than the C-implementation of string searching in str.index.

Another approach, using sets:
set([5,6])== set([5,6])&set([4,'a',3,5,6])
True
