Is there a better way to write the following method in python? - python

I am writing a small Python program that finds the lone missing element of an arithmetic progression (the starting element may be positive or negative, and the series may be ascending or descending).
For example, if the input is 1 3 5 9 11, the function should return 7, as this is the lone missing element in the above AP series.
The input format: the input elements are separated by a single space, not by commas as is commonly done.
Here is the code:
def find_missing_elm_ap_series(n, series):
    ap = series
    ap = ap.split(' ')
    ap = [int(i) for i in ap]
    cd = []
    for i in range(n-1):
        cd.append(ap[i+1]-ap[i])
    common_diff = 0
    if len(set(cd)) == 1:
        print 'The series is complete'
        return series
    else:
        cd = [abs(i) for i in cd]
        common_diff = min(cd)
    if ap[0] > ap[1]:
        common_diff = (-1)*common_diff
    new_ap = []
    for i in range(n+1):
        new_ap.append(ap[0] + i*common_diff)
    missing_element = set(new_ap).difference(set(ap))
    return missing_element
where n is the length of the series provided (the series with the missing element: 5 in the above example).
I am sure there are other shorter and more elegant ways of writing this code in Python. Can anybody help?
Thanks
BTW: I am learning Python by myself, hence the question.

This is based on the fact that the missing element is exactly expected-sum(series) - actual-sum(series). The expected sum for a series with n elements starting at a and ending at b is (a+b)*n/2. The rest is Python:
def find_missing(series):
    A = map(int, series.split(' '))
    a, b, n, sumA = A[0], A[-1], len(A), sum(A)
    # Caveat: this check also passes when the missing element is exactly (a+b)/2, e.g. "1 3 7 9"
    if (a+b)*n/2 == sumA:
        return None #no element missing
    # the complete series has n+1 elements
    return (a+b)*(n+1)/2-sumA
print find_missing("1 3 5 9") #7
print find_missing("-1 1 3 5 9") #7
print find_missing("9 6 0") #3
print find_missing("1 2 3") #None
print find_missing("-3 1 3 5") #-1
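A sketch of the same sum idea for Python 3, where map returns an iterator and / is true division. The name find_missing_by_sum is made up for this example. The completeness check is swapped for a comparison of consecutive differences, because the sum comparison cannot tell a complete series from one whose missing element is the mean of the endpoints. It assumes at most one element is missing and that both endpoints are present:

```python
def find_missing_by_sum(series):
    """Return the single missing element, or None if the series is complete."""
    values = [int(token) for token in series.split()]
    # A complete progression has exactly one distinct consecutive difference.
    diffs = {b - a for a, b in zip(values, values[1:])}
    if len(diffs) <= 1:
        return None
    first, last, count = values[0], values[-1], len(values)
    # The complete series has count + 1 terms; its sum is always an integer,
    # so floor division is exact here.
    expected_sum = (first + last) * (count + 1) // 2
    return expected_sum - sum(values)


print(find_missing_by_sum("1 3 5 9 11"))
print(find_missing_by_sum("1 3 7 9"))
```

The second call is the case where the missing element (5) is the mean of the endpoints.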

Well... You can do it more simply, but it would completely change your algorithm.
First, you can prove that the step for the arithmetic progression is ap[1] - ap[0], unless ap[2] - ap[1] is lower in magnitude than it, in which case the missing element is between terms 0 and 1. (This is true as there is a single missing element.)
Then you can just compare each term ap[i] with ap[0] + i * step and return the first expected value that doesn't match.
Here is the source code (also implementing some minor shortcuts, such as grouping your first three lines into one):
def find_missing_elm_ap_series(n, series):
    ap = [int(i) for i in series.split(' ')]
    step = ap[1] - ap[0]
    # Strictly smaller second gap: the missing element is between terms 0 and 1
    if abs(ap[2] - ap[1]) < abs(step):
        return ap[0] + ap[2] - ap[1]
    for i, val in enumerate(ap): # And check position of missing element
        if ap[0] + i * step != val:
            return ap[0] + i * step
    return series # missing element not found

The code appears to work, but there is a slightly easier way: you don't have to look through all of the values to get the common difference. The following code only looks at the difference between the first and second elements and the difference between the last and second-to-last.
This works as long as only a single value is missing and the list has at least 3 elements, because the smaller (in magnitude) of those two differences is then the common difference.
def find_missing(prog):
    # First we cast them to numbers.
    items = [int(x) for x in prog.split()]
    #Then we compare the first and second
    first_to_second = items[1] - items[0]
    #then we compare the last to second last
    last_to_second_last = items[-1] - items[-2]
    #Now we have to care about which one is closest
    # to zero
    if abs(first_to_second) < abs(last_to_second_last):
        change = first_to_second
    else:
        change = last_to_second_last
    #Iterate through the list. As soon as we find a gap
    #that is larger than change, we fill in and return
    for i in range(1, len(items)):
        comp = items[i] - items[i-1]
        if comp != change:
            return items[i-1] + change
    #There was no gap
    return None
print(find_missing("1 3 5 9")) #7
print(find_missing("-1 1 3 5 9")) #7
print(find_missing("9 6 0")) #3
print(find_missing("1 2 3")) #None
The code above first determines the common difference from the two end gaps, then iterates until it finds a gap that differs from it and returns the value that was expected there.

Here's the way I thought about it: find the position of the largest difference (in magnitude) between consecutive elements of the array; then regenerate the expected number in the sequence from the other differences (which should all be the same and the smallest in magnitude in the differences list):
def find_missing(a):
    d = [a[i+1] - a[i] for i in range(len(a)-1)]
    # key=abs keeps this working for descending series, where the differences are negative
    i = d.index(max(d, key=abs))
    x = min(d, key=abs)
    return a[0] + (i+1)*x
print find_missing([1,3,5,9,11])
7
print find_missing([1,5,7,9,11])
3

Here are some ideas:
Passing the length of the series seems like a bad idea. The function can more easily calculate the length
There is no reason to assign series to ap first; just apply the operations to series and assign the result to ap.
When splitting the string, don't give the sep argument. If you don't give the argument, then consecutive white space will also be removed and leading and trailing white space will also be ignored. This is more friendly on the format of the data.
I've combined a few operations. For example the split and the list comprehension converting to integer make sense to group together. There is also no need to create cd as a list and then convert that to a set. Just build it as a set to start with.
I don't like that the function returns the original series in the case of no missing element. The value None would be more in keeping with the name of the function.
Your original function returned a one item set as the result. That seems odd, so I've used pop() to extract that item and return just the missing element.
The last item was more of an experiment with combining all of the code at the bottom into a single statement. Don't know if it is better, but it's something to think about. I built a set with all the correct numbers and a set with the given numbers and then subtracted them and returned the number that was missing.
Here's the code that I came up with:
def find_missing_elm_ap_series(series):
    ap = [int(i) for i in series.split()]
    n = len(ap)
    cd = {ap[i+1]-ap[i] for i in range(n-1)}
    if len(cd) == 1:
        print 'The series is complete'
        return None
    else:
        common_diff = min([abs(i) for i in cd])
        if ap[0] > ap[1]:
            common_diff = (-1)*common_diff
        return set(range(ap[0],ap[0]+common_diff*n,common_diff)).difference(set(ap)).pop()

Assuming the first and last items are not missing, we can also use range() or xrange() with the common difference as the step, which gets rid of n altogether. It can also return more than one missing item (although not reliably, depending on how many items are missing):
In [13]: def find_missing_elm(series):
             ap = map(int, series.split())
             cd = map(lambda x: x[1]-x[0], zip(ap[:-1], ap[1:]))
             if len(set(cd)) == 1:
                 print 'complete series'
                 return ap
             mcd = min(cd) if ap[0] < ap[1] else max(cd)
             sap = set(ap)
             return filter(lambda x: x not in sap, xrange(ap[0], ap[-1], mcd))
   ....:
In [14]: find_missing_elm('1 3 5 9 11 15')
Out[14]: [7, 13]
In [15]: find_missing_elm('15 11 9 5 3 1')
Out[15]: [13, 7]

Related

How to add the last two elements in a list and add the sum to existing list

I am learning Python and using it to work thru a challenge found in Project Euler. Unfortunately, I cannot seem to get around this problem.
The problem:
Even Fibonacci numbers
Each new term in the Fibonacci sequence is generated by adding the
previous two terms. By starting with 1 and 2, the first 10 terms will
be:
1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
By considering the terms in the Fibonacci sequence whose values do not
exceed four million, find the sum of the even-valued terms.
I created a for loop that adds the second to last element and the last element from the list x:
x = [1,2]
for i in x:
    second_to_last = x[-2]
    running_sum = i + second_to_last
If you run the above, you get 3. I am looking to add this new element back to the original list, x, and repeat the process. However, each time I try to use the append() function, the program hangs and keeps running without stopping. I tried to use a while loop to stop this, but that was a complete failure. Why am I not able to append() the new element (running_sum) back to the original list (x)?
UPDATE:
I did arrive at the solution (4613732), but the work to get there did not seem efficient. Here is my solution:
while len(x) in range(1,32):
    for i in x:
        second_to_last = x[-2]
        running_sum = i + second_to_last
    x.append(running_sum)
print(x)
new_x = []
for i in x:
    if i%2 == 0:
        new_x.append(i)
sum(new_x)
I had to inspect the list visually to check that I did not exceed 4 million. As I said, the process I took was not efficient.
If you keep adding elements to a list while iterating over that list, the iteration will never finish.
You will need some other criterion to abort the loop - for example, in this case
if running_sum > 4000000:
    break
would work.
(Note that you don't strictly speaking need a list at all here; I'd suggest experimenting a bit with it.)
Here are two different ways to solve this. One of them builds the whole list, then sums the even elements. The other one only keeps the last two elements, without making the whole list.
fib = [1,2]
while fib[-1] < 4000000:
    fib.append(fib[-2]+fib[-1])
# Get rid of the last one, since it was over the limit.
fib.pop(-1)
print( sum(i for i in fib if i % 2 == 0) )

fib = (1,2)
sumx = 2
while True:
    nxt = fib[0]+fib[1]
    if nxt >= 4000000:
        break
    if nxt % 2 == 0:
        sumx += nxt
    fib = (fib[1],nxt)
print(sumx)
This doesn't answer your question about list modification, but here is a solution to your problem:
def sum_even_number_fibonacci(limit):
    n0 = 0 # Since we don't care about index (n-th), we can use n0 = 0 or 1
    n1 = 1
    even_number_sum = 0
    while n1 <= limit:
        if n1 % 2 == 0:
            even_number_sum += n1
        n2 = n0 + n1
        # Only store the last two numbers of the Fibonacci sequence to calculate the next one
        n0 = n1
        n1 = n2
    return even_number_sum
sum_even_number_fibonacci(4_000_000)

Nth 1 in a sequence [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
An array of length t has all elements initialized to 1. We can perform two types of queries on the array:
0 index: replace the element at the given index with 0.
1 k: find and print the index of the k-th 1 in the array on a new line; if no such index exists, print -1.
For example, for an array of length t=4 the elements are initially [1,1,1,1]. After the query 0 2 the array becomes [1,0,1,1], and the query 1 3 then outputs 4 (indices are 1-based).
I have used a brute-force approach, but how can I make the code more efficient?
n,q=4,2
arr=[1]*4
for i in range(q):
    a,b=map(int,input().split())
    if a==0:
        arr[b-1]=0
    else:
        flag=True
        count=0
        target=b
        for i,j in enumerate(arr):
            if j ==1:
                count+=1
                if count==target:
                    print(i+1)
                    flag=False
                    break
        if flag:
            print(-1)
I have also tried first storing the indexes of all the 1s in a list and then doing a binary search, but each pop for a 0 query shifts the remaining entries of that list, so the code fails:
def binary_search(low,high,b):
    while(low<=high):
        mid=((high+low)//2)
        #print(mid)
        if mid+1==b:
            print(stack[mid]+1)
            return
        elif mid+1>b:
            high=mid-1
        else:
            low=mid+1
n=int(input())
q=int(input())
stack=list(range(n))
for i in range(q):
    a,b=map(int,input().split())
    if a==0:
        stack.pop(b-1)
        print(stack)
    else:
        if len(stack)<b:
            print(-1)
            continue
        else:
            low=0
            high=len(stack)-1
            binary_search(low,high,b)
You could build a binary tree where each node gives you the number of ones that are below and at the left of it. So if n is 7, that tree would initially look like this (the actual list with all ones is shown below it):
       4
     /   \
   2       2
  / \     / \
 1   1   1   1
----------------
1 1 1 1 1 1 1 -
Setting the array element at index 4 (zero-based) to 0, would change that tree to:
       4
     /   \
   2       1*
  / \     / \
 1   1   0*  1
----------------
1 1 1 1 0*1 1 -
Setting an element to 0 thus takes O(log(n)) time.
Finding the index of the k-th one then takes the same time: descend from the root, going left when k is at most the node's value, and otherwise subtracting that value from k and going right.
Here is Python code you could use. It represents the tree in a list in breadth-first order. I have not gone to great lengths to further optimise the code, but it has the above time complexities:
class Ones:
    def __init__(self, n): # O(n)
        self.lst = [1] * n
        self.one_count = n
        self.tree = []
        self.size = 1 << (n-1).bit_length()
        at_left = self.size // 2
        width = 1
        while width <= at_left:
            self.tree.extend([at_left//width] * width)
            width *= 2

    def clear_index(self, i): # O(logn)
        if i >= len(self.lst) or self.lst[i] == 0:
            return
        self.one_count -= 1
        self.lst[i] = 0
        # Update tree
        j = 0
        bit = self.size >> 1
        while bit >= 1:
            go_right = (i & bit) > 0
            if not go_right:
                self.tree[j] -= 1
            j = j*2 + 1 + go_right
            bit >>= 1

    def get_index_of_ith_one(self, num_ones): # O(logn)
        if num_ones <= 0 or num_ones > self.one_count:
            return -1
        j = 0
        k = 0
        bit = self.size >> 1
        while bit >= 1:
            go_right = num_ones > self.tree[j]
            if go_right:
                k |= bit
                num_ones -= self.tree[j]
            j = j*2 + 1 + go_right
            bit >>= 1
        return k

    def is_consistent(self): # Only for debugging
        # Check that list can be derived by calling get_index_of_ith_one for all i
        lst = [0] * len(self.lst)
        for i in range(1, self.one_count+1):
            lst[self.get_index_of_ith_one(i)] = 1
        return lst == self.lst

# Example use
ones = Ones(12)
print('tree', ones.tree)
ones.clear_index(5)
ones.clear_index(2)
ones.clear_index(1)
ones.clear_index(10)
print('tree', ones.tree)
print('lst', ones.lst)
print('consistent = ', ones.is_consistent())
Be aware that this treats indexes as zero-based, while the method get_index_of_ith_one expects an argument that is at least 1 (but it returns a zero-based index).
It should be easy to adapt to your needs.
Complexity
Creation: O(n)
Clear at index: O(logn)
Get index of one: O(logn)
Space complexity: O(n)
Let's start with some general tricks:
Check if the n-th element is too big for the list before iterating. If you also keep a "counter" that stores the number of zeros, you could even check if nth >= len(the_list) - number_of_zeros (not sure if >= is correct here, it seems like the example uses 1-based indices so I could be off-by-one). That way you save time whenever too big values are used.
Use more efficient functions.
So instead of input you could use sys.stdin.readline (note that it will include the trailing newline).
And, even though it's probably not useful in this context, the built-in bisect module would be better than the binary_search function you created.
You could also use for _ in itertools.repeat(None, q) instead of for i in range(q), that's a bit faster and you don't need that index.
Then you can use some more specialized facts about the problem to improve the code:
You only store zeros and ones, so you can use if j to check for ones and if not j to check for zeros. These will be a bit faster than explicit comparisons, especially when you do that in a loop.
Every time you look for the nth 1, you could create a temporary dictionary (or a list) that contains the encountered ns + index. Then re-use that dict for subsequent queries (dict-lookup and list-random-access is O(1) while your search is O(n)). You could even expand it if you have subsequent queries without change in-between.
However if a change happens you either need to discard that dictionary (or list) or update it.
A few nitpicks:
The variable names are not very descriptive, you could use for index, item in enumerate(arr): instead of i and j.
You use a list, so arr is a misleading variable name.
You have two i variables.
But don't get me wrong. It's a very good attempt and the fact that you use enumerate instead of a range is great and shows that you already write pythonic code.
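The counter idea from the first tip could look roughly like this in Python 3. The class and method names are hypothetical, and positions are assumed to be 1-based as in the question's example; with that convention the early-exit check is k > ones_remaining:

```python
class OnesList:
    """Brute-force search with a counter of remaining ones for early exit."""

    def __init__(self, length):
        self.bits = [1] * length
        self.ones_remaining = length

    def clear(self, position):
        """Set the element at a 1-based position to 0."""
        if self.bits[position - 1]:
            self.bits[position - 1] = 0
            self.ones_remaining -= 1

    def kth_one(self, k):
        """Return the 1-based position of the k-th one, or -1."""
        # Early exit: no scan needed when fewer than k ones are left.
        if k < 1 or k > self.ones_remaining:
            return -1
        seen = 0
        for position, bit in enumerate(self.bits, 1):
            seen += bit  # adding a 0 changes nothing
            if seen == k:
                return position
        return -1


ones = OnesList(4)
ones.clear(2)
print(ones.kth_one(3))
print(ones.kth_one(4))
```

The search itself is still O(n); the counter only saves the scan for queries that cannot succeed.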
Consider something akin to the interval tree:
root node covers the entire array
children nodes cover left and right halves of the parent range respectively
each node holds the number of ones in its range
Both replace and search queries could be completed in logarithmic time.
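A minimal Python 3 sketch of such a tree, stored in a flat list. The class name OnesSegmentTree and its methods are assumptions made for illustration, and indexes here are zero-based:

```python
class OnesSegmentTree:
    """Each node stores the number of ones in the range it covers."""

    def __init__(self, length):
        # Round the number of leaves up to a power of two.
        self.size = 1
        while self.size < length:
            self.size *= 2
        # Node 1 is the root; node i has children 2*i and 2*i + 1.
        self.counts = [0] * (2 * self.size)
        for i in range(length):
            self.counts[self.size + i] = 1
        for node in range(self.size - 1, 0, -1):
            self.counts[node] = self.counts[2 * node] + self.counts[2 * node + 1]

    def clear(self, index):
        """Set the element at a zero-based index to 0 in O(log n)."""
        node = self.size + index
        if self.counts[node] == 0:
            return
        # Decrement the leaf and every ancestor up to the root.
        while node >= 1:
            self.counts[node] -= 1
            node //= 2

    def kth_one(self, k):
        """Return the zero-based index of the k-th one (k >= 1), or -1."""
        if k < 1 or k > self.counts[1]:
            return -1
        node = 1
        while node < self.size:
            left = 2 * node
            if k <= self.counts[left]:
                node = left
            else:
                k -= self.counts[left]
                node = left + 1
        return node - self.size


tree = OnesSegmentTree(4)
tree.clear(1)                 # query "0 2" in the question's 1-based terms
print(tree.kth_one(3) + 1)    # query "1 3", converted back to 1-based
```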
Refactored with fewer lines, so shorter to read, but the run time is probably still O(n) per query.
n,q=4,2
arr=[1]*4
for i in range(q):
    query, target = map(int,input('query target: ').split())
    if query == 0:
        arr[target-1] = 0
    else:
        count=0
        items = enumerate(arr, 1)
        try:
            while count < target:
                index, item = next(items)
                count += item
        except StopIteration as e:
            index = -1
        print(index)
Assumes arr contains ONLY ones and zeroes - you don't have to check if an item is one before you add it to count, adding zero has no effect.
No flags to check, just keep calling next on the enumerate object (items) till you reach your target or the end of arr.
For runtime efficiency, using an external library but basically the same process (algorithm):
import numpy as np
for i in range(q):
    query, target = map(int,input('query target: ').split())
    if query == 0:
        arr[target-1] = 0
    else:
        index = -1
        a = np.array(arr).cumsum() == target
        if np.any(a):
            index = np.argmax(a) + 1
        print(index)

Function used to remove numbers with a repeating integer skips one entry and i dont know why

This is my function:
def repeat(x,Y):
    A = list(str(x)) #makes a list, A, of each digit: 101 becomes ['1','0','1']
    A = map(int,A) #converts each value of the new list to integers
    for i in range(0,10):
        b = A.count(i) #counts how many times each digit is present
        if b>1: #if there is repetition
            Y.remove(x)
This seems to be fine when run in IDLE for a single number; however, when applied to a list using a for loop, the function misses one value.
B = []
for i in range(100,1000): #needs to be a 3 digit number (100 until 999)
    if i%17 == 0:
        B.append(i) #creates list of multiples of 17
for j in B: #removes any values that have digits that occur more than once
    repeat(j,B)
This returns a list which includes the number 663. When the function is re-run on the new list, that value is removed. The same happens when it is applied to a different list, the 3-digit numbers with 13 as a factor: one value with a repeating digit remains.
It's not a major inconvenience, just a really annoying one.
255, which comes immediately before 272, is removed, but 272 gets skipped. Similarly, 663 is skipped because 646 directly before it is removed.
I suspect it has to do with in-place modification of the list, as @interjay says.
ETA: With debugging statements put in, you can see that the numbers that come immediately after removed numbers are skipped over:
def repeat(x,Y):
    A = list(str(x)) #makes a list, A, of each digit: 101 becomes ['1','0','1']
    A = map(int,A) #converts each value of the new list to integers
    print 'Processing', x
    for i in range(0,10):
        b = A.count(i) #counts how many times each digit is present
        if b>1: #if there is repetition
            print 'Removed', x
            Y.remove(x)
B = []
for i in range(100,1000): #needs to be a 3 digit number (100 until 999)
    if i%17 == 0:
        B.append(i) #creates list of multiples of 17
print B
for j in B: #removes any values that have digits that occur more than once
    repeat(j,B)
print B
As the comments suggest, don't modify a list while iterating over it. Also, Python provides Counter to count iterables, so you don't need to implement it yourself. Finally, since you are repeatedly filtering an iterable, it is sensible to use filter.
import collections
def norepeats(x):
    counts = collections.Counter(str(x))
    return not any(ci > 1 for ci in counts.values())
threedigit = range(100,1000)
b1 = filter(lambda x: 0==x%17, threedigit)
b2 = filter(norepeats, b1)
print b2 #the result you expected
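On Python 3, filter returns a lazy iterator and print is a function, so the snippet above would need list(...) and print(...). A rough Python 3 equivalent that builds a new list instead of removing items from the list being iterated might look like this (has_repeated_digit is a hypothetical helper name):

```python
def has_repeated_digit(number):
    """True if any digit occurs more than once in the decimal form."""
    digits = str(number)
    return len(set(digits)) < len(digits)


# Three-digit multiples of 17, as in the question.
multiples_of_17 = [n for n in range(100, 1000) if n % 17 == 0]

# Build a new list; the source list is never modified during iteration,
# so no element can be skipped.
without_repeats = [n for n in multiples_of_17 if not has_repeated_digit(n)]

print(without_repeats)
```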

Find repeats with certain length within a string using python

I am trying to use the regex module to find non-overlapping repeats (duplicated sub-strings) within a given string (30 char), with the following requirements:
I am only interested in non-overlapping repeats that are 6-15 char long.
allow 1 mis-match
return the positions for each match
One way I thought of is that for each possible repeat length, let python loop through the 30char string input. For example,
string = "ATAGATATATGGCCCGGCCCATAGATATAT" #input
#for 6char repeats, first one in loop would be for the following event:
text = "ATAGAT"
text2 ="(" + text + ")"+ "{e<=1}" #this is to allow 1 mismatch later in regex
string2="ATATGGCCCGGCCCATAGATATAT" #string after excluding text
for x in regex.finditer(text2,string2,overlapped=True):
    print x.span()
#then still for 6char repeats, I will move on to text = "TAGATA"...
#after 6char, loop again for 7char...
There should be two outputs for this particular string = "ATAGATATATGGCCCGGCCCATAGATATAT". 1. The bold two "ATAGATATAT" + 1 mismatch: "ATAGATATATG" &"CATAGATATAT" with position index returned as (0,10)&(19, 29); 2. "TGGCCC" & "GGCCCA" (need add one mismatch to be at least 6 char), with index (9,14)&(15,20). Numbers can be in a list or table.
I'm sorry that I didn't include a real loop, but I hope the idea is clear. As you can see, this is a very inefficient method, and it also creates redundancy: e.g. a 10-char repeat will be counted more than once, because it also matches in the loops for 9-, 8-, 7- and 6-char repeats. Moreover, I have a lot of such 30-char strings to work with, so I would appreciate your advice on cleaner methods.
Thank you very much:)
I'd try a straightforward algorithm instead of a regex (which is quite confusing in this instance):
s = "ATAGATATATGGCCCGGCCCATAGATATAT"
def fuzzy_compare(s1, s2):
    # sanity check
    if len(s1) != len(s2):
        return False
    diffs = 0
    for a, b in zip(s1, s2):
        if a != b:
            diffs += 1
            if diffs > 1:
                return False
    return True
slen = len(s) # 30
for l in range(6, 16):
    i = 0
    while (i + l * 2) <= slen:
        sub1 = s[i:i+l]
        # + 1 so that the last possible window, s[slen-l:slen], is compared too
        for j in range(i+l, slen - l + 1):
            sub2 = s[j:j+l]
            if fuzzy_compare(sub1, sub2):
                # checking if this could be partial
                partial = False
                if i + l < j and j + l < slen:
                    extsub1 = s[i:i+l+1]
                    extsub2 = s[j:j+l+1]
                    # if it is partial, we'll get it later in the main loop
                    if fuzzy_compare(extsub1, extsub2):
                        partial = True
                if not partial:
                    print (i, i+l), (j, j+l)
        i += 1
It's a first draft, so feel free to experiment with it. It also seems clunky and not optimal, but try running it first - it may be sufficient.

Thinking Python

I've been learning Python lately and tonight I was playing around with a couple of examples and I just came up with the following for fun:
#!/usr/bin/env python
a = range(1,21) # Range of numbers to print
max_length = 1 # String length of largest number
num_row = 5 # Number of elements per row
for l in a:
    ln = len(str(l))
    if max_length <= ln:
        max_length = ln
for x in a:
    format_string = '{:>' + str(max_length) + 'd}'
    print (format_string).format(x),
    if not x % num_row and x != 0:
        print '\n',
Which outputs the following:
 1  2  3  4  5
 6  7  8  9 10
11 12 13 14 15
16 17 18 19 20
The script is doing what I want, which is to print aligned rows of 5 numbers per row, calculating the largest width plus one; but I'm almost convinced that there is either a:
more "pythonic" way to do this
more efficient way to do this.
I'm not an expert in big O by any means but I believe that my two for loops change this from an O(n) to at least O(2n), so I would really like to see if it's possible to combine them somehow. I'm also not too keen on my format_string declaration, is there a better way to do that? You aren't helping me cheat on homework or anything, I think this would pass most Python classes, I just want to wrap my head more around the Python way of thinking as I'm coming primarily from Perl (not sure if it shows :). Thanks in advance!
You don't need to build format_string on every iteration. Using str.rjust, you don't need a format string at all.
Instead of using x % num_row (x is an element of the list), use i (a 1-based index from enumerate(a, 1)). Think about a case like a = range(3, 34).
You can drop the x != 0 check because the 1-based index i will never be 0.
not x % num_row is hard to understand. Use x % num_row == 0 instead.
a = range(1,21)
num_row = 5
a = map(str, a)
max_length = len(max(a, key=len))
for i, x in enumerate(a, 1):
    print x.rjust(max_length),
    if i % num_row == 0:
        print
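For Python 3, where print is a function and map is lazy, the same idea can be sketched by building the whole text first and printing it once. format_grid is a hypothetical name, not part of the answer above:

```python
def format_grid(numbers, per_row=5):
    """Right-align the numbers in rows of per_row, one space apart."""
    cells = [str(n) for n in numbers]
    width = max(len(cell) for cell in cells)
    # Slice the cells into consecutive rows by position, not by value,
    # so a range such as range(3, 34) is laid out correctly too.
    rows = [cells[i:i + per_row] for i in range(0, len(cells), per_row)]
    return "\n".join(" ".join(cell.rjust(width) for cell in row) for row in rows)


print(format_grid(range(1, 21)))
```

This makes one pass to find the width and one to format; both are linear in the number of items.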
I think you could calculate max_length in a more pythonic way :)
max_length = len(str(max(a)))
or, if your numbers could be negative or floats:
max_length = max([len(str(x)) for x in a])
Another entry. Just to add one with a little functional programming. :-)
n = 20
f = lambda x: str(x).rjust(len(str(n+1))) + (" " if x % 5 else "\n")
print "".join(map(f, range(1,n+1))),
I'm not sure this is not more of a pessimization :-) but building on falsetru's answer a bit, we can use itertools.groupby to group each row by its row index. Because groupby needs a key, we have to enumerate the values, and then discard the enumeration index afterward:
a = range(1,21)
num_row = 5
a = map(str, a)
max_length = len(max(a, key=len))
(same as before, but now:)
from itertools import groupby
# assumes / is integer division - use // if needed
# (I had // but SO formats it as a comment)
for _, g in groupby(enumerate(a), lambda x: x[0] / num_row):
    print ' '.join(x.rjust(max_length) for _, x in g)
Here each group g consists of all the (enumerated) values that make up each row, with their row number in front, so the inner generator for ' '.join needs to discard the row index (for _, x in g). That leaves just the string x, which gets right adjusted as before, and then the right-adjusted strings are joined with spaces between them. The resulting string is ready to be printed as a complete line.
