How to make my program run faster for larger string inputs? - python

I am trying to operate on the number of occurences of each element and subtract an integer from it.
For now this is my code it takes a lot of time when the inputsize becomes very large.I have used Counter and it performs worse with it.
s=input()
C=int(input())
total=0
for i in set(s):
if s.count(i)>C:
total=total+s.count(i)-C
print(total)

You have collections.Counter class.
Counter is a subclass of dictionary to count distinct elements. And I think you don't care about its keys, but you need only count values.
from collections import Counter
s = input()
C = int(input())
total = sum(c - C for c in Counter(s).values() if c > C)
print(total)
explanation:
Counter(s).values() gives iterator for count values.
(c - C for c in ...) makes generator object with substracting and filtering.
sum(...) returns sum of it.

Related

Generating all possibilities with letters in python and exploiting results in python3

First, I got this problem: how many words are there (counting all of them, even those that don't make sense) of 5 letters that have at least one I and at least two T's, but no K or Y?
First, I defined the alphabet, which has 24 letters ( k and y aren't counted). After that, i made a code to generate all possibilites
alphabet = list(range(1, 24))
for L in range(0, len(alphabet)+1):
for subset in itertools.permutations(alphabet, L):
I don't know how to use the data.
If a “brute force” method is enough for you, this will work:
import string
import itertools
alphabet = string.ascii_uppercase.replace("K", "").replace("Y", "")
count = 0
for word in itertools.product(alphabet, repeat = 5):
if "I" in word and word.count("T") >= 2:
count += 1
print (count)
It will print the result 15645.
Note that you have to use itertools.product(), because itertools.permutations() will not contain repeated occurences, so it will never contain “T” twice.
Edit: Alternatively, you can calculate the count with a list comprehension or a generator expression. It takes advantage of the fact that boolean True and False are equivalent to integer values 1 and 0, respectively.
count = sum(
"I" in word and word.count("T") >= 2
for word in itertools.product(alphabet, repeat = 5)
)
NB: Interestingly, the first solution (explicit for loop with counter += 1) runs about 15 % faster on my computer than the second solution (generator expression with sum()). Both require the same amount of memory (this is expected).

Sorting a string to make a new one

Here I had to remove the most frequent alphabet of a string(if frequency of two alphabets is same, then in alphabetical order) and put it into new string.
Input:
abbcccdddd
Output:
dcdbcdabcd
The code I wrote is:
s = list(sorted(<the input string>))
a = []
for c in range(len(s)):
freq =[0 for _ in range(26)]
for x in s:
freq[ord(x)-ord('a')] += 1
m = max(freq)
allindices = [p for p,q in enumerate(freq) if q == m]
r = chr(97+allindices[0])
a.append(r)
s.remove(r)
print''.join(a)
But it passed the allowed runtime limit maybe due to too many loops.(There's another for loop which seperates the strings from user input)
I was hoping if someone could suggest a more optimised version of it using less memory space.
Your solution involves 26 linear scans of the string and a bunch of unnecessary
conversions to count the frequencies. You can save some work by replacing all those linear scans with a linear count step, another linear repetition generation, then a sort to order your letters and a final linear pass to strip counts:
from collections import Counter # For unsorted input
from itertools import groupby # For already sorted input
from operator import itemgetter
def makenewstring(inp):
# When inp not guaranteed to be sorted:
counts = Counter(inp).iteritems()
# Alternative if inp is guaranteed to be sorted:
counts = ((let, len(list(g))) for let, g in groupby(inp))
# Create appropriate number of repetitions of each letter tagged with a count
# and sort to put each repetition of a letter in correct order
# Use negative n's so much more common letters appear repeatedly at start, not end
repeats = sorted((n, let) for let, cnt in counts for n in range(0, -cnt, -1))
# Remove counts and join letters
return ''.join(map(itemgetter(1), repeats))
Updated: It occurred to me that my original solution could be made much more concise, a one-liner actually (excluding required imports), that minimizes temporaries, in favor of a single sort-by-key operation that uses a trick to sort each letter by the count of that letter seen so far:
from collections import defaultdict
from itertools import count
def makenewstring(inp):
return ''.join(sorted(inp, key=lambda c, d=defaultdict(count): (-next(d[c]), c)))
This is actually the same basic logic as the original answer, it just accomplishes it by having sorted perform the decoration and undecoration of the values implicitly instead of doing it ourselves explicitly (implicit decorate/undecorate is the whole point of sorted's key argument; it's doing the Schwartzian transform for you).
Performance-wise, both approaches are similar; they both (in practice) scale linearly for smaller inputs (the one-liner up to inputs around 150 characters long, the longer code, using Counter, up to inputs in the len 2000 range), and while the growth is super-linear above that point, it's always below the theoretical O(n log_2 n) (likely due to the data being not entirely random thanks to the counts and limited alphabet, ensuring Python's TimSort has some existing ordering to take advantage of). The one-liner is somewhat faster for smaller strings (len 100 or less), the longer code is somewhat faster for larger strings (I'm guessing it has something to do with the longer code creating some ordering by grouping runs of counts for each letter). Really though, it hardly matters unless the input strings are expected to be huge.
Since the alphabet will always be a constant 26 characters,
this will work in O(N) and only takes a constant amount of space of 26
from collections import Counter
from string import ascii_lowercase
def sorted_alphabet(text):
freq = Counter(text)
alphabet = filter(freq.get, ascii_lowercase) # alphabet filtered with freq >= 1
top_freq = max(freq.values()) if text else 0 # handle empty text eg. ''
for top_freq in range(top_freq, 0, -1): # from top_freq to 1
for letter in alphabet:
if freq[letter] >= top_freq:
yield letter
print ''.join(sorted_alphabet('abbcccdddd'))
print ''.join(sorted_alphabet('dbdd'))
print ''.join(sorted_alphabet(''))
print ''.join(sorted_alphabet('xxxxaaax'))
dcdbcdabcd
ddbd
xxaxaxax
What about this?
I am making use of in-built python functions to eliminate loops and improve efficiency.
test_str = 'abbcccdddd'
remaining_letters = [1] # dummy initialisation
# sort alphabetically
unique_letters = sorted(set(test_str))
frequencies = [test_str.count(letter) for letter in unique_letters]
out = []
while(remaining_letters):
# in case of ties, index takes the first occurence, so the alphabetical order is preserved
max_idx = frequencies.index(max(frequencies))
out.append(unique_letters[max_idx])
#directly update frequencies instead of calculating them again
frequencies[max_idx] -= 1
remaining_letters = [idx for idx, freq in enumerate(frequencies) if freq>0]
print''.join(out) #dcdbcdabcd

How to check if sum of 3 integers in list matches another integer? (python)

Here's my issue:
I have a large integer (anywhere between 0 and 2^32-1). Let's call this number X.
I also have a list of integers, unsorted currently. They are all unique numbers, greater than 0 and less than X. Assume that there is a large amount of items in this list, let's say over 100,000 items.
I need to find up to 3 numbers in this list (let's call them A, B and C) that add up to X.
A, B and C all need to be inside of the list, and they can be repeated (for example, if X is 4, I can have A=1, B=1 and C=2 even though 1 would only appear once in the list).
There can be multiple solutions for A, B and C but I just need to find one possible solution for each the quickest way possible.
I've tried creating a for loop structure like this:
For A in itemlist:
For B in itemlist:
For C in itemlist:
if A + B + C == X:
exit("Done")
But since my list of integers contains over 100,000 items, this uses too much memory and would take far too long.
Is there any way to find a solution for A, B and C without using an insane amount of memory or taking an insane amount of time? Thanks in advance.
you can reduce the running time from n^3 to n^2 by using set something like that
s = set(itemlist)
for A in itemlist:
for B in itemlist:
if X-(A+B) in s:
print A,B,X-(A+B)
break
you can also sort the list and use binary search if you want to save memory
import itertools
nums = collections.Counter(itemlist)
target = t # the target sum
for i in range(len(itemlist)):
if itemlist[i] > target: continue
for j in range(i+1, len(itemlist)):
if itemlist[i]+itemlist[j] > target: continue
if target - (itemlist[i]+itemlist[j]) in nums - collections.Counter([itemlist[i], itemlist[j]]):
print("Found", itemlist[i], itemlist[j], target - (itemlist[i]+itemlist[j]))
Borrowing from #inspectorG4dget's code, this has two modifications:
If C < B then we can short-circuit the loop.
Use bisect_left() instead of collections.Counter().
This seems to run more quickly.
from random import randint
from bisect import bisect_left
X = randint(0, 2**32 - 1)
itemset = set(randint(0,X) for _ in range(100000))
itemlist = sorted(list(itemset)) # sort the list for binary search
l = len(itemlist)
for i,A in enumerate(itemlist):
for j in range(i+1, l): # use numbers above A
B = itemlist[j]
C = X - A - B # calculate C
if C <= B: continue
# see https://docs.python.org/2/library/bisect.html#searching-sorted-lists
i = bisect_left(itemlist, C)
if i != l and itemlist[i] == C:
print("Found", A, B, C)
To reduce the number of comparisons, we enforce A < B < C.

Using enumerate function in while loops

I have several mathematical algorithms that use iteration to search for the right answer. Here is one example:
def Bolzano(fonction, a, b, tol=0.000001):
while abs(b - a) > tol:
m = (a + b) / 2
if sign(fonction(m)) == sign(fonction(a)):
a = m
else:
b = m
return a, b
I want to count how many times the algorithm goes through the loop to get a and b. However this is not a for function and it isn't a list, so I can't clearly indicate what objects do I want to count if I use enumerate. Is there a way to count those loops?
Note: I am not trying to change the code itself. I am really searching for a way to count iterations in a while loop, which I can then use for other cases.
The simplest answer if you need a counter outside of a for loop is to count manually using a simple variable and addition inside your while loop:
count = 0
while condition:
...
count += 1
There is an alternative case - if each step of your iteration has a meanigful value to yield, you may want your loop to be a generator, and then use for loop and enumerate(). This makes most sense where you care about which step you are on, not just the count at the end. E.g:
def produce_steps():
while condition:
...
yield step_value
for count, step_value in enumerate(produce_steps()):
...
For a counter I use count from itertools:
from itertools import count
c = count(1)
>>>next(c)
1
>>>next(c)
2
and so on...
Syntax
count(start=0, step=1)
Docs
Make an iterator that returns evenly spaced values starting with
number start.
Ref. itertools.count

Python: simplifying nested FOR loop?

I am wondering if there is a way to simplify the nested loop below. The difficulty is that the iterator for each loop depends on things from the previous loops. Here is the code:
# Find the number of combinations summing to 200 using the given list of coin
coin=[200,100,50,20,10,5,2,1]
total=[200,0,0,0,0,0,0,0]
# total[j] is the remaining sum after using the first (j-1) types of coin
# as specified by i below
count=0
# count the number of combinations
for i in range(int(total[0]/coin[0])+1):
total[1]=total[0]-i*coin[0]
for i in range(int(total[1]/coin[1])+1):
total[2]=total[1]-i*coin[1]
for i in range(int(total[2]/coin[2])+1):
total[3]=total[2]-i*coin[2]
for i in range(int(total[3]/coin[3])+1):
total[4]=total[3]-i*coin[3]
for i in range(int(total[4]/coin[4])+1):
total[5]=total[4]-i*coin[4]
for i in range(int(total[5]/coin[5])+1):
total[6]=total[5]-i*coin[5]
for i in range(int(total[6]/coin[6])+1):
total[7]=total[6]-i*coin[6]
count+=1
print count
I recommend looking at http://labix.org/python-constraint which is a Python constraint library. One of its example files is actually permutations of coinage to reach a specific amount, and it all handles it for you once you specify the rules.
You can get rid of all the int casting. An int/int is still an int in python ie integer division.
it looks like Recursion would clean this up nicly
count = 0
coin=[200,100,50,20,10,5,2,1]
total=[200,0,0,0,0,0,0,0]
def func(i):
global count,total,coin
for x in range(total[i-1]/coin[i-1]+1):
total[i]=total[i-1]-x*coin[i-1]
if (i == 7):
count += 1
else:
func(i+1)
func(1)
print count
combinations = set()
for i in range(len(coins)):
combinations = combinations | set(for x in itertools.combinations(coins, i) if sum(x) == 200)
print len(combinations)
It's a little slow, but it should work.

Categories

Resources