Finding the highest sum of a string in a list - python

In this task, we are creating a function called highestSum. It takes a list of strings and, for each string, sums up all of its digit characters. It then returns the index of the element of the list with the highest sum.
For example, given highestSum(["jx72i9r", "9ch37#r2", "8rgku3op8"]):
We then must find [18, 21, 19], which are the digit sums of each string. Because 21 is the highest value, the function needs to return 1, because that is the index of "9ch37#r2" in the list.
This is what I have so far:
This is what I have so far:-
def highestSum(stringList):
    number = 0
    for xinlist in stringList:
        print(xinlist)
        for yoflist in xinlist:
            print(yoflist)
            if yoflist in "1234567890":
                number += int(yoflist)
        print(number)
The first for loop cycles through each element, while the second for loop cycles through each character in that element. My accumulator variable works, but the problem is that I don't know how to reset it when the loop moves on to a new element.
Another example: highestSum(["90gw1xy3", "12432", "hfkvbd2", "*&hdj!4"])
should return 0, because that element has the highest sum of digit characters (9 + 0 + 1 + 3 = 13).

Homemade version
Basically, we turn each element of the given list into the sum of its digits, e.g. [18, 21, 19]. Then we pair these sums with the original list using zip(), and finally use .index() to get the corresponding index.
hilarious one-liner
def highestSum(stringList):
    return stringList.index({k:v for k,v in zip([sum(map(int,[character for character in stringEx if character.isdigit()])) for stringEx in stringList],stringList)}[max(sum(map(int,[character for character in stringEx if character.isdigit()])) for stringEx in stringList)])
List comprehension & dict comprehension
def highestSum(stringList):
    summed = [sum(list(map(int,[character for character in stringEx if character.isdigit()]))) for stringEx in stringList]
    highest = max(summed)
    pair = {k:v for k,v in zip(summed,stringList)}
    return stringList.index(pair[highest])
print(highestSum(["jx72i9r", "9ch37#r2", "8rgku3op8"]))
Easier to understand for the human eye.
def highestSum(stringList):
    summed = []
    for stringEx in stringList:
        gathered = []
        for character in stringEx:
            if character.isdigit():
                gathered.append(character)
        gathered = sum(list(map(int,gathered)))
        summed.append(gathered)
    highest = max(summed)
    pair = {}
    for k,v in zip(summed,stringList):
        pair[k] = v
    return stringList.index(pair[highest])
print(highestSum(["jx72i9r", "9ch37#r2", "8rgku3op8"]))
Output:
1
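One caveat with the dict-based versions above: if two strings have the same digit sum, the dict keeps only the last string for that sum, so .index() returns the later position, whereas calling .index(max(...)) on the list of sums returns the first one. A minimal sketch comparing the two; the function names highest_sum_dict and highest_sum_first are hypothetical, and the tie-breaking rule you want is an assumption the task does not specify:

```python
def highest_sum_dict(stringList):
    # Dict-based approach: map each digit sum to a string, then look it up
    summed = [sum(int(c) for c in s if c.isdigit()) for s in stringList]
    pair = {k: v for k, v in zip(summed, stringList)}  # later strings overwrite earlier ones
    return stringList.index(pair[max(summed)])

def highest_sum_first(stringList):
    # list.index returns the position of the FIRST maximum
    summed = [sum(int(c) for c in s if c.isdigit()) for s in stringList]
    return summed.index(max(summed))

tied = ["a5", "b23", "c1"]  # digit sums: 5, 5, 1
print(highest_sum_dict(tied))   # 1: the dict kept "b23" for the sum 5
print(highest_sum_first(tied))  # 0: the first string with sum 5
```

When there are no ties, both return the same index.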

I have modified your code. You need to reset the accumulator for each element, a condition to check for the highest sum so far, and a variable to keep track of its index. See the solution below:
def highestSum(stringList):
    index = 0
    highestValue = 0
    for xinlist in stringList:
        number = 0
        for yoflist in xinlist:
            if yoflist in "1234567890":
                number += int(yoflist)
        if number > highestValue:
            index = stringList.index(xinlist)
            highestValue = number
    print(index)
stringList = ['jx72i9r', '9ch37#r2', '8rgku3op8']
highestSum(stringList)

def highestSum(stringList):
    res = list()
    for xinlist in stringList:
        number = 0
        for yoflist in xinlist:
            if yoflist in "1234567890":
                number += int(yoflist)
        res.append(number)
    # find the max position from the accumulator
    maxNumPos = 0
    for i in range(len(res)):
        if res[i] > res[maxNumPos]:
            maxNumPos = i
    print(maxNumPos)
highestSum(['90gw1xy3', '12432', 'hfkvbd2', '*&hdj!4'])

I'll use a list comprehension, though it may be harder to read.
code:
x = ["jx72i9r", "9ch37#r2", "8rgku3op8"]
x_to_int = [sum([int(char) for char in s if char.isdigit()]) for s in x]
print(x_to_int.index(max(x_to_int)))
result:
1

Related

Finding the substring with the most repeats in a dictionary with dna sequences

The substring has to be 6 characters long. The number I'm getting is smaller than it should be.
First, I wrote code to read the sequences from a file and put them in a dictionary. Then I wrote 3 nested for loops: the first iterates over the dictionary and gets one sequence in each iteration. The second takes each sequence and extracts a 6-character substring from it; in each iteration, it moves the start index in the long sequence forward by 1. The third loop takes each substring from the second loop and counts how many times it appears in each long sequence.
I tried rewriting the code many times, and I think I got very close. I checked that the loops actually do their iterations, and they do. I even checked manually whether the counts for a substring in random sequences are the same as the program gives, and they are. Any ideas? Maybe a different approach? What debugger do you use for Python?
I added a file with 3 shortened sequences for testing. You may want to try a smaller substring, say 3 characters instead of 6: rep_len = 3
The code:
matches = []
count = 0
final_count = 0
rep_len = 6
repeat = ''
pos = 0
seq_count = 0
seqs = {}
f = open(r"file.fasta")
# inserting each sequence from the file into a dictionary
for line in f:
    line = line.rstrip()
    if line[0] == '>':
        seq_count += 1
        name = seq_count
        seqs[name] = ''
    else:
        seqs[name] += line
for key, seq in seqs.items():  # getting one sequence in each iteration
    for pos in range(len(seq)):  # setting an index and increasing it by 1 in each iteration
        if pos <= len(seq) - rep_len:  # making sure no substring runs past the end of the sequence
            repeat = seq[pos:pos + rep_len]  # setting a substring
            if repeat not in matches:  # checking if the substring was already scanned
                matches.append(repeat)  # adding the substring to the list of checked substrings
                for key1, seq2 in seqs.items():  # iterating over each sequence
                    count += seq2.count(repeat)  # counting the substring's repetitions
                if count > final_count:  # if the count is greater than the greatest one saved so far
                    final_count = count  # the new value is saved
                count = 0
print('repetitions: ', final_count)  # printing
(Attached test file: sequences.fasta)
The code is not very clear, so it is a bit difficult to debug. I suggest rewriting it.
Anyway, so far I have noticed one small mistake:
if pos < len(seq) - rep_len:
Should be
if pos <= len(seq) - rep_len:
Currently, the last character in each sequence is ignored.
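Another thing that can make the counts come out smaller than expected: str.count returns the number of non-overlapping occurrences, so a repeat such as GCGCGC that overlaps itself is undercounted. A minimal sketch of an overlapping count; count_overlapping is a hypothetical helper, not part of the original code:

```python
def count_overlapping(seq, sub):
    """Count occurrences of sub in seq, allowing overlaps (sliding window)."""
    n = len(sub)
    # + 1 so that the window ending at the last character is included
    return sum(1 for i in range(len(seq) - n + 1) if seq[i:i + n] == sub)

seq = "GCGCGCGC"
print(seq.count("GCGCGC"))               # 1: str.count skips overlapping matches
print(count_overlapping(seq, "GCGCGC"))  # 2: windows starting at positions 0 and 2
```

Whether overlapping matches should count as repeats depends on the task definition, so check which one you actually need.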
EDIT:
Here is a rewrite of your code that is clearer and might help you investigate the errors:
rep_len = 6
seq_count = 0
seqs = {}
filename = "dna2.txt"
# Extract the data into a dictionary
with open(filename, "r") as f:
    for line in f:
        line = line.rstrip()
        if line.startswith('>'):
            seq_count += 1
            name = seq_count
            seqs[name] = ''
        else:
            seqs[name] += line
# Store all the information, so that you can reuse it later
counter = {}
for key, seq in seqs.items():
    # + 1 so that the last window (ending at the last character) is included
    for pos in range(len(seq) - rep_len + 1):
        repeat = seq[pos:pos + rep_len]
        if repeat in counter:
            counter[repeat] += 1
        else:
            counter[repeat] = 1
# Sort the counter to have max occurrences first
sorted_counter = sorted(counter.items(), key=lambda item: item[1], reverse=True)
# Display the 5 max occurrences (slicing avoids an IndexError if there are fewer than 5)
for key, rep in sorted_counter[:5]:
    print("{} -> {}".format(key, rep))
# GCGCGC -> 11
# CCGCCG -> 11
# CGCCGA -> 10
# CGCGCG -> 9
# CGTCGA -> 9
It might be easier to use Counter from the collections module in Python. Also check out the NLTK library.
An example:
from collections import Counter
from nltk.util import ngrams
sequence = "cggttgcaatgagcgtcttgcacggaccgtcatgtaagaccgctacgcttcgatcaacgctattacgcaagccaccgaatgcccggctcgtcccaacctg"
def reps(substr):
    "Sum the counts of the characters that occur more than once in substr"
    return sum([i for i in Counter(substr).values() if i > 1])
def make_grams(sent, n=6):
    "Split a string into overlapping n-grams (substrings of length n)"
    return ["".join(seq) for seq in ngrams(sent, n)]
grams = make_grams(sequence)  # splits string into substrings
max_length = max(map(reps, grams))  # gets maximum repeat count
result = [dna for dna in grams if reps(dna) == max_length]
print(result)
Output: ['gcgtct', 'cacgga', 'acggac', 'tgtaag', 'agaccg', 'gcttcg', 'cgcaag', 'gcaagc', 'gcccgg', 'cccggc', 'gctcgt', 'cccaac', 'ccaacc']
And if the question is to find the string with the most repeated character:
repeat_count = [max(Counter(a).values()) for a in result] # highest character repeat count
result_dict = {dna:ct for (dna,ct) in zip(result, repeat_count)}
another_result = [dna for dna in result_dict.keys() if result_dict[dna] == max(repeat_count)]
print(another_result)
Output: ['cccggc', 'cccaac', 'ccaacc']
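For the original question (the 6-character substring that occurs most often across all sequences), Counter can also be applied directly to the sliding windows, without NLTK. A minimal sketch, assuming the sequences are already loaded into a dict like seqs in the code above; the function name most_common_kmers and the toy data are hypothetical:

```python
from collections import Counter

def most_common_kmers(seqs, k=6):
    """Return (kmers, count): every k-length substring with the highest
    overlapping count across all sequences in the dict seqs."""
    counts = Counter()
    for seq in seqs.values():
        # Sliding window; the + 1 includes the window ending at the last character
        counts.update(seq[i:i + k] for i in range(len(seq) - k + 1))
    if not counts:
        return [], 0
    best = max(counts.values())
    return sorted(km for km, c in counts.items() if c == best), best

seqs = {1: "GCGCGCAT", 2: "TTGCGCGC"}  # toy data, not from the original file
print(most_common_kmers(seqs))  # (['GCGCGC'], 2)
```

Returning every substring that ties for the maximum avoids silently picking one of them.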

IndexError: list indexing error and wrong tracking of counters

The goal of the program is to define a procedure that takes a string of digits from 1-9 and outputs a list according to the following rules:
Every number in the string should be inserted into the list.
If a number x in the string is less than or equal to the preceding number y, the number x should be inserted into a sublist. Continue adding the following numbers to the sublist until reaching a number z that is greater than the number y.
Then add this number z to the normal list and continue.
#testcases
string = '543987'
numbers_in_lists(string)
result = [5,[4,3],9,[8,7]]
def numbers_in_lists(string):
    # Convert the sequence of strings into an array of numbers
    i = 0
    conv_str_e = []
    while i < len(string):
        conv_str_e.append(int(string[i]))
        i += 1
    #official code starts here
    normal_list = []
    list_of_small_nums = [[]]
    # This will help me keep track of elements of the normal_list.
    previous_e_pos = -1
    e_pos = 0
    # this one will be used to decide if the element should go into the
    #normal_list or list_of_small_nums
    check_point = 0
    for e in conv_str_e:
        #The first element and every element bigger than the element with
        #check_point as its index
        #will be put into the normal_list as long as the list inside
        #list_of_small_nums is empty
        if e_pos == 0 or e > conv_str_e[check_point]:
            #If the list inside list_of_small_nums is not empty
            if list_of_small_nums[0] != []:
                #concatenate the normal_list and list_of_small_nums
                normal_list = normal_list + list_of_small_nums[0]
                #Clear the list inside list_of_small_nums
                list_of_small_nums[0] = []
            #Add the element in the normal_list
            normal_list.append(e)
            # Update my trackers
            e_pos += 1
            previous_e_pos += 1
            # (not sure, this might be the error)
            check_point = e_pos
        #The current element is not bigger than the element with the
        #check_point as index position
        #Therefore it goes into the sublist.
        list_of_small_nums[0].append(e)
        e_pos += 1
        previous_e_pos += 1
    return list
What you were doing wrong was exactly what you pointed out in your comments. You kept increasing e_pos (twice for every element that goes into normal_list, because the sublist code also runs for every element), so check_point eventually became greater than the last valid index of the list.
I took the liberty of changing some things to simplify the process. Simple programs make it easier to figure out what is going wrong with them, so always try to think of the simplest way to solve your problem first! Here, I removed the need for e_pos and previous_e_pos by using enumerate:
string = '543987'
# Convert the string of digits into a list of numbers
conv_str_e = [int(i) for i in string]
#official code starts here
normal_list = []
list_of_small_nums = []
# this one will be used to decide if the element should go into the
# normal_list or list_of_small_nums
check_point = 0
for ind, e in enumerate(conv_str_e):
    # The first element, and every element bigger than the element with
    # check_point as its index, goes directly into normal_list
    if ind == 0 or e > conv_str_e[check_point]:
        # If list_of_small_nums is not empty
        if list_of_small_nums != []:
            # add it to normal_list as a sublist
            normal_list.append(list_of_small_nums)
            # start a new, empty sublist
            list_of_small_nums = []
        # Add the element to normal_list
        normal_list.append(e)
        # Update my tracker
        check_point = ind
    else:
        # The current element is not bigger than the element with
        # check_point as its index position,
        # therefore it goes into the sublist.
        list_of_small_nums.append(e)
# If there is anything left, add it to the list
if list_of_small_nums != []:
    normal_list.append(list_of_small_nums)
print(normal_list)
Result:
[5, [4, 3], 9, [8, 7]]
I am sure you can change it appropriately from here to put it back in your function.
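Putting it back into a function, a minimal sketch of the same rule might look like this. Instead of an index it stores the last digit that went directly into the result (the variable reference is a hypothetical name playing the role of check_point above), and it flushes any trailing sublist at the end. The second test string is an illustrative addition:

```python
def numbers_in_lists(string):
    """Digits <= the last top-level digit go into a sublist; a larger digit
    closes the sublist and becomes the new reference."""
    result = []
    sublist = []
    reference = None  # last digit appended directly to result
    for ch in string:
        digit = int(ch)
        if reference is None or digit > reference:
            if sublist:
                result.append(sublist)
                sublist = []
            result.append(digit)
            reference = digit
        else:
            sublist.append(digit)
    if sublist:  # flush a trailing sublist
        result.append(sublist)
    return result

print(numbers_in_lists('543987'))     # [5, [4, 3], 9, [8, 7]]
print(numbers_in_lists('455532123'))  # [4, 5, [5, 5, 3, 2, 1, 2, 3]]
```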

In Python, how to count the number of '1's in a list of email addresses?

Here's what I have so far:
emails = ['james1#example.com', 'januline12#januline.com', 'fillip.morris#pm.com',
'ray#bradbury.org', 'me+you#hotmail.com', 'seven11#gmail.com',
'noreply#msd.com', 'cars4u#tesla.com']
def email_security_scan(a, b):
    numbers = []
    for item in a:
        for subitem in item.split():
            if subitem == b:
                numbers.append +=1
    print(numbers)
email_security_scan(emails, 1)
It doesn't work; it prints [].
There are some problems with your logic:
You are using str.split on strings with no whitespace. There's no need to split your string.
You are comparing integers with strings. This won't work.
You are appending to a list instead of incrementing a counter variable.
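A quick way to see the first two points in the interpreter (the address is taken from the emails list above):

```python
email = 'james1#example.com'
# str.split() with no arguments splits on whitespace, so this address stays whole
print(email.split())    # ['james1#example.com']
# An int never compares equal to a str, so subitem == 1 is always False here
print(1 == '1')         # False
# Converting the number to a string first makes a substring check possible
print(str(1) in email)  # True
```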
So this would work if you need to count the number of email addresses containing '1':
def email_security_scan(a, b):
    count = 0
    for item in a:
        if str(b) in item:
            count += 1
    print(count)
email_security_scan(emails, 1)
# 3
More simply, you can use sum with a generator expression:
def email_security_scan(a, b):
    print(sum(str(b) in item for item in a))
email_security_scan(emails, 1)
# 3
Or to count the total number of '1's across all email addresses, you can use str.join and then str.count:
def email_security_scan(a, b):
    print(''.join(a).count(str(b)))
email_security_scan(emails, 1)
# 4
You can represent it like so:
def checker(x):
    if "1" in x:
        return 1
    else:
        return 0
Now getting a count can be as easy as:
emails = [checker(x) for x in emails]
sum(emails)
What's going on here?
checker checks whether an entry of your list contains at least one "1".
emails = [checker(x) for x in emails] applies checker to each entry of your list and overwrites the emails variable with the new list.
sum does just what you would expect.
The code below counts the number of '1's in each item of the list and adds them up.
emails = ['james1#example.com', 'januline12#januline.com', 'fillip.morris#pm.com',
'ray#bradbury.org', 'me+you#hotmail.com', 'seven11#gmail.com',
'noreply#msd.com', 'cars4u#tesla.com']
def email_security_scan(a, b):
    numbers = 0
    for item in a:
        numbers += item.count(b)
    print(numbers)
email_security_scan(emails, '1')
For the above example, the output will be 4.
Here is another way, using sum() on a generator expression:
emails = [ 'james1#example.com', 'januline12#januline.com', 'fillip.morris#pm.com',
'ray#bradbury.org', 'me+you#hotmail.com', 'seven11#gmail.com',
'noreply#msd.com', 'cars4u#tesla.com' ]
print(sum(1 for email in emails if '1' in email))
# 3
Or:
print(sum('1' in email for email in emails))
# 3
emails = [ 'james1#example.com', 'januline12#januline.com', 'fillip.morris#pm.com',
'ray#bradbury.org', 'me+you#hotmail.com', 'seven11#gmail.com',
'noreply#msd.com', 'cars4u#tesla.com' ]
sum([i.count('1') for i in emails])
Output: 4

Check the most frequent letter(s) in a word. Python

My task is:
To write a function that takes a string as an argument and returns the letter(s) that appear most often in it.
Example 1:
s = 'Astana'
Output:
a
Example 2:
s = 'Kaskelen'
Output:
ke
So far, I've got this code:
a = input()
def most_used(w):
    a = list(w)
    indexes = []
    g_count_max = a.count(a[0])
    for letter in a:
        count = 0
        i = int()
        for index in range(len(a)):
            if letter == a[index] or letter == a[index].upper():
                count += 1
                i = index
        if g_count_max <= count:  # here is the problem.
            g_count_max = count
            if i not in indexes:
                indexes.append(i)
    letters = str()
    for i in indexes:
        letters = letters + a[i].lower()
    return letters
print(most_used(a))
The problem is that it always adds the first letter to the list, because the count of the first letter is equal to the starting value of g_count_max (which is just the count of the first letter). For example, my code gives these wrong results:
Example 1:
s = 'hheee'
Output:
he
Example 2:
s = 'malaysia'
Output:
ma
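If you would rather keep the structure of the original loop, the fix is to start the maximum at 0 and discard the letters collected so far whenever a strictly larger count appears. A minimal reworked sketch of that idea, assuming the comparison should be case-insensitive and the result a string in order of first appearance:

```python
def most_used(word):
    """Return the most frequent letter(s), case-insensitively, in order of first appearance."""
    word = word.lower()
    best_count = 0     # start at 0 instead of the first letter's count
    best_letters = ''
    for letter in word:
        if letter in best_letters:
            continue  # this letter is already recorded
        count = word.count(letter)
        if count > best_count:
            # A strictly higher count: drop the letters collected so far
            best_count = count
            best_letters = letter
        elif count == best_count:
            best_letters += letter
    return best_letters

print(most_used('Kaskelen'))  # ke
print(most_used('hheee'))     # e
print(most_used('malaysia'))  # a
```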
I think what you're trying to do can be greatly simplified by using the standard library's Counter object:
from collections import Counter
def most_used(word):
    # this has the form [(letter, count), ...] ordered from most to least common
    most_common = Counter(word.lower()).most_common()
    result = []
    for letter, count in most_common:
        if count == most_common[0][1]:
            result.append(letter)  # if equal largest -- add to result
        else:
            break  # otherwise don't bother looping over the whole thing
    return result  # or ''.join(result) to return a string
You can use a dictionary comprehension with a list comprehension and max():
s = 'Kaskelen'
s_lower = s.lower() #convert string to lowercase
counts = {i: s_lower.count(i) for i in s_lower}
max_counts = max(counts.values()) #maximum count
most_common = ''.join(k for k,v in counts.items() if v == max_counts)
Yields:
'ke'
Try this code using list comprehensions:
word = input('word=').lower()
letters = dict.fromkeys(word)  # unique letters, in order of first appearance (a set has no fixed order)
max_w = max([word.count(item) for item in letters])
out = ''.join([item for item in letters if word.count(item)==max_w])
print(out)
You can also use Counter from the collections module:
from collections import Counter
a = "dagsdvwdsbd"
print(Counter(a).most_common(3)[0][0])
Then it returns:
d

Reading a text file to print frequency of letters in decreasing order - Python 3

I am working through basic Python challenges, and this is one of them. All I needed to do was read through a file and print the frequency of letters in decreasing order. I can do this, but I wanted to enhance the program by also printing the frequency percentage alongside each letter, in the form letter - frequency - freq%. Something like this: o - 46 - 10.15%
This is what I did so far:
def exercise11():
    import string
    while True:
        try:
            fname = input('Enter the file name -> ')
            fop = open(fname)
            break
        except:
            print('This file does not exist. Please try again!')
            continue
    counts = {}
    for line in fop:
        line = line.translate(str.maketrans('', '', string.punctuation))
        line = line.translate(str.maketrans('', '', string.whitespace))
        line = line.translate(str.maketrans('', '', string.digits))
        line = line.lower()
        for ltr in line:
            if ltr in counts:
                counts[ltr] += 1
            else:
                counts[ltr] = 1
    lst = []
    countlst = []
    freqlst = []
    for ltrs, c in counts.items():
        lst.append((c, ltrs))
        countlst.append(c)
    totalcount = sum(countlst)
    for ec in countlst:
        efreq = (ec/totalcount) * 100
        freqlst.append(efreq)
    freqlst.sort(reverse=True)
    lst.sort(reverse=True)
    for ltrs, c, in lst:
        print(c, '-', ltrs)
exercise11()
As you can see, I can calculate and sort the freq% in a separate list, but I can't include it in the tuples of lst alongside the letter and count. Is there any way to solve this problem?
Also, if you have any other suggestions for my code, please mention them.
Modification
Applying a simple modification suggested by @wwii, I got the desired output. All I had to do was add one more argument to the print call while iterating over lst. Previously, I tried to make a separate list for the freq%, sort it, and then insert it into the letter-count tuples, which didn't work out.
for ltrs, c, in lst:
    print(c, '-', ltrs, '-', round(ltrs/totalcount*100, 2), '%')
Your count data is in a dictionary of {letter:count} pairs.
You can use the dictionary to calculate the total count like this:
total_count = sum(counts.values())
Then don't calculate the percentage till you are iterating over the counts...
for letter, count in counts.items():
    print(f'{letter} - {count} - {100*count/total_count}')  # Python 3.6+
    # print('{} - {} - {}'.format(letter, count, 100*count/total_count))  # Python < 3.6
Or if you want to put it all in a list so you can sort it:
data = []
for letter, count in counts.items():
    data.append((letter, count, 100*count/total_count))
Using operator.itemgetter for the sort key function can help code readability.
import operator
letter = operator.itemgetter(0)
count = operator.itemgetter(1)
frequency = operator.itemgetter(2)
data.sort(key=letter)
data.sort(key=count)
data.sort(key=frequency)
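Putting the pieces of this answer together, here is a minimal sketch, assuming the file's text has already been read into a string (file handling omitted). It swaps in collections.Counter for the manual dict and keeps only characters for which str.isalpha() is true, which differs slightly from the translate-based stripping in the question (for example, it also drops symbols that are not in string.punctuation). letter_report is a hypothetical name:

```python
from collections import Counter

def letter_report(text):
    """Return 'letter - count - pct%' lines, most frequent first."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    total = sum(counts.values())
    lines = []
    # most_common() with no argument returns all (letter, count) pairs, highest count first
    for letter, count in counts.most_common():
        lines.append(f'{letter} - {count} - {100 * count / total:.2f}%')
    return lines

for line in letter_report('Hello, World!'):
    print(line)
```

Because most_common() already yields the pairs in decreasing order of count, no separate sort step is needed; letters with equal counts keep the order in which they were first encountered.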
Tuples are immutable, which is probably the issue you are running into. The other issue is the simple form of the sort call; sorting with a key function would serve you well. See below:
Your lst is a list of tuples, but because tuples are immutable whereas lists are mutable, changing lst to a list of lists is a valid approach. Then, since each element of lst consists of 'letter, count, frequency%', the sort method with a lambda key can sort by whichever index you'd like. The following is to be inserted after your for line in fop: loop.
lst = []
for ltrs, c in counts.items():
    lst.append([ltrs, c])
totalcount = sum([x[1] for x in lst])  # sum all 'count' values in a list comprehension
for elem in lst:
    # now that each element in 'lst' is a mutable list, you can append the calculated frequency to it
    elem.append((elem[1]/totalcount)*100)
lst.sort(reverse=True, key=lambda elem: elem[2])  # sort in place, in reverse order, by index 2
The items in freqlst, countlst, and lst are related to each other by their position; if any of them is sorted on its own, that relationship is lost.
Zipping the lists together before sorting maintains the relationship.
I'll pick up from your list initialization lines:
lst = []
countlst = []
freqlst = []
for ltr, c in counts.items():
    # change here: lst now only contains letters
    lst.append(ltr)
    countlst.append(c)
totalcount = sum(countlst)
for ec in countlst:
    efreq = (ec/totalcount) * 100
    freqlst.append(efreq)
# New stuff here: zip keeps letter, count and frequency aligned while sorting
zipped = zip(lst, countlst, freqlst)
# reverse=True gives decreasing order of count, as the question asks
zipped = sorted(zipped, key=lambda x: x[1], reverse=True)
for ltr, c, freq in zipped:
    print("{} - {} - {:.2f}%".format(ltr, c, freq))  # love me the format method :)
Basically, zip combines the lists element by element into tuples (in Python 3 it returns an iterator, which sorted() accepts). Then you can use a lambda function as the sort key for those tuples (a very common Stack Overflow question).
I think I was able to achieve what you wanted by using lists instead of tuples, since tuples cannot be modified.
(I also added the possibility to quit the program)
Important: Never forget to comment your code
The code:
def exercise11():
    import string
    while True:
        try:
            # give the user the option to quit the program easily
            print('Press 0 to quit the program')
            fname = input('Enter the file name -> ')
            if fname == '0':
                return  # leave the function; there is no file to read
            fop = open(fname)
            break
        except OSError:
            print('This file does not exist. Please try again!')
            continue
    counts = {}
    for line in fop:
        line = line.translate(str.maketrans('', '', string.punctuation))
        line = line.translate(str.maketrans('', '', string.whitespace))
        line = line.translate(str.maketrans('', '', string.digits))
        line = line.lower()
        for ltr in line:
            if ltr in counts:
                counts[ltr] += 1
            else:
                counts[ltr] = 1
    lst = []
    countlst = []
    freqlst = []
    for ltrs, c in counts.items():
        # add a zero as a placeholder &
        # use square brackets so you get a list that you can modify
        lst.append([c, ltrs, 0])
        countlst.append(c)
    totalcount = sum(countlst)
    for ec in countlst:
        efreq = (ec/totalcount) * 100
        freqlst.append(efreq)
    freqlst.sort(reverse=True)
    lst.sort(reverse=True)
    # count the total of the letters
    counter = 0
    for ltrs in lst:
        counter += ltrs[0]
    # calculate the percentage for each letter
    for letter in lst:
        percentage = (letter[0] / counter) * 100
        letter[2] += float(format(percentage, '.2f'))
    for i in lst:
        print('The letter {} is repeated {} times, which is {}% '.format(i[1], i[0], i[2]))
exercise11()
<?php
$fh = fopen("text.txt", 'r') or die("File does not exist");
$line = fgets($fh);
$words = count_chars($line, 1);
foreach ($words as $key=>$value)
{
echo "The character <b>' ".chr($key)." '</b> was found <b>$value</b> times. <br>";
}
?>
