How to iterate over string after using split function


Here is the string:
ds = 'Java; Python; Ruby; SQL'
I have used the split function to split out each language and found the count of one language.
If I want to find whether two languages both occur in the string, it returns zero.
Example: in the variable above both Java and Python are present, so it should return a count of one.
def language_both_jp(y):
    count = 0
    prog = (y.split(";"))
    for i in range(0,len(prog)):
        if(prog[i] == 'Java'):
            for i in range(0,len(prog)):
                if(prog[i] == 'Python'):
                    count += 1
    return count

You could do something like this instead, using a dictionary to store the counts of each language:
ds = 'Java; Python; Ruby; SQL'
counts = {}
for chunk in ds.split(';'):
    language = chunk.strip()
    counts[language] = counts.get(language, 0) + 1
print(counts)
Output
{'Java': 1, 'SQL': 1, 'Python': 1, 'Ruby': 1}
A more pythonic approach would be to use collections.Counter:
from collections import Counter
ds = 'Java; Python; Ruby; SQL'
counts = Counter(language.strip() for language in ds.split(';'))
print(counts)
Output
Counter({'Java': 1, 'Ruby': 1, 'Python': 1, 'SQL': 1})
Once you have a mapping-like object with the counts of each language, iterate over the key, value pairs and output those with count above 1, for example:
from collections import Counter
ds = 'Java; Python; Ruby; SQL; Python'
counts = Counter(language.strip() for language in ds.split(';'))
for language, count in counts.items():
    if count > 1:
        print(language, count)
Output
Python 2
Note that the input string in the above example was slightly modified to include Python twice.
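To connect this back to the original question (are both Java and Python present?), note that a Counter returns 0 for keys it has never seen. That makes the membership check short. The following is a minimal sketch. The helper name both_present and the 1/0 return convention are illustrative assumptions, not part of the question.

```python
from collections import Counter

def both_present(text, languages=("Java", "Python")):
    """Hypothetical helper: 1 if every language in `languages` occurs in `text`, else 0."""
    # Strip whitespace so ' Python' and 'Python' count as the same language
    counts = Counter(chunk.strip() for chunk in text.split(";"))
    # Counter returns 0 for missing keys, so this never raises KeyError
    return 1 if all(counts[lang] > 0 for lang in languages) else 0

print(both_present("Java; Python; Ruby; SQL"))  # 1
print(both_present("Java; Ruby; SQL"))          # 0
```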

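The problem is that the items in your string are separated by "; " (a semicolon followed by a space). Splitting on ";" therefore leaves a leading space on every item after the first, and ' Python' is not equal to 'Python'. Split on "; " instead:
def language_both_jp(y):
    count = 0
    prog = (y.split("; "))
    for i in range(0,len(prog)):
        if(prog[i] == 'Java'):
            for i in range(0,len(prog)):
                if(prog[i] == 'Python'):
                    count += 1
    return count
language_both_jp(ds)
#1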
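Printing what each split produces shows why the original split(";") never matched. Splitting on "; " only works when the separator is exactly one semicolon plus one space. Stripping each piece is a more forgiving variant. That assumes whitespace around items is never significant, which the question does not state.

```python
ds = 'Java; Python; Ruby; SQL'

# Splitting on ';' leaves a leading space on every item after the first
print(ds.split(';'))          # ['Java', ' Python', ' Ruby', ' SQL']
print(' Python' == 'Python')  # False, so the inner check never fires

# Stripping each piece tolerates any whitespace around the separator
langs = [part.strip() for part in ds.split(';')]
print(langs)                  # ['Java', 'Python', 'Ruby', 'SQL']
```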
A simpler approach would be:
def language_both_jp(x, l):
    # 1 if any language in l appears in x, else 0
    return 1 if [i for i in x.split("; ") if i in l] else 0
language_both_jp(ds, ['Python','Java'])
#1
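The list-comprehension version returns 1 as soon as *any* of the listed languages is present. If the goal is that *all* of them appear, as in the question's Java-and-Python example, all() expresses that directly. A sketch; the name languages_all_present is hypothetical.

```python
def languages_all_present(text, wanted):
    """Hypothetical helper: 1 if every item of `wanted` appears in the '; '-separated text, else 0."""
    present = set(text.split("; "))  # a set gives fast membership tests
    return 1 if all(lang in present for lang in wanted) else 0

ds = 'Java; Python; Ruby; SQL'
print(languages_all_present(ds, ['Python', 'Java']))  # 1
print(languages_all_present(ds, ['Python', 'C']))     # 0
```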

Your requirement is not entirely clear, but please try the solutions below.
i. If you want to count the occurrences of a word, pass the string and the word as arguments.
Try the code below:
def language_both_jp(y, word):
    count = 0
    prog = (y.split(";"))
    for i in range(0,len(prog)):
        if(prog[i] == word):
            count += 1
    return count
string = 'java;python;java;python;c;python'
print(language_both_jp(string, 'java'))
It prints the number of occurrences of the word (2 in this example).
ii. If you want to find the occurrences of two words, try the code below:
def language_both_jp(y, word1,word2):
    count1 = 0
    count2 = 0
    prog = (y.split(";"))
    for i in range(0,len(prog)):
        if(prog[i] == word1):
            count1 += 1
        if(prog[i] == word2):
            count2 += 1
    return 'occurrence of '+word1+'='+str(count1)+'\n'+'occurrence of '+word2+'='+str(count2)
args = 'java;python;java;python;c;python'
print(language_both_jp(args, 'java','python'))
iii. If you want to check whether two words are both present, try the code below:
def language_both_jp(y, word1,word2):
    prog = y.split(";")
    # 'yes' only if both words occur somewhere in the split list
    if word1 in prog and word2 in prog:
        return 'yes'
    return 'no'
args = 'java;python;java;python;c;python'
print(language_both_jp(args, 'java','python'))
Please ask if you have any questions.

Related

Python: how to replace a substring with a number of its occurrences?

Let's say I have a string presented in the following fashion:
st = 'abbbccccaaaAAbccc'
The task is to encode it so that each run of identical characters becomes the character followed by the number of its occurrences:
st = 'a1b3c4a3A2b1c3'
I know one possible solution but it's too bulky and primitive.
s = str(input())
l = len(s)-1
c = 1
t = ''
if len(s)==1:
    t = t +s+str(c)
else:
    for i in range(0,l):
        if s[i]==s[i+1]:
            c +=1
        elif s[i]!=s[i+1]:
            t = t + s[i]+str(c)
            c = 1
    for j in range(l,l+1):
        if s[-1]==s[-2]:
            t = t +s[j]+str(c)
        elif s[-1]!=s[-2]:
            t = t +s[j]+str(c)
            c = 1
print(t)
Is there any way to solve this shortly and elegantly?
P.S.: I'm an inexperienced Python user and a new Stack Overflow member, so I beg your pardon if the question is asked incorrectly.

Take advantage of the standard library:
from itertools import groupby
st = "abbbccccaaaAAbccc"
print("".join("{}{}".format(key, len(list(group))) for key, group in groupby(st)))
Output:
a1b3c4a3A2b1c3
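itertools.groupby groups *consecutive* equal items. That is the run-length behaviour the question asks for: the two separate runs of 'a' stay separate, and 'a' and 'A' are different keys. The sketch below prints the intermediate (key, run length) pairs before joining them. It uses sum(1 for _ in group) in place of len(list(group)) to avoid building a throwaway list, and f-strings, which assume Python 3.6+.

```python
from itertools import groupby

st = "abbbccccaaaAAbccc"

# Each group is an iterator over one run of identical consecutive characters
runs = [(key, sum(1 for _ in group)) for key, group in groupby(st)]
print(runs)
# [('a', 1), ('b', 3), ('c', 4), ('a', 3), ('A', 2), ('b', 1), ('c', 3)]

encoded = "".join(f"{key}{length}" for key, length in runs)
print(encoded)  # a1b3c4a3A2b1c3
```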

Just loop through and count. There are more elegant snippets, but this gets the job done and is clear:
count = 1
char = st[0]
new_st = []
for c in st[1:]:
    if c == char:
        count += 1
    else:
        new_st.append(char + str(count))
        char = c
        count = 1
new_st.append(char + str(count))
s2= "".join(new_st)
print(s2) # 'a1b3c4a3A2b1c3'
If you want a fancy recursive solution:
def compose(s):
    if not s:
        return ""
    count = 1
    for char in s[1:]:
        if s[0] != char:
            break
        count += 1
    return s[0] + str(count) + compose(s[count:])
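A regular expression can also do the run detection. This is a swapped-in technique, not one of the answers above. In the pattern (.)\1*, the backreference \1 matches further copies of the captured character, so each match is exactly one run. The helper name rle_encode is hypothetical. re.DOTALL is passed so that runs of newlines are encoded too.

```python
import re

def rle_encode(s):
    """Hypothetical helper: run-length encode s using a regex."""
    # (.) captures one character, \1* consumes the rest of its run;
    # DOTALL lets '.' match newlines as well
    return re.sub(r"(.)\1*", lambda m: m.group(1) + str(len(m.group(0))), s, flags=re.DOTALL)

print(rle_encode("abbbccccaaaAAbccc"))  # a1b3c4a3A2b1c3
```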

How do I count if a string is within the substrings of a list?

For example:
list_strings = 'dietcoke', 'dietpepsi', 'sprite'
Here's what I did:
count = 0
list =
for ch in list_strings:
    if ch == sub:
        count += 1
print((list_strings, 'diet') == 2) is supposed to return True, but it returns False.

I hope I understood you correctly.
Just use the in operator to check whether your substring is present in the main string.
list_strings = ['dietcoke', 'dietpepsi', 'sprite']
Your function should look like this:
def myfuncname(list_strings, sub_string):
    count = 0
    for ch in list_strings:
        if sub_string in ch:
            count += 1
    return count
If we call the function now, we get a count of 2:
>>> print(myfuncname(list_strings, 'diet') == 2)
True
Hope that solved your problem.
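The same count fits in one expression. True counts as 1 and False as 0 when summed, so sum() over the in tests gives the number of matching strings. The function name count_containing is hypothetical.

```python
list_strings = ['dietcoke', 'dietpepsi', 'sprite']

def count_containing(strings, sub_string):
    # `sub_string in s` is True (1) or False (0); sum adds them up
    return sum(sub_string in s for s in strings)

print(count_containing(list_strings, 'diet') == 2)  # True
```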

Python | find method

def count_substring(string, sub_string):
    count = 0
    for i in range(0 , len(string)):
        if ( string[i: ].find(sub_string)) == True:
            count = count +1
    return count
STRING = 'ininini'
SUB_STRING = 'ini'
CORRECT OUTPUT : 3
MY OUTPUT : 2
It is not detecting the last substring.

The problem is that
string[i:].find(sub_string)
returns -1 if the substring is not found, or its position if it is found. You want to test for 0, but you are testing for position 1, because True == 1 (https://docs.python.org/3/library/stdtypes.html#str.find).
It's not "not detecting the last substring", it's detecting bogus matches.
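Listing what find returns for each start index makes the bogus matches visible. Because True == 1, the original test only counts the indices where the match begins one character into the slice:

```python
string = 'ininini'
sub_string = 'ini'

# find() on each slice: -1 if absent, else the offset inside the slice
positions = [string[i:].find(sub_string) for i in range(len(string))]
print(positions)  # [0, 1, 0, 1, 0, -1, -1]

# The original `== True` test is really `== 1`: it counts the wrong indices
print(sum(1 for p in positions if p == 1))  # 2
# A match starting exactly at index i shows up as offset 0
print(sum(1 for p in positions if p == 0))  # 3
```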
You could use startswith instead:
def count_substring(string, sub_string):
    count = 0
    for i in range(0,len(string)):
        if string[i:].startswith(sub_string):
            count += 1
    return count
Using find isn't a bad idea at all. It has a start-position parameter that is handy here, and it means you don't have to slice the string on every iteration:
def count_substring(string, sub_string):
    count = 0
    for i in range(0,len(string)):
        if string.find(sub_string,i) == i:
            count += 1
    return count
or in one line:
def count_substring(string, sub_string):
    return sum(1 for i in range(len(string)) if string.find(sub_string,i) == i)
Note that string.count(sub_string) doesn't yield the same result. It only counts non-overlapping occurrences, whereas your solution counts overlapping ones.
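Two other options are shown below. Both are swapped in here rather than taken from the answer above. First, str.startswith also accepts a start index, which avoids slicing and avoids the extra scan find may do past index i. Second, a regular expression with a zero-width lookahead, (?=...), finds overlapping matches because it consumes no characters. The function names are hypothetical.

```python
import re

def count_overlapping_startswith(string, sub_string):
    # startswith(prefix, start) checks for a match beginning exactly at index start
    return sum(1 for i in range(len(string)) if string.startswith(sub_string, i))

def count_overlapping_regex(string, sub_string):
    # The lookahead matches an empty string wherever sub_string begins,
    # so matches can overlap; re.escape treats sub_string literally
    return len(re.findall("(?=" + re.escape(sub_string) + ")", string))

print(count_overlapping_startswith('ininini', 'ini'))  # 3
print(count_overlapping_regex('ininini', 'ini'))       # 3
print('ininini'.count('ini'))                          # 2 (non-overlapping)
```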

Trying to output the x most common words in a text file

I'm trying to write a program that will read in a text file and output a list of the most common words (30 as the code is written now) along with their counts. So something like:
word1 count1
word2 count2
word3 count3
... ...
... ...
wordn countn
in order of count1 > count2 > count3 > ... > countn. This is what I have so far, but I cannot get the sorted function to do what I want. The error I get now is:
TypeError: list indices must be integers, not tuple
I'm new to python. Any help would be appreciated. Thank you.
def count_func(dictionary_list):
    return dictionary_list[1]
def print_top(filename):
    word_list = {}
    with open(filename, 'r') as input_file:
        count = 0
        #best
        for line in input_file:
            for word in line.split():
                word = word.lower()
                if word not in word_list:
                    word_list[word] = 1
                else:
                    word_list[word] += 1
    #sorted_x = sorted(word_list.items(), key=operator.itemgetter(1))
    # items = sorted(word_count.items(), key=get_count, reverse=True)
    word_list = sorted(word_list.items(), key=lambda x: x[1])
    for word in word_list:
        if (count > 30):#19
            break
        print "%s: %s" % (word, word_list[word])
        count += 1
# This basic command line argument parsing code is provided and
# calls the print_words() and print_top() functions which you must define.
def main():
    if len(sys.argv) != 3:
        print 'usage: ./wordcount.py {--count | --topcount} file'
        sys.exit(1)
    option = sys.argv[1]
    filename = sys.argv[2]
    if option == '--count':
        print_words(filename)
    elif option == '--topcount':
        print_top(filename)
    else:
        print 'unknown option: ' + option
        sys.exit(1)
if __name__ == '__main__':
    main()
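The TypeError comes from word_list[word]. After sorted(word_list.items(), ...), word_list is a list of (word, count) tuples, and word is one of those tuples, so it cannot be used as a list index. Three changes address this. Unpacking each tuple directly removes the error. Sorting with reverse=True puts the largest counts first. Slicing prints exactly limit items, whereas the original count > 30 check lets 31 through.

Below is a Python 3 sketch of just the counting and printing part. It assumes an iterable of lines rather than a file name so it runs without a file. The name print_top_lines is hypothetical.

```python
def print_top_lines(lines, limit=30):
    """Hypothetical helper: print and return the `limit` most frequent words in `lines`."""
    word_count = {}
    for line in lines:
        for word in line.split():
            word = word.lower()
            word_count[word] = word_count.get(word, 0) + 1
    # Highest counts first; each item is a (word, count) tuple
    items = sorted(word_count.items(), key=lambda x: x[1], reverse=True)
    for word, count in items[:limit]:  # unpack the tuple instead of indexing
        print("%s: %s" % (word, count))
    return items[:limit]
```

With a real file, `with open(filename) as f: print_top_lines(f)` would work the same way, since a file object yields lines.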

Use the collections.Counter class.
from collections import Counter
for word, count in Counter(words).most_common(30):
    print(word, count)
Some unsolicited advice: Don't make so many functions until everything is working as one big block of code. Refactor into functions after it works. You don't even need a main section for a script this small.
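The Counter snippet assumes words is already a list of the words in the file. The following is a self-contained sketch that builds that list first. Lower-casing and whitespace splitting mirror the question's code. io.StringIO stands in for a real file so the example runs as-is. The name top_words is hypothetical.

```python
from collections import Counter
import io

def top_words(file_obj, n=30):
    # Same normalisation as the question: split on whitespace, lower-case
    words = (word.lower() for line in file_obj for word in line.split())
    return Counter(words).most_common(n)

fake_file = io.StringIO("the cat sat\nThe dog sat on the mat\n")
for word, count in top_words(fake_file, 2):
    print(word, count)
# the 3
# sat 2
```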

Using itertools' groupby:
from itertools import groupby
words = sorted([w.lower() for w in open("/path/to/file").read().split()])
count = [[item[0], len(list(item[1]))] for item in groupby(words)]
count.sort(key=lambda x: x[1], reverse = True)
for item in count[:5]:
    print(*item)
This reads the file's words, sorts them, and uses groupby to list each unique word with its number of occurrences. The resulting list is then sorted by occurrence with:
count.sort(key=lambda x: x[1], reverse = True)
The reverse = True is to list the most common words first.
In the line:
for item in count[:5]:
[:5] sets how many of the most frequent words to show.

The first method others have suggested, i.e. using most_common(...), doesn't meet your needs. It returns the n most common words, rather than the words whose count is less than or equal to n.
Here it is using most_common(...); note that it just prints the n most common words:
>>> import re
... from collections import Counter
... def print_top(filename, max_count):
...     words = re.findall(r'\w+', open(filename).read().lower())
...     for word, count in Counter(words).most_common(max_count):
...         print word, count
... print_top('n.sh', 1)
force 1
The correct way would be as follows; note that it prints all the words whose count is less than or equal to max_count:
>>> import re
... from collections import Counter
... def print_top(filename, max_count):
...     words = re.findall(r'\w+', open(filename).read().lower())
...     for word, count in filter(lambda x: x[1]<=max_count, sorted(Counter(words).items(), key=lambda x: x[1], reverse=True)):
...         print word, count
... print_top('n.sh', 1)
force 1
in 1
done 1
mysql 1
yes 1
egrep 1
for 1
1 1
print 1
bin 1
do 1
awk 1
reinstall 1
bash 1
mythtv 1
selections 1
install 1
v 1
y 1

Here is my Python 3 solution. I was asked this question in an interview and the interviewer was happy with this solution, although in a less time-constrained situation the other solutions above seem a lot nicer to me.
dict_count = {}
lines = []
file = open("logdata.txt", "r")
for line in file:# open("logdata.txt", "r"):
    lines.append(line.replace('\n', ''))
for line in lines:
    if line not in dict_count:
        dict_count[line] = 1
    else:
        num = dict_count[line]
        dict_count[line] = (num + 1)
def greatest(words):
    greatest = 0
    string = ''
    for key, val in words.items():
        if val > greatest:
            greatest = val
            string = key
    return [greatest, string]
most_common = []
def n_most_common_words(n, words):
    while len(most_common) < n:
        most_common.append(greatest(words))
        del words[(greatest(words)[1])]
n_most_common_words(20, dict_count)
print(most_common)
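The repeated greatest() scans make this O(n·k) for k results. The loop also deletes entries from dict_count as it goes, and it raises a KeyError when asked for more items than there are distinct lines. heapq.nlargest is swapped in here; it is not part of the interview answer. It keeps the dictionary-building step, picks the top entries without mutating anything, and simply returns fewer items when n is too large. The sample data and the name n_most_common are hypothetical.

```python
import heapq

def n_most_common(counts, n):
    """Hypothetical helper: up to n (key, count) pairs with the highest counts, largest first."""
    # nlargest keeps a heap of size n: O(len(counts) * log n), and counts is left untouched
    return heapq.nlargest(n, counts.items(), key=lambda kv: kv[1])

dict_count = {"GET /": 3, "POST /login": 1, "GET /about": 2}  # made-up sample data
print(n_most_common(dict_count, 2))  # [('GET /', 3), ('GET /about', 2)]
```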

Comparing occurrences of characters in strings

def jottoScore(s1,s2):
    n = len(s1)
    score = 0
    sorteds1 = ''.join(sorted(s1))
    sorteds2 = ''.join(sorted(s2))
    if sorteds1 == sorteds2:
        return n
    if(sorteds1[0] == sorteds2[0]):
        score = 1
    if(sorteds2[1] == sorteds2[1]):
        score = 2
    if(sorteds2[2] == sorteds2[2]):
        score = 3
    if(sorteds2[3] == sorteds2[3]):
        score = 4
    if(sorteds2[4] == sorteds2[4]):
        score = 5
    return score
print jottoScore('cat', 'mattress')
I am trying to write a jottoScore function that will take in two strings and return how many character occurrences are shared between two strings.
I.e., jottoScore('maat','caat') should return 3, because two As and one T are shared.
I feel like this is a simple enough independent practice problem, but I can't figure out how to iterate over the strings and compare each character (I already sorted the strings alphabetically).

If you are on Python 2.7+, this is the approach I would take:
from collections import Counter
def jotto_score(str1, str2):
    count1 = Counter(str1)
    count2 = Counter(str2)
    return sum(min(v, count2.get(k, 0)) for k, v in count1.items())
print(jotto_score("caat", "maat"))
print(jotto_score("bigzeewig", "ringzbuz"))
OUTPUT
3
4

In case the strings are sorted and the order matters (characters are compared position by position):
>>> a = "maat"
>>> b = "caat"
>>> sum(1 for c1,c2 in zip(a,b) if c1==c2)
3

def chars_occur(string_a, string_b):
    list_a, list_b = list(string_a), list(string_b) #makes a list of all the chars
    count = 0
    for c in list_a:
        if c in list_b:
            count += 1
            list_b.remove(c)
    return count
EDIT: this solution doesn't take into account if the chars are at the same index in the string or that the strings are of the same length.

A streamlined version of sberry's answer:
from collections import Counter
def jotto_score(str1, str2):
    return sum((Counter(str1) & Counter(str2)).values())
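Looking at the intermediate Counter shows why this works. The & operator on two Counter objects keeps each key with the minimum of its two counts and drops keys whose minimum is zero. That is exactly the number of shared occurrences per character, so summing the values gives the score:

```python
from collections import Counter

shared = Counter("maat") & Counter("caat")
# Each key keeps min(count in "maat", count in "caat"); 'm' and 'c' drop out
print(shared["a"], shared["t"])      # 2 1
print("m" in shared, "c" in shared)  # False False
print(sum(shared.values()))          # 3
```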
