How do I implement word count in Python?

x="I use computers"
print (x)
y=x[0:1]
y1=x[2:5]
y2=x[6:15]
n=(y+y1+y2)
print(len(n))
I know this counts the number of letters but how do I count the number of words in the sentence?

If you're only interested in counting the words, not in splitting the string into them, split() does unnecessary work. Counting the spaces and adding one gives the number of words faster (about a third less time in the measurement below). This does assume that words are separated by exactly one space, with no leading or trailing spaces.
Proof:
>>> import timeit
>>> timeit.timeit("len(x.split())", setup='x="I use computers"' , number=10**6)
0.28843931717636195
>>> timeit.timeit("x.count(' ')+1", setup='x="I use computers"' , number=10**6)
0.19020372901493232
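The single-space caveat is easy to see in a short sketch. The helper names and the `messy` sample are made up for illustration; only `str.split` and `str.count` come from the answer above.

```python
def count_words_split(text):
    # split() with no arguments splits on any run of whitespace
    # and ignores leading/trailing whitespace.
    return len(text.split())


def count_words_spaces(text):
    # Counts space characters; only correct when words are separated
    # by exactly one space and the string has no padding.
    return text.count(" ") + 1


x = "I use computers"
messy = "I  use   computers "  # hypothetical input with extra spaces

print(count_words_split(x), count_words_spaces(x))          # 3 3
print(count_words_split(messy), count_words_spaces(messy))  # 3 7
```

So the space-counting shortcut is probably only worth it when the input is known to be clean.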

Try this piece of code
x = "I use computers"
print(len(x.split()))

Related

Most frequent words from text excluding words that are of certain length

I'm having trouble finding a solution to my problem; maybe someone will be able to help. I have a poem and I'm able to display the most common words, but I want all strings that are less than 5 characters long to be left out of my, let's say, top 20 most common list.
import collections
import re
words = re.findall(r'\w+', open('some_poem.txt').read().lower())
most_common = collections.Counter(words).most_common(20)
print(most_common)
Is there a short and clean way to add such functionality, that is, to not display strings shorter than 5 characters? Thanks in advance.
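A Counter is a dict subclass, so we can filter it with a dict comprehension before taking the 20 most common entries (most_common() itself returns a list of tuples, which has no items()):
counts = collections.Counter(words)
most_common = collections.Counter({k: v for k, v in counts.items() if len(k) >= 5}).most_common(20)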
If by "shorter than 5" you mean "less than 5 characters long" you can just change your regex to not return those words in the first place, using {5,} (five or more) instead of + (one or more):
words = re.findall(r'\w{5,}', open('some_poem.txt').read().lower())
The following is not what you asked for, but depending on what you really want, it may be more useful to filter out a set of stop words instead, since there may well be "relevant" words with fewer than five letters and irrelevant ones with more.
stop_words = set("a,able,about,across,...,you,your".split(","))
words = re.findall(r'\w+', open('some_poem.txt').read().lower())
words = [word for word in words if word not in stop_words]
Also, just for completeness, as noted in comments, you should make it a habit to use with to open files to make sure they are properly closed afterwards.
with open('some_poem.txt') as f:
    words = re.findall(r'\w{5,}', f.read().lower())
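Putting the length filter and the Counter together might look like the sketch below. The sample text stands in for some_poem.txt, and the function name and its `min_len` and `n` parameters are hypothetical, chosen only for this example.

```python
import collections
import re

# Hypothetical sample text standing in for the contents of some_poem.txt.
text = "The quick brown foxes jumped over the quiet river while quick foxes slept"


def top_long_words(text, min_len=5, n=20):
    """Return the n most common words that have at least min_len characters."""
    words = re.findall(r"\w+", text.lower())
    # Filter before counting, so the top-n list is not shortened afterwards.
    long_words = [w for w in words if len(w) >= min_len]
    return collections.Counter(long_words).most_common(n)


print(top_long_words(text, n=2))
```

Filtering before calling most_common() matters: filtering the top 20 afterwards would leave fewer than 20 entries whenever short words such as "the" rank highly.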

How to attach a string to an integer in Python 3

I've got a two-letter word that I'd like to attach to a double-digit number. The word is a string and the number is an integer.
Say the name of the number is "number" and the name of the word is "word".
How would you make it print both of them together without spaces? When I try it right now, it still has a space between them regardless of what I try.
Thanks !
'{}{}'.format(word, number)
For example,
In [19]: word='XY'
In [20]: number=123
In [21]: print('{}{}'.format(word, number))
XY123
The print function has a sep parameter that controls spacing between items:
print(number, word, sep="")
If you need a string rather than printed output, then unutbu's answer with string formatting is better, but this may get you to your desired result with fewer steps.
In Python 3 a common way to construct strings is str.format (f-strings are an alternative on Python 3.6 and later).
To print out a word and a number joined together you would use:
print("{}{}".format(word, number))
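For comparison, here are the approaches mentioned in these answers side by side, plus two common alternatives (an f-string, which needs Python 3.6 or later, and explicit `str()` conversion) that are not from the original answers. The values of `word` and `number` are made up.

```python
word = "XY"   # hypothetical two-letter word (a string)
number = 12   # hypothetical double-digit number (an integer)

joined_format = "{}{}".format(word, number)  # str.format
joined_fstring = f"{word}{number}"           # f-string, Python 3.6+
joined_concat = word + str(number)           # convert, then concatenate

print(joined_format)          # XY12
print(word, number, sep="")   # XY12, but only printed, not stored
```

Concatenating without `str()` would raise a TypeError, because Python does not implicitly convert an integer to a string.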

Adding prefix to string in a file

I have a sort of telephone directory in a .txt file.
What I want to do is find all the numbers with a pattern like 829-2234 and prepend the digit 5 to each of them,
so that the result becomes 5829-2234.
My code begins like this:
import os
import re
count = 0
# set up our regex
regex = re.compile(r"\d{3}-\d{4}\s")
# open file for scanning
f = open("samplex.txt")
# find numbers matching the pattern
for line in f:
    pattern = regex.findall(line)
    # isolate results
    for word in pattern:
        print(word)
        count = count + 1  # count the occurrences of 7-digit numbers
        # replace 7-digit numbers with 8-digit numbers
        word = '%dword' % 5
I don't really know how to add the prefix 5 and then overwrite the 7-digit number with the new 8-digit number. I tried a few things but all failed :/
Any tip/help would be greatly appreciated :)
Thanks
You're almost there, but your string formatting is the wrong way round: '%dword' % 5 produces the literal string '5word'. Since the 5 is a constant and word is the variable, put the placeholder where word goes:
word = '5%s' % word
Note that you can also use string concatenation here:
word = '5' + word
Or even use str.format():
word = '5{}'.format(word)
If you're doing it with regex then use re.sub:
>>> strs = "829-2234 829-1000 111-2234 "
>>> regex = re.compile(r"\b(\d{3}-\d{4})\b")
>>> regex.sub(r'5\1', strs)
'5829-2234 5829-1000 5111-2234 '
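A rough sketch of how the substitution and the count from the question could be combined, using `re.subn`, which returns both the new string and the number of replacements. The function name, the `prefix` parameter, and the sample directory text are assumptions for illustration; to change a real file you would read its contents, pass them through the function, and write the result back.

```python
import re

# Same pattern as in the answer above: a 7-digit number written as ddd-dddd.
PHONE = re.compile(r"\b(\d{3}-\d{4})\b")


def add_prefix(text, prefix="5"):
    """Return (new_text, count) with prefix put in front of every match."""
    return PHONE.subn(lambda m: prefix + m.group(1), text)


# Hypothetical directory contents.
sample = "Alice 829-2234\nBob 111-2234\n"
new_text, count = add_prefix(sample)

print(new_text)
print(count)  # 2
```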

Using list comprehension and sets

Create and print a list of words for which all of the following criteria are met:
the word is at least 8 characters long;
the word formed from the odd-numbered letters is in the set of lower-case words; and
the word formed from the even-numbered letters is in the set of lower-case words.
For example, the word "ballooned" should be included in your list because the word formed from the odd-numbered letters, "blond", and the word formed from the even-numbered letters, "aloe", are both in the set of lower-case words. Similarly, "triennially" splits into "tinily" and "renal", both of which are in the word list.
My teacher told us we should use a set: s=set(lowers) because this would be faster.
What I have so far:
s=set(lowers)
[word for word in lowers if len(word)>=8
and list(word)(::2) in s
and list(word)(::-2) in s]
I do not think I am using the set right. Can someone help me get this to work?
The problem is that you cast word to a list (unnecessary), your slices are not in square brackets (you used parentheses), and your second slice uses the wrong indices (it should be 1::2, not ::-2).
Here are the slices done correctly:
>>> word = "ballooned"
>>> word[::2]
'blond'
>>> word[1::2]
'aloe'
Note that s is an odd name for a collection of lowercase words. A better name would be words.
Your use of set is correct. The reason your teacher wants you to use a set is it is much faster to test membership of a set than it is for a list.
Putting it together:
words = set(lowers)
[word for word in words if len(word) >= 8
and word[::2] in words
and word[1::2] in words]
Here is a quick example of how to structure your condition check inside of the list comprehension:
>>> word = 'ballooned'
>>> lowers = ['blond', 'aloe']
>>> s = set(lowers)
>>> len(word) >= 8 and word[::2] in s and word[1::2] in s
True
edit: Just realized that lowers contains both the valid words and the "search" words like 'ballooned' and 'triennially', in any case you should be able to use the above condition inside of your list comprehension to get the correct result.
list(word)(::2)
First, the syntax for slicing uses square brackets, not parentheses. Also, you don't need to cast word to a list first; you can slice the string directly:
>>> 'ballooned'[::2]
'blond'
Also, [::-2] won't give you the word formed from the even-numbered letters; it gives every second letter counted backwards from the end. You need to use [1::2] (i.e. skip the first character, then take every second one):
>>> 'ballooned'[::-2]
'dnolb'
>>> 'ballooned'[1::2]
'aloe'
In general it is always a good idea to test certain parts separately to see if they really do what you think they do.
this should do it:
s=set(lowers)
[word for word in lowers if len(word)>=8 and word[::2] in s and word[1::2] in s]
or using all():
In [166]: [word for word in lowers if all((len(word)>=8,
word[::2] in s,
word[1::2] in s))]
Use [::] rather than (::), and there's no need for list() here. To get the word formed by the letters at odd indices (the even-numbered letters), use [1::2].
In [151]: "ballooned"[::2]
Out[151]: 'blond'
In [152]: "ballooned"[1::2]
Out[152]: 'aloe'
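As a self-contained check of the answers above, here is the comprehension run against a tiny word list. The contents of `lowers` are made up for this sketch; the real list from the assignment is assumed to be much larger.

```python
# Hypothetical stand-in for the assignment's list of lower-case words.
lowers = ["ballooned", "blond", "aloe", "triennially", "tinily",
          "renal", "computer", "keyboards"]

# A set makes each membership test fast compared with scanning a list.
s = set(lowers)

result = [word for word in lowers
          if len(word) >= 8
          and word[::2] in s     # 1st, 3rd, 5th, ... letters
          and word[1::2] in s]   # 2nd, 4th, 6th, ... letters

print(result)  # ['ballooned', 'triennially']
```

Iterating over `lowers` rather than over the set keeps the results in the original list order.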

Need to divide string inside list comprehension

Asked of me: By filtering the lowers list, create and print a list of the words for which the first half of the word matches the second half of the word. Examples include "bonbon", "froufrou", "gaga", and "murmur".
What I have so far:
[word for word in lowers if list.(word)(1:len(word)/2:)==list.(word)(len(word)/2::)]
I'm not sure how to make word a list so that I can use only certain characters for this filter. I know this will not work, but it's my logic so far.
Logical Error: You're slicing from index 1 instead of 0 in list.(word)(1:len(word)/2:)
Syntax Errors: list.(word) is incorrect syntax, and list slicing uses [ ] not ( ). Simply use word[start:stop] to break it up
Use:
[word for word in lowers if word[:len(word)//2]==word[len(word)//2:]]
Edit: Thanks to Ignacio Vazquez-Abrams' comment: use integer division (the // operator) for Python 3 compatibility, since / returns a float there and slice indices must be integers.
Try this:
[word for word in lowers if word[len(word)//2:] == word[:len(word)//2]]
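A small runnable version of the same filter, wrapped in a helper for readability. The helper name and the sample `lowers` list are assumptions made for this sketch.

```python
def halves_match(word):
    """True if the first half of word equals the second half."""
    mid = len(word) // 2  # integer division, so mid is a valid slice index
    return word[:mid] == word[mid:]


# Hypothetical stand-in for the assignment's list of lower-case words.
lowers = ["bonbon", "froufrou", "gaga", "murmur", "banana", "level"]

result = [word for word in lowers if halves_match(word)]
print(result)  # ['bonbon', 'froufrou', 'gaga', 'murmur']
```

Words of odd length can never match, because the two slices then have different lengths. An empty string would match, so it may be worth excluding if the list can contain one.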
