The dataframe column contains a few words with repeated letters. I want to remove words made up entirely of the same letter, and, where a letter repeats more than twice in a row, keep only one occurrence of it.
df:
id  text
1   aaaa
2   bb
3   wwwwwwww
4   Hellooooo
5   See youuuu
Expected output:
id  text
1
2
3
4   Hello
5   See you
If, like me, you don't like regex, you can go old school. It might be inefficient, but you will get the idea:
s = 'Seee youuuu sooooon'

def word_process(s):
    c = ''        # the result being built
    flag = ''     # the previous character
    counter = 0   # length of the current run of identical characters
    for letter in s:
        if letter == flag:
            counter += 1
            if counter > 2:
                continue  # skip this character and move on to the next one
        else:
            flag = letter
            counter = 1
        c = c + letter
    return c

print(word_process(s))
output>>>
See youu soon
Note: the result has "youu" with a double "u", not "See you soon", simply because I couldn't make the script understand the meaning of words: it keeps up to two consecutive repeats of any letter.
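For comparison, here is a regex-based sketch that follows the question's expected output more closely: it drops words made of a single repeated letter and collapses any run of three or more identical letters to one. The names SAME_LETTER_WORD, LONG_RUN and clean_text are hypothetical, and the patterns assume only ASCII letters matter and are case-sensitive as written (so "Aaaa" would not be dropped).

```python
import re

# A whole word made of one letter repeated, e.g. "aaaa" or "bb"
SAME_LETTER_WORD = re.compile(r'\b([A-Za-z])\1+\b')
# A letter repeated three or more times in a row, e.g. the "ooooo" in "Hellooooo"
LONG_RUN = re.compile(r'([A-Za-z])\1{2,}')

def clean_text(text):
    text = SAME_LETTER_WORD.sub('', text)  # drop words like "aaaa"
    text = LONG_RUN.sub(r'\1', text)       # collapse each long run to a single letter
    return ' '.join(text.split())          # tidy up leftover (and repeated) whitespace

for sample in ['aaaa', 'bb', 'wwwwwwww', 'Hellooooo', 'See youuuu']:
    print(repr(clean_text(sample)))
```

On a pandas DataFrame, the function could be applied to the column with df['text'] = df['text'].apply(clean_text).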
I have the following set of instructions:
Create a variable to store the given string "You can have data without information, but you cannot have information without data."
Convert the given string to lowercase
Create a list containing every lowercase letter of the English alphabet
for every letter in the alphabet list:
    Create a variable to store the frequency of each letter in the string and assign it an initial value of zero
    for every letter in the given string:
        if the letter in the string is the same as the letter in the alphabet list:
            increase the value of the frequency variable by one
    if the value of the frequency variable does not equal zero:
        print the letter in the alphabet list followed by a colon and the value of the frequency variable
I am currently stuck on the bold points.
So far, my code looks as follows:
import string

sentence = "You can have data without information, but you cannot have information without data."
sentence = sentence.lower()
alphabet_string = string.ascii_lowercase
alphabet = list(alphabet_string)
for i in alphabet:
    frequency = {i: 0}
    for i in sentence:
        if i in frequency.keys():
            frequency[i] = frequency[i] + 1
What you're missing is an extra condition that prints only the key-value pairs with non-zero values (note that the inner loop also uses j, so it no longer shadows i):
import string

sentence = "You can have data without information, but you cannot have information without data."
sentence = sentence.lower()
alphabet_string = string.ascii_lowercase
alphabet = list(alphabet_string)
for i in alphabet:
    frequency = {i: 0}
    for j in sentence:
        if j in frequency.keys():
            frequency[i] = frequency[i] + 1
    if frequency[i] != 0:
        print(i, ',', frequency[i])
Outputs:
a , 10
b , 1
c , 2
d , 2
e , 2
f , 2
h , 4
i , 6
m , 2
n , 7
o , 9
r , 2
t , 10
u , 5
v , 2
w , 2
y , 2
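The pseudocode in the question can also be followed literally, with frequency as a plain integer that is reset for each alphabet letter instead of a one-entry dictionary. This is a minimal sketch; the function name letter_frequencies is a hypothetical choice, and the lines are collected in a list (rather than printed inside the loop) only so they are easy to check.

```python
import string

def letter_frequencies(text):
    """Follow the pseudocode step by step, collecting 'letter: count' lines."""
    text = text.lower()
    alphabet = list(string.ascii_lowercase)
    lines = []
    for letter in alphabet:
        frequency = 0                 # reset the count for each alphabet letter
        for char in text:
            if char == letter:
                frequency += 1
        if frequency != 0:            # skip letters that never appear
            lines.append(f'{letter}: {frequency}')
    return lines

sentence = "You can have data without information, but you cannot have information without data."
for line in letter_frequencies(sentence):
    print(line)
```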
As has already been pointed out, collections.Counter is ideal for this.
However, your specification states that you're only interested in lowercase letters, and the sentence also contains ',', '.' and ' '.
So there are two approaches. One is to use the standard Counter and then ignore the returned keys that don't fit your rules. The other is to write a custom class that applies those rules internally, so that the output contains only lowercase ASCII letters and irrelevant characters are ignored.
So, here's an idea:
class MyCounter:
    def __init__(self, iterable):
        self._iterable = iterable
        self._result = None

    def items(self):
        # Count lazily on the first call, ignoring anything that isn't a lowercase ASCII letter
        if self._result is None:
            self._result = {}
            _alphabet = set('abcdefghijklmnopqrstuvwxyz')
            for v in self._iterable:
                if v in _alphabet:
                    self._result[v] = self._result.get(v, 0) + 1
        return self._result.items()

sentence = "You can have data without information, but you cannot have information without data."
for k, v in MyCounter(sentence.lower()).items():
    print(f'{k}:{v}')
Output:
y:2
o:9
u:5
c:2
a:10
n:7
h:4
v:2
e:2
d:2
t:10
w:2
i:6
f:2
r:2
m:2
b:1
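The first approach mentioned above, using the standard Counter and then discarding everything that isn't a lowercase ASCII letter, could look like this. It's a sketch; counts and letter_counts are illustrative names.

```python
import string
from collections import Counter

sentence = "You can have data without information, but you cannot have information without data."
counts = Counter(sentence.lower())

# Discard spaces, ',' and '.'; keep only lowercase ASCII letters
letter_counts = {k: v for k, v in counts.items() if k in string.ascii_lowercase}

for letter in sorted(letter_counts):
    print(f'{letter}:{letter_counts[letter]}')
```

Filtering afterwards keeps Counter doing the counting, at the cost of briefly counting characters you then throw away.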
It's as simple as adding if frequency[i] != 0: print(f'{i}: {frequency[i]}') inside the outer loop.
Since a non-zero number is truthy, you can drop the != 0 and write it like this:
import string

sentence = "You can have data without information, but you cannot have information without data."
sentence = sentence.lower()
alphabet = list(string.ascii_lowercase)
for i in alphabet:
    frequency = {i: 0}
    for j in sentence:
        if j in frequency: frequency[i] += 1
    if frequency[i]: print(f'{i}: {frequency[i]}')
You can also change frequency.keys() to frequency, and frequency[i] = frequency[i] + 1 to frequency[i] += 1, as done above.
Specification: Read in characters until the user enters a full stop '.'. Show the number of lowercase vowels.
So far, I've succeeded in completing the read loop and printing out 'Vowel Count: '. However, vowel count always comes to 0.
I've just started, and I'm struggling with where to put the 'Show the number of lowercase vowels' part.
Should I define vowels = ... at the top? Or put it in a loop later? Do I create a new loop? I haven't been able to make it work.
c = str(input('Character: '))
count = 0
while c != '.':
    count += 1
    c = str(input('Character: '))
print("Vowel count =", count)
Here is a solution to your problem. You need to add two conditions to your code: check whether the character is a vowel and whether it is lowercase:
c = str(input('Character: '))
count = 0
while c != '.':
    # Count the character only if it is a lowercase vowel
    if c in 'aeiou' and c.islower():
        count += 1
    c = str(input('Character: '))
print("Vowel count =", count)
Inside the while loop, you can add an if condition that checks whether the letter is lowercase (using the string method islower()) and whether it is a vowel. If both conditions are true, increase count by 1.
You don't need to convert the input with str(), since input() always returns a string.
Also, you can initialize the variable to an empty string before the while loop, so the input is read in only one place, at the top of the loop:
c = ''
count = 0
while c != '.':
    c = input('Character: ')
    if c.islower() and c in ('a', 'e', 'i', 'o', 'u'):
        count += 1
print("Vowel count =", count)
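If you want to test the counting logic without typing characters in, one option is to separate it from input(). A sketch, where VOWELS and count_lowercase_vowels are hypothetical names; using a set means multi-character entries such as 'ae' are not counted, unlike a substring test against 'aeiou'.

```python
VOWELS = set('aeiou')  # lowercase vowels only

def count_lowercase_vowels(chars):
    """Count lowercase vowels in an iterable of characters, stopping at the first '.'."""
    count = 0
    for c in chars:
        if c == '.':
            break
        if c in VOWELS:
            count += 1
    return count
```

To read from the user, iter() with a sentinel calls input() until it returns '.': print("Vowel count =", count_lowercase_vowels(iter(lambda: input('Character: '), '.'))).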
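Lowercase vowels correspond to a fixed set of codes in the ASCII table (97, 101, 105, 111 and 117), and the same applies to '.' (46). Note that this version reads a whole line at once rather than one character per prompt, and stops counting at the first '.':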
c = str(input('Character: '))
count = 0
lowerCaseVowel = [97, 101, 105, 111, 117]  # ASCII codes of 'a', 'e', 'i', 'o', 'u'
for ch in c:
    # check for '.' (ASCII 46)
    if ord(ch) == 46:
        break
    if ord(ch) in lowerCaseVowel:
        count += 1
print("Vowel count = ", count)
For my code, I have to make a function that counts the number of vowels in the odd positions of a string.
For example, the following will produce an output of 2.
st = "xxaeixxAU"
res = countVowelsOdd(st)
print (res)
For my code, the only problem I have is figuring out how to tell Python to count the vowels in the ODD positions.
This is in the second part of the if statement in my code, where I tried to check for an odd index by writing st[i] % 2 == 1. I get all kinds of errors trying to fix this.
Any idea how to resolve this?
def countVowelsOdd(st):
    vowels = "aeiouAEIOU"
    count = 0
    for i, ch in enumerate(st):
        if i in vowels and st[i] % 2 == 1:
            count += 1
    return count
if i in vowels ...
i is the index, but you want the letter:
if ch in vowels ...
Then, since you already have the index, that is what you take the modulo of:
if ch in vowels and i % 2 == 1:
enumerate yields (index, item) pairs, so i is already the position and ch is the character at that position:
def countVowelsOdd(st):
    vowels = "aeiouAEIOU"
    count = 0
    for i, ch in enumerate(st):
        if ch in vowels and i % 2 == 1:
            count += 1
    return count
I don't know if your assignment/project precludes the use of regex, but if you are open to it, here is one option. We can first do a regex replacement that keeps only the characters at odd indices (1, 3, 5, ...) of the input. Then, do a second replacement to remove all non-vowel characters. The length of what remains is the vowel count.
import re

st = "xxaeixxAU"
# Keep the second character of each pair; the optional group lets a trailing
# unpaired character match and be replaced with an empty string (Python 3.5+)
st = re.sub(r'.(.)?', r'\1', st)
print(st)
st = re.sub(r'[^aeiou]', '', st, flags=re.IGNORECASE)
print(len(st))
This prints:
xexA
2
Please have a look at this:
In [1]: a = '01234567'
In [2]: print(*(c for c in a[0::2]))
0 2 4 6
In [3]: print(*(c for c in a[1::2]))
1 3 5 7
In [4]: print(*(c in '12345' for c in a[1::2]))
True True True False
In [5]: print(sum(c in '12345' for c in a[1::2]))
3
Does it help with your problem?
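Applied to the original question, the slicing idea gives a one-line version of countVowelsOdd. This is a sketch that assumes "odd positions" means 0-based odd indices, which is what the expected output of 2 implies.

```python
def countVowelsOdd(st):
    # st[1::2] takes the characters at indices 1, 3, 5, ...;
    # each True from the membership test counts as 1 in sum()
    return sum(ch in 'aeiouAEIOU' for ch in st[1::2])

print(countVowelsOdd("xxaeixxAU"))  # prints 2
```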
I have a document of user survey responses:
Score Comment
8 Rapid bureaucratic affairs. Reports for policy...
4 There needs to be communication or feed back f...
7 service is satisfactory
5 Good
5 There is no
10 My main reason for the product is competition ...
9 Because I have not received the results. And m...
5 no reason
I want to determine which keywords correspond to a higher score, and which keywords correspond to a lower score.
My idea is to construct a table of the words (or a "word vector" dictionary) that contains the scores each word is associated with and the number of times each score has been associated with that word.
Something like the following:
Word    Score  Count
Word1:  7      1
        4      2
Word2:  5      1
        9      1
        3      2
        2      1
Word3:  9      3
Word4:  8      1
        9      1
        4      2
...     ...    ...
Then, for each word, the average score is the average of all the scores that word is associated with.
To do this, my code is the following:
word_vec = {}
# col 1 is the word, col 2 is the score, col 3 is the number of times it occurs
for i in range(len(data)):
    sentence = data['SurveyResponse'][i].split(' ')
    for word in sentence:
        word_vec['word'] = word
        if word in word_vec:
            word_vec[word] = {'Score':data['SCORE'][i], 'NumberOfTimes':(word_vec[word]['NumberOfTimes'] += 1)}
        else:
            word_vec[word] = {'Score':data['SCORE'][i], 'NumberOfTimes':1}
But this code gives me the following error:
File "<ipython-input-144-14b3edc8cbd4>", line 9
word_vec[word] = {'Score':data['SCORE'][i], 'NumberOfTimes':(word_vec[word]['NumberOfTimes'] += 1)}
^
SyntaxError: invalid syntax
Could someone please show me the correct way to do this?
Try this piece of code:
word_vec = {}
# For each word: the total score across all responses containing it, and the number of occurrences
for i in range(len(data)):
    sentence = data['SurveyResponse'][i].split(' ')
    for word in sentence:
        if word in word_vec:
            word_vec[word]['Score'] += data['SCORE'][i]  # keep accumulating the total score; this makes the average easy to compute later
            word_vec[word]['NumberOfTimes'] += 1
        else:
            word_vec[word] = {'Score': data['SCORE'][i], 'NumberOfTimes': 1}
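To increment the value of 'NumberOfTimes', update it in place with word_vec[word]['NumberOfTimes'] += 1. An augmented assignment such as += is a statement, not an expression, so it can't appear inside a dictionary literal; that is what caused the SyntaxError.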
You can use collections.Counter, which counts the number of occurrences of each word.
Here is an example:
from collections import Counter
c = Counter(["jsdf","ijoiuj","je","oui","je","non","oui","je"])
print(c)
Result:
Counter({'je': 3, 'oui': 2, 'ijoiuj': 1, 'jsdf': 1, 'non': 1})
Extract the words from the document into a list, then pass that list to Counter to count the occurrences of each word.
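Combining the two ideas, a dictionary of Counters can hold the Word / Score / Count table from the question, and the per-word average follows from it. This is a sketch: responses is hypothetical sample data standing in for the survey DataFrame, and lowercasing then splitting on whitespace is an assumption (punctuation stays attached to words).

```python
from collections import Counter, defaultdict

# Hypothetical sample data standing in for the survey DataFrame: (score, comment)
responses = [
    (8, "service is good"),
    (4, "service is slow"),
    (8, "good support"),
]

# word -> Counter mapping each score to how many times the word appeared with it
word_scores = defaultdict(Counter)
for score, comment in responses:
    for word in comment.lower().split():
        word_scores[word][score] += 1

# Average score per word, weighting each score by how often it occurred
averages = {
    word: sum(s * n for s, n in scores.items()) / sum(scores.values())
    for word, scores in word_scores.items()
}

for word, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(word, dict(word_scores[word]), round(avg, 2))
```

With the DataFrame from the question, the (score, comment) pairs could come from zip(data['SCORE'], data['SurveyResponse']).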
I'm just getting into Python and I'm building a program that analyzes a group of words and returns how many times each letter appears in the text, e.g. 'A:10, B:3, C:5...'. So far it's working perfectly, except that I'm looking for a way to condense the code so I'm not writing out each part of the program 26 times. Here's what I mean:
print("Enter text to be analyzed: ")
message = input()
a = 0
b = 0
c = 0
# ...etc
for letter in message:
    if letter == "a":
        a += 1
    if letter == "b":
        b += 1
    if letter == "c":
        c += 1
    # ...etc
print("A:", a, "B:", b, "C:", c)  # ...etc
There are many ways to do this. Most use a dictionary (dict). For example,
count = {}
for letter in message:
    if letter in count:  # saw this letter before
        count[letter] += 1
    else:  # first time we've seen this - make its first dict entry
        count[letter] = 1
There are shorter ways to write it, which I'm sure others will point out, but study this way first until you understand it. This sticks to very simple operations.
At the end, you can display it via (for example):
for letter in sorted(count):
    print(letter, count[letter])
Again, there are shorter ways to do this, but this way sticks to very basic operations.
You can use Counter, but @TimPeters is probably right that it is better to stick with the basics.
from collections import Counter
c = Counter([letter for letter in message if letter.isalpha()])
for k, v in sorted(c.items()):
    print('{0}: {1}'.format(k, v))
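If you want the output in the 'A:10, B:3, C:5' format from the question, one option is to lowercase the message first so 'A' and 'a' are counted together (an assumption about what you want) and walk the alphabet in order. letter_report is a hypothetical helper name; letters that never appear are left out.

```python
import string
from collections import Counter

def letter_report(message):
    counts = Counter(message.lower())  # count 'A' and 'a' as the same letter
    # Counter returns 0 for missing keys, so absent letters are simply skipped
    return ', '.join(
        f'{letter.upper()}:{counts[letter]}'
        for letter in string.ascii_lowercase
        if counts[letter]
    )

print(letter_report("Abba cab"))
```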