Counting three letter acronyms in a line with Regex Python [closed] - python

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I need to write a Python program that reads a given file, say acronyms.txt, and returns the percentage of lines that contain at least one three-letter acronym.
For example:
NSW is a very large state.
It's bigger than TAS.
but WA is the biggest!
After reading this, it should return 66.7%, because 66.7% of the lines contain a three-letter acronym. The result is rounded to one decimal place, as you can see. I am not very familiar with regex, but I think it would be the simplest approach.
EDIT:
I have finished the code, but I need it to recognize acronyms with dots between the letters, e.g. N.S.W should be recognized as an acronym. How do I do this?
Any help would be appreciated!

You can do:
import re
cnt = 0
with open('acronyms.txt') as myfile:
    lines = myfile.readlines()
length = len(lines)
for line in lines:
    if re.search(r'\b[A-Z]{3}\b', line) is not None:
        cnt += 1
print("{:.1f}%".format(cnt/length*100))
r'\b[A-Z]{3}\b' matches exactly three capital letters in a row that stand alone as a word. If the search finds a match, we increment the count.
Then we divide the count by the number of lines and print the result in the format you have shown.
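The edit in the question asks about dotted forms such as N.S.W, which the pattern above does not cover. A minimal sketch, assuming a dotted acronym is exactly three capitals separated by single dots; the function name acronym_percentage and the extended pattern are illustrative and not part of the original answer:

```python
import re

# Plain (NSW) or dotted (N.S.W) three-letter acronyms.
# The dotted branch is an assumption about what should count as an acronym.
ACRONYM = re.compile(r'\b[A-Z]{3}\b|\b[A-Z]\.[A-Z]\.[A-Z]\b')

def acronym_percentage(lines):
    """Return the share of lines with at least one acronym, formatted like '66.7%'."""
    lines = list(lines)
    if not lines:
        return "0.0%"  # avoid dividing by zero on an empty file
    hits = sum(1 for line in lines if ACRONYM.search(line))
    return "{:.1f}%".format(hits / len(lines) * 100)

sample = [
    "N.S.W is a very large state.",
    "It's bigger than TAS.",
    "but WA is the biggest!",
]
print(acronym_percentage(sample))
```

With a real file, the lines can be passed straight from `open('acronyms.txt')`. A longer dotted sequence such as A.B.C.D would still match on its first three letters; a lookahead could tighten that if it matters.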

You can do something like:
import re
total_lines = 0
matched_lines = 0
with open("filename") as f:
    for line in f:
        total_lines += 1
        matched_lines += bool(re.search(r"\b[A-Z]{3}\b", line))
print("%.1f%%" % (float(matched_lines) / total_lines * 100))
Note the '\b' in the search pattern: it matches the empty string at the beginning or end of a word. It prevents unwanted matches with acronyms longer than three letters ('asdf ASDF asdf') or with capitals inside a word ('asdfASDasdf').
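The effect of \b can be checked directly on the two examples from the note. A small sketch (the helper name has_tla is made up for illustration):

```python
import re

def has_tla(text):
    """True if text contains a standalone run of exactly three capital letters."""
    return re.search(r"\b[A-Z]{3}\b", text) is not None

print(has_tla("asdf ASD asdf"))    # standalone three-letter acronym
print(has_tla("asdf ASDF asdf"))   # four capitals in a row: too long
print(has_tla("asdfASDasdf"))      # capitals buried inside a word
```

Only the first call reports a match; without the two \b anchors all three would.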

Related

Counting words that start with capital letter on python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
I'm a newbie here. I'm working on a program that counts the words that start with a capital letter on each line of a CSV file, using Python. I used regex, but I think it doesn't work. Here is the sample code I made; unfortunately it doesn't give the output I want. I hope you can help me.
import re
line_details = []
result = []
count = 0
total_lines = 0
class CapitalW(): #F8 Word that starts with capital letter count
    fh = open(r'20items.csv', "r", encoding = "ISO-8859-1").read()
    #next(fh)
    for line in fh.split("n"):
        total_lines += 1
        for line in re.findall('[A-Z]+[a-z]+$', fh):
            count+=1
        line_details.append("Line %d has %d Words that start with capital letter" %
                            (total_lines, count))
    for line in line_details:
        result7 = line
        print (result7)
The result should be as follows:
Line 1 has 2 Words that start with capital letter
Line 2 has 5 Words that start with capital letter
Line 3 has 1 Words that start with capital letter
Line 4 has 10 Words that start with capital letter
In the regex you don't need the $ character, because [A-Z]+[a-z]+$ matches only when the word is at the very end of the string. Use [A-Z]+[a-z]+ instead.
The other issue: judging from the encoding, you may have characters outside a-z, for example é. In that case you have to add them to the pattern as well, e.g. [A-ZÉÖ]+[a-zéö]+, along with any other special characters you need.
Assuming a fixed indentation and in addition to matebende's answer, these are the required further corrections:
for line in fh.split("n"): is supposed to be for line in fh.split("\n"):.
The initialization count = 0 has to be inside this for loop.
The fh in for line in re.findall('[A-Z]+[a-z]+$', fh): is wrong and has to be line.
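Putting the corrections from both answers together gives something like the following. This is a sketch only: it takes the text as a string instead of reading 20items.csv, and the function name capital_word_report is made up for illustration.

```python
import re

# Pattern suggested in the first answer (without the trailing $).
CAPITALIZED = re.compile(r'[A-Z]+[a-z]+')

def capital_word_report(text):
    """Return one report string per line of text."""
    report = []
    for line_number, line in enumerate(text.split("\n"), start=1):
        # The count starts fresh for every line and is taken from the line, not the whole file.
        count = len(CAPITALIZED.findall(line))
        report.append("Line %d has %d Words that start with capital letter"
                      % (line_number, count))
    return report

for entry in capital_word_report("Hello World foo\nbar Baz"):
    print(entry)
```

For a file, the string could come from `open(r'20items.csv', encoding="ISO-8859-1").read()` as in the question.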

Checking whether last element of list is digit [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 3 years ago.
I have a function that generates random characters ([a-zA-Z0-9] and whitespace) and appends each character to a list:
words = []
while words.count(' ') < 10:
    if len(words) == 0:
        # append character
    else:
        if words[-1].isdigit(): # checking if last character is digit
            # append only digit or whitespace
        else:
            # append character
As you can see, if the last (previous) character was a digit, I try to append only a digit or whitespace; otherwise I append any character. The problem is that when I run the code I get the error below:
'NoneType' object has no attribute 'isdigit' for the line if words[-1].isdigit():. What am I doing wrong, and why is there None instead of a str?
Just make words a string:
(The output is biased towards letters: whitespace is not a digit, so a run of digits can switch to letters after a space, but not the other way around, which is what you wanted, if I understood you correctly.)
import random
sample_digit = " 0123456789"
sample_char = " abcdefghi"
sample_any = sample_digit + sample_char
words = ""
while words.count(' ') < 10:
    if len(words) == 0:
        words += random.choice(sample_any)
    else:
        if words[-1].isdigit():
            words += random.choice(sample_digit)
        else:
            words += random.choice(sample_char)
print(words)
A simple way of testing whether a character is a digit:
if words[-1] in '0123456789':
    ...
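If the list from the question should be kept, one way to make sure None never gets into it is to append only values returned by random.choice, which always returns an element of a non-empty sequence. A minimal sketch under that assumption; the function name generate_chars and the seeded generator are illustrative, and the question's own append code is not shown, so this does not diagnose the original bug:

```python
import random
import string

DIGIT_OR_SPACE = string.digits + " "
ANY_CHAR = string.ascii_letters + string.digits + " "

def generate_chars(rng, spaces=10):
    """Build a list of single characters until it holds `spaces` spaces."""
    chars = []
    while chars.count(" ") < spaces:
        if chars and chars[-1].isdigit():
            # After a digit, only a digit or a space may follow.
            chars.append(rng.choice(DIGIT_OR_SPACE))
        else:
            chars.append(rng.choice(ANY_CHAR))
    return chars

result = generate_chars(random.Random(0))
print("".join(result))
```

The `chars and ...` check covers the empty-list case, so the separate `len(words) == 0` branch is not needed.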

Python code that search for text and copy it to the next line [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
I need Python code to read the lines of a text file and copy the text between specific characters, for example the text between the two underscores (_ _).
Input
./2425/1/115_Lube_45484.jpg 45484
./2425/1/114_Spencerian_73323.jpg 73323
Output
./2425/1/115_Lube_45484.jpg 45484
Lube
./2425/1/114_Spencerian_73323.jpg 73323
Spencerian
Any suggestions?
Instead of regex I would use the built-in split():
input = './2425/1/114_Spencerian_73323.jpg 73323'
output = input.split('_')[1]
print(output)
Of course, this only works if every line of the input contains two underscores.
Try this:
import re
for line in your_text.splitlines():
    # search() scans the whole line; match() would only try at its start
    result = re.search("_(.*)_", line)
    if result:
        print(result.group(0))
        print(result.group(1))
Where your_text is a string containing your example as above.
test = './2425/1/114_Spencerian_73323.jpg_abc_ 73323'
result = test.split("_",1)[1].split("_")[0]
print(result)
.split('_', 1) splits the string into two parts: index 0 is the substring to the left of the first '_' and index 1 is the rest of the string to its right. We then split the right part on '_' again, so that the text between the first two underscores is extracted.
Note: this extracts only the first piece of text between underscores, like in test. If the pattern occurs several times in a string, the later occurrences are not extracted.
Solved.
file_path = "text_file.txt"
with open(file_path) as f:
    line = f.readline()
    count= 1
    while line:
        print(line,line.split('_')[1])
        line = f.readline()
        count+= 1
Thank you all
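The output layout from the question (each line followed by the extracted name on its own line) can also be produced without index errors on lines that have no underscores. A sketch, assuming the wanted text is the first underscore-free run enclosed by two underscores; the function name with_extracted_names is hypothetical:

```python
import re

# First run of non-underscore characters enclosed by two underscores.
BETWEEN_UNDERSCORES = re.compile(r'_([^_]+)_')

def with_extracted_names(lines):
    """Return each input line followed by the text found between underscores."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        out.append(line)
        match = BETWEEN_UNDERSCORES.search(line)
        if match:  # lines without a _name_ part are copied unchanged
            out.append(match.group(1))
    return out

sample = [
    "./2425/1/115_Lube_45484.jpg 45484\n",
    "./2425/1/114_Spencerian_73323.jpg 73323\n",
]
print("\n".join(with_extracted_names(sample)))
```

An open file object can be passed in place of the sample list.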

Python script search a text file for a word [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
I'm writing a Python script. I need to search a text file for words that end with "s", "es" or "ies"; each word must be longer than three letters, and I need to know the number of such words and the words themselves. It's a hard task and I can't get it to work. Please help me.
I agree with the comment that you need to go work on the basics. Here are some ideas to get you started.
1) You say "search a file." Open a file and read line by line like this:
with open('myFile.txt', 'r') as infile:
    for line in infile:
        # do something to each line
2) You probably want to store each line in a data structure, like a list:
# before you open the file...
lines = []
# while handling the file:
lines.append(line)
3) You'll need to work with each word. Look into the 'split' method of strings.
4) You'll need to look at individual letters of each word. Look into 'string slicing.'
All said and done, you can probably do this with 10 - 15 lines of code.
Try to divide the task into different tasks if it feels overwhelming.
The following code is by no means good, but hopefully it is clear enough so you can get the point.
1 First you need to get your text. If your text is in a file in your computer you need to put it into something that python can use.
# this code takes the content of "text.txt" and store it into my_text
with open("text.txt") as file:
    my_text = file.read()
2 Now you need to work with every individual word. All your words are together in a string called my_text, and you would like them separated (split) into a list so you can work with them individually. Usually words are separated by spaces, so that's what you use to separate them:
# take the text and split it into words
my_words = my_text.split(" ")
3 I don't know exactly what you want, but let's suppose you want to store separately the words in different lists. Then you will need those lists:
# three list to store the words:
words_s = []
words_es = []
words_ies = []
4 Now you need to iterate through the words and do stuff with them. For that the easiest thing to do is to use a for loop:
# iterate through each word
for word in my_words:
    # you're not interested in short words:
    if len(word) <= 3:
        continue # this means: do nothing with this word
    # now, if the word's length is greater than 3, you classify it:
    if word.endswith("ies"):
        words_ies.append(word) # add it to the list
    if word.endswith("es"):
        words_es.append(word) # add it to the list
    if word.endswith("s"):
        words_s.append(word) # add it to the list
5 Finally, outside the for loop, you can print the list of words and also get the length of the list:
print(words_s)
print(len(words_s))
Something you need to consider is whether you want words repeated across the lists. Note that the condition 'word that ends with "s", "es" or "ies"' is equivalent to 'word that ends with "s"'. The code above distributes the words into the lists redundantly: a word that ends with "ies" also ends with "es" and "s", so it is stored in all three lists. To avoid the overlap, replace the second and third if statements with elif statements.
Keep learning the basics as other answers suggest and soon you'll be able to understand scary code like this :D
with open("text.txt") as myfile:
    words = [word for word in myfile.read().split(" ") if word.endswith("s") and len(word) > 3]
print("There are {} words ending with 's' and longer than 3".format(len(words)))
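The non-overlapping variant mentioned above can be sketched as follows. Two things here are assumptions rather than part of the answers: words are extracted with a regular expression so that trailing punctuation and newlines do not hide the final "s", and the results are returned in a dict from a function named classify_s_words.

```python
import re

def classify_s_words(text):
    """Group words longer than three letters by their ending: 'ies', 'es' or 's'."""
    groups = {"ies": [], "es": [], "s": []}
    # Runs of letters only, so "boxes." counts as "boxes".
    for word in re.findall(r"[A-Za-z]+", text):
        if len(word) <= 3:
            continue
        # elif keeps each word in exactly one list, longest ending first.
        if word.endswith("ies"):
            groups["ies"].append(word)
        elif word.endswith("es"):
            groups["es"].append(word)
        elif word.endswith("s"):
            groups["s"].append(word)
    return groups

result = classify_s_words("The cats chase flies and boxes. Is this bus late?")
print(result)
print(sum(len(words) for words in result.values()))
```

The second print gives the total number of matching words, which is the count the question asks for.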

Getting a word with mid frequency [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 8 years ago.
I have a word list containing numbers, English words, and Bengali words in one column and their frequencies in another column. The columns have no headers. I need the words with frequencies between 5 and 300. This is the code I am using, but it is not working.
wordlist = open('C:\\Python27\\bengali_wordlist_full.txt', 'r').read().decode('string-escape').decode("utf-8")
for word in wordlist:
    if word[1] >= 3
        print(word[0])
    elif word[1] <= 300
        print(word[0])
This is giving me a syntax error.
  File "<stdin>", line 2
    if word[1] >= 3
                  ^
SyntaxError: invalid syntax
Can anyone please help?
You should add : after your if statements to fix this SyntaxError:
wordlist = open('C:\\Python27\\bengali_wordlist_full.txt', 'r').read().decode('string-escape').decode("utf-8")
for word in wordlist:
    if word[1] >= 3:
        print word[0]
    elif word[1] <= 300:
        print word[0]
Read this:
https://docs.python.org/2/tutorial/controlflow.html
Also, here is a useful tip: when Python gives you a SyntaxError for some line, always look at the previous line as well, then at the following one.
There are a few problems with your code; I will add a full explanation in an hour or so. See how it should look and consult the docs in the meantime:
First, it is safer to use with open() clause for opening files (see https://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects)
filepath = 'C:/Python27/bengali_wordlist_full.txt'
with open(filepath) as f:
    content = f.read().decode('string-escape').decode("utf-8")
    # do you really need all of this decoding?
Now content holds the text from the file: this is one long string, with '\n' characters marking the line endings. We can split it into a list of lines:
lines = content.splitlines()
and parse one line at a time:
for line in lines:
    try:
        # split line into items, assign first to 'word', second to 'freq'
        word, freq = line.split('\t') # assuming you have tab as separator
        freq = float(freq) # we need to convert second item to numeric value from string
        if 5 <= freq <= 300: # you can 'chain' comparisons like this
            print word
    except ValueError:
        # this happens if split() gives more than two items or float() fails
        print "Could not parse this line:", line
        continue
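The code above is Python 2. For reference, the same parsing can be sketched in Python 3, where the file would be opened with an explicit encoding instead of calling decode(). The tab separator is the same assumption as above, and the function name words_in_range is made up for illustration:

```python
def words_in_range(lines, low=5, high=300, sep="\t"):
    """Return the words whose frequency lies between low and high (inclusive)."""
    selected = []
    for line in lines:
        parts = line.rstrip("\n").split(sep)
        if len(parts) != 2:
            continue  # not a "word<sep>frequency" line
        word, freq_text = parts
        try:
            freq = float(freq_text)
        except ValueError:
            continue  # frequency column is not numeric
        if low <= freq <= high:
            selected.append(word)
    return selected

sample = ["apple\t4\n", "বাংলা\t120\n", "123\t300\n", "no separator here\n", "zebra\tmany\n"]
print(words_in_range(sample))
```

With a real file, the lines could come from `open(filepath, encoding="utf-8")`. Unparseable lines are skipped silently here; printing them, as the answer does, is just as reasonable.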
